Assembly Language (any cpu) | Networking with assembler?

Author

Message

Time

BlackLotus

I was wondering if it would be more efficient to use assembler-based networking than C sockets. I'm sure you can cap off almost any bandwidth with C sockets, but could there still be a use? And if so, could someone give me an example? Much appreciated.

August 11, 2004, 2:26 AM

iago

It would be the same.

First of all, although it depends on the OS, you'll generally never have direct access to the hardware, so you'll have to go through the OS's wrappers (Winsock or Berkeley sockets or whatever). And that's almost exactly what C does: it calls the OS's function.

Secondly, even if you could save a little time, the amount of time saved would be insignificant compared to the amount of time it takes for the information to travel over the wire to the remote computer and back.

August 11, 2004, 3:10 AM
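To put the second point in numbers, here is a back-of-the-envelope sketch. The figures in the usage note (100 ns saved per call, a 50 ms round trip) are purely hypothetical and not measurements from the thread; `saved_fraction` is an illustrative helper name.

```python
def saved_fraction(saved_ns_per_call: float, round_trip_ms: float) -> float:
    """Fraction of one network round trip that a per-call saving represents.

    Both arguments are assumptions supplied by the caller, not measured values.
    """
    round_trip_ns = round_trip_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return saved_ns_per_call / round_trip_ns


if __name__ == "__main__":
    # Hypothetical: hand-written assembly saves 100 ns per send,
    # and the remote host is 50 ms away (round trip).
    print(saved_fraction(100, 50))
```

With those assumed figures the saving is about two millionths of the time spent waiting on the wire, which is the sense in which it is insignificant.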

Kp

I concur with iago; about the only thing you could do that would grant any speed-up is to inline the OS transition. On most systems, the function your C code calls will end up invoking some instruction (such as sysenter on x86) to transfer control to the OS, which will carry out the actual copy and such. So, you could get a very minor performance gain if you inlined the body of the call, such that your code invoked sysenter directly. However, as iago said, the savings are insignificant and therefore not worth the effort, except as a learning exercise. :)

August 11, 2004, 3:15 AM
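The idea that the socket API is a thin layer over an OS call can be seen without any assembly. The sketch below uses Python as a stand-in for the C discussed in the thread, and assumes a POSIX system where a socket is an ordinary file descriptor. It sends the same bytes once through the socket API and once through a plain `write` on the descriptor; `send_both_ways` and `recv_exact` are illustrative names.

```python
import os
import socket
from typing import Tuple


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before all bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_both_ways(payload: bytes) -> Tuple[bytes, bytes]:
    """Send `payload` via the socket API, then via write() on the same descriptor."""
    left, right = socket.socketpair()  # connected local pair, no network needed
    try:
        left.sendall(payload)                    # high-level socket wrapper
        via_socket_api = recv_exact(right, len(payload))
        os.write(left.fileno(), payload)         # lower-level write on the descriptor
        via_write = recv_exact(right, len(payload))
        return via_socket_api, via_write
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(send_both_ways(b"hello"))
```

Both paths hand the bytes to the kernel, which does the actual delivery. Bypassing the wrapper changes how you enter the kernel, not the work the kernel does once you are there.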

BlackLotus

Awesome and thanks a bunch

August 12, 2004, 1:24 AM

St0rm.iD

WHAT IF - we implemented a pure assembler/C, no-filehandle, user-mode, raw socket TCP stack? Can you say, incredible scalability?

August 12, 2004, 2:38 AM

Skywing

[quote author=$t0rm link=board=7;threadid=8118;start=0#msg75149 date=1092278322]
WHAT IF - we implemented a pure assembler/C, no-filehandle, user-mode, raw socket TCP stack? Can you say, incredible scalability?
[/quote]
Raw sockets generally require some kind of file handle for the socket, so I don't think you will have any luck there.

In any case, what would being user mode or not using file handles have to do with scalability?

August 12, 2004, 2:49 AM

St0rm.iD

I mean, one file handle for the raw socket, and the individual TCP connections are NOT done with file handles. That way, there's no overhead to the OS in terms of file handles.

August 12, 2004, 2:53 AM

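Skywing

[quote author=$t0rm link=board=7;threadid=8118;start=0#msg75156 date=1092279189]
I mean, one file handle for the raw socket, and the individual TCP connections are NOT done with file handles. That way, there's no overhead to the OS in terms of file handles.
[/quote]
I don't think that much of the overhead associated with most TCP stacks is directly related to file handles.

What TCP stack are you talking about in particular, though, for comparison purposes?

August 12, 2004, 2:55 AM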

St0rm.iD

Linux and Windows. FreeBSD cheats and doesn't count.

August 12, 2004, 2:58 AM

Skywing

For those interested, I wrote a simple test app for the NT TCP stack (though it requires Windows XP or Windows Server 2003, for lack of ConnectEx on Windows 2000) - you can grab it here.

August 12, 2004, 4:11 AM

Adron

[quote author=$t0rm link=board=7;threadid=8118;start=0#msg75156 date=1092279189]
I mean, one file handle for the raw socket, and the individual TCP connections are NOT done with file handles. That way, there's no overhead to the OS in terms of file handles.
[/quote]

#1: The OS would RST your connections for you automagically.

#2: All the TCP connection handling would be done in user mode, incurring a performance penalty over doing it in the kernel.

#3: This might possibly scale to a larger number of active connections than the kernel supports, if you can write something more memory efficient than the kernel code. But with that number of connections, chances are you'll be limited by speed, not memory. You're supposed to actually do something with those connections too?

August 12, 2004, 9:49 AM
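For readers wondering what "one raw socket, many connections without file handles" would look like, here is a minimal sketch of the demultiplexing step only. It assumes IPv4 packets that include the IP header, as a raw socket would typically deliver them. `Connection`, `parse_four_tuple`, `demux` and `build_test_packet` are hypothetical names, and nothing here implements actual TCP state handling or stops the OS from sending the RSTs mentioned in #1.

```python
import socket
import struct
from typing import Dict, Optional, Tuple

FourTuple = Tuple[str, int, str, int]  # (src ip, src port, dst ip, dst port)


class Connection:
    """Placeholder for per-connection state kept in user memory, not a handle."""

    def __init__(self) -> None:
        self.segments_seen = 0


def parse_four_tuple(packet: bytes) -> FourTuple:
    """Extract the addresses and ports from an IPv4 packet carrying TCP."""
    header_len = (packet[0] & 0x0F) * 4          # IHL field, in 32-bit words
    if packet[9] != socket.IPPROTO_TCP:          # protocol field
        raise ValueError("not a TCP packet")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return (src_ip, src_port, dst_ip, dst_port)


def demux(table: Dict[FourTuple, Connection], packet: bytes) -> Optional[Connection]:
    """Find the user-mode connection a packet belongs to, if any."""
    conn = table.get(parse_four_tuple(packet))
    if conn is not None:
        conn.segments_seen += 1
    return conn


def build_test_packet(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a 40-byte IPv4+TCP SYN with zeroed checksums, for testing only."""
    ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                            socket.IPPROTO_TCP, 0,
                            socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    tcp_header = struct.pack("!HHLLBBHHH", src_port, dst_port, 0, 0,
                             5 << 4, 0x02, 8192, 0, 0)
    return ip_header + tcp_header
```

Each connection here costs one dictionary entry rather than one OS handle, which is the saving St0rm.iD is after. Every packet still has to cross from the kernel into this code before it can be matched, which is the cost #2 points at.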

St0rm.iD

Would not doing it in the kernel incur overhead? Wouldn't they both run at the same internal clock speed?

August 12, 2004, 2:54 PM

Skywing

[quote author=$t0rm link=board=7;threadid=8118;start=0#msg75208 date=1092322446]
Would not doing it in the kernel incur overhead? Wouldn't they both run at the same internal clock speed?
[/quote]
For one, doing it in user mode guarantees at least one more context switch than doing it in kernel mode.

August 12, 2004, 3:21 PM

St0rm.iD

<newbie>
What's a context switch?
</newbie>

August 12, 2004, 10:50 PM

Myndfyr

[quote author=$t0rm link=board=7;threadid=8118;start=0#msg75280 date=1092351031]
<newbie>
What's a context switch?
</newbie>
[/quote]

Given the context of the discussion, I believe the context of the running code must switch from user mode to kernel mode. ;) Granted, I'm not sure.

August 13, 2004, 1:26 AM

iago

[quote author=MyndFyre link=board=7;threadid=8118;start=0#msg75293 date=1092360398]
[quote author=$t0rm link=board=7;threadid=8118;start=0#msg75280 date=1092351031]
<newbie>
What's a context switch?
</newbie>
[/quote]

Given the context of the discussion, I believe the context of the running code must switch from user mode to kernel mode. ;) Granted, I'm not sure.
[/quote]

Yes, that's correct. User mode is what most stuff runs in, and it's very restricted. Kernel mode is also called "supervisor mode" and has access to everything.

August 13, 2004, 6:57 AM

Skywing

The relevant part here would be that you would not need to do things like switch the process context (e.g. reload the page tables and flush TLBs, and so on) to handle a TCP/IP message in kernel mode.

You might see something like this:
Network card -> driver ISR -> DPC -> [intermediate NDIS layers] -> tcpip.sys

If you were doing this in user mode, it would look more like:
Network card -> driver ISR -> DPC -> [intermediate NDIS layers] -> tcpip.sys (yes, for raw sockets) -> afd.sys -> queue a completion notification to waiting user thread -> (thread dispatcher, sometime in the future selects user thread to run, which can now handle incoming data on the raw socket).

Of course, then you also have to route the TCP message to whoever was actually associated with that socket, which is probably going to be in a *different* process, so you have to wait for the dispatcher to then run the sleeping application thread before the TCP-utilizing application can receive its data.

If you were handling TCP in kernel mode, you might be able to go to the right user mode process directly instead of adding a secondary process and thread that has to be woken to handle the TCP protocol itself.

August 14, 2004, 6:03 AM
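The two receive paths above can be written down as a toy model to count how often a packet has to wait for the thread dispatcher. The stage names come from the post; the final "application thread" stage on the kernel path and the True/False flags are a simplifying assumption based on the last paragraph, not a trace of the NT stack.

```python
from typing import List, Tuple

# Each stage is (name, needs_thread_dispatcher).
Stage = Tuple[str, bool]

KERNEL_TCP_PATH: List[Stage] = [
    ("network card", False),
    ("driver ISR", False),
    ("DPC", False),
    ("intermediate NDIS layers", False),
    ("tcpip.sys", False),
    ("application thread", True),          # assumed final hop to the receiver
]

USER_MODE_TCP_PATH: List[Stage] = [
    ("network card", False),
    ("driver ISR", False),
    ("DPC", False),
    ("intermediate NDIS layers", False),
    ("tcpip.sys (raw socket)", False),
    ("afd.sys", False),
    ("completion notification queued", False),
    ("user-mode TCP stack thread", True),  # must be scheduled to parse TCP
    ("application thread", True),          # then the real receiver is woken
]


def scheduler_waits(path: List[Stage]) -> int:
    """Count the stages that cannot run until the dispatcher picks a thread."""
    return sum(1 for _name, needs_dispatch in path if needs_dispatch)


if __name__ == "__main__":
    print("kernel TCP:", scheduler_waits(KERNEL_TCP_PATH))
    print("user-mode TCP:", scheduler_waits(USER_MODE_TCP_PATH))
```

Under these assumptions the user-mode stack adds one extra wait on the dispatcher per received message, which matches the earlier claim of at least one more context switch.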

Valhalla Legends Forums Archive | Assembly Language (any cpu) | Networking with assembler?