General Discussion | Performance of OS's and architectures

Author	Message	Time
Adron	Came across this link: http://john.redmood.com/osfastest.html I wonder how accurate that is - does anyone have more links to comparisons of multithreaded vs asynchronous network access?	June 11, 2003, 3:12 PM
Raven	People tend not to compare the two because the approaches are just so different.	June 11, 2003, 4:09 PM
herzog_zwei	Nice article. I'm a bit surprised that Linux did so much better than BSD, especially against the 2.2 Linux kernel. I'm a bit disappointed that they did all that work but didn't specify how many CPUs each test machine had, though I get the impression that it was only a single CPU.

I think you don't see many comparisons like this for three reasons: asynchronous I/O isn't portable across OSes (or doesn't exist at all on some), one task per thread is easier to code for, and companies keep an important part of their implementation secret so they have a scalability edge over their competition. You can make a portable user-space "asynchronous" I/O library to deal with non-portable or nonexistent async I/O APIs, but not everyone needs portability or wants to spend time building one when they could just spawn a new thread to handle the task.

The article does affirm my idea of which architecture would be best for an MMOG network engine. I never did any extensive testing to confirm it, but that article helped.

The point I got from the article is that an I/O-bound server should run many tasks per thread to scale, not that it needs an asynchronous design to scale. Also, you don't need non-blocking I/O to design a decent user-space async I/O layer (though you do miss out on doing certain things asynchronously, like connect). Technically it's still synchronous at the user-space level, but code-wise it's mostly asynchronous, so you can think of it as such. IMO, a program built on kernel-based async I/O that handled all tasks in one thread wouldn't be as efficient as a program with multiple threads, each handling multiple tasks using faked async I/O.

Many tasks per thread is a good design because of context switching. With one connection per thread, every new connection means another thread and more (expensive) context switches. If you change it so each thread handles multiple connections per pass instead of just one, you can cut down on some of those context switches. If you have multiple CPUs, having more threads (each handling multiple requests) would generally improve performance, whereas having more threads on a single CPU can degrade performance. You might also want to limit how many tasks each thread can handle so it doesn't degrade performance for algorithms that are O(tasks) or worse (which would be common if you do something like walk a list of tasks to see whether I/O occurred for each one).

Another thing that might be interesting to compare is how a non-preemptive, user-space threading implementation (such as GNU Pth) stacks up against a preemptive, kernel-space one, in both a single-task-per-thread environment and a multiple-tasks-per-thread one. My opinion is that NP threads would perform better for I/O-bound servers because of the kernel-based context switches they save (though they still need to perform user-space context switches). The best would probably be a hybrid of NP and P threading, so you'd get the benefit of NP threads (which are easier to program for) and it would scale across multiple CPUs thanks to P threading.	June 12, 2003, 12:10 AM
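
For reference, here is a rough Python sketch of the many-tasks-per-thread layout herzog_zwei describes: each worker thread waits on several blocking sockets with a single readiness call and services whichever ones are ready, and a cap limits how many connections one thread owns. The names `serve`, `start_workers`, `demo`, and `MAX_TASKS_PER_THREAD` are made up for illustration, `socketpair()` stands in for accepted client connections, and the uppercase echo is only a placeholder for real request handling. It is a sketch under those assumptions, not code from the article or from this thread.

```python
import selectors
import socket
import threading

MAX_TASKS_PER_THREAD = 2  # hypothetical cap; a real value would come from measurement


def serve(conns, handled):
    """Service several blocking sockets from one thread until all of them close."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        sel.register(conn, selectors.EVENT_READ)
    open_conns = len(conns)
    while open_conns:
        # One pass: a single wait covers every connection this thread owns.
        for key, _ in sel.select():
            conn = key.fileobj
            data = conn.recv(4096)  # socket reported readable, so this does not wait
            if data:
                conn.sendall(data.upper())  # placeholder for real request handling
                handled.append(data)
            else:  # empty read means the peer closed its end
                sel.unregister(conn)
                conn.close()
                open_conns -= 1
    sel.close()


def start_workers(conns, handled):
    """Give each thread at most MAX_TASKS_PER_THREAD connections."""
    threads = []
    for i in range(0, len(conns), MAX_TASKS_PER_THREAD):
        group = conns[i:i + MAX_TASKS_PER_THREAD]
        t = threading.Thread(target=serve, args=(group, handled))
        t.start()
        threads.append(t)
    return threads


def demo(n_clients=5):
    """Run n_clients simulated connections and return (thread count, replies)."""
    pairs = [socket.socketpair() for _ in range(n_clients)]
    clients = [c for c, _ in pairs]
    servers = [s for _, s in pairs]
    handled = []
    threads = start_workers(servers, handled)
    replies = []
    for i, client in enumerate(clients):
        client.sendall(b"ping %d" % i)
        replies.append(client.recv(4096))
        client.close()
    for t in threads:
        t.join()
    return len(threads), replies


if __name__ == "__main__":
    print(demo())
```

With five simulated connections and a cap of two, the work is spread over three threads instead of five. The sockets stay in blocking mode throughout, which matches the point that a readiness check is enough for a "faked" asynchronous layer, and the cap keeps each thread's per-pass scan short.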

Valhalla Legends Forums Archive | General Discussion | Performance of OS's and architectures