Saturday, August 14, 2010

Don't Rely on Environment.ProcessorCount

One of the least documented pieces of knowledge in multithreaded programming is the answer to the question "How many threads should I use in my application to get the best performance out of my available hardware?". The answer varies, since it depends on the characteristics of the application. Is it CPU bound or IO bound? How much of the work is parallelized? And so on. At the bottom line, all of these "formulas" are based on the number of available processors.

Usually, when an application runs its initialization routine and gets ready to create a collection of worker threads, it checks how many processors are installed on the machine so it can decide exactly how many threads to create. To get the number of installed processors, you would usually call Environment.ProcessorCount. In the .NET Framework, this property simply calls the Win32 GetSystemInfo function and returns the dwNumberOfProcessors field to the caller. The problem with this value is that it doesn't necessarily represent the number of processors available to our process. The user who launched our application might have given it a certain processor affinity, which causes the application's threads to execute only on the processors set by that affinity instead of on all the installed processors. An application that relies on Environment.ProcessorCount completely ignores the processor affinity and creates more threads than it should (in an even worse case, it could give its threads affinities to processors that aren't even available to the process).
What eventually happens is that too many threads run on our subset of available processors, and we suffer a performance penalty caused by excessive context switching.
Running a process on only a subset of the processors is far from far-fetched. When a couple of applications that are each designed to take advantage of all the available processors are meant to run side by side on the same machine (without some kind of virtualization, such as VMware), we would probably like to split the installed processors in half and give each process its own set of processors to use. If the applications aren't aware of this subset, we'll find ourselves wasting valuable CPU cycles for no good reason.
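To make the "split the processors in half" scenario concrete, here is a minimal sketch of how the two affinity masks could be computed. The AffinitySplit class and its BuildMask and SplitInHalf methods are hypothetical names invented for this illustration, and the sketch assumes a single processor group of at most 64 logical processors.

```csharp
using System;

static class AffinitySplit
{
    // Builds a mask with `count` consecutive bits set, starting at bit `first`.
    // Hypothetical helper; assumes a single group of at most 64 processors.
    public static ulong BuildMask(int first, int count)
    {
        if (first < 0 || count < 0 || first + count > 64)
            throw new ArgumentOutOfRangeException();
        if (count == 0)
            return 0;

        // (1 << 64) is not representable, so handle the full mask separately
        ulong bits = count == 64 ? ulong.MaxValue : (1UL << count) - 1;
        return bits << first;
    }

    // Splits processors 0..processorCount-1 into a lower and an upper half.
    // With an odd count, the upper half receives the extra processor.
    public static void SplitInHalf(int processorCount,
                                   out ulong lowerHalf, out ulong upperHalf)
    {
        int half = processorCount / 2;
        lowerHalf = BuildMask(0, half);
        upperHalf = BuildMask(half, processorCount - half);
    }
}
```

For 8 processors this yields the masks 0x0F and 0xF0. In a 64-bit process, one of them could then be applied with something like Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)(long)lowerHalf, while the second application receives the other mask.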

In .NET, you can get the processor affinity through the Process.ProcessorAffinity property. It returns a bitmask in which each bit represents a logical CPU on which our process can execute (if the mask is set to 0, the scheduler decides which processors our application will use, so basically all of the processors are available).
Current versions of Windows support more than 64 logical CPUs. To address all of those CPUs, they are divided into groups, where each group can contain up to 64 processors. So when looking at the processor affinity bitmask, be aware that it applies only to a specific group (by default, only a single group is used).
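If you are curious how many processor groups a machine actually has, the following sketch queries Windows through P/Invoke. It relies on the Win32 functions GetActiveProcessorGroupCount and GetActiveProcessorCount from kernel32.dll, which are available from Windows 7 and Windows Server 2008 R2 onwards. The ProcessorGroups class and its GetProcessorsPerGroup method are hypothetical names, and the single-group fallback on other platforms is an assumption made only so the sketch runs everywhere.

```csharp
using System;
using System.Runtime.InteropServices;

static class ProcessorGroups
{
    // Win32 functions exported by kernel32.dll (Windows 7 / Server 2008 R2 and later)
    [DllImport("kernel32.dll")]
    private static extern ushort GetActiveProcessorGroupCount();

    [DllImport("kernel32.dll")]
    private static extern uint GetActiveProcessorCount(ushort groupNumber);

    // Returns the number of active logical processors in each processor group.
    // Outside Windows, this sketch simply reports one group that holds
    // Environment.ProcessorCount processors (an assumption of the sketch).
    public static int[] GetProcessorsPerGroup()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
            return new[] { Environment.ProcessorCount };

        ushort groupCount = GetActiveProcessorGroupCount();
        int[] perGroup = new int[groupCount];

        for (ushort g = 0; g < groupCount; g++)
            perGroup[g] = (int)GetActiveProcessorCount(g);

        return perGroup;
    }
}
```

On a typical machine with 64 or fewer logical processors, the returned array should contain a single entry. On Windows versions older than the ones mentioned above, the calls would fail with an EntryPointNotFoundException.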
So the next time you check how many processors are available to your process, remember to count the number of set bits in your processor affinity bitmask.

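 // requires: using System; using System.Diagnostics;
 static void PrintAffinitizedProcessors()
 {
     // prints the affinitized processors in the current
     // processor group (up to 64 logical processors)
     Process currentProcess = Process.GetCurrentProcess();
     long affinityMask = (long)currentProcess.ProcessorAffinity;
 
     if (affinityMask == 0)
     {
         // no explicit affinity: treat every processor as available
         // by setting the lowest ProcessorCount bits (2^n - 1)
         int n = Environment.ProcessorCount;
         affinityMask = n >= 64 ? -1L : (1L << n) - 1;
     }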
 
     const int BITS_IN_BYTE = 8;
     int numberOfBits = IntPtr.Size * BITS_IN_BYTE;
 
     int counter = 0;
 
     for (int i = 0; i < numberOfBits; i++)
     {
         if ((affinityMask >> i & 1) == 1)
         {
             Console.WriteLine(i);
             counter++;
         }
     }
 
     Console.WriteLine("Total: " + counter);
 }
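The method above only prints its findings. As a variation, here is a sketch that returns the count so it can be used to size a pool of worker threads. The WorkerSizing class and its method names are hypothetical, and treating an empty mask as "all processors" mirrors the assumption made in the code above. Note that Process.ProcessorAffinity is not supported on every platform (on macOS, for example, it throws PlatformNotSupportedException).

```csharp
using System;
using System.Diagnostics;

static class WorkerSizing
{
    // Counts the set bits in a mask by repeatedly clearing the lowest set bit.
    public static int CountSetBits(ulong mask)
    {
        int count = 0;
        while (mask != 0)
        {
            mask &= mask - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }

    // Returns the number of processors the current process may run on,
    // within its processor group (up to 64 logical processors).
    public static int GetAffinitizedProcessorCount()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            ulong mask = unchecked((ulong)current.ProcessorAffinity.ToInt64());

            // in a 32-bit process the conversion sign-extends the mask,
            // so keep only the 32 bits that are meaningful
            if (IntPtr.Size == 4)
                mask &= 0xFFFFFFFFUL;

            int count = CountSetBits(mask);

            // an empty mask is treated as "no restriction"
            return count > 0 ? count : Environment.ProcessorCount;
        }
    }
}
```

A CPU bound application could then create WorkerSizing.GetAffinitizedProcessorCount() worker threads instead of Environment.ProcessorCount. One thread per available processor is only a common starting point, and an IO bound application would probably want a different formula.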
