Tuesday, July 20, 2010

Monitor's Locking Primitive

Lately, a discussion in a C# user group raised the question "Which synchronization primitive does the CLR use when I call Monitor.Enter?". Does the OS create a Mutex? Maybe an Event? Or perhaps it's a user-mode primitive such as a CriticalSection? Apparently there's some vagueness on the subject, so in this post I will demonstrate how we can find the answer using the tools available to us.
In general, the CLR implements object synchronization by allocating a SyncBlock for every object that we attempt to lock. By looking at the object's header, the CLR can find the SyncBlock that belongs to that object. It's the SyncBlock's responsibility to synchronize the locking requests to that object.
One needs to remember that these synchronization abilities are a feature of the CLR, in the sense that they are implemented in the CLR, and not necessarily in (or using) the operating system's synchronization primitives. This matches the idea that, "theoretically", a managed thread doesn't have to be based on an operating system's kernel thread. So basically, no one can guarantee that the CLR will always use one synchronization primitive or another. In practice, managed threads are backed by OS threads today, and it doesn't seem like this is going to change in the near or far future.

After a quick review of the documentation available to us on MSDN, one may get the impression that there's no real documentation about the basic primitive being used. But since we are talking about a synchronization primitive, we can recall the IHostSyncManager interface that is exposed to us by the CLR's hosting API. One of this interface's functionalities is the ability to replace the implementation of the synchronization primitive used by the Monitor class. This ability is exposed by the CreateMonitorEvent method.
Even at this stage, it's worth paying attention to what is said under its Remarks section:
CreateMonitorEvent returns an IHostAutoEvent that the CLR uses in its implementation of the managed System.Threading.Monitor type. This method mirrors the Win32 CreateEvent function, with a value of false specified for the bManualReset parameter.
Still, the keyword here is "mirrors", so there isn't a true guarantee about what is happening in the CLR's internal implementation. In order to verify the strong hint we've just received, we are going to have to pull out the big guns and use WinDbg.
For the test, I wrote an application that results in endless contention:

static void Main()
{
      Thread t1 = new Thread(() => { lock ("A") { while (true);} });
      t1.Start(); // without this the second thread never runs and there is no contention

      lock ("A") { while (true);}
}
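
As a side note, the lock statement used above is only shorthand for calls on the Monitor class, which is why Monitor.Enter(Object, Boolean ByRef) is the frame to look for in a stack trace. The sketch below shows roughly what the C# 4 compiler generates for a lock block. The names LockExpansion, Gate, Counter and Increment are made up for the illustration and are not part of the test application.

```csharp
using System;
using System.Threading;

static class LockExpansion
{
    private static readonly object Gate = new object();
    public static int Counter;

    // Roughly equivalent to: lock (Gate) { Counter++; }
    public static void Increment()
    {
        bool lockTaken = false;
        try
        {
            // Sets lockTaken to true only if the monitor was actually acquired.
            Monitor.Enter(Gate, ref lockTaken);
            Counter++;
        }
        finally
        {
            // Release the monitor only if we really own it.
            if (lockTaken) Monitor.Exit(Gate);
        }
    }
}
```

Calling LockExpansion.Increment() from several threads behaves like the equivalent lock block: the increments are serialized by the monitor.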

With the application running in the background, we can launch WinDbg and attach to the appropriate process.
After loading SOS, our first step is to find the thread that lost the race for acquiring the lock. To do so, we will print the managed stacks of all of our threads, and once we find a "suitable" stack trace, we'll switch to that thread's context:

>~*e!clrstack // execute !clrstack on all of the threads 
OS Thread Id: 0xf80 (0)
Child SP IP       Call Site
0012f3ec 030300fe ConsoleApplication1.Program.Main(System.String[]) [...]
0012f648 791421db [GCFrame: 0012f648]
OS Thread Id: 0xf4c (1)
Unable to walk the managed stack. The current thread is likely not a
managed thread. You can run !threads to get a list of managed threads in
the process
OS Thread Id: 0x840 (2)
Child SP IP       Call Site
02ccfe68 7c90e514 [DebuggerU2MCatchHandlerFrame: 02ccfe68]
OS Thread Id: 0xbe0 (3)
Child SP IP       Call Site
0313f67c 7c90e514 [GCFrame: 0313f67c]
0313f794 7c90e514 [GCFrame: 0313f794]
0313f7b0 7c90e514 [HelperMethodFrame_1OBJ: 0313f7b0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
0313f808 79b2e0c4 System.Threading.Monitor.Enter(System.Object, Boolean ByRef)
0313f818 03030163 ConsoleApplication1.Program.<Main>b__3() [...]
0313f848 79b2ae5b System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
0313f858 79ab7ff4 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0313f87c 79ab7f34 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0313f898 79b2ade8 System.Threading.ThreadHelper.ThreadStart()
0313fabc 791421db [GCFrame: 0313fabc]
0313fd80 791421db [DebuggerU2MCatchHandlerFrame: 0313fd80]
OS Thread Id: 0xe60 (4)
Unable to walk the managed stack. The current thread is likely not a
managed thread. You can run !threads to get a list of managed threads in
the process

>~3s // change the current thread to 3

After finding the correct thread, we need to check its native stack, so we'll use the kb command, which also displays the first 3 parameters that were passed to each function:

ChildEBP RetAddr  Args to Child             
0313f3c8 7c90df4a 7c809590 00000001 0313f3f4 ntdll!KiFastSystemCallRet
0313f3cc 7c809590 00000001 0313f3f4 00000001 ntdll!ZwWaitForMultipleObjects+0xc
0313f468 791f516a 00000001 001820bc 00000000 KERNEL32!WaitForMultipleObjectsEx+0x12c
0313f4cc 791f4f98 00000001 001820bc 00000000 clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x56
0313f4ec 791f4dd8 00000001 001820bc 00000000 clr!Thread::DoAppropriateAptStateWait+0x4d
0313f580 791f4e99 00000001 001820bc 00000000 clr!Thread::DoAppropriateWaitWorker+0x17d
0313f5ec 791f4f17 00000001 001820bc 00000000 clr!Thread::DoAppropriateWait+0x60
0313f640 7919d409 ffffffff 00000001 00000000 clr!CLREvent::WaitEx+0x106
0313f654 792e0160 ffffffff 00000001 00000000 clr!CLREvent::Wait+0x19
0313f6e4 792e0256 001818a0 ffffffff 8079c412 clr!AwareLock::EnterEpilogHelper+0xa8
0313f724 792e029b 001818a0 001818a0 79142c0d clr!AwareLock::EnterEpilog+0x42
0313f744 792c7729 8079cb36 0313f830 00b3c368 clr!AwareLock::Enter+0x5f
0313f800 79b2e0c4 79161f8e 00941f02 0313f840 clr!JIT_MonReliableEnter_Portable+0x104
0313f840 79b2ae5b 00b3c3ec 01b3101c 0313f86c mscorlib_ni+0x2ae0c4
0313f850 79ab7ff4 00b3e010 00000000 00b3c3b8 mscorlib_ni+0x2aae5b
0313f86c 79ab7f34 00000000 00b3c3b8 00000000 mscorlib_ni+0x237ff4
0313f88c 79b2ade8 00b3c3b8 00000000 001818a0 mscorlib_ni+0x237f34
0313f8a4 791421db 000001a7 0313fae0 0313f930 mscorlib_ni+0x2aade8
0313f8b4 79164a2a 0313f980 00000000 0313f950 clr!CallDescrWorker+0x33
0313f930 79164bcc 0313f980 00000000 0313f950 clr!CallDescrWorkerWithHandler+0x8e

At this point we can already see that the last thing the thread did before we interrupted it was to call WaitForMultipleObjectsEx, where the first parameter was 1 and the second was 0x001820BC. With this information, we can tell that we are waiting on a single handle, since the first parameter specifies the effective size of the array that was passed as the second parameter. So all that's left to do is to find out which object hides behind the handle that was passed to the function.

>dp 0x001820BC 0x001820BC
001820bc  000006c8 // our handle's value
>!handle 000006c8 F // pass "F" as bitmask to display all of the relevant data
Handle 6c8
  Type             Event
  Attributes       0
  GrantedAccess    0x1f0003:
  HandleCount      2
  PointerCount     4
  Object Specific Information
    Event Type Auto Reset
    Event is Waiting

And so, this was our last step. We have confirmed that Monitor's synchronization primitive is in fact an Event object of the auto-reset type.
Whoever still wants to see the creation and usage of the Event in code can open the sync.cpp file in the SSCLI's implementation, and see how a call to CLREvent::CreateMonitorEvent triggers a call to UnsafeCreateEvent (which is actually a typedef for the familiar CreateEvent function).
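
To make the "auto-reset" part concrete, here is a small sketch using the managed AutoResetEvent class, which on Windows wraps the same kind of Win32 event object. It only illustrates the semantics of such an event (one Set releases one wait), and says nothing about how the CLR drives its own internal event. The names AutoResetDemo and WaitTwiceAfterOneSet are made up for the illustration.

```csharp
using System;
using System.Threading;

static class AutoResetDemo
{
    // Signals the event once, then performs two non-blocking waits on it.
    public static bool[] WaitTwiceAfterOneSet()
    {
        using (var evt = new AutoResetEvent(false))
        {
            evt.Set();
            bool first = evt.WaitOne(0);   // consumes the signal; the event resets itself
            bool second = evt.WaitOne(0);  // nothing left to consume, so this times out
            return new[] { first, second };
        }
    }
}
```

The first wait succeeds and the second one does not, which is exactly the behavior a lock wants: each release of the monitor should wake up a single waiting thread.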

Even so, one has to remember that this is only a partial answer. As I mentioned at the beginning of this post, there's no guarantee that a call to Monitor.Enter will always end up accessing some kernel object. In fact, in one of his posts, Joe Duffy makes sure to mention that in the CLR's implementation, when a thread encounters contention, it will spin a little and retry acquiring the lock before it leaves user mode and waits on a kernel object. So even if the CLR doesn't provide a full-blown implementation of a synchronization primitive, it may still provide some optimizations over the services supplied by the operating system (much like the CriticalSection type).
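
The general "spin first, block later" idea can be sketched in a few lines of C#. This is only a toy illustration of the technique, not the CLR's AwareLock: the class name SpinThenWaitLock, the spin count of 100 and the SpinWait(20) pause are all arbitrary assumptions.

```csharp
using System;
using System.Threading;

// Illustration only: a toy "spin in user mode, then block on an event" lock.
sealed class SpinThenWaitLock : IDisposable
{
    private int _held;     // 0 = free, 1 = held
    private int _waiters;  // threads that gave up spinning and may be blocked
    private readonly AutoResetEvent _wakeUp = new AutoResetEvent(false);

    private bool TryTake()
    {
        return Interlocked.CompareExchange(ref _held, 1, 0) == 0;
    }

    public void Enter()
    {
        // Phase 1: stay in user mode and retry a bounded number of times.
        for (int i = 0; i < 100; i++)
        {
            if (TryTake()) return;
            Thread.SpinWait(20);
        }

        // Phase 2: register as a waiter and block on the kernel event between retries.
        Interlocked.Increment(ref _waiters);
        while (!TryTake())
        {
            _wakeUp.WaitOne();
        }
        Interlocked.Decrement(ref _waiters);
    }

    public void Exit()
    {
        Interlocked.Exchange(ref _held, 0);
        // Only touch the kernel object if somebody might be blocked on it.
        if (Interlocked.CompareExchange(ref _waiters, 0, 0) > 0)
        {
            _wakeUp.Set();
        }
    }

    public void Dispose()
    {
        _wakeUp.Dispose();
    }
}
```

With short critical sections most acquisitions finish in the spin phase, so the event is rarely touched; only long waits pay for the transition to kernel mode.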

Monday, July 5, 2010

Behind The .locals init Flag

As we all probably know, the C# language specification demands that every local variable be definitely assigned before it is used.
Even so, whoever has used ILDASM to peek into the compiler's generated IL code must have noticed that right after a method's declaration, there's a peculiar line that starts with .locals init. Something like this:

.method private hidebysig static void  Main(string[] args) cil managed
{
  // Code size       10 (0xa)
  .maxstack  1
  .locals init ([0] int32 x)  // <--- localsinit flag
  IL_0000:  ldc.i4.4
  IL_0001:  stloc.0
  IL_0002:  ldloc.0
  IL_0003:  call       void [mscorlib]System.Console::WriteLine(int32)
  // (remaining instructions not shown)
}

This line represents the existence of the CorILMethod_InitLocals flag in the current method's header. This flag effectively guarantees that the CLR will initialize all of the local variables declared in the method's body to their default values. Meaning, regardless of the initial value you have chosen for your local variable (in our case, the variable x gets the value 4), the platform will always make sure that before our code executes, the local variable x is initialized to its legal, default value (in this case, 0).
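
For reference, the IL above is what one would expect from a C# method along the following lines. The original source isn't shown in this post, so this is an assumed reconstruction, and the names LocalsInitDemo, PrintAssignedLocal and UnassignedLocal are made up. The second method shows the compiler-side rule: reading a local that was never assigned is rejected at compile time, even though the runtime would have zeroed it anyway.

```csharp
using System;

static class LocalsInitDemo
{
    public static int PrintAssignedLocal()
    {
        int x = 4;             // explicit assignment (ldc.i4.4 / stloc.0 in the IL)
        Console.WriteLine(x);  // ldloc.0 / call WriteLine(int32)
        return x;
    }

    public static int UnassignedLocal()
    {
        int y;
        // return y;           // would not compile: error CS0165, use of unassigned local variable 'y'
        y = default(int);      // the same value that zero-initialization gives the slot
        return y;
    }
}
```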

In Microsoft's implementation of the CLI, this flag always exists in the method's header (assuming there are local variables declared in its body). This could make us wonder why the compiler insists on reporting an error every time a programmer forgets to initialize a local variable before using it. This constraint may seem redundant, but in fact, there are several reasons why it is quite necessary.

Before diving into the meaning of the .locals init flag, let's review the issue of the so-called "duplicate assignment" performed by the compiler and the CLR.
Looking at the IL code from before, one may think that every time we declare a local variable we have to pay some "initialization overhead", since both the compiler and the CLR insist on initializing it to some "default value". Even though this overhead is quite minimal, it still gives us a "bad vibe" since it just seems redundant.
But in fact, in optimized code this duplicate assignment never really occurs. The reason lies in the way the .locals init flag guarantees the default value assignment. All it does is make sure the JIT compiler generates code that initializes the variable before its usage. In our case, the JIT has to generate a mov instruction that sets our x variable to 0.
And so, the assembly code we get at run-time (without optimizations) confirms it:

Normal JIT generated code
Begin 00e20070, size 30
00E20070 push        ebp
00E20071 mov         ebp,esp
00E20073 sub         esp,8
00E20076 mov         dword ptr [ebp-4],ecx
00E20079 cmp         dword ptr ds:[00942E14h],0
00E20080 je          00E20087
00E20082 call        7A0CA6C1 (JitHelp: CORINFO_HELP_DBG_IS_JUST_MY_CODE)

-------------------- Generated code due to the LocalsInit flag  ----------------
00E20087 xor         edx,edx                 // zero out the EDX register
00E20089 mov         dword ptr [ebp-8],edx // assign the value of EDX to the location of 'X'

--------------------- Our own application's code ---------------------------------
00E2008C mov         dword ptr [ebp-8],4   // assign the value 4 to the location of 'X'

00E20093 mov         ecx,dword ptr [ebp-8]
00E20096 call        79793E74 (System.Console.WriteLine(Int32), mdToken: 060007c3)
00E2009B nop
00E2009C mov         esp,ebp
00E2009E pop         ebp
00E2009F ret

In this code example one can definitely see the effect of the localsInit flag on the generated assembly code, and also the "duplicate assignment" phenomenon mentioned before.
However, one should remember that this code was generated without any JIT optimizations. Once we allow these optimizations, we see that the JIT is able to identify the duplicate assignment and treat it as dead code, since it has no effect on the variable. As a result, the first assignment is completely removed, and only the user's value initialization appears in the generated code:

Normal JIT generated code
Begin 00c80070, size 19
00c80070 push    ebp
00c80071 mov     ebp,esp
00c80073 call    mscorlib_ni+0x22d2f0 (792ed2f0) (System.Console.get_Out(), mdToken: 06000772)
00c80078 mov     ecx,eax
00c8007a mov     edx,4  // assign 4 to the "virtual representation" of X
00c8007f mov     eax,dword ptr [ecx]
00c80081 call    dword ptr [eax+0BCh]

The first thing we notice is that x no longer has a representation in our application's main memory. Instead, a CPU register holds its "virtual" representation. But more importantly, we can now see that there's no remnant of the duplicate assignment we witnessed earlier due to the localsInit flag.

Now we can check what is in fact the meaning behind the usage of localsInit and the compiler's constraint of initializing local variables before their usage.
Microsoft's argument for the definite assignment mechanism is that most of the time, a programmer's failure to initialize a local variable is caused by a logical bug, and not by the programmer relying on the compiler to set the variable to its default value. In one of Eric Lippert's comments on his blog, he says it himself:
"The reason we require definite assignment is because failure to definitely assign a local is probably a bug. We do not want to detect and then silently ignore your bug! We tell you so that you can fix it."
The importance of the localsInit flag can be summarized in one word: Verification.
Verification is the process in which the CLI makes sure that all of the CIL code that exists in the application is "safe". This includes making sure that the methods we call accept the number and types of the parameters we pass them, that all of the local variables are properly initialized, etc.
In case the CLR detects a piece of code that fails the verification process, a VerificationException will be thrown.
Nonetheless, not every piece of IL code needs to be verifiable, as mentioned in Partition III of the standard:
"It is perfectly acceptable to generate correct CIL code that is not verifiable, but which is known to be memory safe by the compiler writer. Thus, correct CIL  might not be verifiable, even though the producing compiler might know that it is memory safe."
Having said that, every time we write unverifiable code, we have to request the proper permissions using the SecurityPermissionAttribute, and explicitly tell the CLR not to perform verification on our code using the SkipVerification property (the CLR won't perform definite assignment analysis on our code). One of the most common cases in which we would want to do this is when we write unsafe code. In such cases, we have to explicitly mark the required check-box in the project's properties, thus allowing the compiler to compile our unsafe code blocks, and making it add the UnverifiableCodeAttribute to the generated assembly, telling the CLR that this module is unverifiable.

The verification process requires that every local variable will be initialized. To be exact, it requires that in case no one requested to skip over the verification process, the localsInit flag must be present in the generated IL. For this reason, looking at the CIL instructions reference, you may encounter remarks such as this:
"Local variables are initialized to 0 before entering the method only if the localsinit on the method is true (see Partition I) ... System.VerificationException is thrown if the localsinit bit for this method has not been set, and the assembly containing this method has not been granted System.Security.Permissions.SecurityPermission.SkipVerification (and the CIL does not perform automatic definite-assignment analysis)"
Later on, the document addresses the overhead of performing the definite assignment analysis during runtime:
"Performance measurements on C++ implementations (which do not require definite-assignment analysis) indicate that adding this requirement has almost no impact, even in highly optimized code. Furthermore, customers incorrectly attribute bugs to the compiler when this zeroing is not performed, since such code often fails when small, unrelated changes are made to the program."