Monday, October 25, 2010

The Case of NUnit Hanging During Startup

Recently, a coworker of mine encountered a strange behavior in NUnit. Every time he'd open NUnit's graphical interface, it would freeze and stop responding. Even though a test haven't even started to execute, the application simply froze, leaving the user only to wait patiently and stare helplessly at the screen.

In order to investigate the issue, I've fired up Windbg and attached it to the hanged process.
After loading SOS, the !clrstack command was issued so I could get a better understanding to what was keeping the application busy. Since NUnit's graphical interface stopped from responding, one could already assume that the main thread stopped handling messages from the message pump for some reason.
Reviewing the command's output confirmed that assumption. (The outputs in the post were edited for brevity).
0:007> ~*e!clrstack
PDB symbol for mscorwks.dll not loaded
OS Thread Id: 0xe4 (0)
ESP       EIP     
0012ee44 7c90e4f4 Win32Native.GetFileAttributesEx(...)
0012ee58 792e03f6 System.IO.File.FillAttributeInfo(...)
0012eeec 7927ff71 System.IO.File.InternalExists(System.String)
0012ef20 792e96a6 System.IO.File.Exists(System.String)
0012ef4c 03a214c3 NUnit.Util.RecentFileEntry.get_Exists()
0012ef54 00e0fb7a NUnit.Gui.NUnitForm.NUnitForm_Load(Object, EventArgs)
0012ef8c 7b1d4665 System.Windows.Forms.Form.OnLoad(EventArgs)
0012efc0 7b1d4105 System.Windows.Forms.Form.OnCreateControl()
0012efcc 7b1c6d11 System.Windows.Forms.Control.CreateControl(Boolean)
0012f008 7b1c6b14 System.Windows.Forms.Control.CreateControl()
0012f020 7b1d2fc8 System.Windows.Forms.Control.WmShowWindow(Forms.Message)
0012f05c 7b1c8906 System.Windows.Forms.Control.WndProc(Forms.Message)
0012f060 7b1d1d6a [InlinedCallFrame: 0012f060] 
...
0012f418 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f42c 00e00643 NUnit.Gui.AppEntry.Main(System.String[])
0012f464 00e00076 NUnit.Gui.Class1.Main(System.String[])
0012f688 79e71b4c [GCFrame: 0012f688] 
OS Thread Id: 0x1464 (1)
Unable to walk the managed stack. The current thread is likely not a 
managed thread. You can run !threads to get a list of managed threads in
the process
OS Thread Id: 0xdac (2)
Failed to start stack walk: 80004005 
Evidently, NUnit is performing a synchronous IO operation on its main thread, which for some reason seems to refuse to complete. It happens to be a good example to why it's considered to be a bad practice to perform "heavy-weight lifting" on the main UI thread, or synchronous IO in general.

After obtaining this piece of information, it was required to understand why the operation doesn't complete. Perhaps the path of the requested file could shed some light on the observed behavior.
Since the path is passed across several managed methods before entering the Win32 API, I've attempted to dump the values of the parameters in the managed call stack using the !clrstack -p command.
0:000> !clrstack -p
OS Thread Id: 0xe4 (0)
ESP       EIP     
0012ee44 7c90e4f4 Win32Native.GetFileAttributesEx(...)
0012ee58 792e03f6 File.FillAttributeInfo(System.String, ...)
    PARAMETERS:
        path = no data
        data = 0x0012eeec
        tryagain = no data
        returnErrorOnNotFound = 0x00000001

0012eeec 7927ff71 System.IO.File.InternalExists(System.String)
    PARAMETERS:
        path = no data

0012ef20 792e96a6 System.IO.File.Exists(System.String)
    PARAMETERS:
        path = no data

0012ef4c 03a214c3 NUnit.Util.RecentFileEntry.get_Exists()
    PARAMETERS:
        this = no data

0012ef54 00e0fb7a NUnit.Gui.NUnitForm.NUnitForm_Load(Object, EventArgs)
    PARAMETERS:
        this = 0x013349ec
        sender = no data
        e = no data
Unfortunately, SOS wasn't able to retrieve the address of the managed string (thus, stating no data). It's likely that either the variable was stored in a register that was already overwritten by some later executing code, or perhaps we're just witnessing some of SOS's good old "buggy nature".

At this point, I take a small step into the world of native debugging and issue the kb command, to display the native call stack, including the first 3 parameters passed to each method.
0:000> kb
ChildEBP RetAddr  Args to Child              
0012ed84 7c90d79c 7c8111ff 0012eddc 0012eda4 ntdll!KiFastSystemCallRet
0012ed88 7c8111ff 0012eddc 0012eda4 00151bb8 ntdll!ZwQueryFullAttributesFile+0xc
0012ee08 0097a310 0139272c 00000000 0012eeec KERNEL32!GetFileAttributesExW+0x84
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012ee2c 792e03f6 0012ee5c 00000000 01392764 +0x97a30f
0012eedc 7927ff71 00000001 00000000 00000000 mscorlib_ni+0x2203f6
0012ef18 792e96a6 00000000 00000000 00000000 mscorlib_ni+0x1bff71
0012ef80 7b1d4665 01336c40 0134f25c 00000000 mscorlib_ni+0x2296a6
0012efb8 7b1d4105 00000004 0012f000 7b1c6d11 System_Windows_Forms_ni+0x204665
0012efc4 7b1c6d11 0133797c 013349ec 00000001 System_Windows_Forms_ni+0x204105
0012f000 7b1c6b14 00000000 00000000 013349ec System_Windows_Forms_ni+0x1f6d11
0012f018 7b1d2fc8 013349ec 00000000 00000000 System_Windows_Forms_ni+0x1f6b14
0012f054 7b1c8906 3cc1af52 79e7a6b8 0012f2d4 System_Windows_Forms_ni+0x202fc8
0012f0ac 7b1d1d6a 013349ec 0012f0c0 7b1d1d20 System_Windows_Forms_ni+0x1f8906
0012f0b8 7b1d1d20 0012f0d0 7b1d2f11 0012f10c System_Windows_Forms_ni+0x201d6a
0012f0c0 7b1d2f11 0012f10c 013349ec 0012f0e4 System_Windows_Forms_ni+0x201d20
0012f0d0 7b1d1af4 00e111ec 0012f10c 01335448 System_Windows_Forms_ni+0x202f11
0012f0e4 7b1c8640 0012f100 7b1c85c1 00000000 System_Windows_Forms_ni+0x201af4
0012f0ec 7b1c85c1 00000000 01335448 0012f130 System_Windows_Forms_ni+0x1f8640
0012f100 7b1c849a 01335448 00291106 00000018 System_Windows_Forms_ni+0x1f85c1
0012f164 7e418734 00291106 00000018 00000001 System_Windows_Forms_ni+0x1f849a
As you can see, the thread eventually made its way into the kernel, calling GetFileAttributesExW.
Knowing that the first parameter that gets passed to the method (lpFileName) represent the name of the file or directory in hand, we are only left to dump its value using the du command.
0:000> du 0139272c
0139272c  "\\123.123.123.123\foo.dll"
With that, the culprit is found. Apparently, some time ago a UT assembly was loaded into NUnit from a certain IP address that is no longer available in the network, causing the thread to stall indefinitely.

In order to fix the issue, the problematic path was needed to be removed from the RecentProjects list in NUnit's settings file (located at %AppData%\NUnit\NUnitSettings.xml).

Monday, October 11, 2010

Hot/Cold Data Separation Considered Harmful in Multithreaded Environments

When optimizing your code for performance, a quite important step in the way is to check that your application correctly utilizes its processor's cache. Of course, the magnitude of importance here is solely depended on your own personal scenario, but in general, it's always a good idea to keep your cache usage patterns in mind when attempting to resolve a performance bottleneck or perhaps for when you're just looking for possible ways to enhance you're application's performance.

Due to the relative high cost of accessing the main memory, modern processors make use of a data and an instruction cache in an attempt to lower the latency involved in accessing the main memory.
A processor may choose to keep in its cache the values of frequently accessed memory addresses, or perhaps prefetch the values of adjacent memory addresses for possible future usage.
When the processor attempts to access a memory address, it first check to see whether it's already exist in one of its caches (L1, L2, L3 etc.). If the address isn't found, then we have a "cache miss" and the processor we'll have to perform a round-trip to the main memory in order to obtain the required data. Wasting valuable CPU cycles on the way.

Not too long ago, one of the most popular guidelines in designing cache-aware data structures was to split them into "hot" and "cold" parts. This kind of separation makes sense since it could lead to a more efficient utilization of the cache. Frequently used fields (the hot part) are grouped together in the same cache lines, instead of being mixed with infrequently used fields (the cold part). This way, the cache is able to contain more hot data, and cache lines aren't wasted to hold cold data.
For those of you who are interested, the subject is covered in greater detail in the article Splitting Data Objects to Increase Cache Utilization by Michael Franz and Thomas Kistler.

 Red: Hot data, Green: Cold data

While following this guideline could lead to a more efficient cache utilization, it won't necessarily lead to a performance improvement in a multithreaded environment. In fact, it's more likely that you'll face a performance degradation than gaining any performance improvement at all.
The problem in hand is that highly utilized caches could easily send your valuable CPU cycles down the drain, due to the effects of false sharing. Once a single cache line holds several frequently written-to fields, the chance it will get invalidated by another processor gets greater. Sharing a single cache line with multiple frequently modified fields could easily have a negative effect on your cache's locality.
In multithreaded environments, one could benefit from sparsely allocating frequently used fields in the cache. There's an obvious trade-off between cache utilization (memory usage) and locality (performance). While the cache will contain less hot data (which could result in more round-trips to the main memory), it would benefit from better locality, hence better scalability across multiple processors that might attempt to modify the same object simultaneously.

When designing a cache-aware data structure, you don't necessarily have to order your fields in such way that a hot field will always get followed by a cold field. Instead, "artificial" padding could be used to fill the excessive space left in the cache line which holds the structure. In .Net, decorating the type with an StructLayout(LayoutKind.Explicit) attribute and assigning each field with its appropriate offset would do the job.