Monday, October 25, 2010

The Case of NUnit Hanging During Startup

Recently, a coworker of mine encountered a strange behavior in NUnit. Every time he'd open NUnit's graphical interface, it would freeze and stop responding. Even though a test haven't even started to execute, the application simply froze, leaving the user only to wait patiently and stare helplessly at the screen.

In order to investigate the issue, I've fired up Windbg and attached it to the hanged process.
After loading SOS, the !clrstack command was issued so I could get a better understanding to what was keeping the application busy. Since NUnit's graphical interface stopped from responding, one could already assume that the main thread stopped handling messages from the message pump for some reason.
Reviewing the command's output confirmed that assumption. (The outputs in the post were edited for brevity).
0:007> ~*e!clrstack
PDB symbol for mscorwks.dll not loaded
OS Thread Id: 0xe4 (0)
ESP       EIP     
0012ee44 7c90e4f4 Win32Native.GetFileAttributesEx(...)
0012ee58 792e03f6 System.IO.File.FillAttributeInfo(...)
0012eeec 7927ff71 System.IO.File.InternalExists(System.String)
0012ef20 792e96a6 System.IO.File.Exists(System.String)
0012ef4c 03a214c3 NUnit.Util.RecentFileEntry.get_Exists()
0012ef54 00e0fb7a NUnit.Gui.NUnitForm.NUnitForm_Load(Object, EventArgs)
0012ef8c 7b1d4665 System.Windows.Forms.Form.OnLoad(EventArgs)
0012efc0 7b1d4105 System.Windows.Forms.Form.OnCreateControl()
0012efcc 7b1c6d11 System.Windows.Forms.Control.CreateControl(Boolean)
0012f008 7b1c6b14 System.Windows.Forms.Control.CreateControl()
0012f020 7b1d2fc8 System.Windows.Forms.Control.WmShowWindow(Forms.Message)
0012f05c 7b1c8906 System.Windows.Forms.Control.WndProc(Forms.Message)
0012f060 7b1d1d6a [InlinedCallFrame: 0012f060] 
...
0012f418 7b195911 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
0012f42c 00e00643 NUnit.Gui.AppEntry.Main(System.String[])
0012f464 00e00076 NUnit.Gui.Class1.Main(System.String[])
0012f688 79e71b4c [GCFrame: 0012f688] 
OS Thread Id: 0x1464 (1)
Unable to walk the managed stack. The current thread is likely not a 
managed thread. You can run !threads to get a list of managed threads in
the process
OS Thread Id: 0xdac (2)
Failed to start stack walk: 80004005 
Evidently, NUnit is performing a synchronous IO operation on its main thread, which for some reason seems to refuse to complete. It happens to be a good example to why it's considered to be a bad practice to perform "heavy-weight lifting" on the main UI thread, or synchronous IO in general.

After obtaining this piece of information, it was required to understand why the operation doesn't complete. Perhaps the path of the requested file could shed some light on the observed behavior.
Since the path is passed across several managed methods before entering the Win32 API, I've attempted to dump the values of the parameters in the managed call stack using the !clrstack -p command.
0:000> !clrstack -p
OS Thread Id: 0xe4 (0)
ESP       EIP     
0012ee44 7c90e4f4 Win32Native.GetFileAttributesEx(...)
0012ee58 792e03f6 File.FillAttributeInfo(System.String, ...)
    PARAMETERS:
        path = no data
        data = 0x0012eeec
        tryagain = no data
        returnErrorOnNotFound = 0x00000001

0012eeec 7927ff71 System.IO.File.InternalExists(System.String)
    PARAMETERS:
        path = no data

0012ef20 792e96a6 System.IO.File.Exists(System.String)
    PARAMETERS:
        path = no data

0012ef4c 03a214c3 NUnit.Util.RecentFileEntry.get_Exists()
    PARAMETERS:
        this = no data

0012ef54 00e0fb7a NUnit.Gui.NUnitForm.NUnitForm_Load(Object, EventArgs)
    PARAMETERS:
        this = 0x013349ec
        sender = no data
        e = no data
Unfortunately, SOS wasn't able to retrieve the address of the managed string (thus, stating no data). It's likely that either the variable was stored in a register that was already overwritten by some later executing code, or perhaps we're just witnessing some of SOS's good old "buggy nature".

At this point, I take a small step into the world of native debugging and issue the kb command, to display the native call stack, including the first 3 parameters passed to each method.
0:000> kb
ChildEBP RetAddr  Args to Child              
0012ed84 7c90d79c 7c8111ff 0012eddc 0012eda4 ntdll!KiFastSystemCallRet
0012ed88 7c8111ff 0012eddc 0012eda4 00151bb8 ntdll!ZwQueryFullAttributesFile+0xc
0012ee08 0097a310 0139272c 00000000 0012eeec KERNEL32!GetFileAttributesExW+0x84
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012ee2c 792e03f6 0012ee5c 00000000 01392764 +0x97a30f
0012eedc 7927ff71 00000001 00000000 00000000 mscorlib_ni+0x2203f6
0012ef18 792e96a6 00000000 00000000 00000000 mscorlib_ni+0x1bff71
0012ef80 7b1d4665 01336c40 0134f25c 00000000 mscorlib_ni+0x2296a6
0012efb8 7b1d4105 00000004 0012f000 7b1c6d11 System_Windows_Forms_ni+0x204665
0012efc4 7b1c6d11 0133797c 013349ec 00000001 System_Windows_Forms_ni+0x204105
0012f000 7b1c6b14 00000000 00000000 013349ec System_Windows_Forms_ni+0x1f6d11
0012f018 7b1d2fc8 013349ec 00000000 00000000 System_Windows_Forms_ni+0x1f6b14
0012f054 7b1c8906 3cc1af52 79e7a6b8 0012f2d4 System_Windows_Forms_ni+0x202fc8
0012f0ac 7b1d1d6a 013349ec 0012f0c0 7b1d1d20 System_Windows_Forms_ni+0x1f8906
0012f0b8 7b1d1d20 0012f0d0 7b1d2f11 0012f10c System_Windows_Forms_ni+0x201d6a
0012f0c0 7b1d2f11 0012f10c 013349ec 0012f0e4 System_Windows_Forms_ni+0x201d20
0012f0d0 7b1d1af4 00e111ec 0012f10c 01335448 System_Windows_Forms_ni+0x202f11
0012f0e4 7b1c8640 0012f100 7b1c85c1 00000000 System_Windows_Forms_ni+0x201af4
0012f0ec 7b1c85c1 00000000 01335448 0012f130 System_Windows_Forms_ni+0x1f8640
0012f100 7b1c849a 01335448 00291106 00000018 System_Windows_Forms_ni+0x1f85c1
0012f164 7e418734 00291106 00000018 00000001 System_Windows_Forms_ni+0x1f849a
As you can see, the thread eventually made its way into the kernel, calling GetFileAttributesExW.
Knowing that the first parameter that gets passed to the method (lpFileName) represent the name of the file or directory in hand, we are only left to dump its value using the du command.
0:000> du 0139272c
0139272c  "\\123.123.123.123\foo.dll"
With that, the culprit is found. Apparently, some time ago a UT assembly was loaded into NUnit from a certain IP address that is no longer available in the network, causing the thread to stall indefinitely.

In order to fix the issue, the problematic path was needed to be removed from the RecentProjects list in NUnit's settings file (located at %AppData%\NUnit\NUnitSettings.xml).

2 comments:

  1. Really good post.
    I have the same situation, but unfortunately "du" doesn't get file name in my case :(

    ReplyDelete