Showing posts with label GC. Show all posts

Friday, August 12, 2011

Writing a Manual Memory Manager in C#

Garbage collection. Aye? or Nay?
As usual, it depends on which developer you ask. Some like to have as much control as possible over the way their code executes, while others simply love the fact that they don't have to deal with the "mundane" job of keeping track of their memory allocations.
Since there aren't really any "absolute truths" in anything related to programming, in reality you'd sometimes want to have complete control over your memory management, while at other times you wouldn't really care about it "as long as it gets done".
Since we're mostly discussing .Net here, we could say that we've got the "as long as it gets done" part covered quite well by the CLR's garbage collection mechanism. So it's time to see how we could approach implementing a manual memory manager in C#.

What we'd eventually like to have is an API that enables us to allocate and deallocate typed memory on demand. Of course, C# doesn't natively support the new and delete pairing we so fondly remember from C++, so we'll have to come up with our own utility functions to do the job.
Eventually, our code should look something similar to this:

static void Main()
{
    ITypeAllocator mgr = new ManalocManager();

    IFooData obj = mgr.New<IFooData>();

    obj.Bar = 1;
    obj.Car = 2;
    obj.Jar = 3;
    obj.Tar = 4;

    // Release the native memory once we're done with the object.
    mgr.Delete(obj);
}

Disabling the garbage collector completely is an unreasonable thing to do on a platform such as .Net. Doing so would probably miss the platform's purpose. Anyone who truly wants complete control over the execution of their program wouldn't bother using C# anyway (or any other managed language, for that matter).
However, while using C#, there may be times when we want to manage our own memory instead of having the garbage collector do it for us. And even if not, it's still a subject interesting enough to explore and, mostly, play with.

In order to demonstrate how we could achieve manual memory management in C#, let's have a look at the following interface:

public interface IFooData
{
    int Bar { get; set; }
    long Jar { get; set; }
    double Car { get; set; }
    byte Tar { get; set; }
}

The classic way to implement this interface would be to create a class with four members that match the corresponding properties. However, doing so results in a 21-byte structure that resides in the GC heap (not counting padding and the preceding object header).
Instead, we could allocate the required memory block in the native heap (using AllocHGlobal) and modify our properties to access the native memory at the required offsets (e.g. 0 for Bar, 4 for Jar and 12 for Car). Using a Delete method, we could free the native memory block on demand, whenever we please.
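The offsets quoted above follow from packing the fields back to back in declaration order. A minimal sketch of that arithmetic, assuming the order Bar, Jar, Car, Tar (the helper name PackedOffsets is hypothetical and not part of the manager shown later):

```csharp
using System;
using System.Runtime.InteropServices;

public static class LayoutSketch
{
    // Returns the byte offset of each field when the fields are packed back to
    // back in the given order; the last element holds the total size.
    public static int[] PackedOffsets(params Type[] fieldTypes)
    {
        int[] offsets = new int[fieldTypes.Length + 1];
        int offset = 0;
        for (int i = 0; i < fieldTypes.Length; i++)
        {
            offsets[i] = offset;
            offset += Marshal.SizeOf(fieldTypes[i]);
        }
        offsets[fieldTypes.Length] = offset;
        return offsets;
    }

    public static void Main()
    {
        // int Bar, long Jar, double Car, byte Tar
        int[] layout = PackedOffsets(typeof(int), typeof(long), typeof(double), typeof(byte));
        Console.WriteLine(string.Join(", ", layout)); // prints 0, 4, 12, 20, 21
    }
}
```

Note that this packing leaves Jar and Car unaligned (offsets 4 and 12), which trades alignment for the smallest possible block.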

public class FooData : IFooData
{
    // Points at the 21-byte block allocated in the native heap.
    private unsafe byte* _native;

    public FooData()
    {
        unsafe
        {
            _native = (byte*)(Marshal.AllocHGlobal(21).ToPointer());
        }
    }

    public int Bar
    {
        get { unsafe { return *(int*)&_native[0]; } }
        set { unsafe { *(int*)&_native[0] = value; } }
    }

    public long Jar
    {
        get { unsafe { return *(long*)&_native[4]; } }
        set { unsafe { *(long*)&_native[4] = value; } }
    }

    public double Car
    {
        get { unsafe { return *(double*)&_native[12]; } }
        set { unsafe { *(double*)&_native[12] = value; } }
    }

    public byte Tar
    {
        get { unsafe { return *(byte*)&_native[20]; } }
        set { unsafe { *(byte*)&_native[20] = value; } }
    }

    // Frees the native block; the instance must not be used afterwards.
    public void Delete()
    {
        unsafe
        {
            Marshal.FreeHGlobal(new IntPtr(((void*)(_native))));
            _native = (byte*)(IntPtr.Zero);
        }
    }
}

The problem with such an implementation is that it is very tedious to write. Even for simple structures like IFooData, the resulting implementation can be quite daunting.
Fortunately, we can automate the process by adding a code generator that implements our interfaces on the fly, at runtime.
The following interface loosely describes the capabilities our manual memory manager should support:

public interface ITypeAllocator
{
    T New<T>();
    void Delete<T>(T instance);

    void PreGenerate(params Type[] types);
}

The generic parameter T accepts user-defined data interfaces such as IFooData.
Once the New method is called, our manager should generate code, compile it and instantiate it at runtime. The resulting instance is then returned to the caller. Once the caller finishes using it and wants to release its memory, it calls the Delete method.
The PreGenerate method's purpose is to optimize the code generation/compilation process. Once the user pre-generates a type, they won't have to wait on the first call to the New method (much like forcing the JIT compiler to run over your assemblies ahead of time).
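The caching behaviour behind New and PreGenerate can be sketched in isolation. The class below is a hypothetical stand-in, not the manager from this post: it takes the expensive generate-and-compile step as a delegate so the example runs without any code generation library, and it counts how often that step actually executes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps one generated type per user type.
public class GeneratedTypeCache
{
    private readonly Dictionary<Type, Type> _cache = new Dictionary<Type, Type>();
    private readonly Func<Type, Type> _generate;

    // Number of times the expensive generation step has run.
    public int GenerationCount { get; private set; }

    public GeneratedTypeCache(Func<Type, Type> generate)
    {
        _generate = generate;
    }

    public Type GetOrGenerate(Type userType)
    {
        Type generated;
        if (!_cache.TryGetValue(userType, out generated))
        {
            generated = _generate(userType); // the slow path: generate and compile
            _cache.Add(userType, generated);
            GenerationCount++;
        }
        return generated;
    }

    // Pays the generation cost up front so later lookups are cache hits.
    public void PreGenerate(params Type[] userTypes)
    {
        foreach (Type userType in userTypes)
            GetOrGenerate(userType);
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        // Stand-in generator: returns the input type instead of compiling anything.
        GeneratedTypeCache cache = new GeneratedTypeCache(t => t);
        cache.PreGenerate(typeof(IDisposable));
        cache.GetOrGenerate(typeof(IDisposable)); // served from the cache
        Console.WriteLine(cache.GenerationCount); // prints 1
    }
}
```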

When it comes to code generation, there are basically two ways to choose from: CodeDOM and templates. Each has its pros and cons; personally, I tend to prefer the CodeDOM way of doing things. While using it can result in quite verbose code, I believe that it's easier to maintain in larger projects than templates.
Unfortunately, .Net's CodeDOM model doesn't support unsafe code, so I had to resort to some workarounds to represent all of the unsafe code blocks.
This is a good time to mention the Refly library, which wraps the .Net CodeDOM API, making it much simpler and more intuitive to use.
The demonstrated implementation is very naive and limited regarding the kind of types it is able to generate, though it should illustrate the discussed concept.

public class ManalocManager : ITypeAllocator
{
    // key: userType, value: generatedType
    private Dictionary<Type, Type> m_generatedTypesCache;

    public ManalocManager()
    {
        m_generatedTypesCache = new Dictionary<Type, Type>();
    }

    public void Delete<T>(T instance)
    {
        if (!(instance is IManalocGeneratedType))
            throw new ArgumentException("Attempted to delete an unexpected type");

        IManalocGeneratedType generatedType = (IManalocGeneratedType)instance;
        // Assumption: the listing lost this call; the generated Delete frees the native block.
        generatedType.Delete();
    }

    public void PreGenerate(params Type[] types)
    {
        foreach (Type curUserType in types)
        {
            // Assumption: the loop body is missing from the listing; generating each
            // type that isn't cached yet matches the method's stated purpose.
            if (!m_generatedTypesCache.ContainsKey(curUserType))
                generateAndAddToCache(curUserType);
        }
    }

    public T New<T>()
    {
        Type userType = typeof(T);

        Type generatedType;
        bool alreadyGenerated = m_generatedTypesCache.TryGetValue(userType, out generatedType);
        if (!alreadyGenerated)
            generatedType = generateAndAddToCache(userType);

        object result = Activator.CreateInstance(generatedType);
        return (T)result;
    }

    private Type generateAndAddToCache(Type userType)
    {
        Type generatedType = generateProxy(userType);
        m_generatedTypesCache.Add(userType, generatedType);

        return generatedType;
    }

    // Builds the CodeDOM model, writes it to a source file, compiles it
    // and returns the compiled type.
    private Type generateProxy(Type userType)
    {
        NamespaceDeclaration ns;
        string typeName = createType(userType, out ns);

        string sourceFile = generateCode(ns);
        Assembly compiledAssembly = compile(userType, sourceFile);

        Type compiledType = compiledAssembly.GetType(typeName);

        return compiledType;
    }

    private string createType(Type userType, out NamespaceDeclaration namespaceDec)
    {
        PropertyInfo[] userProperties = userType.GetProperties();

        namespaceDec = new NamespaceDeclaration("Manaloc.AutoGenerated");
        ClassDeclaration classDec = namespaceDec.AddClass(userType.Name + "_Manaloc_AutoGenerated");

        FieldDeclaration nativeMember = classDec.AddField("unsafe byte*", "native");

        addConstructor(userProperties, classDec, nativeMember);
        addDeleteMethod(classDec, nativeMember);
        addProperties(userProperties, classDec, nativeMember);

        string typeName = namespaceDec.Name + "." + classDec.Name;
        return typeName;
    }

    private void addConstructor(PropertyInfo[] userProperties, ClassDeclaration classDec, FieldDeclaration nativeMember)
    {
        int totalSize = sumSize(userProperties);

        ConstructorDeclaration ctor = classDec.AddConstructor();

        // Assumption: the start and end of this statement are missing from the listing.
        // They are reconstructed so that the generated constructor matches FooData's:
        //   native = (byte*)Marshal.AllocHGlobal(totalSize).ToPointer();
        ctor.Body.AddAssign(Expr.This.Field(nativeMember),
            Expr.Cast(typeof(byte*), Expr.Type(typeof(Marshal)).Method("AllocHGlobal").Invoke(Expr.Prim(totalSize)).
                Method("ToPointer").Invoke()));
    }

    private void addDeleteMethod(ClassDeclaration classDec, FieldDeclaration nativeMember)
    {
        MethodDeclaration disposeMethod = classDec.AddMethod("Delete");
        disposeMethod.Attributes = MemberAttributes.Final | MemberAttributes.Public;

        // Assumption: the first half of each statement below is missing from the listing.
        // They are reconstructed so that the generated Delete matches FooData.Delete.
        disposeMethod.Body.Add(Expr.Type(typeof(Marshal)).Method("FreeHGlobal").Invoke(
            Expr.New(typeof(IntPtr), Expr.Cast("void*", Expr.This.Field(nativeMember)))));
        disposeMethod.Body.AddAssign(Expr.This.Field(nativeMember),
            Expr.Cast("byte*", Expr.Type(typeof(IntPtr)).Field("Zero")));
    }

    private void addProperties(PropertyInfo[] userProperties, ClassDeclaration classDec, FieldDeclaration nativeMember)
    {
        int offset = 0;
        foreach (PropertyInfo curProperty in userProperties)
        {
            Type propType = curProperty.PropertyType;
            int propSize = Marshal.SizeOf(propType);

            PropertyDeclaration propDec = classDec.AddProperty(propType, curProperty.Name);
            propDec.Attributes = MemberAttributes.Final | MemberAttributes.Public;

            if (curProperty.CanRead)
                addGetter(nativeMember, offset, propType, propDec);

            if (curProperty.CanWrite)
                addSetter(nativeMember, offset, propType, propDec);

            // Properties are packed back to back, with no padding.
            offset += propSize;
        }
    }

    // The snippet emits the unsafe prefix that CodeDOM can't express; the statement
    // that follows completes it into: *(T*)&this.native[offset] = value;
    private void addSetter(FieldDeclaration nativeMember, int offset, Type propType, PropertyDeclaration propDec)
    {
        propDec.Set.Add(Stm.Snippet("*(" + propType.Name + "*)&"));
        propDec.Set.AddAssign(Expr.This.Field(nativeMember).Item(offset), Expr.Value);
    }

    private void addGetter(FieldDeclaration nativeMember, int offset, Type propType, PropertyDeclaration propDec)
    {
        propDec.Get.Add(Stm.Snippet("return *(" + propType.Name + "*)&"));
        // Assumption: the statement completing the snippet is missing from the listing;
        // by analogy with the setter it should emit: this.native[offset];
        propDec.Get.Add(Expr.This.Field(nativeMember).Item(offset));
    }

    private string generateCode(NamespaceDeclaration ns)
    {
        string sourceFile = null;

        const string outDir = "ManalocAutoGenerated";
        if (!Directory.Exists(outDir))
            Directory.CreateDirectory(outDir);

        Refly.CodeDom.CodeGenerator generator = new Refly.CodeDom.CodeGenerator();
        generator.CreateFolders = false;
        generator.FileCreated += (object sender, StringEventArgs args) => { sourceFile = args.Value; };

        generator.GenerateCode(outDir, ns);

        if (sourceFile == null)
            throw new Exception("Failed to generate source file");

        return sourceFile;
    }

    private Assembly compile(Type userType, string sourceFile)
    {
        CompilerParameters compilerParams = new CompilerParameters();
        compilerParams.CompilerOptions = "/unsafe /optimize";
        CompilerResults result =
            Refly.CodeDom.CodeGenerator.CsProvider.CompileAssemblyFromFile(compilerParams, new string[] { sourceFile });

        Assembly compiledAssembly = result.CompiledAssembly;
        return compiledAssembly;
    }

    // Total size of the native block: the sum of all property sizes.
    private int sumSize(PropertyInfo[] userProperties)
    {
        int size = 0;
        foreach (PropertyInfo curProperty in userProperties)
            size += Marshal.SizeOf(curProperty.PropertyType);

        return size;
    }
}

public interface IManalocGeneratedType
{
    void Delete();
}
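
Since CodeDOM cannot express unsafe code directly, one alternative to the snippet workaround used above is to avoid pointers altogether and go through the Marshal.Read*/Write* helpers. This is not what the generator in this post does; it is a hand-written sketch of the same 21-byte layout, and the class name SafeFooData is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical variant of FooData that needs no unsafe blocks.
public sealed class SafeFooData
{
    // Same packed layout as before: Bar@0, Jar@4, Car@12, Tar@20.
    private IntPtr _native = Marshal.AllocHGlobal(21);

    public int Bar
    {
        get { return Marshal.ReadInt32(_native, 0); }
        set { Marshal.WriteInt32(_native, 0, value); }
    }

    public long Jar
    {
        get { return Marshal.ReadInt64(_native, 4); }
        set { Marshal.WriteInt64(_native, 4, value); }
    }

    public double Car
    {
        // Marshal has no double accessor, so the value travels as its raw 64 bits.
        get { return BitConverter.Int64BitsToDouble(Marshal.ReadInt64(_native, 12)); }
        set { Marshal.WriteInt64(_native, 12, BitConverter.DoubleToInt64Bits(value)); }
    }

    public byte Tar
    {
        get { return Marshal.ReadByte(_native, 20); }
        set { Marshal.WriteByte(_native, 20, value); }
    }

    // Frees the native block; the instance must not be used afterwards.
    public void Delete()
    {
        Marshal.FreeHGlobal(_native);
        _native = IntPtr.Zero;
    }
}

public static class SafeFooDemo
{
    public static void Main()
    {
        SafeFooData data = new SafeFooData();
        data.Bar = 1;
        data.Jar = 3;
        data.Car = 2.5;
        data.Tar = 4;
        Console.WriteLine(data.Bar + " " + data.Jar + " " + data.Tar);
        data.Delete();
    }
}
```

The trade-off is an extra call per access in exchange for code that compiles without the /unsafe switch.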

Saturday, August 7, 2010

Accurately Measuring GC Suspensions

When you analyze the performance of a managed application and look for its main bottlenecks, one place you should always check is the amount of time your application spends in GC. To get that information, you can always run Perfmon and get a general analysis of how the GC behaves in your application (most of the time, you'll be looking at the %Time in GC counter).
While those data sets give us a good overview of how the GC affects our application's performance, they aren't enough to gain deeper insight into observed "freezes" in the application. For instance, even if we see that 90% of the time we spend only 2% in GC, it doesn't rule out a generation 2 collection at some critical moment that froze our application for about 200ms, which could cause serious damage in some scenarios.
The problem is that collections don't just make the application "run more slowly"; they bring it to a complete stop for an unknown amount of time. So looking solely at the %Time in GC graph doesn't tell us what the real damage of each collection was. Also, remember that perfmon's highest sampling resolution is only 1 second, so if we suffered a single generation 2 collection, we might not even notice it. More than that, it's almost impossible to make sure that perfmon runs, gathers and records the correct data on every machine that runs your application at any given time.

This is why we need a reliable way to monitor when the GC decides to perform a collection, and when it does, check the collection's generation, and the amount of time it takes to complete it.
Today, the System.GC class doesn't offer an accurate way to collect this data. However, by writing a custom CLR Host, we can integrate our code with the CLR and receive notifications that tell us exactly when each and every GC starts and ends in the process.
In order to do so, we need to implement the IHostGCManager interface and use two of its three callback functions: SuspensionStarting and SuspensionEnding. The latter also passes us a single DWORD parameter that holds the number of the generation that was just collected. So as you might have already figured out, the CLR calls SuspensionStarting right before it starts the collection, and afterwards it calls SuspensionEnding.
One thing to note about SuspensionEnding is that the documentation available on MSDN isn't entirely accurate, and might confuse users about the possible triggers for the callback. This is what the documentation says today:

"...Notifies the host that the common language runtime (CLR) is resuming execution of tasks on threads that had been suspended for a garbage collection.
[parameter: generation] The garbage collection generation that is just finishing, from which the thread is resuming."
So according to the documentation, the callback is invoked only for freezes caused by a GC. However, this isn't the only trigger. The CLR also invokes it when it resumes execution after other kinds of "stop-the-world" operations (e.g., loading and unloading of AppDomains). If we look at the place in the SSCLI's implementation where the callback is invoked, we find the following code:

if (pGCThreadControl)
{
    // If we the suspension was for a GC, tell the host what generation GC.
    DWORD   Generation = (bFinishedGC
        ? GCHeap::GetGCHeap()->GetCondemnedGeneration()
        : ~0U);

    pGCThreadControl->SuspensionEnding(Generation);
}

So we can see that for every freeze that wasn't caused by a GC, the generation parameter passed to the callback holds the value UINT_MAX. When implementing your CLR Host, remember to check for this special value.

As for the measuring itself, we'll use the QueryPerformanceCounter function (in .Net, wrapped by the Stopwatch class), to achieve the highest possible time resolution for every collection.
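
On the managed side, the same arithmetic looks as follows. This is only a sketch of the tick-to-millisecond conversion using Stopwatch's raw timestamps; the helper name is hypothetical, and the Sleep call merely stands in for the interval being measured.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Converts two raw timestamps to milliseconds:
    // (end - start) * 1000 / frequency, where frequency is in ticks per second.
    public static double ElapsedMilliseconds(long startTicks, long endTicks, long frequency)
    {
        return (endTicks - startTicks) * 1000.0 / frequency;
    }

    public static void Main()
    {
        long start = Stopwatch.GetTimestamp();
        Thread.Sleep(50); // stand-in for the interval we want to measure
        long end = Stopwatch.GetTimestamp();

        Console.WriteLine("High resolution timer: " + Stopwatch.IsHighResolution);
        Console.WriteLine(ElapsedMilliseconds(start, end, Stopwatch.Frequency).ToString("F3") + " ms");
    }
}
```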
Since most of the collections we'll encounter will be very short (mostly ephemeral collections that take only a few milliseconds each), we'll likely want to avoid spending time recording all of them (and so avoid unnecessary IO). In such a case it could be useful to use a logging framework that filters collections according to their severity (e.g., Debug for very short collections, Info for more notable collections, and Warn for lengthy collections that might indicate a problem). After attaching an appender that writes all of our logs to the console window, and running an application that constantly allocates memory, we can watch each collection and its duration being reported as it happens.
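
The severity mapping described here can be sketched as a small pure function. The thresholds below are made-up values for illustration, not measurements or recommendations, and the type names are hypothetical; a real setup would hand the resulting level to whatever logging framework is in use.

```csharp
using System;
using System.Globalization;

public enum Severity { Debug, Info, Warn }

public static class SuspensionLogSketch
{
    // Hypothetical thresholds in milliseconds; tune them per application.
    private const double InfoThresholdMs = 10.0;
    private const double WarnThresholdMs = 100.0;

    public static Severity Classify(double durationMs)
    {
        if (durationMs >= WarnThresholdMs) return Severity.Warn;
        if (durationMs >= InfoThresholdMs) return Severity.Info;
        return Severity.Debug;
    }

    // A generation equal to uint.MaxValue (UINT_MAX) marks a suspension
    // that was not caused by a garbage collection.
    public static string Describe(uint generation, double durationMs)
    {
        string duration = durationMs.ToString("F2", CultureInfo.InvariantCulture);
        string what = generation == uint.MaxValue
            ? "CLR suspension ended: "
            : "GC generation " + generation + " ended: ";
        return "[" + Classify(durationMs) + "] " + what + duration + " ms";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(0, 1.25));
        Console.WriteLine(Describe(2, 212.5));
        Console.WriteLine(Describe(uint.MaxValue, 15.0));
    }
}
```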

As a reference, I'm including a sample CLR Host that monitors collections and writes their duration to the console:


#include <windows.h>
#include <tchar.h>
#include <mscoree.h>   // CLR hosting interfaces; link against mscoree.lib
#include <climits>
#include <iostream>

using namespace std;

#define APP_STARTUP_EXE L"TestApplication.exe"
#define APP_ENTRY_TYPE L"SomeNamespace.Program"
#define APP_ENTRY_METHOD L"Main"

class MyCLRHost : public IHostControl, public IHostGCManager
{
private:
    LONG m_refCount;
    LARGE_INTEGER m_lastGCStart;
    LARGE_INTEGER m_frequency;

public:
    MyCLRHost() : m_refCount(0) { QueryPerformanceFrequency(&m_frequency); }

    // IHostControl
    HRESULT __stdcall GetHostManager(REFIID riid, void** ppObject)
    {
        if(riid == IID_IHostGCManager)
        {
            *ppObject = static_cast<IHostGCManager*>(this);
            return S_OK;
        }

        *ppObject = NULL;
        return E_NOINTERFACE;
    }

    // IUnknown
    HRESULT __stdcall QueryInterface(REFIID riid, void** ppvObject)
    {
        if (riid == IID_IHostGCManager)
        {
            *ppvObject = static_cast<IHostGCManager*>(this);
            return S_OK;
        }

        *ppvObject = NULL;
        return E_NOINTERFACE;
    }

    HRESULT __stdcall SetAppDomainManager(DWORD appDomain, IUnknown* domainManager)
    {
        return S_OK;
    }

    ULONG __stdcall AddRef() { return InterlockedIncrement(&m_refCount); }
    ULONG __stdcall Release() { return InterlockedDecrement(&m_refCount); }

    // IHostGCManager
    HRESULT __stdcall ThreadIsBlockingForSuspension() { return S_OK; }

    HRESULT __stdcall SuspensionStarting()
    {
        // Remember when the suspension began.
        QueryPerformanceCounter(&m_lastGCStart);

        return S_OK;
    }

    HRESULT __stdcall SuspensionEnding(DWORD gen)
    {
        LARGE_INTEGER gcEnd;
        QueryPerformanceCounter(&gcEnd);

        // Convert the elapsed ticks to milliseconds.
        double duration = ((gcEnd.QuadPart - m_lastGCStart.QuadPart))
            * 1000.0 / (double)m_frequency.QuadPart;

        // UINT_MAX means the suspension wasn't caused by a GC.
        if(gen != UINT_MAX)
            cout<<"GC generation "<<gen<<" ended: "<<duration<<"ms"<<endl;
        else
            cout<<"CLR suspension ended: "<<duration<<" ms"<<endl;

        return S_OK;
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    // Assumption: the listing doesn't show how startupFlags was declared;
    // 0 requests the runtime's default startup behavior.
    DWORD startupFlags = 0;

    ICLRRuntimeHost* pCLR;
    HRESULT hr = CorBindToRuntimeEx(L"v2.0.50727", L"wks", startupFlags,
        CLSID_CLRRuntimeHost, IID_ICLRRuntimeHost, (LPVOID*)&pCLR);

    // Register the host before starting the runtime so the callbacks are wired in.
    MyCLRHost customHost;
    hr = pCLR->SetHostControl(&customHost);

    hr = pCLR->Start();

    DWORD retcode;
    hr = pCLR->ExecuteInDefaultAppDomain(APP_STARTUP_EXE,
        APP_ENTRY_TYPE, APP_ENTRY_METHOD, L"" , &retcode);

    return 0;
}