Monday, June 28, 2010

String.Format Isn't Suitable for Intensive Logging

One of the first rules that every .Net programmer learns to memorize is "Never use the + operator to combine strings. Only use the StringBuilder class to do so". If you'll ask a random programmer why should we do so, the typical answer would be "Since strings in .Net are immutable, every time we use the + operator we actually cause a memory allocation, so in order to avoid we should use the StringBuilder".
This answer is a common misconception. Does StringBuilder is necessarily better than using the + operator, or perhaps other means for combining strings? No, not necessarily.

First of all, let's look at the + operator's behavior. Basically, what it does is to allocate a new string, and copy the characters from the two older strings into the new one. So theoretically, there's a serious negative impact on memory usage, since when combining multiple strings, we could find ourselves allocation redundant strings that will only be used to contain "parts" of targeted final string.
This is only partialy true, since the C# compiler will detect combinations of 5 strings and above using the + operator, and will convert it to use string.Concat.
For example, this code:

string str2 = 1.ToString() + 2.ToString() + 3.ToString() + 4.ToString() + 5.ToString();

Will be compiled into this code:

string[] CS$0$0000 = new string[] { 1, 2, 3, 4, 5 }; // edit: removed ToString
string text1 = string.Concat(CS$0$0000);

Regarding string.Concat, you shouldn't be worried about excessive memory allocations, since the first thing the method does is to calculate the combined string's total length, creates the string using a call to AllocateFastString, and then simply copies the content of the old strings to the new one using wstrcpy (efficient raw bytes copying using unsafe code).

One of the more common usages for combining strings, is in log formatting. In case your application does lots of intensive logging, then there's a good chance that it's spending some serious amounts of time in formatting and combining strings that will eventually arrive to the logging framework.
The situation might change a little depending on which logging framework you use, but I'll demonstrate using log4net. The case in log4net is that you may choose to use the simple logging methods without formatting (e.g, Deubg/Info/Warn..), or, you may also choose to use the ones that support formatting (e.g, DebugFormat/InfoFormat/WarnFormat..). While the first option receives a parameterless string, the other receives a formatted string, with additional paramters that should be placed inside the string, using string.Format. Now this could introduce a problem.
The issue here is that every time we would like to insert some parameter into our log, we are most likely to use log4net's Format method (that uses string.Format) instead of using string.Concat or perhaps string.Join. Just because this API seems more convenient, and approachable.
The following benchmark demonstrates the differences between the three alternatives:
    Stopwatch sw = Stopwatch.StartNew();

    for (int i = 0; i < 500000; i++)
        // 960ms
        string str1 = string.Format("{0}, {1}, {2}, {3}, {4}", 1, 2, 3, 4, 5);
        // 665ms
        string str2 = string.Concat(1, ", ", 2, ", ",3, ", ", 4, ", ", 5);
        // 566ms
        string str3 = string.Join(", ", new string[] { 1, 2, 3, 4, 5 });


In this case, even though I've use a fairly simple log, the differences between the methods are definite. When comparing the performance of string.Format to string.Concat, we see that we achieved a 31% of improvement. And in comparison to string.Join, we achieve a substantial 62% of performance gain (string.Join's implementation doesn't use string.Concat under the covers, but works in a similar way by calling FastAllocateString and using the UnSafeCharBuffer class to construct the new string).
Don't think that this performance improvement comes with the cost of more intensive memory usage, since in this area, Concat and Join still achieve much better results than string.Format.
These results dosn't mean that StringBuilder will always produce lesser performance, since in cases in which you don't always know the number of concatenations you will need to perform, or perhaps in cases in which you could optimize memory allocations by modifying the capacity variable in the StringBuilder's constructor, you could actually achieve better results than these.

Of course that if you perform your logs only "once in a while", this performance difference is almost meaningless. And the fact that using FooFormat might be more comfortable to us might be just enough to justify its usage. But, in cases in which your application performs some intensice logging, these performance differences might be substantial for you.

Sunday, June 13, 2010

Headaches with Prefix and Temporary Variables

No less than 6 years ago, Luca Bolognese wrote a post that dealt with an issue raised in the C# User Group. The question was "What should be the value of 'x' after executing the following lines of code?"

int x = 3;
x += x++;

If we're aware of the differences between Prefix and Postfix, then we should realize that the value of x will be 6 since the last increment is redundant. In fact, we could simplify the expression by writing x = x + x, since we'll still get the same result (It's important to pay attention and understand that this is the case specifically for C#. In C++ for example, the behavior of this expression is undefined).

So after we've passed this starting point, allow me to take one step forward and suggest the following, highly readable, lines of code:

int x = 10;
x = --x + x + --x;

This is the point in which things get a little more interesting. What would be the value of x after execution? This expression can be quite confusing, so I suggest taking a couple of minutes to make sure you're not missing anything.
Ready? If your answer was "26", then you're correct. Why? well, there's a subtle hint in the post's header.

What's confusing about this expression is that we always need to keep track of where our variable's value is stored. Whether its in the main memory? a CPU register? or perhaps in a totally different location? In fact, during execution, the program actually reserves 4 bytes on the thread's stack to keep the result of part of the computation.
In the first step, we decrement x's value by 1, and update it's value in main memory (could effectively be in the CPU cache, but this part isn't relevant to our case). At this point, x's value is set to 9. Next, we multiply x's value by 2 (x+x) and store the resulting value on a new variable located on the stack. Do not confuse, we don't update the value of the original x variable at this time. Afterwords, we decrement x's value again by 1 (now x is set to 8), then then combine it to the value of the temporary variable. The result of this last computation goes to x. So by the end of this single line of code, x is set to 26.

Looking at the generated assembler code, things might look a bit clearer:
00000019  mov  dword ptr [ebp-4],0Ah   // x = 10
00000020  dec  dword ptr [ebp-4]       // x = 9
00000023  mov  eax, dword ptr [ebp-4]  // x = 9, eax = 9
00000026  add  eax, dword ptr [ebp-4]  // x = 9, eax = 18
00000029  mov  dword ptr [ebp-8],eax   // x = 9, eax = 18, temp = 18
0000002c  dec  dword ptr [ebp-4]       // x = 8, eax = 18, temp = 18
0000002f  mov  eax, dword ptr [ebp-8]  // x = 8, eax = 18, temp = 18
00000032  add  dword ptr [ebp-4],eax   // x = 26

Wednesday, June 2, 2010

DateTime.Now Causes Boxing

Update 2.9.2010
The content of this post regards DateTime.Now's implementation until v3.5. At the release of v4.0, its internal implementation was updated to avoid unnecessary boxing.
However, looking into the new implementation details revealed a new source for dynamic memory allocations in DateTime.Now. The issue is discussed in detail in the updated post: DateTime.Now in v4.0 Causes Dynamic Memory Allocations.

Perhaps you weren't aware of it, but every time you access DateTime.Now, you cause to a dynamic memory allocation through the boxing of an Int32 variable.
The reason for this lays deep inside the implementation of the Now property. If we will look a bit closer using Reflector, we could see that it actually wraps a call to UtcNow and then converts it to local time using the ToLocalTime method.
While there's nothing wrong with the call to UtcNow (it just wraps a call to the Win32 GetSystemTimeAsFileTime), there is some trouble with ToLocalTime, or actually, with the class CurrentSystemTimeZone that is being used. During the conversion the method GetDaylightChanges is called that causes the boxing. You can look for yourself:

    public override DaylightTime GetDaylightChanges(int year)
        object key = year;
        if (!this.m_CachedDaylightChanges.Contains(key))
            // ..lots of code
        return (DaylightTime)this.m_CachedDaylightChanges[key];

This is a good example showing that even in latest versions of the .Net Framework, some types still uses non-generic data structures, from the days of 1.0/1.1. In this case, the data member m_CachedDaylightChanges is actually a Hashtable).
This means that you might find plenty more of cases in which unnecessary boxing occurs without your knowing. But what makes DateTime.Now case especially serious is that it's a very common property to access, especially by logging frameworks for instance.

All of those boxings comes with a price, and it's not cheap. Due to the high frequency of dynamic memory allocations, we could cause some serious stress for the GC.
In order to demonstrate the impact on memory usage, I'll use the following benchmark:

        while (true)
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < 1000000; i++)
                DateTime now = DateTime.Now;
After executing this code, we can learn two things. The first, is that it takes on average 456ms to complete each iteration on the while loop (I'll address this result in a moment), and the other, is that we can learn the magnitude of the affect of calling DateTime.Now on our application's memory usage patterns. In this case, I've used perfmon to monitor the application's memory allocation rate:

Even though most of these allocations will be released by the GC in a Generation 0 collection, we still can't take them as granted. Especially if you're developing an application where you need to minimize as possible any dynamic memory allocation, mostly because you want to reduce the number of collections performed by the GC.

How to Avoid The Boxing?
After we've seen why we shouldn't call DateTime.Now (in certain scenarios), raises the question "How could we avoid those unnecessary memory allocations?".
First, there's Microsoft's side, which needs to make sure to update their code to make sure they avoid boxing (e.g, using a Dictionary instead of a Hashtable). But until this happens, we'll need to take matters to our own hands and workaround this issue. My suggestion is to write a wrapper method that is almost identical to UtcNow's implementation (that simply wraps a call to the Win32 API). All we need to do is to call GetLocalTime that will return use the the in SYSTEMTIME format, which we will need to convert back to a DateTime type.
For example:

public static class TimeUtil
    static extern void GetLocalTime(out SYSTEMTIME time);
    private struct SYSTEMTIME
        public ushort Year;
        public ushort Month;
        public ushort DayOfWeek;
        public ushort Day;
        public ushort Hour;
        public ushort Minute;
        public ushort Second;
        public ushort Milliseconds;
    public static DateTime LocalTime
            SYSTEMTIME nativeTime;
            GetLocalTime(out nativeTime);
            return new DateTime(nativeTime.Year, nativeTime.Month, nativeTime.Day,
                                nativeTime.Hour, nativeTime.Minute, nativeTime.Second,
                                nativeTime.Milliseconds, DateTimeKind.Local);

Running the benchmark from before with this updated implementation, we could see that we completely got rid of the memory allocations, and as a bonus, every iteration takes only 370ms. That is, an 18% improvement over the performance of DateTime.Now's implementation.