Monday, June 28, 2010

String.Format Isn't Suitable for Intensive Logging

One of the first rules that every .Net programmer learns to memorize is "Never use the + operator to combine strings. Only use the StringBuilder class to do so". If you'll ask a random programmer why should we do so, the typical answer would be "Since strings in .Net are immutable, every time we use the + operator we actually cause a memory allocation, so in order to avoid we should use the StringBuilder".
This answer is a common misconception. Does StringBuilder is necessarily better than using the + operator, or perhaps other means for combining strings? No, not necessarily.

First of all, let's look at the + operator's behavior. Basically, what it does is to allocate a new string, and copy the characters from the two older strings into the new one. So theoretically, there's a serious negative impact on memory usage, since when combining multiple strings, we could find ourselves allocation redundant strings that will only be used to contain "parts" of targeted final string.
This is only partialy true, since the C# compiler will detect combinations of 5 strings and above using the + operator, and will convert it to use string.Concat.
For example, this code:

string str2 = 1.ToString() + 2.ToString() + 3.ToString() + 4.ToString() + 5.ToString();

Will be compiled into this code:

string[] CS$0$0000 = new string[] { 1, 2, 3, 4, 5 }; // edit: removed ToString
string text1 = string.Concat(CS$0$0000);

Regarding string.Concat, you shouldn't be worried about excessive memory allocations, since the first thing the method does is to calculate the combined string's total length, creates the string using a call to AllocateFastString, and then simply copies the content of the old strings to the new one using wstrcpy (efficient raw bytes copying using unsafe code).

One of the more common usages for combining strings, is in log formatting. In case your application does lots of intensive logging, then there's a good chance that it's spending some serious amounts of time in formatting and combining strings that will eventually arrive to the logging framework.
The situation might change a little depending on which logging framework you use, but I'll demonstrate using log4net. The case in log4net is that you may choose to use the simple logging methods without formatting (e.g, Deubg/Info/Warn..), or, you may also choose to use the ones that support formatting (e.g, DebugFormat/InfoFormat/WarnFormat..). While the first option receives a parameterless string, the other receives a formatted string, with additional paramters that should be placed inside the string, using string.Format. Now this could introduce a problem.
The issue here is that every time we would like to insert some parameter into our log, we are most likely to use log4net's Format method (that uses string.Format) instead of using string.Concat or perhaps string.Join. Just because this API seems more convenient, and approachable.
The following benchmark demonstrates the differences between the three alternatives:
 
while(true)
{
    Stopwatch sw = Stopwatch.StartNew();

    for (int i = 0; i < 500000; i++)
    {
        // 960ms
        string str1 = string.Format("{0}, {1}, {2}, {3}, {4}", 1, 2, 3, 4, 5);
        // 665ms
        string str2 = string.Concat(1, ", ", 2, ", ",3, ", ", 4, ", ", 5);
        // 566ms
        string str3 = string.Join(", ", new string[] { 1, 2, 3, 4, 5 });
    }

    Console.WriteLine(sw.ElapsedMilliseconds);
}

In this case, even though I've use a fairly simple log, the differences between the methods are definite. When comparing the performance of string.Format to string.Concat, we see that we achieved a 31% of improvement. And in comparison to string.Join, we achieve a substantial 62% of performance gain (string.Join's implementation doesn't use string.Concat under the covers, but works in a similar way by calling FastAllocateString and using the UnSafeCharBuffer class to construct the new string).
Don't think that this performance improvement comes with the cost of more intensive memory usage, since in this area, Concat and Join still achieve much better results than string.Format.
These results dosn't mean that StringBuilder will always produce lesser performance, since in cases in which you don't always know the number of concatenations you will need to perform, or perhaps in cases in which you could optimize memory allocations by modifying the capacity variable in the StringBuilder's constructor, you could actually achieve better results than these.

Of course that if you perform your logs only "once in a while", this performance difference is almost meaningless. And the fact that using FooFormat might be more comfortable to us might be just enough to justify its usage. But, in cases in which your application performs some intensice logging, these performance differences might be substantial for you.

4 comments:

  1. You can't test performance this way. Repeatedly building the same string over and over won't impact memory. Strings are immutable and so aren't copied in memory just referenced. Use Random.Next().ToString() to truly create a lot of unique strings and see how it comes out.

    ReplyDelete
    Replies
    1. @MrPhil,

      Creating truly unique strings will sure enough make the whole process slower, but still, the differences between Format/Concat/Join will remain.

      Thanks

      Delete
    2. Only if you are running this tests separatelly, like runing the console application 1 time for string.Concat, one other time to string.Format, etc, as in the case you do the 3 in the same run, the last one to run will always be the fastest, thanks to internalization.

      Delete
  2. Just for a record: I have used following code and measured the times below from str3 to str1. Still the results remain same:

    var random = new Random((int)DateTime.Now.Ticks);
    while (true)
    {
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 500000; i++)
    {
    // 494ms
    string str1 = string.Format("{0}, {1}, {2}, {3}, {4}", random.Next().ToString(), random.Next().ToString(), random.Next().ToString(), random.Next().ToString(), random.Next().ToString());
    // 437ms
    string str2 = string.Concat(random.Next().ToString(), ", ", random.Next().ToString(), ", ", random.Next().ToString(), ", ", random.Next().ToString(), ", ", random.Next().ToString());
    // 415ms
    string str3 = string.Join(", ", new string[] { random.Next().ToString(), random.Next().ToString(), random.Next().ToString(), random.Next().ToString(), random.Next().ToString() });
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
    }

    ReplyDelete