Actually, it comes down to a short C# program that deals with string comparisons and demonstrates a series of "unexpected" behaviors. How "unexpected" are the results? Well, that depends on how much experience the reader has with string comparisons.
Lately, this sort of "brain-teaser" has become quite popular on different forums/newsgroups, but no matter where those teasers are posted, they never seem to come with a full, in-depth explanation of the "dark magic" going on behind the scenes that could explain the "weird" and "unexpected" behaviors they demonstrate. So basically, I'll attempt to make things a little bit clearer than they are now.
So, without any more introductions, please tell me: what would be the output of the following program?
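using System;

class Program
{
    static void Main()
    {
        // One empty char array, and three strings built from empty arrays.
        char[] a = new char[0];
        string b = new string(a);
        string c = new string(new char[0]);
        string d = new string(new char[0]);

        // Compare object identity (references), not contents.
        Console.WriteLine(object.ReferenceEquals(a, b));
        Console.WriteLine(object.ReferenceEquals(b, c));
        Console.WriteLine(object.ReferenceEquals(c, d));
    }
}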
So, are you ready with your answers? Let's check the output:
False
True
True
"Hmm... that was interesting".
Let's see what we've got here. On the one hand, a isn't equal to b. This makes sense since, after all, they are different instances of different types. But on the other hand, we can also see that b is the very same instance as c and d. This may be a little confusing: each new string(...) expression looks as though it should allocate its own object, so according to the code, we are "supposed" to create three different string instances (mind you that we aren't dealing with user-hardcoded, interned strings here).
It appears that even though we've created three different arrays, we've only allocated a single string (the one that b, c and d all reference).
Usually, when a char[] (or some other kind of string-representing structure) is passed to a string constructor, a new string is allocated to represent the array's characters. However, as our brief program demonstrates, this isn't always the case.
To understand why this happens, we'll have to dig into the implementation of the string class, and more specifically, into its internal implementation in SString (the following code samples are taken from SSCLI's implementation).
Looking at SString's internal constructor, we can see that the string argument it receives is checked for being NULL or empty (that is, its first character is the terminator). If it is, the implementation completely "forgets" about that argument, and instead makes the new SString instance reference a shared, interned version of the empty string (so basically, the redundant allocation of a new, empty string is optimized away).
We can see this behavior in SString's Set and Clear functions (the Set function is invoked from the constructor).
void SString::Set(const WCHAR *string)
{
    if (string == NULL || *string == 0)
        Clear();
    else
    {
        Resize((COUNT_T) wcslen(string), REPRESENTATION_UNICODE);
        wcscpy_s(GetRawUnicode(), GetBufferSizeInCharIncludeNullChar(), string);
    }
    RETURN;
}

void SString::Clear()
{
    SetRepresentation(REPRESENTATION_EMPTY);
    if (IsImmutable())
    {
        // Use shared empty string rather than allocating a new buffer
        SBuffer::SetImmutable(s_EmptyBuffer, sizeof(s_EmptyBuffer));
    }
    else
    {
        SBuffer::TweakSize(sizeof(WCHAR));
        GetRawUnicode()[0] = 0;
    }
    RETURN;
}
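To make the idea a bit more concrete, here's a minimal C++ sketch of the same trick: every empty instance points at one shared, statically allocated buffer, and only non-empty instances get a buffer of their own. Mind you, this is not SSCLI code. TinyString, kEmpty, SharesEmptyBuffer and CheckSharing are names I made up for illustration, and the sketch assumes instances are never modified in place (which is what makes sharing the buffer safe).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical, heavily simplified stand-in for SString (not the SSCLI class).
class TinyString {
public:
    explicit TinyString(const wchar_t* text) { Set(text); }
    ~TinyString() { Release(); }

    // Copying is disabled to keep buffer ownership trivial in this sketch.
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    void Set(const wchar_t* text) {
        Release();
        if (text == nullptr || *text == L'\0') {
            // NULL or empty input: keep pointing at the shared empty buffer
            // instead of allocating a fresh one-character buffer.
            return;
        }
        std::size_t length = std::wcslen(text);
        wchar_t* copy = new wchar_t[length + 1];
        std::wmemcpy(copy, text, length + 1);  // copies the terminator too
        buffer_ = copy;
    }

    const wchar_t* Data() const { return buffer_; }
    bool SharesEmptyBuffer() const { return buffer_ == kEmpty; }

private:
    // Frees an owned buffer (never the shared one) and resets to "empty".
    void Release() {
        if (buffer_ != kEmpty) delete[] buffer_;
        buffer_ = kEmpty;
    }

    static const wchar_t kEmpty[1];      // the single shared empty buffer
    const wchar_t* buffer_ = kEmpty;
};

const wchar_t TinyString::kEmpty[1] = { L'\0' };

// Mirrors the C# puzzle: the "empty" instances share storage,
// while equal non-empty instances still get separate buffers.
inline void CheckSharing() {
    TinyString b(L""), c(nullptr), d(L"");
    TinyString e(L"x"), f(L"x");
    assert(b.Data() == c.Data() && c.Data() == d.Data());
    assert(e.Data() != f.Data());
}
```

Calling CheckSharing() from a main function exercises both cases. The pointer comparison on Data() plays the role that object.ReferenceEquals plays in the C# program: it asks whether two instances share storage, not whether their contents are equal.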
I guess this technique is called "string interning": http://en.wikipedia.org/wiki/String_interning
@Magesh
Correct, the CLR obviously prefers to use a pre-allocated interned string instead of creating a new, identical one.
Actually, instead of digging out the implementation, you could have just looked in the documentation of the String(char[]) ctor: http://msdn.microsoft.com/en-us/library/ttyxaek9.aspx
It says right there that »If value is null or contains no element, an Empty instance is initialized«. So it is *not* an implementation detail but rather a specified behavior.
@Joey,
Indeed, though "digging out the implementation" is so much more fun.
Great article! And +1 to the comment: "@Joey,
Indeed, though "digging out the implementation" is so much more fun."