The overall idea can be expressed via this code:
AutoResetEvent m_event = new AutoResetEvent(false);

private void WorkCycles()
{
    while (true)
    {
        // wait for a signal
        m_event.WaitOne();

        // do work..
    }
}
In some scenarios, especially those in which producer threads pass data to the consumer at high frequency, most of the calls to Set() are redundant. If, for example, a producer passes new data every 100ms (each pass resulting in a call to Set), but it takes the consumer an entire second to process that data, then we've called Set ten times per second when a single call would have sufficed; the other nine are pure overhead. Additionally, after the consumer finishes its processing, it has to call WaitOne again, only to discover that the event was already set by a producer.
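To make the scenario concrete, here is a sketch of such a producer/consumer pair. The class name, the queue, and the timings are illustrative assumptions chosen to match the numbers above, not code from a real application:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class RedundantSignalDemo
{
    static readonly AutoResetEvent s_event = new AutoResetEvent(false);
    static readonly Queue<int> s_queue = new Queue<int>();

    static void Producer()
    {
        for (int i = 0; ; i++)
        {
            lock (s_queue) { s_queue.Enqueue(i); }
            s_event.Set();      // ~10 calls/sec; most are redundant while the consumer is busy
            Thread.Sleep(100);  // new data arrives every ~100ms
        }
    }

    static void Consumer()
    {
        while (true)
        {
            s_event.WaitOne();  // often returns immediately: the event was already set
            lock (s_queue)
            {
                while (s_queue.Count > 0) { s_queue.Dequeue(); } // drain pending items
            }
            Thread.Sleep(1000); // simulate ~1s of processing per batch
        }
    }
}
```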
So, how expensive are those redundant calls to AutoResetEvent? The following benchmark should demonstrate:
while (true)
{
    AutoResetEvent eventObj = new AutoResetEvent(false);
    Stopwatch sw = Stopwatch.StartNew();

    for (int i = 0; i < 1000000; i++)
    {
        eventObj.Set();
        eventObj.WaitOne();
    }

    Console.WriteLine(sw.ElapsedMilliseconds);
}
On my machine, the average result was 1035ms.
A whole second is a ridiculously long time to complete those few lines of code. We are wasting an entire second on a synchronization primitive that, at first glance, might look quite insignificant in our real application. Seemingly, that code shouldn't take more than a few milliseconds to run, but as you can see, things are a little different in practice.
EconomicAutoResetEvent
The solution to this problem is quite straightforward, since our goal is simply to reduce the number of redundant calls to Set/WaitOne to a minimum. How could you achieve that? One option is to use an alternative synchronization primitive, which, for the sake of this post, will be named EconomicAutoResetEvent. The purpose of this new primitive is to improve AutoResetEvent's performance by keeping track of the event's state in user mode, thus avoiding unnecessary transitions into the kernel.
I'll demonstrate a possible implementation of such a primitive, and explain its behavior in more detail afterwards:
// Note: mild race conditions between WaitForWork and SignalWork are possible;
// at worst they result in a single redundant call to Set/WaitOne (see below).
public class EconomicAutoResetEvent
{
    private volatile int m_eventState;
    private AutoResetEvent m_event;
    private Thread m_worker;

    private const int EVENT_SET = 1;
    private const int EVENT_NOT_SET = 2;
    private const int EVENT_ON_WAIT = 3;

    public EconomicAutoResetEvent(bool initialState, Thread workerThread)
    {
        m_event = new AutoResetEvent(initialState);
        m_eventState = initialState ? EVENT_SET : EVENT_NOT_SET;
        m_worker = workerThread;
    }

    public void WaitForWork()
    {
        verifyCaller();

        if (m_eventState == EVENT_SET && Interlocked.CompareExchange(
            ref m_eventState, EVENT_NOT_SET, EVENT_SET) == EVENT_SET)
        {
            return;
        }

        if (m_eventState == EVENT_NOT_SET && Interlocked.CompareExchange(
            ref m_eventState, EVENT_ON_WAIT, EVENT_NOT_SET) == EVENT_NOT_SET)
        {
            m_event.WaitOne();
        }
    }

    public void SignalWork()
    {
        if (m_eventState == EVENT_NOT_SET && Interlocked.CompareExchange(
            ref m_eventState, EVENT_SET, EVENT_NOT_SET) == EVENT_NOT_SET)
        {
            return;
        }

        if (m_eventState == EVENT_ON_WAIT && Interlocked.CompareExchange(
            ref m_eventState, EVENT_NOT_SET, EVENT_ON_WAIT) == EVENT_ON_WAIT)
        {
            m_event.Set();
        }
    }

    // [Conditional("DEBUG")]
    private void verifyCaller()
    {
        if (m_worker != Thread.CurrentThread)
        {
            string errMsg = string.Format(
                "Only the pre-defined worker thread may call WaitForWork " +
                "(Current: {0}, Worker: {1})", Thread.CurrentThread, m_worker);
            throw new SynchronizationLockException(errMsg);
        }
    }
}
I assume that the question most of you are probably asking right now is "How fast is it?". Well, after running the same benchmark as before, the average result I get on my machine is only 9ms: a speedup of roughly 115x over using AutoResetEvent directly. And to think that all we did was avoid some calls to the internal kernel object.
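For reference, the adapted benchmark looks like this. It is a sketch assuming the EconomicAutoResetEvent class above; since the loop runs on a single thread, that same thread is passed as the designated worker. Because Signal/Wait strictly alternate here, every iteration is resolved by the user-mode state checks and the kernel object is never touched:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

var eventObj = new EconomicAutoResetEvent(false, Thread.CurrentThread);
Stopwatch sw = Stopwatch.StartNew();

for (int i = 0; i < 1000000; i++)
{
    eventObj.SignalWork();  // CAS: EVENT_NOT_SET -> EVENT_SET, no kernel call
    eventObj.WaitForWork(); // CAS: EVENT_SET -> EVENT_NOT_SET, no kernel call
}

Console.WriteLine(sw.ElapsedMilliseconds);
```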
The most important part of the implementation is the tracking of the state of the internal kernel object (signaled/not-signaled). The state itself is kept in m_eventState; this is how we know which calls to WaitForWork/SignalWork can avoid accessing m_event (the AutoResetEvent).
Additionally, in order to make the store/load operations on m_eventState thread-safe, I've used a couple of CAS operations. Even though a CAS can be cheaper than a full-blown lock, it is still quite an expensive instruction, one that is usually best avoided when it isn't necessary. This is exactly why the double tests in the if statements are performed. This technique is usually referred to as TATAS (i.e., test-and-test-and-set).
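The same pattern appears in its classic form in a TATAS spin lock. This is a minimal sketch (the class name is mine, and a real spin lock would add back-off and other refinements); the point is only that the cheap volatile read filters out most of the expensive CAS attempts:

```csharp
using System.Threading;

public class TatasSpinLock
{
    private volatile int m_locked; // 0 = free, 1 = held

    public void Enter()
    {
        while (true)
        {
            // Test: plain volatile read, cheap to spin on, no bus locking.
            while (m_locked == 1) { Thread.SpinWait(1); }

            // Test-and-set: attempt the expensive CAS only once the lock looks free.
            if (Interlocked.CompareExchange(ref m_locked, 1, 0) == 0)
            {
                return;
            }
        }
    }

    public void Exit()
    {
        m_locked = 0; // volatile write releases the lock
    }
}
```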
As the comment at the class's declaration mentions, some mild race conditions can occur when the consumer thread enters/leaves WaitForWork while a producer calls SignalWork at a very specific moment. This race condition won't cause any functional harm; in the worst case, it causes one redundant call to Set/WaitOne. Resolving this "issue" would mean that we prefer to enforce fairness, and such enforcement could have negative effects on performance.
Since we're already discussing the subject, it would be appropriate to mention that the Parallel Extensions library introduced a new synchronization primitive named ManualResetEventSlim, which supplies extended functionality and better performance than the standard event objects. Beyond keeping track of the state of the internal kernel event, it also spins around the flag for a while before actually blocking the thread, and it creates the kernel object lazily (only when it's needed). The good guys at Microsoft were kind enough to publish a short but informative document that demonstrates the performance differences between the new and old synchronization primitives under a selected group of usage scenarios.
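For completeness, here's what basic usage of ManualResetEventSlim looks like. Unlike the auto-reset event used throughout this post, it is manual-reset, so the consumer must call Reset() itself; the spin count value below is an arbitrary illustration:

```csharp
using System.Threading;

// The second constructor argument controls how long Wait() spins
// before falling back to the (lazily created) kernel event.
var slimEvent = new ManualResetEventSlim(false, spinCount: 100);

// Producer side:
slimEvent.Set();

// Consumer side:
slimEvent.Wait();   // spins first, blocks on the kernel object only if needed
slimEvent.Reset();  // manual-reset semantics: clear the event explicitly
```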