
Friday, August 20, 2010

The Case of Delayed ACKs and Nagle's Algorithm

Even though the negative consequences of combining Nagle's Algorithm with Delayed ACKs over TCP are well documented in the literature, the combination is still a common pitfall that many people tend to forget about. Since its negative effects appear only under certain preconditions, it can take a while until one correlates the application's "weird" behavior with the use of those two algorithms. Only once the performance penalty grows distinct enough does it become easier to identify its source.

Delayed ACKs
Since TCP guarantees that every packet sent over the network will arrive at its designated remote host, it needs some kind of acknowledgment from that remote host that the packet indeed arrived. So, every time a host receives a packet, it needs to send back an ACK message as an acknowledgment.
The problem with using dedicated ACK messages is that we could find ourselves "overflowing" the network with messages that have nothing to do with our application (and contain almost no data at all), all of this just to say "Yes, I've received your packet". If you combine the minimal size of a TCP header and an IP header (even without the Ethernet header), you already reach 40 bytes (and for IPv6, that number increases to 60 bytes).
So, in order to reduce that overhead, "Delayed ACKs" were introduced. The idea is that instead of sending a dedicated ACK message for each received packet, we assume that our application (the receiver) is probably about to send some message (not necessarily a response message) back to the remote host that sent us the packet in the first place. This way, we can "piggyback" on the application's message and add the ACK for the previously received packet to its header. Hence, we can substantially reduce the amount of redundant data circulating in our network.
It's customary to use a 200ms delay for sending ACKs (the exact behavior of the timeout depends on the protocol's implementation details: is a 200ms timer created when the socket opens, or only when the application needs to send an ACK?).
In Windows, the default timeout is 200ms as well. But if you want to, you can change it by modifying the TcpDelAckTicks value in the registry and setting it somewhere between 0ms and 600ms (as an anecdote, the Host Requirements RFC forbids ACK delays greater than 500ms).
It's worth noting that according to the RFC, there's no need to send back an ACK for every packet we receive, since a single ACK message can acknowledge multiple received packets. If, for example, machine A sends 5 different messages at 10ms intervals to machine B, then the moment the first message arrives at machine B, it starts a timer with a 200ms timeout so it will be reminded to send back an ACK for that first message (assuming this is a one-way interface, so machine B won't send back any applicative messages). Before that timer elapses, machine B will receive the other 4 messages. So, if we're already going to send an ACK for the first message, we might as well modify it so it acknowledges the entire set of 5 messages.
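
To make that description a bit more concrete, here is a simplified sketch of the receiver's side. The class and method names are only assumptions made for this illustration; the real logic obviously lives inside the operating system's TCP stack, not in application code.

using System.Threading;

// Illustrative sketch only: a rough model of the receiver's delayed-ACK decision.
class DelayedAckSketch
{
    private const int AckDelayMs = 200;   // typical delayed-ACK timeout
    private int unackedSegments;          // segments received but not yet acknowledged
    private Timer ackTimer;               // armed when the first unacknowledged segment arrives

    public void OnSegmentReceived()
    {
        unackedSegments++;

        // Arm the 200ms timer only for the first unacknowledged segment, and hope that an
        // outgoing applicative message shows up in the meantime so the ACK can piggyback on it.
        if (unackedSegments == 1)
        {
            ackTimer = new Timer(state => SendAckNow(), null, AckDelayMs, Timeout.Infinite);
        }
    }

    public void OnApplicationSend()
    {
        // An applicative message is about to leave anyway - piggyback the ACK on it
        if (unackedSegments > 0)
        {
            SendAckNow();
        }
    }

    private void SendAckNow()
    {
        if (ackTimer != null) { ackTimer.Dispose(); ackTimer = null; }
        unackedSegments = 0;
        // ... emit a single cumulative ACK covering every segment received so far ...
    }
}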
Additionally, received packets aren't the only trigger for sending an ACK message. For example, if our receive window (the maximum number of bytes we can receive without sending back an ACK message) fills up, we have to send back an ACK message immediately. Another trigger is the "Send ACK for Every Second Packet" rule, which does exactly what its name implies. In Windows, the default value (2) can be changed by modifying the TcpAckFrequency registry value.
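
If you want to check what your machine is currently configured with, the following snippet reads both values back (assuming they reside, as usual, under the per-interface Tcpip\Parameters\Interfaces registry key; on most machines they simply won't be present until someone explicitly sets them):

using System;
using Microsoft.Win32;

class ReadDelayedAckSettings
{
    static void Main()
    {
        // Per-interface TCP/IP parameters (Windows only; modifying them requires admin rights)
        const string interfacesKeyPath = @"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces";

        using (RegistryKey interfaces = Registry.LocalMachine.OpenSubKey(interfacesKeyPath))
        {
            if (interfaces == null)
                return;

            foreach (string interfaceName in interfaces.GetSubKeyNames())
            {
                using (RegistryKey nic = interfaces.OpenSubKey(interfaceName))
                {
                    // Both values are usually absent until someone explicitly sets them
                    object delAckTicks  = nic.GetValue("TcpDelAckTicks");
                    object ackFrequency = nic.GetValue("TcpAckFrequency");

                    Console.WriteLine("{0}: TcpDelAckTicks={1}, TcpAckFrequency={2}",
                        interfaceName, delAckTicks ?? "(not set)", ackFrequency ?? "(not set)");
                }
            }
        }
    }
}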

Nagle's Algorithm
Even though it isn't directly related to the usage of Delayed ACKs, this algorithm attempts to resolve a similar problem that occurs on the sender's side. For the same reason we shouldn't overflow the network with small ACK messages, we shouldn't send small applicative messages ourselves, due to the overhead of sending an entire header while the message's body is very small.
According to the algorithm, the protocol may delay small "send" operations so it can buffer them together (and thus a single packet can be sent for several small applicative messages).
Deciding when to stop buffering the data isn't arbitrary, as it depends on the rate at which ACKs are received from the remote host. The idea is that as long as we haven't received an ACK for the previously sent packet, there's no point in sending an additional packet. So, while we are waiting for an ACK from the remote host, we buffer all of the data we are about to send (within the limits of the MSS, of course). The moment we receive the ACK for our previous packet, all of the buffered data is combined and sent as a single packet.
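
Expressed as code, the sender's decision boils down to something like the following sketch (again, the names are only assumptions made for this illustration; the actual rule is implemented inside the TCP stack):

using System.Collections.Generic;

// Illustrative sketch only: a rough model of Nagle's decision on the sender's side.
class NagleSketch
{
    private readonly int maxSegmentSize;                       // the connection's MSS
    private readonly List<byte> sendBuffer = new List<byte>(); // data waiting to be sent
    private bool hasUnacknowledgedData;                        // is a previously sent packet still un-ACKed?

    public NagleSketch(int mss)
    {
        maxSegmentSize = mss;
    }

    public void OnApplicationSend(byte[] data)
    {
        sendBuffer.AddRange(data);

        // A full segment is always worth sending, and if nothing is "in flight"
        // there's no reason to hold the data back either.
        if (sendBuffer.Count >= maxSegmentSize || !hasUnacknowledgedData)
        {
            Flush();
        }
        // Otherwise: keep buffering until the ACK for the previous packet arrives.
    }

    public void OnAckReceived()
    {
        hasUnacknowledgedData = false;

        // The previous packet was acknowledged - everything that was buffered
        // in the meantime is combined and sent as a single packet.
        if (sendBuffer.Count > 0)
        {
            Flush();
        }
    }

    private void Flush()
    {
        // ... hand at most maxSegmentSize of the buffered bytes to the network layer ...
        sendBuffer.Clear();
        hasUnacknowledgedData = true;
    }
}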

Seemingly, both of the mentioned algorithms justify their existence, since they attempt to resolve real problems. However, what happens if we combine them? On one hand, each of them tries to reduce the amount of tinygrams sent on its side (either ACK messages or applicative messages). But on the other hand, under certain conditions, the combination may cause significant delays when sending messages over the network.

The most obvious example is a one-way interface, in which one side only sends messages and the other only receives them (without "answering"). In this case, even if the application continuously sends messages at a high rate, we can expect latencies of up to 200ms from the moment a message was "sent" by the application until it was received by the recipient (since it was delayed by Nagle's algorithm while waiting for the delayed ACK). In other cases, even in two-way interfaces, there might be occasions in which the client/server stops sending messages for a few moments. This can also cause latencies of up to 200ms in the receive rate. In those kinds of cases, identifying the source of the latencies can be more difficult, since they tend to appear randomly and with latencies that aren't always constant (less than 200ms). The exact behavior is determined by the characteristics of the application.


In order to illustrate this behavior in code, I've written the following program, which measures the time it takes to send two messages that should seemingly be sent instantly, one right after the other.

void Server()
{
    Socket server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    server.Bind(ServerEndPoint);
    server.Listen(1);
    Socket s = server.Accept();

    while (true)
    {
        // measure how long it takes to receive both messages
        Stopwatch stopwatch = Stopwatch.StartNew();

        s.Receive(new byte[8]);
        s.Receive(new byte[8]);

        // Output: around 200ms
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
    }
}

void Client()
{
    Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    client.Bind(ClientEndPoint);
    client.Connect(ServerEndPoint);

    while (true)
    {
        client.Send(new byte[8]); // sent immediately
        client.Send(new byte[8]); // held back by Nagle until the ACK for the first packet arrives (~200ms, due to the delayed ACK)

        // wait for an imaginary response
        client.Receive(new byte[0]);
    }
}

Due to the unpleasant effects that might be caused by the combination of these algorithms, the RFC states that TCP implementations that use Nagle's algorithm must also support a way to disable it, so applicative messages won't be delayed by the protocol when they are sent. This ability is exposed through the TCP_NODELAY flag, and in .Net, its usage is wrapped by the Socket.NoDelay property.
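
For example, adding a single line to the client from the program above should be enough to make both Send calls leave immediately, since the second one will no longer wait for the delayed ACK:

Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

// Disable Nagle's algorithm (sets TCP_NODELAY on the underlying socket)
client.NoDelay = true;

// The equivalent, more explicit form:
client.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);

Just keep in mind that disabling Nagle's algorithm brings back the tinygrams problem it was meant to solve, so it's worth doing only when the added latency actually matters.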