|
June 28, 1999
TCP Performance Tuning
When it comes to tuning application servers, most network administrators
take all the obvious steps: They pick an effective OS, use a strong server
platform and build a robust infrastructure. Unfortunately, few take the
time to fine-tune their TCP/IP stacks for optimal performance. This is
a crucial step in maximizing overall performance because TCP provides
all flow-control services used to move data over the network.
There are several TCP elements you can pick to tweak performance, but
the most important one is the TCP window size; it dictates how much data
a system can send at any given time. Every TCP packet (called a "segment" in
the IETF standards) has a header that contains the "window" field (among
other things). This field advertises the amount of available space in
a system's receive buffers at the time a segment was sent. By telling
the remote system how much space is available, the advertising host controls
how much data the remote host will transmit. Large windows allow a sending
system to transmit much more data than if the window were set at a smaller
value. In contrast, small windows constrict the amount of data being
exchanged, which can result in suboptimal utilization.
The trick to maximizing performance is to define the window field being
advertised by your network's end points appropriately, so that the amount
of data being exchanged is equal to the available bandwidth. If the TCP
window is too small for a given network, the end points won't be able
to saturate the network fully, and overall performance will be slower
than expected. However, making the TCP window too large can introduce
error-recovery problems, so setting this value too high can degrade performance
substantially.
Fortunately, all systems have a "default" size that they use for the
TCP window. You can adjust this value so all your applications use an
appropriate window size. Some applications also let you set window size
on a per-connection basis, using system-level APIs that the application
calls when it first binds to TCP. You may be better off using an application-specific
setting, depending on your usage patterns, but you'll be able to do so
only if you have access to the application source code.
Accurately determining the appropriate TCP window size for your network
can be complicated, requiring many measurements and calculations to find
the best setting. Note that these calculations are connection-specific,
and therefore will be of little use if you have a variety of connections
coming in from different networks and locations (values that are optimal
for one network may be suboptimal for others). For this reason, tweaking
your system's TCP window size is most beneficial when all your users
are on networks with similar characteristics, and all the end points
(including the application server and clients) can be configured the
same way. By the same token, if you have a popular Web server that gets
dial-up connections from remote locales and the T3 circuit next door,
you won't be able to fine-tune anything. In that situation, using the
system defaults is probably the best you can do without using add-on
traffic-shaping gear.
Bandwidth x Delay = Bits-In-Flight
When deciding upon the most efficient window size for your network,
the two most important considerations are round-trip latency and the
end-to-end bandwidth on the network. Both elements determine the amount
of data that can be on the network ("in-flight") at any given time.
Latency is the amount of time it takes for a bit of data to travel
from your system to another, including all of the in-between hops and
the processing time of the two systems. The longer it takes for data
to travel across the network, the more bits that can actually be in the
network at any given time. For example, a network that has a latency
of 60 milliseconds will allow six times as many bits to be in transit
simultaneously as a network with a latency of 10 milliseconds. However,
round-trip latency (which includes the amount of time for the data to
be sent and acknowledged) is a truly important measure, as we'll soon
see.
Likewise, end-to-end bandwidth affects the number of in-flight bits,
since it affects how many bits can be sent for every unit of latency.
If you have a 1-Mbps pipe, then you can send only 1 megabit for every
second of latency. Similarly, a 100-Mbps pipe will allow 100 Mb of data
to be sent in that same second.
Multiplying the round-trip latency by the end-to-end bandwidth provides
you with the maximum amount of data that any particular TCP virtual circuit
will allow. This figure essentially defines the optimal TCP window size.
Setting a value that is smaller will result in a constrained system;
the sender will be restricted to the value advertised by the remote system's
window field rather than the network's capacity. Once this window value
has been reached, the sender must stop transmitting data until an acknowledgment
arrives. But if the value is sized correctly, the sender will be able
to send data continuously because the acknowledgment will always appear
just as all of the data has been sent. This will allow for a continuous
exchange of data, and enables fast recovery times in the event that data
gets lost.
When you perform these window-size-related calculations, remember that
end-to-end bandwidth should be your primary focus; it will serve as the
throttling point for the entire link. If you have a 100-Mbps Ethernet
card, but the remote system is using 64-Kbps ISDN, 64 Kbps is the maximum
bandwidth for the end-to-end connection. In addition, you need to measure
latency in terms of round-trip delivery times. If you're sending data
over a wide area ATM network, but the remote system is transmitting acknowledgments
to you over a satellite link, your round-trip latency times will be the
latency of the ATM network plus the latency of the satellite link.
Measuring round-trip latency times is easy: Simply ping a remote system.
However, be sure that you ping the system several times during a normal
workday--network congestion will result in longer latency times at peak
traffic periods--and that you use the most common latency time (which
may not necessarily be the average). You will need to use a ping client
that generates custom-length messages, which sends ping probes that are
equal in size to the most common packets on your network. Using very
short ICMP (Internet Control Message Protocol) messages results in lower
latencies than TCP segments will encounter in reality. In contrast, using
ICMP messages that are too large may create excessively long latency
times or cause fragmented IP packets (this will only further corrupt
your test results). For Ethernet, you should generate 1,500-byte IP packets,
which will each have a 20-byte IP header and an eight-byte ICMP header.
This translates into ICMP probes with 1,472 bytes of data.
When you do the math, be sure to use the same unit of measure for everything--bits
per second or bytes per second when appropriate, or milliseconds instead
of seconds. Otherwise, your measurement data will be incorrect. Also,
keep in mind that kilobytes are 1,024 bytes, and that megabytes are 1,024
x 1,024 bytes.
For example, when you multiply the throughput of a 19.2-Kbps modem
(19.2 x 1,024 bits) by an average round-trip time of 60 milliseconds
(.06 seconds), the maximum amount of in-flight data on the network is
approximately 1,180 bits (rounded up from 1,179.648 bits). For additional
examples of this formula at work, see "Maximum Bits-in-Flight" below.
Note that this formula only tells us how many bits can be in flight at
any given time and does not tell us the optimal window size.
As you can see in the "Maximum Bits-in-Flight" table, the number of bits
in flight changes dramatically according to the network's bandwidth and
latency. Notice that the results for the 384-Kbps WAN link with 60 milliseconds
of delay are similar to the results for 10-Mbps Ethernet with two milliseconds
of latency. With Ethernet, the available bandwidth is substantially greater,
but the end-to-end delay is extremely low: It provides little opportunity
for the bits to build up in the network. Also, note how much build up
is allowed in the 100-Mbps LAN with 10 milliseconds of latency compared
with the same LAN if it has only two milliseconds of delay. The more
infrastructure devices you add to your network, the greater the latency
will become. This explains why measuring your own network is so important.
Maximum Segment Size Multiples
It's important to note the bits-in-flight value alone is not the optimal
window size. You must also consider the maximum segment size (MSS) in
use on the connection, and multiply that value by even numbers until
you exceed the value derived for the maximum bits-in-flight. This is
due to the way in which TCP's Delayed Acknowledgment algorithm refrains
from sending acknowledgments until two fully sized TCP segments have
been received. If the sender does not transmit enough data in even multiples,
the recipient will not return acknowledgments quickly, and the exchange
will be jerky.
For example, if the segment size is 1,460 bytes (common for Ethernet
and PPP connections), the window must be an even multiple of 1,460 bytes.
If the TCP window is an odd multiple of the MSS, eventually the recipient
will not receive two full-sized segments, thereby holding off the acknowledgment
until the delayed acknowledgment timer expires.
Actually, the default window value should be at least four times the
MSS because a smaller receive window would not foster a steady data flow.
If the receive window were only twice the MSS, the sender would transmit
two segments and then stop to wait for an acknowledgment while the segments
worked their way through the network. Once received, the recipient's
acknowledgment would also have to return to the sender before more data
could be sent, causing further delay. Having a default window size four
times the size of the MSS would at least allow the sender to transmit
four segments, with the last segment being sent just as the acknowledgment
for the first two segments was being returned.
This is why many TCP stacks use a simple formula of MSS x 4, with the
resulting value as the default window size for all connections. While
this methodology works for networks with very low latency levels, it
is not sufficient for slower connections, where multiples of six or eight
are more appropriate.
"Window Sizes per Bytes-in-Flight and MTU" shows some window sizes based on the
maximum bits-in-flight and using some common segment sizes (shown here
in bytes instead of bits). Notice that the 10-Mbps link with two milliseconds
of latency is the only LAN configuration that is able to use a default
setting of 4 x MSS. The rest of the configurations--particularly 100-Mbps
Ethernet over a multihop network, which requires an MSS multiple of 90--would
suffer serious performance limitations with this setting.
This is a particular problem when Fast Ethernet networks are used on
highly distributed LANs with numerous bridges and routers. If you have
a few hundred users using a single internal Web server over a multihop
internal network, you're probably not realizing the best possible performance.
Unfortunately, if you set the default window size for all your clients
for a very large value that reflects this particular usage scenario,
you will likely experience other problems; a window this size will be
too big for most other usage scenarios. Users that are hooking up to
the Internet will establish connections using these large window sizes,
even though their end-to-end bandwidth may drop to 256 Kbps or less (resulting
in much lower levels of in-flight data).
When a window is too big, TCP can have difficulties trying to recover
from lost data. For example, if a remote Web server sees a window size
of 131 KB advertised by your client, then it eventually will try to send
that exact amount of data--even if the bandwidth times delay for that
connection only enables 4 KB of in-flight data. As a result, the remaining
127 KB would be queued in a router somewhere between the Web server and
your client.
If any of that data is lost and retransmitted, the retransmitted data
will have to sit in a router queue behind the rest of the data from the
earlier transmission, while your system continues to receive (and reject)
all the earlier segments. Eventually, the client will simply abort the
connection because it will think that the connection cannot be recovered.
Alternatively, the continuous stream of duplicate acknowledgments sent
by your client back to the remote server may cause the server to abort
the connection as well. Having an appropriately sized window would prevent
these events from occurring.
This scenario illustrates why it's very important not to set a window
value that is too large. If some of your calculations indicate big windows
for some internal services but small-sized windows for other services
(local or remote), you may wish to move some of those servers closer
to the client, so you can average the maximum window sizes to more reasonable
values. In this regard, determining the most efficient window sizes for
your organization may indicate that the infrastructure needs to be modified.
Also, it is important to note that since the TCP window field is only
16 bits long, the maximum value for this field is 65535. If you have
a satellite network or some other connection that is slow but offers
a lot of capacity, the optimal window size will exceed this value, requiring
the use of the TCP Window Scale Option (described in detail in "Advanced
TCP Options."
Written by Eric
A. Hall.
Copyright © 1999 CMP Media, Inc. Used with permission. |