Optimizing Long-Haul TCP Throughput
TCP (Transmission Control Protocol) is the primary layer-4 protocol for file copies, database
synchronization, and related tasks over the WAN.
Many of these synchronization tasks, such as database replication or rsync over ssh, depend
heavily on single-stream TCP throughput to maintain the replication consistency and latency
required to meet business objectives such as RTO (recovery time objective) and RPO (recovery
point objective).
Non-Optimized Performance
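Without specific performance optimizations for high-bandwidth long-haul networks, single-stream
TCP throughput is limited by network latency: a sender can have at most one window of
unacknowledged data in flight per round trip.
If we assume a standard TCP window size of 64 kilobytes and cross-country latency of 80
milliseconds, the maximum possible single-stream throughput with no packet loss is:
64 kilobytes / 80 milliseconds = approximately 6.5 megabits per second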
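The window-per-round-trip limit can be checked with a short sketch. The function name is illustrative, 64 kilobytes is taken to mean 65,536 bytes, and the 80 millisecond figure is treated as a round-trip time; all three are assumptions rather than details stated in the text.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-stream throughput: one full window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# Assumed inputs: 64 KiB window, 80 ms round trip, no packet loss.
limit = max_throughput_bps(64 * 1024, 0.080)
print(f"{limit / 1e6:.2f} megabits per second")  # prints "6.55 megabits per second"
```

Adding bandwidth to the link does not raise this ceiling; only a larger window or a shorter round trip does.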
When TCP was designed in the 1970s, the fastest long-haul networks available had far less than
1 megabit per second of available bandwidth for all traffic, let alone 6, so this limit was not a
design consideration.
Optimizing Performance – Window Scaling
IETF RFC 1323 (since superseded by RFC 7323) defines TCP window scaling as an optional feature to
improve performance over “Long, Fat Networks”. These are networks with high bandwidth (generally in
excess of 10 megabits per second) but also high latency (due to geographic distance and
speed-of-light delays).
TCP window scaling, if enabled, allows the advertised window to grow beyond 64 kilobytes so that it
can match the needs of the connection. Working backwards from the bandwidth-delay product, suppose we
need 50 megabits per second of throughput over a link with 100 ms latency.
50 megabits per second * 100 milliseconds = 5 megabits = 625 kilobyte TCP window
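Working the bandwidth-delay product through explicitly, 50 megabits per second over a 100 millisecond round trip comes to 625,000 bytes in flight. The sketch below also derives the smallest window scale shift that can advertise such a window, given the 16-bit window field and the maximum shift of 14 set by the window scaling RFC. Both function names are hypothetical, and the latency is again assumed to be a round-trip figure.

```python
def required_window_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_seconds / 8


def min_window_scale_shift(window_bytes: float) -> int:
    """Smallest shift such that a 16-bit window, left-shifted, covers window_bytes."""
    shift = 0
    while (65535 << shift) < window_bytes:
        shift += 1
        if shift > 14:  # the window scaling option allows a shift of at most 14
            raise ValueError("window too large for TCP window scaling")
    return shift


window = required_window_bytes(50e6, 0.100)
shift = min_window_scale_shift(window)
print(f"{window:.0f} bytes, scale shift {shift}")  # prints "625000 bytes, scale shift 4"
```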
While it is possible to hand-optimize this value for specific situations, the best practice is to enable
automatic window scaling. On Linux, this is accomplished by:
sysctl -w net.ipv4.tcp_window_scaling=1
To keep the setting across reboots, add net.ipv4.tcp_window_scaling=1 to /etc/sysctl.conf
Optimizing Performance – Selective Acknowledgement
Without selective acknowledgement, a single lost packet can result in retransmission of an entire
window worth of packets. On long, fat networks such as a typical WAN, where windows are large, this
can be a substantial amount of data.
To resolve this problem, TCP Selective Acknowledgement, defined in RFC 2018, allows the receiving
system to acknowledge all packets received even if an intermediate packet has not been received.
This results in retransmission of only the lost packets, instead of an entire window worth of data.
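The difference can be illustrated with a deliberately simplified model. It is not how any particular TCP stack behaves: it assumes the worst case described above, in which a sender without selective acknowledgement resends everything from the first lost segment to the end of the window. The function names and the 428-segment window (roughly 625,000 bytes at an assumed 1,460-byte segment size) are illustrative.

```python
def retransmitted_without_sack(window_segments: int, lost: set) -> int:
    """Worst case with cumulative ACKs only: resend from the first loss onward."""
    if not lost:
        return 0
    return window_segments - min(lost)


def retransmitted_with_sack(lost: set) -> int:
    """With selective acknowledgement, only the missing segments are resent."""
    return len(lost)


# Segments are numbered from 0; segment 10 of a 428-segment window is lost.
window_segments = 428
lost = {10}
print(retransmitted_without_sack(window_segments, lost))  # prints 418
print(retransmitted_with_sack(lost))                      # prints 1
```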
To enable selective acknowledgements on a Linux host:
sysctl -w net.ipv4.tcp_sack=1
To keep the setting across reboots, add net.ipv4.tcp_sack=1 to /etc/sysctl.conf
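To confirm what a host is currently using, the running values can be read back from /proc/sys, where Linux exposes each sysctl key as a file. The following is a minimal sketch; the helper name read_sysctl and its proc_root parameter are hypothetical, and the dot-to-slash mapping is only assumed to hold for simple keys such as the two used here.

```python
from pathlib import Path
from typing import Optional


def read_sysctl(name: str, proc_root: str = "/proc/sys") -> Optional[str]:
    """Return the current value of a sysctl key, or None if it cannot be read."""
    path = Path(proc_root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # Not Linux, key absent, or insufficient permissions.
        return None


for key in ("net.ipv4.tcp_window_scaling", "net.ipv4.tcp_sack"):
    value = read_sysctl(key)
    status = {"1": "enabled", "0": "disabled"}.get(value, "unknown")
    print(f"{key}: {status}")
```

On a system where /proc/sys is unavailable, both keys are reported as unknown rather than raising an error.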
Before and After
This is a real-world example from a specific business unit of a leading Silicon Valley-based software
company. It involved replication traffic between two datacenters (one east coast, one west coast) over
a link with 90ms latency.
Throughput before optimization: 5 Mb/sec
Throughput after optimization: 66 Mb/sec
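These figures can be compared against the window-per-round-trip limit. The sketch below assumes that the throughput numbers are in megabits per second, that the unoptimized hosts used a 64 kilobyte (65,536 byte) window, and that 90 ms is a round-trip time; none of these is stated explicitly alongside the measurements.

```python
RTT_SECONDS = 0.090          # link latency from the example, assumed round trip
DEFAULT_WINDOW_BYTES = 65536  # assumed unscaled 64 KB window
AFTER_BPS = 66e6              # "after" throughput, assumed megabits per second

# Ceiling for an unscaled window: one window per round trip.
before_limit_bps = DEFAULT_WINDOW_BYTES * 8 / RTT_SECONDS

# Window that must be in flight to sustain the "after" throughput.
window_for_after_bytes = AFTER_BPS * RTT_SECONDS / 8

print(f"unscaled ceiling: {before_limit_bps / 1e6:.1f} megabits per second")
print(f"window needed after optimization: {window_for_after_bytes:.0f} bytes")
```

Under these assumptions the unscaled ceiling works out to about 5.8 megabits per second, in line with the measured "before" figure, and the "after" figure implies a window of roughly 742,500 bytes, which is only possible with window scaling enabled.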
TCP window scaling can lead to increased host memory consumption due to the larger send and receive
buffers. This is generally not an issue on modern hardware with multiple gigabytes of RAM.
TCP window scaling and selective acknowledgements may not be supported by some older firewall
implementations. This is generally not an issue with current products from top-tier network vendors such
as Cisco and Juniper.