
Network Protocols Discussion / Traffic Shaping Strategies

This is only meant as a brief overview specific to traffic shaping; please see the resources section for more detailed reading. Even so, I highly recommend that you read this section.

Generally speaking, your egress (outgoing) traffic is more important than your ingress (incoming) traffic, for a couple of reasons. First, the bottleneck on your ingress traffic typically sits at your ISP (capped bandwidth). Second, while it's possible to have more or less complete control over the traffic you send out, the reverse is not true. Protocols like TCP do have flow control features, but it's not always possible to use them to our full advantage.

A typical local network today runs at 100Mbit/s or faster, while a T1 connection is 1.5Mbit/s. This means that when traffic leaves your network, it goes from a lot of bandwidth to a little, putting the bottleneck at your egress packet queue. The reverse, the ingress queue, is likely never used because of the same bandwidth disparity. Besides, when policing ingress traffic, perfectly good packets (already received) get dropped and have to be retransmitted. Because of this, ingress queues are still rather controversial, and will not be covered in this presentation.

What is a packet queue and why should I care?

A packet queue is basically a buffer. When packets arrive faster than the gateway router can send them out, it queues them up until sending becomes possible. If the packet queue overflows, packets are silently dropped. If a packet sits in the queue for so long that a timeout occurs, the packet gets resent, making the queue even more likely to overflow (necessitating even more resends and their associated timeouts).

All network devices utilize a packet queue. Linux by default has a packet queue length of 100, meaning that it can buffer up to 100 packets before it starts to silently drop packets. Most ISPs also configure packet queues to be significantly larger, in order to avoid resends. DSL and cable modems have their own packet queues as well.
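On Linux, this queue length is the interface's txqueuelen setting, which you can inspect and change with the iproute2 tools. A minimal sketch, assuming eth0 is the interface facing the slow uplink:

    # Show the current egress queue length (the "qlen" field)
    ip link show dev eth0

    # Shrink the egress queue so fewer packets can pile up ahead of
    # the slow uplink (eth0 is an assumed interface name)
    ip link set dev eth0 txqueuelen 30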

What this means is that a packet might take several seconds to make it through all of the egress queues before it actually reaches the first hop on the Internet (never mind the time it takes for the response to return). The net result is very high latency when the network link becomes congested.

So what? How is a bit of high latency going to hurt the network? If the latency becomes sufficiently high, timeouts may occur and necessitate packet resends, making a bad situation worse. Packets already sent and received could appear to have timed out (not acknowledged in time due to the latency). The resends further deepen the packet queues: rinse, lather, repeat.


Interactive vs. bulk traffic

Low latency is more important for some types of network traffic than for others. Think about playing your favorite online game, browsing a webpage, typing in a remote console (SSH), working on a remote desktop (X, VNC), or downloading your email, versus downloading large FTP (or HTTP) files, SMTP sends and receives, etc.

Generally speaking, interactive traffic requires low latency because there's user interaction. Bulk traffic either doesn't involve user interaction, or benefits more from high bandwidth than from low latency (large downloads). Note that this is not a discussion about bandwidth. Let's take FTP as an example. The command channel is on TCP port 21. This is where all the commands are issued, so it would be nice to have low latency. However, the data channel (also TCP, but the port number varies), where the actual transfer happens, doesn't involve any user interaction and therefore doesn't need low latency (but might require high bandwidth).
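As a concrete sketch of this distinction on Linux, a three-band prio qdisc can steer the FTP command channel into a low-latency band while the data transfers stay behind it; the interface name and band assignments here are assumptions, not a complete ruleset:

    # Three-band priority qdisc: band 1:1 is always dequeued before
    # 1:2, and 1:2 before 1:3 (most traffic defaults to band 1:2)
    tc qdisc add dev eth0 root handle 1: prio

    # Send the FTP command channel (TCP destination port 21) to the
    # low-latency band; the data channel stays in the default band
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip protocol 6 0xff match ip dport 21 0xffff flowid 1:1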

How about typing in a remote console? A half second or so might not be too long to wait for a webpage, but it is rather hard to bear when your console display is more than several characters behind your typing.

So in this example, you might want to prioritize SSH (TCP port 22) above HTTP (TCP port 80). But what about SSH vs. SCP?

SSH packets are typically very small; a key press or two can easily fit inside a single packet and be sent off. SCP uses the same port (TCP 22), but it belongs more in the bulk traffic category. While the assumption may not always hold, you can probably get away with assuming that SCP will, on average, require larger packets. Therefore, you might profile your network traffic to get an idea of the average upper limit of interactive SSH packet sizes, and set different priorities based on packet length, even though both services utilize the same protocol and port number.
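One way to sketch this on Linux is to mark packets by size with iptables and classify on those marks with tc. The 128-byte cutoff, mark values, and interface name below are placeholder assumptions you'd replace after profiling, and the rules cover outbound SSH sessions only:

    # Small packets to the SSH port are presumed interactive; the
    # 0:128 byte cutoff is a guess - profile your own traffic first
    iptables -t mangle -A POSTROUTING -o eth0 -p tcp --dport 22 \
        -m length --length 0:128 -j MARK --set-mark 1

    # Larger packets on the same port are presumed SCP-style bulk
    iptables -t mangle -A POSTROUTING -o eth0 -p tcp --dport 22 \
        -m length --length 129:65535 -j MARK --set-mark 2

    # Map the marks onto the prio bands from the earlier sketch
    tc filter add dev eth0 parent 1: protocol ip prio 10 handle 1 fw flowid 1:1
    tc filter add dev eth0 parent 1: protocol ip prio 11 handle 2 fw flowid 1:3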

There probably isn't a single traffic shaping scheme that would be correct for everyone, but it's important to differentiate between services that require low-latency, and ones that don't.

QoS Guarantees

Other considerations such as minimum guaranteed bandwidth for specific network services, or a maximum bandwidth cap for certain departments or clients might also be important when it comes to traffic shaping policies.

Suppose the accounting department runs Citrix clients in order to update financial records at headquarters, but your network is completely swamped by the sales people downloading the latest Apple iPod commercials on Kazaa. Traffic shaping can set aside a minimum guaranteed bandwidth and latency for the accounting department, but allow sales to utilize the rest of the bandwidth when the pipe lies idle.
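A sketch of such a policy using Linux's HTB (hierarchical token bucket) qdisc might look like the following; the T1-sized link, the rates, and the accounting subnet are all assumed values, and this root qdisc would replace a simpler setup like the prio example above:

    # Root HTB qdisc; unclassified traffic falls into class 1:20
    tc qdisc add dev eth0 root handle 1: htb default 20
    tc class add dev eth0 parent 1: classid 1:1 htb rate 1500kbit

    # Accounting is guaranteed 500kbit but may borrow up to the full
    # link when it is otherwise idle; everyone else shares the rest
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500kbit ceil 1500kbit prio 0
    tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1000kbit ceil 1500kbit prio 1

    # Classify by the accounting department's subnet (assumed here)
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip src 192.168.10.0/24 flowid 1:10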

Maybe you're sharing bandwidth with your little brother, but you want to implement a maximum bandwidth cap so that he can never exceed dial-up speeds and bother you with his downloads of Britney's latest singles as MP3s.
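A hard cap is simply an HTB class whose ceil equals its rate, so the class can never borrow. A minimal sketch, shaping the LAN-facing interface (assumed to be eth1) so that traffic toward his machine (an assumed 192.168.1.50) is throttled:

    # Everything else on the LAN side flows through 1:20 unshaped
    tc qdisc add dev eth1 root handle 1: htb default 20
    tc class add dev eth1 parent 1: classid 1:20 htb rate 100mbit

    # rate == ceil means this class can never exceed dial-up speed
    tc class add dev eth1 parent 1: classid 1:30 htb rate 56kbit ceil 56kbit
    tc filter add dev eth1 parent 1: protocol ip prio 1 u32 \
        match ip dst 192.168.1.50/32 flowid 1:30

Note that his downloads are shaped here as egress on the internal interface, which sidesteps the ingress-policing problem described earlier.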

TCP

TCP is commonly referred to as TCP/IP, but as that name suggests, it really runs on top of IP. The notable feature of TCP is that it is connection oriented, and therefore has a lot more (though necessary) overhead that is not, strictly speaking, part of the "data." I won't go over the details here, but you can refer to the IP overview or check the resources section. Things like SYN, SYN/ACK, ACK/FIN, and RST packets are vital to each and every connection and ought to receive special attention. Another notable feature of TCP is that every packet that is sent has to be acknowledged by the other side, otherwise it has to be resent.

Especially when a network is choking under heavy congestion, these "overhead" packets play an increasingly crucial role. Consider the acknowledgement packets that confirm packet receipt (note that this is different from simply having the ACK flag set, which every TCP packet belonging to an existing connection does): if they are stuck in your packet queue for longer than the retry timeout, packets that you've already received or sent will be retransmitted, resulting in more congestion.

SYN, SYN/ACK, ACK/FIN, and RST packets help establish new connections or tear down existing ones. Letting these packets have priority in times of heavy congestion means that you're able to accept more connections (yes, you actually do want this) and finish your existing connections faster. This keeps your network resources available even during congestion.

Once again, it is useful to note that SYN, SYN/ACK, ACK/FIN, RST, and the acknowledgement packets shouldn't contain any data payload, and are single packets that can be quickly dequeued. So regardless of your queue length, these packets should never be left to queue for too long or be dropped because the queue is overflowing.
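On Linux, one hedged way to give these packets special treatment is to mark them with iptables and direct the marks to the highest-priority class; the mark value, size cutoff, and class IDs are assumptions tied to the earlier prio sketch:

    # Any packet with SYN, FIN, or RST set (handshake or teardown)
    # is marked for the low-latency class
    iptables -t mangle -A POSTROUTING -o eth0 -p tcp \
        ! --tcp-flags SYN,FIN,RST NONE -j MARK --set-mark 1

    # Bare acknowledgements (only ACK set, header-sized) get the
    # same mark; the 64-byte cutoff is an assumption
    iptables -t mangle -A POSTROUTING -o eth0 -p tcp \
        --tcp-flags ALL ACK -m length --length :64 -j MARK --set-mark 1

    # Send mark 1 to the highest-priority band (skip this line if it
    # was already added in the earlier SSH/SCP sketch)
    tc filter add dev eth0 parent 1: protocol ip prio 10 handle 1 fw flowid 1:1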

UDP

UDP is a useful, low-overhead protocol. While this approach has certain advantages, the major disadvantage is that, unlike TCP, it lacks rate and flow control features. Even though it might seem counterintuitive, the strategy is more or less to let UDP packets get through as fast as possible without affecting the other crucial or prioritized services. This is because you're also trying to avoid resends (which are application dependent): overflow, resend, rinse, lather, repeat. Note that high UDP ports (33,000+) are also used by the UNIX-based traceroute tool.
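Following that strategy, a single tc filter can steer all UDP into the low-latency band; this sketch assumes the three-band prio qdisc from the earlier examples is in place on eth0:

    # IP protocol 17 is UDP; send it to the low-latency band so small
    # datagrams aren't stuck behind bulk TCP transfers
    tc filter add dev eth0 parent 1: protocol ip prio 3 u32 \
        match ip protocol 17 0xff flowid 1:1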

ICMP

ICMP is mostly used as a diagnostic protocol, and is sometimes co-opted maliciously in denial of service (DoS) attacks. Note that while ICMP may be an important diagnostic tool, you don't actually want it to take precedence over more important network services, whatever they may be. If possible, either rate limit ICMP traffic or give it only enough priority that it usually gets through under normal circumstances, but gets dropped when there's a flood of ICMP traffic. Especially when congestion is heavy, every packet counts. Unless an (inaccurately) low ping time is really important to your network setup, drop ICMP packets in favor of more important traffic. Think about it: when your network link is suffering from heavy congestion, would you rather have your diagnostic tool reflect the actual conditions of your connection, or report an artificially low latency (at the expense of real traffic)?
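One way to rate limit ICMP with iptables looks like the following sketch; the rate and burst values are assumptions to tune for your own link:

    # Let a trickle of ICMP through, then drop the excess
    iptables -A FORWARD -p icmp -m limit --limit 2/second --limit-burst 5 -j ACCEPT
    iptables -A FORWARD -p icmp -j DROP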

Back to the DoS attacks: note that while normal ICMP packets do often carry a payload, it's typically not of any significant size. DoS attacks often take advantage of ICMP packets with large payloads because ICMP is so infrequently blocked by firewalls.

 
Shane Tzen © 2010