DoS and Worm Mitigation Strategy Through Scavenger Class QoS
Worms are nothing new; they have been around in some form since the beginning of the Internet and steadily have been increasing in complexity, as shown in Figure 2-6.
Figure 2-6 Business Security Threat Evolution
Particularly since 2002, there has been an exponential increase not only in the frequency of DoS and worm attacks, but also in their relative sophistication and scope of damage. For example, more than 994 new Win32 viruses and worms were documented in the first half of 2003, more than double the 445 documented in the first half of 2002. Some of these more recent worms are shown in Figure 2-7.
Figure 2-7 Recent Internet Worms
DoS or worm attacks can be categorized into two main classes:
Spoofing attacks: The attacker pretends to provide a legitimate service but provides false information (if any) to the requester.
Flooding attacks: The attacker exponentially generates and propagates traffic until service resources (servers or network infrastructure) are overwhelmed.
Spoofing attacks are best addressed by authentication and encryption technologies; flooding attacks, on the other hand, can be mitigated using QoS technologies.
The majority of flooding attacks target PCs and servers, which, when infected, target other PCs and servers, thus multiplying traffic flows. Network devices themselves are not usually the direct targets of attacks. But the rapidly multiplying volumes of traffic flows eventually drown the CPU and hardware resources of routers and switches in their paths, causing denial of service to legitimate traffic flows. The end result is that network devices become indirect victims of the attack. This is illustrated in Figure 2-8.
A reactive approach to mitigating such attacks is to reverse-engineer the worm and set up intrusion-detection mechanisms or ACLs to limit its propagation. However, the increased sophistication and complexity of today's worms make them harder to distinguish from legitimate traffic flows. This lengthens the time lag between when a worm begins to propagate and when the following occurs:
Sufficient analysis has been performed to understand how the worm operates and what its network characteristics are.
An appropriate patch, plug, or ACL is disseminated to network devices that might be in the path of the worm. This task might be hampered by the attack itself because network devices might become unreachable for administration during the attacks.
Figure 2-8 Impact of an Internet Worm: Direct and Collateral Damage
These time lags might not seem long in absolute terms (often a matter of minutes), but the relative window of opportunity for damage is huge. For example, in 2003, the number of hosts infected with the Slammer worm (also known as Sapphire) doubled every 8.5 seconds on average, infecting more than 75,000 hosts in just 11 minutes and performing scans of 55 million more hosts within the same time period.
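To put the 8.5-second doubling time in perspective, the following minimal Python sketch models unchecked exponential growth. The single initial host and the function names are assumptions made for illustration; pure doubling approximates only the early phase of an outbreak, because real growth slows as vulnerable hosts and bandwidth run out.

```python
import math

DOUBLING_TIME_S = 8.5  # average Slammer doubling time quoted in the text


def infected_hosts(elapsed_s, initial_hosts=1):
    """Hosts infected after elapsed_s seconds of unchecked doubling.

    initial_hosts=1 is an illustrative assumption, not a measured value.
    """
    return initial_hosts * 2 ** (elapsed_s / DOUBLING_TIME_S)


def seconds_to_reach(target_hosts, initial_hosts=1):
    """Seconds for the infected population to grow to target_hosts."""
    return DOUBLING_TIME_S * math.log2(target_hosts / initial_hosts)


if __name__ == "__main__":
    # Population after one minute, and time needed to reach 75,000 hosts.
    print(round(infected_hosts(60)))
    print(round(seconds_to_reach(75_000)))
```

Under these assumptions, the model reaches 75,000 hosts in well under three minutes, which suggests how little time a reactive analyze-and-patch cycle has to act.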
NOTE
Interestingly, a 2002 CSI/FBI report stated that the majority of network attacks occur from within an organization, typically by disgruntled employees.
A proactive approach to mitigating DoS and worm flooding attacks within enterprise networks is to respond immediately to out-of-profile network behavior indicative of a DoS or worm attack via campus Access-Layer policers. Such policers can meter traffic rates received from endpoint devices and, when these exceed specified watermarks (at which point they no longer are considered normal flows), can mark down excess traffic to the Scavenger class marking (DSCP CS1).
In this respect, the policers would be fairly "dumb." They would not be matching specific network characteristics of specific types of attacks, but they simply would be metering traffic volumes and responding to abnormally high volumes as close to the source as possible. The simplicity of this approach negates the need for the policers to be programmed with knowledge of the specific details of how the attack is being generated or propagated. It is precisely this "dumbness" of such Access-Layer policers that allows them to maintain relevancy as worms mutate and become more complex: The policers don't care how the traffic was generated or what it looks like; all they care about is how much traffic is being put onto the wire. Therefore, they continue to police even advanced worms that continually change the tactics of how traffic is being generated.
For example, in most enterprises, it is quite abnormal (within a 95 percent statistical confidence interval) for PCs to generate sustained traffic in excess of 5 percent of their link's capacity. In the case of a Fast Ethernet switch port, this means that it would be unusual in most organizations for an end user's PC to generate more than 5 Mbps of uplink traffic on a sustained basis.
NOTE
It is important to recognize that this value (5 percent or less) for normal access-edge utilization by endpoints is just an example value. This value would likely vary from one industry vertical to another, and from enterprise to enterprise.
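The metering behavior described above can be sketched as a single-rate token bucket that re-marks, rather than drops, out-of-profile packets. This is a simplified illustration, not Catalyst or Cisco IOS policer code: the class name, the method signature, and the 8000-byte burst size are assumptions, and the 5 Mbps rate is the example value from the text (5 percent of a Fast Ethernet port).

```python
DSCP_CS1 = 8  # Scavenger class marking (CS1 is DSCP decimal 8)


class MarkdownPolicer:
    """Token-bucket meter that marks excess traffic down to Scavenger."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes  # the bucket starts full
        self.last_time_s = 0.0

    def police(self, now_s, size_bytes, dscp):
        """Return the DSCP value the packet leaves the policer with."""
        # Refill tokens for the time elapsed, capped at the burst size.
        elapsed = now_s - self.last_time_s
        self.last_time_s = now_s
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return dscp       # in profile: marking is left untouched
        return DSCP_CS1       # out of profile: marked down, never dropped


if __name__ == "__main__":
    # Example watermark from the text: 5 Mbps on a 100 Mbps port.
    policer = MarkdownPolicer(rate_bps=5_000_000, burst_bytes=8_000)
    # Six back-to-back 1500-byte packets arriving at the same instant.
    print([policer.police(0.0, 1500, dscp=0) for _ in range(6)])
```

Note that the policer inspects only packet sizes and arrival times. It has no knowledge of what generated the traffic, which is the "dumbness" the text relies on.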
It is important to recognize that what is being proposed is not to police all traffic to 5 Mbps and automatically drop the excess. If that were the case, there would not be much reason for deploying Fast Ethernet or Gigabit Ethernet switch ports to endpoint devices because even 10BASE-T Ethernet switch ports would have more uplink capacity than a 5 Mbps policer-enforced limit. Furthermore, such an approach would severely penalize legitimate traffic that exceeded 5 Mbps on a Fast Ethernet switch port.
A less draconian approach is to couple Access-Layer policers with hardware and software (campus, WAN, and VPN) queuing policies, with both sets of policies provisioning for a less-than best-effort Scavenger class.
This would work by having Access-Layer policers mark down out-of-profile traffic to DSCP CS1 (Scavenger) and then have all congestion-management policies (whether in Catalyst hardware or in Cisco IOS Software) provision a less-than best-effort service for any traffic marked to DSCP CS1.
Let's examine how this might work, for both legitimate traffic exceeding the Access-Layer policer's watermark and illegitimate excess traffic (the result of a DoS or worm attack).
In the former case, imagine that the PC generates more than 5 Mbps of traffic, perhaps because of a large file transfer or backup. Because there is generally abundant capacity within the campus to carry the traffic, congestion (under normal operating conditions) is rarely, if ever, experienced. Typically, the uplinks to the distribution and core layers of the campus network are Gigabit Ethernet, which requires 1000 Mbps of traffic from the Access-Layer switch to create congestion. If the traffic were destined to the far side of a WAN or VPN link (which is rarely more than 5 Mbps in speed), dropping would occur even without the Access-Layer policer, simply because of the campus/WAN speed mismatch and resulting bottleneck. TCP's sliding-windows mechanism eventually would find an optimal speed (less than 5 Mbps) for the file transfer.
To make a long story short, Access-Layer policers that mark down out-of-profile traffic to Scavenger (CS1) would not affect legitimate traffic, aside from the obvious re-marking. These policers would cause no reordering or dropping on such flows beyond what would have occurred anyway.
In the latter case, the effect of Access-Layer policers on traffic caused by DoS or worm attacks is quite different. As hosts become infected and traffic volumes multiply, congestion might be experienced even within the campus. If just 11 end-user PCs on a single switch begin spawning worm flows to their maximum Fast Ethernet link capacities, the GE uplink from the Access-Layer switch to the Distribution-Layer switch will congest and queuing or reordering will engage. At such a point, VoIP and critical data applications, and even Best-Effort applications, would gain priority over worm-generated traffic (and Scavenger traffic would be dropped the most aggressively); network devices would remain accessible for administration of patches, plugs, and ACLs required to fully neutralize the specific attack.
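The two cases can be compared with a rough capacity calculation. The sketch below is an idealized model, not a description of any actual Catalyst or Cisco IOS queuing algorithm: it assumes that Scavenger traffic is served only from whatever uplink capacity remains after all other traffic, and the function name and the protected_mbps parameter (load from VoIP, critical data, and Best-Effort applications) are hypothetical.

```python
UPLINK_MBPS = 1000   # Gigabit Ethernet uplink to the Distribution Layer
POLICER_MBPS = 5     # example Access-Layer policer watermark from the text


def uplink_outcome(host_rates_mbps, protected_mbps=0):
    """Return (scavenger_offered, scavenger_dropped) in Mbps.

    host_rates_mbps: sending rate of each PC on the access switch.
    protected_mbps:  hypothetical load from other, non-Scavenger traffic.
    """
    # Traffic up to the watermark keeps its original marking.
    in_profile = sum(min(rate, POLICER_MBPS) for rate in host_rates_mbps)
    # Everything above the watermark is marked down to Scavenger (CS1).
    scavenger_offered = sum(max(0, rate - POLICER_MBPS)
                            for rate in host_rates_mbps)
    # Simplifying assumption: Scavenger gets only the leftover capacity.
    capacity_left = max(0, UPLINK_MBPS - protected_mbps - in_profile)
    scavenger_dropped = max(0, scavenger_offered - capacity_left)
    return scavenger_offered, scavenger_dropped


if __name__ == "__main__":
    # Legitimate case: one PC running an 80 Mbps backup.
    print(uplink_outcome([80]))
    # Attack case: 11 PCs flooding at Fast Ethernet line rate.
    print(uplink_outcome([100] * 11))
```

In this model the single 80 Mbps transfer is re-marked but loses nothing, whereas 11 hosts at line rate offer 1100 Mbps to a 1000 Mbps uplink, and the entire shortfall falls on traffic marked CS1.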
WAN links also would be protected: VoIP, critical data, and even best-effort flows would continue to receive priority over any traffic marked down to Scavenger/CS1. This is a huge advantage because generally WAN links are the first to be overwhelmed by DoS and worm attacks. The bottom line is that Access-Layer policers significantly mitigate network traffic generated by DoS or worm attacks.
It is important to recognize the distinction between mitigating an attack and preventing it entirely: The strategy being presented does not guarantee that no denial of service or worm attacks ever will happen, but it can reduce the risk and impact that such attacks could have on the network infrastructure.