QoS Tool Chest: Understanding the Mechanisms
The preceding section discussed the need to apply a QoS policy in the network, where it should be applied, and the models under which you can operate. To help you completely understand the mechanics of QoS, the next section explores the mechanisms available to perform the tasks at hand.
Classes of Service
To provide a mechanism for prioritizing the different types of IP traffic that exist on the network, it is important to adopt a CoS model that is flexible, simple to maintain, and able to meet the behavioral needs of different applications. Applications can then be categorized into the appropriate classes according to their delivery requirements. Based on this strategy, the following QoS classes of service are defined to address the different forwarding requirements of all traffic while maintaining a small number of classes. Figure 5-2 shows an approach an enterprise could follow, leading toward the mature 11-class QoS Baseline model. A four- or five-class model is the starting baseline an enterprise should consider; it allows a migration path as more granular classes are added over time.
Figure 5-2 How Many Classes of Service Do You Need?
Understanding the deployment needs is a multistep process:
Step 1. Strategically define the business objectives to be achieved via QoS.
Step 2. Analyze the service-level requirements of the various traffic classes to be provisioned for.
Step 3. Design and test QoS policies before production network rollout.
Step 4. Roll out the tested QoS designs to the production network.
Step 5. Monitor service levels to ensure that the QoS objectives are being met.
These steps may need to be repeated as business conditions change and evolve. These steps are derived from the QoS baseline model developed by Tim Szigeti.
The classifications are split into two different areas: Layer 3 classification and Layer 2 CoS. The Layer 3 classifications cover the following:
- IP Precedence (or type of service [ToS]) markings
- Differentiated Services Code Point (DSCP), which provides for markings based on value ranges, where each DSCP specifies a particular per-hop behavior that is applied to a packet
- Per-hop behavior (PHB), the forwarding treatment applied at a Differentiated Services-compliant node to a behavior aggregate
IP ToS
In general, when referring to the ToS values, there are two methods for specifying QoS information within an IP packet. The first uses the three most-significant bits (MSBs) of the ToS field in the IP header. These are called the IP Precedence (IPP) values, and they allow for up to eight user-definable classes of service. The second method, which uses the 6 MSBs of the ToS field, is an extension of the IPP model and allows for up to 64 DSCP values. Because the fields overlap, every DSCP implies an IPP value: DSCP 46 (EF, binary 101110), for example, corresponds to IPP 5 (binary 101).
Based on these classifications, real-time voice bearer traffic is marked as Class 5 with guaranteed expedited delivery, using an expedited queuing mechanism to ensure that voice quality is not adversely affected under heavy link utilization. This mechanism alone cannot guarantee protection for voice, so it needs to be used in combination with good capacity planning and call admission control (CAC). You will explore the capabilities available in the upcoming sections.
Traffic marked as Class 2, 3, 4, and 6 is provided guaranteed minimum bandwidth and is serviced via class-based weighted fair queuing (CBWFQ). The minimum bandwidth used should be calculated to account for peak usage for all traffic within each class. Should these classes require bandwidth usage that exceeds the configured minimum amount, this would be allowed, provided that other classes are not fully using their minimum bandwidth allocation.
All traffic marked as Class 0 is guaranteed the remainder of the bandwidth. Class 1 (batch/scavenger) traffic is drop-insensitive; these batch transfers are given lower-priority treatment than all other classes. Typically, Class 1 should be assigned the smallest possible amount of bandwidth so that, in the event of link congestion, Class 1's bandwidth usage is immediately contained to protect other, higher-priority data.
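Tying these behaviors together, the following Modular QoS CLI (MQC) sketch shows how such a policy might look on a WAN edge router. The class names, DSCP matches, and percentages are illustrative assumptions, not values prescribed by this chapter:

```
class-map match-any VOICE
 match ip dscp ef                ! Class 5: real-time voice bearer
class-map match-any TRANSACTIONAL
 match ip dscp af21 af22         ! example guaranteed-bandwidth class
class-map match-any SCAVENGER
 match ip dscp cs1               ! Class 1: batch/scavenger
!
policy-map WAN-EDGE
 class VOICE
  priority percent 33            ! expedited (strict-priority) queuing
 class TRANSACTIONAL
  bandwidth percent 25           ! CBWFQ minimum-bandwidth guarantee
 class SCAVENGER
  bandwidth percent 1            ! smallest practical allocation
 class class-default
  fair-queue                     ! Class 0: remainder of the bandwidth
```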
Although the standard direction is to move toward full adoption of the DSCP model, older implementations defined the classes of service and performed traffic matching and queuing based on the IPP values. Because the IPP value occupies the first 3 MSBs of the DSCP field, each IPP value covers the full range of DSCP drop precedence values (the remaining 3 bits) for its class selector. Such mechanisms are now better moved to DSCP support to gain the additional benefit of expedited forwarding/assured forwarding (EF/AF) class granularity and scaling of classes supported over time.
Ensure that the correct traffic mapping is carried out. Failing to do so may lead to classification of voice traffic to some value other than the DSCP value of 46 (EF). This may come about as a result of classification errors within the network or at the LAN edge due to incorrect CoS-to-DSCP or DSCP-to-CoS mappings, which can lead to service impact.
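On Catalyst switches, the CoS-to-DSCP map is explicit and correctable. As a hedged illustration (exact syntax varies by platform and software release), the following ensures that CoS 5 maps to DSCP 46 (EF) rather than the platform default of 40:

```
mls qos
mls qos map cos-dscp 0 8 16 24 32 46 48 56   ! eight values, one per CoS 0-7
```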
Hardware Queuing
QoS-enabled Ethernet switches provide a Layer 2 (L2) queuing mechanism that allows for ingress and egress queuing. Ingress frames arriving at the L2 switch require buffering before being scheduled onto the egress port; therefore, depending on the number of buffers available to each port, ingress frames can be dropped immediately. If strict-priority queues are not used, real-time voice traffic is not guaranteed expedited delivery. Using the priority queues, if present, for both ingress and egress traffic provides a low-latency path through the L2 device for delay-sensitive traffic. All current Cisco platforms (2950, 2970, 3550, 3750, 4500, and 6500) support the use of an internal DSCP value to determine QoS treatment; it is derived from the packet's trusted DSCP marking, a trusted CoS marking, or an explicit configuration policy.
Although no standard number of queues is provided, the port capabilities can be determined via Cisco IOS or CatOS. The information is presented separately for both transmit and receive interfaces and is represented in 1PxQyT format. 1P refers to the strict priority queue available, xQ refers to the number of input or output queues available, and yT is the number of drop or Weighted Random Early Detection (WRED) thresholds that can be configured. It is recommended that all future hardware support a minimum of 1P1Q queuing for both ingress and egress.
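The queue structure of a given port can be checked directly. For example (interface identifiers are illustrative), these commands report the transmit and receive queue capabilities in the 1PxQyT notation described above:

```
! Cisco IOS-based switches:
Switch# show interfaces gigabitethernet1/1 capabilities
! CatOS-based switches:
Console> (enable) show port capabilities 1/1
```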
Software Queuing
Prioritization and treatment of traffic are based on the defined CoSs. Where potential network congestion may occur, software queuing ensures that each class receives the appropriate forwarding priority, as well as minimum reserved bandwidth. If traffic for a class exceeds its allocated bandwidth, depending on the WAN technology, one of the following actions needs to be taken:
- Drop the excess traffic
- Forward excess traffic without changes to the original QoS information
- Forward the excess traffic with the ToS bits reset to a lower-priority CoS
QoS Mechanisms Defined
The QoS architecture introduces multiple components that form the basis of the building blocks of an end-to-end solution. First, you must understand the various capabilities that are available, as shown in Figure 5-3.
Figure 5-3 Scheduling Tools: Queuing Algorithms
These capabilities can be broken into several categories:
- Classification and marking—Packet classification features allow traffic to be partitioned into multiple priority levels, or CoSs. Packets can be classified based on the incoming interface, source or destination addresses, IP protocol type and port, application type (network-based application recognition [NBAR]), IPP or DSCP value, 802.1p priority, MPLS EXP field, and other criteria. Marking is the QoS feature component that "colors" a packet (frame) so that it can be identified and distinguished from other packets (frames) in QoS treatment. Policies can then be associated with these classes to perform traffic shaping, rate-limiting/policing, priority transmission, and other operations to achieve the desired end-to-end QoS for the particular application or class. Figure 5-2 showed an overview of classification for CoS, ToS, and DSCP. A brief configuration sketch follows.
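As a minimal MQC sketch of classification and marking (the class name, NBAR protocol, and DSCP value here are assumptions chosen for illustration), a policy might classify call signaling via NBAR and color it CS3:

```
class-map match-any CALL-SIGNALING
 match protocol h323                 ! NBAR application recognition
!
policy-map MARK-INGRESS
 class CALL-SIGNALING
  set ip dscp cs3                    ! "color" the packet for downstream QoS
```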
- Congestion management—Congestion-management features control congestion after it occurs.
Queuing algorithms are used to sort the traffic and then determine some method of prioritizing it onto an output link. Congestion-management techniques include Weighted Fair Queuing (WFQ), CBWFQ, and low-latency queuing (LLQ):
- WFQ is a flow-based queuing algorithm that does two things simultaneously: It schedules interactive traffic to the front of the queue to reduce response time, and it fairly shares the remaining bandwidth between high-bandwidth flows.
- CBWFQ guarantees bandwidth to data applications.
- LLQ is used for the highest-priority traffic, which is especially suited for voice over IP (VoIP).
- Congestion avoidance—Congestion-avoidance techniques monitor network traffic loads in an effort to anticipate and avoid congestion at common network and internetwork bottlenecks before it becomes a problem.
As shown in Figure 5-4, the WRED algorithm avoids congestion and controls latency at a coarse level by establishing control over buffer depths on both low- and high-speed data links. WRED is primarily designed to work with TCP applications. When WRED is used and the TCP source detects the dropped packet, the source slows its transmission. WRED can selectively discard lower-priority traffic when the interface begins to get congested.
Figure 5-4 Congestion Avoidance: DSCP-Based WRED
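A hedged MQC sketch of DSCP-based WRED follows; the class name and thresholds are illustrative, and exact syntax varies by Cisco IOS release. Here AF22 is discarded earlier than AF21 as the queue fills:

```
policy-map WAN-EDGE
 class TRANSACTIONAL
  bandwidth percent 25
  random-detect dscp-based            ! drop profile keyed on DSCP
  random-detect dscp af21 32 40 10    ! min threshold, max threshold, mark probability
  random-detect dscp af22 24 40 10    ! lower minimum = earlier discard
```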
- Traffic conditioning—Traffic entering a network can be conditioned (operated on for QoS purposes) by using a policer or shaper.
Traffic shaping involves smoothing traffic to a specified rate through the use of buffers. A policer, on the other hand, does not smooth or buffer traffic; it simply re-marks (IPP/DSCP), transmits, or drops the packets, depending on the configured policy. Legacy tools such as committed access rate (CAR) let network operators define bandwidth limits and specify actions to perform when traffic conforms to, exceeds, or completely violates the rate limits. Generic traffic shaping (GTS) provides a mechanism to control traffic by buffering it and transmitting it at a specified rate. Frame Relay traffic shaping (FRTS) provides mechanisms for shaping traffic based on Frame Relay service parameters such as the committed information rate (CIR) and the backward explicit congestion notification (BECN) provided by the Frame Relay switch. A configuration sketch of shaping and policing follows Figure 5-5.
Policers and shapers are the oldest forms of QoS mechanisms. These tools have the same objectives—to identify and respond to traffic violations. Policers and shapers usually identify traffic violations in an identical manner; however, their main difference is the manner in which they respond to violations:
- A policer typically drops traffic.
- A shaper typically delays excess traffic using a buffer to hold packets and shape the flow when the source's data rate is higher than expected.
The principal drawback of strict traffic policing is that TCP retransmits dropped packets and throttles flows up and down until all the data is sent (or the connection times out). Such TCP ramping behavior results in inefficient use of bandwidth, both overutilizing and underutilizing the WAN links.
Because shaping (usually) delays packets rather than dropping them, it smooths flows and allows more efficient use of expensive WAN bandwidth. Therefore, shaping is more suitable than policing in the WAN. Figure 5-5 demonstrates the need for policers and shapers.
Figure 5-5 Provisioning Tools: Policers and Shapers
- This is especially the case with nonbroadcast multiaccess (NBMA) WAN media, such as Frame Relay and ATM, where physical access speeds can vary between two endpoints.
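The following MQC sketch contrasts the two conditioners (rates, burst sizes, and class names are illustrative assumptions): the shaper buffers class-default traffic down to 768 kbps, while the policer transmits conforming scavenger traffic and drops the excess:

```
policy-map SHAPE-WAN
 class class-default
  shape average 768000                ! smooth traffic to 768 kbps via buffering
!
policy-map POLICE-EDGE
 class SCAVENGER
  police 128000 8000 conform-action transmit exceed-action drop
  ! alternative: exceed-action set-dscp-transmit cs1 (re-mark instead of drop)
```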
- Link efficiency mechanisms—Two link efficiency mechanisms work in conjunction with other QoS features to maximize bandwidth utilization.
Newer multimedia application traffic, such as packetized audio and video, is carried in Real-Time Transport Protocol (RTP) packets. Cisco IOS Software can save link bandwidth by compressing the RTP header (Compressed Real-Time Protocol [cRTP]), as shown in Figure 5-6.
VoIP packets are relatively small; a G.729 voice packet, for example, carries an approximately 20-byte payload. However, the IP plus User Datagram Protocol (UDP) plus RTP headers total 40 bytes (uncompressed), which can therefore account for nearly two-thirds of the entire packet. A solution is to use the Van Jacobson algorithm to compress the headers for VoIP, reducing the header size from 40 bytes to less than 5 bytes.
Figure 5-6 IP RTP Header Compression
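Enabling cRTP is a per-link setting. A minimal sketch, assuming a PPP serial WAN link (interface numbering is illustrative):

```
interface Serial0/0
 encapsulation ppp
 ip rtp header-compression            ! compress the 40-byte IP/UDP/RTP headers
```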
A data frame can be sent to the physical wire only at the interface's serialization rate; the serialization delay is the frame's size divided by the interface's clocking speed. For example, a 1500-byte frame takes approximately 214 ms to serialize on a 56-kbps circuit (1500 bytes × 8 bits ÷ 56,000 bps ≈ 0.214 s).
If a delay-sensitive voice packet sits behind a large data packet in the egress interface queue, the end-to-end delay budget of 150 ms could be exceeded. Refer to ITU-T G.114, which characterizes one-way delays within this budget as leaving "most users satisfied." Additionally, even a relatively small frame can adversely affect overall voice quality by simply increasing the jitter to a value greater than the size of the adaptive jitter buffer at the receiver.
Link Fragmentation and Interleaving (LFI) tools fragment large data frames into regular-sized pieces and interleave voice frames into the flow so that the end-to-end delay can be accurately predicted. This places bounds on jitter by preventing voice traffic from being delayed behind large data frames. To decrease latency and jitter for interactive traffic, LFI breaks up large datagrams and interleaves delay-sensitive interactive traffic with the resulting smaller packets, as shown in Figure 5-7.
Figure 5-7 LFI
A maximum serialization delay of 10 ms is the recommended target for setting fragmentation size. This provides headroom on a per-hop basis within the end-to-end latency budget required by voice. Two tools are available for LFI: multilink PPP (MLP) LFI (for point-to-point links) and FRF.12 (for Frame Relay links).
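A minimal MLP LFI sketch follows (addressing and interface numbers are illustrative, and the fragment-delay syntax varies slightly across Cisco IOS releases). The router derives the fragment size from the configured delay and the link speed:

```
interface Multilink1
 ip address 192.168.1.1 255.255.255.252   ! illustrative addressing
 ppp multilink
 ppp multilink interleave                 ! interleave voice between fragments
 ppp multilink fragment delay 10          ! target 10-ms serialization per fragment
!
interface Serial0/0
 encapsulation ppp
 ppp multilink
 ppp multilink group 1                    ! bind the link to Multilink1
```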
While reviewing these capabilities, it is important to keep in mind that the LLQ is in effect a first-in, first-out (FIFO) queue. The amount of bandwidth reserved for the LLQ is variable, yet if the LLQ is overprovisioned, the overall effect is a dampening of QoS functionality. This is because the scheduling algorithm that decides how packets exit the device is predominantly FIFO. Overprovisioning the LLQ defeats the purpose of enabling QoS. For this reason, it is recommended that you not provision more than 33 percent of the link's capacity as LLQ.
The 33 percent limit for all LLQs is a design guideline recommendation only. There may be cases where specific business needs cannot be met while holding to this recommendation. In such cases, the enterprise must provision queuing according to its specific requirements and constraints.
To avoid bandwidth starvation of background applications, such as network management services and best-effort traffic types, it is recommended that you not provision total bandwidth guarantees to exceed 75 percent of the link's capacity. This is a general rule rather than an absolute, because it depends on the size of the link employed for the connection: on larger links, such as E3/DS3 and above, the 75 percent rule is less relevant. Figure 5-8 provides an overview of the recommendations for WAN egress design scheduling.
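On classic (pre-HQF) Cisco IOS releases, this ceiling is enforced by an interface command whose default is already 75 percent; it is shown here only to make the rule explicit (interface numbering is illustrative):

```
interface Serial0/0
 max-reserved-bandwidth 75        ! default; caps total queuing bandwidth guarantees
 service-policy output WAN-EDGE
```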
Figure 5-8 WAN Scheduling Design Principles
Figure 5-9 blends this all together by demonstrating the relevant tools and how they apply in the context of the network. You will explore how and where these should be applied in the case study later in this chapter.
Figure 5-9 QoS Tools Mapping
From the tools mapping, you can see that you have various options to use in terms of campus access, distribution, and WAN aggregation, with capabilities that extend through the service provider MPLS VPN and into the branch network.
Pulling It Together: Build the Trust
As discussed earlier, there are many places in the network in which the application of QoS, either marking or classification, occurs. In this section, you will pull this together in some configurations, starting with the trust boundary principle of marking nearest to the endpoints on the network.
To apply this, you need to understand the edge marking mechanisms that can be applied. In this case, you will use the notion of trust boundaries. The concept of trust is important and integral to implementing QoS. As soon as the end devices have set a CoS or ToS value, the switch can either trust it or not. If the device at the edge (in our case, a switch) trusts the settings, it does not need to do any reclassification. If it does not trust the settings, it must perform reclassification for the appropriate QoS.
The notion of trusting or not trusting forms the basis of the trust boundary. Ideally, classification should be done as close to the source as possible. If the end device can perform this function, the trust boundary for the network is at the access layer. This depends on the capabilities of the switch in the access layer. If the switch can reclassify the packets, the trust boundary remains in the access layer. If the switch cannot perform this function, the task falls to other devices in the network going toward the backbone.
In this case, the rule of thumb is to perform reclassification at the distribution layer. This means that the trust boundary has shifted to the distribution layer. It is more than likely that there is a high-end switch in the distribution layer with features to support this function. If possible, try to avoid performing this function in the core of the network.
Frames and packets can be marked as important by using Layer 2 CoS settings in the User Priority bits of the 802.1p portion of the 802.1Q header or the IPP/DSCP bits in the ToS byte of the IPv4 header.
Figure 5-10 gives an overview of the trust boundary states that can be applicable when establishing this capability:
- A device is trusted if it correctly classifies packets.
- For scalability, classification should be done as close to the edge as possible.
- The outermost trusted devices represent the trust boundary.
- (1) and (2) are optimal; (3) is acceptable (if the access switch cannot perform classification).
Figure 5-10 Establishing the Trust Boundary
For example, suppose you have a LAN edge configured so that voice traffic sits in an auxiliary virtual LAN (VLAN) and standard desktop PC traffic sits in a data VLAN. In this case, you can establish a policy of trust on the auxiliary VLAN, where the voice endpoints are connected, and no trust for the data VLAN. This follows a fundamental design principle of the Differentiated Services model: classify and mark packets as close to the source as possible. To keep users from marking their own traffic, a trust boundary needs to be enforced, which should likewise be as close to the source as possible. A configuration sketch follows.
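Pulling the trust principle into configuration, the following Catalyst access-port sketch (VLAN IDs and the interface are illustrative; command availability varies by platform) trusts markings only when a Cisco IP phone is detected and has the phone re-mark attached-PC traffic to CoS 0:

```
interface FastEthernet0/5
 switchport access vlan 10            ! data VLAN: untrusted
 switchport voice vlan 110            ! auxiliary VLAN for voice endpoints
 mls qos trust cos                    ! accept CoS markings...
 mls qos trust device cisco-phone     ! ...but only if a Cisco phone is attached
 switchport priority extend cos 0     ! phone re-marks PC traffic to CoS 0
```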