Overcoming Link Capacity Limitations
TCP optimization and advanced TCP implementations are useful to help ensure that network resources are adequately and efficiently utilized. However, the WAN can still be a performance bottleneck, especially given the massive bandwidth disparity that exists between the LAN and the WAN. The most common response to this bandwidth disparity is simply to add more bandwidth.
There is a severe disparity in available bandwidth capacity between the LAN and the WAN, and a growing disparity in the cost per bit of service between the two. Yet another disparity exists between the substantial latency of WAN connections and the minimal latency found on a LAN. Put simply, the cost per bit/second of low-latency LAN bandwidth is decreasing at a faster rate than the cost per bit/second of high-latency WAN bandwidth, making it dramatically more expensive to upgrade WAN capacity than it is to upgrade LAN capacity. This problem is exacerbated by the additional service and capital expenditures that accompany a WAN upgrade.
With this in mind, many organizations are looking to accelerator solutions to bridge the divide between WAN bandwidth and performance. Most are finding that implementing accelerator solutions (which address more than just bandwidth concerns) costs far less from a total cost of ownership (TCO) perspective and provides a substantial return on investment (ROI), because accelerators enable infrastructure to be consolidated while improving productivity and collaboration.
Compression, which minimizes the amount of bandwidth consumed by data in transit over a network, is not a new technology. However, advanced compression algorithms and implementations have emerged that are far more effective and efficient than the simple compression algorithms that have been available for years.
This section examines compression at a high level, including the difference between per-packet implementations and flow-based compression implementations, and then provides a more detailed overview of advanced compression techniques such as data suppression. Advanced compression techniques not only compress data in flight but also reduce the frequency of the transfer of redundant data patterns. Most accelerator devices today provide some form of compression and, when coupled with TCP optimization and application acceleration, not only can improve application performance significantly, but can do so while minimizing the amount of network bandwidth required.
Accelerators and Compression
The key challenge for lower-bandwidth networks (or any WAN where there is a high amount of contention for available bandwidth) is that the device managing bandwidth disparity (that is, the router) has to negotiate traffic from a very high-speed network onto a very low-speed network. In terms of how TCP interacts with this disparity, data is sent at the rate determined by the available cwnd, and when a bandwidth disparity is present, the ingress interface queues on the router begin to fill, causing delay to a point where loss is encountered. The detection of this loss results in a decrease of the cwnd, and the transmitting node, while in congestion avoidance, attempts to stabilize at a rate that is conducive to transmission without (or nearly without) loss.
In effect, the overloaded queue (or loss in the WAN, or anything that causes loss or delay of acknowledgment) helps stimulate the transmitting node into finding the right level at which to send data. When the router receives packets into its queues, it drains the ingress queues at a rate determined by the available next-hop network capacity and any applicable QoS configuration.
The problem begins to unfold as the client is able to transmit data at a far faster pace than the router is able to forward onto the lower-bandwidth network. This results in a trickle effect whereby only a small amount of data is able to traverse the lower-bandwidth network at a given time, and ultimately the transmitting node will adjust its throughput characteristics to match this rate based on TCP congestion avoidance. This trickle of data is routed through the network and may encounter several other points of congestion and loss along the way, all of which can directly impact the transmission rate of the sender.
When the data is finally placed on the LAN at the distant location, the receiving node ultimately has the impression that it is dealing with a very slow sender. This is directly caused by the bandwidth disparity found in the network. Conversely, in the return direction, when the server is responding to the client, the same issues are encountered, thus leading both nodes to believe that the other is slow to transmit.
Figure 6-24 shows the bandwidth disparity between the LAN and the WAN and how TCP rate control stabilizes transmitter throughput to the minimum available end-to-end network capacity.
Figure 6-24 TCP Rate Control and Bandwidth Disparity
With accelerators in place and providing compression, bandwidth disparity is largely mitigated. In effect, through compression the accelerator is able to shrink the amount of data being sent across the network to minimize bandwidth consumption, and decompress the data to its original state on the other end of the network. This not only results in bandwidth savings over the WAN, but also helps to ensure that a larger amount of data can be found in the LAN. From the perspective of the client and the server, less congestion is encountered, thereby allowing them to send at higher rates, and improving the performance of the applications in use. Figure 6-25 shows the same implementation leveraging accelerators and how less congestion is encountered.
Figure 6-25 TCP Rate Control and Bandwidth Disparity with Accelerators
While numerous compression algorithms exist, this book focuses on the two types that are used most frequently in accelerator platforms: traditional data compression and data suppression. Furthermore, this chapter focuses on lossless compression, because enterprise application data cannot tolerate the loss of fidelity that is acceptable when, for example, viewing images.
Traditional Data Compression
Traditional data compression is defined as the process of encoding data using fewer bits than an unencoded representation would use. This is enabled by having two devices exchanging information using a common scheme for encoding and decoding data. For instance, a person can use a popular compression utility to minimize the amount of disk space a file consumes (encoding), which in turn minimizes the amount of bandwidth consumed and time taken to e-mail that same file to another person. The person who receives the e-mail with that compressed file needs to use a utility that understands the compression algorithm to decode the data to its original form. This again relies on the assumption that the sender and receiver have explicit knowledge about how to encode and decode the data in the same way.
Data compression can be applied in the network between accelerators assuming that both accelerators are designed to be compatible with one another and have a fundamental understanding of the compression algorithms employed. Most accelerators implement a well-known type of compression algorithm, such as Lempel-Ziv (or one of its variants such as LZ77 or LZ78) or DEFLATE.
Many algorithms achieve compression by replacing sections of data found within the object with small references that map to a data table containing an index of data encountered within the object and the references generated for that data. Such algorithms commonly use a sliding window with a fixed, power-of-two window size (such as 1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, or beyond) to scan through the object, identifying data patterns within the window and replacing them with signatures that point back to the index table. In this way, compression is limited by the capacity of the sliding window, the granularity of the patterns that are identified, and the data table structures that maintain the mapping between signatures and previously seen data. The scope of data within which such redundancy can be identified is called the compression domain.
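To make this concrete, the following Python sketch compresses a small, repetitive byte string with DEFLATE (via the standard zlib module), one of the algorithms mentioned above. The sample data and the resulting ratio are illustrative only; the key point is that the compressor can exploit only the redundancy found within the single object (and window) handed to it.

```python
# A minimal sketch of traditional, per-object lossless compression using
# DEFLATE via Python's zlib module; the sample data is illustrative only.
import zlib

original = b"GET /catalog/item?id=42 HTTP/1.1\r\nHost: example.com\r\n" * 50

# Encode: repeated byte patterns within the sliding window are replaced
# with short back-references.
compressed = zlib.compress(original, 9)

# Decode: the receiver must use the same algorithm to restore the data.
restored = zlib.decompress(compressed)
assert restored == original

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes, "
      f"ratio: {len(original) / len(compressed):.1f}:1")
```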
Data Suppression
Data suppression operates in a similar fashion to standard data compression. Unlike data compression, however, data suppression is designed to leverage a massive compression domain that extends beyond the object, packet, or session being examined. In a sense, data compression is limited to the object, packet, or session being transmitted, whereas data suppression can leverage a larger, longer-lived compression history that is specific to previous patterns seen between two accelerator devices.
Data suppression assumes that each accelerator has a storage repository that is used as a compression history. Data suppression requires that this storage repository, sometimes known as a "codebook" or "dictionary," be loosely synchronized between two accelerators that wish to provide data suppression capabilities. Some accelerators implement data suppression using memory as a storage repository for compression history. Memory-only data suppression provides high-performance I/O for compressing and decompressing data, but has a limited capacity in terms of compression history. Some accelerators use hard disk drives instead of memory, a technique that has lower I/O performance than memory-only data suppression, but has a much larger compression history capacity. Many of today's accelerators leverage a combination of both disk and memory to achieve high performance without compromising on the length of compression history for their data suppression algorithms.
Figure 6-26 shows an example of two accelerators, each deployed in separate locations, with a synchronized compression history.
Figure 6-26 Data Suppression and Synchronized Compression History
Much like data compression, most data suppression algorithms use a sliding window to identify previously seen data patterns within packets or connections. Each unique data pattern that is identified is assigned a unique signature, which is a small representation of the original data, or a reference to the entry in the compression library where the original data resides. Signatures are typically only a few bytes in size. Given that a relatively large amount of data can be referenced by a signature of only a few bytes, it is common for data suppression to achieve over 1000:1 compression for a fully redundant data segment and 50:1 or better compression for an entire flow.
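The following Python sketch illustrates the basic mechanics of data suppression under some simplifying assumptions: fixed-size 256-byte chunks, an 8-byte SHA-256-based signature per chunk, and an in-memory dictionary standing in for the compression history. Real accelerators use their own chunk sizes, signature schemes, and storage structures.

```python
import hashlib

CHUNK = 256  # illustrative chunk size in bytes

def signature(chunk: bytes) -> bytes:
    """A few-byte representation of a (potentially much larger) data pattern."""
    return hashlib.sha256(chunk).digest()[:8]

library = {}  # signature -> data pattern (the local compression history)

def suppress(data: bytes) -> list:
    """Replace previously seen chunks with their signatures."""
    encoded = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        sig = signature(chunk)
        if sig in library:
            encoded.append(("ref", sig))          # redundant: send the signature only
        else:
            library[sig] = chunk
            encoded.append(("new", sig, chunk))   # nonredundant: send signature plus data
    return encoded

data = b"".join(bytes([i]) * CHUNK for i in range(16))  # 16 distinct 256-byte patterns
first = suppress(data)   # first pass: every pattern is new, so the data travels with it
second = suppress(data)  # second pass: fully redundant, collapses to 8-byte references
print(len([e for e in second if e[0] == "ref"]), "chunks replaced by 8-byte signatures")
```

On the second, fully redundant pass, 4 KB of data is represented by sixteen 8-byte signatures; larger chunks and the hierarchical signatures described next push the ratio much higher for fully redundant data.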
Many of the data suppression implementations available in accelerators today do not use a simple single-level data pattern identification process; rather, most use a hierarchical means of identifying data patterns. A hierarchical data pattern identification process analyzes a block of data at multiple layers. With hierarchical data pattern recognition, the accelerator is able to not only create signatures for the most basic data patterns, but also formulate additional signatures that refer to a large number of data patterns or signatures. In this way, if a greater degree of redundancy is identified within the data patterns, hierarchical data pattern recognition allows a smaller number of signatures to be used to refer to the original data being transmitted. If hierarchical data pattern recognition is not being used, one signature is required for each repeated data pattern, which may prove less efficient.
Hierarchical data pattern matching can yield far higher levels of data suppression (compression) than data pattern matching functions that are nonhierarchical. Figure 6-27 shows the difference between hierarchical and nonhierarchical data pattern recognition algorithms.
Figure 6-27 Nonhierarchical and Hierarchical Data Pattern Recognition
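A simplified sketch of the hierarchical idea follows: a second-level signature is computed over a run of first-level chunk signatures, so a long, fully redundant region can later be referenced with a single signature rather than one signature per chunk. The two-level structure and the sizes used here are illustrative assumptions.

```python
import hashlib

def sig(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:8]

data = b"The quick brown fox jumps over the lazy dog. " * 64
chunks = [data[i:i + 256] for i in range(0, len(data), 256)]

level1 = [sig(c) for c in chunks]       # one signature per basic data pattern
level2 = sig(b"".join(level1))          # one signature covering the entire run
library = {level2: level1}              # the level-2 entry refers to the level-1 signatures

# Nonhierarchical: one signature per repeated chunk must still be transmitted.
# Hierarchical: a single level-2 signature can stand in for all of them.
print(f"{len(level1)} level-1 signatures vs. 1 level-2 signature")
```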
During the process of data pattern matching, a signature is generated for each identified data pattern being transmitted; in the case of hierarchical data pattern recognition, signatures may also be generated that identify a large number of data patterns or signatures. Once the signatures have been generated, the accelerator performs a pattern-matching function whereby it compares the signatures of the identified patterns (generally starting with the largest data patterns when hierarchical data pattern matching is employed) with the contents of the local compression library. This local compression library is a database of signatures and data patterns that have been previously recognized by the two peering accelerators, and it can be built dynamically (as data is handled during the normal course of network operation) or proactively (using a cache-warming or preposition type of feature before the first user transmission is ever received).
As repeated data patterns are identified, they are removed from the message, and only the signature remains. This signature is an instruction for the distant device, notifying it of what data to retrieve from its compression history to replace the original data pattern back into the flow.
For data patterns that are nonredundant (no entry exists in the compression library for the signature), the new signature and data pattern are added to the local compression library. In this case, the newly generated signatures and data pattern are sent across the network to notify the distant accelerator that it needs to update its compression library with this new information.
Figure 6-28 shows the top-down data pattern matching function applied by an encoding accelerator when leveraging data suppression.
Figure 6-28 Data Pattern Matching
This entire process of suppressing data into a redundancy-eliminated message is defined as encoding. When peering accelerators are able to perform data suppression, the only data that needs to traverse the network is the encoded message, which includes signatures to previously seen data segments and data segments (with newly generated signatures) for data that has not been seen before, or otherwise does not exist in the local compression library.
Additionally, accelerators commonly implement data integrity protection mechanisms to ensure that the recipient accelerator rebuilds an encoded message without compromising the integrity of the original message. This commonly involves a hash calculation over the original message, called a message validity signature, performed immediately before encoding begins. In most cases, the hash calculation used to generate the message validity signature includes accelerator-specific keys or other means of minimizing the possibility of hash collisions. Figure 6-29 shows an example of what an encoded message sent between accelerators may look like.
Figure 6-29 Data Suppression Encoded Message
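The following sketch ties the encoding steps together: the message is chunked, previously seen patterns are replaced by signatures, new patterns are appended to the library (and travel with their signatures), and a keyed message validity signature is computed over the original message. Fixed-size chunks, a single (nonhierarchical) signature level, an HMAC for the validity signature, and the key name are all illustrative assumptions rather than any vendor's actual format.

```python
import hashlib
import hmac

CHUNK = 256
SHARED_KEY = b"accelerator-pair-key"   # hypothetical peer-specific key material

def sig(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()[:8]

def encode(message: bytes, library: dict) -> dict:
    """Suppress redundancy in the message, updating the compression library in place."""
    validity = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    elements = []
    for i in range(0, len(message), CHUNK):
        chunk = message[i:i + CHUNK]
        s = sig(chunk)
        if s in library:
            elements.append(("ref", s))           # previously seen: signature only
        else:
            library[s] = chunk
            elements.append(("new", s, chunk))    # first sighting: signature plus data
    return {"validity": validity, "elements": elements}
```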
When an encoded message is received, the message validity signature attached to the encoded message is stripped and placed to the side. The remaining message contains signatures without data attached (for redundant data patterns that should exist in the compression library) and signatures with data attached (nonredundant data), which the accelerator should add to its compression library.
The receiving accelerator then begins the process of decoding, whereby the original message is rebuilt. This is done by replacing each signature sent without attached data with the data pattern identified by that signature in the local compression context. Any data pattern that arrives with an accompanying signature is added to the local compression library, and the signature is stripped from the encoded message. If any signature is identified that was generated through hierarchical pattern matching, each of the data patterns referenced by that signature is extracted from the local compression library and put in place of the signature in the encoded message.
Once the accelerator has fully decoded the encoded message, it generates a new message validity signature, using the same hash calculation and the same parameters (such as accelerator-specific keys or other factors), and compares the newly generated message validity signature with the message validity signature that was supplied in the encoded message by the encoding accelerator. If the two match, the decoding accelerator knows that the message was rebuilt with data validity and integrity intact, and the decoded message can then be forwarded to its destination. If any signature referenced in the encoded message does not appear in the local compression library, the decoding accelerator can send a nonacknowledgment message to the encoder, forcing a retransmission of the data associated with the signature that is missing from the decoding accelerator's compression history.
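Paired with the encoding sketch shown earlier, the following Python sketch shows the decoding side: the message is rebuilt from the encoded elements, new patterns are learned, and the keyed message validity signature is verified before the message would be forwarded. The exceptions stand in for the nonacknowledgment and discard behaviors described above; the format and key handling remain illustrative assumptions.

```python
import hashlib
import hmac

SHARED_KEY = b"accelerator-pair-key"   # must match the key used by the encoding peer

def decode(encoded: dict, library: dict) -> bytes:
    """Rebuild the original message and verify its integrity before forwarding."""
    parts = []
    for element in encoded["elements"]:
        if element[0] == "ref":
            _, s = element
            if s not in library:
                # Stands in for the nonacknowledgment sent back to the encoder
                raise LookupError("signature missing from compression history; retransmit")
            parts.append(library[s])
        else:
            _, s, chunk = element
            library[s] = chunk        # learn the new pattern for future messages
            parts.append(chunk)
    message = b"".join(parts)
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, encoded["validity"]):
        raise ValueError("message validity signature mismatch; do not forward")
    return message
```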
Data suppression also works closely with traditional data compression in accelerators. The signatures and message validity signatures generated by the data suppression function are commonly compressible, meaning that additional layers of compression can be found above and beyond the level of compression provided by data suppression on its own. Data suppression compression libraries are generally peer-specific, and some implementations can share libraries across peers. Given that the compression library for data suppression is far larger than it is for data compression and is not tied to a packet, session, or object, it can be far more effective at freeing up available WAN capacity, and can generally be efficient in reducing redundancy even across multiple applications.
Because data suppression is commonly implemented as a WAN optimization component at the network or transport layer, it does not discriminate among applications (for example, downloading an object from a website could populate the compression library and thereby provide significant compression for an e-mail upload of that same, or a modified, object). Furthermore, a data pattern that is not deemed to be a repeated pattern can still be passed through the traditional data compression function, which means that even a data pattern being seen for the first time may be compressible, so less bandwidth is used in those scenarios as well.
When deploying accelerators, a best practice is to ensure that enough storage capacity is allocated to provide at least one week's worth of compression history. For instance, if a branch office location is connected to the corporate network via a T1 line (1.544 Mbps), the amount of disk capacity necessary to support that location from a compression history perspective can be calculated. If the T1 line were utilized at a rate of 75 percent for 50 percent of each day, the raw compression history requirement for one week would equate to the following:
- Convert 1.544 Mbps to bytes (1.544 Mbps is approximately 192 kBps)
- 192 kBps x 60 = 11.52 MB/minute
- 11.52 MB/minute x 60 = 691.2 MB/hour
- 691.2 MB/hour x 24 = 16.59 GB/day
- 16.59 GB/day x .75 x .50 = 6.2 GB/day
- 6.2 GB/day x 7 days = 43.5 GB/week
Assuming that the data traversing the link from the branch office back to the corporate network is 75 percent redundant (which is conservative in most branch office scenarios), this equates to a requirement of around 10.9 GB of compression history to support one week at this particular branch office location. One week of compression history generally provides enough compression data to optimize not only immediate, interactive user access, but also scenarios where a user begins interacting with a working set of data that has not been touched in up to a week. This also provides sufficient coverage for multiuser scenarios, such as when everyone in a remote office receives an e-mail with the same attachment, even if the transmission of that e-mail happens days later.
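The same arithmetic can be expressed as a short calculation; the link rate, utilization, duty cycle, and redundancy figures are simply the assumptions stated above, not measurements.

```python
# Back-of-the-envelope compression history sizing for the T1 example above.
bytes_per_sec = 192_000                              # 1.544 Mbps / 8, as approximated above
line_rate_per_day = bytes_per_sec * 3600 * 24        # ~16.59 GB/day at full line rate
traffic_per_day = line_rate_per_day * 0.75 * 0.50    # 75% utilized for 50% of each day
traffic_per_week = traffic_per_day * 7               # ~43.5 GB/week of WAN traffic
new_per_week = traffic_per_week * (1 - 0.75)         # 75% redundant -> new patterns only
print(f"{traffic_per_week / 1e9:.1f} GB/week of traffic, "
      f"{new_per_week / 1e9:.1f} GB/week of new compression history")
```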
Accelerator Compression Architectures
Most accelerators implement both data suppression and traditional data compression functions. This section examines two commonly found implementations: per-packet compression and session-based compression. This section also discusses directionality as it relates to the accelerator's compression library.
Per-Packet Compression
Per-packet compression treats each packet as a compression domain. This means that each packet is analyzed as an autonomous object and compressed as an autonomous object. With such an architecture, each packet that is received by an accelerator must be handled individually.
Given that packet sizes are largely limited by the data link layer being traversed, it is safe to assume that the compression domain for per-packet compression is generally 200 to 1500 bytes (but can certainly be larger or smaller). Given this limitation, most accelerators that implement per-packet compression also implement some form of data suppression to increase the compression history, which allows larger repeatable sequences to be identified, thus providing better compression and higher degrees of bandwidth savings. However, because the accelerator processes each packet individually, it does not have the opportunity to examine a larger amount of data, precluding it from providing higher levels of compression. Per-packet processing also creates sensitivities to data changes, which are more difficult to detect and isolate when the window of data being reduced is limited to the size of a packet.
Figure 6-30 shows how per-packet compression would be applied to a fully redundant transmission of the same set of packets. The process shown in the figure follows:
- A stream of data is sent as a series of packets.
- The accelerator compares the contents of each packet against the compression history.
- Redundancy is eliminated and redundant segments are replaced by signatures.
- The compressed packet is sent across the network and intercepted by the distant accelerator.
- The distant accelerator compares the contents of the encoded packet with the compression history.
- The distant accelerator replaces signatures with data patterns from the compression history.
- The distant accelerator repacketizes and reframes the original data and forwards it to the intended destination.
Figure 6-30 Per-Packet Compression with Large Shared Compression History
The challenge with per-packet compression lies in the fact that repeated data patterns are identified relative to their position within the packet being received. Such forms of compression can therefore struggle when compressing packets that carry a previously seen set of data that has since undergone slight modification. After changes have been applied, the repeated patterns are no longer located at the same positions within the subsequently received packets as they were when the original data was transmitted. In this way, the data no longer has the same locality to the packet boundaries as it did before the changes were applied.
In these situations, the compression history must be augmented with the new data on both devices, which means the first transmission of the changed data can be largely considered nonredundant and, therefore, only minimal amounts of compression can be applied. The compression history from the original transfer of the data is mostly unusable in this case, which leads to excessive and inefficient use of memory and disk capacity for compression history on both accelerator devices to store the new data and the original patterns.
Figure 6-31 highlights the challenges of per-packet compression when data has been changed, causing a large number of new compression library entries to be created and thereby producing minimal compression benefits. This example shows the impact of adding the letter a to the beginning of the sentence used in Figure 6-30. Notice that a large number of new entries need to be added to the compression history, and little to no bandwidth savings are realized.
Figure 6-31 Per-Packet Compression Challenges with Data Locality
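The data-locality problem can be demonstrated with a short sketch: when chunk (or packet) boundaries are fixed, inserting a single byte at the front of a stream shifts every subsequent boundary, so signatures computed over those fixed windows no longer match the compression history. The chunk size and sample data are illustrative.

```python
import hashlib

CHUNK = 64  # stands in for a fixed, packet-sized compression domain

def chunk_signatures(data: bytes) -> set:
    return {hashlib.sha256(data[i:i + CHUNK]).digest()[:8]
            for i in range(0, len(data), CHUNK)}

original = "".join(f"record {i:04d}: quick brown fox;" for i in range(64)).encode()
modified = b"a" + original   # one byte inserted at the beginning

history = chunk_signatures(original)
matches = history & chunk_signatures(modified)
print(f"{len(matches)} of {len(chunk_signatures(modified))} chunk signatures still match")
# With fixed boundaries the overlap drops to zero, even though the modified
# stream is almost entirely redundant with the original.
```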
Session-Based Compression
An alternative to per-packet compression, which generally provides higher levels of compression even under significant amounts of data change, is session-based compression, also known as persistent compression. Session-based compression provides two key functions:
- The ability to leverage a compression history that extends through the life of the entire TCP connection when employing traditional data compression algorithms such as LZ or DEFLATE
- The ability to extend the compression domain for data suppression functionality by leveraging a TCP proxy as an intermediary data buffer
Extending the compression library to span the history of the TCP connection provides better overall compression because the history that can be leveraged is far greater than and not limited to the packet size.
Using the TCP proxy as an intermediary data buffer allows the accelerator to temporarily buffer a large amount of data that can then be delivered en masse to the data suppression function, which allows it to span boundaries of multiple packets when identifying redundancy. By disconnecting the data suppression function and the data from the packet boundaries, data suppression algorithms are able to better identify redundancy and isolate changes to previously encountered byte patterns. This allows for far higher levels of compression than can be provided with per-packet compression.
Figure 6-32 shows how session-based compression with hierarchical data suppression provides location-independent compression capabilities with high levels of compression. Notice how the hierarchical compression history contains signatures that reference a series of other signatures, thereby providing higher levels of compression. In this figure, a single signature represents the entire string of text being sent.
Figure 6-32 Compression with Hierarchical Data Suppression
When the data suppression function uses a content-based means of identifying data patterns along with hierarchical pattern matching, it can accurately pinpoint where changes were inserted into the data stream. In essence, when changes are made to the data and the changed data is transmitted between two nodes, content-based pattern matching can isolate the new data so that the previously built compression library entries can still be leveraged, resulting in high levels of compression even in the face of changed data. Rather than having to update the entire compression library on both devices, the accelerators can incrementally update their compression libraries with only the new data. This means that the challenges that plague per-packet compression are nullified.
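A sketch of the content-based idea follows: chunk boundaries are derived from the data itself (here, wherever a small rolling window hashes to a chosen value), so an insertion disturbs only the chunk that contains the change and the remaining signatures still match the history. The window size, boundary mask, and hash function are illustrative choices, not any particular product's algorithm.

```python
import hashlib

WINDOW, MASK = 16, 0x3F   # boundary roughly every 64 bytes on average

def content_defined_signatures(data: bytes) -> set:
    sigs, start = set(), 0
    for i in range(WINDOW, len(data)):
        window_hash = int.from_bytes(hashlib.sha256(data[i - WINDOW:i]).digest()[:4], "big")
        if (window_hash & MASK) == 0:                        # content-defined boundary
            sigs.add(hashlib.sha256(data[start:i]).digest()[:8])
            start = i
    sigs.add(hashlib.sha256(data[start:]).digest()[:8])      # trailing chunk
    return sigs

original = "".join(f"record {i:04d}: quick brown fox;" for i in range(64)).encode()
modified = b"a" + original   # the same single-byte insertion as before

history = content_defined_signatures(original)
matches = history & content_defined_signatures(modified)
print(f"{len(matches)} of {len(content_defined_signatures(modified))} chunks still match")
# Only the chunk containing the inserted byte changes; the rest of the
# previously built signatures remain usable.
```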
Figure 6-33 shows that session-based compression with hierarchical data suppression capabilities provides far better identification of changed byte patterns than do per-packet compression algorithms, resulting in more efficient use of the compression library, bandwidth savings, and overall better performance.
Figure 6-33 Session-Based Compression with Changed Data
Directionality
Another subtle architectural implementation that you should consider is the directionality of the data suppression compression library. Directionality defines whether a single compression library is used regardless of the direction of traffic flow, called a bidirectional compression library, or whether a single compression library is used per direction of traffic flow, called a unidirectional compression library.
In the case of a bidirectional compression library, traffic flowing in one direction populates the compression library, and this same library can be used for traffic flowing in the reverse direction through the accelerators. In the case of a unidirectional compression library, traffic flowing in one direction populates one compression library, and traffic flowing in the reverse direction populates a separate compression library. With unidirectional compression libraries, there must be one library per direction of traffic flow.
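The distinction can be reduced to how the compression library is keyed, as in the following sketch: a bidirectional design keys the library by peer only, while a unidirectional design keys it by peer and direction, so data learned on the download path cannot be reused for a subsequent upload. The dictionary layout here is an illustrative assumption.

```python
bidirectional = {}    # {peer_id: {signature: data_pattern}}
unidirectional = {}   # {(peer_id, direction): {signature: data_pattern}}

def library_for(peer_id: str, direction: str, bidir: bool) -> dict:
    store = bidirectional if bidir else unidirectional
    key = peer_id if bidir else (peer_id, direction)
    return store.setdefault(key, {})

# A download populates the library in both designs...
library_for("branch-1", "download", bidir=True)["sig1"] = b"pattern"
library_for("branch-1", "download", bidir=False)["sig1"] = b"pattern"

# ...but only the bidirectional library benefits the upload of the same object.
print("bidirectional upload hit: ", "sig1" in library_for("branch-1", "upload", bidir=True))
print("unidirectional upload hit:", "sig1" in library_for("branch-1", "upload", bidir=False))
```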
Unidirectional compression libraries introduce inefficiencies and performance limitations that need to be considered. With a unidirectional compression library–based architecture, when a user downloads an object, only a repeated download of that object passes through the same library, because the traffic is flowing in the same direction. Even after an object has been downloaded, the subsequent upload (in the reverse direction) of that object passes through a completely different compression library, meaning that it will not be suppressed.
The upload of the object effectively receives no benefit from the download because two separate compression libraries are used. This not only creates the issue of having poor performance until the object has been transferred in both directions, but also means lower overall efficiency from a memory and disk utilization perspective, as the data must be stored twice. Having to store the data twice leads to a lower overall compression history. Figure 6-34 highlights the performance challenges of unidirectional data suppression libraries in scenarios where data that has been downloaded is being uploaded.
Figure 6-34 Unidirectional Data Suppression Library Challenges
With a bidirectional compression library, the problem illustrated in Figure 6-34 is not encountered. Traffic flowing in either direction leverages the same compression library, so the download of an object can be leveraged to help suppress the repeated patterns found while uploading the same object. Furthermore, repeated patterns are not stored once per direction of traffic flow, leading to better efficiency when utilizing disk and memory resources while also extending the overall effective compression history that the accelerator peers can leverage. Figure 6-35 shows how use of a bidirectional data suppression library provides compression benefits and is agnostic to the direction of traffic flow.
Figure 6-35 Bidirectional Data Suppression Library