MPEG-2 Compression
MPEG-2 video compression is the de facto standard for entertainment video. It is popular for a number of reasons:
It is an international standard [ISO/IEC IS 13818-2].
MPEG-2 places no restrictions on the video encoder implementation. This allows each encoder designer to introduce new techniques to improve compression efficiency and picture quality. Since MPEG-2 video encoders were first introduced in 1993, compression efficiency has improved by 30 to 40%, despite predictions by many that MPEG-2's fundamental theoretical limitations would prevent this.
MPEG-2 fully defines the video decoder's capability at particular levels and profiles. Many MPEG-2 chip-sets are available and will work with any main profile at main level (MP@ML) compliant MPEG-2 bit-stream from any source. Nevertheless, quality can vary significantly from one MPEG-2 video decoder to another, especially in error handling and video clip transitions.
MPEG-2 video compression is part of a larger standard that includes support for transport and timing functions.
Moreover, MPEG-2 is likely to remain as the dominant standard for entertainment video because it has been so successful in establishing an inventory of standard decoders (both in existing consumer electronics products and in the chip libraries of most large semiconductor companies). Additional momentum comes from the quantity of real-time and stored content already compressed into MPEG-2 format. Even succeeding work by the MPEG committees has been abandoned (MPEG-3) or retargeted to solve different problems (MPEG-4 and MPEG-7).
MPEG-2 is a lossy video compression method based on motion vector estimation, discrete cosine transforms, quantization, and Huffman encoding. (Lossy means that data is lost, or thrown away, during compression, so quality after decoding is less than the original picture.) Taking these techniques in order:
Motion vector estimation is used to capture much of the change between video frames, in the form of best approximations of each part of a frame as a translation (generally due to motion) of a similar-sized piece of another video frame. Essentially, there is a lot of temporal redundancy in video, which can be discarded. (The term temporal redundancy is applied to information that is repeated from one frame to another.)
Discrete cosine transform (DCT) is used to convert spatial information into frequency information. This allows the encoder to discard information corresponding to higher spatial frequencies, which is less visible to the human eye.
Quantization is applied to the DCT coefficients, either those of original frames (in some cases) or those of the residual that remains after motion estimation. It restricts the set of possible values transmitted by grouping values that are almost the same into a single level.
Huffman encoding uses short codes to describe common values and longer codes to describe rarer values; this is a type of entropy coding.
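The DCT and quantization steps described above can be illustrated with a short sketch. It is not an MPEG-2 encoder: it assumes an 8-by-8 sample block, a direct (unoptimized) DCT, and a single uniform quantizer step of 16, whereas a real encoder uses quantization matrices and a variable scale factor. The function names and the test block are hypothetical.

```python
import math

N = 8  # assumed block size for this illustration


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N-by-N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Map each coefficient to the nearest multiple of `step` (stored as a level)."""
    return [[round(value / step) for value in row] for row in coeffs]


# Hypothetical input: a smooth ramp that changes only along one axis.
block = [[16 * y for y in range(N)] for _ in range(N)]

coeffs = dct_2d(block)
levels = quantize(coeffs, step=16)  # step size is an assumption

nonzero = sum(1 for row in levels for level in row if level != 0)
print("DC level:", levels[0][0])
print("non-zero levels:", nonzero, "of", N * N)
```

For smooth picture content such as this ramp, most quantized levels are zero, which is what makes the subsequent entropy coding effective.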
The foregoing is a highly compressed summary of MPEG-2 video compression (with many details omitted). However, there are so many excellent descriptions of MPEG compression (see DTV: The Revolution in Electronic Imaging, by Jerry C. Whitaker; Digital Compression for Multimedia: Principles and Standards, by Jerry D. Gibson and others; Testing Digital Video, by Dragos Ruiu and others; and Modern Cable Television Technology; Video, Voice, and Data Communications, by Walter Ciciora and others) that more description is not justified here. Instead, the following sections concentrate on the most interesting aspects of MPEG:
- What are MPEG-2's limitations?
- What happens when MPEG-2 breaks?
- How can compression ratios be optimized to reduce transmission cost without compromising (too much) on quality?
MPEG Limitations
If MPEG-2 is so perfect, why is there any need for other compression schemes? (There are a great many alternative compression algorithms, such as wavelet, pyramid, fractal, and so on.) MPEG-2 is a good solution for coding relatively high-quality video when certain transmission requirements can be met. However, MPEG-2 coding is rarely used in Internet applications because the Internet cannot generally guarantee the quality of service (QoS) parameters required for MPEG-2-coded streams. These QoS parameters are summarized in Table 4-1.
Table 4-1 MPEG-2 QoS Parameters for Entertainment Quality Video
As you can see from the table, for entertainment-quality video, MPEG-2 typically requires a reasonably high bit rate, and this bit rate must be guaranteed. Video coding will, in general, produce a variable information rate, but MPEG-2 allows for constant bit rate (CBR) transmission facilities (for example, satellite transponders, microwave links, and fiber transmission facilities). As such, MPEG-2 encoders attempt to take advantage of every bit in the transmission link by coding extra detail during less-challenging scenes. When the going gets tough (during a car chase, for example), MPEG-2 encoders use more bits for motion and transmit less detail. Another way to think of this is that MPEG-2 encoding varies its degree of loss according to the source material. Fortunately, the human visual system tends to work in a similar way, and we pay less attention to detail when a scene contains more motion. (This is true of a car chase whether you are watching it or you are in it!)
MPEG-2 coded material is extremely sensitive to errors and lost information because of the way in which MPEG-2 puts certain vital information into a single packet. If this packet is lost or corrupted, there can be a significant impact on the decoder, causing it to drop frames or to produce very noticeable blocking artifacts. If you think of an MPEG-2 stream as a list of instructions to the decoder, you can understand why the corruption of a single instruction can play havoc with the decoded picture.
Finally, MPEG-2 is extremely sensitive to variations in transmission delay. These are not usually measurable in synchronous transmission systems (for example, satellite links) because each bit propagates through the system according to the clock rate. In packet- or cell-based networks, however, it is possible for each packet-sized group of bits to experience a different delay. MPEG-2 was designed with synchronous transmission links in mind and embeds timing information into certain packets by means of timestamps. If the timestamps experience significant jitter (or cell delay variation), it causes distortions in audio and video fidelity due to timing variations in the sample clocks (for example, color shifts due to color subcarrier phase variations).
MPEG-2 Artifacts
What are MPEG artifacts? In practice, all lossy encoders generate artifacts, or areas of unfaithful visual reproduction, all the time; if the encoder is well designed, all these artifacts will be invisible to the human eye. However, the best laid plans sometimes fail; the following are some of the more common MPEG-2 artifacts:
- If the compression ratio is too high, there are sometimes simply not enough bits to encode the video signal without significant loss. The better encoders will progressively soften the picture (by discarding some picture detail); however, poorer encoders sometimes break down and overflow an internal buffer. When this happens, all kinds of visual symptoms, from bright green blocks to dropped frames, can result. After such a breakdown, the encoder will usually recover for a short period until once again the information rate gets too high to code into the available number of bits.
- Another common artifact, sometimes visible in dark scenes or in close-ups of the face, is called contouring. As the name suggests, the image looks a little like a contour map drawn with a limited set of shades rather than a continuously varying palette. This artifact sometimes reveals the macro-block boundaries (an effect sometimes called tiling). When this happens, it is usually because the encoder allocates too few quantization levels to the scene.
- High-frequency mosquito noise will sometimes be apparent in the background. Mosquito noise is often apparent in surfaces, such as wood, plaster, and wool, that contain an almost limitless amount of detail due to their natural texture. The encoder can be overtaxed by so much detail and creates a visual effect that looks as if the walls are crawling with ants.
NOTE
Macro-blocks are areas of 16-by-16 pixels that are used by MPEG for DCT and motion-estimation purposes. See Chapter 3 of Modern Cable Television Technology; Video, Voice, and Data Communications by Walter Ciciora and others, for more details.
There are many more artifacts associated with MPEG encoding and decoding; however, a well-designed system should rarely, if ever, produce annoying visible artifacts.
MPEG-2 Operating Guidelines
To avoid visible artifacts due to encoding, transmission errors, and decoding, the entire MPEG-2 system must be carefully designed to operate within certain guidelines:
- The compression ratio cannot be pushed too high. Just where the limit is on compression ratio for given material at a certain image resolution and frame rate is a subject of intense and interminable debate. Ultimately, the decision involves engineers and artists and will vary according to encoder performance (there is some expectation of improvements in rate with time, although also some expectation of a law of diminishing returns). Table 4-2 gives some guidance based on experience.
- The transmission system must generate very few errors during the average viewing time of an event. For example, in a two-hour movie, viewers may tolerate very few significant artifacts (such as frame drop or green blocks). In practice, this means that the transmission system must employ forward error correction (FEC) techniques.
Table 4-2 MPEG-2 Resolution Versus Minimum Bit Rate Guidelines
Other Video Compression Algorithms
There are a great many alternative video compression algorithms, such as wavelet, pyramid, fractal, and so on (see Chapter 7 of Digital Compression for Multimedia: Principles and Standards by Jerry D. Gibson and others). Many have special characteristics that make them suitable for very low bit rate facilities, for software decoding on a PC, and so on. However, it is unlikely that they will pose a significant threat to MPEG-2 encoding for entertainment video in the near future.
Compression Processing Requirements
Let's take a full-resolution video frame that contains 480 lines, each consisting of 720 pixels. The total frame, therefore, contains 345,600 pixels. Remember that a new frame arrives from the picture source every 33 milliseconds, or about 30 frames per second. Thus, the pixel rate is 10,368,000 per second. Imagine that the compression process requires about 100 operations per pixel. The encoder must then perform roughly 1,037 million operations per second, so a processor with a performance of about 1,000 million instructions per second (mips) is required.
In practice, custom processing blocks are often built in hardware to handle common operations, such as motion estimation and DCT used by MPEG-2 video compression.
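The processing estimate in this section can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation in the text: the constant names are hypothetical, the frame rate of 30 is the rounded equivalent of one frame every 33 milliseconds, and the figure of 100 operations per pixel is the text's illustrative assumption.

```python
LINES = 480
PIXELS_PER_LINE = 720
FRAMES_PER_SECOND = 30   # roughly one frame every 33 ms
OPS_PER_PIXEL = 100      # illustrative figure from the text

pixels_per_frame = LINES * PIXELS_PER_LINE
pixel_rate = pixels_per_frame * FRAMES_PER_SECOND
required_mips = pixel_rate * OPS_PER_PIXEL / 1_000_000

print(f"pixels per frame: {pixels_per_frame:,}")
print(f"pixels per second: {pixel_rate:,}")
print(f"required performance: about {required_mips:,.0f} mips")
```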
Details of MPEG-2 Video Compression
The following sections detail some of the more practical aspects of MPEG-2 video compression:
Picture resolution: MPEG-2 is designed to handle the multiple picture resolutions that are commonly in use for broadcast television. This section defines what is meant by picture resolution and how it affects the compression process.
Compression ratio: MPEG-2 can achieve excellent compression ratios when compared to analog transmission, but there is some confusion about the definition of compression ratios. This section discusses the difference between the MPEG compression ratio and the overall compression ratio.
Real-time MPEG-2 compression: Most of the programs delivered over cable systems are compressed in real-time at the time of transmission. This section discusses the special requirements for real-time MPEG-2 encoders.
Non-real-time MPEG-2 compression: Stored-media content does not require a real-time encoder, and there are certain advantages to non-real-time compression systems.
Statistical multiplexing: This section explains how statistical multiplexing works in data communications systems and what special extensions have been invented to support the statistical multiplexing of MPEG-2 program streams.
Re-multiplexing: Re-multiplexing, or grooming, of compressed program streams is discussed, including a recent technique that actually allows the program stream to be dynamically reencoded to reduce its bit rate.
Picture Resolution
MPEG-2 compression is a family of standards that defines many different profiles and levels. (For a complete description of all MPEG-2 profiles and levels, see Chapter 5 of DTV: The Revolution in Electronic Imaging by Jerry C. Whitaker.) MPEG-2 compression is most commonly used in its main profile at its main level (abbreviated to MP@ML). This MPEG-2 profile and level is designed for the compression of standard definition television pictures with a resolution of 480 vertical lines.
The resolution of a picture describes how many pixels are used to describe a single frame. The higher the resolution, the more pixels per frame. In many cases, the luminance information is coded with more pixels than the chrominance information.
NOTE
The retina of the human eye perceives more detail with rod cells, which are sensitive only to the intensity of light (the luminance), and perceives less detail with cone cells, which are sensitive to the color of light (the chrominance).
Chroma subsampling takes advantage of the way the human eye works by sampling the chrominance with less detail than the luminance information. In the MPEG-2 main profile (MP), the chrominance information is subsampled at half the horizontal and vertical resolution compared to the luminance information. For example, if the luminance information is sampled at a resolution of 480 by 720, the chrominance information is sampled at a resolution of 240 by 360, requiring one-fourth the number of pixels. This arrangement is called 4:2:0 sampling. Because there are two chrominance components, each at one-fourth the luminance sample count, the effect of 4:2:0 sampling is to halve the number of samples compared to sampling luminance and chrominance at the same resolution.
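The sample counts in this paragraph can be checked with a short calculation. The sketch below is illustrative only: the function name and its divisor parameters are hypothetical, it assumes one luminance plane plus two chrominance planes, and it counts samples per frame rather than bits.

```python
def samples_per_frame(width, height, chroma_h_div, chroma_v_div):
    """Count luminance plus chrominance samples for one frame.

    chroma_h_div and chroma_v_div are the horizontal and vertical
    subsampling divisors applied to each of the two chrominance planes.
    """
    luma = width * height
    chroma = 2 * (width // chroma_h_div) * (height // chroma_v_div)
    return luma + chroma


full_resolution = samples_per_frame(720, 480, 1, 1)  # chroma at luminance resolution
subsampled_420 = samples_per_frame(720, 480, 2, 2)   # 4:2:0 sampling

print(f"same-resolution chroma: {full_resolution:,} samples")
print(f"4:2:0 chroma:           {subsampled_420:,} samples")
print(f"ratio: {subsampled_420 / full_resolution}")
```

Under these assumptions, each chrominance plane holds 360 by 240 samples, and the 4:2:0 frame needs 1.5 samples per pixel instead of 3.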
Compression Ratio
The compression ratio is a commonly misused term. It is used to compare the spectrum used by a compressed signal with the spectrum used by an equivalent NTSC (National Television Systems Committee) analog signal. Expressed this way, typical compression ratios achieved by MPEG-2 range from 6:1 to 14:1. Why is the term confusing?
If you take the same video signal and modulate it as an analog signal (uncompressed) and compress it using MPEG-2, you have two very different things. The analog signal is an analog waveform with certain bandwidth constraints so that it fits into 6 MHz, whereas the MPEG-2 elementary stream is just a string of bits that cannot be transmitted until further processing steps are taken. These steps include multiplexing, transport, and digital modulation, and they all affect how much bandwidth is required by the compressed signal.
To compare apples with apples, you must take the same video signal and convert it to an uncompressed digital signal (this is actually the first step in the compression process and is termed analog-to-digital conversion or simply sampling). You can now compare the uncompressed digital signal with the MPEG-2 compressed elementary stream for a true comparison of the input bit rate and the output bit rate of the compression process. A full-resolution uncompressed video signal sampled in 4:2:0 (see the previous section, Picture Resolution) requires 124.416 Mbps. MPEG-2 can squeeze this down to about 4 Mbps with little or no loss in perceived quality; this is a true compression ratio of 124:4 or 31:1. This is very different from the commonly quoted range of 6:1 to 14:1.
To continue the math, take the 4 Mbps video elementary stream and add an audio stream at 192 kbps to create a program stream at 4.192 Mbps. Add information to describe how the streams are synchronized and place the data into short transport packets for efficient multiplexing with other streams. You now have a payload of approximately 4.3 Mbps. Using 64-QAM modulation (see the section Broadband Transmission in this chapter), six 4.3 Mbps streams fit into the 27 Mbps payload of a single 6 MHz channel. Because that channel would otherwise carry one analog signal, we could express this as a 6:1 compression ratio.
This is all very confusing! In this example, a video signal with a 31:1 MPEG-2 video compression ratio is roughly equivalent to an overall compression ratio of 6:1. (If the example employs 256-QAM and statistical multiplexing, you might achieve an overall compression ratio of 12:1 although the MPEG-2 video compression ratio is still 31:1.)
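The two ratios can be worked through step by step. The following sketch simply restates the arithmetic of this section; the variable names are hypothetical, and the figure of 8 bits per sample is an assumption that is not stated in the text but reproduces its 124.416 Mbps value.

```python
WIDTH, HEIGHT, FRAMES_PER_SECOND = 720, 480, 30
SAMPLES_PER_PIXEL = 1.5   # 4:2:0 sampling
BITS_PER_SAMPLE = 8       # assumption; not stated in the text

uncompressed_bps = (WIDTH * HEIGHT * FRAMES_PER_SECOND
                    * SAMPLES_PER_PIXEL * BITS_PER_SAMPLE)

video_bps = 4_000_000               # MPEG-2 video elementary stream
audio_bps = 192_000                 # audio stream
program_bps = video_bps + audio_bps
transport_bps = 4_300_000           # approximate rate after transport overhead
channel_payload_bps = 27_000_000    # 64-QAM payload quoted in the text

video_ratio = uncompressed_bps / video_bps
programs_per_channel = int(channel_payload_bps // transport_bps)

print(f"uncompressed: {uncompressed_bps / 1e6:.3f} Mbps")
print(f"MPEG-2 video compression ratio: about {video_ratio:.0f}:1")
print(f"overall compression ratio: {programs_per_channel}:1")
```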
In this book, the terms MPEG-2 video compression ratio and overall compression ratio will be used to distinguish these very different measures.
Real-Time MPEG-2 Compression
Real-time compression is commonly used at satellite up-links to compress a video signal into a digital program stream as part of the transmission (or retransmission) process. Very often, the encoder runs for long periods of time without manual intervention. There must be sufficient headroom in the allocated bit rate to allow the encoder to operate correctly for all kinds of material that it is likely to encode. (Headroom refers to available, but normally unused, bits that are allocated to allow for the video compression of difficult scenes.) Each channel requires a dedicated encoder, so price is a definite issue for multichannel systems. The encoder must also be highly reliable, and in many cases automatic switching to a backup encoder is required.
Non-Real-Time MPEG-2 Compression
Non-real-time encoders are technically similar to real-time encoders but have very different requirements. In fact, they may encode in real-time, but their application is to encode to stored media (such as a tape or disc), and a highly paid compressionist usually monitors the compression of each scene. (Compressionists are studio engineers who not only understand how to operate the encoding equipment but also apply their artistic judgment in selecting the best trade-off between compression ratio and picture quality.) Therefore, encoder price is less of an issue and performance is extremely important because the compressed material will be viewed over and over again. In the case of digital versatile discs (DVDs), no annoying visible artifacts, however subtle, can be tolerated, because the picture quality will be carefully evaluated by a magazine reviewer.
Statistical Multiplexing
Statistical multiplexing is a technique commonly used in data communications to extract the maximum efficiency from a CBR link. A number of uncorrelated, bursty traffic sources are multiplexed together so that the sum of their peak rates exceeds the link capacity. Because the sources are uncorrelated, there is a low probability that the sum of their transmit rates will exceed the link capacity. However, although the multiplex can be engineered so that periods of link oversubscription are rare, they will occur. (See Murphy's law!) In data communications networks, periods of oversubscription are accommodated by packet buffering and, in extreme cases, packet discard. (The Internet is a prime example of an oversubscribed, statistically multiplexed network where packet delay and loss may be high during busy periods.)
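To make the idea of a low but non-zero probability of oversubscription concrete, the following sketch computes it for a deliberately simplified traffic model. Everything in it is an assumption for illustration: each source is either idle or sending at its peak rate, the sources are independent, and the numbers (10 sources, a 20% chance of bursting, 10 Mbps peaks, a 40 Mbps link) are hypothetical.

```python
from math import comb


def oversubscription_probability(sources, p_burst, peak_rate, capacity):
    """Probability that independent on/off sources together exceed the link.

    Each source sends at peak_rate with probability p_burst and is
    otherwise idle, so the number of active sources is binomial.
    """
    max_active = min(int(capacity // peak_rate), sources)
    p_within_capacity = sum(
        comb(sources, k) * p_burst ** k * (1 - p_burst) ** (sources - k)
        for k in range(max_active + 1)
    )
    return 1.0 - p_within_capacity


# Hypothetical multiplex: peak rates sum to 100 Mbps on a 40 Mbps link.
p_over = oversubscription_probability(
    sources=10, p_burst=0.2, peak_rate=10, capacity=40
)
print(f"probability of oversubscription: {p_over:.4f}")
```

In this model the link is oversubscribed only when five or more sources burst at once, which is rare but, as the text notes, will eventually happen.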
Video material has a naturally varying information rate: when the scene suddenly changes from an actor sitting at a table to an explosion, the information rate skyrockets. Although MPEG-2 is designed to compensate by encoding more or less detail according to the amount of motion, the encoded bit rate may vary by a ratio of 5 to 1 during a program.
This makes MPEG-2 program streams excellent candidates for statistical multiplexing, except for the fact that MPEG-2 is extremely sensitive to delay and loss. As such, statistical multiplexing cannot be used for MPEG-2 if there is any probability of loss due to oversubscription.
Therefore, statistical multiplexing has been specially modified for use with MPEG by the addition of the following mechanisms:
A series of real-time encoders are arranged so that their output can be combined by a multiplexer into a single multi-program transport stream (MPTS). Each encoder has a control signal that instructs it to set its target bit-rate to a certain rate.
The multiplexer monitors the sum of the traffic from all the encoders as it combines them, and in real-time decides whether the bit rate is greater or lower than the transmission link capacity.
When one encoder has a more challenging scene to compress, it requests that its output rate be allowed to rise. The hope is that one of the other encoders will have less-difficult material and will lower its output rate.
However, there is a significant probability that all the encoders could be called upon to encode a challenging scene at the same time. When this happens, the aggregate bit rate will exceed the link capacity. A conventional statistical multiplexer would discard some packets, but in the case of MPEG-2, this would be disastrous and almost guarantee poor-quality video at the output of the decoders.
Instead, the multiplexer buffers the additional packets and requests that the encoders lower their encoded bit rate. The buffered packets are delayed by only a few milliseconds, but MPEG-2 is extremely sensitive to delay variation. The multiplexer can fix this within limits; as long as the decoder pipeline does not underflow and the timestamps are adjusted to compensate for the additional time they are buffered, the decoder continues to function normally.
Some statistical multiplexers use a technique called look ahead statistical multiplexing (pioneered by DiviCom; see http://www.divi.com/). In this technique, the material is encoded or statistics are extracted in a first pass, the information is passed to the multiplexer (while the original input video is passing through a pipeline delay), and bit rates are assigned for each encoder; so when the real encoding happens, a reasonable bit rate is already assigned. This solves some of the nasty feedback issues that can happen in less sophisticated designs.
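The rate-assignment step at the heart of this feedback loop can be sketched as follows. This is not how any particular multiplexer works: the proportional scaling policy, the function name, and the requested rates are all hypothetical, chosen only to show how target bit rates might be set so that the aggregate never exceeds the link capacity.

```python
def assign_target_rates(requested_bps, link_capacity_bps):
    """Return one target bit rate per encoder.

    If the requests fit within the link, they are granted as-is.
    Otherwise every request is scaled down by the same factor so that
    the targets sum to the link capacity (an assumed, simplistic policy).
    """
    total = sum(requested_bps)
    if total <= link_capacity_bps:
        return list(requested_bps)
    scale = link_capacity_bps / total
    return [rate * scale for rate in requested_bps]


# Hypothetical requests from six encoders sharing a 27 Mbps payload.
requests = [3_000_000, 9_000_000, 4_000_000, 6_000_000, 5_000_000, 8_000_000]
targets = assign_target_rates(requests, 27_000_000)

for requested, target in zip(requests, targets):
    print(f"requested {requested / 1e6:.1f} Mbps -> target {target / 1e6:.2f} Mbps")
```

An encoder with a more demanding scene still receives a larger share than its neighbors, but every encoder gives up some bits when all of them are busy at once.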
Reencoding
Until recently, it was impossible to modify an encoded MPEG-2 bit stream in real-time. It is now possible, however, to parse the MPEG-2 syntax and modify it to reduce the bit rate by discarding some of the encoded detail. This technique was pioneered by Imedia Corporation (http://www.imedia.com/) and allows the feedback loop between the MPEG encoders and the multiplexer to be removed. In a reencoding (or translation) system, the multiplexer is used to combine a number of variable bit rate MPEG-2 streams. If, at any instant in time, the aggregate bit rate of all of the streams exceeds the transmission link capacity, the multiplexer will reencode one or more of the streams to intelligently discard information to reduce their bit rate. Unlike statistical multiplexing, where the multiplexer cannot discard any bits, the reencoding multiplexer reduces the bit rate by discarding some information (for example, fine detail).
Reencoding is a very useful technique to use whenever a number of multi-program transport streams are groomed; that is, a new output multiplex is formed from program streams taken from several input multiplexes. Without some means of adapting the coded rate, re-multiplexing would result in considerable inefficiency and the output multiplex would contain fewer channels.
A second application of reencoding is in Digital Program Insertion (DPI). DPI splices one single-program transport stream into another so that the viewer is unaware of the transition. It can be used to insert local advertisements into a broadcast program. Reencoding allows the inserted segment to be rate-adapted to the segment that it replaces. DPI is discussed in more detail in Chapter 15, "OpenCable Headend Interfaces."
Although reencoding techniques are extremely useful, feedback-controlled statistical multiplexing is superior from a compression-efficiency perspective when it is possible to collocate encoders and multiplexers. Hence, feedback-controlled statistical multiplexing tends to dominate at original encoding sites that include statistical multiplexing, whereas reencoding is appropriate at nodes where grooming of statistically multiplexed signals needs to be performed.