Voice Compression
Two basic variations of 64 Kbps PCM are commonly used: µ-law and a-law. The methods are similar in that both use logarithmic compression to achieve 12 to 13 bits of linear PCM quality in 8 bits, but they differ in relatively minor compression details (µ-law has a slight advantage in low-level signal-to-noise ratio performance). Usage has historically followed country and regional boundaries, with North America using µ-law and Europe and other countries using a-law. Note that when a long-distance call requires µ-law to a-law conversion, that conversion is the responsibility of the µ-law country.
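The logarithmic compression described above can be sketched in a few lines. This minimal Python example follows the segment-and-mantissa structure of the widely circulated public-domain G.711 µ-law reference code (16-bit signed input, a bias of 0x84, and a clip level of 32635). The function names are illustrative, and a-law, which uses a slightly different segment layout, is not shown.

```python
# Sketch of segment-based G.711 mu-law companding (16-bit linear PCM <-> 8-bit code).
BIAS = 0x84   # 132: added so every magnitude lands inside a segment boundary
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) is the position of the highest set bit above bit 7, 0..7.
    exponent = min((magnitude >> 7).bit_length() - 1, 7)
    # Mantissa is the 4 bits just below the leading 1 bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # The transmitted byte is the one's complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code back to a 16-bit signed PCM sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the midpoint of the quantization interval, then remove the bias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

For example, encoding 1000 gives the code 0xCE, which decodes to 988, while 10000 decodes to 9852. The absolute error grows with amplitude because each segment doubles the quantization step, which is how 8 bits can span a much wider linear range.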
Another frequently used compression method is adaptive differential pulse code modulation (ADPCM). A commonly used instance of ADPCM is ITU-T G.726, which encodes using 4-bit samples, giving a transmission rate of 32 Kbps. Unlike PCM, the 4 bits do not directly encode the amplitude of speech; instead, they encode the differences in amplitude, as well as the rate of change of that amplitude, employing some rudimentary linear prediction.
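G.726 itself uses a more elaborate adaptive predictor and quantizer than fits here, so the following is a deliberately simplified toy ADPCM coder, not G.726. Its predictor is just the previous reconstructed sample, and its step size doubles after a large code and shrinks by a quarter otherwise; both rules, the step limits, and all function names are hypothetical choices for illustration. It does show the two ideas in the paragraph above: only a quantized 4-bit difference is sent, and the decoder stays in step with the encoder because both update the same state from the codes alone.

```python
# Toy 4-bit ADPCM coder (NOT G.726): predictor and step adaptation are assumptions.
MIN_STEP, MAX_STEP = 4, 4096  # assumed step-size limits
INITIAL_STEP = 16             # assumed starting step size


def _adapt(step: int, code: int, levels: int) -> int:
    """Grow the step after a large code, shrink it after a small one (assumed rule)."""
    if abs(code) >= levels // 2:
        return min(step * 2, MAX_STEP)
    return max(step - step // 4, MIN_STEP)


def _clamp16(value: int) -> int:
    return max(-32768, min(32767, value))


def adpcm_encode(samples, bits=4):
    """Return (codes, reconstruction); codes are signed ints in [-2**(bits-1), 2**(bits-1) - 1]."""
    levels = 1 << (bits - 1)
    predicted, step = 0, INITIAL_STEP
    codes, recon = [], []
    for x in samples:
        # Quantize the prediction error, not the sample itself.
        code = round((x - predicted) / step)
        code = max(-levels, min(levels - 1, code))
        # Update state exactly as the decoder will, so both stay in sync.
        predicted = _clamp16(predicted + code * step)
        step = _adapt(step, code, levels)
        codes.append(code)
        recon.append(predicted)
    return codes, recon


def adpcm_decode(codes, bits=4):
    """Rebuild samples from the codes alone, mirroring the encoder's state updates."""
    levels = 1 << (bits - 1)
    predicted, step = 0, INITIAL_STEP
    out = []
    for code in codes:
        predicted = _clamp16(predicted + code * step)
        step = _adapt(step, code, levels)
        out.append(predicted)
    return out


def bit_rate(bits_per_sample, sample_rate_hz=8000):
    """Transmission rate in bit/s for a given code width at telephony sampling."""
    return bits_per_sample * sample_rate_hz
```

At 8000 samples per second, `bit_rate(4)` gives the 32 Kbps figure above, and 5, 3, or 2 bits per sample give 40, 24, or 16 Kbps, the other G.726 rates.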
PCM and ADPCM are examples of waveform codecs—compression techniques that exploit redundant characteristics of the waveform itself. New compression techniques were developed over the past 10 to 15 years that further exploit knowledge of the source characteristics of speech generation. These techniques employ signal processing procedures that compress speech by sending only simplified parametric information about the original speech excitation and vocal tract shaping, requiring less bandwidth to transmit that information.
These techniques can be grouped together generally as source codecs and include variations such as linear predictive coding (LPC), code excited linear prediction compression (CELP), and multipulse, multilevel quantization (MP-MLQ).
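Source codecs such as LPC and CELP begin by fitting a short-term linear predictor to each frame of speech, a rough model of vocal tract shaping. The sketch below shows only that analysis step: it computes a frame's autocorrelation, solves for predictor coefficients with the Levinson-Durbin recursion, and forms the prediction residual. Real codecs add windowing, pitch prediction, coefficient quantization, and (for CELP) a codebook search for the excitation, none of which is shown; the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for x_hat[n] = sum_k a[k] * x[n - k].

    Returns (coefficients, final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k coeffs[k] * x[n - k - 1]."""
    p = len(coeffs)
    out = []
    for n in range(len(x)):
        pred = sum(coeffs[k] * x[n - k - 1] for k in range(p) if n - k - 1 >= 0)
        out.append(x[n] - pred)
    return out
```

For a signal that decays as 0.9^n, an order-2 fit returns coefficients very close to [0.9, 0.0], and the residual is essentially a single impulse at n = 0. That is the intuition behind source coding: a few predictor coefficients plus a compact description of the excitation can stand in for the waveform.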
Voice Coding Standards
The ITU-T standardizes CELP, MP-MLQ, PCM, and ADPCM coding schemes in its G-series recommendations. The most popular voice coding standards for telephony and packet voice include:
- G.711: Describes the 64 Kbps PCM voice coding technique outlined earlier. G.711-encoded voice is already in the correct format for digital voice delivery in the public phone network or through Private Branch eXchanges (PBXs).
- G.726: Describes ADPCM coding at 40, 32, 24, and 16 Kbps. ADPCM voice can also be interchanged between packet voice and public phone or PBX networks, provided that the latter have ADPCM capability.
- G.728: Describes a 16 Kbps low-delay variation of CELP voice compression.
- G.729: Describes CELP compression that enables voice to be coded into 8 Kbps streams. Two variations of this standard (G.729 and G.729 Annex A) differ largely in computational complexity, and both generally provide speech quality as good as that of 32 Kbps ADPCM.
- G.723.1: Describes a technique for compressing speech or other audio signal components of multimedia services at a low bit rate, as part of the overall H.324 family of standards. Two bit rates are associated with this coder: 5.3 and 6.3 Kbps. The higher bit rate is based on MP-MLQ technology and provides greater quality. The lower bit rate is based on ACELP (algebraic CELP), provides good quality, and affords system designers additional flexibility.
- iLBC (Internet Low Bitrate Codec): A free speech codec suitable for robust voice communication over IP. The codec is designed for narrowband speech and produces a payload bit rate of 13.33 Kbps with a 30 ms encoding frame and 15.20 Kbps with a 20 ms encoding frame. iLBC degrades speech quality gracefully when frames are lost, as happens when IP packets are lost or delayed. Its basic quality is higher than that of G.729A, and it is highly robust to packet loss. The PacketCable consortium and many vendors have adopted iLBC as a preferred codec. It is also used by many PC-to-phone applications, such as Skype, Google Talk, Yahoo! Messenger with Voice, and MSN Messenger.
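Because each of these codecs emits its bit stream in fixed frames, multiplying the bit rate by the frame length gives the payload carried per frame. The sketch below applies that arithmetic to the rates and frame lengths given in this section and in Table 7-3. The 20 ms packetization interval used for G.711 and G.726, which are sample-based rather than frame-based, is an assumption chosen for illustration, packet header overhead is not included, and the function names are illustrative.

```python
import math

# (codec, bit rate in bit/s, frame or packetization interval in ms)
CODECS = [
    ("G.711", 64000, 20),             # 20 ms interval is an assumption
    ("G.726 (32 Kbps)", 32000, 20),   # 20 ms interval is an assumption
    ("G.729", 8000, 10),
    ("G.723.1 (6.3 Kbps)", 6300, 30),
    ("G.723.1 (5.3 Kbps)", 5300, 30),
    ("iLBC (20 ms)", 15200, 20),
    ("iLBC (30 ms)", 13330, 30),
]


def payload_bytes(bit_rate_bps, frame_ms):
    """Codec payload produced per frame, rounded up to whole bytes."""
    bits = bit_rate_bps * frame_ms / 1000
    return math.ceil(bits / 8)


def frames_per_second(frame_ms):
    return 1000 / frame_ms


if __name__ == "__main__":
    for name, rate, frame_ms in CODECS:
        print(f"{name:20} {payload_bytes(rate, frame_ms):4d} bytes/frame, "
              f"{frames_per_second(frame_ms):.1f} frames/s")
```

With these inputs, G.729 yields 10 bytes per 10 ms frame, G.723.1 yields 24 and 20 bytes per 30 ms frame at 6.3 and 5.3 Kbps, and iLBC yields 38 bytes per 20 ms frame and 50 bytes per 30 ms frame.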
Mean Opinion Score
You can test voice quality in two ways: subjectively and objectively. Humans perform subjective voice testing, whereas computers—which are less likely to be "fooled" by compression schemes that can "trick" the human ear—perform objective voice testing.
Codecs are developed and tuned based on subjective measurements of voice quality. Standard objective quality measurements, such as total harmonic distortion and signal-to-noise ratios, do not correlate well with human perception of voice quality, and satisfying the human listener is, in the end, the goal of most voice compression techniques.
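For comparison, a plain signal-to-noise ratio, one of the objective measures named above, takes only a few lines to compute. This sketch treats the sample-by-sample difference between the reference and the decoded signal as noise; the function name `snr_db` is illustrative. Because it weighs every error equally, whether or not a listener would notice it, two codecs with the same SNR can still be perceived quite differently.

```python
import math


def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB, treating (degraded - reference) as noise."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in reference)
    noise_power = sum((d - s) ** 2 for s, d in zip(reference, degraded))
    if noise_power == 0:
        return math.inf  # identical signals: no measurable noise
    return 10 * math.log10(signal_power / noise_power)
```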
A common subjective benchmark for quantifying the performance of a speech codec is the mean opinion score (MOS). MOS tests are given to a group of listeners. Because perceptions of voice quality, and of sound in general, vary from listener to listener, it is important to use a wide range of listeners and sample material when conducting a MOS test. The listeners rate each sample of speech material from 1 (bad) to 5 (excellent), and the scores are then averaged to produce the mean opinion score.
MOS testing also is used to compare how well a particular codec works under varying circumstances, including differing background noise levels, multiple encodes and decodes, and so on. You can then use this data to compare against other codecs.
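The averaging step, and the per-condition comparison just described, can be sketched as follows. The labels follow the five-point absolute category rating scale (5 Excellent, 4 Good, 3 Fair, 2 Poor, 1 Bad). The helper names are hypothetical, and the votes in the example are made-up placeholders, not real listening-test results.

```python
from collections import defaultdict

ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos(ratings):
    """Mean opinion score for one condition; each rating must be 1..5."""
    if not ratings:
        raise ValueError("no ratings")
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)


def mos_by_condition(votes):
    """votes: iterable of (codec, condition, rating). Returns {(codec, condition): MOS}."""
    grouped = defaultdict(list)
    for codec, condition, rating in votes:
        grouped[(codec, condition)].append(rating)
    return {key: mos(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Made-up votes for illustration only; not real test data.
    votes = [
        ("G.711", "quiet", 4), ("G.711", "quiet", 5),
        ("G.729", "quiet", 4), ("G.729", "quiet", 3),
        ("G.729", "noisy", 3),
    ]
    for (codec, condition), score in sorted(mos_by_condition(votes).items()):
        print(f"{codec:6} {condition:6} MOS={score:.2f}")
```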
MOS scores for several ITU-T codecs (and iLBC) are listed in Table 7-3, which shows how several low-bit-rate coders compare with standard PCM.
Table 7-3. ITU-T Codec MOS Scoring
Compression Method | Bit Rate (Kbps) | Sample Size (ms) | MOS
G.711 PCM | 64 | 0.125 | 4.1
G.726 ADPCM | 32 | 0.125 | 3.85
G.728 Low Delay Code Excited Linear Predictive (LD-CELP) | 16 | 0.625 | 3.61
G.729 Conjugate Structure Algebraic Code Excited Linear Predictive (CS-ACELP) | 8 | 10 | 3.92
G.729a CS-ACELP | 8 | 10 | 3.7
G.723.1 MP-MLQ | 6.3 | 30 | 3.9
G.723.1 ACELP | 5.3 | 30 | 3.65
iLBC Freeware | 15.2 | 20 | 3.9
iLBC Freeware | 13.33 | 30 | n/a
Source for the iLBC figures: Wenyu Jiang and Henning Schulzrinne, "Comparisons of FEC and Codec Robustness on VoIP Quality and Bandwidth Efficiency," research paper, Department of Computer Science, Columbia University, USA.
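To read Table 7-3 as a set of trade-offs against standard PCM, the sketch below computes each coder's compression ratio relative to 64 Kbps G.711 and its MOS difference from G.711's 4.1. The values are copied from the table, with G.728 at its 16 Kbps rate and only the 20 ms iLBC row, since no MOS is given for the 30 ms mode; the names `TABLE_7_3` and `relative_to_pcm` are illustrative.

```python
# (method, bit rate in Kbps, MOS), copied from Table 7-3
TABLE_7_3 = [
    ("G.711 PCM", 64, 4.1),
    ("G.726 ADPCM", 32, 3.85),
    ("G.728 LD-CELP", 16, 3.61),
    ("G.729 CS-ACELP", 8, 3.92),
    ("G.729a CS-ACELP", 8, 3.7),
    ("G.723.1 MP-MLQ", 6.3, 3.9),
    ("G.723.1 ACELP", 5.3, 3.65),
    ("iLBC (20 ms)", 15.2, 3.9),
]


def relative_to_pcm(rows, baseline="G.711 PCM"):
    """Return (method, compression ratio vs. baseline, MOS change vs. baseline)."""
    base_rate, base_mos = next((rate, score) for name, rate, score in rows if name == baseline)
    return [
        (name, base_rate / rate, round(score - base_mos, 2))
        for name, rate, score in rows
        if name != baseline
    ]


if __name__ == "__main__":
    for name, ratio, delta in relative_to_pcm(TABLE_7_3):
        print(f"{name:18} {ratio:5.1f}x smaller  MOS change {delta:+.2f}")
```

G.729, for example, uses one-eighth of G.711's bit rate at a cost of 0.18 MOS points, the smallest drop among the coders listed.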
Perceptual Speech Quality Measurement
Although MOS scoring is a subjective method of determining voice quality, it is not the only method for doing so. The ITU-T put forth Recommendation P.861, which covers ways to determine voice quality objectively using Perceptual Speech Quality Measurement (PSQM).
PSQM has many drawbacks when used with voice codecs (vocoders). One drawback is that what the "machine" or PSQM hears is not what the human ear perceives. In layman's terms, a person can trick the human ear into perceiving a higher-quality voice, but a computer cannot be tricked. Also, PSQM was developed to "hear" impairments caused by compression and decompression and not packet loss or jitter.