Page 1:

UNIT III Audio Compression

Page 2:

Outline

• Psychoacoustics
• Fundamentals of audio
• Temporal and frequency masking
• MPEG audio
• Compandors
• Speech compression – Introduction
• Vocoders – different types

Page 3:

Psychoacoustics

• The range of human hearing is about 20 Hz to about 20 kHz
• The frequency range of the voice is typically only from about 500 Hz to 4 kHz
• The dynamic range, the ratio of the maximum sound amplitude to the quietest sound that humans can hear, is on the order of about 120 dB

3

Page 4:

Equal-Loudness Relations

• Fletcher-Munson Curves
  – Equal-loudness curves that display the relationship between perceived loudness ("Phons", in dB) and stimulus sound volume ("Sound Pressure Level", also in dB), as a function of frequency
• Fig. 14.1 shows the ear's perception of equal loudness:
  – The bottom curve shows what level of pure-tone stimulus is required to produce the perception of a 10 dB sound
  – All the curves are arranged so that the perceived loudness level gives the same loudness as for that loudness level of a pure tone at 1 kHz

4

Page 5:

Fig. 14.1: Fletcher-Munson Curves (re-measured by Robinson and Dadson)

5

Page 6:

Frequency Masking

• Lossy audio data compression methods, such as MPEG/Audio encoding, remove some sounds that are masked anyway
• The general situation in regard to masking is as follows:
  1. A lower tone can effectively mask (make us unable to hear) a higher tone
  2. The reverse is not true – a higher tone does not mask a lower tone well
  3. The greater the power in the masking tone, the wider its influence – the broader the range of frequencies it can mask
  4. As a consequence, if two tones are widely separated in frequency then little masking occurs

6

Page 7:

Threshold of Hearing

• A plot of the threshold of human hearing for a pure tone:

Fig. 14.2: Threshold of human hearing, for pure tones

7

Page 8:

Threshold of Hearing (cont'd)

• The threshold of hearing curve: if a sound is above the dB level shown, then the sound is audible
• Turning up a tone so that it equals or surpasses the curve means that we can then distinguish the sound
• An approximate formula exists for this curve:

Threshold(f) = 3.64 (f/1000)^(−0.8) − 6.5 e^(−0.6 (f/1000 − 3.3)^2) + 10^(−3) (f/1000)^4        (14.1)

  – The threshold units are dB; the frequency for the origin (0,0) in formula (14.1) is 2,000 Hz: Threshold(f) = 0 at f = 2 kHz

8
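As a quick, non-authoritative illustration (not part of the original slides), the following Python sketch evaluates Eq. (14.1) at a few frequencies; the function name and the sample frequencies are my own choices.

```python
# Illustrative evaluation of Eq. (14.1), the approximate threshold of hearing in dB.
import math

def threshold_of_hearing_db(f_hz):
    """Approximate threshold of hearing (dB) for a pure tone at f_hz, per Eq. (14.1)."""
    f = f_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (100, 1000, 2000, 4000, 10000):
    print(f, round(threshold_of_hearing_db(f), 2))
# Near 2 kHz the value is close to 0 dB; it rises steeply at low frequencies.
```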

Page 9:

Frequency Masking Curves

• Frequency masking is studied by playing a particular pure tone, say 1 kHz again, at a loud volume, and determining how this tone affects our ability to hear tones nearby in frequency
  – One would generate a 1 kHz masking tone, at a fixed sound level of 60 dB, and then raise the level of a nearby tone, e.g., 1.1 kHz, until it is just audible
• The threshold in Fig. 14.3 plots the audible level for a single masking tone (1 kHz)
• Fig. 14.4 shows how the plot changes if other masking tones are used

9

Page 10:

Fig. 14.3: Effect on threshold for 1 kHz masking tone

10

Page 11:

Fig. 14.4: Effect of masking tone at three different frequencies

11

Page 12:

Critical Bands

• Critical bandwidth represents the ear's resolving power for simultaneous tones or partials
  – At the low-frequency end, a critical band is less than 100 Hz wide, while for high frequencies the width can be greater than 4 kHz
• Experiments indicate that the critical bandwidth:
  – for masking frequencies < 500 Hz: remains approximately constant in width (about 100 Hz)
  – for masking frequencies > 500 Hz: increases approximately linearly with frequency

12

Page 13:

Table 14.1: 25 critical bands and their bandwidths

13

Page 14:

14

Page 15:

Bark Unit

• The Bark unit is defined as the width of one critical band, for any masking frequency
• The idea of the Bark unit: every critical band width is roughly equal in terms of Barks (refer to Fig. 14.5)

Fig. 14.5: Effect of masking tones, expressed in Bark units

15

Page 16:

Conversion: Frequency & Critical Band Number

• Conversion expressed in the Bark unit:

Critical band number (Bark) = f/100,                  for f < 500
                              9 + 4 log2(f/1000),     for f ≥ 500        (14.2)

• Another formula used for the Bark scale:

b = 13.0 arctan(0.76 f) + 3.5 arctan(f^2/56.25)        (14.3)

where f is in kHz and b is in Barks (the same applies to all formulas below)

• The inverse equation:

f = [(exp(0.219 b)/352) + 0.1] b − 0.032 exp[−0.15 (b − 5)^2]        (14.4)

• The critical bandwidth (df) for a given center frequency f can also be approximated by:

df = 25 + 75 × [1 + 1.4 f^2]^0.69        (14.5)

16
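The Bark-scale conversions above are easy to check numerically. The sketch below is an illustration (not from the slides) of Eqs. (14.3)-(14.5); the helper names are my own.

```python
# Illustrative Bark-scale helpers; f is in kHz, b is in Barks, bandwidth is in Hz.
import math

def khz_to_bark(f_khz):
    """Eq. (14.3): frequency (kHz) -> critical-band number (Bark)."""
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz ** 2) / 56.25)

def bark_to_khz(b):
    """Eq. (14.4): approximate inverse, Bark -> frequency (kHz)."""
    return ((math.exp(0.219 * b) / 352) + 0.1) * b - 0.032 * math.exp(-0.15 * (b - 5) ** 2)

def critical_bandwidth_hz(f_khz):
    """Eq. (14.5): critical bandwidth (Hz) around a centre frequency (kHz)."""
    return 25 + 75 * (1 + 1.4 * f_khz ** 2) ** 0.69

print(khz_to_bark(1.0))            # roughly 8.5 Bark at 1 kHz
print(critical_bandwidth_hz(1.0))  # roughly 160 Hz wide around 1 kHz
```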

Page 17:

Temporal Masking

• Phenomenon: any loud tone will cause the hearing receptors in the inner ear to become saturated and to require time to recover
• The following figures show the results of masking experiments:

17

Page 18:

Fig. 14.6: The louder the test tone, the shorter the time it takes for our hearing to recover from the masking.

18

Page 19:

Fig. 14.7: Effect of temporal and frequency masking, depending on both time and closeness in frequency.

19

Page 20:

Fig. 14.8: For a masking tone that is played for a longer time, it takes longer before a test tone can be heard. Solid curve: masking tone played for 200 msec; dashed curve: masking tone played for 100 msec.

20

Page 21:

14.2 MPEG Audio

• MPEG audio compression takes advantage of psychoacoustic models, constructing a large multi-dimensional lookup table to transmit masked frequency components using fewer bits

• MPEG Audio Overview
  1. Applies a filter bank to the input to break it into its frequency components
  2. In parallel, a psychoacoustic model is applied to the data for the bit-allocation block
  3. The numbers of bits allocated are then used to quantize the information from the filter bank – providing the compression

21

Page 22:

MPEG Layers

• MPEG audio offers three compatible layers:
  – Each succeeding layer is able to understand the lower layers
  – Each succeeding layer offers more complexity in the psychoacoustic model and better compression for a given level of audio quality
  – Each succeeding layer, with increased compression effectiveness, is accompanied by extra delay
• The objective of the MPEG layers: a good trade-off between quality and bit-rate

22

Page 23:

MPEG Layers (cont'd)

• Layer 1 quality can be quite good, provided a comparatively high bit-rate is available
  – Digital Audio Tape typically uses Layer 1 at around 192 kbps
• Layer 2 has more complexity; it was proposed for use in Digital Audio Broadcasting
• Layer 3 (MP3) is the most complex, and was originally aimed at audio transmission over ISDN lines
• Most of the complexity increase is at the encoder, not the decoder – accounting for the popularity of MP3 players

23

Page 24:

MPEG Audio Strategy

• The MPEG approach to compression relies on:
  – Quantization
  – The fact that the human auditory system is not accurate within the width of a critical band (in perceived loudness and audibility of a frequency)
• The MPEG encoder employs a bank of filters to:
  – Analyze the frequency ("spectral") components of the audio signal by calculating a frequency transform of a window of signal values
  – Decompose the signal into subbands by using a bank of filters (Layers 1 & 2: "quadrature-mirror"; Layer 3: adds a DCT; psychoacoustic model: Fourier transform)

24

Page 25:

MPEG Audio Strategy (cont'd)

• Frequency masking: a psychoacoustic model is used to estimate the just-noticeable noise level:
  – The encoder balances the masking behavior and the available number of bits by discarding inaudible frequencies
  – Quantization is scaled according to the sound level that is left over, above the masking levels
• May take into account the actual width of the critical bands:
  – For practical purposes, audible frequencies are divided into 25 main critical bands (Table 14.1)
  – To keep things simple, a uniform width is adopted for all frequency-analysis filters, using 32 overlapping subbands

25

Page 26:

MPEG Audio Compression Algorithm

Fig. 14.9: Basic MPEG Audio encoder and decoder.

26

Page 27:

Basic Algorithm (cont'd)

• The algorithm proceeds by dividing the input into 32 frequency subbands, via a filter bank
  – A linear operation taking 32 PCM samples, sampled in time; the output is 32 frequency coefficients
• In the Layer 1 encoder, the sets of 32 PCM values are first assembled into a set of 12 groups of 32
  – This gives an inherent time lag in the coder, equal to the time needed to accumulate 384 (i.e., 12 × 32) samples
• Fig. 14.11 shows how samples are organized
  – A Layer 2 or Layer 3 frame actually accumulates more than 12 samples for each subband: a frame includes 1,152 samples

27
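A minimal sketch of the frame bookkeeping just described, using plain arrays rather than real subband filter outputs (my own illustration): a Layer 1 frame holds 12 groups of 32 samples (384 total), while a Layer 2/3 frame holds 1,152 samples.

```python
# Illustrative frame-size arithmetic only; the "subband" values here are just indices.
import numpy as np

pcm = np.arange(384)                     # one Layer 1 frame worth of PCM samples
layer1_frame = pcm.reshape(12, 32)       # 12 successive sets of 32 values
print(layer1_frame.shape)                # (12, 32) -> 384 samples of coder delay

pcm2 = np.arange(1152)
layer23_frame = pcm2.reshape(36, 32)     # a Layer 2/3 frame carries 1,152 samples
print(layer23_frame.shape)
```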

Page 28:

Fig. 14.11: MPEG Audio Frame Sizes

28

Page 29:

Bit Allocation Algorithm

• Aim: ensure that all of the quantization noise is below the masking thresholds
• One common scheme:
  – For each subband, the psychoacoustic model calculates the Signal-to-Mask Ratio (SMR) in dB
  – Then the "Mask-to-Noise Ratio" (MNR) is defined as the difference (as shown in Fig. 14.12):

MNR(dB) = SNR(dB) − SMR(dB)        (14.6)

  – The lowest MNR is determined, and the number of code bits allocated to this subband is incremented
  – Then a new estimate of the SNR is made, and the process iterates until there are no more bits to allocate

29
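The iterative scheme just described can be sketched as a greedy loop. The code below is an illustration only (not the normative MPEG procedure); the assumption that each extra bit buys about 6 dB of SNR is mine.

```python
# Greedy bit allocation sketch: give the next bit to the subband with the lowest MNR.
def allocate_bits(smr_db, total_bits, snr_per_bit_db=6.0):
    n = len(smr_db)
    bits = [0] * n
    while total_bits > 0:
        # MNR = SNR - SMR; the subband with the lowest MNR needs bits most urgently
        mnr = [bits[i] * snr_per_bit_db - smr_db[i] for i in range(n)]
        worst = mnr.index(min(mnr))
        bits[worst] += 1
        total_bits -= 1
    return bits

# Toy SMR values for four subbands and a budget of 10 bits:
print(allocate_bits([20.0, 5.0, -3.0, 12.0], total_bits=10))
```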

Page 30:

Fig. 14.12: MNR and SMR. A qualitative view of SNR, SMR, and MNR is shown, with one dominant masker and m bits allocated to a particular critical band.

30

Page 31:

• Mask calculations are performed in parallel with subband filtering, as in Fig. 14.13:

Fig. 14.13: MPEG-1 Audio Layers 1 and 2.

31

Page 32:

Layer 2 of MPEG-1 Audio

• Main differences:
  – Three groups of 12 samples are encoded in each frame, and temporal masking is brought into play, as well as frequency masking
  – Bit allocation is applied to window lengths of 36 samples instead of 12
  – The resolution of the quantizers is increased from 15 bits to 16
• Advantage:
  – A single scaling factor can be used for all three groups

32

Page 33:

Layer 3 of MPEG-1 Audio

• Main differences:
  – Employs a similar filter bank to that used in Layer 2, except using a set of filters with non-equal frequencies
  – Takes into account stereo redundancy
  – Uses the Modified Discrete Cosine Transform (MDCT) – it addresses problems that the DCT has at the boundaries of the window by overlapping frames by 50%:

F(u) = 2 Σ_{i=0}^{N−1} f(i) cos[(2π/N)(i + 1/2 + N/4)(u + 1/2)],   u = 0, …, N/2 − 1        (14.7)

33
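A direct, slow evaluation of Eq. (14.7) can serve as a sanity check; real encoders use windowing and fast algorithms, so the sketch below is an illustration only (names and test input are mine).

```python
# Direct-form MDCT per Eq. (14.7): N input samples give N/2 coefficients.
import math

def mdct(f):
    N = len(f)                      # N must be even
    out = []
    for u in range(N // 2):
        s = 0.0
        for i in range(N):
            s += f[i] * math.cos((2 * math.pi / N) * (i + 0.5 + N / 4) * (u + 0.5))
        out.append(2 * s)
    return out

print(mdct([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]))  # 8 samples -> 4 coefficients
```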

Page 34:

Fig.: MPEG Audio Layer 3 coding.

34

Page 35:

• Table shows various achievable MP3 compression ratios:

Table: MP3 compression performance

35

Page 36:

MPEG-2 AAC (Advanced Audio Coding)

• The standard vehicle for DVDs:
  – Audio coding technology for the DVD-Audio Recordable (DVD-AR) format, also adopted by XM Radio
• Aimed at transparent sound reproduction for theaters
  – Can deliver this at 320 kbps for five channels, so that sound can be played from 5 different directions: Left, Right, Center, Left-Surround, and Right-Surround
• Also capable of delivering high-quality stereo sound at bit-rates below 128 kbps

36

Page 37:

MPEG-2 AAC (cont'd)

• Supports up to 48 channels, sampling rates between 8 kHz and 96 kHz, and bit-rates up to 576 kbps per channel
• Like MPEG-1, MPEG-2 supports three different "profiles", but with a different purpose:
  – Main profile
  – Low Complexity (LC) profile
  – Scalable Sampling Rate (SSR) profile

37

Page 38:

MPEG-4 Audio

• Integrates several different audio components into one standard: speech compression, perceptually based coders, text-to-speech, and MIDI
• MPEG-4 AAC (Advanced Audio Coding) is similar to the MPEG-2 AAC standard, with some minor changes
• Perceptual Coders
  – Incorporate a Perceptual Noise Substitution module
  – Include a Bit-Sliced Arithmetic Coding (BSAC) module
  – Also include a second perceptual audio coder, a vector-quantization method entitled TwinVQ

38

Page 39:

MPEG-4 Audio (cont'd)

• Structured Coders
  – Take a "Synthetic/Natural Hybrid Coding" (SNHC) approach in order to have very low bit-rate delivery as an option
  – Objective: integrate both "natural" multimedia sequences, both video and audio, with those arising synthetically – "structured" audio
  – Take a "toolbox" approach and allow specification of many such models
  – E.g., Text-To-Speech (TTS) is an ultra-low bit-rate method, and actually works, provided one need not care what the speaker actually sounds like

39

Page 40:

Other Commercial Audio Codecs

• Table 14.3 summarizes the target bit-rate range and main features of other modern general audio codecs

Table 14.3: Comparison of audio coding systems

40

Page 41:

The Future: MPEG-7 and MPEG-21

• Difference from current standards:
  – MPEG-4 is aimed at compression using objects
  – MPEG-7 is mainly aimed at "search": how can we find objects, assuming that multimedia is indeed coded in terms of objects?

41

Page 42:

– MPEG-7: A means of standardizing meta-data for audiovisual multimedia sequences – meant to represent information about multimedia information
  In terms of audio: facilitates the representation of, and search for, sound content. An example application supported by MPEG-7 is automatic speech recognition (ASR).
– MPEG-21: An ongoing effort, aimed at driving a standardization effort for a Multimedia Framework from a consumer's perspective, particularly interoperability
  In terms of audio: supports this goal, using audio.

42

Page 43:

Uniform Quantization

It was discussed in the previous lecture that the disadvantage of using uniform quantization is that low-amplitude signals are drastically affected.

This fact can be observed by considering the simulation results in the next four slides.

In both cases, two signals with a similar shape, but different amplitudes, are applied to the same quantizer, with a spacing of 0.0625 between two quantization levels.

The effects of quantization on the low-amplitude signal are obviously more significant than on the high-amplitude signal.

43
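The effect can be reproduced with a few lines of Python (an illustrative stand-in for the course's simulations, with my own signal choices): the same 0.0625-step quantizer gives a much lower SNR on the low-amplitude signal.

```python
# Mid-rise uniform quantization with step 0.0625 applied to a loud and a quiet sine.
import math

def uniform_quantize(x, step=0.0625):
    return [round(v / step) * step for v in x]

t = [i / 100 for i in range(100)]
loud  = [1.0   * math.sin(2 * math.pi * 2 * ti) for ti in t]   # max amplitude 1
quiet = [0.125 * math.sin(2 * math.pi * 2 * ti) for ti in t]   # max amplitude 0.125

def snr_db(x, xq):
    sig = sum(v * v for v in x)
    err = sum((a - b) ** 2 for a, b in zip(x, xq))
    return 10 * math.log10(sig / err)

print(snr_db(loud,  uniform_quantize(loud)))   # high SNqR
print(snr_db(quiet, uniform_quantize(quiet)))  # ~18 dB lower: same noise, weaker signal
```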

Page 44:

Uniform Quantization

Input Signal 1 (Max Amplitude = 1)

44

Page 45:

Uniform Quantization

Quantized Signal 1. Δv=0.0625

45

Page 46:

Uniform Quantization

Input Signal 2 (Max Amplitude = 0.125)

46

Page 47:

Uniform Quantization

Quantized Signal 2. Δv=0.0625

47

Page 48:

Uniform Quantization

Figure-1: Input-output characteristic of a uniform quantizer.

48

Page 49:

Uniform Quantization

Recall that the Signal to Quantization Noise Ratio of a uniform quantizer is given by:

SNqR = 3 L^2 · ⟨m^2(t)⟩ / m_p^2

where L is the number of quantization levels, m_p is the peak signal amplitude, and ⟨m^2(t)⟩ denotes the mean square value of the signal m(t).

This equation verifies the earlier discussion that the SNqR for a low-amplitude signal is quite low. Therefore, the effect of quantization noise on such audio signals should be noticeable. Let's consider the case of voice signals (see next slide).

49
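Plugging numbers into the formula (illustrative arithmetic, not from the slides): with a step of 0.0625 over the range -1..1 there are L = 32 levels, and the sine at 0.125 of full scale loses about 18 dB of SNqR relative to the full-scale one.

```python
# Evaluating SNqR = 3 L^2 <m^2(t)> / m_p^2 for the two test signals.
import math

def snqr_db(L, mean_square_signal, m_p=1.0):
    return 10 * math.log10(3 * L**2 * mean_square_signal / m_p**2)

L = 32                                   # 2 / 0.0625 quantization levels
print(snqr_db(L, 0.5))                   # ~31.9 dB for the full-scale sine
print(snqr_db(L, 0.5 * 0.125**2))        # ~13.8 dB for the low-amplitude sine
```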

Page 50:

Uniform Quantization

Click on the following links to listen to a sample voice signal. First play “voice file-1”; then play “voice file-1 Quantized”. Do you notice the degradation in voice quality?

This degradation can be attributed to uniformly spaced quantization levels.

Voice file-1 | Voice file-1 Quantized (uniform)

Note: You may not notice the difference between the two clips if you are using small laptop speakers. You should use either headphones or larger speakers.

50

Page 51:

Uniform Quantization

More insight into signal degradation can be gained by looking at the voice signal's histogram. A histogram shows the distribution of the data values. Figure-2 below shows the histogram of voice signal-1. Most of the values have low amplitude and occur around zero. Therefore, for voice signals, uniform quantization will result in signal degradation.

Figure-2: Histogram of voice signal-1

51

Page 52:

Non-Uniform Quantization

The effect of quantization noise can be reduced by increasing the number of quantization intervals in the low-amplitude regions. This means that the spacing between the quantization levels should not be uniform.

This type of quantization is called "Non-Uniform Quantization". Its input-output characteristic is shown below.

52

Page 53:

Non-uniform Quantization

Non-uniform quantization is achieved by first passing the input signal through a "compressor". The output of the compressor is then passed through a uniform quantizer.

The combined effect of the compressor and the uniform quantizer is that of a non-uniform quantizer (see Figure 3).

At the receiver, the voice signal is restored to its original form by using an expander.

This complete process of compressing and expanding the signal, before and after uniform quantization, is called Companding.

53

Page 54:

Non-uniform Quantization (Companding)

[Figure: input-output relationship of a compressor, y = g(x), where x = m(t)/m_p and both x and y range from -1 to 1.]

54

Page 55:

Non-uniform Quantization (Companding)

µ-Law (used in the USA):

y = ln(1 + µ·|m(t)|/m_p) / ln(1 + µ),    where 0 ≤ |m(t)|/m_p ≤ 1

The value of 'µ' used with 8-bit quantizers for voice signals is 255.

55
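A minimal companding sketch, assuming µ = 255 and the same 0.0625-step uniform quantizer used earlier (the function names and test values are placeholders, not the course's files):

```python
# µ-law compressor and expander wrapped around a uniform quantizer.
import math

MU = 255.0

def compress(x, mu=MU):                       # µ-law compressor, |x| <= 1
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu=MU):                         # inverse (expander) at the receiver
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize(x, step=0.0625):
    return round(x / step) * step

# Compressing -> uniform quantizing -> expanding acts as a non-uniform quantizer:
for x in (0.01, 0.05, 0.5, 0.9):
    print(x, expand(quantize(compress(x))))   # small values keep much better relative accuracy
```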

Page 56:

Non-uniform Quantization (Companding)

The µ-law compressor characteristic curve for different values of 'µ'.

56

Page 57:

Non-uniform Quantization (Companding)

Block diagram: m(t) → Compressor → Uniform Quantizer → Expander → m̂(t)
(Click on the symbols to listen to the voice signal at each stage.)

57

Page 58:

Non-uniform Quantization (Companding)

Block diagram: m(t) → Compressor → Uniform Quantizer → Expander → m̂(t)

The three stages combine to give the characteristics of a non-uniform quantizer.

58

Page 59:

Non-uniform Quantization (Companding)

Block diagram: m(t) → Uniform Quantizer → m̂(t)

A uniform quantizer, with input and output voice files, is presented here for comparison with the non-uniform quantizer.

59

Page 60:

Non-Uniform Quantization

Let's have a look at the histogram of the compressed voice signal. In contrast to the histogram of the uncompressed signal (Figure-2), you can see that the values are now more evenly distributed. Therefore, it can be said that the compressor changes the histogram/pdf of the voice signal from Gaussian (bell-shaped) to an approximately uniform distribution (shown below).

Figure-3: Histogram of the compressed voice signal

60

Page 61:

Non-Uniform Quantization

Where is the compression?

The compression process in non-uniform quantization demands some elaboration for clarity of concepts. It should be noted that the compression mentioned in the previous slides is not the time- or frequency-domain compression that students are familiar with. This can be verified by looking at the time-domain waveforms at the input and output of the compressor. Note that both signals last for 3.75 seconds. Therefore, there is no compression in time or frequency.

Fig-4-a: Signal at Compressor Input    Fig-4-b: Signal at Compressor Output

61

Page 62:

Non-Uniform Quantization

Where is the compression?

The compression here occurs in the amplitude values. An intuitive way of explaining this compression in amplitudes is to say that the amplitudes of the compressed signal are more closely spaced (compressed) in comparison to the original signal. This can also be observed by looking at the waveform of the compressed signal (Fig-4-b). The compressor boosts the small amplitudes by a large amount. However, the large amplitude values receive very small gain, and the maximum value remains the same. Therefore, the small values are multiplied by a large gain and are spaced relatively closer to the large amplitude values.

A parameter which can be used to measure the degree of compression here is the dynamic range: "The dynamic range is the ratio of the maximum and minimum values of a variable quantity, such as sound or light."

In the simulations, the Dynamic Range (DR) of the compressor input = 41.45 dB, whereas the Dynamic Range (DR) of the compressor output = 13.95 dB.

62
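For reference, a dynamic-range figure like the ones quoted above can be computed as 20·log10(max |amplitude| / min non-zero |amplitude|); the snippet below is my own illustration with toy numbers, not the course's simulation.

```python
# Dynamic range of a sampled signal, in dB.
import math

def dynamic_range_db(signal):
    mags = [abs(v) for v in signal if v != 0.0]
    return 20 * math.log10(max(mags) / min(mags))

samples = [0.0085, 0.2, 0.7, 1.0, -0.4, -0.05]
print(round(dynamic_range_db(samples), 2))   # ~41.4 dB for this toy input
```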

Page 63:

63

Vocoders

Page 64:

64

The Channel Vocoder (analyzer):

• The channel vocoder employs a bank of bandpass filters,
  – Each having a bandwidth between 100 Hz and 300 Hz.
  – Typically, 16-20 linear-phase FIR filters are used.
• The output of each filter is rectified and lowpass filtered.
  – The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract.
• In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
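A rough sketch of the analysis path just described, assuming scipy and placeholder filter choices (16 bands of 200 Hz each and a 50 Hz envelope lowpass) rather than the lecture's exact design:

```python
# Channel-vocoder analysis sketch: bandpass filter bank -> rectify -> lowpass envelope.
import numpy as np
from scipy.signal import butter, lfilter

fs = 8000                                        # sampling rate, Hz

def analyze(speech, bands):
    envelopes = []
    for lo, hi in bands:
        b, a = butter(4, [lo, hi], btype='bandpass', fs=fs)
        sub = lfilter(b, a, speech)              # bandpass-filtered subband
        rect = np.abs(sub)                       # rectify
        b2, a2 = butter(2, 50, btype='lowpass', fs=fs)
        envelopes.append(lfilter(b2, a2, rect))  # slowly varying spectral magnitude
    return envelopes                             # sent along with pitch / voicing info

bands = [(300 + 200 * k, 500 + 200 * k) for k in range(16)]   # 16 placeholder bands
speech = np.random.randn(8000)                   # 1 s of stand-in "speech"
envs = analyze(speech, bands)
print(len(envs), envs[0].shape)                  # 16 envelope signals
```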

Page 65:

65

The Channel Vocoder (analyzer block diagram):

[Block diagram: the input S(n) feeds a bank of bandpass filters; each filter output is rectified, lowpass filtered, and A/D converted. A voicing detector and a pitch detector operate in parallel. All outputs go to the encoder and then to the channel.]

Page 66:

66

The Channel Vocoder (synthesizer):

• At the receiver, the signal samples are passed through D/A converters.
• The outputs of the D/As are multiplied by the voiced or unvoiced signal sources.
• The resulting signals are passed through bandpass filters.
• The outputs of the bandpass filters are summed to form the synthesized speech signal.

Page 67:

67

The Channel Vocoder (synthesizer block diagram):

[Block diagram: the decoder output from the channel is passed through D/A converters; a switch controlled by the voicing information selects between a pulse generator (driven by the pitch period) and a random-noise generator; each channel's D/A output is multiplied by this excitation, passed through its bandpass filter, and the filter outputs are summed (∑) to give the output speech.]

Page 68:

68

The Phase Vocoder:

• The phase vocoder is similar to the channel vocoder.
• However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter.
• By coding and transmitting the phase derivative, this vocoder preserves the phase information.

Page 69:

69

The Phase Vocoder (analyzer block diagram):

[Block diagram: the input S(n) is multiplied by cos(ωk·n) and sin(ωk·n) and lowpass filtered to give the quadrature components ak(n) and bk(n); from these the short-term magnitude and the short-term phase derivative are computed (using differentiators), decimated, and passed to the encoder for transmission over the channel.]

Page 70:

70

The Phase Vocoder (synthesizer block diagram, kth channel):

[Block diagram: for the kth channel, the decoded, decimated short-term amplitude and short-term phase derivative are interpolated; the phase derivative is integrated, its cosine and sine are formed and mixed with cos(ωk·n) and sin(ωk·n), and the result is scaled by the short-term amplitude to reconstruct the channel signal.]

Page 71:

71

The Formant Vocoder:

• The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.

• It is this information plus the pitch period that is encoded and transmitted to the receiver.

Page 72:

72

The Formant Vocoder:

• Example of formants:
  – (a): The spectrogram of the utterance "day one", showing the pitch and the harmonic structure of speech.
  – (b): A zoomed spectrogram of the fundamental and the second harmonic.

(a) (b)

Page 73:

73

The Formant Vocoder (analyzer block diagram):

[Block diagram: the input speech is analyzed to estimate the formant frequencies F1, F2, F3 and their bandwidths B1, B2, B3, together with a pitch and voiced/unvoiced (V/U) detector that produces F0 and the V/U decision. Fk is the frequency of the kth formant; Bk is the bandwidth of the kth formant.]

Page 74:

74

The Formant Vocoder (synthesizer block diagram):

[Block diagram: an excitation signal, controlled by the pitch F0 and the V/U decision, is shaped by resonators set to the formant frequencies and bandwidths F1/B1, F2/B2, F3/B3 to produce the synthesized speech.]

Page 75:

75

Linear Predictive Coding:

• The objective of LP analysis is to estimate the parameters of an all-pole model of the vocal tract.
• Several methods have been devised for generating the excitation sequence for speech synthesis.
• LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal that is generated for speech synthesis.
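A minimal sketch of LP analysis using the autocorrelation (Yule-Walker) method, order 10; this is an illustration under my own assumptions, not the specific algorithm of any standard.

```python
# Estimate all-pole vocal-tract coefficients a_1..a_p for one frame of speech.
import numpy as np

def lpc(frame, order=10):
    frame = frame * np.hamming(len(frame))                           # window the frame
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]     # autocorrelation r[0..]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])                           # normal equations
    return a                                # predictor: s[n] ~ sum_k a[k] * s[n-k-1]

frame = np.random.randn(180)                # e.g. one 180-sample LPC-10 frame
print(lpc(frame))
```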

Page 76:

76

LPC-10:

• This method is called LPC-10 because 10 coefficients are typically employed.
• LPC-10 partitions the speech into 180-sample frames.
• Pitch and voicing decisions are determined by using the AMDF (average magnitude difference function) and zero-crossing measures.
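A simplified illustration of the two measures named above, AMDF-based pitch estimation and the zero-crossing rate (real LPC-10 adds filtering and pitch tracking; the lag range and test signal here are my own choices).

```python
# The AMDF dips at lags equal to the pitch period; the zero-crossing rate helps flag unvoiced frames.
import numpy as np

def amdf_pitch(frame, fs=8000, min_lag=20, max_lag=160):
    amdf = [np.mean(np.abs(frame[lag:] - frame[:-lag]))
            for lag in range(min_lag, max_lag)]
    best_lag = min_lag + int(np.argmin(amdf))      # deepest null -> pitch period in samples
    return fs / best_lag                           # pitch estimate in Hz

def zero_crossing_rate(frame):
    return np.mean(np.abs(np.diff(np.sign(frame)))) / 2   # high for unvoiced frames

t = np.arange(180) / 8000
frame = np.sin(2 * np.pi * 125 * t)                # toy "voiced" frame at 125 Hz
print(amdf_pitch(frame), zero_crossing_rate(frame))
```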

Page 77:

77

Residual Excited LP Vocoder:

• Speech quality can be improved at the expense of a higher bit rate by computing and transmitting a residual error, as is done in the case of DPCM.
• In one method, the LPC model and excitation parameters are first estimated from a frame of speech.

Page 78:

78

Residual Excited LP Vocoder :

• The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error.

• The residual error is quantized, coded, and transmitted to the receiver

• At the receiver the signal is synthesized by adding the residual error to the signal generated from the model.

Page 79:

79

RELP Block Diagram:

[Block diagram: the input speech S(n) is buffered and windowed; LP analysis extracts the LP parameters, which drive an LP synthesis model; the LP parameters and the excitation parameters are passed to the encoder and sent to the channel.]

Page 80:

80

Code Excited LP:

• CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences.
• The bit rate of CELP is 4800 bps.
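The analysis-by-synthesis idea can be sketched as an exhaustive codebook search. The code below is a toy illustration only (no long-term/pitch filter search and no perceptual weighting, and the codebook size is arbitrary).

```python
# For each Gaussian codebook entry, synthesize speech through the LP filter and keep the
# entry whose (gain-scaled) output is closest to the target frame; only its index is sent.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
codebook = rng.standard_normal((128, 40))     # 128 zero-mean Gaussian excitation vectors

def celp_search(target, lpc_a):
    denom = np.concatenate(([1.0], -lpc_a))   # LP synthesis filter 1/A(z)
    best_index, best_err = 0, np.inf
    for idx, code in enumerate(codebook):
        synth = lfilter([1.0], denom, code)                   # candidate synthetic speech
        gain = np.dot(target, synth) / np.dot(synth, synth)   # best gain for this entry
        err = np.sum((target - gain * synth) ** 2)
        if err < best_err:
            best_index, best_err = idx, err
    return best_index

# Toy usage: a small stable LP filter and a random target frame
a = np.array([0.5, -0.2, 0.1])
target = lfilter([1.0], np.concatenate(([1.0], -a)), rng.standard_normal(40))
print(celp_search(target, a))
```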

Page 81:

81

CELP (analysis-by-synthesis coder):

[Block diagram: the speech samples are buffered and LP-analyzed; each Gaussian excitation codebook entry, scaled by a gain, is passed through the pitch synthesis filter and the spectral envelope (LP) synthesis filter; the result is subtracted from the input speech, the difference is shaped by the perceptual weighting filter W(z), and the energy of the error (sum of squares) is computed; the index of the best excitation sequence, the gain, and the LP parameters are sent as side information.]

Page 82:

82

CELP (synthesizer):

[Block diagram: the decoder output from the channel is buffered; a controller applies the LP parameter, gain, and pitch-estimate updates; the selected Gaussian excitation codebook entry is passed through the pitch synthesis filter and then the LP synthesis filter to produce the output speech.]

Page 83:

83

Vector Sum Excited LP:

• The VSELP coder and decoder basically differ in the method by which the excitation sequence is formed.
• In the block diagram of the VSELP decoder (below), there are three excitation sources.
• One excitation is obtained from the pitch-period (long-term filter) state.
• The other two excitation sources are obtained from two codebooks.

Page 84:

84

Vector Sum Excited LP:

• The bit rate of VSELP is about 8000 bps.
  – Bit allocations for 8000-bps VSELP:

Parameter                                   Bits / 5-ms frame    Bits / 20-ms frame
10 LPC coefficients                         -                    38
Average speech energy                       -                    5
Excitation codewords from two
VSELP codebooks                             14                   56
Gain parameters                             8                    32
Lag of pitch filter                         7                    28
Total                                       29                   159
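A quick check of the frame budget (my arithmetic, not from the slides): 159 bits every 20 ms is 7,950 bps, which is the quoted "about 8000 bps".

```python
# Verify the VSELP bit-rate from the table above.
bits_per_frame = 38 + 5 + 56 + 32 + 28      # = 159 bits per 20-ms frame
frame_duration = 0.020                      # seconds
print(bits_per_frame / frame_duration)      # 7950.0 bps
```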

Page 85:

85

VSELP Decoder:

[Block diagram: three excitation sources – the long-term filter state and two codebooks (Codebook 1 and Codebook 2) – are individually scaled and summed (∑); the sum drives the pitch synthesis filter and the spectral envelope (LP) synthesis filter, followed by a spectral post-filter, producing the synthetic speech.]