Hearing and Audio Compression

1

Prof. Les Atlas
Department of Electrical Engineering, University of Washington

• Questions Discussed:
– Why compress audio signals such as music?
– What is the sensitivity of our ears to different frequencies?
– What is the role of critical bands in our hearing?
– What is the role of masking in our hearing?
– How do MP3 and AAC+ (both used in, say, iPods) use these properties of hearing, and how much can they compress?

2

Why Compress Audio Signals?

• Compact discs use uncompressed pulse code modulation (PCM) (Mbps = million bits per second)

| Sampling rate | Audio rate | Overhead | Total bit rate |
|---|---|---|---|
| 44.1 kHz | 1.41 Mbps | 2.91 Mbps | 4.32 Mbps |

– Too many bits for most modern applications

• A single 4-minute song, uncompressed, needs

$$\frac{4.32\ \text{Mbits}}{\text{sec}} \times 4\ \text{min} \times \frac{60\ \text{sec}}{\text{min}} = 1036.8\ \text{Mbits of storage} > 1\ \text{Gbit}$$

• Definition: CODEC: The encoder and decoder pair used to reduce (compress) the storage and/or bandwidth needs for signals in the audio range.

• Traditional lossless compression methods (e.g. zip file coding), when used alone, don’t provide enough compression.
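The arithmetic on this slide can be checked with a short Python sketch. The names are illustrative and not from the slides. The 16-bit, two-channel format is an assumption (the standard CD audio format), since the slide quotes only the resulting 1.41 Mbps.

```python
# Check the CD bit-rate and storage figures quoted on the slide.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16   # assumption: standard CD word length
CHANNELS = 2           # assumption: stereo

# Raw PCM audio rate in Mbps (million bits per second).
audio_rate_mbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1e6

# Total rate including overhead, taken directly from the slide.
total_rate_mbps = 4.32
overhead_mbps = total_rate_mbps - audio_rate_mbps


def storage_mbits(rate_mbps, minutes):
    """Storage in Mbits for a recording of the given length."""
    return rate_mbps * minutes * 60


song_mbits = storage_mbits(total_rate_mbps, 4)

print(f"audio rate: {audio_rate_mbps:.2f} Mbps")
print(f"overhead:   {overhead_mbps:.2f} Mbps")
print(f"4-min song: {song_mbits:.1f} Mbits")
```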

3

Another Key Definition, dB

• Definition: decibel (dB) Sound Pressure Level (SPL)

$$\text{dB}_{\text{SPL}} = 20 \log_{10} \frac{\text{sound pressure in pascals}}{20\ \text{micropascals}}$$

• One pascal of pressure is one newton per square meter.

• Examples:
– 0 dB SPL: Threshold of human hearing (with healthy ears); sound of a mosquito flying 3 m (10 ft) away
– 50 dB SPL: Inside a quiet restaurant
– 110 dB SPL: Football stadium during kickoff at 50 yard line; chainsaw at 1 m (3 ft)
– 130 dB SPL: Threshold of pain
– 150 dB SPL: Jet engine at 30 m (100 ft)
– 194 dB SPL: Theoretical limit at sea level; pressure waves with a greater intensity behave as shock waves
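As a rough illustration of the dB SPL definition, the sketch below converts between pressure in pascals and dB SPL. The function names are illustrative. The use of one standard atmosphere (101,325 Pa) to check the 194 dB figure is an added assumption about where that limit comes from, not something stated on the slide.

```python
import math

P_REF_PA = 20e-6  # reference pressure: 20 micropascals


def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for a pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P_REF_PA)


def spl_to_pascals(level_db):
    """Inverse of spl_db: pressure in pascals for a level in dB SPL."""
    return P_REF_PA * 10 ** (level_db / 20)


print(spl_db(20e-6))        # the reference pressure itself
print(spl_db(101_325))      # assumption: one standard atmosphere
print(spl_to_pascals(130))  # pressure at the threshold of pain
```

Under these assumptions, the reference pressure maps to 0 dB SPL and one atmosphere maps to roughly 194 dB SPL.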

4

Where is Audio Compression Used?

Some examples:
– Internet and compact music players, e.g. Fraunhofer’s MPEG-1 Layer 3 (.mp3), Microsoft’s Windows Media Audio (.wma), and the iTunes standard MPEG-4 AAC
– Digital audio broadcasting
  • Where mp3 got its start
– Satellite radio
– Sony’s Mini-Disc (ATRAC)
– Surround sound systems such as
  • Dolby’s AC-3
  • Sony’s SDDS
  • Digital Theater System’s DTS

5

Some Simple Lossy Compression Methods

• Silence compression: don’t transmit the “silence,” and hence save those bits
– related to run-length coding

• Adaptive differential pulse code modulation (ADPCM)
– ADPCM encodes only the difference between two or more consecutive samples; the difference is then quantized, hence the usually slight loss in fidelity

• These techniques still don’t save enough bits

• What’s needed to save more bits: more understanding of the receiver: our ears!
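The difference-coding idea can be sketched in a few lines of Python. This is a simplified, non-adaptive variant (plain DPCM with a fixed quantizer step), so it illustrates the principle rather than ADPCM itself, which also adapts the step size to the signal. The function names and the sample values are hypothetical.

```python
def dpcm_encode(samples, step):
    """Encode each sample as a quantized difference from the running prediction."""
    codes = []
    prediction = 0
    for x in samples:
        code = round((x - prediction) / step)  # quantize the difference
        codes.append(code)
        prediction += code * step              # track what the decoder will rebuild
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    value = 0
    for code in codes:
        value += code * step
        out.append(value)
    return out


samples = [0, 10, 22, 31, 28, 15]   # hypothetical input samples
step = 4                            # hypothetical quantizer step size
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
max_error = max(abs(a - b) for a, b in zip(samples, decoded))
print(codes, decoded, max_error)
```

Because the encoder predicts from the decoder's reconstruction rather than from the original samples, the quantization error does not accumulate and stays within half a step.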

6

What is the Sensitivity of our Ears to Different Frequencies?

• Psychoacoustics: The Study of How We Perceive Sound

• Human hearing and voice
– Frequency range of our hearing is about 40 Hz to 14 kHz
  • We are the most sensitive at 2 to 4 kHz.
– Usual dynamic range (quietest to loudest) is about 96 dB
– Normal voice range is about 80 Hz to 8 kHz

• Some examples:
– Middle A on a piano is at 440 Hz. A 440 Hz tone
– A swept tone from 65 to 4186 Hz
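The two audio examples can be approximated in code. The sketch below generates samples for a 440 Hz tone and for a tone swept from 65 to 4186 Hz. The exponential sweep shape, the 44.1 kHz sample rate, the durations, and the function names are all assumptions, since the slides do not say how the demo sounds were made.

```python
import math

RATE_HZ = 44_100  # assumption: CD sample rate


def tone(freq_hz, duration_s, rate=RATE_HZ):
    """Samples of a fixed-frequency sine tone with amplitude 1."""
    n = round(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def sweep(f_start, f_end, duration_s, rate=RATE_HZ):
    """Exponential sweep: frequency f(t) = f_start * (f_end / f_start) ** (t / T)."""
    n = round(duration_s * rate)
    k = math.log(f_end / f_start)
    samples = []
    for i in range(n):
        t = i / rate
        # Phase is 2*pi times the integral of f(t) from 0 to t.
        phase = 2 * math.pi * f_start * duration_s / k * (math.exp(k * t / duration_s) - 1)
        samples.append(math.sin(phase))
    return samples


middle_a = tone(440, 0.5)
swept = sweep(65, 4186, 0.5)
print(len(middle_a), len(swept))
```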

7

Hearing Sensitivity and Frequency

• The basic psychoacoustic experiment
– Put a person in a quiet room. Raise the level of a fixed-frequency tone until it is just barely audible. Vary the frequency and plot these auditory thresholds.

• Typical result for a normal-hearing individual
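The shape of such a threshold curve can be sketched with a widely quoted analytical approximation of the threshold in quiet, commonly attributed to Terhardt. This formula is not from the slides; it is swapped in here only as an illustration, and the function name is hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (assumed formula, not from the slides)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


test_freqs = [100, 500, 1000, 2000, 3000, 4000, 8000, 14000]
for freq in test_freqs:
    print(f"{freq:6d} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")

most_sensitive = min(test_freqs, key=threshold_in_quiet_db)
```

With this approximation the lowest threshold among the sampled frequencies falls in the 2 to 4 kHz region, where the slides say our hearing is most sensitive.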

8

Hearing Sensitivity and Lossy Compression

• Same figure as last slide:

• Any ideas yet on how to save some bits?

9

What is the Role of Critical Bands and Masking in our Hearing?

• The peripheral structures of hearing:

10

Frequency Analysis by our Ears

• Our ears’ cochleas do an approximate analysis of frequency, similar to an imperfect “Fourier transform.”

• Our ears aren’t so perfect. For example:
– The human auditory system has limited frequency resolution.

• Moreover, our ears’ frequency resolution decreases with increasing frequency.

11

Critical Bands

• A perceptually uniform measure of frequency can be expressed in terms of the width of “critical bands.”

• The bands have width less than 100 Hz at the lowest audible frequencies, and more than 4 kHz at the high end.

• Altogether, the audio frequency range can be partitioned into 25 critical bands.
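Critical bands are often expressed on the Bark scale, where one Bark corresponds to one critical band. The sketch below uses two analytical approximations commonly attributed to Zwicker and Terhardt. They are not taken from the slides, and the function names are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band number (Bark) for a frequency in Hz (assumed formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth in Hz around a center frequency (assumed formula)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69


for freq in (100, 1000, 4000, 15000):
    print(f"{freq:6d} Hz: {hz_to_bark(freq):5.1f} Bark, "
          f"bandwidth about {critical_bandwidth_hz(freq):6.0f} Hz")
```

Under these approximations the bandwidth is roughly 100 Hz at low frequencies and several kHz at the top of the range, and the audible range spans about 24 to 25 Bark, which is broadly consistent with the figures on the slide.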

12

What is the Role of Masking in our Hearing?

• Do our ears’ frequency channels interfere with each other?
– Yes. This effect is called “frequency masking”

• Example of masking experiment:
– Play a 250 Hz tone (masker) at a fixed level (65 dB). Play a second test tone (e.g., 180 Hz) at a different level and raise its level until it’s just distinguishable (threshold).
– Vary the frequency of the test tone and plot the thresholds

13

Example Plot of Frequency Masking

14

Temporal Masking

• If we hear a loud sound, then it stops, it takes time until we can perceive a subsequent tone.

• Experiment:
– Play 1 kHz masking tone at 60 dB, then, after a short delay, a test tone at 1.1 kHz at 40 dB.
– Adjust the delay time to the shortest time needed for the test tone to be heard.
– Repeat with different levels of the test tone and plot:

15

Combined Effect of Both Frequency and Temporal Masking

• Any ideas on how to save some more bits?

16

How does MP3 use These Properties of Hearing and How Much Can it Compress?

• What we call MP3 is actually MPEG-1 Layer 3, where “MPEG” stands for Moving Picture Experts Group.

• “Layer 1” through “Layer 3” represent increasing levels of MPEG-1 algorithm complexity and, hence, compression.

17

Original MPEG-1

• 1.5 Mbits/sec total for audio and video
• About 1.2 Mbits/sec for video, 0.3 Mbits/sec for audio
• Compression factor ranges from 2.7 to 24.
• Transparency: With compression rate 6:1 (16 bits stereo sampled at 48 kHz thus compressed to 256 kbits/sec) and optimal listening conditions, expert listeners cannot distinguish between coded and original audio clips.
• MPEG audio supports input sampling frequencies of 32, 44.1 and 48 kHz.
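The 6:1 transparency figure follows directly from the PCM bit rate, as the short sketch below shows. The function and variable names are illustrative, not from the slides.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# 16-bit stereo sampled at 48 kHz, as on the slide.
uncompressed_bps = pcm_bit_rate(48_000, 16, 2)

# Coded rate at the 6:1 compression rate quoted for transparency.
coded_bps = uncompressed_bps / 6

print(f"uncompressed: {uncompressed_bps / 1000:.0f} kbits/sec")
print(f"coded at 6:1: {coded_bps / 1000:.0f} kbits/sec")
```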

18

MPEG-1 Stereo

• Supports one or two audio channels in one of the four modes:
1. Monophonic – single audio channel
2. Dual-monophonic – two fully independent channels, e.g., separate English and French movie tracks
Joint stereo coding:
3. Intensity stereo – for each frequency channel send only the average difference in power
4. Mid-side stereo:

$$M = (L + R), \qquad S = (L - R)$$

$$L = (M + S)/2, \qquad R = (M - S)/2$$
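The mid-side equations form an exact round trip, which a minimal sketch can confirm. The function names and the sample values are hypothetical. A real coder applies this transform within the encoder rather than to raw samples as done here.

```python
def ms_encode(left, right):
    """Mid-side transform: M = L + R, S = L - R."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: L = (M + S) / 2, R = (M - S) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right


left = [3, -1, 4]    # hypothetical left-channel samples
right = [3, -2, 1]   # hypothetical right-channel samples
mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
print(mid, side, decoded_left, decoded_right)
```

When the two channels are similar, the side signal is small, so it is cheaper to code than either original channel.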

19

Basic Steps in the MP3 Audio Encoding Algorithm

1. Use filters to divide the audio signal into 32 frequency subbands.
2. Determine the amount of masking for each band caused by nearby bands using a psychoacoustic masking model.
3. If the power in a band is below the masking threshold, don't encode that band.
4. Otherwise, determine the number of bits needed to represent the coefficient such that noise introduced by quantization is below the masking effect.
5. Format the bitstream for transmission.
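Steps 3 and 4 can be sketched as a simple bit-allocation rule. The sketch assumes that each quantizer bit lowers the quantization noise by about 6 dB, a standard rule of thumb that the slides do not state. The function name, band levels, and thresholds are hypothetical, and a real encoder's allocation is considerably more involved.

```python
import math

DB_PER_BIT = 6.02  # assumption: approximate noise reduction per quantizer bit


def allocate_bits(levels_db, thresholds_db):
    """Bits per band so that quantization noise falls below each masking threshold."""
    bits = []
    for level, threshold in zip(levels_db, thresholds_db):
        if level <= threshold:
            bits.append(0)  # step 3: band is masked, do not encode it
        else:
            # step 4: enough bits to push the noise below the threshold
            bits.append(math.ceil((level - threshold) / DB_PER_BIT))
    return bits


levels_db = [5, 50, 30]       # hypothetical band levels
thresholds_db = [10, 14, 18]  # hypothetical masking thresholds
allocation = allocate_bits(levels_db, thresholds_db)
print(allocation)
```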

20

Example Use of the Frequency Masking Model in MP3

• The levels of the first 16 of the 32 sub-bands are:

| Band | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Level (dB) | 0 | 8 | 12 | 10 | 6 | 2 | 10 | 60 | 35 | 20 | 15 | 2 | 3 | 5 | 3 | 1 |

• The signal level in the 7th band is 10 dB (< 12 dB from a precomputed masking model), so ignore it.

• The signal level in the 9th band is 35 dB (> 15 dB from a precomputed masking model), so send it.

• Only the signals above the masking level need to be sent, so instead of using 6 bits to encode bands 7-9, we can use 4 bits: a saving of 2 bits.
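The send-or-ignore decision in this example can be written out directly. The band levels and the two masking values (12 dB for band 7, 15 dB for band 9) are the ones quoted on the slide. The variable and function names are illustrative.

```python
# Levels of the first 16 sub-bands, keyed by band number (from the slide).
levels = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]
level_db = {band: level for band, level in enumerate(levels, start=1)}

# Masking levels from the precomputed model, for the two neighbors of band 8.
masking_db = {7: 12, 9: 15}


def should_send(band):
    """A band is sent only if its level exceeds the masking level."""
    return level_db[band] > masking_db[band]


decisions = {band: should_send(band) for band in masking_db}
for band, send in decisions.items():
    action = "send" if send else "ignore"
    print(f"band {band}: {level_db[band]} dB vs mask {masking_db[band]} dB -> {action}")
```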

21

Steps in MP3

From: http://www.iis.fhg.de/amm/techinf/layer3/index.html

22

Typical Performance Data for MP3

| Quality | Bandwidth | Mono or Stereo? | kbits/s | Compression |
|---|---|---|---|---|
| Telephone | 2.5 kHz | M | 8 | 96:1 |
| Short-Wave + | 4.5 kHz | M | 16 | 48:1 |
| AM Radio + | 7.5 kHz | M | 32 | 24:1 |
| FM Radio | 11 kHz | S | 56 - 64 | 26 - 24:1 |
| Near-CD | 15 kHz | S | 96 | 16:1 |
| CD | 15+ kHz | S | 112 - 128 | 14 - 12:1 |

From: http://www.iis.fhg.de/amm/techinf/layer3/index.html
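The compression column appears to be computed against 16-bit PCM sampled at 48 kHz. That reference is an inference from the numbers, not something the table states, and the function name below is hypothetical. Under that assumption the ratios can be reproduced as follows.

```python
REFERENCE_RATE_HZ = 48_000  # assumption: reference sampling rate
REFERENCE_BITS = 16         # assumption: reference word length


def compression_ratio(kbits_per_s, channels):
    """Ratio of the assumed uncompressed PCM rate to the coded rate."""
    pcm_kbits_per_s = REFERENCE_RATE_HZ * REFERENCE_BITS * channels / 1000
    return pcm_kbits_per_s / kbits_per_s


# (quality, channels, coded rate in kbits/s) rows taken from the table.
rows = [
    ("Telephone", 1, 8),
    ("Short-Wave +", 1, 16),
    ("AM Radio +", 1, 32),
    ("Near-CD", 2, 96),
    ("CD", 2, 128),
]
for quality, channels, rate in rows:
    print(f"{quality:13s} {compression_ratio(rate, channels):5.1f}:1")
```

The single-rate rows match the table exactly under this assumption. The lower ends of the two ranges (56 and 112 kbits/s) come out slightly different from the tabulated 26:1 and 14:1, so those entries look rounded.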

23

Conclusions

• Which EE and CSE areas do MP3 and related audio compression use?
– Digital signal processing, which provides guaranteed accuracy, perfect reproducibility, and new approaches which simply couldn’t be done in the analog world.
– Information theory, especially rate/distortion theory
– Psychoacoustics for perceptual coding
– Hardware for inexpensive real-time decoders
– Dense memories, flash or disk, for storing lots of songs
– Software and network engineering for efficient and low-latency approaches and for file sharing
– Clever human interfaces for indexing and selecting songs
– Digital rights management to try to make you pay for songs
