MPEG Standards for Audio

8/19/2019 MPEG Standards for Audio

1/46

ECN-516

Topic: MPEG Standards for Audio


2/46

Psychoacoustics

These methods are related to how humans actually hear sounds:

Human hearing and voice

Frequency range is about 20 Hz to 20 kHz, most sensitive at 1 to 5 kHz. Dynamic range (quietest to loudest) is about 96 dB.

Normal voice range is about 500 Hz to 2 kHz

Low frequencies are vowels and bass

High frequencies are consonants
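
The 96 dB dynamic range quoted above lines up with 16-bit PCM audio, since each bit of a uniform quantizer adds roughly 6 dB. A minimal Python sketch of that arithmetic follows; the function name is illustrative and does not come from the slides.

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantizer step, expressed in dB."""
    return 20 * math.log10(2 ** bits)

# 16-bit samples give a little over 96 dB; each extra bit adds about 6 dB.
print(round(dynamic_range_db(16), 1))
```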


3/46

Human Hearing and Voice

Experiment:

Put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the threshold.

[Figure: threshold of hearing, level in dB (0 to 40) versus frequency in kHz (2 to 12)]


4/46

Psychoacoustics

How sensitive is human hearing?

To answer this question we look at the following concepts:

Threshold of hearing

Describes the notion of “quietness”

Frequency Masking

A component (at a particular frequency) masks components at neighboring frequencies. Such masking may be partial.

Temporal Masking

When two tones (samples) are played close together in time, one can mask the other.


5/46

Threshold of hearing

The ear is most sensitive to frequencies between 1 and 5

kHz, where we can actually hear signals below 0 dB.

Two tones of equal power and different frequencies will

not be equally loud.

Sensitivity decreases at low and high frequencies.

[Figure: threshold of hearing, level in dB (0 to 40) versus frequency in kHz (2 to 12). Components below the curve will not be heard anyway; discard!]
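
The shape of the threshold curve is often approximated with a closed-form fit attributed to Terhardt. The sketch below uses that fit as an assumption, since the slides give no formula, and `threshold_in_quiet_db` is a hypothetical name. It reproduces the behavior described above: a dip below 0 dB in the 1 to 5 kHz region and reduced sensitivity at low and high frequencies.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt-style fit, assumed)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# Print the approximate threshold at a few frequencies.
for f in (100, 1000, 3300, 12000):
    print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB")
```

A component whose level falls below this value at its frequency is a candidate for discarding.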


6/46

Frequency Masking

Question: Do receptors interfere with each other?

Experiment: Play 1 kHz tone (masking tone) at fixed level (60 dB).

Play test tone at a different frequency (e.g., 1.1 kHz), and raise level until just distinguishable.

Vary the frequency of the test tone and plot the threshold when it becomes audible:

Repeat for various frequencies of masking tones


7/46

Frequency Masking

A tone at a certain frequency will raise the threshold in a

critical band around that frequency.

The masker raises the threshold of audibility so that the

adjacent tone above it is no longer audible.
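
One common way to model how a masker raises the threshold at neighboring frequencies is a spreading function on the critical-band (Bark) scale. The sketch below uses the Schroeder spreading function as an assumed model. The slides do not say which function is meant, and `spreading_db` is a hypothetical name.

```python
import math

def spreading_db(dz):
    """Schroeder spreading function (assumed model).

    dz is the distance in Bark from the masker to the test tone,
    positive when the test tone lies above the masker in frequency.
    Returns the masking level relative to its peak, in dB.
    """
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1 + (dz + 0.474) ** 2)

# Relative masking one and two Bark either side of the masker.
for dz in (-2, -1, 0, 1, 2):
    print(dz, round(spreading_db(dz), 1))
```

In this model the masking falls off more slowly above the masker than below it, so tones just above a loud component are the most likely to be hidden.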


8/46

Critical bands

Perceptually uniform measure of frequency, non-proportional to width of masking curve

About 100 Hz for masking frequencies below 500 Hz, growing larger and larger above 500 Hz.

The width is called the size of the critical band


9/46

Critical bands

The human auditory system has a limited, frequency-dependent resolution.

This frequency dependence is expressed in the form of critical band widths, less than 100 Hz for low and more than 4 kHz for high frequencies.

The human ear blurs the various signal components inside a critical band.
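
Critical band widths like these can be approximated with the analytic fits published by Zwicker and Terhardt. The sketch below is an assumption layered on the slides, which give no formula, and the function names are illustrative.

```python
import math

def bark(freq_hz):
    """Map frequency in Hz to critical-band rate in Bark (Zwicker-Terhardt fit)."""
    return 13 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth in Hz around a center frequency."""
    return 25 + 75 * (1 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

# Bandwidth stays near 100 Hz at low frequencies and grows to several kHz.
for f in (100, 500, 1000, 5000, 15000):
    print(f, "Hz:", round(bark(f), 1), "Bark,", round(critical_bandwidth_hz(f)), "Hz wide")
```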


10/46

Temporal Masking

If we hear a loud sound, and then it stops, it takes a little while until we can hear a soft tone nearby (in frequency).

Question: how to quantify?

Experiment:

Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. Test tone can't be heard (it's masked).

Stop masking tone, then stop test tone after a short delay.

Adjust delay time to the shortest time that test tone can be heard (e.g., 5 ms).

Repeat with different level of the test tone and plot:

Try other frequencies for test tone (masking tone duration constant).

[Figure: total effect of masking]

The temporal masking effect is the masking that occurs when a sound

raises the audibility threshold for a brief interval preceding and

following the sound.


11/46

Temporal Masking

[Figure: energy versus time around a strong sound (the "masker"). Backward (pre) masking lasts < 10 ms; forward (post) masking lasts approx. 100 ms.]


12/46

Observation

If we have a loud tone at, say, 1 kHz, then nearby quieter

tones are masked.

Best compared on critical band scale - range of masking is about 1 critical band

Two factors for masking - frequency masking and temporal

masking

Question: How to use this for compression?


13/46

MPEG

Moving Picture Experts Group (MPEG)

Established in 1988

Standards under

International Organization for Standardization (ISO)

International Electrotechnical Commission (IEC)

Official name is: ISO/IEC JTC1 SC29 WG11


14/46

MPEG

First High Fidelity Audio standard

Part of a multiple standard for

Video compression

Audio compression

Audio, Video and Data synchronization at an aggregate rate of 1.5

Mbit/sec


15/46

MPEG Audio

Physically lossy compression algorithm

Perceptually lossless, transparent algorithm

Exploits perceptual properties of human ear

Psychoacoustic modeling


16/46

MPEG Audio Standard

Ensures inter-operability

Defines coded bitstream syntax

Defines decoding process

Guarantees decoder’s accuracy


17/46

MPEG audio features

No assumptions about the nature of the audio source

Exploitation of human auditory system perceptual

limitations

Removal of perceptually irrelevant parts of audio signal


18/46

MPEG audio sampling rates

32 kHz

44.1 kHz

48 kHz


19/46

MPEG Audio Overview

Facts

The two most common advanced (beyond simple ADPCM) techniques for audio coding are:

Sub-Band Coding (SBC) based

Adaptive Transform Coding based

MPEG audio coding is comprised of three independent layers. Each layer is a self-contained SBC coder with its own time-frequency mapping, psychoacoustic model, and quantizer.

Layer I: Uses sub-band coding

Layer II: Uses sub-band coding (longer frames, more compression)

Layer III: Uses both sub-band coding and transform coding.

MPEG-1 Audio is intended to take a PCM audio signal sampled at a

rate of 32, 44.1 or 48 kHz, and encode it at a bit rate of 32 to 192 kbps

per audio channel (depending on layer).


20/46

MPEG Audio Compression


21/46


22/46


23/46

MPEG Coding Specifics

[Figure: the audio samples feed 32 sub-band filters (0 to 31), each of which outputs groups of 12 samples. One group of 12 samples per sub-band forms a Layer I frame; three groups per sub-band form a Layer II or III frame.]
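
The frame sizes implied by this structure can be worked out directly: 32 sub-bands times 12 samples gives a Layer I frame, and Layers II and III group three such blocks. The sketch below is a small illustration with hypothetical function names. It only counts samples and says nothing about the header or side information in a real frame.

```python
SUBBANDS = 32           # outputs of the filter bank
SAMPLES_PER_GROUP = 12  # samples taken from each sub-band per group

def samples_per_frame(layer):
    """PCM samples covered by one frame of the given layer (1, 2 or 3)."""
    if layer not in (1, 2, 3):
        raise ValueError("layer must be 1, 2 or 3")
    groups = 1 if layer == 1 else 3
    return SUBBANDS * SAMPLES_PER_GROUP * groups

def frame_duration_ms(layer, sample_rate_hz):
    """Playback time represented by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame(layer) / sample_rate_hz

for layer in (1, 2, 3):
    print(layer, samples_per_frame(layer), frame_duration_ms(layer, 48000))
```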


24/46

The Polyphase Filter Bank

Key component common to all layers

Divides the audio signal into 32 equal-width frequency

subbands

The filters provide good time and reasonable frequency

resolution

Critical bands associated with psychoacoustic models
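
Because the 32 subbands have equal width, their size depends only on the sampling rate. The sketch below computes that width and the subband a given frequency falls into. The function names are hypothetical, and the overlap between adjacent filters in a real polyphase bank is ignored.

```python
def subband_width_hz(sample_rate_hz, subbands=32):
    """Width of each equal-width subband covering 0 Hz to the Nyquist frequency."""
    return (sample_rate_hz / 2) / subbands

def subband_index(freq_hz, sample_rate_hz, subbands=32):
    """Index (0-based) of the subband that nominally contains freq_hz."""
    width = subband_width_hz(sample_rate_hz, subbands)
    return min(int(freq_hz // width), subbands - 1)

print(subband_width_hz(48000))     # width of one subband at 48 kHz sampling
print(subband_index(1000, 48000))  # subband holding a 1 kHz tone
```

At 48 kHz each subband is 750 Hz wide, far wider than a critical band of about 100 Hz at low frequencies, so the lowest subband spans several critical bands.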


25/46


26/46

MPEG Audio Psycho-acoustic Model

MPEG audio compresses by removing acoustically

irrelevant parts of audio signals

Takes advantage of the human auditory system's inability to hear quantization noise under auditory masking

Analyzes the audio signal and computes the amount of

noise masking as a function of frequency

The encoder decides how best to represent the input signal with a minimum number of bits


27/46

Basic Steps in Psychoacoustic Model

Time align audio data

Convert audio to frequency domain representation

Process spectral values into tonal and non-tonal components

Apply a spreading function

Set a lower bound for threshold values

Find the threshold values for each subband

Calculate the signal to mask ratio


28/46

Masking and Quantization (Example)

Say, performing the sub-band filtering step on the input results in the following values (for demonstration, we are only looking at the first 16 of the 32 bands):

Band         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Level (dB)   0   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1

The 60 dB level of the 8th band gives a masking of 12 dB in the 7th band and 15 dB in the 9th (according to the psychoacoustic model).

The level in 7th band is 10 dB ( < 12 dB ), so ignore it.

The level in 9th band is 35 dB ( > 15 dB ), so send it.

We only send the amount above the masking level.

Therefore, instead of using 6 bits to encode it, we can use 4 bits, a saving of 2 bits (= 12 dB).

“determine number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect” [noise introduced = 12 dB; masking = 15 dB]
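
The decision in this example can be sketched in a few lines of Python. The sketch assumes the rule of thumb of about 6 dB per bit and takes the masking levels (12 dB in band 7, 15 dB in band 9) as given inputs rather than computing them from a psychoacoustic model. `bits_for` and `allocate` are hypothetical names.

```python
import math

DB_PER_BIT = 6.0  # rule of thumb: each bit covers about 6 dB

def bits_for(level_db):
    """Bits needed to cover a level of level_db at about 6 dB per bit."""
    return math.ceil(level_db / DB_PER_BIT) if level_db > 0 else 0

def allocate(levels_db, masking_db):
    """Return {band: bits}; bands at or below their masking level get 0 bits."""
    bits = {}
    for band, level in enumerate(levels_db, start=1):
        mask = masking_db.get(band, 0)
        # Only the part of the signal above the masking level is sent.
        bits[band] = 0 if level <= mask else bits_for(level - mask)
    return bits

levels = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]
masking = {7: 12, 9: 15}  # masking caused by the 60 dB level in band 8
result = allocate(levels, masking)
print(result[7], result[9], bits_for(35))
```

Band 7 is dropped, and band 9 needs 4 bits for the 20 dB above the masking level instead of the 6 bits needed for the full 35 dB.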


29/46

MPEG Audio Layer I

Simplest coding

Suitable for bit rates above 128 kbits/sec per channel

Philips Digital Compact Cassette


30/46

MPEG Audio Layer II

Intermediate complexity

Bit rates around 128 kbits/sec per channel

Digital Audio Broadcasting (DAB)

Synchronized Video and Audio on CD-ROM

Full motion CD-I

Video-CD


31/46


32/46

MPEG Layer III coding

Based on Layer I & II filter banks

Compensation of filter deficiencies by processing outputs with a Modified Discrete Cosine Transform
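
For reference, the Modified Discrete Cosine Transform maps a block of 2N samples to N coefficients. The sketch below is a direct, unoptimized evaluation of the textbook definition, not the Layer III implementation: it omits the window, the block switching, and the overlap between successive blocks. The 36-sample block length mirrors the long-block case commonly described for Layer III and is used here only as an illustration.

```python
import math

def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients (no window)."""
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number")
    n_half = two_n // 2
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n, x in enumerate(block):
            phase = math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5)
            acc += x * math.cos(phase)
        coeffs.append(acc)
    return coeffs

# 36 samples from one subband give 18 spectral coefficients.
samples = [math.sin(2 * math.pi * 3 * n / 36) for n in range(36)]
print(len(mdct(samples)))
```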


33/46

MPEG Layer III enhancements

Alias reduction

Non uniform quantization

Scalefactor bands

Entropy coding of data values

Use of a “bit reservoir”


34/46

Effectiveness of MPEG Audio

*Quality factor:

5 – perfect

4 - just noticeable

3 - slightly annoying

2 – annoying

1 - very annoying

16-bit samples at 48 kHz => 768 kbits/sec per channel

Layer I: 192 kbits/sec => Compression ratio of (768/192) = 4:1

Layer II: 128 kbits/sec => Compression ratio of (768/128) = 6:1

Layer III: 64 kbits/sec => Compression ratio of (768/64) = 12:1

Layer       Target bit rate per channel   Ratio   Quality* at 64 kbps   Quality* at 128 kbps
Layer I     192 kbps                      4:1     --                    --
Layer II    128 kbps                      6:1     2.1 to 2.6            4+
Layer III   64 kbps                       12:1    3.6 to 3.8            4+
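
The ratios above follow from the uncompressed PCM rate of one channel, 16 bits per sample at 48 kHz. A minimal sketch of the arithmetic, with illustrative function names:

```python
def pcm_kbps(bits_per_sample, sample_rate_hz):
    """Uncompressed bit rate of one PCM channel in kbits/sec."""
    return bits_per_sample * sample_rate_hz / 1000

def compression_ratio(pcm_rate_kbps, coded_rate_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_rate_kbps / coded_rate_kbps

pcm = pcm_kbps(16, 48000)
for layer, target in (("Layer I", 192), ("Layer II", 128), ("Layer III", 64)):
    print(layer, f"{compression_ratio(pcm, target):.0f}:1")
```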


35/46

MPEG 1

First standard to be published by the MPEG organization

(in 1992)

A standard for storage and retrieval of moving pictures and

audio on storage media

Example formats: VideoCD (VCD), mp3, mp2


36/46

MPEG-1 Layers I, II, III

MPEG layer differences lie in processing power and

resulting audio/sound quality

Mp1 – little processing needed, poor quality

Mp2 – minimal processing, “okay” quality

Mp3 – massive processing, high “CD” quality


37/46

MPEG-1 Audio Layer II

Called MP2

Dominant standard for audio broadcasting

DAB digital radio and DVB digital television

Sampling rates: 32, 44.1, 48 kHz

Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps

Format: mono, stereo, dual channel, …

MP2 – sub-band audio encoder in time domain


38/46

MPEG-1 Audio Layer III

MPEG-1 Layer III is called MP3 format

Popular for PC and Internet applications

Goal to compress to 128 kbps, but can be compressed to higher or lower resulting quality

Utilization of psychoacoustics

Scientific study of sound perception


39/46

MPEG-1 Audio Encoding

Characteristics

Precision 16 bits

Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz

3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)

Layer 3: 32-320 kbps, target 64 kbps




40/46

MPEG-2

Extends video & audio compression of MPEG-1

Substantially reduces bandwidth required for high-quality transmissions

Optimizes balance between resolution (quality) and bandwidth (speed)

HDTV (Grand Alliance)

ITU-R HDTV

International Telecommunication Union Radiocommunication Sector

16/9 ASPECT RATIO

Audio: Dolby AC-3

DVB HDTV (Digital Video Broadcasting)

4/3 ASPECT RATIO

MPEG audio layer 2


41/46

MPEG-2 Advanced Audio Coding (AAC) codec

Sampling frequencies from 8 kHz to 96 kHz

1 to 48 channels per stream

Temporal Noise Shaping (TNS) smooths quantization noise by making frequency domain predictions

Prediction: Allows predictable sound patterns such as

speech to be predicted and compressed with better quality


42/46

MPEG-4

Submergence

Handle specific requirements from rapidly developing multimedia applications

Advantages over MPEG-1 and MPEG-2

Object-oriented coding

Applications: Digital TV

TV logos, Customized advertising, Multi-window screen

Mobile multimedia Cell phones and palm computers

Games

Personalize games

Streaming Video

News updates and live music shows over Internet


43/46

MPEG 7

Content representation standard for information search

Makes searching the Web for multimedia content as easy as searching for text-only files

Operates in both real-time and non real-time environments


44/46

MPEG 21

“Multimedia framework”

Based on two essential concepts:

1. Digital Item

2. Concept of Users interacting with Digital Item

More universal framework for digital content protection


45/46

MPEG Standards

MPEG-1 : a standard for storage and retrieval of moving pictures and

audio on storage media

MPEG-2 : a standard for digital television

MPEG-4 : a standard for multimedia applications

MPEG-7 : a content representation standard for information search

MPEG-21: offers metadata information for audio and video files


46/46

Reference

“A Tutorial on MPEG/Audio Compression”, Davis Pan, IEEE Multimedia, pp. 60-74, 1995.
