8/19/2019 MPEG Standards for Audio
1/46
ECN-516
Topic: MPEG Standards for Audio
8/19/2019 MPEG Standards for Audio
2/46
Psychoacoustics
These methods are related to how humans actually hear sounds:
Human hearing and voice
Frequency range is about 20 Hz to 20 kHz, most sensitive at 1 to 5 kHz. Dynamic range (quietest to loudest) is about 96 dB.
Normal voice range is about 500 Hz to 2 kHz
Low frequencies are vowels and bass
High frequencies are consonants
Human Hearing and Voice
Experiment:
Put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the threshold level against frequency.
[Figure: threshold of audibility, level (dB) versus frequency (kHz)]
Psychoacoustics
How sensitive is human hearing?
To answer this question we look at the following concepts:
Threshold of hearing
Describes the notion of “quietness”
Frequency Masking
A component (at a particular frequency) masks components at neighboring frequencies. Such masking may be partial.
Temporal Masking
When two tones (samples) are played close together in time, one can mask the other.
Threshold of hearing
The ear is most sensitive to frequencies between 1 and 5
kHz, where we can actually hear signals below 0 dB.
Two tones of equal power and different frequencies will
not be equally loud.
Sensitivity decreases at low and high frequencies.
[Figure: threshold-of-hearing curve, level (dB) versus frequency (kHz); components below the curve will not be heard anyway, so discard them.]
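The slides show the threshold-of-hearing curve only as a figure. A widely used closed-form approximation of it is Terhardt's formula, which is not part of these slides. The minimal Python sketch below assumes that formula and a single tone in silence, evaluates the threshold, and applies the "discard what will not be heard" rule; the function names are illustrative.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f = f_hz / 1000.0  # work in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(f_hz, level_db):
    """A lone tone is kept only if it lies above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(f_hz)

for f in (100, 1000, 3300, 10000, 15000):
    print(f"{f:>6} Hz: threshold ~ {threshold_in_quiet_db(f):5.1f} dB")
```

Under this approximation the threshold dips slightly below 0 dB around 3 to 4 kHz and rises steeply at low and high frequencies, matching the shape described above.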
Frequency Masking
Question: Do receptors interfere with each other?
Experiment: Play 1 kHz tone (masking tone) at fixed level (60 dB).
Play a test tone at a different frequency (e.g., 1.1 kHz), and raise its level
until it is just distinguishable.
Vary the frequency of the test tone and plot the threshold at which it becomes audible.
Repeat for various frequencies of masking tones.
Frequency Masking
A tone at a certain frequency will raise the threshold in a
critical band around that frequency.
The masker raises the threshold of audibility so that the
adjacent tone above it is no longer audible.
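A rough sketch of a single masker raising the threshold around it. It assumes a hypothetical triangular masking curve on the Bark (critical-band) scale, steeper below the masker than above it, with slope and offset constants chosen only for illustration; the real MPEG psychoacoustic models use different, level-dependent spreading functions. The Bark conversion is Zwicker's approximation, not part of these slides.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker's approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Hypothetical spreading parameters, chosen only for illustration.
LOWER_SLOPE_DB_PER_BARK = 27.0  # threshold falls steeply below the masker
UPPER_SLOPE_DB_PER_BARK = 10.0  # and more gently above it (upward spread of masking)
OFFSET_DB = 10.0                # peak of the masking curve sits below the masker level

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Masked threshold (dB) at probe_hz produced by a single tonal masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz >= 0 else LOWER_SLOPE_DB_PER_BARK
    return masker_db - OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone lies below the masker's threshold curve."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

print(is_masked(1000, 60, 1100, 40))  # probe just above the masker
print(is_masked(1000, 60, 900, 40))   # same probe level just below the masker
```

With these assumed constants, a 60 dB masker at 1 kHz hides a 40 dB tone at 1.1 kHz but not the same tone at 900 Hz, illustrating that masking reaches further toward higher frequencies.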
Critical bands
Perceptually uniform (but non-linear) measure of frequency, tied to the width of the masking curve
About 100 Hz for masking frequencies < 500 Hz, growing larger and larger above 500 Hz
The width is called the size of the critical band
Critical bands
The human auditory system has a limited, frequency-dependent resolution.
This frequency dependence is expressed in the form of critical band widths: less than 100 Hz for low frequencies and more than 4 kHz for high frequencies.
The human ear blurs the various signal components inside a critical band.
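To put numbers on these critical-band widths, here is a small sketch using two widely cited approximations: Zwicker's Bark-scale formula and Zwicker and Terhardt's critical-bandwidth formula. Neither formula comes from these slides, and the function names are illustrative.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker's approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Zwicker and Terhardt's approximation of the critical bandwidth centred at f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for f in (100, 500, 1000, 4000, 10000, 16000):
    print(f"{f:>6} Hz: {hz_to_bark(f):5.2f} Bark, "
          f"critical bandwidth ~ {critical_bandwidth_hz(f):5.0f} Hz")
```

Under these approximations the bandwidth is about 100 Hz at 100 Hz and exceeds 4 kHz near 16 kHz, in line with the figures above.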
Temporal Masking
If we hear a loud sound, and then it stops, it takes a little while until we can hear a soft tone nearby (in frequency).
Question: how to quantify?
Experiment:
Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. The test tone can't be heard (it's masked).
Stop masking tone, then stop test tone after a short delay.
Adjust delay time to the shortest time that test tone can be heard (e.g., 5 ms).
Repeat with different levels of the test tone and plot the results.
Try other frequencies for the test tone (keeping the masking tone duration constant).
[Figure: total effect of masking]
The temporal masking effect is the masking that occurs when a sound
raises the audibility threshold for a brief interval preceding and
following the sound.
Temporal Masking
[Figure: energy versus time around a strong sound (the "masker"): backward (pre-) masking extends less than 10 ms before it; forward (post-) masking extends about 100 ms after it.]
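A minimal sketch of the time windows in this figure, assuming fixed window lengths read off it (under 10 ms before the masker, about 100 ms after). In reality these durations depend on the levels and frequencies involved, and a tone inside the window is only masked if it is also quiet enough; the names are illustrative.

```python
# Window lengths read off the figure; treated as fixed here (a simplification).
BACKWARD_MS = 10.0   # pre-masking: less than 10 ms before the masker starts
FORWARD_MS = 100.0   # post-masking: about 100 ms after the masker stops

def in_temporal_masking_window(masker_start_ms, masker_end_ms, t_ms):
    """True if time t_ms lies inside the masker or its pre/post-masking window."""
    return masker_start_ms - BACKWARD_MS <= t_ms <= masker_end_ms + FORWARD_MS

# A masker playing from 100 ms to 300 ms:
for t in (85, 95, 200, 350, 450):
    print(t, in_temporal_masking_window(100, 300, t))
```

The asymmetry of the window (short before, long after) is why an encoder can hide more noise after a loud onset than before it.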
Observation
If we have a loud tone at, say, 1 kHz, then nearby quieter
tones are masked.
Best compared on the critical band scale: the range of masking is about 1 critical band
Two factors for masking - frequency masking and temporal
masking
Question: How to use this for compression?
MPEG
Moving Picture Experts Group (MPEG)
Established in 1988
Standards issued under
International Organization for Standardization (ISO)
International Electrotechnical Commission (IEC)
Official name is: ISO/IEC JTC1 SC29 WG11
MPEG
First High Fidelity Audio standard
Part of a multi-part standard for
Video compression
Audio compression
Audio, video and data synchronization at an aggregate rate of 1.5 Mbit/s
MPEG Audio
Physically lossy compression algorithm
Perceptually lossless, transparent algorithm
Exploits perceptual properties of human ear
Psychoacoustic modeling
MPEG Audio Standard
Ensures inter-operability
Defines coded bitstream syntax
Defines decoding process
Guarantees decoder’s accuracy
MPEG audio features
No assumptions about the nature of the audio source
Exploitation of human auditory system perceptual
limitations
Removal of perceptually irrelevant parts of audio signal
MPEG audio sampling rates
32 kHz
44.1 kHz
48 kHz
MPEG Audio Overview
Facts
The two most common advanced (beyond simple ADPCM) techniques for audio coding are:
Sub-Band Coding (SBC) based
Adaptive Transform Coding based
MPEG audio coding consists of three independent layers. Each layer is a self-contained SBC coder with its own time-frequency mapping, psychoacoustic model, and quantizer.
Layer I: Uses sub-band coding
Layer II: Uses sub-band coding (longer frames, more compression)
Layer III: Uses both sub-band coding and transform coding.
MPEG-1 Audio is intended to take a PCM audio signal sampled at a
rate of 32, 44.1 or 48 kHz, and encode it at a bit rate of 32 to 192 kbps
per audio channel (depending on layer).
MPEG Audio Compression
MPEG Coding Specifics
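[Figure: audio samples are fed to sub-band filters 0 to 31; the output of each filter is grouped into blocks of 12 samples. A Layer I frame takes one 12-sample block from each sub-band (32 × 12 = 384 samples); a Layer II or III frame takes three consecutive 12-sample blocks from each sub-band (32 × 36 = 1152 samples).]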
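The frame layout in this figure can be expressed as simple arithmetic. This sketch assumes MPEG-1 audio (32 sub-bands, 12-sample blocks); the helper names are hypothetical.

```python
SUBBANDS = 32           # outputs of the polyphase filter bank
SAMPLES_PER_BLOCK = 12  # samples per sub-band per block

def frame_size(layer):
    """PCM samples per channel in one MPEG-1 audio frame (layer = 1, 2 or 3)."""
    blocks = 1 if layer == 1 else 3  # Layer I: one 12-sample block; Layers II/III: three
    return SUBBANDS * SAMPLES_PER_BLOCK * blocks

def frame_duration_ms(layer, sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size(layer) / sample_rate_hz

def split_into_blocks(subband_samples):
    """Group one sub-band's output into consecutive 12-sample blocks (drops a partial tail)."""
    usable = len(subband_samples) - len(subband_samples) % SAMPLES_PER_BLOCK
    return [subband_samples[i:i + SAMPLES_PER_BLOCK]
            for i in range(0, usable, SAMPLES_PER_BLOCK)]

for layer in (1, 2, 3):
    print(f"Layer {layer}: {frame_size(layer)} samples, "
          f"{frame_duration_ms(layer, 48000):.0f} ms at 48 kHz")
```

At 48 kHz a Layer II or III frame of 1152 samples lasts 24 ms. Layer III further splits each sub-band's 36 samples into two granules of 18 for its MDCT, which this sketch does not model.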
The Polyphase Filter Bank
Key component common to all layers
Divides the audio signal into 32 equal-width frequency
subbands
The filters provide good time and reasonable frequency
resolution
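The equal-width sub-bands do not match the critical bands used by the psychoacoustic model: at low frequencies, one sub-band spans several critical bands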
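Because the 32 sub-bands have equal width, each one covers half the sampling rate divided by 32 (750 Hz at 48 kHz). The sketch below compares that width with the Zwicker and Terhardt critical-bandwidth approximation (an assumption, not part of these slides) at a few sub-band centres. It does not implement the polyphase filters themselves, whose 512-tap prototype window is tabulated in the standard.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker and Terhardt's approximation of the critical bandwidth at f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def subband_edges(sample_rate_hz, n_bands=32):
    """(low, high) edges in Hz of n equal-width sub-bands covering 0 Hz to Nyquist."""
    width = sample_rate_hz / 2.0 / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]

edges = subband_edges(48000)
width = edges[0][1] - edges[0][0]  # 750 Hz per sub-band at 48 kHz
for k in (0, 1, 10, 31):
    low, high = edges[k]
    centre = (low + high) / 2
    print(f"sub-band {k:2d}: {low:6.0f}-{high:6.0f} Hz, "
          f"critical bandwidth at centre ~ {critical_bandwidth_hz(centre):5.0f} Hz")
```

The lowest sub-band is several critical bands wide, while at high frequencies a single critical band covers several sub-bands.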
MPEG Audio Psycho-acoustic Model
MPEG audio compresses by removing acoustically
irrelevant parts of audio signals
Takes advantage of the human auditory system's inability to hear quantization noise under auditory masking
Analyzes the audio signal and computes the amount of noise masking as a function of frequency
The encoder decides how best to represent the input signal with a minimum number of bits
Basic Steps in Psychoacoustic Model
Time align audio data
Convert audio to frequency domain representation
Process spectral values into tonal and non-tonal components
Apply a spreading function
Set a lower bound for threshold values
Find the threshold values for each subband
Calculate the signal to mask ratio
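The later steps (spreading, lower bound, threshold, signal-to-mask ratio) can be sketched in a few lines if steps 1 to 3 are assumed already done, i.e. the signal is given as a list of spectral components (frequency, level in dB). The spreading function, its slopes and the 10 dB offset are hypothetical simplifications, not the values of the MPEG psychoacoustic models; contributions are added in the power domain, and the threshold in quiet (Terhardt's approximation) serves as the lower bound.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker's approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB)."""
    f = f_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def spread_db(dz):
    """Hypothetical triangular spreading function: attenuation (dB) at Bark distance dz."""
    slope = 27.0 if dz < 0 else 10.0  # steeper below the masker than above it
    return -slope * abs(dz)

def signal_to_mask_ratios(freqs_hz, levels_db, offset_db=10.0):
    """Return the SMR (dB) of each spectral component against all the others."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    smrs = []
    for i, f in enumerate(freqs_hz):
        # Lower bound: start from the threshold in quiet (as a power, not dB).
        power = 10 ** (threshold_in_quiet_db(f) / 10)
        # Add every other component's spread masking contribution in the power domain.
        for j, level in enumerate(levels_db):
            if j != i:
                mask_db = level - offset_db + spread_db(barks[i] - barks[j])
                power += 10 ** (mask_db / 10)
        global_threshold_db = 10 * math.log10(power)
        smrs.append(levels_db[i] - global_threshold_db)
    return smrs

smr = signal_to_mask_ratios([1000.0, 1100.0], [60.0, 40.0])
print([round(v, 1) for v in smr])
```

For a 60 dB component at 1 kHz and a 40 dB component at 1.1 kHz, the second gets a negative SMR (it lies below the masking threshold and needs no bits), while the first gets a large positive SMR.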
Masking and Quantization (Example)
Say performing the sub-band filtering step on the input results in the following values (for demonstration, we look only at the first 16 of the 32 bands):
Band         1  2   3   4  5  6   7   8   9  10  11  12  13  14  15  16
Level (dB)   0  8  12  10  6  2  10  60  35  20  15   2   3   5   3   1
The 60 dB level of the 8th band gives a masking of 12 dB in the 7th band and 15 dB in the 9th (according to the psychoacoustic model).
The level in the 7th band is 10 dB (< 12 dB), so ignore it.
The level in the 9th band is 35 dB (> 15 dB), so send it.
We only send the amount above the masking level.
Therefore, instead of using 6 bits to encode it, we can use 4 bits, a saving of 2 bits (= 12 dB, since each bit is worth about 6 dB).
“Determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect” [noise introduced = 12 dB; masking = 15 dB]
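The bit arithmetic of this example can be reproduced with the slide's rule of thumb of about 6 dB per bit. A minimal sketch; the function name and the choice to round up to whole bits are assumptions for illustration.

```python
import math

DB_PER_BIT = 6.0  # rule of thumb from the example (exactly 20*log10(2) ~ 6.02 dB)

def bits_needed(level_db, mask_db=0.0):
    """Bits needed so quantization noise stays below the mask; 0 if the band is masked."""
    above = level_db - mask_db
    if above <= 0:
        return 0  # masked: do not send this band
    return math.ceil(above / DB_PER_BIT)

levels = {7: 10, 8: 60, 9: 35}  # band -> level (dB), from the table above
masks = {7: 12, 9: 15}          # masking produced by band 8, from the example

for band in (7, 9):
    print(band, bits_needed(levels[band]), bits_needed(levels[band], masks[band]))
```

It gives 6 bits for band 9 without masking, 4 bits when only the 20 dB above the 15 dB mask is coded, and 0 bits for band 7, matching the example.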
MPEG Audio Layer I
Simplest coding
Suitable for bit rates above 128 kbits/sec per channel
Philips Digital Compact Cassette
MPEG Audio Layer II
Intermediate complexity
Bit rates around 128 kbits/sec per channel
Digital Audio Broadcasting (DAB)
Synchronized Video and Audio on CD-ROM
Full motion CD-I
Video-CD
MPEG Layer III coding
Based on Layer I & II filter banks
Compensation of filter deficiencies by processing outputs with a Modified Discrete Cosine Transform (MDCT)
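For illustration, a direct (slow, O(N²)) MDCT and inverse MDCT with a sine window and 50% overlap-add. N = 18 matches the 18 coefficients Layer III computes per sub-band for long blocks, but the rest of the Layer III pipeline (alias reduction, block switching) is not modelled. The sketch shows time-domain aliasing cancellation: the aliasing each block introduces is cancelled by its neighbours, so the signal is reconstructed up to round-off. Function names are illustrative.

```python
import math
import random

def mdct(x):
    """MDCT: 2N input samples -> N coefficients (direct O(N^2) evaluation)."""
    n = len(x) // 2
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.
    The 2/N scale assumes a window with w[i]**2 + w[i+N]**2 == 1."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]

def sine_window(length):
    """Sine window; satisfies the power-complementary condition above."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def analyse_synthesise(signal, n=18):
    """Window -> MDCT -> IMDCT -> window -> overlap-add, hop n, block length 2n."""
    w = sine_window(2 * n)
    # Pad so every original sample is covered by two overlapping blocks.
    padded = [0.0] * n + list(signal) + [0.0] * (n + (-len(signal)) % n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        block = [w[i] * padded[start + i] for i in range(2 * n)]
        aliased = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += w[i] * aliased[i]
    return out[n:n + len(signal)]

random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(72)]
restored = analyse_synthesise(signal)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # floating-point round-off only
```

Each block on its own is aliased; only the overlap-add of neighbouring blocks restores the input, which is why the MDCT can be critically sampled (N new coefficients per N new samples) despite its 50% overlap.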
MPEG Layer III enhancements
Alias reduction
Non uniform quantization
Scalefactor bands
Entropy coding of data values
Use of a “bit reservoir”
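Of the enhancements listed, non-uniform quantization is the easiest to show in isolation. A sketch, assuming the Layer III power-law form (magnitudes raised to the 3/4 power before rounding, with a 0.0946 rounding offset, and raised to the 4/3 power on decoding); the step size here is a free parameter rather than being derived from the bitstream's global gain and scalefactors.

```python
import math

def quantize(x, step):
    """Power-law quantizer: |x|/step raised to 3/4, rounded with a 0.0946 offset."""
    q = int((abs(x) / step) ** 0.75 + 0.5 - 0.0946)
    return q if x >= 0 else -q

def dequantize(q, step):
    """Inverse: |q| raised to 4/3, rescaled, with the sign restored."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)

for x in (1.0, 10.0, 100.0, 1000.0):
    q = quantize(x, 1.0)
    print(x, q, round(dequantize(q, 1.0), 2))
```

Large values are quantized with coarser steps than small ones, so the quantization noise grows with signal amplitude, where it is more likely to be masked.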
Effectiveness of MPEG Audio
*Quality factor:
5 – perfect
4 - just noticeable
3 - slightly annoying
2 – annoying
1 - very annoying
16-bit samples at 48 kHz => 768 kbit/s per channel (1536 kbit/s for stereo)
Layer I: 192 kbit/s => compression ratio of 768/192 = 4:1
Layer II: 128 kbit/s => compression ratio of 768/128 = 6:1
Layer III: 64 kbit/s => compression ratio of 768/64 = 12:1
Layer       Target bit rate   Ratio   Quality* at   Quality* at
            per channel               64 kbps       128 kbps
Layer I     192 kbps          4:1     --            --
Layer II    128 kbps          6:1     2.1 to 2.6    4+
Layer III   64 kbps           12:1    3.6 to 3.8    4+
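The compression ratios above follow from dividing the PCM bit rate by the coded bit rate. A short sketch, assuming per-channel rates as in the table; the names are illustrative.

```python
def pcm_bit_rate_kbps(bits_per_sample=16, sample_rate_hz=48000, channels=1):
    """Uncompressed PCM bit rate in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000

def compression_ratio(coded_kbps, **pcm):
    """PCM bit rate divided by the coded bit rate (same channel count)."""
    return pcm_bit_rate_kbps(**pcm) / coded_kbps

for layer, rate in (("I", 192), ("II", 128), ("III", 64)):
    print(f"Layer {layer}: {compression_ratio(rate):.0f}:1")
```

The same function gives the stereo figures if both the PCM rate (channels=2) and the coded rate are taken for two channels.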
MPEG 1
First standard to be published by the MPEG organization
(in 1992)
A standard for storage and retrieval of moving pictures and
audio on storage media
Example formats: VideoCD (VCD), mp3, mp2
MPEG-1 Layers I, II, III
MPEG layer differences lie in processing power and
resulting audio/sound quality
MP1 – little processing needed, poor quality
MP2 – minimal processing, “okay” quality
MP3 – massive processing, high “CD” quality
MPEG-1 Audio Layer II
Called MP2
Dominant standard for audio broadcasting
DAB digital radio and DVB digital television
Sampling rates: 32, 44.1, 48 kHz
Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps
Format: mono, stereo, dual channel, …
MP2 – sub-band audio encoder in time domain
MPEG-1 Audio Layer III
MPEG-1 Layer III is called MP3 format
Popular for PC and Internet applications
Goal is to compress to 128 kbps, but higher or lower bit rates can be used, with correspondingly higher or lower quality
Utilization of psychoacoustics
Scientific study of sound perception
MPEG-1 Audio Encoding
Characteristics
Precision 16 bits
Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)
Layer 3: 32-320 kbps, target 64 kbps
Layer 2: 32-384 kbps, target 128 kbps
Layer 1: 32-448 kbps, target 192 kbps
MPEG-2
Extends video & audio compression of MPEG-1
Substantially reduces bandwidth required for high-quality transmissions
Optimizes balance between resolution (quality) and bandwidth (speed)
HDTV (Grand Alliance)
ITU-R HDTV (ITU-R: International Telecommunication Union Radiocommunication Sector)
16:9 aspect ratio
Audio: Dolby AC-3
DVB HDTV (DVB: Digital Video Broadcasting)
4:3 aspect ratio
Audio: MPEG audio Layer II
MPEG-2 Advanced Audio Coding (AAC) codec
Sampling frequencies from 8 kHz to 96 kHz
1 to 48 channels per stream
Temporal Noise Shaping (TNS) shapes the quantization noise in time by applying prediction across frequency
Prediction: Allows predictable sound patterns such as
speech to be predicted and compressed with better quality
MPEG-4
Submergence
Handle specific requirements from rapidly developing multimedia applications
Advantages over MPEG-1 and MPEG-2
Object-oriented coding
Applications:
Digital TV: TV logos, customized advertising, multi-window screen
Mobile multimedia: cell phones and palm computers
Games: personalized games
Streaming video: news updates and live music shows over the Internet
MPEG 7
Content representation standard for information search
Makes searching the Web for multimedia content as easy as searching for text-only files
Operates in both real-time and non real-time environments
MPEG 21
“Multimedia framework”
Based on two essential concepts:
1. The Digital Item
2. Users interacting with Digital Items
More universal framework for digital content protection
MPEG Standards
MPEG-1 : a standard for storage and retrieval of moving pictures and
audio on storage media
MPEG-2 : a standard for digital television
MPEG-4 : a standard for multimedia applications
MPEG-7 : a content representation standard for information search
MPEG-21: a multimedia framework built on Digital Items and the users who interact with them