
MPEG Standards for Audio

Jul 07, 2018


rajnish kumar
  • 8/19/2019 MPEG Standards for Audio

    1/46

    ECN-516

    Topic: MPEG Standards for Audio


    Psychoacoustics

    These methods are related to how humans actually hear sounds:

    Human hearing and voice

    Frequency range is about 20 Hz to 20 kHz, most sensitive at 1 to 5 kHz. Dynamic range (quietest to loudest) is about 96 dB.

    Normal voice range is about 500 Hz to 2 kHz

    Low frequencies are vowels and bass

    High frequencies are consonants
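    The 96 dB figure is a ratio expressed on a logarithmic scale. A minimal Python sketch (the helper names are illustrative, not from the slides) converts between dB and amplitude ratio; as an aside, the 16-bit PCM mentioned later in these slides spans 2^16 amplitude levels, which also works out to about 96 dB.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Level difference in dB for an amplitude (pressure) ratio: 20*log10(ratio)."""
    return 20 * math.log10(ratio)

def db_to_amplitude_ratio(db):
    """Inverse conversion: amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

# A 96 dB dynamic range: the loudest sound has ~63,000x the amplitude of the quietest.
print(round(db_to_amplitude_ratio(96)))           # 63096
# 16-bit PCM has 2**16 quantization levels.
print(round(amplitude_ratio_to_db(2 ** 16), 1))   # 96.3
```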


    Human Hearing and Voice

    Experiment:

    Put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the threshold.

    [Figure: threshold of audibility in dB (0 to 40) versus frequency in kHz (2 to 12)]


    Psychoacoustics

    How sensitive is human hearing?

    To answer this question we look at the following concepts:

    Threshold of hearing

    Describes the notion of “quietness”

    Frequency Masking

    A component (at a particular frequency) masks components at neighboring frequencies. Such masking may be partial.

    Temporal Masking

    When two tones (samples) are played close together in time, one can mask the other.


    Threshold of hearing

    The ear is most sensitive to frequencies between 1 and 5 kHz, where we can actually hear signals below 0 dB.

    Two tones of equal power and different frequencies will not be equally loud.

    Sensitivity decreases at low and high frequencies.

    [Figure: threshold of hearing in dB (0 to 40) versus frequency in kHz (2 to 12); components below the curve “will not be heard anyway; discard!”]
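    The threshold curve is often approximated in closed form. The sketch below assumes Terhardt's approximation of the threshold in quiet (in dB SPL), a formula from the psychoacoustics literature rather than from these slides, and discards components that fall below it. The function names are hypothetical.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL), f_hz > 0."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_components(components):
    """Keep only (frequency_hz, level_db) pairs that lie above the threshold in quiet."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]

print(round(threshold_in_quiet_db(1000), 1))  # 3.4
print(round(threshold_in_quiet_db(3300), 1))  # -5.0 (below 0 dB, as the slide notes)
print(audible_components([(100, 10.0), (1000, 10.0), (15000, 40.0)]))  # [(1000, 10.0)]
```

    The same 10 dB component is audible at 1 kHz but not at 100 Hz, which is the "discard" rule from the figure applied numerically.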


    Frequency Masking

    Question: Do receptors interfere with each other?

    Experiment: Play a 1 kHz tone (masking tone) at a fixed level (60 dB).

    Play a test tone at a different frequency (e.g., 1.1 kHz), and raise its level until it is just distinguishable.

    Vary the frequency of the test tone and plot the threshold at which it becomes audible.

    Repeat for various frequencies of masking tones.


    Frequency Masking

    A tone at a certain frequency will raise the threshold in a critical band around that frequency.

    The masker raises the threshold of audibility so that the adjacent tone above it is no longer audible.
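    One common way to model this raised threshold is a spreading function over the critical-band (Bark) scale. The sketch below assumes the Schroeder spreading function from the perceptual-coding literature and a hypothetical fixed offset between the masker level and the masking threshold; neither is given in these slides, and the MPEG psychoacoustic models define their own spreading and offset rules.

```python
import math

def schroeder_spread_db(dz):
    """Schroeder spreading function in dB; dz = maskee minus masker distance in Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def masked_threshold_db(masker_db, dz, offset_db=10.0):
    """Threshold raised by a single masker at Bark distance dz.

    offset_db is a hypothetical masker-to-threshold offset chosen for illustration.
    """
    return masker_db + schroeder_spread_db(dz) - offset_db

# A 60 dB masker: threshold one Bark above vs. one Bark below the masker.
print(round(masked_threshold_db(60, +1), 1))  # 45.7
print(round(masked_threshold_db(60, -1), 1))  # 42.1
```

    The asymmetry (higher threshold one Bark above the masker than one Bark below) reflects that masking spreads further toward higher frequencies.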


    Critical bands

    Perceptually uniform measure of frequency, non-proportional to the width of the masking curve

    About 100 Hz for masking frequencies below 500 Hz; the bands grow larger and larger above 500 Hz.

    The width is called the size of the critical band.
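    The growth of the critical band with frequency is often summarized by Zwicker and Terhardt's analytic approximation. The sketch below assumes that formula (it is not part of the slides); the function name is illustrative.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker-Terhardt approximation of the critical bandwidth (Hz) around f_hz."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

for f in (100, 500, 1000, 4000, 10000):
    print(f, round(critical_bandwidth_hz(f)))
```

    Near 100 Hz the formula gives roughly 100 Hz, in line with the slide, while at 10 kHz the band is more than 2 kHz wide.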


    Critical bands

    The human auditory system has a limited, frequency-dependent resolution.

    This frequency dependence is expressed in the form of critical bandwidths: less than 100 Hz for low frequencies and more than 4 kHz for high frequencies.

    The human ear blurs the various signal components inside a critical band.
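    Counting frequency in critical bands gives the Bark scale, on which one unit corresponds to one critical band. A minimal sketch, assuming Zwicker and Terhardt's frequency-to-Bark approximation (not given in the slides) and a hypothetical helper that treats two components less than 1 Bark apart as falling in the same critical band:

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def same_critical_band(f1_hz, f2_hz):
    """Hypothetical helper: components closer than 1 Bark are 'blurred' together."""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) < 1.0

print(round(hz_to_bark(1000), 1))      # 8.5
print(same_critical_band(1000, 1100))  # True
print(same_critical_band(1000, 2000))  # False
```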


    Temporal Masking

    If we hear a loud sound and then it stops, it takes a little while until we can hear a soft tone nearby (in frequency).

    Question: how to quantify?

    Experiment:

    Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. The test tone can't be heard (it's masked).

    Stop the masking tone, then stop the test tone after a short delay.

    Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms).

    Repeat with different levels of the test tone and plot the results.

    Try other frequencies for the test tone (masking tone duration constant).

    [Figure: total effect of masking]

    The temporal masking effect is the masking that occurs when a sound raises the audibility threshold for a brief interval preceding and following the sound.


    [Figure: energy versus time for a strong sound (the “masker”); backward (pre-) masking lasts < 10 ms before it, forward (post-) masking approx. 100 ms after it]
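    Using the approximate durations from the figure, a toy classifier can label when a short test sound falls inside a masker's temporal masking window. This is a minimal sketch under a strong simplifying assumption: the windows are treated as hard on/off intervals, whereas real pre- and post-masking fade gradually and depend on the levels involved. The function name is hypothetical.

```python
PRE_MASKING_MS = 10.0    # backward (pre-) masking: "< 10 ms" in the figure
POST_MASKING_MS = 100.0  # forward (post-) masking: "approx. 100 ms" in the figure

def masking_region(t_ms, masker_start_ms, masker_end_ms):
    """Return the masking region that instant t_ms falls in, or None if unmasked.

    Simplification: the windows are hard intervals, not gradual decays.
    """
    if masker_start_ms <= t_ms <= masker_end_ms:
        return "simultaneous"
    if masker_start_ms - PRE_MASKING_MS <= t_ms < masker_start_ms:
        return "pre-masking"
    if masker_end_ms < t_ms <= masker_end_ms + POST_MASKING_MS:
        return "post-masking"
    return None

# Masker sounding from 100 ms to 200 ms.
for t in (85, 95, 150, 250, 320):
    print(t, masking_region(t, 100, 200))
```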


    Observation

    If we have a loud tone at, say, 1 kHz, then nearby quieter tones are masked.

    Best compared on the critical band scale: the range of masking is about 1 critical band.

    Two factors for masking: frequency masking and temporal masking

    Question: How to use this for compression?


    MPEG

    Moving Picture Experts Group (MPEG)

    Established in 1988

    Standards under 

    International Organization for Standardization (ISO)

    International Electrotechnical Commission (IEC)

    Official name is: ISO/IEC JTC1 SC29 WG11


    MPEG

    First High Fidelity Audio standard

    Part of a multi-part standard for

    Video compression

    Audio compression

    Audio, video and data synchronization at an aggregate rate of 1.5 Mbit/sec


    MPEG Audio

    Physically lossy compression algorithm

    Perceptually lossless, transparent algorithm

    Exploits perceptual properties of the human ear

    Psychoacoustic modeling


    MPEG Audio Standard

    Ensures inter-operability

    Defines coded bitstream syntax

    Defines decoding process

    Guarantees decoder’s accuracy


    MPEG audio features

    No assumptions about the nature of the audio source

    Exploitation of human auditory system perceptual limitations

    Removal of perceptually irrelevant parts of audio signal


    MPEG audio sampling rates

    32 kHz

    44.1 kHz

    48 kHz


    MPEG Audio Overview

    Facts

    The two most common advanced (beyond simple ADPCM) techniques for audio coding are:

    Sub-Band Coding (SBC) based

    Adaptive Transform Coding based

    MPEG audio coding comprises three independent layers. Each layer is a self-contained SBC coder with its own time-frequency mapping, psychoacoustic model, and quantizer.

    Layer I: Uses sub-band coding

    Layer II: Uses sub-band coding (longer frames, more compression)

    Layer III: Uses both sub-band coding and transform coding.

    MPEG-1 Audio is intended to take a PCM audio signal sampled at a rate of 32, 44.1 or 48 kHz, and encode it at a bit rate of 32 to 192 kbps per audio channel (depending on layer).


    MPEG Audio Compression


    MPEG Coding Specifics

    [Figure: audio samples pass through sub-band filters 0, 1, 2, ..., 31; each filter's output is grouped into blocks of 12 samples. A Layer I frame holds one 12-sample block per sub-band; a Layer II or III frame holds three 12-sample blocks per sub-band.]
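    The frame structure on this slide fixes how many PCM samples each frame represents: 32 sub-bands times 12 samples gives 384 samples per Layer I frame, and three such blocks give 1152 samples per Layer II or III frame. A small arithmetic sketch with illustrative helper names:

```python
SUBBANDS = 32
SAMPLES_PER_BLOCK = 12  # each sub-band filter output is grouped into blocks of 12 samples

def samples_per_frame(layer):
    """PCM samples per frame: one 12-sample block per sub-band (Layer I) or three (II, III)."""
    if layer not in (1, 2, 3):
        raise ValueError("layer must be 1, 2 or 3")
    blocks = 1 if layer == 1 else 3
    return SUBBANDS * SAMPLES_PER_BLOCK * blocks

def frame_duration_ms(layer, fs_hz):
    """Duration of one frame in milliseconds at sampling rate fs_hz."""
    return 1000.0 * samples_per_frame(layer) / fs_hz

print(samples_per_frame(1), samples_per_frame(3))  # 384 1152
print(frame_duration_ms(1, 48000))                 # 8.0
print(round(frame_duration_ms(3, 44100), 2))       # 26.12
```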


    The Polyphase Filter Bank

    Key component common to all layers

    Divides the audio signal into 32 equal-width frequency subbands

    The filters provide good time resolution and reasonable frequency resolution

    Critical bands associated with psychoacoustic models
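    Because the 32 sub-bands have equal width, each one spans (fs/2)/32 Hz, far wider than the roughly 100 Hz critical bands at low frequencies. A minimal sketch with hypothetical helper names computes the sub-band width and the sub-band that contains a given frequency:

```python
NUM_SUBBANDS = 32

def subband_width_hz(fs_hz):
    """Each of the 32 equal-width sub-bands covers (fs/2)/32 Hz."""
    return fs_hz / 2.0 / NUM_SUBBANDS

def subband_index(f_hz, fs_hz):
    """Index (0-31) of the equal-width sub-band containing frequency f_hz."""
    if not 0 <= f_hz < fs_hz / 2.0:
        raise ValueError("frequency must lie in [0, fs/2)")
    return int(f_hz // subband_width_hz(fs_hz))

print(subband_width_hz(48000))     # 750.0
print(subband_index(100, 48000))   # 0
print(subband_index(1000, 48000))  # 1
```

    At 48 kHz the lowest sub-band (0 to 750 Hz) therefore spans several critical bands, so the equal-width filter bank does not follow the critical bands by itself.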


    MPEG Audio Psycho-acoustic Model

    MPEG audio compresses by removing acoustically irrelevant parts of audio signals

    Takes advantage of the human auditory system's inability to hear quantization noise under auditory masking

    Analyzes the audio signal and computes the amount of noise masking as a function of frequency

    The encoder decides how best to represent the input signal with a minimum number of bits


    Basic Steps in Psychoacoustic Model

    Time align audio data

    Convert audio to frequency domain representation

    Process spectral values into tonal and non-tonal components

    Apply a spreading function

    Set a lower bound for threshold values

    Find the threshold values for each subband

    Calculate the signal to mask ratio
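    The last three steps can be sketched in a few lines. This is a simplified illustration, not the standard's exact procedure: the individual masking thresholds and the threshold in quiet are assumed to be given per spectral line (in dB), the global threshold combines them by adding powers (so the threshold in quiet acts as the lower bound), and each sub-band's SMR is taken as its strongest spectral line minus its lowest global threshold. All names are hypothetical.

```python
import math

def db_to_power(db):
    return 10.0 ** (db / 10.0)

def power_to_db(power):
    return 10.0 * math.log10(power)

def global_threshold_db(masking_thresholds_db, quiet_threshold_db):
    """Combine individual masking thresholds (one list per masker) with the
    threshold in quiet by summing their powers, spectral line by spectral line."""
    return [
        power_to_db(sum(db_to_power(m[i]) for m in masking_thresholds_db)
                    + db_to_power(quiet_threshold_db[i]))
        for i in range(len(quiet_threshold_db))
    ]

def smr_per_subband_db(signal_db, threshold_db, lines_per_subband):
    """SMR per sub-band: strongest spectral line minus lowest global threshold."""
    smr = []
    for start in range(0, len(signal_db), lines_per_subband):
        stop = start + lines_per_subband
        smr.append(max(signal_db[start:stop]) - min(threshold_db[start:stop]))
    return smr

# Toy input: 4 spectral lines grouped into 2 sub-bands, one masker.
quiet = [0.0, 0.0, 0.0, 0.0]
masking = [[10.0, 10.0, 0.0, 0.0]]
signal = [40.0, 30.0, 5.0, 20.0]
thr = global_threshold_db(masking, quiet)
print([round(s, 2) for s in smr_per_subband_db(signal, thr, 2)])  # [29.59, 16.99]
```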


    Masking and Quantization (Example)

    Say that performing the sub-band filtering step on the input results in the following values (for demonstration, we are only looking at the first 16 of the 32 bands):

    Band         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Level (dB)   0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1

    The 60 dB level of the 8th band gives a masking of 12 dB in the 7th band and 15 dB in the 9th (according to the psychoacoustic model).

    The level in the 7th band is 10 dB (< 12 dB), so ignore it.

    The level in the 9th band is 35 dB (> 15 dB), so send it.

    We only send the amount above the masking level.

    Therefore, instead of using 6 bits to encode it, we can use 4 bits, a saving of 2 bits (= 12 dB). At roughly 6 dB per bit, the full 35 dB needs 6 bits, while the 20 dB above the mask needs only 4.

    “Determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect” [noise introduced = 12 dB; masking = 15 dB]
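    The arithmetic in this example follows the rule of thumb that each bit of resolution is worth about 6 dB (20·log10 2 ≈ 6.02 dB). A minimal sketch that reproduces it; only the two masking values given on the slide are used, and every other band is treated as unmasked purely for illustration. The helper names are hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB of dynamic range per bit

# Levels (dB) of the first 16 sub-bands, from the slide's table.
LEVELS_DB = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]
# Masking caused by the 60 dB tone in band 8 (only the slide's two values).
MASK_DB = {7: 12, 9: 15}

def bits_for(level_db):
    """Bits needed to cover a dynamic range of level_db at ~6.02 dB per bit."""
    return max(0, math.ceil(level_db / DB_PER_BIT))

def allocate(band):
    """Bits for a 1-based band: 0 if masked, else enough for the part above the mask."""
    level = LEVELS_DB[band - 1]
    mask = MASK_DB.get(band, 0)  # assumption: bands without a given mask are unmasked
    return 0 if level <= mask else bits_for(level - mask)

print(bits_for(LEVELS_DB[8]))  # 6 bits without masking (band 9, 35 dB)
print(allocate(9))             # 4 bits: only 35 - 15 = 20 dB is sent
print(allocate(7))             # 0: 10 dB is below the 12 dB mask
```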


    MPEG Audio Layer I

    Simplest coding

    Suitable for bit rates above 128 kbits/sec per channel

    Philips Digital Compact Cassette


    MPEG Audio Layer II

    Intermediate complexity

    Bit rates around 128 kbits/sec per channel

    Digital Audio Broadcasting (DAB)

    Synchronized Video and Audio on CD-ROM

    Full motion CD-I

    Video-CD


    MPEG Layer III coding

    Based on Layer I & II filter banks

    Compensation of filter deficiencies by processing the filter outputs with a Modified Discrete Cosine Transform (MDCT)
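    An MDCT turns 2N windowed input samples into N coefficients, and overlapping consecutive blocks by N samples lets the inverse transform cancel the time-domain aliasing. The sketch below is a direct O(N²) MDCT/IMDCT pair with a sine window and N = 18, chosen to mirror the 36-sample long blocks of Layer III. It leaves out block switching, alias reduction, and the hybrid connection to the polyphase filter bank, and its scaling convention (2/N on the inverse) is an assumption made so that overlap-add reconstructs the input exactly.

```python
import math

def sine_window(length):
    """Sine window; for length 2N it satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Direct MDCT: 2N input samples -> N coefficients."""
    n_out = len(block) // 2
    return [sum(x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n, x in enumerate(block))
            for k in range(n_out)]

def imdct(coeffs):
    """Direct inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_in = len(coeffs)
    return [2.0 / n_in * sum(c * math.cos(math.pi / n_in * (n + 0.5 + n_in / 2) * (k + 0.5))
                             for k, c in enumerate(coeffs))
            for n in range(2 * n_in)]

def mdct_round_trip(signal, n=18):
    """Analyze with 50%-overlapping windowed MDCTs, then resynthesize by overlap-add.

    len(signal) must be a multiple of n; n=18 mirrors Layer III long blocks.
    """
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n  # every sample is covered by two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [window[i] * padded[start + i] for i in range(2 * n)]
        recon = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += window[i] * recon[i]  # aliasing cancels between neighbours
    return out[n:n + len(signal)]

x = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(72)]
y = mdct_round_trip(x)
print(max(abs(a - b) for a, b in zip(x, y)))  # close to 0 (rounding error only)
```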


    MPEG Layer III enhancements

    Alias reduction

    Non-uniform quantization

    Scalefactor bands

    Entropy coding of data values

    Use of a “bit reservoir”
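    The idea behind non-uniform quantization can be shown with a power-law quantizer: magnitudes are raised to the 3/4 power before rounding and to the 4/3 power on reconstruction, so larger values get coarser steps. This is a simplified sketch; the derivation of the step size from global gain and scalefactors, and the exact rounding rule used by real encoders, are omitted, and the `step` parameter is a hypothetical stand-in.

```python
import math

def quantize(values, step):
    """Power-law quantization: round(|x/step|^(3/4)), sign restored."""
    return [int(math.copysign(round((abs(x) / step) ** 0.75), x)) for x in values]

def dequantize(indices, step):
    """Inverse mapping: |q|^(4/3) * step, sign restored."""
    return [math.copysign(abs(q) ** (4.0 / 3.0) * step, q) for q in indices]

print(quantize([0.0, 1.0, 8.0, -8.0], step=1.0))  # [0, 1, 5, -5]
levels = dequantize([1, 2, 10, 11], step=1.0)
# The gap between adjacent reconstruction levels grows with magnitude.
print(round(levels[1] - levels[0], 2), round(levels[3] - levels[2], 2))  # 1.52 2.92
```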


    Effectiveness of MPEG Audio

    *Quality factor:

    5 - perfect

    4 - just noticeable

    3 - slightly annoying

    2 - annoying

    1 - very annoying

    16-bit samples at 48 kHz => 768 kbits/sec per channel (1536 kbits/sec for stereo)

    Layer I: 192 kbits/sec => compression ratio of (768/192) = 4:1

    Layer II: 128 kbits/sec => compression ratio of (768/128) = 6:1

    Layer III: 64 kbits/sec => compression ratio of (768/64) = 12:1

    Layer       Target bit rate per channel   Ratio   Quality* at 64 kbps   Quality* at 128 kbps
    Layer I     192 kbps                      4:1     --                    --
    Layer II    128 kbps                      6:1     2.1 to 2.6            4+
    Layer III   64 kbps                       12:1    3.6 to 3.8            4+
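    The ratios follow directly from the uncompressed PCM rate per channel (bits per sample × sampling rate). A short sketch with illustrative helper names; passing `channels=2` gives the stereo figure of 1536 kbit/s.

```python
def pcm_bitrate_kbps(bits_per_sample, fs_hz, channels=1):
    """Uncompressed PCM bit rate in kbit/s."""
    return bits_per_sample * fs_hz * channels / 1000.0

def compression_ratio(coded_kbps, bits_per_sample=16, fs_hz=48000, channels=1):
    """Ratio of the uncompressed PCM rate to the coded rate."""
    return pcm_bitrate_kbps(bits_per_sample, fs_hz, channels) / coded_kbps

for layer, kbps in (("Layer I", 192), ("Layer II", 128), ("Layer III", 64)):
    print(f"{layer}: {compression_ratio(kbps):g}:1")  # 4:1, 6:1, 12:1
```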


    MPEG-1

    First standard to be published by the MPEG organization (in 1992)

    A standard for storage and retrieval of moving pictures and audio on storage media

    Example formats: VideoCD (VCD), mp3, mp2


    MPEG-1 Layers I, II, III

    MPEG layer differences lie in processing power and resulting audio/sound quality

    MP1 – little processing needed, poor quality

    MP2 – minimal processing, “okay” quality

    MP3 – massive processing, high “CD” quality


    MPEG-1 Audio Layer II

    Called MP2

    Dominant standard for audio broadcasting

    DAB digital radio and DVB digital television

    Sampling rates: 32, 44.1, 48 kHz

    Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps

    Format: mono, stereo, dual channel, …

    MP2 – sub-band audio encoder in time domain


    MPEG-1 Audio Layer III

    MPEG-1 Layer III is called MP3 format

    Popular for PC and Internet applications

    Goal is to compress to 128 kbps, but audio can also be compressed to higher or lower bit rates, with correspondingly higher or lower quality

    Utilization of psychoacoustics

    Scientific study of sound perception


    MPEG-1 Audio Encoding

    Characteristics

    Precision: 16 bits

    Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz

    3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)

    Layer 3: 32-320 kbps, target 64 kbps

    Layer 2: 32-384 kbps, target 128 kbps

    Layer 1: 32-448 kbps, target 192 kbps
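    Given a layer, bit rate, and sampling frequency, the length of one coded frame follows from the frame sizes (384 or 1152 samples). This sketch assumes the commonly used MPEG-1 frame-length formulas: 4-byte slots for Layer I, 1-byte slots for Layers II and III, and an optional one-slot padding; the function name is illustrative.

```python
def frame_length_bytes(layer, bitrate_bps, fs_hz, padding=False):
    """Length of one MPEG-1 audio frame in bytes (whole slots, plus optional padding)."""
    if layer == 1:
        # 384 samples per frame in 4-byte slots: 384 / 8 / 4 = 12
        return (12 * bitrate_bps // fs_hz + int(padding)) * 4
    if layer in (2, 3):
        # 1152 samples per frame in 1-byte slots: 1152 / 8 = 144
        return 144 * bitrate_bps // fs_hz + int(padding)
    raise ValueError("layer must be 1, 2 or 3")

print(frame_length_bytes(3, 128_000, 44_100))                # 417
print(frame_length_bytes(3, 128_000, 44_100, padding=True))  # 418
print(frame_length_bytes(1, 192_000, 48_000))                # 192
```

    Because 44.1 kHz does not divide evenly into these rates, an encoder alternates padded and unpadded frames to keep the average bit rate on target.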


    MPEG-2

    Extends video & audio compression of MPEG-1

    Substantially reduces bandwidth required for high-quality transmissions

    Optimizes balance between resolution (quality) and bandwidth (speed)

    HDTV (Grand Alliance)

    ITU-R HDTV

    ITU-R: International Telecommunication Union Radiocommunication Sector

    16:9 aspect ratio

    Audio: Dolby AC-3

    DVB (Digital Video Broadcasting) HDTV

    4:3 aspect ratio

    MPEG audio Layer II


    MPEG-2 Advanced Audio Coding (AAC) codec

    Sampling frequencies from 8 kHz to 96 kHz

    1 to 48 channels per stream

    Temporal Noise Shaping (TNS) shapes the temporal envelope of the quantization noise by applying prediction in the frequency domain

    Prediction: allows predictable sound patterns such as speech to be predicted and compressed with better quality


    MPEG-4

    Submergence

    Handle specific requirements from rapidly developing multimedia applications

    Advantages over MPEG-1 and MPEG-2

    Object-oriented coding

    Applications: Digital TV

    TV logos, customized advertising, multi-window screens

    Mobile multimedia: cell phones and palm computers

    Games

    Personalized games

    Streaming Video

    News updates and live music shows over the Internet


    MPEG-7

    Content representation standard for information search

    Makes searching the Web for multimedia content as easy as searching for text-only files

    Operates in both real-time and non real-time environments


    MPEG-21

    “Multimedia framework”

    Based on two essential concepts:

    1. Digital Item

    2. Concept of Users interacting with Digital Items

    More universal framework for digital content protection


    MPEG Standards

    MPEG-1 : a standard for storage and retrieval of moving pictures and audio on storage media

    MPEG-2 : a standard for digital television

    MPEG-4 : a standard for multimedia applications

    MPEG-7 : a content representation standard for information search

    MPEG-21: a multimedia framework built around Digital Items and the users who interact with them


    Reference

    “A Tutorial on MPEG/Audio Compression”, Davis Pan, IEEE Multimedia, pp. 60-74, 1995.