Chapter 9: Audio Compression Standards
Jan 26, 2021

dariahiddleston
Transcript
  • Chapter 9: Audio Compression Standards

    Introduction

    Psychoacoustics model

    MPEG Audio

  • Fundamentals of Multimedia, Chapter 14

    Introduction

    • The basic idea of audio compression is to exploit areas where the human ear is less sensitive to sound.

    • A psychoacoustic model of hearing is used to evaluate audio content for possible compression. This exploits a number of limitations of the human ear, such as masking.

    • Psychoacoustics is the scientific study of sound perception. More specifically, it is the branch of science studying the psychological and physiological responses associated with sound (including speech and music). It can be further categorized as a branch of psychophysics.

  • Fundamentals of Multimedia, Chapter 14

    Introduction

    • Using this approach, sampled segments of the source audio waveform are analysed, but only those features that are perceptible to the ear are transmitted.

    • For example, although the human ear is sensitive to signals in the range 20 Hz to 20 kHz, its sensitivity varies non-linearly with frequency; that is, the ear is more sensitive to some signals than others.

  • Fundamentals of Multimedia, Chapter 14

    Introduction

    • MPEG audio compression exploits this perceptual phenomenon by simply giving up on the tones that cannot be heard anyway.

    • It uses the curve of human hearing perceptual sensitivity to decide when, and to what degree, frequency masking and temporal masking make some components of the music inaudible.

    • It then controls the quantization process so that these components do not influence the output.

  • Psychoacoustics

  • Fundamentals of Multimedia, Chapter 14

    Psychoacoustics

    • The range of normal human hearing is about 20 Hz to about 20 kHz

    • The frequency range of the voice is typically only from about 500 Hz to 4 kHz

    http://www.youtube.com/embed/qNf9nzvnd1k?rel=0

  • Fundamentals of Multimedia, Chapter 14

    Psychoacoustics

    Sensitivity of the ear: The dynamic range of the ear is the ratio of the loudest sound it can hear to the quietest sound it can hear (about 120 dB).

    The sensitivity of the ear varies with the frequency of the signal, as shown on the next slide.

    The ear is most sensitive to signals in the range 2-5 kHz; hence the quietest sounds the ear can detect lie in this band.

    In the figure, although signals A and B have the same relative amplitude, only signal A would be heard, because it is above the hearing threshold while B is below it.
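To give a feel for what a 120 dB dynamic range means, the sketch below converts a level difference in dB into linear ratios. It assumes the usual decibel conventions (20·log10 for amplitude, 10·log10 for power); the function names are illustrative and do not come from the slides.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Ratio of sound-pressure amplitudes for a level difference in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Ratio of sound powers (intensities) for a level difference in dB."""
    return 10 ** (db / 10)


# The ~120 dB dynamic range quoted above, expressed as linear ratios.
print(f"amplitude ratio: {db_to_amplitude_ratio(120):.0e}")
print(f"power ratio:     {db_to_power_ratio(120):.0e}")
```

Under these conventions, the loudest tolerable sound has roughly a million times the pressure amplitude of the quietest audible one.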

  • Fundamentals of Multimedia, Chapter 14

    Question: How sensitive is human hearing?

    • The sensitivity of the human ear with respect to frequency is given by the following graph.

  • Fundamentals of Multimedia, Chapter 14

    Threshold of Hearing

    • The threshold of hearing curve: if a sound is above the dB level shown then the sound is audible

    • Turning up a tone so that it equals or surpasses the curve means that we can then distinguish the sound

    • An approximate formula exists for this curve:

    $$\text{Threshold}(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4$$

    • The threshold is in dB and the frequency f is in Hz. The origin (0, 0) of the plotted curve corresponds to 2,000 Hz: Threshold(f) is approximately 0 at f = 2 kHz
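The threshold curve can be evaluated numerically. The sketch below assumes the commonly quoted form of this approximation, Threshold(f) = 3.64 (f/1000)^-0.8 - 6.5 e^(-0.6 (f/1000 - 3.3)^2) + 10^-3 (f/1000)^4, with f in Hz and the result in dB; the function name is illustrative.

```python
import math


def hearing_threshold_db(f_hz: float) -> float:
    """Approximate threshold of hearing (dB) for a pure tone at f_hz."""
    k = f_hz / 1000.0  # frequency in kHz
    return (
        3.64 * k ** -0.8
        - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
        + 1e-3 * k ** 4
    )


# Print the threshold at a few frequencies across the audible range.
for f in (100, 500, 1000, 2000, 3300, 10000, 16000):
    print(f"{f:>6} Hz: {hearing_threshold_db(f):7.2f} dB")
```

The printed values trace the familiar shape: a high threshold at low frequencies, a dip in the region of greatest sensitivity around 3 kHz, and a steep rise towards the top of the audible range.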

  • Fundamentals of Multimedia, Chapter 14

    Threshold of Hearing

    Fig. 14.2: Threshold of human hearing, for pure tones

  • Fundamentals of Multimedia, Chapter 14

    Psychoacoustics

    Frequency masking: When multiple signals are present in audio, a strong signal may reduce the ear's sensitivity to other signals that are near it in frequency.

    Temporal masking: After the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.

    A psychoacoustic model is used to identify the signals that are affected by masking; these are then eliminated from the transmitted signal, and compression is thereby achieved.

    Masking takes two forms: frequency masking and temporal masking.

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking

    • When a sound consists of multiple frequency components, the sensitivity of the ear changes and varies with the relative amplitudes of the components.

    • If two frequencies are close together and one has a lower amplitude than the other, the weaker one may not be heard.

    In compression, frequency masking means that frequency components of a signal that are masked in this way can be blocked, removed or ignored.

    http://www.youtube.com/embed/k6DVywW5NR4?rel=0

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking

    Conclusions from diagram:

    • Signal B is larger than signal A. This causes the basic sensitivity curve of the ear to be distorted in the region of signal B.

    • Signal A will no longer be heard, as it is within the distortion band.

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking

    • Lossy audio data compression methods, such as MPEG/Audio encoding, remove some sounds which are masked anyway, thus reducing the total amount of information.

    • The general situation in regard to masking is as follows:

    • A lower tone can effectively mask (make us unable to hear) a higher tone. The reverse effect is much weaker: tones can mask lower-frequency sounds, but not as effectively as they mask higher-frequency ones.

    • The greater the power in the masking tone, the wider is its influence, the broader the range of frequencies it can mask.

    • As a consequence, if two tones are widely separated in frequency then little masking occurs

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking Curves

    • Frequency masking is studied by playing a particular pure tone, say 1 kHz, at a loud volume, and determining how this tone affects our ability to hear tones nearby in frequency

    • One would generate a 1 kHz masking tone, at a fixed sound level of 60 dB, and then raise the level of a nearby tone, e.g., 1.1 kHz, until it is just audible

    • The threshold in Fig. 14.3 plots the audible level for a single masking tone (1 kHz)

    • Fig. 14.4 shows how the plot changes if other masking tones are used

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking Curves

    Fig. 14.3: Effect on threshold for 1 kHz masking tone

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking Curves

    Variation of the frequency masking effect with frequency: the masking effects of tones at 1, 4, and 8 kHz are shown in Fig. 14.4.

    • The width of the masking curve (the range of frequencies affected) increases with increasing frequency.

    • The width of each curve at a particular signal level is known as the critical bandwidth for that frequency.

    • In practice, if a signal can be decomposed into frequencies, then for frequencies that will be partially masked, only the audible part is used to set the quantization noise threshold.

    Fig. 14.4: Effect of masking tone at three different frequencies
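To make the growth of the critical bandwidth concrete, the sketch below uses a rule of thumb often quoted in psychoacoustics: roughly 100 Hz below 500 Hz, and about 20% of the centre frequency above that. This rule is an assumption brought in for illustration; the slides do not give a formula, and measured critical bandwidths differ in detail.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Rule-of-thumb critical bandwidth (assumed, not taken from the slides)."""
    if f_hz < 500:
        return 100.0       # roughly constant at low frequencies
    return 0.2 * f_hz      # roughly proportional to frequency above 500 Hz


# The three masking-tone frequencies discussed for Fig. 14.4.
for f in (1000, 4000, 8000):
    print(f"{f} Hz masker: critical bandwidth of about {critical_bandwidth_hz(f):.0f} Hz")
```

Even this crude model reproduces the qualitative point of the figure: the masking curve around an 8 kHz tone spans a much wider range of frequencies than the curve around a 1 kHz tone.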

  • Fundamentals of Multimedia, Chapter 14

    Temporal Masking

    • Temporal masking:

    • After the ear hears a loud sound, it takes a short while before it can hear a quieter sound.

    • Phenomenon:

    • Any loud tone will cause the hearing receptors in the inner ear to become saturated and require time to recover

  • Fundamentals of Multimedia, Chapter 14

    Temporal masking

    • After the ear hears a loud sound, it takes a short time before it can hear a quieter sound.

    • This is known as temporal masking.

    • After the loud sound ceases, it takes a short period of time for the signal amplitude to decay.

    • During this time, signals whose amplitudes are less than the decay envelope will not be heard and hence need not be transmitted.

    • To exploit this phenomenon, the input audio waveform must be processed over a time period that is comparable with that associated with temporal masking.

  • Fundamentals of Multimedia, Chapter 14

    Temporal masking caused by loud signal

  • Fundamentals of Multimedia, Chapter 14

    Example of Temporal Masking

    • Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. Test tone can’t be heard (it’s masked).

    • Stop the masking tone, then stop the test tone after a short delay.

    • Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms).

    • Repeat with different levels of the test tone and plot:

    • Fig. 14.6: The louder the test tone, the shorter the time it takes for our hearing to recover from the masking.

  • Fundamentals of Multimedia, Chapter 14

    Equal-Loudness Relations

    • When we play two pure tones (sinusoidal sound waves) with the same amplitude but different frequencies, one may sound louder than the other. Why?

    • Because the ear does not hear low or high frequencies as well as frequencies in the middle range. At normal sound volume levels, the ear is most sensitive to frequencies between 1 kHz and 5 kHz.

  • Fundamentals of Multimedia, Chapter 14

    Equal-Loudness Relations

    • Fletcher-Munson Curves

    • Equal-loudness curves display the relationship between perceived loudness (“Phons”, in dB) and stimulus sound volume (“Sound Pressure Level”, also in dB), as a function of frequency

    • Fig. 14.1 shows the ear’s perception of equal loudness:

    • The bottom curve shows what level of pure tone stimulus is required to produce the perception of a 10 dB sound

    • All the curves are arranged so that each perceived loudness level matches the loudness of a pure tone of that level at 1 kHz

    http://newt.phys.unsw.edu.au/jw/hearing.html

  • Fundamentals of Multimedia, Chapter 14

    • Fig. 14.1: Fletcher-Munson Curves (re-measured by Robinson and Dadson)

  • MPEG Audio

  • Fundamentals of Multimedia, Chapter 14

    MPEG Audio

    • MPEG audio compression takes advantage of psychoacoustic models, constructing a large multi-dimensional lookup table to transmit masked frequency components using fewer bits.

  • Fundamentals of Multimedia, Chapter 14

    MPEG Audio Overview

    1. The input audio stream passes through a filter bank to break it into its frequency components.

    Filter Bank: Divides the input into multiple sub-bands.
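As a rough illustration of what the filter bank produces, the sketch below computes the frequency range of each sub-band. It assumes 32 equal-width sub-bands covering 0 Hz up to half the sampling rate (the deck's later example mentions 32 bands; the equal-width layout and the 48 kHz sampling rate are assumptions here), and the function names are illustrative. It only computes band edges and does not implement the filtering itself.

```python
def subband_edges(sample_rate_hz: float, num_bands: int = 32):
    """Return (low, high) frequency edges in Hz for equal-width sub-bands."""
    width = (sample_rate_hz / 2) / num_bands  # audio occupies 0 .. fs/2
    return [(i * width, (i + 1) * width) for i in range(num_bands)]


def band_index(freq_hz: float, sample_rate_hz: float, num_bands: int = 32) -> int:
    """Zero-based index of the sub-band that contains freq_hz."""
    width = (sample_rate_hz / 2) / num_bands
    return min(int(freq_hz // width), num_bands - 1)


edges = subband_edges(48000)
print("first band:", edges[0])
print("last band: ", edges[-1])
print("1 kHz falls in band index", band_index(1000, 48000))
```

With these assumptions each sub-band is 750 Hz wide, so the low sub-bands are far wider than the ear's critical bands at those frequencies, which is one reason later layers refine the frequency analysis.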

  • Fundamentals of Multimedia, Chapter 14

    MPEG Audio Overview

    2. In parallel, the input audio stream passes through a psychoacoustic model.

    The Psychoacoustic Model: The key component of the MPEG encoder that enables its high performance. It analyzes the audio signal and computes the amount of noise masking that is available as a function of frequency.

  • Fundamentals of Multimedia, Chapter 14

    MPEG Audio Overview

    3. The bit or noise allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the sub-band signals to minimize the audibility of the quantization noise.
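The allocation step can be sketched as a greedy loop: repeatedly give one more bit to the sub-band whose quantization noise is most audible, until the noise is masked everywhere or the bit budget runs out. This is a simplified illustration, not the exact MPEG procedure. It assumes each extra bit lowers quantization noise by about 6 dB, it counts one bit per band per step rather than bits per frame, and the signal-to-mask ratios below are made-up values.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.0, max_bits_per_band=15):
    """Greedy bit allocation driven by signal-to-mask ratios (SMR, in dB)."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio per band: positive means the noise is still audible.
        nmr = [
            smr - db_per_bit * b if b < max_bits_per_band else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # quantization noise is at or below the mask in every band
        bits[worst] += 1
    return bits


# Hypothetical SMR values for four sub-bands and a budget of 8 bits.
print(allocate_bits([20.0, 5.0, -3.0, 12.0], total_bits=8))
```

Bands whose signal is already below the masking threshold (negative SMR) receive no bits at all, which is where the compression gain comes from.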

  • Fundamentals of Multimedia, Chapter 14

    MPEG Audio Overview

    4. Finally, the last block takes the representation of the quantized audio samples and formats the data into a decodable bit stream.

  • Fundamentals of Multimedia, Chapter 14

    MPEG Layers

    • MPEG audio offers three compatible layers:

    • Each succeeding layer is able to understand the lower layers

    • Each succeeding layer offers more complexity in the psychoacoustic model and better compression for a given level of audio quality

    • Each succeeding layer, with its increased compression effectiveness, is accompanied by extra delay

    • The objective of MPEG layers: a good tradeoff between quality and bit-rate

  • Fundamentals of Multimedia, Chapter 14

    MPEG Layers (cont’d)

    • Layer 1 quality can be quite good provided a comparatively high bit-rate is available

    • Digital Compact Cassette (DCC) uses Layer 1 at around 192 kbps per channel

    • Layer 2 has more complexity; was proposed for use in Digital Audio Broadcasting

    • Layer 3 (MP3) is most complex, and was originally aimed at audio transmission over ISDN lines

    • Most of the complexity increase is at the encoder, not the decoder – accounting for the popularity of MP3 players

  • Fundamentals of Multimedia, Chapter 14

    MPEG Audio Compression Algorithm

    • Fig. 14.9: Basic MPEG Audio encoder and decoder.

  • Fundamentals of Multimedia, Chapter 14

    • Fig. 14.11: MPEG Audio Frame Sizes
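The figure itself did not survive in this transcript. As a hedged illustration of what frame sizes imply, the sketch below assumes the MPEG-1 audio frame lengths of 384 samples for Layer 1 and 1152 samples for Layers 2 and 3 (these numbers are not stated on the slide) and derives the frame duration and the average frame size in bytes, ignoring padding. The function names are illustrative.

```python
# Assumed samples per frame for MPEG-1 audio (not taken from the slide).
SAMPLES_PER_FRAME = {"Layer 1": 384, "Layer 2": 1152, "Layer 3": 1152}


def frame_duration_ms(layer: str, sample_rate_hz: float) -> float:
    """Playing time of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / sample_rate_hz


def average_frame_bytes(layer: str, bitrate_bps: float, sample_rate_hz: float) -> float:
    """Average bytes per frame at a constant bit-rate (padding ignored)."""
    return SAMPLES_PER_FRAME[layer] * bitrate_bps / sample_rate_hz / 8


for layer in SAMPLES_PER_FRAME:
    print(f"{layer}: {frame_duration_ms(layer, 48000):.1f} ms per frame at 48 kHz")
print(f"Layer 3 at 128 kbps, 44.1 kHz: {average_frame_bytes('Layer 3', 128000, 44100):.1f} bytes")
```

Longer frames give the encoder more samples to analyse at once, which is consistent with the extra delay that the higher layers incur.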

  • Fundamentals of Multimedia, Chapter 14

    Example:

    • After analysis, the levels of the first 16 of the 32 bands are these:

    ----------------------------------------------------------------------
    Band        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Level (dB)  0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
    ----------------------------------------------------------------------

    • If the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band and 15 dB in the 9th.

    • Level in the 7th band is 10 dB (< 12 dB), so ignore it.

    • Level in the 9th band is 35 dB (> 15 dB), so send it.

    --> The 9th band can be encoded with up to 2 bits (= 12 dB) of quantization error, since noise below the 15 dB masking level is inaudible.
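The arithmetic of this example can be checked with a few lines of Python. The levels and masking values are those given in the example; the 6 dB-per-bit figure is an assumption implied by "2 bits (= 12 dB)", and the function names are illustrative.

```python
# Levels (dB) of the first 16 sub-bands, as given in the example (band 1 first).
levels_db = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]

# Masking (dB) that the 60 dB signal in band 8 produces in its neighbours.
masking_db = {7: 12, 9: 15}

DB_PER_BIT = 6  # assumption: each bit of quantization is worth about 6 dB


def classify(levels, masking):
    """Return {band: 'ignore' or 'send'} for every band that has a masking level."""
    return {
        band: "ignore" if levels[band - 1] < mask else "send"
        for band, mask in masking.items()
    }


def tolerable_error_bits(mask_db, db_per_bit=DB_PER_BIT):
    """Whole bits of quantization error that stay at or below the masking level."""
    return int(mask_db // db_per_bit)


print(classify(levels_db, masking_db))
print("bits of error tolerated in band 9:", tolerable_error_bits(masking_db[9]))
```

Band 7 is dropped because its 10 dB level is below the 12 dB mask, while band 9 is sent but can be quantized coarsely, since up to 12 dB of noise hides under its 15 dB mask.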
