Chapter 14 MPEG Audio Compression

Jan 26, 2021

  • Chapter 9 Audio Compression standards

    Introduction

    Psychoacoustics model

    MPEG Audio

  • Fundamentals of Multimedia, Chapter 14

    Introduction

    • The basic idea of audio compression is to exploit areas where the human ear is less sensitive to sound.

    • A psychoacoustic model of hearing is used to evaluate audio content for possible compression. This exploits a number of limitations of the human ear, such as masking.

    • Psychoacoustics is the scientific study of sound perception. More specifically, it is the branch of science studying the psychological and physiological responses associated with sound (including speech and music). It can be further categorized as a branch of psychophysics.

  • Fundamentals of Multimedia, Chapter 14

    Introduction

    • Using this approach, sampled segments of the source audio waveform are analysed – but only those features that are perceptible to the ear are transmitted.

    • For example, although the human ear is sensitive to signals in the range 20 Hz to 20 kHz, its sensitivity is not uniform across that range: the ear is more sensitive to some frequencies than to others.

  • Fundamentals of Multimedia, Chapter 14

    Introduction

    • MPEG audio compression exploits this kind of perceptual phenomenon by simply giving up on the tones that cannot be heard anyway.

    • It uses the curve of human hearing perceptual sensitivity to make decisions on when and to what degree frequency masking and temporal masking make some components of the music inaudible.

    • It then controls the quantization process so that these components do not influence the output.

  • Psychoacoustics

  • Fundamentals of Multimedia, Chapter 14

    Psychoacoustics

    • The range of normal human hearing is about 20 Hz to about 20 kHz

    • The frequency range of the voice is typically only from about 500 Hz to 4 kHz

    http://www.youtube.com/embed/qNf9nzvnd1k?rel=0

  • Fundamentals of Multimedia, Chapter 14

    Psychoacoustics

    Sensitivity of the ear:

    • The dynamic range of the ear is the ratio of the loudest sound it can hear to the quietest sound it can hear (about 120 dB).

    • The sensitivity of the ear varies with the frequency of the signal, as shown on the next slide.

    • The ear is most sensitive to signals in the range 2 to 5 kHz, so the quietest sounds the ear can detect lie in this band.

    • In the figure, although signals A and B have the same relative amplitude, only signal A would be heard, because it is above the hearing threshold while B is below it.
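To make the 120 dB figure concrete, the sketch below converts a level difference in dB into the corresponding ratios. It relies only on the standard decibel definitions (20·log10 for sound pressure or amplitude, 10·log10 for intensity or power); the function names are illustrative and do not come from the slides.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a sound-pressure (amplitude) ratio."""
    return 10 ** (db / 20)


def db_to_intensity_ratio(db: float) -> float:
    """Convert a level difference in dB to an intensity (power) ratio."""
    return 10 ** (db / 10)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Inverse of db_to_amplitude_ratio: amplitude ratio back to dB."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    # Dynamic range of the ear quoted on the slide: about 120 dB.
    print("amplitude ratio:", db_to_amplitude_ratio(120))
    print("intensity ratio:", db_to_intensity_ratio(120))
```

Under these definitions, a 120 dB dynamic range corresponds to an amplitude ratio of about a million to one between the loudest and quietest audible sounds.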

  • Fundamentals of Multimedia, Chapter 14

    Question: How sensitive is human hearing?

    • The sensitivity of the human ear with respect to frequency is given by the following graph.

  • Fundamentals of Multimedia, Chapter 14

    Threshold of Hearing

    • The threshold of hearing curve: if a sound is above the dB level shown then the sound is audible

    • Turning up a tone so that it equals or surpasses the curve means that we can then distinguish the sound

    • An approximate formula exists for this curve:

    $$\mathrm{Threshold}(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^{2}} + 10^{-3}\,(f/1000)^{4}$$

    • The threshold units are dB; the frequency for the origin (0,0) in the previous formula is 2,000 Hz: Threshold(f) = 0 at f = 2 kHz
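The approximate threshold formula can be evaluated directly. The following is a minimal sketch, assuming f is given in Hz and the result is in dB as stated on the slide; the function name `threshold_in_quiet_db` is an illustrative choice, not part of the source.

```python
import math


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate threshold of hearing (dB) for a pure tone at f_hz (Hz)."""
    k = f_hz / 1000.0  # frequency expressed in kHz, as in the formula
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


if __name__ == "__main__":
    # Print the curve at a few frequencies across the audible range.
    for f in (100, 1000, 2000, 3300, 10000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.2f} dB")
```

Evaluated this way, the curve is close to 0 dB near 2 kHz, dips to its lowest values in the 2 to 5 kHz region where the ear is most sensitive, and rises steeply at both low and high frequencies.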

  • Fundamentals of Multimedia, Chapter 14

    Threshold of Hearing

    Fig. 14.2: Threshold of human hearing, for pure tones

  • Fundamentals of Multimedia, Chapter 14

    Psychoacoustics

    • Frequency masking: when multiple signals are present in audio, a strong signal may reduce the sensitivity of the ear to other signals that are near it in frequency.

    • Temporal masking: after the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.

    • A psychoacoustic model is used to identify the signals that are affected by masking. These are then eliminated from the transmitted signal, and hence compression is achieved.

    [Diagram: masking divides into frequency masking and temporal masking]

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking

    • When a sound consists of multiple frequency components, the sensitivity of the ear changes and varies with the relative amplitudes of the components.

    • If two frequencies are close and the amplitude of one is lower than that of the other, the weaker one may not be heard.

    • In compression, frequency masking is exploited by removing or ignoring the specific frequency components of a signal that are masked.

    http://www.youtube.com/embed/k6DVywW5NR4?rel=0

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking

    Conclusions from diagram:

    • Signal B is larger than signal A. This causes the basic sensitivity curve of the ear to be distorted in the region of signal B.

    • Signal A will no longer be heard, as it is within the distortion band.

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking

    • Lossy audio data compression methods, such as MPEG/Audio encoding, remove some sounds which are masked anyway, thus reducing the total amount of information.

    • The general situation in regard to masking is as follows:

    • A lower tone can effectively mask (make us unable to hear) a higher tone. The reverse holds only weakly: a tone can mask lower-frequency sounds, but not as effectively as it masks higher-frequency ones.

    • The greater the power in the masking tone, the wider is its influence, the broader the range of frequencies it can mask.

    • As a consequence, if two tones are widely separated in frequency then little masking occurs
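The three rules above can be illustrated with a deliberately simplified model. The sketch below is a toy, not the MPEG psychoacoustic model: the 15 dB offset and the two slopes (in dB per kHz) are hypothetical values chosen only so that masking spreads further toward higher frequencies than toward lower ones. Practical models work on a critical-band scale rather than in linear kHz.

```python
def toy_masked_threshold_db(f_hz, masker_hz, masker_db,
                            offset_db=15.0,
                            lower_slope_db_per_khz=60.0,
                            upper_slope_db_per_khz=25.0):
    """Toy audibility threshold (dB) at f_hz caused by a single masking tone.

    All default parameters are hypothetical. The threshold falls off more
    slowly above the masker than below it, so a lower tone masks a higher
    tone more effectively than the reverse.
    """
    distance_khz = (f_hz - masker_hz) / 1000.0
    if distance_khz >= 0:
        drop = upper_slope_db_per_khz * distance_khz     # gentle upper slope
    else:
        drop = lower_slope_db_per_khz * (-distance_khz)  # steep lower slope
    return masker_db - offset_db - drop


def is_masked(test_hz, test_db, masker_hz, masker_db):
    """True if the test tone lies below the toy masked threshold."""
    return test_db < toy_masked_threshold_db(test_hz, masker_hz, masker_db)


def masked_range_hz(masker_hz, masker_db, floor_db=0.0,
                    offset_db=15.0,
                    lower_slope_db_per_khz=60.0,
                    upper_slope_db_per_khz=25.0):
    """Frequency range over which the toy threshold stays above floor_db."""
    peak = masker_db - offset_db - floor_db
    if peak <= 0:
        return None  # masker too quiet to raise the threshold above the floor
    low = masker_hz - 1000.0 * peak / lower_slope_db_per_khz
    high = masker_hz + 1000.0 * peak / upper_slope_db_per_khz
    return (low, high)


if __name__ == "__main__":
    print(is_masked(1100, 40, 1000, 60))  # nearby higher tone
    print(is_masked(900, 40, 1000, 60))   # nearby lower tone
    print(is_masked(4000, 40, 1000, 60))  # widely separated tone
    print(masked_range_hz(1000, 60), masked_range_hz(1000, 40))
```

With these assumed parameters, a 40 dB tone just above a 60 dB masker at 1 kHz is masked, the same tone just below the masker is not, and a louder masker affects a wider range of frequencies.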

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking Curves

    • Frequency masking is studied by playing a particular pure tone, say 1 kHz again, at a loud volume, and determining how this tone affects our ability to hear tones nearby in frequency

    • One would generate a 1 kHz masking tone, at a fixed sound level of 60 dB, and then raise the level of a nearby tone, e.g., 1.1 kHz, until it is just audible

    • The threshold in Fig. 14.3 plots the audible level for a single masking tone (1 kHz)

    • Fig. 14.4 shows how the plot changes if other masking tones are used

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking Curves

    Fig. 14.3: Effect on threshold for 1 kHz masking tone

  • Fundamentals of Multimedia, Chapter 14

    Frequency Masking Curves

    Variation of the frequency masking effect with frequency:

    The masking effects of tones at 1, 4, and 8 kHz are shown in Fig. 14.4.

    • The width of the masking curve (the range of frequencies that are affected) increases with increasing frequency.

    • The width of each curve at a particular signal level is known as the critical bandwidth for that frequency.

    • Practically, if a signal can be decomposed into frequencies, then for frequencies that will be partially masked, only the audible part will be used to set the quantization noise threshold.

    Fig. 14.4: Effect of masking tone at three different frequencies

  • Fundamentals of Multimedia, Chapter 14

    Temporal Masking

    • Temporal masking: after the ear hears a loud sound, it takes a further short while before it can hear a quieter sound.

    • Phenomenon: any loud tone causes the hearing receptors in the inner ear to become saturated, and they require time to recover.

  • Fundamentals of Multimedia, Chapter 14

    Temporal masking

    • After the ear hears a loud sound, it takes a further short time before it can hear a quieter sound. This is known as temporal masking.

    • After the loud sound ceases, it takes a short period of time for the perceived signal amplitude to decay.

    • During this time, signals whose amplitudes are less than the decay envelope will not be heard and hence need not be transmitted.

    • To exploit this phenomenon, the input audio waveform must be processed over a time period comparable to that associated with temporal masking.
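The decay-envelope idea can be sketched as a simple rule. The model below is a toy: it assumes the envelope starts at 45 dB when the loud sound stops and decays linearly at 1 dB per millisecond down to a 0 dB floor. Those numbers and the function names are hypothetical, chosen only for illustration; they are not measured values from the slides.

```python
def toy_decay_envelope_db(t_ms, start_db=45.0, decay_db_per_ms=1.0, floor_db=0.0):
    """Assumed masking envelope (dB) t_ms after the loud sound has stopped.

    Linear decay in dB is a simplifying assumption, not a measured curve.
    """
    if t_ms < 0:
        raise ValueError("t_ms must be non-negative")
    return max(start_db - decay_db_per_ms * t_ms, floor_db)


def needs_transmission(level_db, t_ms, **envelope_params):
    """A component is kept only if it reaches the decay envelope."""
    return level_db >= toy_decay_envelope_db(t_ms, **envelope_params)


def shortest_audible_delay_ms(level_db, start_db=45.0, decay_db_per_ms=1.0):
    """Delay after which a tone at level_db is no longer hidden by the envelope."""
    return max((start_db - level_db) / decay_db_per_ms, 0.0)


if __name__ == "__main__":
    for t in (0, 2, 5, 10):
        print(t, "ms:", needs_transmission(40.0, t))
    print(shortest_audible_delay_ms(40.0), shortest_audible_delay_ms(20.0))
```

Under these assumptions a quieter component stays hidden for longer than a louder one, which is the behaviour an encoder exploits when it drops components that fall below the envelope.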

  • Fundamentals of Multimedia, Chapter 14

    Temporal masking caused by loud signal

  • Fundamentals of Multimedia, Chapter 14

    Example of Temporal Masking

    • Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. Test tone can’t be heard (it’s masked).

    • Stop the masking tone, then stop the test tone after a short delay.

    • Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms).

    • Repeat with different levels of the test tone and plot the results.

    Fig. 14.6: The louder the test tone, the less time it takes for our hearing to recover from the masking.
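The stimulus for this experiment can be sketched in a few lines: a 1 kHz masking tone plus a 1.1 kHz test tone that keeps playing for a short delay after the masker stops. The sample rate, the 200 ms masker duration, and the convention that 60 dB maps to an amplitude of 1.0 are assumptions made for illustration only.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed)


def tone(freq_hz, level_db, duration_ms, ref_db=60.0):
    """Sine-wave samples; amplitude is 1.0 at ref_db and scales by 20 dB per decade."""
    amplitude = 10 ** ((level_db - ref_db) / 20)
    n = int(SAMPLE_RATE * duration_ms / 1000)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def temporal_masking_stimulus(delay_ms, masker_ms=200.0, test_db=40.0):
    """Masker (1 kHz, 60 dB) plus test tone (1.1 kHz) that outlasts it by delay_ms."""
    masker = tone(1000, 60.0, masker_ms)
    test = tone(1100, test_db, masker_ms + delay_ms)
    # Pad the masker with silence so both signals have the same length.
    masker = masker + [0.0] * (len(test) - len(masker))
    return [m + t for m, t in zip(masker, test)]


if __name__ == "__main__":
    stimulus = temporal_masking_stimulus(delay_ms=5.0)
    print(len(stimulus), "samples")
```

Varying `delay_ms` and `test_db` reproduces the two knobs of the procedure: the listener reports the shortest delay at which the test tone becomes audible for each test-tone level.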

  • Fundamentals of Multimedia, Chapter 14

    Equal-Loudness Relations

    • When we play two pure tones (sinusoidal sound waves) with the same amplitude but different frequencies, one may sound louder than the other. Why?