Perceptual Audio Coding
CYH, EIE Dept, HKPolyU
■ Preface
■ Physiology of the human ear
  - Critical bands
  - Threshold of hearing
  - Amplitude masking
  - Temporal masking
■ Rationale for perceptual coding
■ Coding techniques
  - Subband coding
  - Transform coding
■ MPEG Audio standards
  - MP1
  - MP2
  - MP3
Preface
■ Traditionally, audio recording systems have used objective parameters as their design goals - flat response, minimal noise, and so on.
■ Perceptual coders recognize that the final receiver is the human auditory system and make use of its characteristics to code audio signals.
Physiology of the human ear
Critical bands
■ The ear contains roughly 30,000 hair cells arranged in multiple rows along the basilar membrane.
■ Hair cells respond to the strongest stimulation in their local regions, called critical bands.
■ Critical bands are not fixed; any audible tone creates a critical band centered on it.
■ Critical bands are much narrower at low frequencies than at high frequencies.
■ Critical bandwidth (Hz) = 24.7 (4.37 fc + 1), where fc is the center frequency in kHz.
■ The critical band concept is an empirical phenomenon.
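The bandwidth formula above can be evaluated directly. The following is a minimal sketch; the function name and the sample center frequencies are illustrative choices, not part of the slides.

```python
def critical_bandwidth_hz(fc_khz: float) -> float:
    """Critical bandwidth in Hz for a center frequency given in kHz.

    Implements the slide formula: 24.7 * (4.37 * fc + 1).
    """
    return 24.7 * (4.37 * fc_khz + 1.0)


if __name__ == "__main__":
    # Illustrative center frequencies (assumed, not from the slides).
    for fc_khz in (0.1, 1.0, 4.0, 10.0):
        bw = critical_bandwidth_hz(fc_khz)
        print(f"fc = {fc_khz:5.1f} kHz -> critical bandwidth = {bw:7.1f} Hz")
```

The printed values grow with center frequency, which is the narrow-at-low, wide-at-high behavior stated in the first bullet.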
■ The Bark is the unit of critical band rate.
■ A critical band has a width of 1 Bark.
■ Critical bands are important in perceptual coding because they show that the ear discriminates between energy inside the band and energy outside it; in particular, this promotes masking.
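The slides do not give a formula for converting frequency to Bark. As a rough sketch, one widely used approximation (Zwicker and Terhardt) is shown below; it is an assumption brought in for illustration, and other approximations exist.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical band rate in Bark for a frequency in Hz.

    Uses the Zwicker-Terhardt approximation (an assumption; the slides
    do not specify a conversion formula).
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    for f_hz in (100.0, 500.0, 1000.0, 4000.0, 10000.0):
        print(f"{f_hz:8.0f} Hz -> {hz_to_bark(f_hz):5.2f} Bark")
```

Equal steps in Bark correspond to increasingly large steps in Hz as frequency rises, which matches the widening of critical bands at high frequencies.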
Threshold of hearing
■ Two fundamental phenomena that govern human hearing are the minimum hearing threshold and masking.
■ The threshold of hearing curve describes the minimum level at which the ear can detect a tone at a given frequency.
Amplitude masking
■ Masking theory argues that the softer tone is just detectable when its energy equals the energy of the part of the louder masking signal in the critical band.
■ Masking can overlap adjacent critical bands when a signal is loud or contains harmonics.
■ Simultaneous masking curves are asymmetrical: the slope of the shifted threshold curve is less steep on the high-frequency side.
■ As the sound level of the masker increases, the threshold curve broadens; in particular, its upper slope becomes shallower while its lower slope remains relatively unaffected.
Temporal masking
■ Temporal masking occurs when tones are sounded close in time, but not simultaneously.
■ A louder tone appearing just after (premasking) or just before (postmasking) a softer tone overcomes the softer tone.
[Figure: temporal masking, showing the regions before the masker exists (premasking), while the masker exists (simultaneous masking), and after the masker exists (postmasking)]
■ Amplitude and temporal masking form a contour that can be mapped in the time-frequency domain.
■ Perceptual coders identify this contour for changing signal conditions, and code the signal appropriately.
■ Using diverse and dynamically changing psychoacoustical cues and signal analysis, inaudible components can be removed with acceptable degradation.
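To make the threshold of hearing and amplitude masking described in this section more concrete, here is a small sketch of how a coder might decide whether a probe tone is audible next to a single masker. None of the formulas come from the slides: the threshold in quiet uses Terhardt's approximation, the spread of masking uses Schroeder's spreading function, the Bark conversion is the Zwicker-Terhardt approximation, and the 10 dB masking offset is an arbitrary illustrative value. Temporal masking is not modeled.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of critical band rate (assumed)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the threshold of hearing in dB SPL (assumed)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def spreading_db(dz_bark: float) -> float:
    """Schroeder spreading function in dB (assumed).

    dz_bark is the probe position minus the masker position in Bark.
    The curve falls off more slowly for positive dz (high-frequency side).
    """
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def masked_threshold_db(probe_hz: float, masker_hz: float, masker_db: float,
                        offset_db: float = 10.0) -> float:
    """Level a probe tone must exceed to be heard next to one masker.

    offset_db is a hypothetical masker-to-threshold offset chosen for
    illustration only.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    masking = masker_db + spreading_db(dz) - offset_db
    return max(masking, threshold_in_quiet_db(probe_hz))


def is_audible(probe_hz: float, probe_db: float,
               masker_hz: float, masker_db: float) -> bool:
    """True if the probe level exceeds the combined threshold."""
    return probe_db > masked_threshold_db(probe_hz, masker_hz, masker_db)


if __name__ == "__main__":
    # A 40 dB probe next to an 80 dB masker at 1 kHz (illustrative values).
    for probe_hz in (200.0, 900.0, 1100.0, 2000.0):
        print(probe_hz, is_audible(probe_hz, 40.0, 1000.0, 80.0))
```

In this sketch the spreading function is level-independent, so it reproduces the asymmetry of the masking curve but not the broadening with masker level.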
Rationale for Perceptual Coding
■ Perceptual coding systems analyze the frequency and amplitude content of the input signal, compare it to a model of human auditory perception, and code it accordingly.
■ Tests show that compression ratios of 4:1 or 6:1 can be transparent, that is, audibly indistinguishable from the original.
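As a quick illustration of what these ratios mean in bit-rate terms, the sketch below assumes a CD-style PCM source (44.1 kHz, 16 bits, stereo). The source format is an assumption for the example; the slides do not state one.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def coded_bit_rate(pcm_rate: float, ratio: float) -> float:
    """Bit rate after compressing by ratio:1."""
    return pcm_rate / ratio


if __name__ == "__main__":
    # Assumed source: 44.1 kHz, 16-bit, two channels.
    source = pcm_bit_rate(44_100, 16, 2)
    print(f"PCM source: {source / 1000:.1f} kbit/s")
    for ratio in (4, 6):
        print(f"{ratio}:1 -> {coded_bit_rate(source, ratio) / 1000:.1f} kbit/s")
```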
■ The performance of perceptual coding relies on the following factors:
  - Only audible information is coded.
  - Bits are assigned according to audibility.
  - Quantization error is confined to a critical band.
■ Perceptual coding is tolerant of errors.
  - With PCM, an error introduces broadband noise.
  - With most perceptual coders, the error is limited to a narrow band corresponding to the bandwidth of the coded critical band, thus limiting its loudness.
Coding Techniques
■ There are two types of frequency domain coders: subband and transform coders.
■ Both coders operate over a block of samples.
■ This block must be kept short to stay within the temporal resolution of the ear.
■ In practical applications, many coders are hybrid coders which combine techniques from both subband and transform coding.
Subband coding:
■ Blocks of consecutive time-domain samples representing the broadband signal are collected over a short period and applied to a digital filter bank.
■ The filter bank divides the signal into multiple bandlimited channels to approximate the critical band response of the human ear.
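The idea of splitting a block into bandlimited channels and reassembling it can be sketched with the simplest possible filter bank: a two-band Haar analysis/synthesis pair. This is an illustrative stand-in chosen for brevity, not the filter bank used by any particular coder, and two bands are far too few to approximate critical bands.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar normalization factor


def analysis(block):
    """Split an even-length block into a low band and a high band.

    Each band holds half as many samples as the input (critical sampling).
    """
    if len(block) % 2:
        raise ValueError("block length must be even")
    low = [(block[i] + block[i + 1]) * _S for i in range(0, len(block), 2)]
    high = [(block[i] - block[i + 1]) * _S for i in range(0, len(block), 2)]
    return low, high


def synthesis(low, high):
    """Rebuild the time-domain block from the two bands."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * _S)
        out.append((l - h) * _S)
    return out


if __name__ == "__main__":
    block = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    low, high = analysis(block)
    print("low band: ", low)
    print("high band:", high)
    print("rebuilt:  ", synthesis(low, high))
```

Applying `analysis` again to each band would give four bands, and so on; without quantization in between, the synthesis step reconstructs the input up to floating-point rounding.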
[Figure: Block diagram of a subband coder. Audio input → analysis filter bank → subbands 1 ... N → quantization, with bit allocation driven by frequency analysis → coded signal → synthesis filter bank → audio output]
■ The samples in each subband are analyzed and compared to a psychoacoustic model.
■ The coder adaptively quantizes the samples in each subband based on the masking threshold in that subband.
■ Each subband is coded independently with more or fewer bits allocated to the samples in the subband.
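A minimal sketch of quantizing one subband block with a given number of bits follows. It assumes a uniform midtread quantizer with a per-block scale factor; the slides do not specify the quantizer, so the function names and details here are illustrative only.

```python
def quantize_block(samples, bits):
    """Quantize one subband block with a uniform midtread quantizer.

    Returns (scale, codes). The scale factor is the peak magnitude of the
    block; codes are integers in the range -half_levels .. +half_levels.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits; a subband with 0 bits is simply not coded")
    scale = max(abs(x) for x in samples) or 1.0  # avoid dividing by zero on silence
    half_levels = (1 << (bits - 1)) - 1
    codes = [round(x / scale * half_levels) for x in samples]
    return scale, codes


def dequantize_block(scale, codes, bits):
    """Map integer codes back to sample values."""
    half_levels = (1 << (bits - 1)) - 1
    return [c * scale / half_levels for c in codes]


if __name__ == "__main__":
    samples = [0.5, -0.25, 0.125, -1.0, 0.75]  # illustrative subband samples
    for bits in (4, 8):
        scale, codes = quantize_block(samples, bits)
        rebuilt = dequantize_block(scale, codes, bits)
        worst = max(abs(a - b) for a, b in zip(samples, rebuilt))
        print(f"{bits} bits: codes={codes}, worst error={worst:.5f}")
```

Fewer bits give a coarser step and therefore more quantization noise, which is acceptable only where the masking threshold in that subband is high enough to hide it.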
[Figure omitted; surviving label: Average level]
■ Bit allocation is determined by a psychoacoustic model and analysis of the signal itself.
■ Samples are dynamically quantized according to audibility of signals.
[Figure omitted; surviving label: Average energy]
■ The signal-to-mask ratio (SMR) of a particular subband is the difference, in dB, between the maximum signal level and the masking threshold in that subband; it is used to determine the number of bits assigned to the subband.
■ Signals below the minimum (threshold of hearing) curve or the masking curve are not coded.
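The SMR rule above can be sketched as a small calculation. The figure of roughly 6.02 dB of signal-to-noise ratio per quantizer bit is a standard rule of thumb for uniform quantization and is an assumption here, as are the example levels.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer (assumed)


def smr_db(signal_db: float, mask_db: float) -> float:
    """Signal-to-mask ratio: peak signal level minus masking threshold, in dB."""
    return signal_db - mask_db


def bits_for_subband(signal_db: float, mask_db: float) -> int:
    """Smallest bit count whose quantization noise falls at or below the mask.

    Returns 0 when the signal is at or below the threshold (not coded).
    """
    smr = smr_db(signal_db, mask_db)
    if smr <= 0:
        return 0
    return math.ceil(smr / DB_PER_BIT)


if __name__ == "__main__":
    # (peak signal level, masking threshold) in dB; illustrative values.
    for signal_db, mask_db in ((60.0, 40.0), (45.0, 40.0), (30.0, 40.0)):
        print(signal_db, mask_db, "->", bits_for_subband(signal_db, mask_db), "bits")
```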
■ The number of bits given to any subband must be sufficient to yield a requantization noise level that is below the masking level.
■ The quantization noise in a subband is limited to that subband and can be masked by the audio signal in that subband.
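When the total number of bits is limited, one simple way to meet this requirement as far as possible is a greedy loop that keeps giving one more bit to the subband whose noise is currently least well masked. The sketch below is an illustration under simplifying assumptions (about 6.02 dB of noise reduction per bit, one bit count per subband, a hypothetical `allocate_bits` helper); it is not the allocation procedure of any specific standard.

```python
def allocate_bits(smr_db, budget, db_per_bit=6.02):
    """Greedy bit allocation across subbands.

    smr_db: signal-to-mask ratio of each subband in dB.
    budget: total number of bits available.
    The noise-to-mask ratio (NMR) of a subband is taken as
    SMR - db_per_bit * bits; allocation stops once every NMR is <= 0
    (all noise masked) or the budget runs out.
    """
    bits = [0] * len(smr_db)
    for _ in range(budget):
        nmr = [s - db_per_bit * b for s, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # quantization noise is below the mask everywhere
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    smr = [20.0, 5.0, -3.0, 12.0]  # illustrative SMR values per subband
    print(allocate_bits(smr, budget=16))
    print(allocate_bits(smr, budget=3))
```

With a generous budget the loop stops early and the subband with negative SMR receives no bits; with a tight budget the bits go to the subbands with the highest SMR first.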