Fundamentals of Perceptual Audio Encoding
Craig Lewiston
HST.723 Lab II, 2/25/05
Harvard-MIT Division of Health Sciences and Technology
HST.723: Neural Coding and Perception of Sound
Instructor: Bertrand Delgutte

May 12, 2018


Goals of Lab

• Introduction to fundamental principles of digital audio & perceptual audio encoding

• Learn the basics of psychoacoustic models used in perceptual audio encoding.

• Run 2 experiments exploring some fundamental principles behind the psychoacoustic models of perceptual audio encoding.

Digital Audio

ANALOG AUDIO INPUT => ANALOG-TO-DIGITAL CONVERSION (quantization) => DIGITAL SIGNAL PROCESSING => DIGITAL-TO-ANALOG CONVERSION => ANALOG AUDIO OUTPUT

Quantization

N bits => 2^N levels


Quantization noise is the difference between the analog signal and its digital representation. It arises because each sample of the analog signal must be assigned to one of a finite set of quantization levels.

Bits    Levels
3       8
4       16
5       32
8       256
16      65536

With each additional bit, the number of quantization levels doubles: the digital representation of the analog signal increases in fidelity, and the quantization noise becomes smaller (by roughly 6 dB per bit).
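The relationship between bit depth and quantization noise can be checked numerically. The following is a minimal Python sketch, not part of the original lab: it assumes a uniform quantizer over [-1, 1] and a full-scale sine test signal, and the names `quantize` and `snr_db` are hypothetical. Under these assumptions the signal-to-noise ratio should grow by roughly 6 dB for every added bit.

```python
import math

def quantize(x, n_bits):
    """Map x (assumed to lie in [-1, 1]) onto one of 2**n_bits uniform levels."""
    levels = 2 ** n_bits
    step = 2.0 / levels                      # width of one quantization interval
    index = math.floor((x + 1.0) / step)     # which interval x falls into
    index = min(max(index, 0), levels - 1)   # clamp so x = 1.0 stays in range
    return -1.0 + (index + 0.5) * step       # reconstruct at the interval mid-point

def snr_db(n_bits, n_samples=10000):
    """Signal-to-quantization-noise ratio (dB) for a full-scale sine wave."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * 0.0137 * n)   # assumed test tone
        error = x - quantize(x, n_bits)          # quantization noise sample
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

# Same bit depths as the table on the slide.
for bits in (3, 4, 5, 8, 16):
    print(bits, "bits:", 2 ** bits, "levels, SNR about", round(snr_db(bits), 1), "dB")
```

The mid-point reconstruction keeps the error within half a step for any in-range input, which is the usual assumption behind the 6 dB per bit rule of thumb.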


Digital Audio

CD Audio:
• 2 Channels (Stereo)
• 44.1 kHz sampling rate
• 16 bit encoding

2 * 44.1 kHz * 16 bits = 1.41 Mb/s
+ Overhead (synchronization, error correction, etc.)

CD Audio = 4.32 Mb/s
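The CD bit-rate arithmetic above can be reproduced in a few lines. This is a small Python sketch using only the numbers given on the slide; the variable names are illustrative, and the overhead fraction is simply what the two slide figures imply.

```python
channels = 2                 # stereo
sample_rate_hz = 44_100      # 44.1 kHz sampling rate
bits_per_sample = 16         # 16 bit encoding

# Raw audio data rate in bits per second.
audio_bit_rate = channels * sample_rate_hz * bits_per_sample
print("audio data rate:", audio_bit_rate / 1e6, "Mb/s")

# Total CD rate quoted on the slide, including synchronization and error correction.
cd_bit_rate = 4.32e6
overhead_fraction = 1 - audio_bit_rate / cd_bit_rate
print("share of the CD rate spent on overhead:", round(overhead_fraction, 3))
```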

Compression

• High data rates, such as CD audio (4.32 Mb/s), are incompatible with internet & wireless applications.

• Audio data must be compressed to a smaller size (fewer bits) without degrading perceived signal quality, i.e., while keeping the quantization noise inaudible.

• Perceptual Audio Encoding is the encoding of audio signals, incorporating psychoacoustic knowledge of the auditory system, in order to reduce the number of bits necessary to faithfully reproduce the signal.

– MPEG-1 Layer III (aka mp3)
– MPEG-2 Advanced Audio Coding (AAC)

MPEG

MPEG = Moving Picture Experts Group
MPEG is a family of encoding standards for digital multimedia information

• MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).

– Layer I
– Layer II
– Layer III (aka MP3)

• MPEG-2: standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.

– Advanced Audio Coding (AAC)

• MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for those channels with very limited bandwidths (e.g., wireless channels).

• MPEG-7: a content representation standard for information search


Overview of Perceptual Encoding

General Perceptual Audio Encoder (Painter & Spanias, 2000):

• Psychoacoustic analysis => masking thresholds
• Masking thresholds ≡ Excitation Pattern
• Basic principle of Perceptual Audio Encoder: use masking pattern of stimulus to determine the least number of bits necessary for each frequency sub-band, so as to prevent the quantization noise from becoming audible.

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Masking

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Quantization Noise

[Figure, built up over three slides: three time-domain panels (x-axis: Time (ms), 0 to 1; y-axis: Amplitude (Quantization Levels), spanning about ±3, ±15, and ±250 levels respectively), each paired with a spectrum panel (x-axis: Frequency (Hz), 10^2 to 10^4; y-axis: Level (dB), −30 to 30).]

Sub-band Coding

[Figure, built up over four slides: spectrum panel (x-axis: Frequency (Hz), 10^2 to 10^4; y-axis: Level (dB), −30 to 30). The final slide labels adjacent sub-bands m−1, m, and m+1.]

Masking/Bit Allocation

Each frequency sub-band is encoded with the least number of bits for which the quantization noise stays below the minimum masking threshold for that sub-band.

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.
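The bit-allocation rule on the Masking/Bit Allocation slide can be sketched as follows. This is a hedged illustration rather than the MPEG procedure itself: it assumes the common rule of thumb that each bit lowers the quantization noise by about 6.02 dB relative to the signal level, and the function names and the example levels are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # assumed noise reduction per added bit (rule of thumb)

def bits_for_subband(signal_db, mask_db):
    """Least number of bits that puts the quantization noise below the mask.

    signal_db: signal level in the sub-band (dB)
    mask_db:   minimum masking threshold in the sub-band (dB)
    """
    signal_to_mask = signal_db - mask_db
    if signal_to_mask <= 0:
        return 0  # the signal itself is masked, so no bits are needed
    # Noise level is modeled as signal_db - DB_PER_BIT * bits; find the
    # smallest integer number of bits that makes it strictly below mask_db.
    return math.floor(signal_to_mask / DB_PER_BIT) + 1

def allocate(subbands):
    """Apply the rule to a list of (signal_db, mask_db) pairs."""
    return [bits_for_subband(s, m) for s, m in subbands]

# Hypothetical sub-bands: one needing > 20 dB SNR, one needing < 3 dB, one masked.
print(allocate([(60.0, 36.0), (60.0, 57.5), (40.0, 45.0)]))
```

A real encoder also has to respect a total bit budget per frame, which this sketch ignores.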

Example: MPEG-1 Psychoacoustic Model I

1. Spectral Analysis and SPL Normalization

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Example: MPEG-1 Psychoacoustic Model I

2. Identification of Tonal Maskers & calculation of individual masking thresholds

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Example: MPEG-1 Psychoacoustic Model I

3. Identification of Noise Maskers & calculation of individual masking thresholds

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Example: MPEG-1 Psychoacoustic Model I

4. Calculation of Global Masking Thresholds

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.
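As a rough illustration of step 4, the sketch below combines individual thresholds into a global threshold at one frequency bin. It assumes the power-addition rule described in the Painter & Spanias tutorial (the threshold in quiet and the individual tonal and noise masking thresholds are summed as powers, then converted back to dB); the function name and the example values are hypothetical.

```python
import math

def global_threshold_db(quiet_db, tonal_thresholds_db, noise_thresholds_db):
    """Combine thresholds at a single frequency bin by adding powers.

    quiet_db:            absolute threshold of hearing at this bin (dB)
    tonal_thresholds_db: individual thresholds from each tonal masker (dB)
    noise_thresholds_db: individual thresholds from each noise masker (dB)
    """
    total_power = 10 ** (quiet_db / 10)
    for level_db in list(tonal_thresholds_db) + list(noise_thresholds_db):
        total_power += 10 ** (level_db / 10)
    return 10 * math.log10(total_power)

# Hypothetical bin: quiet threshold 5 dB, one tonal masker contributing 40 dB,
# one noise masker contributing 35 dB.
print(round(global_threshold_db(5.0, [40.0], [35.0]), 2))
```

Because powers are added, the global threshold is never lower than the largest individual contribution, and two equal contributions raise it by about 3 dB.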

Example: MPEG-1 Psychoacoustic Model I

A - Some portions of the input spectrum require SNRs > 20 dB

B - Other portions require less than 3 dB SNR

C - Some high frequency portions are masked by the signal itself

D - Very high frequency portions fall below the absolute threshold of hearing.

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.
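Point D can be illustrated with a widely used closed-form approximation of the absolute threshold of hearing (attributed to Terhardt and quoted in the Painter & Spanias tutorial). The slides do not give this formula, so treat it as an assumption; the function names are hypothetical and the levels are in dB SPL.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) at frequency f_hz."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

def is_audible_in_quiet(level_db_spl, f_hz):
    """True if a component at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(f_hz)

# The threshold is lowest near 3 to 4 kHz and rises steeply at high frequencies.
for f in (100, 1000, 3300, 10000, 16000):
    print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB SPL")
```

A spectral component that fails `is_audible_in_quiet` needs no bits at all, which is the situation labeled D on the slide.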

Example: MPEG-1 Psychoacoustic Model I

5. Sub-band Bit Allocation

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Lab Experiments

Exp 1: Masking Pattern
• Measure absolute hearing thresholds in quiet
• Measure absolute hearing thresholds in presence of narrowband noise masker

Exp 2: Masking Threshold
• Measure masking threshold of a 1 kHz tone in the presence of four different maskers:
– Tone
– Gaussian Noise
– Multiplied Noise
– Low-noise Noise

Exp 1: Masking Pattern

Figure removed due to copyright considerations. Please see: Noll, Peter. “MPEG Digital Audio Coding Standards.” The Digital Signal Processing Handbook. Edited by V. K. Madisetti and D. B. Williams. IEEE Press/CRC Press, 1998, pp. 40-1 - 40-28.

Method of Adjustment

Method of Adjustment (aka Békésy tracking method): the target tone is swept through the frequency range, and the subject must adjust the intensity of the target tone so that it is just barely detectable.

Exp 2: Masking Thresholds

Calculation of tonal & noise masking thresholds:
Tonal & noise maskers have different masking effects…

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Asymmetry of Simultaneous Masking

Noise masker: SNR ~ 4 dB

Tone masker: SNR ~ 24 dB

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Asymmetry of Simultaneous Masking

Why do tones and noises have different masking effects?

Signal = A(t) e^{j[ω(t) + φ(t)]}

For narrowband Gaussian noise, e^{jω(t)} is approximately the same as a tone centered at the same frequency.

The asymmetry effect is due either to the amplitude term A(t) or to the phase term φ(t).

Asymmetry of Simultaneous Masking

Measure masking effects of “modified” noises:

Multiplied Noise: generated by multiplying a sinusoid at 1 kHz with a low-pass Gaussian noise.

Amplitude => Gaussian Noise
Phase => Pure Tone

Low-Noise Noise: Gaussian noise with a temporal envelope that has been smoothed.

Amplitude => Pure Tone
Phase => Gaussian Noise
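The multiplied-noise construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the low-pass Gaussian noise is approximated with a simple one-pole filter, and the sampling rate, cutoff frequency, duration, and function name are all hypothetical choices, not the stimuli used in the lab.

```python
import math
import random

def multiplied_noise(duration_s=0.1, fs=44100, carrier_hz=1000.0,
                     cutoff_hz=50.0, seed=0):
    """Multiply a sinusoid by low-pass Gaussian noise.

    The low-pass noise acts as a slowly varying amplitude, while the
    zero crossings remain those of the sinusoidal carrier.
    """
    rng = random.Random(seed)
    # One-pole low-pass coefficient (an assumed, very simple filter design).
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    lowpass = 0.0
    samples = []
    for n in range(round(duration_s * fs)):
        lowpass = a * lowpass + (1 - a) * rng.gauss(0.0, 1.0)   # low-pass noise
        carrier = math.sin(2 * math.pi * carrier_hz * n / fs)   # 1 kHz sinusoid
        samples.append(lowpass * carrier)
    return samples

stimulus = multiplied_noise()
print(len(stimulus), "samples generated")
```

Low-noise noise requires a different procedure (smoothing the temporal envelope of a Gaussian noise), which is not attempted here.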

Exp 2: Masking Thresholds

Target (Quantization noise)    Masker (Desired signal)
Gaussian noise                 Tone
Gaussian noise                 Gaussian noise
Gaussian noise                 Multiplied noise
Gaussian noise                 Low-noise noise

1. Measure masking threshold for four different types of masker.
2. Comparing the modified noise thresholds with the tone & Gaussian noise thresholds should indicate which component of the Gaussian noise (Amplitude or Phase) contributes to the asymmetry effect.

Figure removed due to copyright reasons. Please see: Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE 88 (2000): 451-513.

Method: Adaptive Procedure

[Figure: example adaptive track. x-axis: Trial Number (1 to 12); y-axis: Intensity (65 to 75); trials are labeled with the subject's Y/N responses.]

Threshold = average of reversal points (usually 6 or 7)
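The threshold estimate from an adaptive track can be sketched as below. The slide only states that the threshold is the average of the reversal points; the definition of a reversal used here (a change in the direction of the intensity steps), the function names, and the example track are assumptions for illustration.

```python
def reversal_points(levels):
    """Return the intensities at which the track changes direction."""
    reversals = []
    previous_direction = 0
    for i in range(1, len(levels)):
        step = levels[i] - levels[i - 1]
        if step == 0:
            continue  # intensity unchanged on this trial, so no direction info
        direction = 1 if step > 0 else -1
        if previous_direction != 0 and direction != previous_direction:
            reversals.append(levels[i - 1])  # the turning point itself
        previous_direction = direction
    return reversals

def estimate_threshold(levels, n_last=None):
    """Average the reversal points (optionally only the last n_last of them)."""
    points = reversal_points(levels)
    if n_last is not None:
        points = points[-n_last:]
    return sum(points) / len(points)

# Hypothetical track of intensities across trials.
track = [75, 73, 71, 69, 71, 69, 67, 69, 71, 69]
print(reversal_points(track), estimate_threshold(track))
```

Passing `n_last=6` or `n_last=7` would match the number of reversals mentioned on the slide, provided the track contains at least that many.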

Lab Write-up

1) Describe the fundamental concepts behind digital audio & perceptual audio encoders (e.g. quantization & quantization noise, sub-band coding & bit allocation, tone & noise masking thresholds, etc.)

2) Describe the methods of Experiment 1 and the results you obtained. Explain how the threshold results obtained relate to the masking thresholds used in perceptual audio encoding.

3) Describe the methods of Experiment 2 and the results you obtained, highlighting the amplitude and phase characteristics of the two “modified” noises used. Based on your data, indicate which component (amplitude or phase) contributes to the asymmetry of simultaneous masking observed.

References

• Dau, T., Verhey, J., and Kohlrausch, A. (1999). "Intrinsic envelope fluctuations and modulation-detection thresholds for narrow-band noise carriers," J. Acoust. Soc. Am. 106, 2752-2760.

• Kohlrausch, A., Fassel, R., van der Heijden, M., Kortekaas, R., van de Par, S., Oxenham, A. J., and Püschel, D. (1997). "Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations," Acta Acustica 83, 659-669.

• Noll, P. (1998). "MPEG Digital Audio Coding Standards," in The Digital Signal Processing Handbook, edited by V. K. Madisetti and D. B. Williams (IEEE Press/CRC Press), pp. 40-1 - 40-28.

• Painter, T., and Spanias, A. (2000). "Perceptual coding of digital audio," Proceedings of the IEEE 88, 451-513.

• Pan, D. (1995). "A tutorial on MPEG/audio compression," IEEE Multimedia Journal