© Fraunhofer IDMT 1 Multimedia Standards SS 2017 Lecture 7 Prof. Dr.-Ing. Karlheinz Brandenburg [email protected] Contact: Dipl.-Inf. Thomas Köllmer [email protected]

mms ss17 psychoacoustics mpeg audio · © Fraunhofer IDMT 2 Psychoacoustic Fundamentals MPEG Audio Coding Speech Coding AUDIO CODING

Aug 25, 2020

Page 1:

© Fraunhofer IDMT

1

Multimedia Standards

SS 2017

Lecture 7

Prof. Dr.-Ing. Karlheinz Brandenburg

[email protected]

Contact:

Dipl.-Inf. Thomas Köllmer [email protected]

Page 2:

Psychoacoustic Fundamentals

MPEG Audio Coding

Speech Coding

AUDIO CODING

Page 3:

Capabilities of the human ear

Frequency range: approx. 16 Hz up to 8-25 kHz (typically 16 Hz to 20 kHz)

Frequency resolution: approx. 640 steps

Dynamic range: approx. 120-130 dB

Dynamic resolution: better than 1 dB

Page 4:

Psychoacoustic Fundamentals

Human Hearing

Source: Ars Auditus; http://www.dasp.uni-wuppertal.de/index.php?id=57, 2010

[Figure: anatomy of the human ear. Regions: outer ear, middle ear, inner ear. Labels: pinna, ear canal, ear drum, ossicles, Eustachian tube, archways (semicircular canals), cochlea with organ of Corti]

Page 5:

Psychoacoustic Fundamentals

Schematic drawing of the organ of corti

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 6:

Psychoacoustic Fundamentals

Preprocessing of sound in the peripheral system

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 7:

Psychoacoustic Fundamentals

Information processing in the auditory system

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 8:

Psychoacoustic Fundamentals

Sound perception

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 9:

Psychoacoustic Fundamentals

Threshold in quiet or absolute threshold

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 10:

Psychoacoustic Fundamentals

Critical bands (“Frequenzgruppen”) in human hearing:

Different interpretations that produce the same segmentation

Constant distance in the Cochlea

Tones in a critical band above the threshold in quiet: their energy adds up

Tones in a critical band below the threshold in quiet: their energy adds up and the combination might become audible

Rule of thumb ("formula") for the width of the critical bands:

for frequencies < 500 Hz: constant width of 100 Hz

for frequencies > 500 Hz: 0.2 × frequency
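
The rule of thumb above can be written as a small function. This is a minimal sketch of the two-segment approximation from this slide only, not of the measured curve in Zwicker & Fastl. The function name `critical_bandwidth_hz` is made up for illustration, and the value at exactly 500 Hz, which the slide leaves open, is assigned to the upper branch (both branches give 100 Hz there).

```python
def critical_bandwidth_hz(frequency_hz: float) -> float:
    """Approximate critical bandwidth in Hz (two-segment rule of thumb)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if frequency_hz < 500.0:
        return 100.0           # constant width below 500 Hz
    return 0.2 * frequency_hz  # 20 % of the frequency from 500 Hz upwards


# Print the approximate bandwidth for a few example frequencies.
for f in (100, 500, 1000, 4000):
    print(f, "Hz ->", critical_bandwidth_hz(f), "Hz")
```

The two branches meet at 500 Hz, so the approximation is continuous even though its slope is not.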

Page 11:

Psychoacoustic Fundamentals

Critical bandwidth as a function of frequency

Approximations for low and high frequency ranges are indicated by broken lines.

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 12:

Psychoacoustic Fundamentals

Pure tones masked by white (broad-band) noise

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 13:

Psychoacoustic Fundamentals

Narrow band noise masking a tone at different center frequencies

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 14:

Psychoacoustic Fundamentals

Narrow band noise masking a tone at varying levels (center frequency: 1kHz)

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 15:

Source: U. Zölzer, “Digitale Audiosignalverarbeitung”

Psychoacoustic Fundamentals

Masking neighboring bands

Page 16:

Psychoacoustic Fundamentals

Masking in the time domain / temporal masking effects

Depends on various factors

Duration of the masking signal

Intensity and spectrum of the masker

Time and frequency of both signals

Source: Zwicker&Fastl “Psychoacoustics Facts and Models”

Page 17:

Psychoacoustic Fundamentals

Temporal masking effects

Post-masking: corresponds to the expected decay of the masker's effect after it is switched off

Pre-masking: appears in the time before the masker is switched on

Quick build-up time for loud maskers

Slower build-up time for faint test sounds

Frequency resolution ⇒ blurring in time

Frequency resolution in the ear ⇒ masking in time

Because the ear follows fast transitions from quiet to loud signals and pre-masking is short, coding noise that spreads ahead of an attack becomes audible as a pre-echo

Pre-masking: 1-5 ms

Post-masking: approx. 100 ms

Page 18:

Psychoacoustic Fundamentals

Pre-Echo: Example without Pre-Echo

Page 19:

Psychoacoustic Fundamentals

Pre-Echo: Example

Page 20:

Audio Examples

Example 10:

Castanets original

Example 11:

Castanets coded with a block size of 2048 samples
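
A rough calculation shows why a block size of 2048 samples is a problem for castanets. In a transform coder the quantization noise is spread over the whole block, so it can precede an attack by up to one block length, while pre-masking covers only 1-5 ms. The sketch below assumes a sample rate of 44.1 kHz, which the slide does not state; the helper name `block_duration_ms` is illustrative.

```python
def block_duration_ms(block_size: int, sample_rate_hz: float) -> float:
    """Duration of one coding block in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


PRE_MASKING_MS = 5.0  # upper end of the 1-5 ms pre-masking range

# 44.1 kHz is an assumed sample rate for the castanets example.
duration = block_duration_ms(2048, 44100)
print(f"block length: {duration:.1f} ms, pre-masking: at most {PRE_MASKING_MS} ms")
print("noise can precede the attack audibly:", duration > PRE_MASKING_MS)
```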

Page 21:

Demo: The "13 dB-miracle"

Original signal

Original + white noise, SNR = 13.6 dB

Original + noise at threshold, SNR = 13.6 dB

Difference (modulated white noise)

Difference (noise at threshold)
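
The white-noise half of this demo can be reproduced in a few lines. The sketch below only shows how noise is scaled to a target SNR of 13.6 dB; shaping the noise to follow the masking threshold needs a psychoacoustic model and is not shown. The test tone, the sample rate and all function names are assumptions made for illustration.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(signal, noise, snr_db):
    """Return a scaled copy of `noise` so that the signal-to-noise ratio is snr_db."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


random.seed(0)
# Stand-in for the original: a 440 Hz tone at an assumed 48 kHz sample rate.
signal = [math.sin(2 * math.pi * 440 * t / 48000) for t in range(4800)]
noise = [random.gauss(0.0, 1.0) for _ in signal]

scaled = scale_noise_to_snr(signal, noise, 13.6)
snr_db = 10.0 * math.log10(mean_power(signal) / mean_power(scaled))
print(f"SNR = {snr_db:.1f} dB")
```

Both noisy versions in the demo have the same SNR; only the spectral and temporal shape of the noise differs, which is what makes one of them far less audible.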

Page 22:

The "13 dB-miracle"

Page 23:

The McGurk Effect

Page 24:

Psychoacoustic Fundamentals

Block diagram of a perceptual audio encoder

[Block diagram: audio in → analysis filter bank → quantization and coding → serial bitstream multiplexing → bitstream; in parallel: calculation of the masking threshold based on psychoacoustics]

Page 25:

The Basic Paradigm of T/F Domain Audio Coding

[Block diagram. Blocks: Filter Bank, Psychoacoustic Model, Bit or Noise Allocation, Bitstream Formatting. Signals: Digital Audio Input, Signal to Mask Ratio, Quantized Samples, Encoded Bitstream]
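
To illustrate how a signal-to-mask ratio can drive bit allocation, here is a deliberately simplified sketch. It uses the common rule of thumb that each additional bit of a uniform quantizer improves the SNR by about 6 dB; that rule, the function name `bits_for_smr` and the example values are assumptions and are not taken from the slides or from the MPEG allocation procedure.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: about 6 dB of SNR per quantizer bit


def bits_for_smr(smr_db: float) -> int:
    """Smallest bit count whose quantization noise stays below the masking threshold."""
    if smr_db <= 0:
        return 0  # band lies below the masking threshold, nothing needs to be sent
    return math.ceil(smr_db / DB_PER_BIT)


# Hypothetical signal-to-mask ratios (in dB) for four subbands.
smr_per_band = [18.0, 7.5, -3.0, 13.6]
allocation = [bits_for_smr(smr) for smr in smr_per_band]
print(allocation)
```

A real encoder additionally has to respect the available bit budget per frame, so it typically iterates over the bands instead of allocating each one independently.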

Page 26:

MPEG Audio Coding

Page 27:

History of Audio Coding

1979 - the "Critical Band Coder"

1982 - "classic ATC" for music

1985 - MSC

1987 - OCF

1990 - MUSICAM

1990 - ASPEC

1992 - MPEG-1

1996 - PAC

1997 - MPEG-2 AAC

1999 - MPEG-4 AAC

2002 - HE-AAC

2005 - MPEG-4 ALS

2006 - MPEG-D MPEG Surround (MPS)

2010 - MPEG-D Spatial Audio Object Coding (SAOC)

2012 - MPEG-D Unified Speech and Audio Coding (USAC)

2015 - MPEG-H High Efficiency Coding and Media Delivery in Heterogeneous Environments

Page 28:

MPEG-1 Audio: Main building blocks

Perceptual model:

- using psychoacoustics, mostly proprietary

Filter bank:

- subdividing the input signal into spectral components

- more lines ⇒ more coding gain

- longer impulse response ⇒ pre-echo artifacts

Quantization & coding:

- this is the step introducing quantization noise

- spectral shape of quantization noise determines the audibility

- can be designed to leave encoding methods optional

Page 29:

MPEG-1 Audio

Structure of the Encoder

Page 30:

MPEG-1 Audio

Structure of the Decoder

Page 31:

MPEG-1 Audio

Short description of the layers

Layer-1:
Frame length: 384 samples (approx. 8 ms)
Spectral resolution: 32 subbands
Quantization: block companding (12 samples)

Layer-2:
Frame length: 1152 samples (approx. 24 ms)
Spectral resolution: 32 subbands
Quantization: block companding (12 samples)
Usage of the scale factor select information

Layer-3:
Frame length: 1152 samples (approx. 24 ms)
Spectral resolution: 576 spectral lines
Quantization: non-uniform with Huffman coding
Usage of the scale factor select information
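
The approximate frame durations follow directly from the frame length and the sample rate. The sketch below recomputes them for the three MPEG-1 sample rates; the values of 8 ms and 24 ms quoted above correspond to 48 kHz. The dictionary and function names are illustrative.

```python
# Frame lengths in PCM samples, as listed on this slide.
FRAME_SAMPLES = {"Layer-1": 384, "Layer-2": 1152, "Layer-3": 1152}


def frame_duration_ms(layer: str, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * FRAME_SAMPLES[layer] / sample_rate_hz


for layer in FRAME_SAMPLES:
    for fs in (32000, 44100, 48000):
        print(f"{layer} @ {fs} Hz: {frame_duration_ms(layer, fs):.1f} ms")
```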

Page 32:

MPEG-1 Audio

Joint Stereo mode for an additional increase of the compression rate

Different possibilities:

Mid/Side (M/S) stereo coding:

Two channels are still coded; the left channel carries the sum of both original channels, the right channel carries the difference

Intensity coding:

Either both channels are coded separately ("stereo" mode) or "Intensity Stereo" coding is used

For higher frequencies only a mono signal is transmitted, which the decoder places close to the original stereo position

Page 33:

MPEG-1 Audio Layer-1

Trademark by Philips: PASC

Processing of frames with 384 PCM samples each

Signal is split into 32 bands by a polyphase filter bank

32 frequency bands of 12 samples each

Page 34:

MPEG-1 Audio Layer-1

Page 35:

MPEG-1 Audio Layer-2

Trademark: MUSICAM

Processing of frames with 1152 PCM samples each

36 samples per subband, grouped into 3 blocks of 12 samples each

Like Layer-1, Layer-2 transmits bit allocation, scale factors and samples

Additionally: scale factor select information and packing of bits

Theoretical minimum of the coding / decoding delay: approx. 35 ms

Page 36:

MPEG-1 Audio Layer-2

Bitstream structure Layer-2

Structure of Layer-2 subband samples

Page 37:

MPEG-1 Audio Layer-3

Standard frame length: 1152 samples (24 ms at 48 kHz)

Frequency resolution: 576 or 192 frequency lines (long or short blocks)

Quantization: non-uniform with Huffman coding

Use of scale factor select information

MP3 needs no file-level header: every frame carries its own header, so a decoder can start playing from any frame

This allows MP3 streaming

Theoretical minimum delay of the coder/decoder chain: approx. 59 ms

Page 38:

MPEG-1 Audio Layer-3

Two different MDCT block lengths per subband: a long block of 18 samples or a short block of 6 samples (32 × 18 = 576 or 32 × 6 = 192 frequency lines)

"Joint Stereo" with Mid/Side and intensity coding:

M/S: Instead of the left and right channels (L/R), the Mid and Side channels, M = (L+R)/2 and S = (L-R)/2, are transmitted.
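
The M/S transform is invertible: L = M + S and R = M - S. The sketch below uses the scaling given on this slide, M = (L+R)/2 and S = (L-R)/2, and checks the round trip on a few made-up sample values; the function names are illustrative.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid/side samples."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Made-up sample values; the first pair is identical in both channels.
left = [0.5, 0.25, -0.5]
right = [0.5, 0.75, -0.25]

mid, side = ms_encode(left, right)
print("mid: ", mid)
print("side:", side)
print("round trip ok:", ms_decode(mid, side) == (left, right))
```

When both channels are similar, the side signal is close to zero and needs few bits, which is where the coding gain comes from.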

Page 39:

MPEG-1 Audio Layer-3

Constant part:

Fixed number of bytes: 17 in mono, 32 in stereo, independent of the bit rate

Header (ISO standard, as in Layer-1 and Layer-2)

Additional information for the frame (e.g. pointer to the variable part)

Additional information per granule (e.g. selection of the Huffman tables)

Variable part:

Also called "main info"

Scale factors

Huffman coded frequency lines

Additional data

In Layer-III the bit rates can be switched dynamically

Page 40:

MPEG-1 Audio Layer-3 – Bit Reservoirs

Page 41:

Layer 3 Block Diagram

MPEG-1 Audio Layer 3

Page 42:

Bit Stream Syntax

Page 43:

MPEG-1 Audio: Bit Stream Syntax for Layer-1, -2, and -3 Compression

The layer bits indicate which layer is used

The higher the layer, the better the compression, but the more processing power is required

Layer bits: meaning

00: reserved

01: Layer III

10: Layer II

11: Layer I
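
The layer bits can be read from the second byte of a frame header. The sketch below assumes the MPEG-1 header layout of 12 sync bits, 1 ID bit and 2 layer bits, which places the layer field in bits 2-1 of the second byte; that position is not spelled out on this slide, and the function and table names are illustrative.

```python
# Mapping of the two layer bits, as listed on this slide.
LAYER_BY_BITS = {0b00: None, 0b01: "Layer III", 0b10: "Layer II", 0b11: "Layer I"}


def layer_from_header(header: bytes):
    """Return the layer name for an MPEG-1 audio frame header (None if reserved)."""
    if len(header) < 2:
        raise ValueError("need at least two header bytes")
    # Assumed layout: the first 12 bits are the syncword (all ones).
    if header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        raise ValueError("syncword not found")
    layer_bits = (header[1] >> 1) & 0b11
    return LAYER_BY_BITS[layer_bits]


# 0xFF 0xFB is a byte pair commonly seen at the start of MP3 frames.
print(layer_from_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```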

Page 44:

MPEG-1 Audio Bit Stream Syntax

Page 45:

MPEG-1 Audio Bit Stream Syntax – header()

Page 46:

MPEG-1 Audio Bit Stream Syntax – header()

Page 47:

MPEG-1 Audio Bit Stream Syntax – header()

Page 48:

MPEG-1 Audio Bit Stream Syntax – audio_data()

Audio Data Layer-1

Page 49:

MPEG-1 Audio Bit Stream Syntax – audio_data()

Audio Data Layer-2

Page 50:

MPEG-1 Audio Bit Stream Syntax – audio_data()

Audio Data Layer-2

Page 51:

MPEG-1 Audio Bit Stream Syntax – audio_data()

Audio Data Layer-3

Page 52:

MPEG-1 Audio Bit Stream Syntax – audio_data()

Audio Data Layer-3

Page 53:

MPEG Audio Layer-3: Huffman-Code Tables

Page 54:

MPEG Audio Layer-3: Huffman-Code Tables

Page 55:

MPEG Audio Layer-3: Huffman-Code Tables

Page 56:

MPEG Audio Layer-3: Huffman-Code Tables

Page 57:

More on Audio Coding will be covered in the dedicated lecture series!

http://www.tu-ilmenau.de/mt/lehrveranstaltungen/lehre-fuer-master-mt/audio-coding/

Page 58:

Organisational issues

Preliminary list of lectures (an updated version is on the website)

Lecture slots: Tuesday, 17:00, K-Hs1; Thursday, 13:00, K-Hs2

Legend: Regular Date, Alternate Date

CW* 14: Introduction

CW 15: Standardization I; Standardization II

CW 16: -

CW 17: -

CW 18: Video Coding I

CW 19: Video Coding II; Video Coding III

CW 20: -

CW 21: Psychoacoustic Fundamentals

CW 22: Metadata Standards

CW 23: MPEG Audio I; MPEG Audio II

CW 24: Speech Coding

CW 25: -

CW 26: -

CW 27: System Standards I

CW 28: System Standards II; System Standards III

* ISO 8601, Representation of dates and times, ch. 2.2.10: calendar week number: ordinal number which identifies a calendar week within its calendar year according to the rule that the first calendar week of a year is that one which includes the first Thursday of that year and that the last calendar week of a calendar year is the week immediately preceding the first calendar week of the next calendar year
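
The ISO 8601 week rule quoted in the footnote on this slide is what Python's `datetime.date.isocalendar()` implements, so calendar week numbers can be checked in a few lines. The dates below are illustrative only and are not taken from the lecture schedule.

```python
from datetime import date


def calendar_week(day: date) -> int:
    """ISO 8601 calendar week number of the given date."""
    return day.isocalendar()[1]


print(calendar_week(date(2017, 4, 4)))  # a Tuesday in spring 2017
print(calendar_week(date(2017, 1, 1)))  # a Sunday, still part of the last week of 2016
```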