Block Diagram of MPEG AAC

Design of MPEG-4 AAC Encoder

Authors:

Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, Kang-Yan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao, Hen-Wen Hue and Chu-Ting Chien

Outline

Introduction

Psychoacoustic Model

M/S Coding

Window Switch

Temporal Noise Shaping

Experiments & Demonstration

Conclusion

Introduction–NCTU-AAC Encoder

[Figure: block diagram of the NCTU-AAC encoder. Blocks: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing.]


1. Introduction

Modules: Psychoacoustic Model, M/S Coding, Window Switch, Temporal Noise Shaping

Objectives: Theoretical Frameworks, Quality, Complexity

2. Psychoacoustic Model

Approach

MDCT-based instead of FFT-based.

New Masking Models

Detection of tonal attack band.

Detection of tone-rich signal.

2. Psychoacoustic Model (c.1)

MDCT and FFT

The MDCT and FFT spectra are similar.

The MDCT spectrum is chaotic due to aliasing.

Using the MDCT gives a consistent spectrum for the analysis and encoding processes.

2. Psychoacoustic Model (c.2)

DCT Spectrum

Q-Bands instead of Lines or P-Bands

Tone/noise information is based on band flatness instead of frame predictivity

For a tone-rich signal in band b, flatness_b approaches 0

For a noise-rich signal in band b, flatness_b approaches 1

$$GM_b = \left(\prod_{i=0}^{N-1} x_{b,i}\right)^{1/N}, \quad AM_b = \frac{1}{N}\sum_{i=0}^{N-1} x_{b,i}, \quad flatness_b = \frac{GM_b}{AM_b}$$
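
The flatness measure can be sketched in a few lines of Python. This is a minimal illustration, not the encoder's code: it assumes the band values are non-negative (for example squared MDCT coefficients), and the function name `band_flatness`, the `eps` guard and the handling of a silent band are hypothetical choices.

```python
import math


def band_flatness(values, eps=1e-12):
    """Return GM/AM for the non-negative spectral values of one band.

    The result is near 0 for a tone-rich band and near 1 for a
    noise-rich band. `eps` is an illustrative guard against log(0).
    """
    n = len(values)
    if n == 0:
        raise ValueError("band must contain at least one value")
    am = sum(values) / n
    if am <= eps:
        # Assumption: a silent band is treated as flat.
        return 1.0
    # Geometric mean computed in the log domain for numerical stability.
    log_gm = sum(math.log(max(v, eps)) for v in values) / n
    return math.exp(log_gm) / am


noise_like = [1.0, 1.0, 1.0, 1.0]       # equal energy in every line
tone_like = [100.0, 1e-6, 1e-6, 1e-6]   # one dominant line

print(band_flatness(noise_like))
print(band_flatness(tone_like))
```

A band with equal energy in every line gives a flatness of 1, while a single dominant line drives the geometric mean, and so the flatness, towards 0.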

2. Psychoacoustic Model--Adaptive TMN and NMT offset

Utilizing Human Perception

Insensitivity in high frequency

The masking effect is stronger at high frequencies than at low frequencies

[Figure: offset versus band index (1 to 49); the offset axis runs from 0 to 4.]

2. Psychoacoustic Model–Tonal Attack and Tone-Rich Signals

Tone/Harmonic

Tonal attack.

Tone-rich signals.

Solution

Masking adjustment

Disable window switch

[Figure: original spectrum and reconstructed spectrum.]

2. Psychoacoustic Model–Concluding Remarks

New Models

Filterbank instead of FFT.

SFM instead of unpredictability.

Detection of tonal attack bands.

Detection of tone-rich signals.

Noise masking effect alone.

Results

Speedup of 70% for AAC and 65% for MP3.

Quality improves by 0.2 for AAC and 0.1 for MP3.


3. M/S Coding

Issues & Approach

Band-Level Switching Decision: Viterbi algorithm, reducing the search from O(2^49) to O(49)

M/S Psychoacoustic Model: conservative masking threshold

Bits Allocated to M/S Channels: allocation entropy

Joint Design with Window Switch: coupling

3. M/S Coding-- Viterbi Algorithm

Find the Optimal Solution

S_LR(i) and S_MS(i) represent the optimal accumulated cost found at the i-th band

α_{LR,LR}, α_{LR,MS}, α_{MS,LR} and α_{MS,MS} represent the transition costs

[Figure: two-state (L/R, M/S) trellis over scale factor bands 0 to 48, with node costs n_LR(i) and n_MS(i), accumulated costs S_LR(i) and S_MS(i), and transition costs α between adjacent bands.]
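
The trellis search can be sketched as a standard two-state Viterbi pass. This is an illustration under stated assumptions rather than the authors' implementation: the recurrence S_X(i) = n_X(i) + min_Y [S_Y(i-1) + α_{Y,X}] is assumed, and the function name `choose_band_modes` and the dictionary layout of `alpha` are hypothetical.

```python
def choose_band_modes(n_lr, n_ms, alpha):
    """Pick L/R or M/S per band by a two-state Viterbi search.

    n_lr, n_ms: per-band node costs n_LR(i) and n_MS(i).
    alpha: transition costs keyed by (previous_mode, current_mode).
    Returns (list of modes, total cost). The cost recurrence is an
    assumption; the source only names the quantities involved.
    """
    modes = ("LR", "MS")
    node = {"LR": n_lr, "MS": n_ms}
    # Accumulated costs S_LR(0) and S_MS(0).
    acc = {m: node[m][0] for m in modes}
    back = []  # back-pointers, one dict per band from band 1 onwards
    for i in range(1, len(n_lr)):
        new_acc, ptr = {}, {}
        for cur in modes:
            prev = min(modes, key=lambda p: acc[p] + alpha[(p, cur)])
            new_acc[cur] = acc[prev] + alpha[(prev, cur)] + node[cur][i]
            ptr[cur] = prev
        acc = new_acc
        back.append(ptr)
    # Trace back from the cheaper final state.
    last = min(modes, key=lambda m: acc[m])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, acc[last]


# Toy example with three bands; switching modes costs 1.
alpha = {("LR", "LR"): 0, ("MS", "MS"): 0, ("LR", "MS"): 1, ("MS", "LR"): 1}
path, cost = choose_band_modes([1, 5, 1], [5, 1, 5], alpha)
print(path, cost)
```

Each band is visited once and only two states are kept, so the work grows linearly with the number of bands instead of enumerating every L/R and M/S combination.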

3. M/S Coding–Frame-Level Switching

Compare the AE of MS and LR

If AE_MS < C1 * AE_LR, use an M/S frame; otherwise use an L/R frame

C1 is a constant factor

3. M/S Coding–M/S Psychoacoustic Model

Noise of Reconstructed Signal

$$L'_i[k] = M'_i[k] + S'_i[k]$$

$$R'_i[k] = M'_i[k] - S'_i[k]$$

$$L'_i[k] = L_i[k] + N_{L_i}[k] = M_i[k] + S_i[k] + N_{M_i}[k] + N_{S_i}[k]$$

$$R'_i[k] = R_i[k] + N_{R_i}[k] = M_i[k] - S_i[k] + N_{M_i}[k] - N_{S_i}[k]$$

3. M/S Coding–M/S Psychoacoustic Model

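Variance of Noise

T_X is the masking threshold of channel X

σ²_X is the variance of X

$$\sigma^2_{N_{L_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{L_i}$$

$$\sigma^2_{N_{R_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{R_i}$$

$$\sigma^2_{N_{M_i}} \le 0.5 \, Min(T_{L_i}, T_{R_i})$$

$$\sigma^2_{N_{S_i}} \le 0.5 \, Min(T_{L_i}, T_{R_i})$$

Threshold of M/S Channels

$$T_{M_i} = 0.5 \, Min(T_{L_i}, T_{R_i})$$

$$T_{S_i} = 0.5 \, Min(T_{L_i}, T_{R_i})$$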
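
The conservative M/S thresholds follow directly from the L/R thresholds. The sketch below assumes per-band threshold lists of equal length; the function name `ms_thresholds` is hypothetical.

```python
def ms_thresholds(t_left, t_right):
    """Derive per-band M and S masking thresholds from L and R thresholds.

    Implements T_M = T_S = 0.5 * min(T_L, T_R) for each band, so the sum
    of the M and S noise variances stays within both the L and R thresholds.
    """
    t_mid = [0.5 * min(tl, tr) for tl, tr in zip(t_left, t_right)]
    t_side = list(t_mid)  # same value, kept as a separate list
    return t_mid, t_side


t_mid, t_side = ms_thresholds([4.0, 2.0], [2.0, 6.0])
print(t_mid, t_side)
```

Halving the smaller of the two thresholds is what makes the rule conservative: even if both the M and S noise reach their limits, the noise in the reconstructed L and R channels cannot exceed either original threshold.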

3. M/S Coding–Allocation Entropy

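E_i is the energy of the i-th quantization band

B_i is the effective bandwidth of the i-th quantization band

W_i is the bandwidth of the i-th quantization band

$$SMR_{Channel_i} = \begin{cases} \dfrac{E_i}{T_i B_i} & \text{if } E_i > T_i B_i \\ 0 & \text{if } E_i \le T_i B_i \end{cases}$$

$$AE_{Channel} = \sum_i W_i \log\left(1 + SMR_{Channel_i}\right)$$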
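
A small sketch of the allocation entropy and of the frame-level M/S test that uses it. Several details are assumptions: the logarithm is taken to base 2, the SMR is zero when the band energy does not exceed T_i * B_i, and the names `band_smr`, `allocation_entropy` and `use_ms_frame` as well as the example value of C1 are hypothetical.

```python
import math


def band_smr(energy, threshold, eff_bandwidth):
    """SMR of one band: E / (T * B) when E exceeds T * B, else 0."""
    masked = threshold * eff_bandwidth
    return energy / masked if energy > masked else 0.0


def allocation_entropy(energies, thresholds, eff_bandwidths, bandwidths):
    """AE = sum over bands of W_i * log(1 + SMR_i); base 2 is assumed."""
    return sum(
        w * math.log2(1.0 + band_smr(e, t, b))
        for e, t, b, w in zip(energies, thresholds, eff_bandwidths, bandwidths)
    )


def use_ms_frame(ae_ms, ae_lr, c1):
    """Frame-level decision: M/S is chosen when AE_MS < C1 * AE_LR."""
    return ae_ms < c1 * ae_lr


ae = allocation_entropy(
    energies=[3.0, 1.0],
    thresholds=[1.0, 1.0],
    eff_bandwidths=[1.0, 2.0],
    bandwidths=[4, 4],
)
print(ae)
print(use_ms_frame(5.0, 8.0, c1=0.8))  # c1 = 0.8 is only an example value
```

In the example the second band lies below its masked level and contributes nothing, so only the first band adds to the entropy.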

3. M/S Coding–Available Bits in the M/S Channels

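Channel Allocation Bits

B is the number of bits allocated to the current frame

For each band i (i < 49), accumulate the allocation entropy of the two coded channels:

If band i is an L/R band: AE_M = AE_M + L_AE[i] and AE_S = AE_S + R_AE[i]

Otherwise (M/S band): AE_M = AE_M + M_AE[i] and AE_S = AE_S + S_AE[i]

$$Bit_M = \frac{AE_M}{AE_M + AE_S} B, \quad Bit_S = \frac{AE_S}{AE_M + AE_S} B$$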
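
The accumulation loop and the proportional split can be sketched as follows. The function name `split_frame_bits`, the boolean band flags and the even split used when both entropies are zero are assumptions made for illustration.

```python
def split_frame_bits(is_lr_band, l_ae, r_ae, m_ae, s_ae, frame_bits):
    """Split the frame's bits between the two coded channels.

    is_lr_band[i] is True when band i is coded as L/R, False for M/S.
    Each *_ae list holds the per-band allocation entropy of one channel.
    """
    ae_m = ae_s = 0.0
    for i, lr in enumerate(is_lr_band):
        if lr:
            ae_m += l_ae[i]
            ae_s += r_ae[i]
        else:
            ae_m += m_ae[i]
            ae_s += s_ae[i]
    total = ae_m + ae_s
    if total == 0.0:
        # Assumption: with no entropy in either channel, split evenly.
        return frame_bits / 2, frame_bits / 2
    bit_m = frame_bits * ae_m / total
    return bit_m, frame_bits - bit_m


bits_m, bits_s = split_frame_bits(
    is_lr_band=[True, False],
    l_ae=[3.0, 0.0], r_ae=[1.0, 0.0],
    m_ae=[0.0, 5.0], s_ae=[0.0, 1.0],
    frame_bits=1000,
)
print(bits_m, bits_s)
```

Here the first channel accumulates an entropy of 8 and the second an entropy of 2, so the first channel receives four fifths of the frame's bits.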


4. Window Switch

Design Issues

Window Decision


Window Grouping

Joint Design with Other AAC Modules

4. Window Switch–Window Decision

Global Energy Ratio

Zero-Crossing Ratio

Tonal Attack

4. Window Switch–Psychoacoustic Model

Models based on Long Window

Calculate SMRs for short windows from SMRs for long windows

[Figure: band SMRs for the long window and band SMRs for the short window.]

4. Window Switch–Window Grouping

Calculate the Scale Factor

The bit allocation module calculates the scale factor for each band.

Error of Scale Factors

$$E_g = \sum_b \sum_{w \in g} \left| sf_{b,w} - sharedsf_{g,b} \right| \cdot bandwidth_b$$

Criterion

Minimize the number of groups

E_g in each group should be smaller than a threshold M
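
One way to read the grouping criterion is a greedy pass over consecutive short windows. The sketch below is not the authors' method: it assumes the error is an absolute difference weighted by bandwidth, takes the shared scale factor of a group to be the mean of its members, and uses a greedy rule that does not guarantee the minimum number of groups. The names `group_error` and `group_windows` are hypothetical.

```python
def group_error(sfs, bandwidths):
    """Weighted scale-factor error E_g for one candidate group.

    sfs: list of per-window scale-factor lists (one value per band).
    The shared scale factor per band is assumed to be the group mean.
    """
    n = len(sfs)
    err = 0.0
    for b, bw in enumerate(bandwidths):
        shared = sum(window[b] for window in sfs) / n
        err += sum(abs(window[b] - shared) for window in sfs) * bw
    return err


def group_windows(sfs, bandwidths, max_error):
    """Greedily merge consecutive windows while E_g stays below max_error."""
    groups = [[0]]
    for w in range(1, len(sfs)):
        candidate = groups[-1] + [w]
        if group_error([sfs[i] for i in candidate], bandwidths) < max_error:
            groups[-1] = candidate
        else:
            groups.append([w])
    return groups


scale_factors = [[10, 10], [10, 10], [20, 20], [20, 21]]
print(group_windows(scale_factors, bandwidths=[4, 4], max_error=5))
```

Windows with similar scale factors end up sharing one group, while the jump between the second and third window starts a new group.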

5. Temporal Noise Shaping


5. TNS

Three Artifacts

Error amplification at attack periods

Time-aliasing

TNS order vs. error

Design Issues

Detection mechanism

TNS design

5. TNS

Remarks

Pre-aliasing must be traded off against pre-echo

Post-aliasing may be masked by post-masking

5. TNS-- Ease Aliasing Artifacts

Combining with Window Switch

Long Start and Long Stop window

6. Experiments

Psychoacoustic Model

M/S Coding

Window Switch

TNS

Overall

6. Experiments-- Test Samples

Track  Time  Signal  Description                   Category
1      10    es01    Vocal (Suzanne Vega)          Speech signal
2      8     es02    German speech                 Speech signal
3      7     es03    English speech                Speech signal
4      10    sc01    Trumpet solo and orchestra    Complex sound mixtures
5      12    sc02    Orchestral piece              Complex sound mixtures
6      11    sc03    Contemporary pop music        Complex sound mixtures
7      7     si01    Harpsichord                   Single instruments
8      7     si02    Castanets                     Single instruments
9      27    si03    Pitch pipe                    Single instruments
10     11    sm01    Bagpipes                      Simple sound mixtures
11     10    sm02    Glockenspiel                  Simple sound mixtures
12     13    sm03    Plucked strings               Simple sound mixtures

6. Experiments–Psychoacoustic Model

Intel VTune 7.0

Psychoacoustic Models

P1: Psychoacoustic Model II

P4: MDCT Psychoacoustic Model

P4 speeds up by 72.58% over P1

      1      2      3      4      5      Average  Speedup (%)
P1    30.24  29.66  29.75  29.96  27.75  29.47    72.58
P4    8.57   8.94   8.00   7.31   7.59   8.08
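
The speedup figure can be reproduced from the five measurements as the relative reduction of the average. This is a quick arithmetic check; the helper names `mean` and `speedup_percent` are illustrative.

```python
p1_runs = [30.24, 29.66, 29.75, 29.96, 27.75]  # P1 measurements from the table
p4_runs = [8.57, 8.94, 8.00, 7.31, 7.59]       # P4 measurements from the table


def mean(values):
    return sum(values) / len(values)


def speedup_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


speedup = speedup_percent(mean(p1_runs), mean(p4_runs))
print(f"P1 average: {mean(p1_runs):.2f}")
print(f"P4 average: {mean(p4_runs):.2f}")
print(f"Speedup: {speedup:.2f}%")
```

The speedup here is defined as (P1 - P4) / P1, which agrees with the averages and the percentage given in the table.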

6. Experiments--Psychoacoustic Model

P4 speeds up by 14.59% over P1

Tracks   Length  P1    P4    Percentage (%)
es01     02:51   26    19    26.92
es02     02:17   19    14    26.32
es03     04:03   36    27    25.00
sc01     02:55   22    18    18.18
sc02     03:23   28    23    17.86
sc03     03:04   27    23    14.81
si01     04:47   39    36    7.69
si02     03:05   30    26    13.33
si03     05:34   49    45    8.16
sm01     04:27   38    35    7.89
sm02     02:01   18    16    11.11
sm03     04:11   38    34    10.53
Average          30.8  26.3  14.59

6. Experiments--Psychoacoustic Model

Category Result

P4 gets better quality than P1 for speech signals, single instruments and simple sound mixtures

For complex sound mixtures, only sc02 is worse than P1

[Figure: bar chart comparing P1 and P4 on each track (es01 to sm03); vertical axis from -4 to 0.]

6. Experiments– M/S Coding

Environment

Bit reservoir, window switch and TNS are disabled

Uses P4

Average ODG improves by 0.39

Coding Mode  L/R      New M/S
es01         -1.57    -0.82
es02         -2.03    -0.55
es03         -2.21    -0.84
sc01         -0.74    -0.54
sc02         -1.11    -0.83
sc03         -0.7     -0.52
si01         -1.16    -1.05
si02         -3.24    -3.01
si03         -1.29    -1.21
sm01         -0.9     -0.93
sm02         -1.54    -1.4
sm03         -1.37    -1.5
Average      -1.4883  -1.1
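
The averages and the 0.39 improvement can be checked directly from the per-track values. The snippet is only an arithmetic check on the table; the variable names are illustrative.

```python
tracks = ["es01", "es02", "es03", "sc01", "sc02", "sc03",
          "si01", "si02", "si03", "sm01", "sm02", "sm03"]
odg_lr = [-1.57, -2.03, -2.21, -0.74, -1.11, -0.70,
          -1.16, -3.24, -1.29, -0.90, -1.54, -1.37]
odg_ms = [-0.82, -0.55, -0.84, -0.54, -0.83, -0.52,
          -1.05, -3.01, -1.21, -0.93, -1.40, -1.50]

avg_lr = sum(odg_lr) / len(odg_lr)
avg_ms = sum(odg_ms) / len(odg_ms)
improvement = avg_ms - avg_lr  # ODG is negative, so a higher value is better

# Tracks where the new M/S coding scores lower than plain L/R.
worse = [t for t, lr, ms in zip(tracks, odg_lr, odg_ms) if ms < lr]

print(f"L/R average: {avg_lr:.4f}")
print(f"M/S average: {avg_ms:.4f}")
print(f"Improvement: {improvement:.2f}")
print("Worse with M/S:", worse)
```

The per-track comparison also shows that the gain is not uniform: two of the simple sound mixtures score slightly lower with the new M/S coding.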

6. Experiments– Window Switch

Coupling Method

The average ODGs with and without the coupling method are −0.7025 and −0.8483, respectively

Bit rate = 128 kbps, sample rate = 44.1 kHz, with short window and M/S

[Figure: ODG per track (es01 to sm03, plus average) for NCTU_AAC without and with the coupling method; ODG axis from -1.6 to 0.]

6. Experiments– TNS

Easing Aliasing Method

Improves quality on all tracks except sm01

The improvement is largest for si02

6. Experiments– Overall

Commercial Encoders

Nero 6.3

QuickTime 6.3

Result

NCTU-AAC has better quality than Nero 6.3 on all tracks

NCTU-AAC has better quality than QuickTime 6.3 on 7 tracks

NCTU-AAC performs better than both encoders on average

Track    Nero 6.3  QuickTime 6.3  NCTU-AAC
es01     -0.6      -0.32          -0.27
es02     -0.45     -0.11          -0.15
es03     -0.51     0.02           -0.23
sc01     -0.88     -0.22          -0.45
sc02     -1.38     -0.84          -0.66
sc03     -0.84     -0.64          -0.4
si01     -1.32     -0.71          -0.62
si02     -0.82     -0.72          -0.54
si03     -1.59     -0.78          -0.98
sm01     -1.36     -0.75          -0.61
sm02     -0.72     -0.37          -0.53
sm03     -1.29     -0.73          -0.62
Average  -0.98     -0.51417       -0.505

Encoders with Audio Patch Method

Track    Nero 6.3  Nero 6.3+APM  QuickTime 6.3  QT6.3+APM  NCTU-AAC  NCTU-AAC+APM
es01     -0.6      -0.38         -0.32          -0.26      -0.27     -0.28
es02     -0.45     -0.44         -0.11          -0.18      -0.15     -0.14
es03     -0.51     -0.43         0.02           -0.02      -0.23     -0.24
sc01     -0.88     -0.73         -0.22          -0.21      -0.45     -0.43
sc02     -1.38     -0.70         -0.84          -0.43      -0.66     -0.51
sc03     -0.84     -0.40         -0.64          -0.32      -0.4      -0.37
si01     -1.32     -0.52         -0.71          -0.47      -0.62     -0.43
si02     -0.82     -0.63         -0.72          -0.55      -0.54     -0.53
si03     -1.59     -0.64         -0.78          -0.43      -0.98     -0.51
sm01     -1.36     -0.83         -0.75          -0.53      -0.61     -0.46
sm02     -0.72     -0.73         -0.37          -0.38      -0.53     -0.54
sm03     -1.29     -0.55         -0.73          -0.35      -0.62     -0.42
Average  -0.98     -0.5817       -0.51417       -0.34417   -0.505    -0.4050

QuickTime 6.3 with APM gets the best quality on average

Conclusion

Quality and Efficiency

Efficient Psychoacoustic Model: DCT-based approach; tonal attack bands and tone-rich signals.

M/S Coding: efficient decision method; psychoacoustic model for M/S channels; Viterbi algorithm.

Window Switch: switch detection; new grouping method; psychoacoustic model for short windows.

TNS: window detection; new window switch policy.

Bit Allocation: single-loop approach.

Bit Reservoir: two-step approach.

Filterbank: fast DCT method.

Audio Patch Method: zero band and high-frequency extension.

5. NCTU-AAC CODEC

[Figure: block diagram of the NCTU-AAC codec. Encoder blocks: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing. Decoder side: Patch-Enable Decoder, Effect.]

5. NCTU-AAC CODEC (Patents)

[Figure: NCTU-AAC codec block diagram, repeated for the patents slide.]

SC03: original and coded versions

Encoder        ODG
QT6.3          -0.64
Nero 6.3       -0.84
Lame 3.88      -1.16
NCTU-AAC       -0.4
NCTU-MP3       -0.91
QT6.3+APM      -0.32
Nero 6.3+APM   -0.4
NCTU-AAC+APM   -0.37
NCTU-MP3+APM   -0.38
Lame 3.88+APM  -0.41

Questions
