Design of MPEG-4 AAC Encoder Authors: Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, Kang-Yan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao, Hen-Wen Hue and Chu-Ting Chien

Block Diagram of MPEG AAC

Jan 25, 2017

Page 1: Block Diagram of MPEG AAC

Design of MPEG-4 AAC Encoder

Authors:

Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, Kang-Yan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao, Hen-Wen Hue and Chu-Ting Chien

Page 2: Block Diagram of MPEG AAC

Outline

Introduction

Psychoacoustic Model

M/S Coding

Window Switch

Temporal Noise Shaping

Experiments & Demonstration

Conclusion

Page 3: Block Diagram of MPEG AAC

Introduction–NCTU-AAC Encoder

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]

Page 4: Block Diagram of MPEG AAC

Introduction–NCTU-AAC Encoder

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]

Page 5: Block Diagram of MPEG AAC

1. Introduction–NCTU-AAC Encoder

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]

Page 6: Block Diagram of MPEG AAC

1. Introduction

Modules: Psychoacoustic Model, M/S Coding, Window Switch, Temporal Noise Shaping

Objective: Theoretical Frameworks, Quality, Complexity

Page 7: Block Diagram of MPEG AAC

2. Psychoacoustic Model

Approach

MDCT-based instead of FFT-based.

New Masking Models

Detection of tonal attack band.

Detection of tone-rich signal.

Page 8: Block Diagram of MPEG AAC

2. Psychoacoustic Model (c.1)

MDCT and FFT

Similar spectrum.

MDCT spectrum is chaotic due to the aliasing.

MDCT leads to the consistent spectrum for analysis and encoding process.

Page 9: Block Diagram of MPEG AAC

2. Psychoacoustic Model (c.2)

DCT Spectrum

Q-Bands instead of Lines or P-Bands

Tone/Noise information based on Band Flatness instead of Frame Predictivity

For a tone-rich signal in band b, $\mathrm{flatness}_b$ approaches 0

For a noise-rich signal in band b, $\mathrm{flatness}_b$ approaches 1

$$\mathrm{flatness}_b = \frac{GM_b}{AM_b}, \qquad GM_b = \left(\prod_{i=0}^{N-1} x_{b,i}\right)^{1/N}, \qquad AM_b = \frac{1}{N}\sum_{i=0}^{N-1} x_{b,i}$$
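The band flatness measure (geometric mean over arithmetic mean within one band) can be sketched in a few lines. This is only an illustration: the function name `band_flatness`, the choice of non-negative per-line energies as input, and the handling of all-zero bands are assumptions, not details given on the slide.

```python
import math


def band_flatness(x):
    """Return GM/AM for the non-negative spectral values of one band.

    Assumption: x holds per-line energies (or magnitudes), all >= 0.
    """
    n = len(x)
    if n == 0:
        raise ValueError("band must contain at least one spectral line")
    am = sum(x) / n
    if am == 0.0:
        # Assumption: an all-zero band is treated as perfectly flat.
        return 1.0
    if any(v <= 0.0 for v in x):
        # The geometric mean is zero as soon as one value is zero.
        return 0.0
    # Compute the geometric mean in the log domain to avoid overflow.
    gm = math.exp(sum(math.log(v) for v in x) / n)
    return gm / am


flat_band = band_flatness([1.0, 1.0, 1.0, 1.0])      # noise-like: close to 1
peaky_band = band_flatness([100.0, 1e-4, 1e-4, 1e-4])  # tone-like: close to 0
```

A band dominated by one strong line gives a value near 0, while equal energy in every line gives 1, matching the two limiting cases stated above.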

Page 10: Block Diagram of MPEG AAC

2. Psychoacoustic Model--Adaptive TMN and NMT offset

Utilizing Human Perception

Insensitivity in high frequency

The masking effect at high frequencies is stronger than at low frequencies

[Figure: offset (vertical axis, 0 to 4) plotted against index 1 to 49 (horizontal axis)]

Page 11: Block Diagram of MPEG AAC

2. Psychoacoustic Model–Tonal Attack and Tone-Rich Signals

Tone/Harmonic

Tonal attack.

Tone-rich signals.

Solution

Masking adjustment

Disable window switch

[Figure panels: Original Spectrum; Reconstructed Spectrum]

Page 12: Block Diagram of MPEG AAC

2. Psychoacoustic Model–Concluding Remarks

New Models

Filterbank instead of FFT.

SFM instead of unpredictivity.

Detection of tonal attack bands.

Detection of tone-rich signals.

Noise masking effect alone.

Results

Speedup of 70% for AAC and 65% for MP3.

Quality improves by 0.2 for AAC and 0.1 for MP3.

Page 13: Block Diagram of MPEG AAC

3. M/S Coding

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]

Page 14: Block Diagram of MPEG AAC

3. M/S Coding

Issues & Approach

Band-Level Switching Decision: Viterbi Algorithm, from $O(2^{49})$ to $O(49)$

M/S Psychoacoustic Model: Conservative masking threshold

Bit Allocated to M/S Channels: Allocation Entropy

Joint Design with Window Switch: Coupling

Page 15: Block Diagram of MPEG AAC

3. M/S Coding-- Viterbi Algorithm

Find the Optimal Solution

$S_{LR}(i)$ and $S_{MS}(i)$ represent the optimal accumulated cost found in the i-th band

$\alpha_{LR,LR}$, $\alpha_{LR,MS}$, $\alpha_{MS,LR}$ and $\alpha_{MS,MS}$ represent the transition costs

[Figure: two-row trellis over scale factor bands 0 to 48, with nodes $n_{LR}(i)$ and $n_{MS}(i)$, accumulated costs $S_{LR}(i)$ and $S_{MS}(i)$, and edges labeled with the transition costs]
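A two-state Viterbi search over the scale factor bands can be sketched as below. It is a generic dynamic-programming illustration, not the authors' implementation: the function name `viterbi_ms_decision`, the dictionary layout of the transition costs, and the idea that each node carries an additive per-band cost are assumptions made for the example.

```python
def viterbi_ms_decision(n_lr, n_ms, alpha):
    """Pick L/R or M/S per band so that the accumulated cost is minimal.

    n_lr[i], n_ms[i]: hypothetical node costs of coding band i in each mode.
    alpha[(prev, cur)]: hypothetical transition cost between adjacent bands.
    Returns (total_cost, list_of_modes).
    """
    modes = ("LR", "MS")
    node = {"LR": n_lr, "MS": n_ms}
    # Accumulated cost S_mode(0) for the first band.
    acc = {m: node[m][0] for m in modes}
    back = []  # back[i-1][cur] = best predecessor mode of band i
    for i in range(1, len(n_lr)):
        new_acc, choice = {}, {}
        for cur in modes:
            prev = min(modes, key=lambda p: acc[p] + alpha[(p, cur)])
            new_acc[cur] = acc[prev] + alpha[(prev, cur)] + node[cur][i]
            choice[cur] = prev
        acc = new_acc
        back.append(choice)
    # Trace the cheapest path back from the last band.
    last = min(modes, key=lambda m: acc[m])
    path = [last]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    path.reverse()
    return acc[last], path


free_switch = {(p, c): 0.0 for p in ("LR", "MS") for c in ("LR", "MS")}
costly_switch = {(p, c): (0.0 if p == c else 10.0)
                 for p in ("LR", "MS") for c in ("LR", "MS")}
cost_a, path_a = viterbi_ms_decision([1, 5, 1], [5, 1, 5], free_switch)
cost_b, path_b = viterbi_ms_decision([1, 5, 1], [5, 1, 5], costly_switch)
```

Each band evaluates only four transitions, so the work grows linearly with the number of bands instead of enumerating every L/R versus M/S combination.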

Page 16: Block Diagram of MPEG AAC

3. M/S Coding–Frame-Level Switching

Compare the AE of MS and LR

If AE_MS < C1 * AE_LR is true, use an M/S frame; if false, use an L/R frame

C1 is a constant factor

Page 17: Block Diagram of MPEG AAC

3. M/S Coding–M/S Psychoacoustic Model

Noise of Reconstructed Signal

$$L'_i[k] = M'_i[k] + S'_i[k]$$

$$R'_i[k] = M'_i[k] - S'_i[k]$$

$$L'_i[k] = L_i[k] + N_{L_i}[k] = M_i[k] + S_i[k] + N_{M_i}[k] + N_{S_i}[k]$$

$$R'_i[k] = R_i[k] + N_{R_i}[k] = M_i[k] - S_i[k] + N_{M_i}[k] - N_{S_i}[k]$$

Page 18: Block Diagram of MPEG AAC

3. M/S Coding–M/S Psychoacoustic Model

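Variance of Noise

$T_X$ is the masking threshold of the X channel

$\sigma_X^2$ is the variance of the X channel

$$\sigma_{N_{L_i}}^2 = \sigma_{N_{M_i}}^2 + \sigma_{N_{S_i}}^2 \le T_{L_i}$$

$$\sigma_{N_{R_i}}^2 = \sigma_{N_{M_i}}^2 + \sigma_{N_{S_i}}^2 \le T_{R_i}$$

$$\sigma_{N_{M_i}}^2 \le 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i})$$

$$\sigma_{N_{S_i}}^2 \le 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i})$$

Threshold of M/S Channels

$$T_{M_i} = 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i})$$

$$T_{S_i} = 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i})$$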
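The conservative M/S threshold rule (half of the smaller of the left and right thresholds, per band) can be checked numerically. The sketch below assumes that the M and S quantization noises are uncorrelated, so their variances add in both reconstructed channels; the function names are hypothetical.

```python
def ms_thresholds(t_left, t_right):
    """Return per-band (T_M, T_S) as 0.5 * min(T_L, T_R)."""
    t_m = [0.5 * min(tl, tr) for tl, tr in zip(t_left, t_right)]
    return t_m, list(t_m)


def worst_case_lr_noise(t_m, t_s):
    """Noise variance reaching L (and R) if M and S noise sit exactly at
    their thresholds, assuming the two noises are uncorrelated."""
    return [m + s for m, s in zip(t_m, t_s)]


t_left = [4.0, 2.0, 3.0]
t_right = [1.0, 6.0, 3.0]
t_m, t_s = ms_thresholds(t_left, t_right)
lr_noise = worst_case_lr_noise(t_m, t_s)
# The reconstructed noise never exceeds either original threshold.
within_limits = all(n <= tl and n <= tr
                    for n, tl, tr in zip(lr_noise, t_left, t_right))
```

Because the M and S noise each receive half of the tighter threshold, their sum equals the tighter of the two channel thresholds and therefore stays masked in both L and R.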

Page 19: Block Diagram of MPEG AAC

3. M/S Coding–Allocation Entropy

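$E_i$ is the energy of the i-th quantization band

$B_i$ is the effective bandwidth of the i-th quantization band

$W_i$ is the bandwidth of the i-th quantization band

$$SMR_i^{Channel} = \begin{cases} \dfrac{E_i}{T_i \cdot B_i} & \text{if } E_i > T_i \cdot B_i \\ 0 & \text{if } E_i \le T_i \cdot B_i \end{cases}$$

$$AE_{Channel} = \sum_i W_i \log\left(1 + SMR_i^{Channel}\right)$$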
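The allocation entropy of one channel can be sketched as a weighted sum of log terms over the quantization bands. Several details here are assumptions made only for illustration: the logarithm base (base 2 is used), the strictness of the comparison between $E_i$ and $T_i B_i$, and the function names.

```python
import math


def band_smr(energy, threshold, eff_bandwidth):
    """SMR of one band: E / (T * B) when the band energy exceeds the
    masked level T * B, otherwise 0. Assumes threshold and bandwidth > 0."""
    masked = threshold * eff_bandwidth
    return energy / masked if energy > masked else 0.0


def allocation_entropy(energies, thresholds, eff_bandwidths, bandwidths):
    """Sum of W_i * log2(1 + SMR_i) over all bands of one channel.

    Assumption: base-2 logarithm, so the result reads like a bit demand.
    """
    return sum(
        w * math.log2(1.0 + band_smr(e, t, b))
        for e, t, b, w in zip(energies, thresholds, eff_bandwidths, bandwidths)
    )


# Band 0: E = 6, T*B = 2, so SMR = 3 and the term is 4 * log2(4) = 8.
# Band 1: E = 1 is below T*B = 2, so the band contributes nothing.
ae = allocation_entropy([6.0, 1.0], [1.0, 1.0], [2.0, 2.0], [4, 4])
```

Bands whose energy falls under the masked level cost nothing, so the measure grows only with audible content.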

Page 20: Block Diagram of MPEG AAC

3. M/S Coding–Available Bits in the M/S Channels

Channel Allocation Bits

B is the number of bits allocated to the current frame

For each band i < 49:

If band i is an L/R band: AEM = AEM + L_AE[i], AES = AES + R_AE[i]

Otherwise: AEM = AEM + M_AE[i], AES = AES + S_AE[i]

$$Bit_M = \frac{AE_M}{AE_M + AE_S} \times B, \qquad Bit_S = \frac{AE_S}{AE_M + AE_S} \times B$$
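The flowchart on this slide accumulates one allocation entropy per output channel and then splits the frame budget B proportionally. A minimal sketch follows; the function name, the boolean per-band flag, and the even split used when both totals are zero are assumptions that the slide does not specify.

```python
def split_frame_bits(is_lr_band, l_ae, r_ae, m_ae, s_ae, frame_bits):
    """Split frame_bits between the two coded channels in proportion to
    their accumulated allocation entropy.

    is_lr_band[i] is True when band i is coded as L/R, False for M/S.
    """
    ae_m = ae_s = 0.0
    for i, lr in enumerate(is_lr_band):
        if lr:
            ae_m += l_ae[i]
            ae_s += r_ae[i]
        else:
            ae_m += m_ae[i]
            ae_s += s_ae[i]
    total = ae_m + ae_s
    if total == 0.0:
        # Assumption: fall back to an even split when nothing is audible.
        return frame_bits / 2.0, frame_bits / 2.0
    bit_m = frame_bits * ae_m / total
    return bit_m, frame_bits - bit_m


# Band 0 is L/R (adds 1 and 1); band 1 is M/S (adds 2 and 0).
bit_m, bit_s = split_frame_bits(
    [True, False], [1.0, 9.0], [1.0, 9.0], [9.0, 2.0], [9.0, 0.0], 100
)
```

Computing the second share as the remainder keeps the two shares summing exactly to the frame budget.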

Page 21: Block Diagram of MPEG AAC

4. Window Switch

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]

Page 22: Block Diagram of MPEG AAC

4. Window Switch

Design Issues

Window Decision

Psychoacoustic Model

Window Grouping

Joint Design with Other AAC Modules

Page 23: Block Diagram of MPEG AAC

4. Window Switch–Window Decision

Global Energy Ratio

Zero-Crossing Ratio

Tonal Attack

Page 24: Block Diagram of MPEG AAC

4. Window Switch–Psychoacoustic Model

Models based on Long Window

Calculate SMRs for Short Windows from SMRs for Long Windows

[Figure: band SMRs for long window mapped to band SMRs for short window]

Page 25: Block Diagram of MPEG AAC

4. Window Switch–Window Grouping

Calculate the Scale Factor: the bit allocation module calculates the scale factor for each band.

Error of Scale Factors

$$E_g = \sum_{b} \sum_{w \in g} \left| sf_{w,b} - sharedsf_{g,b} \right| \cdot bandwidth_b$$

Criterion: minimize the number of groups

$E_g$ in each group should be smaller than a threshold M
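One way to picture the grouping criterion is a greedy pass over the short windows that extends the current group while its scale factor error stays under the threshold. Everything specific in this sketch is an assumption: the shared scale factor is taken as the mean over the group, the error uses an absolute difference weighted by bandwidth, and a greedy scan is only a heuristic for minimizing the number of groups.

```python
def group_error(sf, windows, bandwidth):
    """Bandwidth-weighted distance between each window's scale factors
    and the group's shared scale factor (assumed: the per-band mean)."""
    error = 0.0
    for b, width in enumerate(bandwidth):
        shared = sum(sf[w][b] for w in windows) / len(windows)
        error += sum(abs(sf[w][b] - shared) for w in windows) * width
    return error


def greedy_grouping(sf, bandwidth, threshold):
    """Group consecutive windows; start a new group when adding the next
    window would push the group error to the threshold or beyond."""
    groups = [[0]]
    for w in range(1, len(sf)):
        candidate = groups[-1] + [w]
        if group_error(sf, candidate, bandwidth) < threshold:
            groups[-1] = candidate
        else:
            groups.append([w])
    return groups


# Four hypothetical short windows, two bands each.
sf = [[10, 10], [10, 10], [20, 20], [20, 20]]
groups = greedy_grouping(sf, bandwidth=[4, 4], threshold=1.0)
```

Only adjacent windows are merged, which matches the AAC constraint that a group consists of consecutive short windows.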

Page 26: Block Diagram of MPEG AAC

5. Temporal Noise Shaping

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]

Page 27: Block Diagram of MPEG AAC

5. TNS

Three Artifacts

Error Amplification at Attack periods

Time-Aliasing

TNS order vs Error.

Design Issues ?

Detection Mechanism

TNS Design

Page 28: Block Diagram of MPEG AAC

5. TNS

Remarks

Pre-aliasing leads to the tradeoff with Pre-echo

Post-aliasing may be masked by post-masking

Page 29: Block Diagram of MPEG AAC

5. TNS-- Ease Aliasing Artifacts

Combining with Window Switch

Long Start and Long Stop window

Page 30: Block Diagram of MPEG AAC

6. Experiments

Psychoacoustic Model

M/S Coding

Window Switch

TNS

Overall

Page 31: Block Diagram of MPEG AAC

6. Experiments-- Test Samples

Track  Time  Signal  Description                 Category
1      10    es01    Vocal (Suzanne Vega)        Speech signal
2      8     es02    German speech               Speech signal
3      7     es03    English speech              Speech signal
4      10    sc01    Trumpet solo and orchestra  Complex sound mixtures
5      12    sc02    Orchestral piece            Complex sound mixtures
6      11    sc03    Contemporary pop music      Complex sound mixtures
7      7     si01    Harpsichord                 Single instruments
8      7     si02    Castanets                   Single instruments
9      27    si03    Pitch pipe                  Single instruments
10     11    sm01    Bagpipes                    Simple sound mixtures
11     10    sm02    Glockenspiel                Simple sound mixtures
12     13    sm03    Plucked strings             Simple sound mixtures

Page 32: Block Diagram of MPEG AAC

6. Experiments–Psychoacoustic Model

Intel vTune 7.0

Psychoacoustic Models

P1: Psychoacoustic Model II

P4: MDCT Psychoacoustic Model

Speed up 72.58% over P1

      1      2      3      4      5      Average  Speedup (%)
P1    30.24  29.66  29.75  29.96  27.75  29.47
P4    8.57   8.94   8.00   7.31   7.59   8.08     72.58
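The 72.58% figure can be reproduced from the five measurements listed for P1 and P4. The sketch below assumes that speedup means the relative reduction of the average, (P1 - P4) / P1; the helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def speedup_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in %."""
    return 100.0 * (baseline - improved) / baseline


p1 = [30.24, 29.66, 29.75, 29.96, 27.75]  # Psychoacoustic Model II
p4 = [8.57, 8.94, 8.00, 7.31, 7.59]       # MDCT psychoacoustic model

p1_avg = mean(p1)
p4_avg = mean(p4)
speedup = speedup_percent(p1_avg, p4_avg)
print(f"P1 avg {p1_avg:.2f}, P4 avg {p4_avg:.2f}, speedup {speedup:.2f}%")
```

The averages round to 29.47 and 8.08, and the relative reduction rounds to 72.58%, in agreement with the table.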

Page 33: Block Diagram of MPEG AAC

6. Experiments--Psychoacoustic Model

Speed up 14.59% over P1

Tracks   Length  P1    P4    Percentage (%)
es01     02:51   26    19    26.92
es02     02:17   19    14    26.32
es03     04:03   36    27    25.00
sc01     02:55   22    18    18.18
sc02     03:23   28    23    17.86
sc03     03:04   27    23    14.81
si01     04:47   39    36    7.69
si02     03:05   30    26    13.33
si03     05:34   49    45    8.16
sm01     04:27   38    35    7.89
sm02     02:01   18    16    11.11
sm03     04:11   38    34    10.53
Average          30.8  26.3  14.59

Page 34: Block Diagram of MPEG AAC

6. Experiments--Psychoacoustic Model

Category Result

P4 gets better quality than P1 for speech signals, single instruments and simple sound mixtures

For complex sound mixtures, only sc02 is worse than P1

[Figure: per-track scores for P1 and P4 over es01 to sm03, vertical axis from -4 to 0]

Page 35: Block Diagram of MPEG AAC

6. Experiments– M/S Coding

Environment

Bit reservoir, window switch and TNS are disabled

Uses P4

Improves the average ODG by 0.39

Coding Mode L/R New M/S

es01 -1.57 -0.82

es02 -2.03 -0.55

es03 -2.21 -0.84

sc01 -0.74 -0.54

sc02 -1.11 -0.83

sc03 -0.7 -0.52

si01 -1.16 -1.05

si02 -3.24 -3.01

si03 -1.29 -1.21

sm01 -0.9 -0.93

sm02 -1.54 -1.4

sm03 -1.37 -1.5

Average -1.4883 -1.1
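The averages and the 0.39 improvement follow directly from the per-track ODG values in the table. The short check below also counts the tracks on which the new M/S mode scores higher (less negative) than L/R; variable names are illustrative.

```python
tracks = ["es01", "es02", "es03", "sc01", "sc02", "sc03",
          "si01", "si02", "si03", "sm01", "sm02", "sm03"]
odg_lr = [-1.57, -2.03, -2.21, -0.74, -1.11, -0.70,
          -1.16, -3.24, -1.29, -0.90, -1.54, -1.37]
odg_ms = [-0.82, -0.55, -0.84, -0.54, -0.83, -0.52,
          -1.05, -3.01, -1.21, -0.93, -1.40, -1.50]

avg_lr = sum(odg_lr) / len(odg_lr)
avg_ms = sum(odg_ms) / len(odg_ms)
improvement = avg_ms - avg_lr

# ODG is closer to 0 when quality is better, so "higher" means "better".
worse_with_ms = [t for t, lr, ms in zip(tracks, odg_lr, odg_ms) if ms < lr]
print(f"L/R {avg_lr:.4f}, M/S {avg_ms:.4f}, gain {improvement:.2f}")
print("Tracks where M/S scores lower:", worse_with_ms)
```

The gain is an average: two tracks (sm01 and sm03) score slightly lower with the new M/S mode, while the speech items improve the most.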

Page 36: Block Diagram of MPEG AAC

6. Experiments– Window Switch

Coupling Method

Average ODGs with and without the coupling method are −0.7025 and −0.8483, respectively

Bit Rate=128Kbps, Sample Rate=44.1kHz, with Short Window and M/S

[Figure: ODG per track (es01 to sm03, plus average), vertical axis from -1.6 to 0, comparing NCTU_AAC without Coupling Method and NCTU_AAC with Coupling Method]

Page 37: Block Diagram of MPEG AAC

6. Experiments– TNS

Easing Aliasing Method

Improve quality except sm01

Especially for si02

Page 38: Block Diagram of MPEG AAC

6. Experiments– Overall

Commercial Encoders: Nero 6.3 and QuickTime 6.3

Result

NCTU-AAC has better quality than Nero 6.3 in all tracks

NCTU-AAC has better quality than QuickTime 6.3 in 7 tracks

NCTU-AAC performs better than these two encoders on average

Track    Nero 6.3  QuickTime 6.3  NCTU-AAC
es01     -0.6      -0.32          -0.27
es02     -0.45     -0.11          -0.15
es03     -0.51     0.02           -0.23
sc01     -0.88     -0.22          -0.45
sc02     -1.38     -0.84          -0.66
sc03     -0.84     -0.64          -0.4
si01     -1.32     -0.71          -0.62
si02     -0.82     -0.72          -0.54
si03     -1.59     -0.78          -0.98
sm01     -1.36     -0.75          -0.61
sm02     -0.72     -0.37          -0.53
sm03     -1.29     -0.73          -0.62
Average  -0.98     -0.51417       -0.505

Page 39: Block Diagram of MPEG AAC

Encoders with Audio Patch Method

Track    Nero 6.3  Nero 6.3+APM  QuickTime 6.3  QT 6.3+APM  NCTU-AAC  NCTU-AAC+APM
es01     -0.6      -0.38         -0.32          -0.26       -0.27     -0.28
es02     -0.45     -0.44         -0.11          -0.18       -0.15     -0.14
es03     -0.51     -0.43         0.02           -0.02       -0.23     -0.24
sc01     -0.88     -0.73         -0.22          -0.21       -0.45     -0.43
sc02     -1.38     -0.70         -0.84          -0.43       -0.66     -0.51
sc03     -0.84     -0.40         -0.64          -0.32       -0.4      -0.37
si01     -1.32     -0.52         -0.71          -0.47       -0.62     -0.43
si02     -0.82     -0.63         -0.72          -0.55       -0.54     -0.53
si03     -1.59     -0.64         -0.78          -0.43       -0.98     -0.51
sm01     -1.36     -0.83         -0.75          -0.53       -0.61     -0.46
sm02     -0.72     -0.73         -0.37          -0.38       -0.53     -0.54
sm03     -1.29     -0.55         -0.73          -0.35       -0.62     -0.42
Average  -0.98     -0.5817       -0.51417       -0.34417    -0.505    -0.4050

QuickTime 6.3 with APM gets the best quality on average

Page 40: Block Diagram of MPEG AAC

Conclusion

Quality and Efficiency

Efficient Psychoacoustic Model: DCT-based approach; tonal attack bands and tone-rich signals.

M/S Coding: efficient decision method; psychoacoustic model for M/S channels; Viterbi algorithm.

Window Switch: switch detection; new grouping method; psychoacoustic model for short windows.

Page 41: Block Diagram of MPEG AAC

Conclusion

TNS: Window detection; new window switch policy.

Bit Allocation: Single-loop approach.

Bit Reservoir: Two-step approach.

Filter bank: Fast DCT method.

Audio Patch Method: Zero band and high frequency extension.

Page 42: Block Diagram of MPEG AAC

5. NCTU-AAC CODEC

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing, Patch-Enable Decoder, Effect]

Page 43: Block Diagram of MPEG AAC

5. NCTU-AAC CODEC (Patents)

[Block diagram: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing, Patch-Enable Decoder, Effect]

Page 44: Block Diagram of MPEG AAC

SC03 Original

Page 45: Block Diagram of MPEG AAC

SC03 QT 6.3

[Demo panel; ODG values shown so far: QT 6.3 -0.64]

Page 46: Block Diagram of MPEG AAC

SC03 Nero 6.3

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84]

Page 47: Block Diagram of MPEG AAC

SC03 Lame 3.88

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16]

Page 48: Block Diagram of MPEG AAC

SC03 NCTU-AAC

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4]

Page 49: Block Diagram of MPEG AAC

SC03 NCTU-MP3

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4, NCTU-MP3 -0.91]

Page 50: Block Diagram of MPEG AAC

SC03 QT 6.3+APM

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4, NCTU-MP3 -0.91, QT 6.3+APM -0.32]

Page 51: Block Diagram of MPEG AAC

SC03 Nero 6.3+APM

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4, NCTU-MP3 -0.91, QT 6.3+APM -0.32, Nero 6.3+APM -0.4]

Page 52: Block Diagram of MPEG AAC

SC03 NCTU-AAC+APM

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4, NCTU-MP3 -0.91, QT 6.3+APM -0.32, Nero 6.3+APM -0.4, NCTU-AAC+APM -0.37]

Page 53: Block Diagram of MPEG AAC

SC03 NCTU-MP3+APM

[Demo panel; ODG values shown so far: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4, NCTU-MP3 -0.91, QT 6.3+APM -0.32, Nero 6.3+APM -0.4, NCTU-AAC+APM -0.37, NCTU-MP3+APM -0.38]

Page 54: Block Diagram of MPEG AAC

SC03 Lame 3.88 + APM

[Demo panel; ODG values for sc03: QT 6.3 -0.64, Nero 6.3 -0.84, Lame 3.88 -1.16, NCTU-AAC -0.4, NCTU-MP3 -0.91, QT 6.3+APM -0.32, Nero 6.3+APM -0.4, NCTU-AAC+APM -0.37, NCTU-MP3+APM -0.38, Lame 3.88+APM -0.41]

Page 55: Block Diagram of MPEG AAC

Questions