Design of MPEG-4 AAC Encoder
Authors:
Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, Kang-Yan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao, Hen-Wen Hue and Chu-Ting Chien
Outline
Introduction
Psychoacoustic Model
M/S Coding
Window Switch
Temporal Noise Shaping
Experiments & Demonstration
Conclusion
1. Introduction–NCTU-AAC Encoder
[Figure: NCTU-AAC encoder block diagram. Blocks: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing]
1. Introduction
Modules
Psychoacoustic Model
M/S Coding
Window Switch
Temporal Noise Shaping
Objective
Theoretical Frameworks
Quality
Complexity
2. Psychoacoustic Model
Approach
MDCT-based instead of FFT-based.
New Masking Models
Detection of tonal attack band.
Detection of tone-rich signal.
2. Psychoacoustic Model (c.1)
MDCT and FFT
Similar spectrum.
The MDCT spectrum is chaotic due to aliasing.
Using the MDCT gives a consistent spectrum for the analysis and encoding processes.
2. Psychoacoustic Model (c.2)
DCT Spectrum
Q-Bands instead of Lines or P-Bands
Tone/Noise information based on Band Flatness instead of Frame Predictivity
For a tone-rich signal in band b, flatness_b approaches 0
For a noise-rich signal in band b, flatness_b approaches 1
$$GM_b = \left(\prod_{i=0}^{N-1} x_{b,i}\right)^{1/N}, \qquad AM_b = \frac{1}{N}\sum_{i=0}^{N-1} x_{b,i}, \qquad \mathrm{flatness}_b = \frac{GM_b}{AM_b}$$
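The band flatness measure above can be sketched in a few lines of Python. This is a minimal illustration, not the encoder's code: it assumes the values x_{b,i} are non-negative spectral energies of one band, and the function name `band_flatness` and the handling of all-zero bands are assumptions made for the example.

```python
import math

def band_flatness(values):
    """Spectral flatness GM/AM of the non-negative values in one band."""
    n = len(values)
    if n == 0:
        raise ValueError("band must contain at least one value")
    am = sum(values) / n
    if am == 0.0:
        return 1.0  # assumption: treat an all-zero band as flat (noise-like)
    if min(values) <= 0.0:
        return 0.0  # a zero line forces the geometric mean to zero
    # Geometric mean computed in the log domain to avoid underflow.
    gm = math.exp(sum(math.log(v) for v in values) / n)
    return gm / am

tone_like = [1e-6, 1e-6, 1.0, 1e-6]   # energy concentrated in one line
noise_like = [0.9, 1.1, 1.0, 1.0]     # energy spread evenly over the band
print(band_flatness(tone_like), band_flatness(noise_like))
```

The tone-like band yields a value near 0 and the noise-like band a value near 1, matching the two limiting cases stated on the slide.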
2. Psychoacoustic Model--Adaptive TMN and NMT offset
Utilizing Human Perception
Insensitivity at high frequencies
The masking effect at high frequencies is stronger than at low frequencies
[Figure: Offset versus band index (1 to 49); Offset axis from 0 to 4]
2. Psychoacoustic Model–Tonal Attack and Tone-Rich Signals
Tone/Harmonic
Tonal attack.
Tone-rich signals.
Solution
Masking adjustment
Disable window switch
[Figure panels: Original Spectrum; Reconstructed Spectrum]
2. Psychoacoustic Model–Concluding Remarks
New Models
Filterbank instead of FFT.
SFM instead of unpredictivity.
Detection of tonal attack bands.
Detection of tone-rich signals.
Noise masking effect alone.
Results
Speedup by 70% and 65% for AAC and MP3.
Quality improves by 0.2 and 0.1 for AAC and MP3.
3. M/S Coding
Issues & Approach
Band-Level Switching Decision: Viterbi algorithm, from O(2^49) to O(49)
M/S Psychoacoustic Model: Conservative masking threshold
Bits Allocated to M/S Channels: Allocation Entropy
Joint Design with Window Switch: Coupling
3. M/S Coding-- Viterbi Algorithm
Find the Optimal Solution
S_LR(i) and S_MS(i) represent the optimal accumulated cost found at the i-th band
α_LR,LR, α_LR,MS, α_MS,LR and α_MS,MS represent the transition costs
[Figure: Viterbi trellis over scale factor bands 0 to 48. Each band has an L/R node and an M/S node with node costs n_LR(i) and n_MS(i) and accumulated costs S_LR(i) and S_MS(i); edges between adjacent bands carry the transition costs α_LR,LR, α_LR,MS, α_MS,LR and α_MS,MS]
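A two-state Viterbi search over the scale factor bands can be sketched as follows. This is an illustration under stated assumptions rather than the encoder's implementation: it assumes the standard recursion S_s(i) = n_s(i) + min over p of (S_p(i-1) + α_p,s), and the function name, the cost values and the 0 = L/R, 1 = M/S encoding are hypothetical.

```python
def viterbi_ms_decision(node_cost, trans_cost):
    """Return (modes, total_cost) minimising node plus transition costs.

    node_cost[i][s]  : cost of coding band i in mode s (0 = L/R, 1 = M/S)
    trans_cost[p][s] : cost of going from mode p to mode s between bands
    """
    n_bands = len(node_cost)
    acc = [list(node_cost[0])]        # accumulated costs S_LR(0), S_MS(0)
    back = []                         # best predecessor per band and mode
    for i in range(1, n_bands):
        row, prev = [], []
        for s in (0, 1):
            cands = [acc[-1][p] + trans_cost[p][s] for p in (0, 1)]
            p_best = 0 if cands[0] <= cands[1] else 1
            row.append(cands[p_best] + node_cost[i][s])
            prev.append(p_best)
        acc.append(row)
        back.append(prev)
    # Trace back from the cheaper final state.
    s = 0 if acc[-1][0] <= acc[-1][1] else 1
    total = acc[-1][s]
    modes = [s]
    for prev in reversed(back):
        s = prev[s]
        modes.append(s)
    modes.reverse()
    return modes, total

# Hypothetical costs for four bands; a real encoder would use 49 bands.
example_nodes = [[1, 5], [4, 1], [4, 1], [1, 5]]
example_alpha = [[0, 1], [1, 0]]
print(viterbi_ms_decision(example_nodes, example_alpha))
```

Each band is visited once and only two states are kept per band, so the work grows linearly with the number of bands instead of enumerating all 2^49 mode combinations.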
3. M/S Coding–Frame-Level Switching
Compare the AE of MS and LR
If AE_MS < C1 * AE_LR, use an M/S frame; otherwise, use an L/R frame
C1 is a constant factor
3. M/S Coding–M/S Psychoacoustic Model
Noise of Reconstructed Signal
$$L'_i[k] = M'_i[k] + S'_i[k], \qquad R'_i[k] = M'_i[k] - S'_i[k]$$
$$L'_i[k] = L_i[k] + N_{L_i}[k] = M_i[k] + S_i[k] + N_{M_i}[k] + N_{S_i}[k]$$
$$R'_i[k] = R_i[k] + N_{R_i}[k] = M_i[k] - S_i[k] + N_{M_i}[k] - N_{S_i}[k]$$
3. M/S Coding–M/S Psychoacoustic Model
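Variance of Noise
T_X is the masking threshold of channel X
σ²_X is the variance of channel X
$$\sigma^2_{N_{L_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{L_i}$$
$$\sigma^2_{N_{R_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{R_i}$$
$$\sigma^2_{N_{M_i}} \le 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i}), \qquad \sigma^2_{N_{S_i}} \le 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i})$$
Threshold of M/S Channels
$$T_{M_i} = 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i}), \qquad T_{S_i} = 0.5\,\mathrm{Min}(T_{L_i}, T_{R_i})$$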
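The conservative M/S thresholds can be derived per band from the L/R thresholds, as in the short sketch below. The function name `ms_thresholds` and the sample threshold values are assumptions for illustration only.

```python
def ms_thresholds(t_left, t_right):
    """Per-band M and S thresholds: half the smaller of the L and R thresholds."""
    t_mid = [0.5 * min(tl, tr) for tl, tr in zip(t_left, t_right)]
    t_side = list(t_mid)  # the S channel uses the same conservative bound
    return t_mid, t_side

# Hypothetical L/R masking thresholds for two bands.
t_left = [4.0, 2.0]
t_right = [1.0, 6.0]
t_mid, t_side = ms_thresholds(t_left, t_right)
print(t_mid, t_side)
```

With these thresholds the sum of the M and S noise bounds in a band never exceeds the smaller of the L and R thresholds, which is why the choice is conservative.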
3. M/S Coding–Allocation Entropy
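E_i is the energy of the i-th quantization band
B_i is the effective bandwidth of the i-th quantization band
W_i is the bandwidth of the i-th quantization band
$$SMR_i^{Channel} = \begin{cases} \dfrac{E_i}{T_i B_i}, & \text{if } E_i > T_i B_i \\ 0, & \text{if } E_i \le T_i B_i \end{cases}$$
$$AE_{Channel} = \sum_i W_i \log\left(1 + SMR_i^{Channel}\right)$$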
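A small sketch of the allocation entropy and of the frame-level comparison AE_MS < C1 * AE_LR follows. The slide does not give the logarithm base or the value of C1, so the base-2 logarithm and the default `c1=1.0` are assumptions, as are the function names.

```python
import math

def allocation_entropy(energy, threshold, eff_bw, bw):
    """Sum over bands of W_i * log2(1 + SMR_i), with SMR_i = E_i / (T_i * B_i)."""
    ae = 0.0
    for e, t, b, w in zip(energy, threshold, eff_bw, bw):
        masked = t * b
        if masked <= 0.0:
            raise ValueError("threshold and effective bandwidth must be positive")
        smr = e / masked if e > masked else 0.0  # bands under the mask cost nothing
        ae += w * math.log2(1.0 + smr)           # assumption: log base 2
    return ae

def use_ms_frame(ae_ms, ae_lr, c1=1.0):
    """Frame-level switch: choose M/S when its AE is below C1 times the L/R AE."""
    return ae_ms < c1 * ae_lr

# Hypothetical two-band example: the first band is audible, the second is masked.
ae = allocation_entropy(energy=[3.0, 0.5], threshold=[1.0, 1.0],
                        eff_bw=[1.0, 1.0], bw=[4.0, 4.0])
print(ae, use_ms_frame(ae_ms=5.0, ae_lr=10.0, c1=0.8))
```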
3. M/S Coding–Available Bits in the M/S Channels
Channel Allocation Bits
B is the number of bits allocated to the current frame
For each band i (i < 49):
If band i is an L/R band: AEM = AEM + L_AE[i], AES = AES + R_AE[i]
Otherwise: AEM = AEM + M_AE[i], AES = AES + S_AE[i]
$$Bit_M = \frac{AE_M}{AE_M + AE_S}\,B, \qquad Bit_S = \frac{AE_S}{AE_M + AE_S}\,B$$
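The accumulation loop and the proportional split of the frame budget B can be sketched as below. The function name `split_bits`, the even split used when both entropies are zero, and the sample values are assumptions for the example.

```python
def split_bits(is_lr_band, l_ae, r_ae, m_ae, s_ae, frame_bits):
    """Split frame_bits between the two coded channels in proportion to their AE."""
    ae_m = ae_s = 0.0
    for i, lr in enumerate(is_lr_band):
        if lr:                      # band coded as L/R
            ae_m += l_ae[i]
            ae_s += r_ae[i]
        else:                       # band coded as M/S
            ae_m += m_ae[i]
            ae_s += s_ae[i]
    total = ae_m + ae_s
    if total == 0.0:
        # Assumption: split evenly when neither channel needs bits.
        return frame_bits / 2, frame_bits / 2
    bit_m = frame_bits * ae_m / total
    return bit_m, frame_bits - bit_m

# Hypothetical two-band frame: band 0 is L/R, band 1 is M/S.
bits = split_bits([True, False], l_ae=[2.0, 9.0], r_ae=[1.0, 9.0],
                  m_ae=[9.0, 4.0], s_ae=[9.0, 1.0], frame_bits=800)
print(bits)
```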
4. Window Switch
Design Issues
Window Decision
Psychoacoustic Model
Window Grouping
Joint Design with Other AAC Modules
4. Window Switch–Window Decision
Global Energy Ratio
Zero-Crossing Ratio
Tonal Attack
4. Window Switch–Psychoacoustic Model
Models based on Long Window
Calculate SMRs for Short Windows from SMRs for Long Windows
[Figure: band SMRs for long window; band SMRs for short window]
4. Window Switch–Window Grouping
Calculate the Scale Factor
The bit allocation module calculates the scale factor for each band.
Error of Scale Factors
$$E_g = \sum_{b} \sum_{w \in g} \left| sf_{w,b} - sharedsf_{g,b} \right| \cdot bandwidth_b$$
Criterion
Minimize the number of groups
E_g in each group should be smaller than a threshold M
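One way to apply the grouping criterion is a greedy pass over the short windows, sketched below. This is a hypothetical heuristic, not the method from the slides: it assumes the shared scale factor of a group is the mean over its windows, that the error uses absolute differences, and that a window joins the current group while the group error stays below M. All names and values are illustrative.

```python
def group_error(sf, group, bandwidth):
    """Bandwidth-weighted scale factor error of one group of windows."""
    err = 0.0
    for b in range(len(bandwidth)):
        # Assumption: the shared scale factor is the mean over the group.
        shared = sum(sf[w][b] for w in group) / len(group)
        err += sum(abs(sf[w][b] - shared) for w in group) * bandwidth[b]
    return err

def greedy_grouping(sf, bandwidth, max_error):
    """Merge adjacent windows into groups while the group error stays below max_error."""
    groups = [[0]]
    for w in range(1, len(sf)):
        candidate = groups[-1] + [w]
        if group_error(sf, candidate, bandwidth) < max_error:
            groups[-1] = candidate      # window w joins the current group
        else:
            groups.append([w])          # start a new group at window w
    return groups

# Hypothetical scale factors: 8 short windows, 2 bands, with a change after window 3.
sf = [[10, 12]] * 4 + [[20, 25]] * 4
groups = greedy_grouping(sf, bandwidth=[4, 8], max_error=1.0)
print(groups)
```

Only adjacent windows are merged, which keeps each group contiguous in time.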
5. Temporal Noise Shaping
5. TNS
Three Artifacts
Error Amplification at Attack periods
Time-Aliasing
TNS order vs Error.
Design Issues
Detection Mechanism
TNS Design
5. TNS
Remarks
Pre-aliasing must be traded off against pre-echo
Post-aliasing may be masked by post-masking
5. TNS-- Ease Aliasing Artifacts
Combining with Window Switch
Long Start and Long Stop window
6. Experiments
Psychoacoustic Model
M/S Coding
Window Switch
TNS
Overall
6. Experiments-- Test Samples
Track  Time  Signal  Description                  Category
1      10    es01    Vocal (Suzanne Vega)         Speech signal
2      8     es02    German speech                Speech signal
3      7     es03    English speech               Speech signal
4      10    sc01    Trumpet solo and orchestra   Complex sound mixtures
5      12    sc02    Orchestral piece             Complex sound mixtures
6      11    sc03    Contemporary pop music       Complex sound mixtures
7      7     si01    Harpsichord                  Single instruments
8      7     si02    Castanets                    Single instruments
9      27    si03    Pitch pipe                   Single instruments
10     11    sm01    Bagpipes                     Simple sound mixtures
11     10    sm02    Glockenspiel                 Simple sound mixtures
12     13    sm03    Plucked strings              Simple sound mixtures
6. Experiments–Psychoacoustic Model
Intel VTune 7.0
Psychoacoustic Models
P1: Psychoacoustic Model II
P4: MDCT Psychoacoustic Model
Speedup of 72.58% over P1
Model  1      2      3      4      5      Average  Speedup (%)
P1     30.24  29.66  29.75  29.96  27.75  29.47    -
P4     8.57   8.94   8.00   7.31   7.59   8.08     72.58
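The reported speedup can be reproduced from the five measurements of each model. The sketch below assumes the speedup is the relative reduction of the average, (P1 - P4) / P1; the helper names are illustrative.

```python
p1_runs = [30.24, 29.66, 29.75, 29.96, 27.75]
p4_runs = [8.57, 8.94, 8.00, 7.31, 7.59]

def mean(xs):
    return sum(xs) / len(xs)

def speedup_percent(baseline, improved):
    """Relative reduction of the improved figure with respect to the baseline."""
    return 100.0 * (baseline - improved) / baseline

p1_avg, p4_avg = mean(p1_runs), mean(p4_runs)
speedup = speedup_percent(p1_avg, p4_avg)
print(round(p1_avg, 2), round(p4_avg, 2), round(speedup, 2))
```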
6. Experiments--Psychoacoustic Model
Speedup of 14.59% over P1
Track    Length  P1    P4    Percentage (%)
es01     02:51   26    19    26.92
es02     02:17   19    14    26.32
es03     04:03   36    27    25.00
sc01     02:55   22    18    18.18
sc02     03:23   28    23    17.86
sc03     03:04   27    23    14.81
si01     04:47   39    36    7.69
si02     03:05   30    26    13.33
si03     05:34   49    45    8.16
sm01     04:27   38    35    7.89
sm02     02:01   18    16    11.11
sm03     04:11   38    34    10.53
Average          30.8  26.3  14.59
6. Experiments--Psychoacoustic Model
Category Result
P4 gets better quality than P1 for speech signals, single instruments and simple sound mixtures
For complex sound mixtures, only sc02 is worse than P1
[Figure: bar chart of P1 and P4 per track (es01 to sm03); vertical axis from -4 to 0]
6. Experiments– M/S Coding
Environment
Disable bit reservoir, window switch and TNS
Uses P4
Improves the average ODG by 0.39
Coding Mode  L/R      New M/S
es01         -1.57    -0.82
es02         -2.03    -0.55
es03         -2.21    -0.84
sc01         -0.74    -0.54
sc02         -1.11    -0.83
sc03         -0.7     -0.52
si01         -1.16    -1.05
si02         -3.24    -3.01
si03         -1.29    -1.21
sm01         -0.9     -0.93
sm02         -1.54    -1.4
sm03         -1.37    -1.5
Average      -1.4883  -1.1
6. Experiments– Window Switch
Coupling Method
The average ODGs with and without the coupling method are −0.7025 and −0.8483, respectively
Bit Rate = 128 kbps, Sample Rate = 44.1 kHz, with Short Window and M/S
[Figure: ODG per track (es01 to sm03) and average, NCTU_AAC without Coupling Method versus NCTU_AAC with Coupling Method; ODG axis from -1.6 to 0]
6. Experiments– TNS
Easing Aliasing Method
Improves quality on all tracks except sm01, especially on si02
6. Experiments– Overall
Commercial Encoders
Nero 6.3
QuickTime 6.3
Result
NCTU-AAC has better quality than Nero 6.3 on all tracks
NCTU-AAC has better quality than QuickTime 6.3 on 7 tracks
NCTU-AAC performs better than these two encoders on average
Track    Nero 6.3  QuickTime 6.3  NCTU-AAC
es01     -0.6      -0.32          -0.27
es02     -0.45     -0.11          -0.15
es03     -0.51     0.02           -0.23
sc01     -0.88     -0.22          -0.45
sc02     -1.38     -0.84          -0.66
sc03     -0.84     -0.64          -0.4
si01     -1.32     -0.71          -0.62
si02     -0.82     -0.72          -0.54
si03     -1.59     -0.78          -0.98
sm01     -1.36     -0.75          -0.61
sm02     -0.72     -0.37          -0.53
sm03     -1.29     -0.73          -0.62
Average  -0.98     -0.51417       -0.505
Encoders with Audio Patch Method
Track    Nero 6.3  Nero 6.3+APM  QuickTime 6.3  QT 6.3+APM  NCTU-AAC  NCTU-AAC+APM
es01     -0.6      -0.38         -0.32          -0.26       -0.27     -0.28
es02     -0.45     -0.44         -0.11          -0.18       -0.15     -0.14
es03     -0.51     -0.43         0.02           -0.02       -0.23     -0.24
sc01     -0.88     -0.73         -0.22          -0.21       -0.45     -0.43
sc02     -1.38     -0.70         -0.84          -0.43       -0.66     -0.51
sc03     -0.84     -0.40         -0.64          -0.32       -0.4      -0.37
si01     -1.32     -0.52         -0.71          -0.47       -0.62     -0.43
si02     -0.82     -0.63         -0.72          -0.55       -0.54     -0.53
si03     -1.59     -0.64         -0.78          -0.43       -0.98     -0.51
sm01     -1.36     -0.83         -0.75          -0.53       -0.61     -0.46
sm02     -0.72     -0.73         -0.37          -0.38       -0.53     -0.54
sm03     -1.29     -0.55         -0.73          -0.35       -0.62     -0.42
Average  -0.98     -0.5817       -0.51417       -0.34417    -0.505    -0.4050
QuickTime 6.3 with APM gets the best quality on average
Conclusion
Quality and Efficiency
Efficient Psychoacoustic Model
DCT-based Approach.
Tonal Attack bands and Tone-Rich Signals.
M/S Coding
Efficient decision method.
Psychoacoustic model for M/S channels.
Viterbi algorithm.
Window Switch
Switch Detection.
New grouping method.
Psychoacoustic Model for Short Window.
Conclusion
TNS
Window Detection
New window switch policy
Bit Allocation
Single Loop Approach
Bit Reservoir
Two-Step Approach.
Filter bank
Fast DCT method
Audio Patch Method
Zero band and High frequency extension.
5. NCTU-AAC CODEC
[Figure: NCTU-AAC codec block diagram. Blocks: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing, Patch-Enable Decoder, Effect]
5. NCTU-AAC CODEC (Patents)
[Figure: NCTU-AAC codec block diagram. Blocks: Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Quantization, Bit Reservoir, VLC, Bit-Stream Packing, Patch-Enable Decoder, Effect]
SC03 Demonstration
Encoder          ODG
Original         -
QT 6.3           -0.64
Nero 6.3         -0.84
Lame 3.88        -1.16
NCTU-AAC         -0.4
NCTU-MP3         -0.91
QT 6.3+APM       -0.32
Nero 6.3+APM     -0.4
NCTU-AAC+APM     -0.37
NCTU-MP3+APM     -0.38
Lame 3.88+APM    -0.41
Questions