Digital Speech Processing -- Lecture 16
Speech Coding Methods Based on Speech Waveform Representations and Speech Models -- Adaptive and Differential Coding
Speech Waveform Coding -- Summary of Part 1
1. Probability density function for speech samples
Laplacian:  p(x) = (1/(√2 σx)) exp(−√2 |x|/σx),  with p(0) = 1/(√2 σx)

Gamma:  p(x) = [√3/(8π σx |x|)]^(1/2) exp(−√3 |x|/(2σx)),  with p(0) = ∞
2. Coding paradigms
• uniform -- divide the interval from +Xmax to −Xmax into 2^B intervals of length Δ = (2Xmax/2^B) for a B-bit quantizer
[Figure: uniform B-bit quantizer with step size Δ over the range −Xmax = −4σx to +Xmax = +4σx]
Speech Waveform Coding -- Summary of Part 1 (continued)
x̂[n] = x[n] + e[n]

SNR = 6B + 4.77 − 20 log₁₀(Xmax/σx)  dB

[Figure: quantization-error pdf p(e), uniform with height 1/Δ over (−Δ/2, Δ/2)]

• sensitivity to Xmax/σx (σx varies a lot!!!); not great use of bits for actual speech densities!
• μ-law companding: SNR insensitive to Xmax/σx over a wide range for large μ
• maximum SNR coding — match signal quantization intervals to model probability distribution (Gamma, Laplacian)
• interesting—at least theoretically
Adaptive Quantization

• linear quantization => SNR depends on σx being constant (this is clearly not the case)
• instantaneous companding => SNR only weakly dependent on Xmax/σx for large μ-law compression (μ = 100-500)
• optimum SNR => minimize σe² when σx² is known; non-uniform distribution of quantization levels

Quantization dilemma: want to choose the quantization step size large enough to accommodate the maximum peak-to-peak range of x[n]; at the same time, need to make the step size small so as to minimize the quantization error
– the non-stationary nature of speech (variability across sounds, speakers, backgrounds) compounds this problem greatly
Solutions to the Quantization Dilemma

• Solution 1 - adaptive quantization: let Δ vary to match the variance of the input signal => Δ[n]
• Solution 2 - use a variable gain, G[n], followed by a fixed quantizer step size, Δ => keep the signal variance of y[n] = G[n] x[n] constant

Case 1: Δ[n] proportional to σx => quantization levels and ranges are linearly scaled to match σx => need to reliably estimate σx²
Case 2: G[n] proportional to 1/σx to give σy² ≈ constant

• need a reliable estimate of σx² for both types of adaptive quantization
Types of Adaptive Quantization

• instantaneous - amplitude changes reflect sample-to-sample variations of x[n]
• x[n] is quantized using Δ[n] => c[n] and Δ[n] need to be transmitted to the decoder
• if c′[n] = c[n] and Δ′[n] = Δ[n] => no errors in the channel, and x̂′[n] = x̂[n]
• don't have x[n] at the decoder to estimate Δ[n] => need to transmit Δ[n]; this is a major drawback of feed-forward adaptation
Feed-Forward Quantizer

• time-varying gain, G[n] => c[n] and G[n] need to be transmitted to the decoder
• can't estimate G[n] at the decoder => it has to be transmitted
Feed-Forward Quantizers

• feed-forward systems make estimates of σx², then make Δ or the quantization levels proportional to σx, or make the gain inversely proportional to σx
Slowly Adapting Gain Control

σ²[n] = α σ²[n−1] + (1−α) x²[n−1] = (1−α) Σ_{m=−∞}^{n−1} α^{n−1−m} x²[m],  with α = 0.99

G[n] = G₀/σ[n],  y[n] = G[n] x[n],  ŷ[n] = Q_Δ{G[n] x[n]},  Gmin ≤ G[n] ≤ Gmax

Δ[n] = Δ₀ σ[n],  x̂[n] = Q_{Δ[n]}{x[n]},  Δmin ≤ Δ[n] ≤ Δmax

α = 0.99 => brings up the level in low-amplitude regions => time constant of 100 samples (12.5 msec at an 8 kHz sampling rate) => syllabic rate
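The recursion above maps directly to a few lines of code. Below is a minimal sketch (not from the slides) of the feed-forward gain adaptation, assuming the signal is a NumPy array; the values of G₀, the clamping limits, and the initial variance estimate are illustrative assumptions.

```python
import numpy as np

def feed_forward_gain(x, alpha=0.99, G0=1.0, Gmin=0.1, Gmax=10.0):
    """Slowly adapting gain control: sigma2[n] = alpha*sigma2[n-1] + (1-alpha)*x^2[n-1],
    with G[n] = G0/sigma[n] clamped to [Gmin, Gmax]."""
    x = np.asarray(x, dtype=float)
    sigma2 = 1e-6                       # initial variance estimate (an assumption)
    G = np.empty(len(x))
    for n in range(len(x)):
        # sigma2 here depends only on samples up to n-1, as in the recursion above
        G[n] = np.clip(G0 / np.sqrt(max(sigma2, 1e-12)), Gmin, Gmax)
        sigma2 = alpha * sigma2 + (1.0 - alpha) * x[n] ** 2
    return G * x, G                     # y[n] = G[n] x[n] feeds the fixed quantizer
```

Setting alpha = 0.9 instead gives the rapidly adapting variant described on the next slide.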
Rapidly Adapting Gain Control

Same recursion and gain/step-size relations as above, but with α = 0.9:

σ²[n] = α σ²[n−1] + (1−α) x²[n−1] = (1−α) Σ_{m=−∞}^{n−1} α^{n−1−m} x²[m]

G[n] = G₀/σ[n],  y[n] = G[n] x[n],  ŷ[n] = Q_Δ{G[n] x[n]},  Gmin ≤ G[n] ≤ Gmax
Δ[n] = Δ₀ σ[n],  x̂[n] = Q_{Δ[n]}{x[n]},  Δmin ≤ Δ[n] ≤ Δmax

α = 0.9 => system reacts to amplitude variations more rapidly => provides a better approximation to σy² = constant => time constant of 9 samples (about 1 msec at 8 kHz) => instantaneous rate
Feed-Forward Quantizers

• Δ[n] and G[n] vary slowly compared to x[n]
– they must be sampled and transmitted as part of the waveform coder parameters
– the rate of sampling depends on the bandwidth of the lowpass filter, h[n]—for α = 0.99, the rate is about 13 Hz; for α = 0.9, the rate is about 135 Hz
• it is reasonable to place limits on the variation of Δ[n] or G[n], of the form

  Δmin ≤ Δ[n] ≤ Δmax,  Gmin ≤ G[n] ≤ Gmax

  for obtaining σy² ≈ constant over a 40 dB range in signal levels
• σ̂²[n] is based only on past values of x[n]; two typical windows/filters are

  1. h[n] = α^{n−1}, n ≥ 1;  = 0 otherwise
  2. h[n] = 1/M, 1 ≤ n ≤ M;  = 0 otherwise, giving

     σ̂²[n] = (1/M) Σ_{m=n−M}^{n−1} x²[m]

• can use very short window lengths (e.g., M = 2) to achieve 12 dB SNR for a 3-bit (B = 3) quantizer
Alternative Approach to Adaptation

Δ[n] = P · Δ[n−1],  P ∈ {P₁, P₂, P₃, P₄},  P ∝ |c[n−1]|

x̂[n] = (|c[n]| + 1/2) Δ[n] · sign(c[n])

• Δ[n] only depends on Δ[n−1] and c[n−1] => only need to transmit the codewords c[n]
• also necessary to impose the limits Δmin ≤ Δ[n] ≤ Δmax
• the ratio Δmax/Δmin controls the dynamic range of the quantizer
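A sketch of this feedback (backward) adaptation for a B-bit mid-rise quantizer follows. The multiplier table and the step-size limits are illustrative assumptions, not the optimized values discussed on the following slides; the reconstruction is the mid-rise rule given above.

```python
import numpy as np

def jayant_quantizer(x, B=3, delta0=0.01, dmin=1e-5, dmax=1.0, P=None):
    """Feedback step-size adaptation: delta[n] = P(|c[n-1]|) * delta[n-1].
    The multiplier table P below is illustrative (values < 1 for small
    codewords, > 1 near overload), not the published optimum values."""
    L = 2 ** (B - 1)                    # magnitude levels of a B-bit mid-rise quantizer
    if P is None:
        P = np.linspace(0.9, 2.0, L)    # assumed multipliers P1..PL
    delta = delta0
    c = np.zeros(len(x), dtype=int)     # signed codewords; decoder recovers mag = c if c >= 0 else -c-1
    xhat = np.zeros(len(x))
    for n in range(len(x)):
        s = 1.0 if x[n] >= 0 else -1.0
        mag = min(int(abs(x[n]) / delta), L - 1)            # magnitude codeword
        c[n] = mag if s > 0 else -(mag + 1)
        xhat[n] = s * (mag + 0.5) * delta                   # mid-rise reconstruction
        delta = float(np.clip(P[mag] * delta, dmin, dmax))  # adapt from codeword only
    return c, xhat
```

Because the adaptation uses only the previous codeword, the decoder can track Δ[n] exactly from the received codewords, with nothing extra transmitted.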
Adaptation Gain

• key issue is how P should vary with |c[n−1]|
– if c[n−1] is either the largest positive or largest negative codeword, then the quantizer is overloaded and the quantizer step size is too small => P₄ > 1
– if c[n−1] is either the smallest positive or negative codeword, then the quantization error is too large => P₁ < 1
– need choices for P₂ and P₃
Adaptation Gain

[Figure: step-size multiplier P versus Q = (1 + 2|c[n−1]|)/(2^B − 1)]

• shaded area is the variation in the range of P values due to different speech sounds or different B values
• can see that step-size increases (P > 1) are more vigorous than step-size decreases (P < 1), since signal growth needs to be kept within the quantizer range to avoid 'overloads'
• optimal values of P for B = 2, 3, 4, 5
• improvements in SNR:
– 4-7 dB improvement over μ-law
– 2-4 dB improvement over non-adaptive optimum quantizers
Quantization of Speech Model Parameters

• The excitation and vocal tract (linear system) are characterized by sets of parameters which can be estimated from a speech signal by LP or cepstral processing.
• We can use the set of estimated parameters to synthesize an approximation to the speech signal whose quality depends on a range of factors.
Quantization of Speech Model Parameters

• Quality and data rate of the synthesis depend on:
– the ability of the model to represent speech
– the ability to reliably and accurately estimate the parameters of the model
– the ability to quantize the parameters in order to obtain a low-data-rate digital representation that will yield a high-quality reproduction of the speech signal
Closed-Loop and Open-Loop Speech Coders

• Closed-loop – used in a feedback loop where the synthetic speech output is compared to the input signal, and the resulting difference is used to determine the excitation for the vocal tract model.
• Open-loop – the parameters of the model are estimated directly from the speech signal, with no feedback as to the quality of the resulting synthetic speech.
Scalar Quantization

• Scalar quantization – treat each model parameter separately and quantize using a fixed number of bits
– need to measure (estimate) statistics of each parameter, i.e., mean, variance, minimum/maximum value, pdf, etc.
– each parameter has a different quantizer with a different number of bits allocated
• Example of scalar quantization
– pitch period typically ranges from 20-150 samples (at an 8 kHz sampling rate) => need about 128 values (7 bits) uniformly over the range of pitch periods, including a value of zero for unvoiced/background
– amplitude parameter might be quantized with a μ-law quantizer using 4-5 bits per sample
– using a frame rate of 100 frames/sec, you would need about 700 bps for the pitch period and 400-500 bps for the amplitude
• Each PARCOR coefficient is transformed to the range −π/2 < sin⁻¹(kᵢ) < π/2 and then quantized with both a 4-bit and a 3-bit uniform quantizer.
• Total rate of the quantized representation of speech is about 5000 bps (the bit-budget arithmetic is checked below).
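The quoted bit budget can be checked with simple arithmetic. The sketch below assumes 10 PARCOR coefficients, half quantized at 4 bits and half at 3 bits; the coefficient count and split are assumptions consistent with the 4-bit/3-bit quantizers mentioned above, and the result lands near the quoted 5000 bps.

```python
# Hypothetical bit allocation per 10 ms frame (100 frames/sec), using the
# figures quoted on this slide; the PARCOR split is an assumption.
FRAME_RATE = 100                 # frames per second
pitch_bits = 7                   # 128 pitch-period values incl. unvoiced/background
amplitude_bits = 5               # mu-law quantized amplitude
parcor_bits = 5 * 4 + 5 * 3      # assume 10 coefficients: 5 at 4 bits, 5 at 3 bits

total_bits = pitch_bits + amplitude_bits + parcor_bits
print(total_bits * FRAME_RATE, "bps")   # 4700 bps, i.e., about 5000 bps
```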
Techniques of Vector Quantization

(b) Single-element codebook with cluster centroid (0-bit codebook)
(c) Two-element codebook with two cluster centers (1-bit codebook)
(d) Four-element codebook with four cluster centers (2-bit codebook)
(e) Eight-element codebook with eight cluster centers (3-bit codebook)
Toy Example of VQ Coding

• 2-pole model of the vocal tract => 4 reflection coefficients
1. Scalar quantization - assume 4 values for each reflection coefficient => 2 bits × 4 coefficients = 8 bits/frame
2. Vector quantization - only 4 possible vectors => 2 bits to choose which of the 4 vectors to use for each frame (pointer into a codebook)
• this works because the scalar components of each vector are highly correlated
• if the scalar components are independent => VQ offers no advantage over scalar quantization
Elements of a VQ Implementation

1. A large training set of analysis vectors, X = {X1, X2, …, XL}; L should be much larger than the size of the codebook, M, i.e., 10-100 times the size of M.
2. A measure of distance, dij = d(Xi, Xj), between a pair of analysis vectors, both for clustering the training set as well as for classifying test-set vectors into unique codebook entries.
3. A centroid computation procedure and a centroid splitting procedure.
4. A classification procedure for arbitrary analysis vectors that chooses the codebook vector closest in distance to the input vector, providing the codebook index of the resulting nearest codebook vector.
The VQ Training Set

• The VQ training set of L ≥ 10M vectors should span the anticipated range of:
– talkers, ranging in age, accent, gender, speaking rate, speaking levels, etc.
– speaking conditions, ranging from quiet rooms, to automobiles, to noisy workplaces
– transducers and transmission systems, including a range of microphones, telephone handsets, cellphones, speakerphones, etc.
– speech, including carefully recorded material, conversational speech, telephone queries, etc.
The VQ Distance Measure

• The VQ distance measure depends critically on the nature of the analysis vector, X.
– If X is a log spectral vector, then a possible distance measure would be an Lp log spectral distance, of the form:

  d(Xi, Xj) = [ Σ_{k=1}^{R} |xi^k − xj^k|^p ]^{1/p}

• If X is a cepstral vector, then the distance measure might well be a cepstral distance of the form:

  d(Xi, Xj) = [ Σ_{k=1}^{R} (xi^k − xj^k)² ]^{1/2}
Clustering Training Vectors

• Goal is to cluster the set of L training vectors into a set of M codebook vectors using the generalized Lloyd algorithm (also known as the K-means clustering algorithm), with the following steps (a code sketch follows the list):
1. Initialization – arbitrarily choose M vectors (initially out of the training set of L vectors) as the initial set of codewords in the codebook
2. Nearest-Neighbor Search – for each training vector, find the codeword in the current codebook that is closest (in distance) and assign that vector to the corresponding cell
3. Centroid Update – update the codeword in each cell to the centroid of all the training vectors assigned to that cell in the current iteration
4. Iteration – repeat steps 2 and 3 until the average distance between centroids at successive iterations falls below a preset threshold
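A minimal NumPy sketch of the four steps, assuming an L × K training array and the L2 distance; convergence is tested on the average distortion, a common variant of the centroid-movement test stated in step 4.

```python
import numpy as np

def lloyd_kmeans(X, M, tol=1e-4, seed=0):
    """Generalized Lloyd (K-means) design of an M-vector codebook from the
    training set X (L x K array), following steps 1-4 above."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), size=M, replace=False)].copy()  # step 1: init
    prev = np.inf
    while True:
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # squared L2
        cell = d.argmin(axis=1)                      # step 2: nearest-neighbor search
        for m in range(M):                           # step 3: centroid update (mean)
            if np.any(cell == m):
                codebook[m] = X[cell == m].mean(axis=0)
        avg = d[np.arange(len(X)), cell].mean()
        if prev - avg < tol:                         # step 4: iterate to convergence
            return codebook, cell
        prev = avg
```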
Clustering Training Vectors

[Figure: Voronoi regions and centroids]
Centroid Computation

Assume we have a set of V vectors, X^C = {X1^C, X2^C, …, XV^C}, where all vectors are assigned to cluster C. The centroid of the set is defined as the vector Ȳ that minimizes the average distortion, i.e.,

  Ȳ = argmin_Y (1/V) Σ_{i=1}^{V} d(Xi^C, Y)

The solution for the centroid is highly dependent on the choice of distance measure. When both Xi^C and Y are measured in a K-dimensional space with the L2 norm, the centroid is the mean of the vector set:

  Ȳ = (1/V) Σ_{i=1}^{V} Xi^C

When using an L1 distance measure, the centroid is the median vector of the set of vectors assigned to the given class.
The classification procedure for arbitrary test-set vectors is a full search through the codebook to find the "best" (minimum-distance) match. If we denote the codebook vectors of an M-vector codebook as CBi, for 1 ≤ i ≤ M, and we denote the vector to be classified (and vector quantized) as X, then the index, i*, of the best codebook entry is:

  i* = argmin_{1 ≤ i ≤ M} d(X, CBi)
Binary Split Codebook Design

1. Design a 1-vector codebook; the single vector in the codebook is the centroid of the entire set of training vectors
2. Double the size of the codebook by splitting each current codebook vector, Ym, according to the rule:

  Ym⁺ = Ym (1 + ε)
  Ym⁻ = Ym (1 − ε)

where m varies from 1 to the size of the current codebook, and ε is a splitting parameter (0.01 typically)
3. Use the K-means clustering algorithm to get the best set of centroids for the split codebook
4. Iterate steps 2 and 3 until a codebook of size M is designed.
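A self-contained sketch of the binary-split design, assuming M is a power of 2, the L2 distance, and a fixed number of K-means refinement iterations in place of a threshold test.

```python
import numpy as np

def binary_split_codebook(X, M, eps=0.01, iters=20):
    """LBG binary-split codebook design (steps 1-4 above); M is assumed to be
    a power of 2."""
    X = np.asarray(X, dtype=float)
    codebook = X.mean(axis=0, keepdims=True)               # step 1: 1-vector codebook
    while len(codebook) < M:
        codebook = np.concatenate([codebook * (1 + eps),   # step 2: split each Ym
                                   codebook * (1 - eps)])
        for _ in range(iters):                             # step 3: K-means refinement
            d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
            cell = d.argmin(axis=1)
            for m in range(len(codebook)):
                if np.any(cell == m):
                    codebook[m] = X[cell == m].mean(axis=0)
    return codebook                                        # step 4: stop at size M
```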
Differential Quantization

• we have carried instantaneous quantization of x[n] as far as possible
• time to consider correlations between speech samples separated in time => differential quantization
• high correlation values => signal does not change rapidly in time => difference between adjacent samples should have lower variance than the signal itself
• differential quantization can increase SNR at a given bit rate, or lower the bit rate for a given SNR
Example of Difference Signal
Differential Quantization

• the difference signal, d[n], is quantized - not x[n]
• the quantizer can be fixed or adaptive, uniform or non-uniform
• quantizer parameters are adjusted to match the variance of d[n]

  d̂[n] = d[n] + e[n]
  x̂[n] = x̃[n] + d̂[n] = x[n] + e[n]

• the quantized input has the same quantization error as the difference signal; if σd² < σx², the error is smaller
• independent of the predictor, P, the quantized x̂[n] differs from the unquantized x[n] by e[n], the quantization error of the difference signal!
• good prediction => lower quantization error than quantizing the input directly
Differential Quantization

• the quantized difference signal is encoded into c[n]
• first reconstruct the quantized difference signal from the decoder codeword, c′[n], and the step size Δ
• next reconstruct the quantized input signal using the same predictor, P, as used in the encoder
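The encoder/decoder symmetry described above is easiest to see in code. Below is a minimal first-order DPCM sketch; the α, Δ, and B values are illustrative, not taken from the slides.

```python
import numpy as np

def quantize(d, delta, B=4):
    """Mid-rise uniform quantizer of a scalar difference sample."""
    L = 2 ** (B - 1)
    mag = min(int(abs(d) / delta), L - 1)
    return (1.0 if d >= 0 else -1.0) * (mag + 0.5) * delta

def dpcm(x, alpha=0.85, delta=0.02, B=4):
    """First-order DPCM loop: the encoder quantizes d[n] = x[n] - alpha*xhat[n-1];
    the decoder runs the identical predictor on dhat[n], so both ends form the
    same reconstruction xhat[n] = x[n] + e[n]."""
    xhat, out = 0.0, np.zeros(len(x))
    for n in range(len(x)):
        xtilde = alpha * xhat                       # prediction from quantized past
        dhat = quantize(x[n] - xtilde, delta, B)    # quantized difference signal
        xhat = xtilde + dhat                        # reconstruction (same at decoder)
        out[n] = xhat
    return out
```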
SNR for Differential Quantization

• the SNR of the differential coding system is

  SNR = E[x²[n]] / E[e²[n]] = σx²/σe² = (σx²/σd²) · (σd²/σe²) = G_P · SNR_Q

where

  SNR_Q = σd²/σe² = signal-to-quantizing-noise ratio of the quantizer
  G_P = σx²/σd² = gain due to differential quantization
SNR for Differential Quantization

• SNR_Q depends on the chosen quantizer and can be maximized using all of the previous quantization methods (uniform, non-uniform, optimal)
• G_P, hopefully > 1, is the gain in SNR due to differential coding
• want to choose the predictor, P, to maximize G_P => since σx² is fixed, we then need to minimize σd², i.e., design the best predictor
Predictor for Differential Quantization

• consider the class of linear predictors:

  x̃[n] = Σ_{k=1}^{p} αk x̂[n−k]

• x̃[n] is a linear combination of previous quantized values of x̂[n]
• the predictor z-transform is

  P(z) = Σ_{k=1}^{p} αk z^{−k} = 1 − A(z)  -- predictor system function

• with predictor impulse response coefficients (FIR filter)

  p[n] = αn, 1 ≤ n ≤ p;  = 0 otherwise
Predictor for Differential Quantization

• the reconstructed signal is the output, x̂[n], of a system with system function

  H(z) = X̂(z)/D̂(z) = 1/(1 − P(z)) = 1/(1 − Σ_{k=1}^{p} αk z^{−k}) = 1/A(z)

• where the input to the system is the quantized difference signal, d̂[n]
• where

  d[n] = x[n] − x̃[n] = x[n] − Σ_{k=1}^{p} αk x̂[n−k]
Predictor for Differential Quantization

• to solve for the optimum predictor, we need an expression for σd²:

  σd² = E[d²[n]] = E[(x[n] − x̃[n])²]
      = E[(x[n] − Σ_{k=1}^{p} αk x̂[n−k])²]
      = E[(x[n] − Σ_{k=1}^{p} αk (x[n−k] + e[n−k]))²]

  (using x̂[n] = x[n] + e[n])
Solution for Optimum Predictor

• want to choose {αj}, 1 ≤ j ≤ p, to minimize σd² => differentiate σd² with respect to αj, set the derivatives to zero, giving

  ∂σd²/∂αj = −2 E[ (x[n] − Σ_{k=1}^{p} αk (x[n−k] + e[n−k])) · (x[n−j] + e[n−j]) ] = 0, 1 ≤ j ≤ p

• which can be written in the more compact form

  E[ (x[n] − x̃[n]) · x̂[n−j] ] = E[ d[n] · x̂[n−j] ] = 0, 1 ≤ j ≤ p

• the predictor coefficients that minimize σd² are the ones that make the difference signal, d[n], uncorrelated with past values of the predictor input, x̂[n−j], 1 ≤ j ≤ p
Solution for Alphas

  E[ (x[n] − x̃[n]) · x̂[n−j] ] = E[ d[n] · x̂[n−j] ] = 0, 1 ≤ j ≤ p

• basic equations of differential coding:

  d̂[n] = d[n] + e[n]  -- quantization of the difference signal
  x̂[n] = x[n] + e[n]  -- error same for the original signal
  x̂[n] = x̃[n] + d̂[n]  -- feedback loop for the signal
  x̃[n] = Σ_{k=1}^{p} αk x̂[n−k]  -- prediction loop based on the quantized input
  d[n] = x[n] − Σ_{k=1}^{p} αk x̂[n−k]  -- direct substitution
  x̂[n−j] = x[n−j] + e[n−j]
  x̃[n] = Σ_{k=1}^{p} αk x̂[n−k] = Σ_{k=1}^{p} αk (x[n−k] + e[n−k])
Solution for Alphas

Substituting these relations into E[d[n]·x̂[n−j]] = 0 and expanding gives

  E[x[n]·x[n−j]] + E[x[n]·e[n−j]]
  − Σ_{k=1}^{p} αk ( E[x[n−k]·x[n−j]] + E[x[n−k]·e[n−j]] + E[e[n−k]·x[n−j]] + E[e[n−k]·e[n−j]] ) = 0, 1 ≤ j ≤ p
Solution for Optimum Predictor

• solution for αk - first expand terms to give

  E[x[n−j]·x[n]] + E[e[n−j]·x[n]] = Σ_{k=1}^{p} αk E[x[n−j]·x[n−k]] + Σ_{k=1}^{p} αk E[e[n−j]·x[n−k]]
  + Σ_{k=1}^{p} αk E[x[n−j]·e[n−k]] + Σ_{k=1}^{p} αk E[e[n−j]·e[n−k]], 1 ≤ j ≤ p

• assume fine quantization, so that e[n] is uncorrelated with x[n], and e[n] is stationary white noise (zero mean), giving

  E[x[n−j]·e[n−k]] = 0, for all n, j, k
  E[e[n−j]·e[n−k]] = σe² · δ[j−k]
Solution for Optimum Predictor

• we can now simplify the solution to the form

  φ[j] = Σ_{k=1}^{p} αk ( φ[j−k] + σe² δ[j−k] ), 1 ≤ j ≤ p

  where φ[j] is the autocorrelation of x[n]. Defining terms, this is a set of linear equations in which C is a Toeplitz matrix => can be solved via well-understood numerical methods
• the problem here is that C depends on SNR = σx²/σe², but SNR depends on the coefficients αk of the predictor, which in turn depend on SNR => a bit of a dilemma
Solution for Optimum Predictor

• special case of p = 1, where we can solve directly for α1 of this first-order linear predictor, as

  α1 = ρ[1] / (1 + 1/SNR)

• can see that α1 < ρ[1] < 1
• we will look further at this special case later
Solution for Optimum Predictor

• in spite of the problems in solving for the optimum predictor coefficients, we can solve for the prediction gain, G_P, in terms of the α coefficients, as

  σd² = E[ (x[n] − x̃[n]) · (x[n] − x̃[n]) ]
      = E[ (x[n] − x̃[n]) · x[n] ] − E[ (x[n] − x̃[n]) · x̃[n] ]

• where the term (x[n] − x̃[n]) is the prediction error; we can show that the second term in the expression above is zero, i.e., the prediction error is uncorrelated with the prediction value; thus

  σd² = E[ (x[n] − x̃[n]) · x[n] ]

• assuming uncorrelated signal and noise, we get

  σd² = E[x²[n]] − E[ ( Σ_{k=1}^{p} αk (x[n−k] + e[n−k]) ) · x[n] ]
      = σx² [ 1 − Σ_{k=1}^{p} αk φ[k]/σx² ] = σx² [ 1 − Σ_{k=1}^{p} αk ρ[k] ]

  (G_P)_opt = 1 / ( 1 − Σ_{k=1}^{p} αk ρ[k] )  for optimum values of αk
First-Order Predictor Solution

• For the case p = 1, we can examine the effects of a sub-optimum value of α1 on the quantity G_P = σx²/σd².
• The optimum solution is:

  (G_P)_opt = 1 / (1 − α1 ρ[1])

• Consider choosing an arbitrary value for α1; then we get

  σd² = σx² [ 1 − 2α1ρ[1] + α1² ] + α1² σe²

• Giving the sub-optimum result

  (G_P)_arb = 1 / ( 1 − 2α1ρ[1] + α1² (1 + 1/SNR) )

• where the term α1²/SNR represents the increase in the variance of d[n] due to the feedback of the error signal e[n].
First-Order Predictor Solution

• Can reformulate (G_P)_arb as

  (G_P)_arb = ( 1 − α1²/SNR_Q ) / ( 1 − 2α1ρ[1] + α1² )

  for any value of α1 (including the optimum value).
• Consider the case of α1 = ρ[1]:

  (G_P)_subopt = [ 1 / (1 − ρ²[1]) ] · [ 1 − ρ²[1]/SNR_Q ]

• the gain in prediction is a product of the prediction gain without the quantizer, reduced by the loss due to feedback of the error signal.
First-Order Predictor Solution

• We showed before that the optimum value of α1 was ρ[1]/(1 + 1/SNR). If we neglect the term in 1/SNR (usually very small), then α1 = ρ[1] and the gain due to prediction is

  (G_P)_opt = 1 / (1 − ρ²[1])

• Thus there is a prediction gain so long as ρ[1] ≠ 0. It is reasonable to assume that for speech, ρ[1] > 0.8, giving

  (G_P)_opt > 2.77 (or 4.43 dB)
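The quoted numbers can be verified directly (a one-line check, assuming ρ[1] = 0.8):

```python
import math

rho1 = 0.8                          # assumed first correlation coefficient for speech
Gp = 1.0 / (1.0 - rho1 ** 2)        # (G_P)_opt = 1/(1 - rho[1]^2)
print(Gp, 10 * math.log10(Gp))      # 2.777..., 4.436 dB -- matching the slide's 2.77 (4.43 dB)
```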
Differential Quantization

[Figure: DPCM encoder block diagram -- x[n] minus the prediction x̃[n] forms d[n], which is quantized (Q, step size Δ) to d̂[n]; the predictor P operates on x̂[n] = x̃[n] + d̂[n]]

  x̃[n] = Σ_{k=1}^{p} αk x̂[n−k]
  d[n] = x[n] − x̃[n]
  d̂[n] = d[n] + e[n]
  x̂[n] = x̃[n] + d̂[n] = x[n] + e[n]

  SNR = σx²/σe² = G_P · SNR_Q

First-order predictor:

  α1 = ρ[1] / (1 + 1/SNR)

  G_P = [ 1 / (1 − ρ²[1]) ] · ( 1 − ρ²[1]/SNR_Q ) ≈ 1 / (1 − ρ²[1])

• The error, e[n], in quantizing d[n] is the same as the error in representing x[n].
• The prediction gain depends on ρ[1], the first correlation coefficient.
Long-Term Spectrum and Correlation

[Figure: long-term spectrum and correlation, measured with a 32-point Hamming window; correlation values shown: 0.8579, 0.5680, 0.2536]
Computed Prediction Gain

[Figure: computed prediction gain]
Actual Prediction Gains for Speech

• variation in gain across 4 speakers
• can get about 6 dB improvement in SNR => the equivalent of 1 extra bit in quantization—but at a price of increased complexity in quantization
• differential quantization works!!
• gain in SNR depends on signal correlations
• a fixed predictor cannot be optimum for all speakers and for all speech
Delta Modulation

• simplest form of differential quantization is delta modulation (DM)
• sampling rate chosen to be many times the Nyquist rate for the input signal => adjacent samples are highly correlated
• in the limit as T → 0, we expect φ[1] → σx²
• this leads to a high ability to predict x[n] from past samples, with the variance of the prediction error being very low, leading to a high prediction gain => can use a simple 1-bit (2-level) quantizer => the bit rate for DM systems is just the (high) sampling rate of the signal
Linear Delta Modulation

• 2-level quantizer with fixed step size, Δ, with quantizer form:

  d̂[n] = +Δ if d[n] ≥ 0  (c[n] = 0)
  d̂[n] = −Δ if d[n] < 0  (c[n] = 1)

• using a simple first-order predictor with optimum prediction gain

  (G_P)_opt = 1 / (1 − ρ²[1])

• as ρ[1] → 1, (G_P)_opt → ∞ (qualitatively only, since the assumptions under which the equation was derived break down as ρ[1] → 1)
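A minimal LDM sketch following the two-level rule above; the step size and the predictor coefficient α are illustrative values, not taken from the slides.

```python
import numpy as np

def linear_dm(x, delta=0.05, alpha=1.0):
    """Linear delta modulation: 1-bit quantizer with fixed step size.
    c[n] = 0 codes +delta, c[n] = 1 codes -delta, as on this slide."""
    xhat = 0.0
    c = np.zeros(len(x), dtype=int)
    rec = np.zeros(len(x))
    for n in range(len(x)):
        d = x[n] - alpha * xhat                 # prediction error (first difference)
        c[n] = 0 if d >= 0 else 1
        xhat = alpha * xhat + (delta if c[n] == 0 else -delta)  # accumulate +/-delta
        rec[n] = xhat
    return c, rec
```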
Illustration of DM

• basic equations of DM are

  x̂[n] = α x̂[n−1] + d̂[n]
  d[n] = x[n] − α x̂[n−1] = x[n] − x[n−1] − e[n−1]  (for α ≈ 1)

• when α ≈ 1, this is essentially digital integration, or accumulation of increments of ±Δ
• d[n] is a first backward difference of x[n], or an approximation to the derivative of the input
• how big do we make Δ? At the maximum slope of xa(t) we need

  Δ/T ≥ max |dxa(t)/dt|

  or else the reconstructed signal will lag the actual signal => called the 'slope overload' condition, resulting in quantization error called 'slope overload distortion'
• since x̂[n] can only increase by fixed increments of Δ, fixed-step DM is called linear DM, or LDM

[Figure: DM waveform illustrating the slope overload condition and granular noise]
DM Granular Noise

• when xa(t) has small slope, Δ determines the peak error; when xa(t) = 0, the quantizer output will be an alternating sequence of 0's and 1's, and x̂[n] will alternate around zero with a peak variation of Δ => this condition is called "granular noise"
• need a large step size to handle a wide dynamic range
• need a small step size to accurately represent low-level signals
• with LDM we need to worry about the dynamic range and amplitude of the difference signal => choose Δ to minimize the mean-squared quantization error (a compromise between slope overload and granular noise)
Performance of DM Systems

• normalized step size defined as

  Δ / E[ (x[n] − x[n−1])² ]^{1/2}

• oversampling index defined as

  F0 = FS / (2 FN)

  where FS is the sampling rate of the DM and FN is the Nyquist frequency of the signal
• the total bit rate of the DM is

  BR = FS = 2 FN F0

• can see that for a given value of F0, there is an optimum value of Δ
• optimum SNR increases by 9 dB for each doubling of F0 => this is better than the 6 dB obtained by increasing the number of bits/sample by 1 bit
• curves are very sharp around the optimum value of Δ => SNR is very sensitive to the input level
• for SNR = 35 dB with FN = 3 kHz => 200 kbps rate
• for toll quality, need much higher rates
Adaptive Delta Modulation

• step-size adaptation for DM (from codewords):

  Δ[n] = M · Δ[n−1]
  Δmin ≤ Δ[n] ≤ Δmax

• M is a function of c[n] and c[n−1]; since c[n] depends only on the sign of d[n] = x[n] − α x̂[n−1], it can be determined before the actual quantized value d̂[n], which needs the new value of Δ[n] for evaluation
• fixed predictors can give from 4-11 dB SNR improvement over direct quantization (PCM)
• most of the gain occurs with a first-order predictor
• prediction up to 4th or 5th order helps
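A sketch of the ADM step-size adaptation, using the common rule that the step grows when successive codewords agree (slope-overload region) and shrinks when they alternate (granular region); the specific multipliers P and Q below are illustrative assumptions, not values from the slides.

```python
import numpy as np

def adaptive_dm(x, d0=0.01, dmin=1e-4, dmax=1.0, P=1.5, Q=0.66, alpha=1.0):
    """Adaptive delta modulation: Delta[n] = M * Delta[n-1], where M = P when
    c[n] == c[n-1] and M = Q otherwise (assumed multiplier rule)."""
    xhat, delta, prev_c = 0.0, d0, 0
    c = np.zeros(len(x), dtype=int)
    rec = np.zeros(len(x))
    for n in range(len(x)):
        c[n] = 0 if x[n] - alpha * xhat >= 0 else 1    # sign of d[n], known before the step update
        delta = min(max(delta * (P if c[n] == prev_c else Q), dmin), dmax)
        xhat = alpha * xhat + (delta if c[n] == 0 else -delta)
        rec[n] = xhat
        prev_c = c[n]
    return c, rec
```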
DPCM with Adaptive Quantization

• quantizer step size proportional to the variance at the quantizer input
• can use d[n] or x[n] to control the step size
• get 5 dB improvement in SNR over μ-law non-adaptive PCM
• get 6 dB improvement in SNR using the differential configuration with fixed prediction => ADPCM gives about a 10-11 dB SNR improvement
DPCM with Adaptive Prediction

• need adaptive prediction to handle the non-stationarity of speech
DPCM with Adaptive Prediction

• prediction coefficients assumed to be time-dependent, of the form

  x̃[n] = Σ_{k=1}^{p} αk[n] x̂[n−k]

• assume speech properties remain fixed over short time intervals
• choose αk[n] to minimize the average squared prediction error over short intervals
• the optimum predictor coefficients satisfy the relationships (sketched in code below)

  Σ_{k=1}^{p} αk[n] Rn[j−k] = Rn[j], j = 1, 2, …, p

  where Rn[j] is the short-time autocorrelation function, of the form

  Rn[j] = Σ_{m=−∞}^{∞} x[m] w[n−m] x[j+m] w[n−m−j], 0 ≤ j ≤ p

• w[n−m] is a window positioned at sample n of the input
• update the α's every 10-20 msec
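A sketch of solving the short-time normal equations for one frame, assuming a rectangular analysis window of 256 samples (the window shape and length are assumptions) and a dense linear solve:

```python
import numpy as np

def adaptive_predictor(x, n, p=4, win=256):
    """Solve the short-time normal equations above for alpha_k[n] over one
    frame ending at sample n."""
    seg = np.asarray(x[max(0, n - win):n], dtype=float)
    # R[j] plays the role of R_n[j], computed over the windowed segment
    R = np.array([np.dot(seg[:len(seg) - j], seg[j:]) for j in range(p + 1)])
    T = np.array([[R[abs(j - k)] for k in range(1, p + 1)]
                  for j in range(1, p + 1)])        # Toeplitz matrix of R_n[|j-k|]
    return np.linalg.solve(T, R[1:])                # alpha_1[n] ... alpha_p[n]
```

In practice, the Toeplitz structure allows an efficient Levinson-Durbin recursion instead of the dense solve used in this sketch.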
Prediction Gain for DPCM with Adaptive Prediction

  G_P = 10 log10( E[x²[n]] / E[d²[n]] )

• fixed prediction → 10.5 dB prediction gain for large p
• adaptive prediction → 14 dB gain for large p
• adaptive prediction is more robust to speaker and speech material
Comparison of Coders

• 6 dB between curves
• sharp increase in SNR with both fixed prediction and adaptive quantization
• almost no gain from adapting the first-order predictor