DPCM


ELE4607 Advanced Digital Communications

Module 4: Differential Coding

Differential Coding

• In encoding an analog signal, we need to quantize it (A/D, or analog-to-digital, conversion).
• Quantization itself is complex and is covered in a separate module.
• The output of the quantizer is a series of numbers, one every T seconds.
• If those numbers have a smaller dynamic range, fewer bits are required to represent the source, hence a bit rate saving.

Hence the basic idea of differential coding:

• The encoder generates a prediction of the next sample.
• The decoder generates the same prediction.
• The error, defined as (actual - predicted), is transmitted.
• If the prediction is good, fewer bits are required to transmit the error signal than the raw samples themselves.
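The idea above can be sketched numerically. The following is a minimal illustration, not part of the original module: the test signal (a slow sine wave rounded to integers), the previous-sample predictor and the helper `bits_needed` are all assumptions chosen for the example.

```python
import math

def bits_needed(values):
    """Bits for a fixed-length integer code covering min(values)..max(values)."""
    span = max(values) - min(values) + 1   # number of distinct integer levels
    return max(1, math.ceil(math.log2(span)))

# Hypothetical source: a slowly varying sine wave rounded to integers.
samples = [round(100 * math.sin(2 * math.pi * n / 64)) for n in range(256)]

# Predict each sample by the previous one; error = actual - predicted.
errors = [samples[n] - samples[n - 1] for n in range(1, len(samples))]

# Decoder: form the same prediction and add the received error.
decoded = [samples[0]]
for e in errors:
    decoded.append(decoded[-1] + e)

raw_bits = bits_needed(samples)   # bits per raw sample
err_bits = bits_needed(errors)    # bits per error sample
print("raw:", raw_bits, "bits; error:", err_bits, "bits")
```

Because the error has a smaller dynamic range than the samples, it needs fewer bits per sample, and the decoder still recovers the sequence exactly since nothing is quantized further in this sketch.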


• Direct quantization is sometimes called PCM, for Pulse Code Modulation.
• Differential quantization is thus called Differential PCM, or DPCM.
• DPCM is used directly in many systems, or in conjunction with other algorithms, for example transform image coding (separate module).

Differential Coding Theory

A model for a signal is

$$ s(n) = \hat{s}(n) + e(n) \qquad (1) $$

where
$s(n)$ is the signal sample at instant $n$,
$\hat{s}(n)$ is an estimation (or approximation) of the signal,
$e(n)$ is an error term.

• $\hat{s}(n)$ is the “deterministic component”.
• $e(n)$ is the “stochastic” or random (non-predictable) component, described in statistical terms.

Signal model:

$$ s(n) = \hat{s}(n) + e(n) \qquad (2) $$

or equivalently,

$$ e(n) = s(n) - \hat{s}(n) \qquad (3) $$

Prediction formed by a weighted linear sum:

$$ \hat{s}(n) = a_1 s(n-1) + a_2 s(n-2) + \cdots + a_P s(n-P) = \sum_{k=1}^{P} a_k s(n-k) \qquad (4) $$

where $a_k$ is the kth prediction coefficient and $P$ is the predictor order.

• Both encoder and decoder run this prediction synchronously.
• Note that it is effectively a form of discrete-time filter.
• In speech coding, the encoder is said to be the “analysis” filter, and the decoder the “synthesis” filter.

Problem: the error e(n) is quantized, so it is not known exactly at the decoder, and s(n) is therefore not known precisely either. This complicates the prediction process (see the following equations).

Differential Prediction

Combining the above equations gives:

$$ e(n) = s(n) - \overbrace{\sum_{k=1}^{P} a_k s(n-k)}^{\hat{s}(n)} \qquad (5) $$

$\tilde{s}(n)$ is introduced in both the encoder and decoder, and the prediction is based on that (see diagrams):

$$ e(n) = s(n) - \overbrace{\sum_{k=1}^{P} a_k \hat{s}(n-k)}^{\tilde{s}(n)} \qquad (6) $$

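Encoder & Decoder

[Figure: block diagrams of the DPCM encoder and decoder. Encoder: $\tilde{s}(n)$ is subtracted from $s(n)$ to form $e(n)$, which the quantizer $Q(\cdot)$ maps to $\hat{e}(n)$; $\hat{e}(n)$ and $\tilde{s}(n)$ are summed to form $\hat{s}(n)$, which drives the predictor $1 - A(z)$. Decoder: $\hat{e}(n)$ and the predictor output $\tilde{s}(n)$ are summed to form $\hat{s}(n)$, which drives an identical predictor $1 - A(z)$.]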
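The encoder and decoder loops can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's own code: the uniform quantizer with step size `step`, the zero initial state, the function names and the test signal are all hypothetical. The variable `pred` plays the role of the prediction and `recon` the reconstructed signal; the key point is that the encoder predicts from reconstructed samples only, which are the only samples the decoder has.

```python
import math

def quantize(e, step):
    """Uniform quantizer Q(.) with an assumed step size."""
    return step * round(e / step)

def dpcm_encode(s, a, step):
    """Return the quantized errors; prediction uses reconstructed samples."""
    P = len(a)
    recon = [0.0] * P                       # reconstructed history, zero state
    e_hat = []
    for sample in s:
        pred = sum(a[k] * recon[-1 - k] for k in range(P))
        eq = quantize(sample - pred, step)  # quantized prediction error
        e_hat.append(eq)
        recon.append(pred + eq)             # same value the decoder will form
    return e_hat

def dpcm_decode(e_hat, a):
    """Rebuild the signal from quantized errors with the same predictor."""
    P = len(a)
    recon = [0.0] * P
    for eq in e_hat:
        pred = sum(a[k] * recon[-1 - k] for k in range(P))
        recon.append(pred + eq)
    return recon[P:]

# Hypothetical test: first-order predictor with a1 = 1 on a sine wave.
s = [50 * math.sin(2 * math.pi * n / 40) for n in range(200)]
step = 2.0
e_hat = dpcm_encode(s, [1.0], step)
out = dpcm_decode(e_hat, [1.0])
max_err = max(abs(x - y) for x, y in zip(s, out))
print("largest reconstruction error:", max_err)
```

With the quantizer inside the prediction loop, the reconstruction error at each sample equals the quantization error of that sample's residual, so it stays within half a step and does not accumulate.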

Filter Interpretation

Taking the previous prediction equations and converting to the z-domain:

$$ A(z) = 1 - \sum_{k=1}^{P} a_k z^{-k} \qquad (7) $$

$$ \phantom{A(z)} = 1 - P_s(z) \qquad (8) $$

where $A(z)$ is normally referred to as the “analysis filter”: it creates the error (or prediction residual) from a speech signal:

$$ A(z) = \frac{E(z)}{S(z)} \qquad (9) $$

The short-term predictor $P_s(z)$ forms the linear prediction of future samples based on past samples. The subscript ‘s’ is used to distinguish the short-term predictor from the long-term predictor (see later module on speech coding). $P_s(z)$ is defined as

$$ P_s(z) = \sum_{k=1}^{P} a_k z^{-k} = 1 - A(z) \qquad (10) $$

Note that some texts denote the prediction filter by $A(z)$ rather than $1 - A(z)$.
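The filter view can be checked numerically. The sketch below is illustrative only: the function names, the zero initial state, the coefficient values and the short test signal are assumptions. It applies the analysis filter A(z) to obtain the residual, then the inverse filter 1/A(z) to rebuild the signal; no quantization is involved here.

```python
def analysis(s, a):
    """e(n) = s(n) - sum_k a_k s(n-k): the FIR filter A(z), zero initial state."""
    P = len(a)
    e = []
    for n in range(len(s)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(P) if n - 1 - k >= 0)
        e.append(s[n] - pred)
    return e

def synthesis(e, a):
    """s(n) = e(n) + sum_k a_k s(n-k): the recursive filter 1/A(z)."""
    P = len(a)
    s = []
    for n in range(len(e)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(P) if n - 1 - k >= 0)
        s.append(e[n] + pred)
    return s

a = [1.7119, -0.81]                 # illustrative coefficients a1, a2
signal = [3.0, 1.0, -2.0, 4.0, 0.5, -1.5, 2.5, 0.0]
residual = analysis(signal, a)
rebuilt = synthesis(residual, a)
print(rebuilt)
```

The rebuilt sequence matches the input to within floating-point rounding, which is the sense in which the synthesis filter inverts the analysis filter.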


Simple Predictors

• A simple predictor which gives good prediction for images and reasonably good prediction for speech is a first-order predictor with P = 1 and a unity prediction coefficient, $a_1 = 1$.
• Optimal prediction requires minimization of a “cost function”. The next section shows that the result depends on the autocorrelation of the signal.
• Another way (not shown) is in terms of vectors (signal vector, coefficient vector) and uses the “principle of orthogonality”; it ends up deriving the same result.

First-Order Optimal Predictor

Note that the speech coding literature usually uses a for the predictor coefficients, but the adaptive filtering literature usually uses h. It doesn't matter: they are just coefficients.

Consider a simple first-order predictor:

$$ \hat{x}(n) = h_1 x(n-1) \qquad (11) $$

The prediction error is

$$ e(n) = x(n) - \hat{x}(n) \qquad (12) $$

$$ \phantom{e(n)} = x(n) - h_1 x(n-1) \qquad (13) $$

The instantaneous square error is

$$ e^2(n) = \left( x(n) - h_1 x(n-1) \right)^2 \qquad (14) $$

Taken over a sufficiently large number of samples, the average square error is

$$\begin{aligned}
\overline{e^2} &= \frac{1}{N} \sum_n e^2(n) \\
&= \frac{1}{N} \sum_n \left( x(n) - h_1 x(n-1) \right)^2 \\
&= \frac{1}{N} \sum_n \left( x^2(n) - 2 x(n) h_1 x(n-1) + h_1^2 x^2(n-1) \right)
\end{aligned}$$

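To minimize the average square error with respect to the predictor parameter $h_1$, take derivatives:

$$ \frac{d\,\overline{e^2}}{d h_1} = \frac{1}{N} \sum_n \left( 0 - 2 x(n) x(n-1) + 2 h_1 x^2(n-1) \right) $$

Setting

$$ \frac{d\,\overline{e^2}}{d h_1} = 0 $$

gives the equation

$$ \frac{1}{N} \sum_n x(n) x(n-1) = h_1^* \, \frac{1}{N} \sum_n x^2(n-1) $$

Hence the optimal predictor $h_1^*$ is

$$ h_1^* = \frac{\frac{1}{N} \sum_n x(n) x(n-1)}{\frac{1}{N} \sum_n x^2(n-1)} \qquad (15) $$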

Here the summation has to be taken over a “sufficiently large” number of samples to form the prediction. However, real-world signals are only “quasi-stationary”, so the block cannot be too large, otherwise the signal statistics change within the block.

Taking the summation over a large number of samples, we may use autocorrelations defined as

$$ R(0) \approx \frac{1}{N} \sum_n x^2(n-1) \qquad (16) $$

$$ R(1) \approx \frac{1}{N} \sum_n x(n) x(n-1) \qquad (17) $$

Hence

$$ h_1^* = \frac{R(1)}{R(0)} \qquad (18) $$
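Equation (18) can be tried on synthetic data. The following is a minimal sketch under stated assumptions: the test source is a first-order autoregressive signal with a known coefficient (0.9), driven by unit-variance Gaussian noise, and the function name, block length and seed are arbitrary choices for illustration.

```python
import random

def first_order_predictor(x):
    """h1* = sum x(n)x(n-1) / sum x^2(n-1); the 1/N factors cancel."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n - 1] ** 2 for n in range(1, len(x)))
    return num / den

# Hypothetical source: x(n) = 0.9 x(n-1) + w(n), w(n) white Gaussian noise.
rng = random.Random(0)
true_h1 = 0.9
x = [0.0]
for _ in range(5000):
    x.append(true_h1 * x[-1] + rng.gauss(0.0, 1.0))

h1 = first_order_predictor(x)
print("estimated h1:", h1)
```

The estimate should land close to the coefficient used to generate the signal, with the gap shrinking as the block length grows.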

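Second-Order Optimal Predictor

For a second-order predictor,

$$\begin{aligned}
\hat{x}(n) &= h_1 x(n-1) + h_2 x(n-2) \\
\therefore\ e(n) &= x(n) - \hat{x}(n) \\
&= x(n) - \left( h_1 x(n-1) + h_2 x(n-2) \right) \\
\therefore\ e^2(n) &= \left[ x(n) - \left( h_1 x(n-1) + h_2 x(n-2) \right) \right]^2
\end{aligned} \qquad (19)$$

Over many samples, the average square error is

$$\begin{aligned}
\overline{e^2} &= \frac{1}{N} \sum_n e^2(n) \\
&= \frac{1}{N} \sum_n \left[ x(n) - \left( h_1 x(n-1) + h_2 x(n-2) \right) \right]^2
\end{aligned}$$

To minimize the average square error with respect to the predictor parameters $h_1$ and $h_2$, take derivatives:

$$ \frac{\partial\,\overline{e^2}}{\partial h_1} = \frac{1}{N} \sum_n 2 \left[ x(n) - \left( h_1 x(n-1) + h_2 x(n-2) \right) \right] \times \left[ -x(n-1) \right] \qquad (20) $$

and set

$$ \frac{\partial\,\overline{e^2}}{\partial h_1} = 0 \qquad (21) $$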

This gives the optimal predictor $h_1^*, h_2^*$:

$$ \frac{1}{N} \sum_n 2 \left[ x(n) - \left( h_1^* x(n-1) + h_2^* x(n-2) \right) \right] \times \left[ -x(n-1) \right] = 0 $$

$$ \frac{1}{N} \sum_n x(n) x(n-1) = h_1^* \, \frac{1}{N} \sum_n x(n-1) x(n-1) + h_2^* \, \frac{1}{N} \sum_n x(n-1) x(n-2) $$

Using autocorrelation as before,

$$ R(1) = h_1^* R(0) + h_2^* R(1) \qquad (22) $$

Similarly, optimizing with respect to $h_2$ yields

$$ R(2) = h_1^* R(1) + h_2^* R(0) \qquad (23) $$

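In matrix form these are more compactly expressed as

$$ \begin{pmatrix} R(1) \\ R(2) \end{pmatrix} = \begin{pmatrix} R(0) & R(1) \\ R(1) & R(0) \end{pmatrix} \begin{pmatrix} h_1^* \\ h_2^* \end{pmatrix} \qquad (24) $$

or,

$$ \mathbf{r} = \mathbf{R}\,\mathbf{h}^* \qquad (25) $$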
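For the second-order case the matrix equation is small enough to solve by hand. The worked solution below applies Cramer's rule to the 2×2 system; it is a sketch that assumes $R(0)^2 \neq R(1)^2$, so that the matrix is invertible, and the symbol $\Delta$ for the determinant is introduced here only for convenience.

```latex
\begin{aligned}
\Delta &= R(0)^2 - R(1)^2 \\[4pt]
h_1^* &= \frac{R(1)\,\bigl[ R(0) - R(2) \bigr]}{\Delta} \\[4pt]
h_2^* &= \frac{R(0)\,R(2) - R(1)^2}{\Delta}
\end{aligned}
```

As a consistency check, if $R(2) = R(1)^2 / R(0)$ then $h_2^* = 0$ and $h_1^*$ reduces to $R(1)/R(0)$, which is the first-order result.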

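Prediction Filters

Taking z-transforms of the predictor equation,

$$\begin{aligned}
E(z) &= X(z) - \hat{X}(z) \qquad (26) \\
&= X(z) - \left( h_1 X(z) z^{-1} + h_2 X(z) z^{-2} \right) \\
&= X(z) \left[ 1 - \left( h_1 z^{-1} + h_2 z^{-2} \right) \right]
\end{aligned}$$

Thus

$$ \frac{X(z)}{E(z)} = \frac{1}{1 - \left( h_1 z^{-1} + h_2 z^{-2} \right)} \qquad (27) $$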

• The analysis filter is FIR (all-zero).
• The synthesis filter (27) is all-pole.
• Care must be taken to ensure the synthesis filter at the receiver does not become unstable.
• Factorize the denominator and check that its roots (the poles) are inside the unit circle.
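For a second-order predictor the root check can be done with the quadratic formula. The sketch below is illustrative: the function names and the two coefficient pairs are assumptions. Multiplying the denominator 1 − h1 z⁻¹ − h2 z⁻² by z² gives z² − h1 z − h2, whose roots are the poles of the synthesis filter.

```python
import cmath

def synthesis_poles(h1, h2):
    """Roots of z^2 - h1*z - h2 = 0, the poles of 1 / (1 - h1 z^-1 - h2 z^-2)."""
    disc = cmath.sqrt(h1 * h1 + 4 * h2)
    return (h1 + disc) / 2, (h1 - disc) / 2

def is_stable(h1, h2):
    """True when both poles lie strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in synthesis_poles(h1, h2))

# Illustrative coefficient pairs (assumed values for the example).
for h1, h2 in [(1.7119, -0.81), (1.0, 0.5)]:
    radii = [abs(p) for p in synthesis_poles(h1, h2)]
    print(h1, h2, radii, "stable" if is_stable(h1, h2) else "unstable")
```

For higher orders there is no closed form, so a general polynomial root finder would be needed in place of the quadratic formula.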

MATLAB Example

[Figure: “Second-order Linear Prediction”. Actual and predicted sample values plotted against sample number (0 to 400).]

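The actual predictor of

$$ \mathbf{h}_{act} = \begin{pmatrix} 1.7119 \\ -0.8100 \end{pmatrix} \qquad (28) $$

was used in lpeg.m, with an input of white Gaussian noise of unit variance. Over a block of 2000 samples, the normal equation method as above gave

$$ \mathbf{h}^*_{normal} = \begin{pmatrix} 1.7188 \\ -0.8238 \end{pmatrix} \qquad (29) $$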
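A comparable experiment can be sketched in Python. This is not the lpeg.m script, whose contents are not reproduced here: the function names, the seed and the way the test signal is generated (the actual predictor run as a recursive filter on unit-variance white Gaussian noise) are assumptions, so the estimates will differ from the figures quoted above.

```python
import random

def make_signal(h, n_samples, seed):
    """x(n) = h1 x(n-1) + h2 x(n-2) + w(n), w(n) unit-variance Gaussian."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n_samples):
        x.append(h[0] * x[-1] + h[1] * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

def autocorr(x, lag):
    """R(lag) ~ (1/N) sum_n x(n) x(n - lag) over the available samples."""
    N = len(x)
    return sum(x[n] * x[n - lag] for n in range(lag, N)) / N

def second_order_predictor(x):
    """Solve the 2x2 normal equations r = R h* by Cramer's rule."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    det = r0 * r0 - r1 * r1
    h1 = (r1 * r0 - r1 * r2) / det
    h2 = (r0 * r2 - r1 * r1) / det
    return h1, h2

h_act = (1.7119, -0.8100)
x = make_signal(h_act, 2000, seed=1)
h_est = second_order_predictor(x)
print("actual:", h_act, "estimated:", h_est)
```

Different seeds or block lengths give slightly different estimates, which is the block-size trade-off noted earlier for quasi-stationary signals.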

Coding the Predictor Parameters

• Optimal predictor size will depend on the application.
• Predictor parameters will change over time as the source is being encoded (for example, words being spoken, syllables being pronounced, areas of a video screen).
• Parameters may be estimated from past blocks (backwards estimation). This does not require transmission of the parameters, but the optimal predictors are slightly out of date.
• Parameters may be calculated by the transmitter and explicitly sent to the receiver. This requires extra bits (a “side channel”).
• Speech encoders typically sample at $f_s$ = 8 kHz, calculate over a block of 2 to 20 ms and use 10th-order prediction.
• Image encoders generally use a smaller-order predictor, or a two-dimensional predictor.

Coding the Parameters

• Instead of block-by-block parameter updates and transmission, it is possible to update the predictor on a sample-by-sample basis.
• Generally this is termed “adaptive linear prediction”.
• The algorithm for block estimation is the Normal Equation method or Yule-Walker method.
• For sample-by-sample estimation it is called the LMS or Least Mean Square algorithm.

Adaptive Prediction

The predictor is defined by

$$ e(n) = x(n) - \hat{x}(n) \qquad (30) $$

$$ \phantom{e(n)} = x(n) - \sum_{k=1}^{P} h_k x(n-k) \qquad (31) $$

• For a first-order predictor, the squared error $e^2(n)$ plotted against $h_1$ is a quadratic curve (a parabola) in two dimensions.
• For a second-order predictor, it gives rise to a “bowl-shaped” surface ($e^2$ versus $h_1$, $h_2$).

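In matrix form this is

$$ e(n) = x(n) - \mathbf{h}^T \mathbf{x}(n-1) \qquad (32) $$

Strictly, $\mathbf{h}$ is now a function of the time index $n$, i.e. $\mathbf{h}(n)$.

$$ \mathbf{h} = \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_P \end{pmatrix} \qquad (33) $$

$$ \mathbf{x}(n-1) = \begin{pmatrix} x(n-1) \\ x(n-2) \\ \vdots \\ x(n-P) \end{pmatrix} \qquad (34) $$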

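The estimate of the gradient in the $h_1$ direction is

$$\begin{aligned}
\hat{\nabla}_{h_1} e^2(n) &= \frac{\partial}{\partial h_1} \left( x(n) - \mathbf{h}^T \mathbf{x}(n-1) \right)^2 \qquad (35) \\
&= \frac{\partial}{\partial h_1} \left( x(n) - \left( h_1 x(n-1) + h_2 x(n-2) + \cdots \right) \right)^2 \\
&= 2 \left( x(n) - \left( h_1 x(n-1) + h_2 x(n-2) + \cdots \right) \right) \times \frac{\partial}{\partial h_1} \left( x(n) - \left( h_1 x(n-1) + h_2 x(n-2) + \cdots \right) \right) \\
&= 2\, e(n) \left( -x(n-1) \right) \\
&= -2\, e(n)\, x(n-1) \qquad (36)
\end{aligned}$$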

Similarly, the estimate of the gradient in the $h_2$ direction is

$$ \hat{\nabla}_{h_2} e^2(n) = -2\, e(n)\, x(n-2) \qquad (37) $$

At each new sample, update the predictor $\mathbf{h}$ by a quantity proportional to the negative gradient of $e^2(n)$ (because we want to seek the minimum error). So,

$$ \mathbf{h}(n+1) = \mathbf{h}(n) - \mu\, \hat{\nabla} e^2(n) \qquad (38) $$

where $\mu$ is the adaptation rate parameter. Using the partial derivatives just found,

$$ \mathbf{h}(n+1) = \mathbf{h}(n) + 2 \mu\, e(n)\, \mathbf{x}(n-1) \qquad (39) $$

Expanded, this is

$$ \begin{pmatrix} h_1(n+1) \\ h_2(n+1) \\ \vdots \\ h_P(n+1) \end{pmatrix} = \begin{pmatrix} h_1(n) \\ h_2(n) \\ \vdots \\ h_P(n) \end{pmatrix} + 2 \mu\, e(n) \begin{pmatrix} x(n-1) \\ x(n-2) \\ \vdots \\ x(n-P) \end{pmatrix} \qquad (40) $$

This equation is evaluated at each sample to update the predictor parameters, which in turn are used to predict the next sample.
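The per-sample update can be sketched as below. This is a minimal illustration under stated assumptions, not the module's own script: the function names, the zero starting coefficients, the seed, the coefficient values (1.7119, −0.81) and the synthetic test signal (a second-order recursive filter driven by unit-variance white Gaussian noise) are all choices made for the example.

```python
import random

def make_signal(h, n_samples, seed):
    """x(n) = h1 x(n-1) + h2 x(n-2) + w(n), w(n) unit-variance Gaussian."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n_samples):
        x.append(h[0] * x[-1] + h[1] * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

def lms_predict(x, order, mu):
    """Sample-by-sample LMS: h(n+1) = h(n) + 2*mu*e(n)*x(n-1)."""
    h = [0.0] * order                 # assumed starting point: all zeros
    errors = []
    for n in range(order, len(x)):
        past = [x[n - k] for k in range(1, order + 1)]      # x(n-1)..x(n-P)
        e = x[n] - sum(hk * xk for hk, xk in zip(h, past))  # prediction error
        h = [hk + 2 * mu * e * xk for hk, xk in zip(h, past)]
        errors.append(e)
    return h, errors

h_act = (1.7119, -0.8100)
x = make_signal(h_act, 2000, seed=1)
h_lms, errors = lms_predict(x, order=2, mu=0.001)

# Mean square error early and late in the run, to show the adaptation.
early = sum(e * e for e in errors[:500]) / 500
late = sum(e * e for e in errors[-500:]) / 500
print("final coefficients:", h_lms, "early MSE:", early, "late MSE:", late)
```

Unlike the block method, the coefficients never settle exactly: they keep fluctuating around the optimum by an amount that grows with µ.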

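Adaptive Prediction Stepping

[Figure: $e^2$ plotted against $h_1$, showing successive estimates $h_1(n)$, $h_1(n+1)$, ... stepping down the error curve towards the optimum $h_1^*$.]

µ is the adaptation rate parameter. It is set empirically: a larger value gives faster adaptation, but with the possibility of instability. Typically µ = 0.001.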
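The trade-off in choosing µ can be illustrated on the first-order error curve. Note that the sketch below swaps in deterministic steepest descent on the average square error, rather than the sample-by-sample LMS update, so that the behaviour is exactly repeatable; the autocorrelation values, step sizes and function name are assumptions. With average error R(0) − 2 h R(1) + h² R(0), the gradient is −2R(1) + 2hR(0), and each step multiplies the distance from the optimum by (1 − 2µR(0)).

```python
def descend(r0, r1, mu, steps, h_start=0.0):
    """Iterate h <- h + 2*mu*(R(1) - h*R(0)) on the average-error curve."""
    h = h_start
    path = [h]
    for _ in range(steps):
        h = h + 2 * mu * (r1 - h * r0)
        path.append(h)
    return path

r0, r1 = 1.0, 0.9      # assumed autocorrelations; optimum is R(1)/R(0) = 0.9
slow = descend(r0, r1, mu=0.01, steps=50)      # small steps, slow approach
fast = descend(r0, r1, mu=0.2, steps=50)       # larger steps, quick approach
unstable = descend(r0, r1, mu=1.1, steps=50)   # too large: overshoots and grows

for name, path in [("slow", slow), ("fast", fast), ("unstable", unstable)]:
    print(name, "distance from optimum after 50 steps:", abs(path[-1] - 0.9))
```

In this deterministic setting the iteration converges only for 0 < µ < 1/R(0); the stochastic LMS update behaves similarly on average but adds gradient noise, which is one reason µ is usually set well below the limit.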

MATLAB Example

See MATLAB script alpeg.m.

With µ = 0.001, the actual predictor of

$$ \mathbf{h}_{act} = \begin{pmatrix} 1.7119 \\ -0.8100 \end{pmatrix} \qquad (41) $$

was used with an input of white Gaussian noise of unit variance. Over a block of 2000 samples, the LMS algorithm gave

$$ \mathbf{h}^*_{LMS} = \begin{pmatrix} 1.6247 \\ -0.7968 \end{pmatrix} \qquad (42) $$

[Figure: “Adaptive Predictor - h Coefficients with µ=0.001”. h coefficient value plotted against sample number (0 to 2000).]

Module Summary – Important Points

1. Explain the role of prediction in coding
2. Mathematically derive and implement a block-based predictor
3. Mathematically derive and implement an adaptive predictor
