
Chapter 2 Digital Signal Processing Algorithms & Their ...

Feb 21, 2022


VLSI DSP 2010 Y.H. Hwang 2-1

Chapter 2 Digital Signal Processing Algorithms & Their Representations

Part 1. Review of DSP Algorithms

Example DSP algorithms and applications

Speech coding/decoding, encryption/decryption, recognition, synthesis: digital cellular phones, personal communication systems, digital cordless phones, multimedia computers, secure communication

Modem algorithms: digital cellular phones, personal communication systems, digital audio broadcast, wireless computing, navigation, digital communication, data/fax modems

Audio equalization, noise cancellation: consumer audio, professional audio, advanced vehicular audio

Echo cancellation: speakerphones, modems, telephone switches, digital communication

Beamforming: navigation, radar/sonar, smart antenna

Image compression and decompression: digital cameras, digital video, multimedia computers, consumer video

Matrix Operations (1)

Matrix-vector multiplication

Matrix-matrix multiplication

Solution of linear systems $A \cdot x = b$ by matrix triangularization: QR, LU ($A \Rightarrow LU$), Gauss elimination

$A x = b \Rightarrow A' x = b'$, followed by backward substitution

Solution of triangular linear systems (backward substitution)

Matrix inversion, pseudo-inverse

Matrix Operations (2)

Singular value decomposition (SVD)

Eigenvalue computation

Characteristic equation: $\det|A - \lambda \cdot I| = 0$

$A \cdot e = \lambda \cdot e$, $A \cdot E = E \cdot \Lambda$

Solution of Toeplitz linear systems: $A \Rightarrow a(i,j) = f(|i-j|)$

Autocorrelation matrix:

$$
\begin{bmatrix}
r(0) & r(1) & \cdots & \\
r(1) & r(0) & r(1) & \cdots \\
\cdots & r(1) & r(0) & r(1) \\
 & \cdots & r(1) & r(0)
\end{bmatrix}
$$

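Matrix Operations (3)

Gauss elimination

Computing complexity: $O(N^3)$

Goal: make matrix $A$ upper triangular

Pivoting problem $\Rightarrow$ numerical stability

Elimination of the first column, one row at a time (a superscript 1 marks an entry that has been modified once):

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
$$

$$
\rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & a^{1}_{32} & a^{1}_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b^{1}_3 \end{bmatrix}
$$

$$
\rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a^{1}_{22} & a^{1}_{23} \\ 0 & a^{1}_{32} & a^{1}_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b^{1}_2 \\ b^{1}_3 \end{bmatrix}
$$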
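The elimination steps can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own procedure: the function name `gauss_eliminate` is hypothetical, and partial pivoting (row swaps) is an added assumption that addresses the pivoting problem mentioned on the slide.

```python
def gauss_eliminate(A, b):
    """Reduce A x = b to an upper-triangular system (partial pivoting assumed)."""
    n = len(A)
    A = [row[:] for row in A]   # work on copies, leave the inputs untouched
    b = b[:]
    for p in range(n - 1):
        # Partial pivoting: move the row with the largest |a_ip| (i >= p) up.
        r = max(range(p, n), key=lambda i: abs(A[i][p]))
        if A[r][p] == 0.0:
            raise ValueError("matrix is singular")
        A[p], A[r] = A[r], A[p]
        b[p], b[r] = b[r], b[p]
        # Eliminate column p from every row below the pivot row.
        for i in range(p + 1, n):
            f = A[i][p] / A[p][p]
            for j in range(p, n):
                A[i][j] -= f * A[p][j]
            b[i] -= f * b[p]
    return A, b


U, c = gauss_eliminate([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]],
                       [1.0, 2.0, 3.0])
```

The three nested loops are where the $O(N^3)$ cost comes from; the triangular result `U, c` is then ready for backward substitution.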

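QR decomposition (Givens rotation)

$A = Q \cdot R$

$Q$ is a unitary matrix: $Q^t \cdot Q = I$, $Q^t \cdot A = R$

$R$ is triangular

Decomposition by a sequence of Givens rotations

Givens rotation: to eliminate $a_{q+1,p}$ by $a_{q,p}$

$$
Q(q,p) = \begin{bmatrix}
1 & 0 & & & 0 & 0 \\
0 & 1 & & & 0 & 0 \\
 & & \cos\theta & \sin\theta & & \\
 & & -\sin\theta & \cos\theta & & \\
0 & 0 & & & 1 & 0 \\
0 & 0 & & & 0 & 1
\end{bmatrix}
\begin{matrix} \\ \\ \leftarrow q\text{-th row} \\ \leftarrow (q+1)\text{-th row} \\ \\ \\ \end{matrix},
\qquad \theta = \tan^{-1}\frac{a_{q+1,p}}{a_{q,p}}
$$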

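QR decomposition (2)

Example: $q = 3$, $p = 1$

$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos\theta & \sin\theta \\ 0 & 0 & -\sin\theta & \cos\theta \end{bmatrix}
\cdot
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
$$

$$
\cos\theta = \frac{a_{31}}{\sqrt{a_{31}^2 + a_{41}^2}}, \qquad \sin\theta = \frac{a_{41}}{\sqrt{a_{31}^2 + a_{41}^2}}
$$

$$
a'_{31} = a_{31}\cdot\cos\theta + a_{41}\cdot\sin\theta = \frac{a_{31}^2}{\sqrt{a_{31}^2 + a_{41}^2}} + \frac{a_{41}^2}{\sqrt{a_{31}^2 + a_{41}^2}} = \sqrt{a_{31}^2 + a_{41}^2}
$$

$$
a'_{41} = -a_{31}\cdot\sin\theta + a_{41}\cdot\cos\theta = \frac{-a_{31}\cdot a_{41}}{\sqrt{a_{31}^2 + a_{41}^2}} + \frac{a_{41}\cdot a_{31}}{\sqrt{a_{31}^2 + a_{41}^2}} = 0
$$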
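A sequence of such rotations triangularizes the whole matrix. The sketch below assumes NumPy and uses a hypothetical function name `givens_qr`; it clears each column bottom-up with rotations on adjacent rows $q$, $q+1$, and it computes $\cos\theta$ and $\sin\theta$ directly from the two entries instead of evaluating $\tan^{-1}$.

```python
import numpy as np


def givens_qr(A):
    """Return (Qt, R) with Qt @ A = R upper triangular, using Givens rotations."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Qt = np.eye(m)
    for p in range(n):                      # column being cleared
        for q in range(m - 2, p - 1, -1):   # rotate rows q, q+1, bottom-up
            a, b = R[q, p], R[q + 1, p]
            r = np.hypot(a, b)              # sqrt(a^2 + b^2)
            if r == 0.0:
                continue                    # nothing to eliminate
            c, s = a / r, b / r             # cos(theta), sin(theta)
            G = np.array([[c, s], [-s, c]])
            R[q:q + 2, :] = G @ R[q:q + 2, :]     # zeroes R[q+1, p]
            Qt[q:q + 2, :] = G @ Qt[q:q + 2, :]   # accumulate Q^t
    return Qt, R


A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0], [2.0, 1.0, 0.0]])
Qt, R = givens_qr(A)
```

Each rotation touches only two rows, which is the locality property that makes the method attractive for array implementations.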

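QR decomposition (3)

Jacobi rotation

$$
\begin{bmatrix} \cos\theta & 0 & 0 & \sin\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sin\theta & 0 & 0 & \cos\theta \end{bmatrix}
\cdot
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix},
\qquad \theta = \tan^{-1}\frac{a_{4,1}}{a_{1,1}}
$$

Computing complexity analysis

$t = \dfrac{a_{q+1,p}}{a_{q,p}}$ $\Rightarrow$ 1 DIV

$\cos\theta = \dfrac{1}{\sqrt{1 + t^2}}$ $\Rightarrow$ 1 MPY, 1 ADD, 1 SQRT, 1 INV

$\sin\theta = \dfrac{t}{\sqrt{1 + t^2}} = t \cdot \cos\theta$ $\Rightarrow$ 1 MPY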

Jacobi's vs. Givens' rotation

Different nullification ordering

[Figure: order in which matrix entries are nullified, Jacobi vs. Givens]

Questions

Which one is more computationally efficient?

Which one is more suitable for hardware implementation?

Decision factors: parallelism and locality of computing

QR vs. LU decompositions

QR: $Q$ is unitary

Better numerical stability

Easy to find eigenvalues

Preserves the norm of the matrix

$A X = b \Rightarrow Q^t A X = Q^t b \Rightarrow R X = Q^t b$

To solve $R X = Q^t b$ $\Rightarrow$ backward substitution

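Backward substitution (1)

$$
\begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
$$

$$\Rightarrow x_3 = b_3 / r_{33}$$

$$\Rightarrow x_2 = (b_2 - r_{23} x_3) / r_{22}$$

$$\Rightarrow x_1 = (b_1 - r_{12} x_2 - r_{13} x_3) / r_{11}$$

Computing complexity

Divisions (/): $N$

Multiplications (*): $\dfrac{(1 + N - 1)(N - 1)}{2} = \dfrac{N^2 - N}{2}$

Additions (+): $\dfrac{N^2 - N}{2}$ (each multiply-add pair can be replaced by a MAC operation)

Order: $O(N^2)$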
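The recurrence translates directly into a double loop. A minimal sketch in plain Python (the name `back_substitute` is hypothetical); the inner loop is the multiply-accumulate, the outer loop performs one division per unknown.

```python
def back_substitute(R, b):
    """Solve R x = b for an upper-triangular R with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):      # last unknown first
        acc = b[i]
        for j in range(i + 1, n):
            acc -= R[i][j] * x[j]       # one multiply-accumulate (MAC)
        x[i] = acc / R[i][i]            # one division per unknown
    return x


x = back_substitute([[2.0, 1.0, 1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 4.0]],
                    [7.0, 12.0, 12.0])
```

Counting iterations reproduces the slide's totals: $N$ divisions and $(N^2 - N)/2$ MACs.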

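Backward substitution (2)

Computing parallelism

$$
\left[\begin{array}{cccc|c}
r_{11} & r_{12} & r_{13} & r_{14} & b'_1 \\
0 & r_{22} & r_{23} & r_{24} & b'_2 \\
0 & 0 & r_{33} & r_{34} & b'_3 \\
0 & 0 & 0 & r_{44} & b'_4
\end{array}\right]
$$

[Figure: operation schedule by time step; the four rows of the schedule read 1; 2 3; 3 4 5; 4 5 6 7]

Speed-up factor

$N = 4$, $t = 7 = 2N - 1$

Computing complexity $\approx \dfrac{N^2}{2}$

Minimum hardware requirement $= \dfrac{N^2 / 2}{2N - 1} \approx \dfrac{N}{4}$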

Least square problem (1)

$$e_{m \times 1} = A_{m \times n}\, x_{n \times 1} - b_{m \times 1}$$

Optimization problem

$A$, $b$ are given; $x$ is unknown

$e$ is the error vector

Find the estimate $\hat{x}$ such that $\|e\|^2$ is minimized

$m$ is greater than $n$, i.e. the number of constraints is greater than the number of variables

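Least square problem (2)

Solution 1

$$\|e\|^2 = (A x - b)^t \cdot (A x - b) = x^t A^t A x - x^t A^t b - b^t A x + b^t b$$

$$\nabla \|e\|^2 = 0, \quad \text{where } \nabla \|e\|^2 = \begin{bmatrix} \dfrac{\partial \|e\|^2}{\partial x_1} \\ \vdots \\ \dfrac{\partial \|e\|^2}{\partial x_n} \end{bmatrix}$$

$$\Rightarrow \partial x_i^t A^t A x + x^t A^t A\, \partial x_i - \partial x_i^t A^t b - b^t A\, \partial x_i = 0$$

$$\Rightarrow A^t A x = A^t b$$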

Least square problem (3)

Solution 1 (cont.)

$$\Rightarrow A^t A x - A^t b = 0 \Rightarrow A^t A x = A^t b$$

If $A^t A$ is non-singular, then

$$x = (A^t A)^{-1} A^t b = A^{+} b, \qquad A^{+} = (A^t A)^{-1} A^t$$

$A^{+}$ is the pseudo-inverse of matrix $A$

Not practical due to the inverse operation

Least square problem (4)

Solution 2: QR factorization

$Q^t \cdot A = R$

$Q$ is a unitary matrix

The rank of matrix $A$ does not change under a unitary transformation

The number of nonzero rows in $R$ is the rank of $A$

$$Q^t_{\,m \times m} \cdot A_{\,m \times n} = \begin{bmatrix} R \\ 0 \end{bmatrix}_{m \times n}$$

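Least square problem (5)

Solution 2: QR factorization (cont.)

$$e = A \cdot x - b$$

$$\Rightarrow Q^t \cdot e = Q^t \cdot A \cdot x - Q^t \cdot b$$

$$\Rightarrow Q^t \cdot b = b' = \begin{bmatrix} b'_u \\ b'_d \end{bmatrix} \quad (b'_u:\ n \text{ rows},\ b'_d:\ m-n \text{ rows})$$

$$\|Q^t e\|^2 = (Q^t e)^t (Q^t e) = e^t Q Q^t e = e^t e = \|e\|^2$$

$$e' = Q^t e = \begin{bmatrix} e'_u \\ e'_d \end{bmatrix}, \qquad e'_u = R x - b'_u, \qquad e'_d = -b'_d$$

$$\Rightarrow \|e\|^2 = \|e'_u\|^2 + \|e'_d\|^2 = 0 + \|e'_d\|^2$$

$e'_u$: this part will be zero when $R x = b'_u$

$e'_d$: for this part, there is no way to minimize; it is the residual error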
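The QR route can be checked numerically against the normal equations of Solution 1. The sketch below assumes NumPy and full column rank; `lstsq_qr` is a hypothetical helper, and `numpy.linalg.qr` in its default reduced mode returns only the $n$ columns of $Q$ that produce $b'_u$.

```python
import numpy as np


def lstsq_qr(A, b):
    """Least-squares solution of A x ~ b via QR (A assumed full column rank)."""
    Q, R = np.linalg.qr(A)              # reduced form: Q is m x n, R is n x n
    b_u = Q.T @ b                       # upper part b'_u of Q^t b
    x = np.linalg.solve(R, b_u)         # R x = b'_u (R is upper triangular)
    residual = np.linalg.norm(A @ x - b)
    return x, residual


A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 3.0, 4.0, 8.0])
x, residual = lstsq_qr(A, b)
```

`np.linalg.solve` is used here only for brevity; since $R$ is triangular, a backward substitution would do the same job without a general solver.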

Least mean square estimation (1)

For non-deterministic discrete random signals

Predicting one random process $\{y_n\}$ from the observations of another random process $\{x_n\}$

$$\hat{y}_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + \cdots + a_p x_{n-p}$$

such that $E\{[y_n - \hat{y}_n]^2\}$ is minimized

Least mean square estimation (2)

Wiener-Hopf equation

Expanding $E\{[y_n - \hat{y}_n]^2\}$

Taking partial derivatives with respect to the $a_i$'s

$$R_{xx} \cdot a = r_{xy}$$

where

$$a = [a_0, a_1, a_2, \ldots, a_p]^t$$

$$R_{xx} = E\left\{ \begin{bmatrix} x_n \\ x_{n-1} \\ \vdots \\ x_{n-p} \end{bmatrix} \cdot [x_n, x_{n-1}, \ldots, x_{n-p}] \right\}$$

$$r_{xy} = E\{ y_n \cdot [x_n, x_{n-1}, \ldots, x_{n-p}]^t \}$$

Least mean square estimation (3)

Wiener-Hopf equation (cont.)

Note: a random field is called stationary (or homogeneous) if its expected value is independent of position

Note: $\{x_n\}$ and $\{y_n\}$ are jointly wide-sense stationary if $E\{y_n \cdot x_{n-i}\} = r_{xy}(i)$ = constant for a given $i$

If $\{x_n\}$ and $\{y_n\}$ are jointly wide-sense stationary

$$
\Rightarrow R_{xx} = \begin{bmatrix}
r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(p) \\
r_{xx}(-1) & r_{xx}(0) & \cdots & r_{xx}(p-1) \\
\vdots & \vdots & \ddots & \vdots \\
r_{xx}(-p) & r_{xx}(-p+1) & \cdots & r_{xx}(0)
\end{bmatrix}
\quad \text{(Toeplitz matrix)}
$$

$$\Rightarrow r_{xy} = [r_{xy}(0), r_{xy}(1), \ldots, r_{xy}(p)]^t$$
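For a small order $p$ the Wiener-Hopf system can be solved directly. The sketch below assumes NumPy, real-valued signals (so that $r_{xx}(-k) = r_{xx}(k)$), and correlation values that are already known; the name `wiener_hopf` is hypothetical. A general solver is used for brevity and does not exploit the Toeplitz structure.

```python
import numpy as np


def wiener_hopf(r_xx, r_xy):
    """Solve R_xx a = r_xy, with R_xx[i, j] = r_xx(|i - j|) (real WSS signals assumed)."""
    r_xx = np.asarray(r_xx, dtype=float)
    n = len(r_xx)
    lag = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    R = r_xx[lag]                       # symmetric Toeplitz autocorrelation matrix
    return np.linalg.solve(R, np.asarray(r_xy, dtype=float))


# White unit-variance x_n and y_n = 2 x_n - x_{n-1}: r_xx = [1, 0, 0], r_xy = [2, -1, 0]
a = wiener_hopf([1.0, 0.0, 0.0], [2.0, -1.0, 0.0])
```

In this toy case the predictor recovers the generating coefficients, since the input is white and $R_{xx}$ is the identity.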

Digital Filters (1)

Representation

$$y(n) = \sum_{k=1}^{p} a_k\, y(n-k) + \sum_{k=0}^{q} b_k\, x(n-k)$$

$$\Rightarrow Y(z) = \sum_{k=1}^{p} a_k z^{-k}\, Y(z) + \sum_{k=0}^{q} b_k z^{-k}\, X(z)$$

$$\Rightarrow Y(z) = H(z) \cdot X(z), \quad \text{where } H(z) = \frac{B(z)}{A(z)}$$

$$\text{with } B(z) = \sum_{k=0}^{q} b_k z^{-k}, \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$$

Moving average (MA) filter, FIR filter

$H(z) = B(z)$

Weighted average of input data

$$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) + \cdots + b_q x(n-q)$$

Digital Filters (2)

Autoregressive (AR) filter

$$H(z) = \frac{1}{A(z)}, \quad \text{where } A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$$

$$y(n) = \sum_{k=1}^{p} a_k\, y(n-k) + x(n)$$

Auto-regressive moving average (ARMA) filter

[Figure: cascade of $B(z)$ followed by $1/A(z)$]
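All three filter types follow from one difference equation. The plain-Python sketch below uses a hypothetical helper `arma_filter` and assumes the sign convention $y(n) = \sum_k a_k y(n-k) + \sum_k b_k x(n-k)$ with zero initial conditions; an empty `a` gives an MA (FIR) filter, and `b = [1]` gives an AR filter.

```python
def arma_filter(b, a, x):
    """y(n) = sum_{k=1..p} a[k-1]*y(n-k) + sum_{k=0..q} b[k]*x(n-k), zero initial state."""
    y = []
    for n in range(len(x)):
        # Moving-average part: weighted sum of current and past inputs.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Autoregressive part: weighted sum of past outputs.
        acc += sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


ma_out = arma_filter([1.0, 1.0], [], [1.0, 2.0, 3.0])        # FIR: x(n) + x(n-1)
ar_out = arma_filter([1.0], [0.5], [1.0, 0.0, 0.0, 0.0])     # impulse response of 1/(1 - 0.5 z^-1)
```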

Digital Filters (3)

Linear phase filter

Phase shift is proportional to frequency: $\theta(\omega) = \alpha \cdot \omega$

Fixed delay for every frequency component

Symmetrical coefficients for an $N$-tap filter: $h(n) = h(N-1-n)$

Ex. ($N = 6$): taps $h(0), h(1), \ldots, h(5)$ applied to $x(n), x(n-1), \ldots, x(n-5)$, with $h(0) = h(5)$, $h(1) = h(4)$, $h(2) = h(3)$

$$
H(z) = \sum_{n=0}^{N-1} h(n) z^{-n}
= \sum_{n=0}^{N/2-1} h(n) z^{-n} + \sum_{n=N/2}^{N-1} h(n) z^{-n}
= \sum_{n=0}^{N/2-1} h(n) \left[ z^{-n} + z^{-(N-1-n)} \right]
$$

Digital Filters (4)

7-tap linear phase FIR filter

Data broadcasting structure

Long accumulation path

Adaptive Filters

Used for applications such as echo cancellation, channel equalization, voiceband modems, digital mobile radio, system identification, ...

The coefficients of the filter are updated at each iteration to minimize the difference between the output and the desired signal

Updating continues until the coefficients converge

Basic building blocks

General filter block

Coefficient update block

Coefficient update subject to different criteria: LMS, RLS, ...

LMS Adaptive Filters (1)

Weighted sum of the observations as an estimate of the desired signal $d(n)$

$$\hat{d}(n) = W^T(n-1)\, U(n)$$

Weight vector: $W^T(n) = [\omega_1(n), \omega_2(n), \ldots, \omega_N(n)]$

Input vector: $U^T(n) = [u(n), u(n-1), \ldots, u(n-N+1)]$

LMS Adaptive Filters (2)

Estimation error

The difference between the desired signal and the estimated signal

$$e(n) = d(n) - \hat{d}(n) = d(n) - W^T(n-1)\, U(n)$$

In the $n$-th iteration, the $W^T(n)$ minimizing the square error $e^2(n)$ is selected

Coefficients update

The derivative of $e^2(n)$ w.r.t. $W^T(n-1)$

$$\frac{\partial e^2}{\partial W} = -2 d\, U + 2 (W^T U)\, U = -2 (d - W^T U)\, U = -2 e\, U$$

$$W(n) = W(n-1) - \frac{1}{2} \mu \cdot \Delta(e^2(n))$$

$$\Rightarrow W(n) = W(n-1) + \mu \cdot e(n)\, U(n)$$

LMS Adaptive Filters (3)

Questions:

1. Critical path?

2. Symbol rate?

3. Circuit complexity?
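The LMS recursion (estimate, error, weight update) can be sketched as follows. This assumes NumPy, real-valued signals, and zero padding of the input before time 0; the function name `lms`, the step size, and the test signal are illustrative choices, not taken from the slides.

```python
import numpy as np


def lms(u, d, num_taps, mu):
    """LMS adaptive filter: returns the final weights W and the error sequence e(n)."""
    W = np.zeros(num_taps)
    e = np.zeros(len(u))
    for n in range(len(u)):
        # U(n) = [u(n), u(n-1), ..., u(n-N+1)], zero-padded before the start
        U = np.array([u[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)])
        e[n] = d[n] - W @ U             # e(n) = d(n) - W^T(n-1) U(n)
        W = W + mu * e[n] * U           # W(n) = W(n-1) + mu e(n) U(n)
    return W, e


# System identification: d(n) is u(n) filtered by an unknown 2-tap FIR system.
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
h = np.array([0.5, -0.3])
d = np.convolve(u, h)[:len(u)]
W, e = lms(u, d, num_taps=2, mu=0.05)
```

The loop makes the critical-path question concrete: the error `e[n]` depends on the weights from the previous iteration, so the filter output and the update sit on one feedback loop.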

Stochastic-gradient adaptive lattice filter (1)

Lattice filter

Forward and backward error prediction

$n$: time instance

$m$: lattice stage number ($= 1, 2, \ldots, N$)

$k_m$: partial correlation (or reflection) coefficient

$$e_f(n|m) = e_f(n|m-1) - k_m(n)\, e_b(n-1|m-1)$$

$$e_b(n|m) = e_b(n-1|m-1) - k_m(n)\, e_f(n|m-1)$$

[Figure: lattice stages with cross-branch gains $-k_{m-1}$ and $-k_m$]

Stochastic-gradient adaptive lattice filter (2)

Adapts $k_m$ to minimize the sum of the squared forward and backward prediction errors

$$J(n) = e_f^2(n|m) + e_b^2(n|m)$$

Expressing $J(n)$ in terms of $e_f(n|m-1)$, $e_b(n-1|m-1)$ and $k_m(n)$

$$
\frac{\partial J(n)}{\partial k_m(n)} = -2\,[e_f(n|m-1) - k_m(n)\, e_b(n-1|m-1)]\, e_b(n-1|m-1) - 2\,[e_b(n-1|m-1) - k_m(n)\, e_f(n|m-1)]\, e_f(n|m-1)
$$

$k_m(n)$ update equation

$$
k_m(n+1) = k_m(n) - \frac{\beta_m(n)}{2} \frac{\partial J(n)}{\partial k_m(n)}
= k_m(n)\,[1 - \beta_m(n)\,(e_f^2(n|m-1) + e_b^2(n-1|m-1))] + 2 \beta_m(n)\, e_f(n|m-1)\, e_b(n-1|m-1)
$$

$\beta_m(n)$ is the adaptation constant

Stochastic-gradient adaptive lattice filter (3)

Adaptation constant $\beta_m(n)$

To keep the adaptation speed independent of the input signal levels

Normalized by an estimate of the sum of the $(m-1)$-th order prediction error variances

$$\beta_m(n) = \frac{1}{S(n|m-1)}$$

$$S(n|m-1) = (1 - \beta)\, S(n-1|m-1) + e_f^2(n|m-1) + e_b^2(n-1|m-1)$$

$\beta$ is a constant that depends on the initial value $S(0|m-1)$

Stochastic-gradient adaptive lattice filter (4)

Stochastic-gradient adaptive algorithm

Adaptation equations

$$k_m(n+1) = k_m(n)\,[1 - \beta_m(n)\,(e_f^2(n|m-1) + e_b^2(n-1|m-1))] + 2 \beta_m(n)\, e_f(n|m-1)\, e_b(n-1|m-1)$$

$$S(n|m-1) = (1 - \beta)\, S(n-1|m-1) + e_f^2(n|m-1) + e_b^2(n-1|m-1)$$

$$\beta_m(n) = \frac{1}{S(n|m-1)}$$

Order-update equations

$$e_f(n|m) = e_f(n|m-1) - k_m(n)\, e_b(n-1|m-1)$$

$$e_b(n|m) = e_b(n-1|m-1) - k_m(n)\, e_f(n|m-1)$$

Stochastic-gradient adaptive lattice filter (5)

Discrete Cosine Transform (1)

A frequency transform

Widely used as a transform coder for still-image and video compression

Even-symmetrical one-dimensional DCT of an $N$-point sequence $x(n)$

Forward:

$$X(k) = e(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{(2n+1) k \pi}{2N}\right], \quad k = 0, 1, \ldots, N-1$$

Inverse:

$$x(n) = \frac{2}{N} \sum_{k=0}^{N-1} e(k)\, X(k) \cos\left[\frac{(2n+1) k \pi}{2N}\right], \quad n = 0, 1, \ldots, N-1$$

$$e(k) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } k = 0 \\ 1 & \text{otherwise} \end{cases}$$

Discrete Cosine Transform (2)

Matrix representation

$$X = [X(0), X(1), \ldots, X(N-1)]^T, \qquad x = [x(0), x(1), \ldots, x(N-1)]^T$$

$$
\Lambda = \begin{bmatrix}
1/\sqrt{2} & 1/\sqrt{2} & \cdots & 1/\sqrt{2} \\
\cos\frac{\pi}{2N} & \cos\frac{3\pi}{2N} & \cdots & \cos\frac{(2N-1)\pi}{2N} \\
\vdots & \vdots & & \vdots \\
\cos\frac{(N-1)\pi}{2N} & \cos\frac{3(N-1)\pi}{2N} & \cdots & \cos\frac{(2N-1)(N-1)\pi}{2N}
\end{bmatrix}_{N \times N}
$$

$$X = \Lambda \cdot x, \qquad x = \frac{2}{N} \Lambda^T \cdot X, \qquad \Lambda \cdot \Lambda^T = \frac{N}{2} I_{N \times N}$$

Discrete Cosine Transform (3)

Computing complexity analysis

$N$-point 1-D DCT requires $N^2$ MPY and ADD

2-D DCT can be obtained by performing row-wise 1-D DCTs followed by column-wise 1-D DCTs

$N \times N$-point 2-D DCT requires $2N^3$ MPY and ADD

Fast computing algorithms

Decompose the 1-D DCT in a way similar to the FFT

Reduce computing complexity from $N^2$ to $N \cdot \log(N)$
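The matrix form makes the $N^2$ multiply-add count and the row-column 2-D scheme easy to check numerically. The sketch below assumes NumPy and uses hypothetical helper names `dct_matrix` and `dct2`; it is the direct (non-fast) computation.

```python
import numpy as np


def dct_matrix(N):
    """Lambda[k, n] = e(k) * cos((2n+1) k pi / (2N)), with e(0) = 1/sqrt(2)."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    L = np.cos((2 * n + 1) * k * np.pi / (2 * N))
    L[0, :] = 1.0 / np.sqrt(2.0)        # scale the k = 0 row by e(0)
    return L


def dct2(block):
    """2-D DCT of an N x N block: row-wise 1-D DCTs, then column-wise 1-D DCTs."""
    L = dct_matrix(block.shape[0])
    return L @ block @ L.T


N = 8
L = dct_matrix(N)
x = np.arange(N, dtype=float)
X = L @ x                               # forward: N^2 multiply-adds
x_back = (2.0 / N) * (L.T @ X)          # inverse: x = (2/N) Lambda^T X
```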

Wavelets and Filter Banks (1)

Basics

Applications: speech and image compression

Signals are represented using a set of basis functions (wavelets)

The basis functions are derived by shifting and scaling wavelets in time

Decomposition of a signal in the time-scale (frequency) plane

Can be regarded as multi-resolution sub-band filtering

Wavelets and Filter Banks (2)

1-D DWT

$$y_i(n) = \sum_{k=-\infty}^{\infty} x(k)\, h_i(2^{i+1} n - k), \quad \text{for } 0 \le i \le m-2$$

$$y_{m-1}(n) = \sum_{k=-\infty}^{\infty} x(k)\, h_{m-1}(2^{m-1} n - k), \quad \text{for } i = m-1$$

$y_i(n)$: wavelet coefficients

$h(n)$: mother wavelets

$h_i(2^{i+1} n - k)$: basis functions (scaled and shifted versions of mother wavelets)

1-D IDWT

$$x(n) = \sum_{i=0}^{m-2} \sum_{k=-\infty}^{\infty} y_i(k)\, f_i(n - 2^{i+1} k) + \sum_{k=-\infty}^{\infty} y_{m-1}(k)\, f_{m-1}(n - 2^{m-1} k)$$

Wavelets and Filter Banks (3)

Example for decomposition level $m = 4$

Computations are similar to convolution operations

Digital filter banks with a common input $x(k)$

$$y_0 = \sum_{k=-\infty}^{\infty} x(k)\, h_0(2n - k)$$

$$y_1 = \sum_{k=-\infty}^{\infty} x(k)\, h_1(4n - k)$$

$$y_2 = \sum_{k=-\infty}^{\infty} x(k)\, h_2(8n - k)$$

$$y_3 = \sum_{k=-\infty}^{\infty} x(k)\, h_3(8n - k)$$

Wavelets and Filter Banks (4)

Filter banks

[Figure: analysis filter bank for DWT; synthesis filter bank for IDWT]

Wavelets and Filter Banks (5)

DWT/IDWT with decimator

Process $M$ ($= 2^m$) inputs and generate $M$ outputs periodically

Wavelets and Filter Banks (6)

Tree-structured filter bank

$M$ wavelet coefficients are computed through $\log_2 M$ octave levels

Each octave performs one low-pass $g(n)$ and one high-pass $h(n)$ filtering

High-pass filter output $w_j(n)$: detail information

Low-pass filter output $s_j(n)$: coarse information

Computations in octave $j$

$$s_j(n) = \sum_k s_{j-1}(k)\, g(2n - k) = \sum_k g(k)\, s_{j-1}(2n - k)$$

$$w_j(n) = \sum_k s_{j-1}(k)\, h(2n - k) = \sum_k h(k)\, s_{j-1}(2n - k)$$
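One octave of the tree is a filter-and-decimate-by-2 step. The sketch below is plain Python with a hypothetical helper `analysis_octave`; samples outside the input are treated as zero, and the two-tap averaging and differencing filters in the usage lines are illustrative choices, not the filters of the slides.

```python
def analysis_octave(s_prev, g, h):
    """One octave: s_j(n) = sum_k g(k) s_{j-1}(2n-k), w_j(n) = sum_k h(k) s_{j-1}(2n-k)."""
    def filter_and_decimate(f):
        out = []
        for n in range(len(s_prev) // 2):       # output rate is half the input rate
            acc = 0.0
            for k in range(len(f)):
                idx = 2 * n - k
                if 0 <= idx < len(s_prev):      # samples outside the block count as zero
                    acc += f[k] * s_prev[idx]
            out.append(acc)
        return out

    return filter_and_decimate(g), filter_and_decimate(h)


# Illustrative two-tap low-pass (average) and high-pass (difference) filters.
s1, w1 = analysis_octave([2.0, 4.0, 6.0, 8.0], g=[0.5, 0.5], h=[0.5, -0.5])
s2, w2 = analysis_octave(s1, g=[0.5, 0.5], h=[0.5, -0.5])   # next octave works on s1
```

Feeding each coarse output `s_j` back into the same function walks down the tree; each level handles half as many samples as the one before it.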

Wavelets and Filter Banks (7)

Block diagram of tree-structured analysis filter bank

Transfer functions

$$H_0(z) = H(z), \qquad H_1(z) = G(z)\, H(z^2)$$

$$H_2(z) = G(z)\, G(z^2)\, H(z^4), \qquad H_3(z) = G(z)\, G(z^2)\, G(z^4)$$

Wavelets and Filter Banks (8)

Block diagram of tree-structured synthesis filter bank

Wavelets and Filter Banks (9)

2-D DWT for image

Perform row-wise 1-D DWT followed by column-wise 1-D DWT

[Figure: original image and transformed image]

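Discrete Hadamard Transform

Walsh-Hadamard transform

The basis functions are binary-valued, in $\{-1, 1\}$

Defined in recursive form

Matrix size: power of 2

$$
H_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
H_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}, \qquad
H_{2N} = \frac{1}{\sqrt{2}} \begin{bmatrix} H_N & H_N \\ H_N & -H_N \end{bmatrix}
$$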
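The recursive definition can be unrolled into a loop that doubles the matrix size at each step. A minimal sketch assuming NumPy, with the $1/\sqrt{2}$ normalization per doubling; the function name `hadamard` here is a local helper, not a library call.

```python
import numpy as np


def hadamard(n):
    """Normalized n x n Hadamard matrix, built by H_2N = [[H, H], [H, -H]] / sqrt(2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("size must be a power of 2")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2.0)
    return H


H4 = hadamard(4)
y = H4 @ np.array([1.0, 2.0, 3.0, 4.0])   # forward transform
x_back = H4.T @ y                          # H is orthogonal, so H^T inverts it
```

Apart from the scale factor every entry is $\pm 1$, so the transform itself needs only additions and subtractions.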

Image Processing Algorithms

[Figure: $N \times N$ input array $u(\cdot,\cdot)$ and $N \times N$ window $w(\cdot,\cdot)$]

2-D convolution

$$y(n_1, n_2) = \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} u(k_1, k_2) \cdot w(n_1 - k_1, n_2 - k_2), \quad \text{where } n_1, n_2 \in \{0, 1, \ldots, 2N-2\}$$

2-D correlation

$$y(n_1, n_2) = \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} u(k_1, k_2) \cdot w(n_1 + k_1, n_2 + k_2), \quad \text{where } n_1, n_2 \in \{-N+1, -N+2, \ldots, -1, 0, 1, \ldots, 2N-2\}$$
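The 2-D convolution sum can be evaluated by accumulating one shifted, scaled copy of $w$ per input sample. The sketch below assumes NumPy and finite arrays indexed from 0 (values outside the arrays are taken as zero); `conv2d_full` is a hypothetical name.

```python
import numpy as np


def conv2d_full(u, w):
    """y(n1, n2) = sum_{k1, k2} u(k1, k2) * w(n1 - k1, n2 - k2), full output size."""
    u = np.asarray(u, dtype=float)
    w = np.asarray(w, dtype=float)
    N1, N2 = u.shape
    M1, M2 = w.shape
    y = np.zeros((N1 + M1 - 1, N2 + M2 - 1))
    for k1 in range(N1):
        for k2 in range(N2):
            # u(k1, k2) contributes a copy of w shifted by (k1, k2).
            y[k1:k1 + M1, k2:k2 + M2] += u[k1, k2] * w
    return y


y = conv2d_full([[1, 2], [3, 4]], [[1, 0], [0, 1]])
```

For two $N \times N$ arrays the output is $(2N-1) \times (2N-1)$, matching the index range $\{0, 1, \ldots, 2N-2\}$.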

Design criteria for VLSI array algorithm

Maximum parallelism, e.g. Schur vs. Levinson-Durbin algorithms

Maximum pipelinability

Balance among computations, communication, and memories

Numerical performance and quantization effect

Part 2 Algorithm Representation

Representations of DSP Algorithms

Mathematical formulations

Behavioral description languages

Applicative language: represents a set of equations satisfied by the variables, e.g. Silage

Prescriptive language: explicitly specifies the order of assignment, e.g. C and other HLLs

Descriptive language: represents the structure of a DSP system, e.g. VHDL, Verilog

Graphical representations

Block diagrams

Signal flow graph (SFG)

Data flow graph (DFG)

Dependence graph (DG)


Block Diagrams (1)

Consists of functional blocks connected with directed edges

Functional block, e.g. Add, Mult

Unit delay element

Directed edge representing the data flow between blocks

Basic blocks


Block Diagrams (2)

3-tap FIR example

Alternative block diagram with data broadcast

Signal Flow Graph (1)

A collection of nodes and directed edges

Node: computation or task

Directed edge (j,k): a linear transformation from node j to node k, usually a constant gain multiplier or a delay element

Widely used in digital filter structures

Flow graph reversal (transposition)

A transform to obtain an equivalent structure

Applicable to single-input single-output systems

Reverse the directions of all edges

Exchange the input and output nodes

Retain the edge gains and edge delays


Signal Flow Graph (2)

SFG of a 3-tap FIR filter

Original SFG

Transposed SFG

Signal Flow Graph (3)

Limitations of transposition

Among MIMO systems, it can be applied to those described by symmetric transform matrices

More on SFG

Applicable to linear networks

Cannot be used to describe multi-rate systems

Data Flow Graph (1)

DFG

Node: computation (function or subtask)

Directed edge: data path or communication between nodes

Associated edge delay: non-negative

Associated node delay: execution time of each node

[Figure: block diagram, conventional DFG, and synchronous DFG, with add and mpy nodes]

Data Flow Graph (2)

Applications: high-level synthesis

Firing rules

A node can fire whenever all of its input data are available

Concurrency: multiple nodes can fire simultaneously

Data-driven (implicit) scheduling

Precedence constraints

Intra-iteration: imposed by an edge with no delay

Inter-iteration: imposed by an edge with delay

Fine-grain (atomic) vs. coarse-grain DFG


Data Flow Graph (3)

3-tap FIR filter example

Direct form

Transpose form

Data Flow Graph (4)

Synchronous DFG

The number of data samples produced or consumed by each node is specified a priori

Single-rate system

Multi-rate system: different nodes operate at different frequencies

A multi-rate system can be represented by a single-rate system via unfolding (unrolling)

Part 3 Iteration Bound

Introduction

DSP algorithms often contain feedback loops

Feedback loops impose an inherent lower bound on the achievable iteration or sample period

Iteration bound

It is impossible to achieve an iteration period less than the iteration bound, even with infinite hardware

[Figure: time axis t showing iterations k-1, k, k+1, k+2 and the iteration period]

Data Flow Graph Representations

For n = 0 to ∞

y(n) = a y(n-1) + x(n)

Iteration: execution of each DFG node once

Precedence constraints

Intra-iteration: no delay on the edge

Inter-iteration: at least one delay on the edge

[Figure: DFG annotated with the execution time of each node, the inter-iteration and intra-iteration edges, and the critical path A→B]

Critical Path

Critical path of a DFG

The path with the longest computation time among all paths containing zero delays

Its computation time is the minimum computation time for one iteration of the DFG

Critical paths in the example: 6→3→2→1 and 5→3→2→1

Iteration period = 5 u.t.

Iteration bound

A recursive DFG has a lower bound on the shortest iteration period

Loop bound and iteration bound (1)

Loop bound

Minimum time to execute one loop in the DFG

Loop bound $= t_l / w_l$, where $t_l$ = loop computation time and $w_l$ = number of delays in the loop

(a) loop bound = (4+2)/2 = 3

(b) loop bound 1 = (4+2)/2 = 3

(b) loop bound 2 = (2+4+5)/1 = 11

Loop bound and iteration bound (2)

In (a), two independent sets of computing threads

Two iterations in every 6 u.t. ⇒ iteration period = 3 u.t.

A0→B0 ⇒ A2→B2 ⇒ A4→B4 ⇒ A6→…

A1→B1 ⇒ A3→B3 ⇒ A5→B5 ⇒ A7→…

In (b)

Loop 1: A→B→A

Loop 2: A→B→C→A (critical loop)

Loop bound and iteration bound (3)

Loop bound of the critical loop ⇒ iteration bound of the DSP algorithm

$$T_\infty = \max_{l \in L} \left\{ \frac{t_l}{w_l} \right\}$$

$$T_\infty = \max \left\{ \frac{6}{2}, \frac{11}{1} \right\} = 11 \text{ u.t.}$$

Algorithms to find $T_\infty$

Longest path matrix algorithm

Minimum cycle mean algorithm

Negative cycle detection algorithm

Longest path matrix (LPM) algorithm (1)

$L^{(m)}$, $m = 1, 2, \ldots, d$: a series of matrices

$d$: number of delays in the DFG

$l^{(m)}_{i,j}$: the longest computation time of all paths from delay $d_i$ to $d_j$ that pass through exactly $m-1$ delays; $= -1$ if no such path exists

$$
L^{(1)} = \begin{bmatrix}
-1 & 0 & -1 & -1 \\
4 & -1 & 0 & -1 \\
5 & -1 & -1 & 0 \\
5 & -1 & -1 & -1
\end{bmatrix}
$$

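Longest path matrix (LPM) algorithm (2)

Recursive computation of $L^{(m)}$

$$l^{(m+1)}_{i,j} = \max_{k \in K} \left( -1,\ l^{(1)}_{i,k} + l^{(m)}_{k,j} \right)$$

$$K \subset [1, d] \ \text{ and } \ l^{(1)}_{i,k} \ne -1 \ \text{ and } \ l^{(m)}_{k,j} \ne -1$$

$$l^{(2)}_{2,1} = \max_{k \in \{3\}} \left( -1,\ l^{(1)}_{2,k} + l^{(1)}_{k,1} \right) = \max(-1,\ 0 + 5) = 5$$

$$
L^{(2)} = \begin{bmatrix}
4 & -1 & 0 & -1 \\
5 & 4 & -1 & 0 \\
5 & 5 & -1 & -1 \\
-1 & 5 & -1 & -1
\end{bmatrix}
$$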

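Longest path matrix (LPM) algorithm (3)

Computing $L^{(3)}$ using $L^{(1)}$ and $L^{(2)}$

$$l^{(3)}_{3,3} = \max_{k \in \{1\}} \left( -1,\ l^{(1)}_{3,k} + l^{(2)}_{k,3} \right) = \max(-1,\ 5 + 0) = 5$$

$$
L^{(3)} = \begin{bmatrix}
5 & 4 & -1 & 0 \\
8 & 5 & 4 & -1 \\
9 & 5 & 5 & -1 \\
9 & -1 & 5 & -1
\end{bmatrix},
\qquad
L^{(4)} = \begin{bmatrix}
8 & 5 & 4 & -1 \\
9 & 8 & 5 & 4 \\
10 & 9 & 5 & 5 \\
10 & 9 & -1 & 5
\end{bmatrix}
$$

Iteration bound

$$T_\infty = \max_{i,\, m \in \{1, 2, \ldots, d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\}$$

$$T_\infty = \max \left\{ \frac{4}{2}, \frac{4}{2}, \frac{5}{3}, \frac{5}{3}, \frac{5}{3}, \frac{8}{4}, \frac{8}{4}, \frac{5}{4}, \frac{5}{4} \right\} = 2$$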

Longest path matrix (LPM) algorithm (4)

$l^{(m)}_{i,i}$: the longest computation time of all loops with $m$ delays that contain delay element $d_i$

Another example

$$
L^{(1)} = \begin{bmatrix} 4 & 4 \\ 8 & 8 \end{bmatrix}, \qquad
L^{(2)} = \begin{bmatrix} 12 & 12 \\ 16 & 16 \end{bmatrix}
$$

$$T_\infty = \max \left\{ \frac{4}{1}, \frac{8}{1}, \frac{12}{2}, \frac{16}{2} \right\} = 8$$

Longest path matrix (LPM) algorithm (5)

Computing complexity

Computing $L^{(k+1)}$ from $L^{(1)}$ and $L^{(k)}$ is $O(d^3)$

There are $d^2$ elements in $L^{(k+1)}$ and each element takes $O(d)$

Computing $L^{(d)}$ from $L^{(1)}$ is $O(d^4)$

Computing $L^{(1)}$ is $O(de)$, where $e$ is the number of edges in the DFG

Total computing complexity: $O(d^4 + de)$
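The recursion and the final maximization fit in a short routine. The sketch below is plain Python and starts from a given $L^{(1)}$ (building $L^{(1)}$ from the DFG is not shown); the name `lpm_iteration_bound` is hypothetical, and exact fractions are used so that ratios such as 5/3 are compared without rounding.

```python
from fractions import Fraction


def lpm_iteration_bound(L1):
    """Iteration bound from L(1): max over i, m of l(m)[i][i] / m, with -1 meaning 'no path'."""
    d = len(L1)
    Lm = [row[:] for row in L1]
    best = None                          # stays None if the DFG has no loop
    for m in range(1, d + 1):
        for i in range(d):
            if Lm[i][i] != -1:
                cand = Fraction(Lm[i][i], m)
                best = cand if best is None else max(best, cand)
        # l(m+1)[i][j] = max(-1, max_k l(1)[i][k] + l(m)[k][j]) over k with both entries != -1
        Lm = [[max([-1] + [L1[i][k] + Lm[k][j]
                           for k in range(d)
                           if L1[i][k] != -1 and Lm[k][j] != -1])
               for j in range(d)] for i in range(d)]
    return best


bound_a = lpm_iteration_bound([[-1, 0, -1, -1],
                               [4, -1, 0, -1],
                               [5, -1, -1, 0],
                               [5, -1, -1, -1]])
bound_b = lpm_iteration_bound([[4, 4], [8, 8]])
```

Each pass over the `d * d` entries with an inner scan over `k` is the $O(d^3)$ step, repeated for every $m$ up to $d$.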