
Feb 26, 2022


SPAlgo2Arch 1

Mapping Signal Processing Algorithms to Architecture

Sumam David S
Head, Dept of Electronics & Communication
National Institute of Technology Karnataka, Surathkal, India
[email protected]

Objectives
At the end of the lecture, you should
• have gained an overview of implementation aspects of signal processing algorithms
• have gained an understanding of the relationship between the parameters influencing the choice of implementation
• appreciate the approaches to efficiently mapping signal processing algorithms to architecture
• be familiar with a few transformations that help the designer develop different solutions for a given signal processing algorithm

SU2010-CS Algo2Arch 2

Organisation
• What is SP?
• Applications of SP
• Key algorithms
• Implementation options & trade-offs
• Key approaches in mapping algorithms to architecture

NITK Surathkal, India
• 12°N, 74°E
• West coast of India
• Arabian Sea
• Mangalore

SP: where do you find it?
High performance

Typical DSP system

Examples of typical signals
Speech and music signals
• Represent air pressure as a function of time at a point in space

Biomedical signals

EEG

Seismic signal

Image
I(x,y)

Video

Digital Camera
Image processing algorithms
• Bad pixel detection and masking
• Color interpolation
• Color balancing
• Contrast enhancement
• False color detection and masking
• Image and video compression

Bad pixel detection & masking

Color interpolation & balancing

Digital Sound Synthesis
• Guitar with nylon strings
• Marimba
• Tenor saxophone

Signal Compression
Original music
• Audio format: PCM 16.000 kHz, 16 bit (data size 66206 bytes)
Compressed music
• Audio format: GSM 6.10, 22.05 kHz (data size 9295 bytes)

Image Compression
8 bpp and 0.5 bpp

Signal Enhancement
• Noisy speech
• Noise removed

Contrast enhancement

Noise removal

Signal Processing
• Works on discrete samples of a continuous time signal
• Real time requirement
• Data driven
• Programmable or custom DSPs

Implementation choice
Depends on
• Sampling rate
• Throughput
• Power - energy
• Area
• Wordlength - precision
• Flexibility
• Time to market
• Volume

Examples of DSP Primitives
• Convolution
• Digital filters
  - Finite Impulse Response (FIR)
  - Infinite Impulse Response (IIR)
• Correlation
• Discrete Fourier Transform / Fast Fourier Transform (FFT)
• Discrete Cosine Transform (DCT)
• Least Mean Square (LMS)
• Applications often comprise many primitives

Two basic DSP structures
[Figure: FIR and IIR filter structures; coefficient labels b0, b1, b2, -a1, -a2]
Finite Impulse Response (FIR)
• No feedback
• Order 4
Infinite Impulse Response (IIR)
• Feedback
• Order 2

Different applications, different demands

Standard processors or special purpose?

Architectural options
• OTS (Off The Shelf) processors
  - Programmable microprocessors or DSP
  - Based on generic computational units, for DSPs usually MAC
  - Prefabbed or IP cores
• Time-multiplexed application specific processors
  - Several algorithmic operations performed on the same hardware unit
  - Trades reduced HW for longer computation time
• Hardware mapped architectures
  - One (or more) hardware unit per algorithmic operation
  - High hardware cost and high throughput

Hardware options
• FPGA - Field Programmable Gate Array
• ASIC - Application Specific Integrated Circuit

Key design issue today - energy
• Utilising the computational time
• Clock frequency
• Supply voltage
• Sleep modes

Algorithm 2 Architecture
• Which structure gives optimal performance, energy and area?
• How to get from a signal processing algorithm to an EFFICIENT implementation using
  - Different numbering systems
  - Pipelining
  - Parallelism
  - Retiming, scheduling
  - Strength reduction, i.e. complexity of operations
  - etc.
  in a structured way!

An example
Datapath and control path

DSP basic operations
• FIR: $y_n = \sum_{k=0}^{N-1} h_k \cdot x_{n-k}$
• IIR: $y_n = \sum_{k=0}^{M} a_k \cdot x_{n-k} + \sum_{k=1}^{N} b_k \cdot y_{n-k}$
• Transforms (FFT, DCT): $X_m = \sum_{n=0}^{N-1} x_n \cdot e^{-j 2\pi n m / N}$
• Decomposition (SVD, LU, QR) - matrix operations
• Arithmetic operations? ×, +, shifts
• Non-terminating - data flow, sampling rate
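
The FIR and IIR difference equations can be sketched directly in code. This is a minimal illustration, not an efficient implementation: the function names `fir` and `iir` are hypothetical, samples before n = 0 are assumed to be zero, and the coefficient naming (a for the input terms, b for the feedback terms) follows the equation on this slide.

```python
def fir(h, x):
    """y[n] = sum_{k=0}^{N-1} h[k] * x[n-k], assuming x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def iir(a, b, x):
    """y[n] = sum_{k=0}^{M} a[k]*x[n-k] + sum_{k=1}^{N} b[k]*y[n-k].

    b[0] is ignored so that list indices match the equation's k.
    """
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        # Feedback terms only use outputs that have already been computed.
        acc += sum(b[k] * y[n - k] for k in range(1, len(b)) if n - k >= 0)
        y.append(acc)
    return y
```

Only multiplications, additions and indexing into past samples appear, which is why the slide lists ×, + and shifts as the basic arithmetic operations.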

Number representation
Binary number system
• Integers, e.g. 0110 = 6 (decimal)
• Two's complement: 1010 = -6 (decimal) (4 bit)
• n bits: $[-2^{n-1},\ 2^{n-1}-1]$
• 8 bits: ?
Fractional values
• Divide by $2^{n-1}$: range [-1, +1)
• 8 bits, fractional: -1 to +127/128
• Resolution: 1/256 of the range [-1, +1), i.e. 1/128
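
The two's complement and fractional interpretations above can be checked with a short sketch. The helper names `twos_complement_to_int` and `to_fraction` are hypothetical; the bit patterns are passed as strings purely for readability.

```python
def twos_complement_to_int(bits):
    """Interpret a bit string such as '1010' as an n-bit two's complement integer."""
    n = len(bits)
    value = int(bits, 2)
    if bits[0] == "1":          # sign bit set: subtract 2^n
        value -= 1 << n
    return value


def to_fraction(bits):
    """Interpret the same bits as a fraction in [-1, +1) by dividing by 2^(n-1)."""
    return twos_complement_to_int(bits) / (1 << (len(bits) - 1))
```

For example, `twos_complement_to_int("1010")` gives -6, and for 8 bits the extreme patterns `"10000000"` and `"01111111"` map to -1 and 127/128.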


Floating point numbers
32 bits: S (1) | Exponent (8) | Mantissa (23)
• Number magnitude: (1.mantissa) × $2^{exponent-127}$
• Smallest +ve: (1.00…0) × $2^{-126}$
• Largest +ve: (1.11…1) × $2^{127}$ ≈ $2^{128}$
• Mantissa determines resolution
• Exponent determines dynamic range
• Much wider dynamic range at the cost of energy, area and computation time
• For an energy efficient implementation, fixed point is used
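
The 1/8/23 bit split can be inspected with Python's standard `struct` module. This is a sketch for normalised numbers only; the function names `float32_fields` and `normal_value` are hypothetical, and zeros, subnormals, infinities and NaNs are deliberately not handled.

```python
import struct


def float32_fields(value):
    """Return (sign, exponent, mantissa) of value stored as a 32-bit float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF
    mantissa = raw & 0x7FFFFF
    return sign, exponent, mantissa


def normal_value(sign, exponent, mantissa):
    """Rebuild a normalised number: (-1)^S * (1.mantissa) * 2^(exponent - 127)."""
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
```

With exponent 1 and mantissa 0 the second function returns the smallest positive normalised value, 2^-126; with exponent 254 and an all-ones mantissa it returns a value just below 2^128.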

Finite word length effects
• Coefficient quantisation
• Signal quantisation
  - Round off noise
  - Limit cycles
• Scaling
  - Adjust signal to fit the hardware
• Saturation arithmetic
[Figure: gain (dB) versus ω/π; original response as solid line, quantized response as dashed line]
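
Coefficient quantisation and saturation can be illustrated with a small sketch. The function `quantize` is hypothetical and assumes a signed fractional format with range [-1, +1); it uses Python's built-in `round`, which rounds halves to the nearest even code.

```python
def quantize(value, bits):
    """Round value to a signed fraction with `bits` bits, saturating at the range limits."""
    scale = 1 << (bits - 1)                     # 2^(bits-1) codes per unit
    code = round(value * scale)                 # nearest representable code
    code = max(-scale, min(scale - 1, code))    # saturate instead of wrapping around
    return code / scale
```

A coefficient of 0.3 stored in 4 bits becomes 0.25, which is the kind of shift that moves the filter response from the solid to the dashed curve. An out-of-range value such as 1.5 saturates to the largest code (127/128 for 8 bits) rather than wrapping to a negative number.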


Implementation Flow
• High level description - specifications
  - Equations
  - Matlab, C/C++
• Architecture
  - Building blocks, speed and area
  - Block diagram
  - Optimisations: #bits, #coefficients
  - Hardware description languages: Verilog, VHDL
• Compile (synthesize)
  - Mapping to gates
  - Place & route

Implementation of FIR filters
$y_n = \sum_{k=0}^{3} h_k \cdot x_{n-k}$
• Algorithm is non-terminating
• One execution of the loop = one iteration
• Iteration period = time for one iteration
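
The non-terminating nature of the algorithm can be sketched as a generator that consumes an input stream of any length. The name `fir_stream` is hypothetical, and the delay line is assumed to start at zero; each pass through the loop body corresponds to one iteration in the sense used above.

```python
from collections import deque


def fir_stream(h, samples):
    """Yield one output per input sample for a direct-form FIR filter."""
    # Delay line holds x[n], x[n-1], ..., x[n-(N-1)]; the oldest sample falls off the end.
    delay_line = deque([0.0] * len(h), maxlen=len(h))
    for x in samples:                       # one loop execution = one iteration
        delay_line.appendleft(x)
        yield sum(hk * xk for hk, xk in zip(h, delay_line))
```

Because the generator never looks at the length of `samples`, it works equally well on an endless source; the time taken by one pass through the loop is the iteration period.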

Direct form 4 tap FIR filter

FIR with adder tree

Critical path
• Critical path: the longest path without delay elements
  - between delays
  - input to delay
  - delay to output
  - input to output
• Sampling rate = number of samples processed per second
• Latency = time difference between an output and the corresponding input
  - Combinational logic: measured in gate delays
  - Sequential logic: measured in number of clock cycles
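
The effect of the adder arrangement on the critical path can be estimated with a short sketch. It assumes the usual textbook structures, since the figures are not part of this transcript: a direct form with one multiplier followed by a chain of N-1 adders, and a balanced adder tree of depth ceil(log2 N). The function names and the delay values in the example are hypothetical.

```python
def critical_path_direct(taps, t_mult, t_add):
    """One multiplier followed by a chain of (taps - 1) adders."""
    return t_mult + (taps - 1) * t_add


def critical_path_tree(taps, t_mult, t_add):
    """One multiplier followed by a balanced adder tree of depth ceil(log2(taps))."""
    depth = (taps - 1).bit_length()     # integer form of ceil(log2(taps))
    return t_mult + depth * t_add
```

With assumed delays of 10 units per multiplier and 2 units per adder, a 4 tap filter has a critical path of 16 units in the chained form and 14 units with the tree, so the tree allows a slightly higher sampling rate for the same hardware.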

Signal Flow Graph
• Nodes: represent computations and/or tasks
• Directed edge (j, k): denotes a linear transformation from the input signal at node j to the output signal at node k
  - in digital systems usually limited to delays and constant multipliers
• Source = no entering edge
• Sink = only entering edges

Transposition
• Reverse the direction of all edges
• Exchange input and output

Transposed FIR filter
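
A transposed FIR filter can be sketched as follows. The register arrangement is an assumption based on the standard transposed form (the figure is not in this transcript): the input is broadcast to all multipliers and the delays sit between the adders. The name `fir_transposed` is hypothetical.

```python
def fir_transposed(h, samples):
    """Transposed-form FIR: same input/output behaviour as the direct form."""
    state = [0] * (len(h) - 1)          # registers between the adders
    out = []
    for x in samples:
        # Output uses the newest product plus the partial sum stored last cycle.
        y = h[0] * x + (state[0] if state else 0)
        # Shift partial sums one register towards the output.
        for i in range(len(state) - 1):
            state[i] = h[i + 1] * x + state[i + 1]
        if state:
            state[-1] = h[-1] * x
        out.append(y)
    return out
```

Each register update involves one multiplication and one addition, so in this arrangement the path between delays no longer grows with the number of taps.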

Fixed point implementation

Pipelining
• Pipelining increases latency
• More registers can bring down the critical path delay to max(TM, TA)

Data Flow Graph
• DFGs capture the data-driven property of DSP algorithms
• Data-Flow Graph
  - nodes represent computations (or functions)
  - directed edges represent data flow with a non-negative number of delays
• A node can execute whenever all its input data are available
• Each edge describes a precedence constraint between two nodes in the DFG:
  - Intra-iteration precedence constraint: if the edge has zero delays
  - Inter-iteration precedence constraint: if the edge has one or more delays

DFG

Dependence Graph
• Like a DFG, a combination of nodes (tasks) with directed edges which indicate precedence constraints
• Unlike a DFG, nodes are not reused. Instead, in each iteration a new node is created
• A DG has an explicit time axis, hence delays are not indicated
• Extremely modular regular arrays

FIR Filter - DG
y(n) = h0 x(n) + h1 x(n-1) + h2 x(n-2)
[Figure: dependence graph with inputs x(0), x(1), x(2), coefficients h0, h1, h2 and outputs y(0), y(1), y(2); each node computes x' = x, h' = h, y' = y + hx]

Recurrence equations
• Time for one complete cycle = TM + TA (the iteration period)
• Sampling time TS ≥ TM + TA

Precedence constraints

Loop Bound
$\text{Loop bound} = \dfrac{\text{Total execution time on loop}}{\text{Total delays in loop}}$

Iteration Bound
• Iteration bound = maximum loop bound
• Delay free loops cannot be computed
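
The two definitions translate into a few lines of code. This sketch assumes the loops of the graph have already been enumerated by hand; each loop is given as a pair (list of node execution times, number of delays). The names `loop_bound` and `iteration_bound` and the example numbers are hypothetical.

```python
from fractions import Fraction


def loop_bound(node_times, delays):
    """Total execution time on the loop divided by total delays in the loop."""
    if delays == 0:
        raise ValueError("delay-free loop cannot be computed")
    return Fraction(sum(node_times), delays)


def iteration_bound(loops):
    """Maximum loop bound over all loops of the graph."""
    return max(loop_bound(times, delays) for times, delays in loops)
```

For a graph with one loop of execution time 1 + 2 and one delay, and a second loop of execution time 1 + 2 + 2 and two delays, the loop bounds are 3 and 5/2, so the iteration bound is 3.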

Iteration bound
• Clock period is lower bounded by the critical path computation time
• Sampling period is lower bounded by the iteration bound regardless of computational resources available
• Iteration bound
  - Depends on technology
  - Derivation of graph from equations

Pipelining and Parallelism
• Pipelining
  - Splits the logic path by introducing pipeline registers
  - Leads to a reduction of the critical path but introduces latency
  - Either increases the clock speed (or sampling speed) or reduces the power consumption at the same speed in a DSP system
• Parallel processing
  - Multiple outputs are computed in parallel in a clock period
  - The effective sampling speed is increased by the level of parallelism
  - Can also be used to reduce the power consumption

Pipelining

Parallel processing

Pipelining
• Cutset: a set of edges that, if removed (cut), results in two disjoint graphs
• Feedforward cutset: a cutset in which data moves in the forward direction on all edges of the cutset
• Pipelining: adding delays at feedforward cutsets
• Reduces critical path delay
• Introduces latency, no change in functionality

Pipelining
Tnode = 1, CPold = 4, CPnew = 2

Parallel FIR filter
• SISO FIR: $y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2)$
• MIMO FIR

Parallel 3 tap FIR
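
A 3-parallel version of the 3 tap filter can be sketched as below. The MIMO equations are not part of this transcript, so the block form used here is an assumption: each pass computes y(3k), y(3k+1) and y(3k+2) from the SISO equation applied at three consecutive indices. The function names are hypothetical, and samples outside the input are taken as zero.

```python
def fir3(b, x):
    """Serial (SISO) 3 tap FIR: one output per step."""
    def at(n):
        return x[n] if 0 <= n < len(x) else 0
    return [b[0] * at(n) + b[1] * at(n - 1) + b[2] * at(n - 2)
            for n in range(len(x))]


def fir3_parallel(b, x):
    """Block (MIMO) form: three outputs per step, block size L = 3."""
    if len(x) % 3:
        raise ValueError("input length must be a multiple of the block size 3")

    def at(n):
        return x[n] if 0 <= n < len(x) else 0

    y = []
    for k in range(len(x) // 3):
        n = 3 * k
        # In hardware these three expressions would be evaluated in the same clock period.
        y0 = b[0] * at(n) + b[1] * at(n - 1) + b[2] * at(n - 2)
        y1 = b[0] * at(n + 1) + b[1] * at(n) + b[2] * at(n - 1)
        y2 = b[0] * at(n + 2) + b[1] * at(n + 1) + b[2] * at(n)
        y.extend([y0, y1, y2])
    return y
```

Both functions produce the same output sequence; the parallel form simply needs one third as many steps, at the cost of three times the multipliers and adders.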

Pipelining & Parallelism
• Critical path unchanged in parallel system
  - Tclock ≥ TM + 2TA
  - Titer = Ts = Tclock/L
• Parallel system: Tclock ≠ Ts
• Pipeline system: Tclock = Ts
• By combining parallel processing (block size: L) and pipelining (pipelining stage: M), the sample period can be reduced to Titer = Ts = Tclock/(LM)
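
A worked example may make these relations concrete. The numbers are assumptions chosen only for illustration (TM = 10 and TA = 2 time units, block size L = 3, pipelining stage M = 2); they do not come from the slides.

```latex
\begin{align*}
T_{clock} &\ge T_M + 2T_A = 10 + 2 \cdot 2 = 14 \\
\text{parallel only } (L = 3):\quad T_s &= \frac{T_{clock}}{L} = \frac{14}{3} \approx 4.67 \\
\text{parallel and pipelined } (L = 3,\ M = 2):\quad T_s &= \frac{T_{clock}}{LM} = \frac{14}{6} \approx 2.33
\end{align*}
```

In the parallel case the clock period stays at 14 units while three samples are produced per clock, which is why Tclock and Ts differ.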

Retiming
Moving delays in the system
• Modifies critical path delay and number of registers
• Retiming does not change
  - delay in a loop
  - iteration bound

Cutset retiming

Node retiming
Cut set around a node

Algorithmic Strength reduction
• Reduces the number of strong operations

Fixed coefficient multiplication
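
When a coefficient is known in advance, the multiplier can be replaced by shifts and additions or subtractions. The slide's own example is a figure that is not in this transcript, so the constants 7 and 10 and the function names below are hypothetical illustrations of the idea.

```python
def times_7(x):
    """7x = 8x - x: one shift and one subtraction instead of a multiplier."""
    return (x << 3) - x


def times_10(x):
    """10x = 8x + 2x: two shifts and one addition."""
    return (x << 3) + (x << 1)
```

In hardware a shift by a fixed amount is only wiring, so each of these costs a single adder or subtractor.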

Complex multiplication
• (a+jb)(x+jy)
• Can you do it with 3 multipliers and a few extra adders?
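
One well-known answer to the question is sketched below. It is a common textbook identity and not necessarily the form the lecture had in mind; the function name `complex_mul_3` and the intermediate names k1, k2, k3 are hypothetical.

```python
def complex_mul_3(a, b, x, y):
    """(a + jb)(x + jy) using 3 multiplications and 5 additions/subtractions."""
    k1 = x * (a + b)
    k2 = a * (y - x)
    k3 = b * (x + y)
    real = k1 - k3      # ax + bx - bx - by = ax - by
    imag = k1 + k2      # ax + bx + ay - ax = bx + ay
    return real, imag
```

The direct form needs 4 multiplications and 2 additions, so one strong operation is traded for three weak ones.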

Architectural synthesis
Given a DFG and resources:
• Allocation: allocate enough resources to solve the problem
• Binding: which operation happens on which resource
• Scheduling: when should each operation take place

6 tap FIR Filter

Cycle  Operations              Resources
1      m1, m2, m3, m4, m5, m6  6 multi
2      a1                      1 adder
3      a2                      1 adder
4      a3                      1 adder
5      a4                      1 adder
6      a5                      1 adder

Resource utilisation
• Adder: 5/6 ≈ 83%
• Multiplier: 6/36 ≈ 16.67%
• m1, m2 have to be in cycle 1

Alternate: 2 M + 1 A

Cycle  Operations  Resources
1      m1, m2      2 multi
2      a1, m3, m4  1 adder + 1 multi
3      a2, m5, m6  1 adder + 1 multi
4      a3          1 adder
5      a4          1 adder
6      a5          1 adder

• Adder: 5/6 ≈ 83%
• Multiplier: 6/12 = 50%

1M, 1A

Cycle  Operations  Resources
1      m1          1 multi
2      m2          1 adder + 1 multi
3      m3, a1      1 adder + 1 multi
4      m4, a2      1 adder + 1 multi
5      m5, a3      1 adder + 1 multi
6      m6, a4      1 adder + 1 multi
7      a5          1 adder

• Adder: 5/7 ≈ 71%
• Multiplier: 6/7 ≈ 86%

Scheduling
• ASAP or ALAP
• Mobility = ALAP time - ASAP time for each function
• Time bound = total number of operations of a type / number of resources of that type
• System bound = max over all types of resources
• If resource constrained
  - (1M, 1A): max(6, 5) = 6 time units
• Pipelining / retiming
  - 6M, 5A: 2 cycles
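
The time bound, system bound and utilisation figures can be reproduced with a short sketch. The function names are hypothetical, and rounding the time bound up to whole cycles is an assumption added here; the slide states the plain ratio.

```python
def time_bound(num_ops, num_units):
    """Operations of one type divided by units of that type, rounded up to whole cycles."""
    return -(-num_ops // num_units)     # integer ceiling division


def system_bound(ops, units):
    """Maximum time bound over all resource types."""
    return max(time_bound(ops[kind], units[kind]) for kind in ops)


def utilisation(num_ops, num_units, cycles):
    """Fraction of available unit-cycles that actually perform an operation."""
    return num_ops / (num_units * cycles)
```

For the 6 tap filter with 6 multiplications and 5 additions, one multiplier and one adder give a system bound of max(6, 5) = 6. In the 6 cycle schedule with six multipliers, the multiplier utilisation is 6/36, and with two multipliers it is 6/12.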

IIR filter
• 4 cycles
• Iteration bound = 3

IIR filter
• 3 cycles
• 2A + 3M

Embedded System Design Dilemma
[Figure: compute performance versus system time-to-market, spanning SW to HW; labels: GPP, DSP, media, security, etc., in-system configurable processors, FPGA?, ASSP, ASIC]

Combining The Best of HW and SW
[Figure: processor block diagram; labels: Control FSM, Memory (Program), Registers, Datapath, Storage; regions: control, compute]

Software Configurable Processor Concept
[Figure: processor block diagram; labels: Control FSM, Memory (Program), Registers, Datapath, Storage, STRETCH ISEF, Datapath]

Conclusion
• Overview of implementation of signal processing algorithms
• Relationship between parameters influencing the choice of implementation
• Looked at some approaches to efficiently map signal processing algorithms to architecture
• Hopefully a thought on what happens at hardware level and how to optimise algorithms to suit the architecture

References
K.K. Parhi, VLSI Digital Signal Processing Systems, Wiley, 1999

18 March 2010 Embedded Processor Architecture 82

Thank you