Chapter 9 Digital Signal Processing with Xilinx FPGAs
Yin-Tsung Hwang
The materials are largely based on the Xilinx Seminar Notes presented by Bruce Newgard
Configurable Hardware DSP Solutions
• Introduction to digital filters
• Distributed Arithmetic (DA)
• DA FIR filter example
• 8-tap slice
• High-speed FIR filter
• Low-speed FIR filter
• IIR bi-quad filter
• Correlator
• Summary
Digital Filter Basics
Introduction to Digital Filters
• A key component in many DSP applications, e.g. channel equalization and echo cancellation
• Digital vs. analog filters: digital filters offer programmability and a better frequency response
• FIR filter output:
$$y(n) = \sum_{k=0}^{M-1} c_k\,x(n-k) = c_0\,x(n) + c_1\,x(n-1) + \cdots + c_{M-1}\,x(n-M+1)$$
  c_k: filter coefficients (constants); x(n): input at time instance n; y(n): output at time instance n; M: filter tap order (M can be as large as 1000)
• The filter is a series of multiply-and-accumulate (MAC) operations: number of MAC operations per second = sampling frequency × filter tap order
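As a minimal sketch of the equation above (plain Python, not tied to any FPGA flow; `fir_direct` and `mac_rate` are illustrative names), each output is one pass of M multiply-accumulates over the most recent inputs, and the MAC rate is simply the product of sampling frequency and tap count:

```python
def fir_direct(c: list[float], x: list[float]) -> list[float]:
    """Direct-form FIR: y(n) = sum_{k=0}^{M-1} c[k] * x(n-k), taking x(n) = 0 for n < 0."""
    M = len(c)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(M):              # one multiply-accumulate (MAC) per tap
            if n - k >= 0:
                acc += c[k] * x[n - k]
        y.append(acc)
    return y


def mac_rate(sample_rate_hz: float, taps: int) -> float:
    """MAC operations per second = sampling frequency x filter tap order."""
    return sample_rate_hz * taps


# The impulse response of an FIR filter is its coefficient list.
print(fir_direct([0.25, 0.5, 0.25], [1.0, 0.0, 0.0, 0.0]))  # [0.25, 0.5, 0.25, 0.0]
# 67 taps at a 5 kHz sampling rate, as in the filter examples that follow.
print(mac_rate(5_000.0, 67))                                 # 335000.0
```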
High Pass Filter Example
[Figure: coefficients h(n) vs. n; z-plane plot (real part vs. imaginary part); magnitude response in dB vs. frequency]
Ws = 1500 Hz, Wp = 4000 Hz, sampling frequency: 5 kHz, tap order: 67, filter type: HPF (FIR)
Low Pass Filter Example
Wp = 1500 Hz, Ws = 3000 Hz, sampling frequency: 5 kHz, tap order: 67, filter type: LPF (FIR)
[Figure: coefficients h(n) vs. n; z-plane plot (real part vs. imaginary part); magnitude response in dB vs. frequency]
Basic FIR Filter Block Diagram
FIR Implementation Using a Programmable DSP Processor
• Software solution
• One parallel multiplier and accumulator
• Time sharing through micro-coding
• Relatively low sample rate
• Multiple-chip solution
• No migration path
• Complex real-time programming
For each sample data word
    For each tap
        Multiply c(i) by x(n-i)
        Add the result to the accumulator
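The nested loop above can be modeled as follows. This sketch assumes a processor that completes one MAC per clock; `dsp_processor_fir` and `max_sample_rate` are hypothetical names. Counting cycles makes the sample-rate limit explicit: with a single time-shared MAC unit, every sample costs M clocks.

```python
def dsp_processor_fir(c: list[int], samples: list[int]) -> tuple[list[int], int]:
    """Software FIR on a processor with a single multiplier/accumulator.

    For each sample data word, loop over the taps, multiply c(i) by x(n-i)
    and add the result to the accumulator. Returns (outputs, MAC cycles used).
    """
    M = len(c)
    delay_line = [0] * M                          # delay_line[i] holds x(n - i)
    outputs: list[int] = []
    mac_cycles = 0
    for sample in samples:
        delay_line = [sample] + delay_line[:-1]   # shift in the newest sample
        acc = 0
        for i in range(M):
            acc += c[i] * delay_line[i]           # one time-shared MAC per tap
            mac_cycles += 1
        outputs.append(acc)
    return outputs, mac_cycles


def max_sample_rate(mac_clock_hz: float, taps: int) -> float:
    """With one MAC per clock, every sample costs `taps` clocks."""
    return mac_clock_hz / taps


print(dsp_processor_fir([1, 2, 3], [1, 0, 0]))  # ([1, 2, 3], 9)
# Hypothetical 100 MHz MAC clock driving a 1000-tap filter.
print(max_sample_rate(100e6, 1000))             # 100000.0
```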
Distributed Arithmetic Basics
2’s Complement Multiplication
A Series of Multiply & Add
[Figure: a parallel multiplier forms each coefficient × input sample product by adding weighted partial products; an accumulator sums the multiply results into the final result]
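Distributed Arithmetic Approach (1)
[Figure: the input bits are processed from the LSB (LSB, LSB+1, ... up to the MSB); each bit position yields a partial sum, which can be implemented by a look-up table, and an accumulator + shifter combines the partial sums into the final result]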
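One way to see why the partial sums fit in a look-up table: assume M taps with N-bit two's complement inputs, and write x_{k,b} for bit b of x(n-k), with b = 0 the LSB (this bit notation is introduced here for illustration). Then

$$x(n-k) = -x_{k,N-1}\,2^{N-1} + \sum_{b=0}^{N-2} x_{k,b}\,2^{b}$$

$$y(n) = \sum_{k=0}^{M-1} c_k\,x(n-k) = -2^{N-1}\sum_{k=0}^{M-1} c_k\,x_{k,N-1} + \sum_{b=0}^{N-2} 2^{b} \sum_{k=0}^{M-1} c_k\,x_{k,b}$$

Each inner sum $\sum_{k} c_k\,x_{k,b}$ depends only on the M bits $x_{0,b}, \dots, x_{M-1,b}$, so it takes at most $2^M$ values and can be precomputed in a look-up table addressed by those bits; the weights $2^b$ (and the negative weight of the MSB) are what the accumulator + shifter applies.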
Distributed Arithmetic Approach (2)
[Figure: array of partial products between the bits of four 4-bit coefficients a and four 4-bit inputs x, summed bit position by bit position into the result bits P0 to P9]
• Needs a 4-operand parallel adder
• Needs a scaling accumulator
DA One-Tap FIR Filter
Reduces to multiplying a variable x(n) by a constant c0
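A one-tap DA multiply can be sketched bit-serially as follows, assuming an N-bit two's complement x(n) processed LSB first and an integer constant c0; the look-up table degenerates to the two entries {0, c0}. The function name is illustrative.

```python
def da_one_tap(c0: int, x: int, n_bits: int) -> int:
    """Multiply the n_bits two's complement variable x by the constant c0,
    one input bit per clock, LSB first.

    With a single tap the look-up table has only two entries: 0 and c0.
    """
    lut = [0, c0]                       # addressed by the current input bit
    acc = 0
    for b in range(n_bits):
        bit = (x >> b) & 1              # bit b of x in two's complement form
        partial = lut[bit] << b         # shifter: weight the LUT output by 2^b
        if b == n_bits - 1:
            acc -= partial              # the MSB has negative weight
        else:
            acc += partial
    return acc


print(da_one_tap(13, -6, 5))  # -78
```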
DA Two-Tap FIR Filter
DA Three-Tap FIR Filter
A look-up table implementation can be both faster and more area-efficient than a multi-operand adder
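Generalizing to M taps, the table holds all 2^M partial sums of the coefficients, precomputed once; the per-bit work is then one table read, one shift and one add. A minimal sketch with integer coefficients (the names and the three-tap coefficient values are hypothetical):

```python
def build_da_lut(coeffs: list[int]) -> list[int]:
    """Entry `addr` = sum of coeffs[k] over every k whose bit k of addr is set."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_sum_of_products(coeffs: list[int], xs: list[int], n_bits: int) -> int:
    """sum(coeffs[k] * xs[k]) for n_bits two's complement xs, computed one bit
    position per clock with table look-ups, shifts and adds only."""
    lut = build_da_lut(coeffs)          # precomputed once, e.g. at design time
    acc = 0
    for b in range(n_bits):
        addr = 0
        for k, x in enumerate(xs):      # bit b of tap k drives address line k
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] << b
        acc = acc - partial if b == n_bits - 1 else acc + partial
    return acc


coeffs = [3, -2, 5]                     # a hypothetical three-tap filter
print(build_da_lut(coeffs))             # [0, 3, -2, 1, 5, 8, 3, 6]
print(da_sum_of_products(coeffs, [-4, 3, 1], 3))  # -13
```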
The Development of a Distributed Arithmetic FIR Filter
10-bit, 10-tap XC4000E family example
DA FIR Filter Design in XC4000E
10-tap, 10-bit example
• N clocks per sample word
• Fast clock
• No multiplier required
• Embedded hardware solution
• LUT holds the coefficients and their multiplications (precomputed partial products)
LUT Size in DA FIR Design
• The look-up table scales exponentially with the number of taps
• A 10-tap, 10-bit filter needs 2^10 × 10 bits (10,240 bits)
• Need to reduce the LUT size
• Take advantage of the linear-phase (symmetrical) FIR filter
10-Tap 10-Bit Symmetrical FIR Filter
Look-Up Table Implementation
• Holds all partial products
• LUT is as wide as the coefficient
• Use MEMGEN to generate the LUT
• 32 × 10 memory (320 bits)
[Figure: look-up table with address inputs A0 to A4 and a 10-bit data output, DATA10]
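A sketch of how a symmetric 10-tap filter can get away with a 32-entry table, assuming c[k] == c[M-1-k]: each pair x(n-k) + x(n-M+1+k) is pre-added, so only five coefficients address the LUT (2^5 = 32 entries; with 10-bit entries that is the 320 bits listed above). The pre-added sums are one bit wider, which is consistent with the 11 clocks per 10-bit word quoted in the performance figures. The function name and coefficient values are hypothetical.

```python
def symmetric_da_fir(half_coeffs: list[int], window: list[int],
                     n_bits: int) -> tuple[int, int, int]:
    """One output of a linear-phase FIR with M = 2 * len(half_coeffs) taps,
    where c[k] == c[M-1-k] and window[k] = x(n-k) (n_bits two's complement).

    Returns (y(n), number of LUT entries, clocks per output).
    """
    half = len(half_coeffs)
    M = 2 * half
    assert len(window) == M
    # Pre-add each symmetric pair; the sum needs one extra bit.
    sums = [window[k] + window[M - 1 - k] for k in range(half)]
    lut = [sum(c for k, c in enumerate(half_coeffs) if (addr >> k) & 1)
           for addr in range(1 << half)]          # 2**half entries instead of 2**M
    width = n_bits + 1
    acc = 0
    for b in range(width):                         # one clock per bit of the sums
        addr = 0
        for k, s in enumerate(sums):
            addr |= ((s >> b) & 1) << k
        partial = lut[addr] << b
        acc = acc - partial if b == width - 1 else acc + partial
    return acc, len(lut), width


half = [1, -3, 7, 2, -5]                   # hypothetical c0..c4 (c5..c9 mirror them)
window = [511, -512, 0, 1, -1, 100, -200, 300, -400, 5]
y, entries, clocks = symmetric_da_fir(half, window, 10)
full = half + half[::-1]
print(y == sum(c * x for c, x in zip(full, window)), entries, clocks)  # True 32 11
```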
Serial Time Skew Buffer
Sample data word size = N, number of filter taps = k
• One N-bit shift register per tap
• Use XC4000E RAM to build the shift registers
• One 16-bit shift register per 1/2 CLB
Using FFs (10-bit, 10-tap): 50 CLBs
Using RAMs (10-bit, 10-tap): 10 CLBs
Shift register implemented in RAM
Bit-Serial Adder
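In a symmetric design, a bit-serial adder is a natural way to pre-add each symmetric tap pair before the look-up table; the sketch below assumes that role. It models one full adder plus a carry flip-flop, consuming one bit of each operand per clock, LSB first, with one extra clock for the sign-extended top bit (the function name is illustrative):

```python
def bit_serial_add(a: int, b: int, n_bits: int) -> int:
    """Add two n_bits two's complement numbers one bit per clock, LSB first.

    A single full adder and a carry flip-flop produce n_bits + 1 result bits,
    so the sum of two n_bits values never overflows.
    """
    carry = 0                                   # carry flip-flop, cleared per word
    result_bits = []
    for i in range(n_bits + 1):
        # On the extra clock the operands are sign-extended (MSB repeated).
        abit = (a >> min(i, n_bits - 1)) & 1
        bbit = (b >> min(i, n_bits - 1)) & 1
        s = abit ^ bbit ^ carry                 # full-adder sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result_bits.append(s)
    # Reassemble the (n_bits + 1)-bit two's complement result.
    value = sum(bit << i for i, bit in enumerate(result_bits[:-1]))
    return value - (result_bits[-1] << n_bits)


print(bit_serial_add(-4, -4, 3))  # -8
print(bit_serial_add(3, 3, 3))    # 6
```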
Distributed arithmetic lookup table
1's Complementer
• The MSB has negative weighting
• Inverts the data on the last cycle
• 2 bits per CLB
Scaling Accumulator
• Adds data to 1/2 × (SUMOUT)
• 2 bits per CLB
• Needs N+1 bits
• Double precision with an extra shift register
• Can use LogiBLOX to build it as an RPM (relationally placed macro)
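The scaling accumulator can be modeled LSB first: each clock adds the new LUT output to half of the running sum, and the bit shifted out of the bottom goes into the extra shift register, so the final double-precision result is exact. On the last clock the LUT output is negated; this sketch assumes the 1's complementer supplies ~p and a carry-in of 1 completes the negation. Both function names are hypothetical.

```python
def da_partials(coeffs: list[int], xs: list[int], n_bits: int) -> list[int]:
    """LUT outputs for each input bit position b = 0 (LSB) .. n_bits-1 (MSB)."""
    lut = [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
           for a in range(1 << len(coeffs))]
    return [lut[sum(((x >> b) & 1) << k for k, x in enumerate(xs))]
            for b in range(n_bits)]


def scaling_accumulate(partials: list[int], n_bits: int) -> int:
    """LSB-first scaling accumulator: acc <- LUT output + acc / 2 each clock."""
    acc = 0
    low_bits: list[int] = []           # extra shift register for the low word
    for b, p in enumerate(partials):
        if b > 0:
            low_bits.append(acc & 1)   # bit that falls off the accumulator
            acc >>= 1                  # 1/2 * SUMOUT (arithmetic shift)
        if b == n_bits - 1:
            # MSB has negative weight: ones' complement (~p) plus a
            # carry-in of 1 equals -p.
            acc += ~p + 1
        else:
            acc += p
    # High word carries weight 2^(n_bits-1); the shifted-out bits form the low word.
    low = sum(bit << i for i, bit in enumerate(low_bits))
    return (acc << (n_bits - 1)) + low


print(scaling_accumulate(da_partials([3, -2, 5], [-4, 3, 1], 3), 3))  # -13
```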
10-bit, 10-tap linear-phase FIR filter
Implementation Block Diagram
• Total of 44 CLBs: fits in an XC4003E (leaving 56 CLBs for system use)
• About 1,300 equivalent gates
• Little interconnect between blocks
Performance
Number of 10-tap, 10-bit symmetrical FIR filters per XC4000E device:
XC4000E part          4003E  4005E  4006E  4008E  4010E  4013E  4020E  4025E
Number of instances       2      4      5      7      9     11     15     22
• The FIR 10B10T macro can be clocked at 70 MHz
• A 10-bit word requires 11 clocks, so the 10-bit sample word rate is 6.4 MHz
Word size (bits)         6     8    10    12    14    16
Sample rate (MHz)     10.0   7.8   6.4   5.4   4.7   4.1
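The word-size table is consistent with a 70 MHz clock and N + 1 clocks per N-bit word, as stated for the 10-bit case. A quick check (the function name is illustrative):

```python
def da_sample_rate_mhz(clock_mhz: float, word_bits: int) -> float:
    """Bit-serial DA FIR taking word_bits + 1 clocks per sample word."""
    return clock_mhz / (word_bits + 1)


for n in (6, 8, 10, 12, 14, 16):
    print(n, round(da_sample_rate_mhz(70.0, n), 1))
# 6 10.0 / 8 7.8 / 10 6.4 / 12 5.4 / 14 4.7 / 16 4.1
```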
Double Rate DA FIR Filter (1)
• Processes 2 bits per clock
• Number of clocks = (N/2) + 1
Double Rate DA FIR Filter (2)
• Two taps require a 4-input LUT without symmetry
• Four taps require a 4-input LUT with a symmetrical FIR
• The time skew buffer needs twice as many CLBs
• Twice the data word sample rate
• Both LUTs are the same
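A behavioral sketch of processing two bits per clock, assuming two copies of the same partial-sum LUT, one addressed by the even bit and one by the odd bit, with the odd output weighted by 2. For a pre-added (N+1)-bit word this takes ceil((N+1)/2) = N/2 + 1 clocks for even N, consistent with the clock count above. The function name and test values are hypothetical.

```python
def da_two_bits_per_clock(coeffs: list[int], xs: list[int],
                          width: int) -> tuple[int, int]:
    """Distributed arithmetic consuming two bits of every input per clock.

    xs are `width`-bit two's complement values. Returns (result, clocks).
    """
    lut = [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
           for a in range(1 << len(coeffs))]   # both LUT copies hold this table

    def term(b: int) -> int:
        """Signed, weighted LUT output for bit position b (0 beyond the MSB)."""
        if b >= width:
            return 0
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        t = lut[addr] << b
        return -t if b == width - 1 else t    # MSB has negative weight

    acc, clocks = 0, 0
    for b in range(0, width, 2):              # one clock per bit pair
        acc += term(b) + term(b + 1)          # even LUT + odd LUT (weight 2)
        clocks += 1
    return acc, clocks


print(da_two_bits_per_clock([3, -2, 5], [-4, 3, 1], 3))  # (-13, 2)
```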
Designing Large Multi-Tap Filters
Xilinx 8-tap FIR filter SLICE building blocks
Issue: the LUT size scales exponentially with the number of taps
32-tap FIR filter using 8-tap slices
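The slicing idea can be sketched as follows: split the taps into groups of 8, give each group its own partial-sum table, and add the slice outputs. This sketch assumes each slice uses a single 2^8-entry table (the actual Xilinx slice may organize its LUTs differently); names and test data are hypothetical. Four slices need 4 × 256 = 1024 entries instead of 2^32 for one table.

```python
def sliced_da_fir(coeffs: list[int], window: list[int], n_bits: int,
                  slice_taps: int = 8) -> tuple[int, int]:
    """Long FIR built from slices of `slice_taps` taps, each with its own LUT.

    window[k] = x(n-k) in n_bits two's complement. Returns (y(n), total LUT entries).
    """
    total = 0
    lut_entries = 0
    for start in range(0, len(coeffs), slice_taps):
        c = coeffs[start:start + slice_taps]
        x = window[start:start + slice_taps]
        lut = [sum(ck for k, ck in enumerate(c) if (a >> k) & 1)
               for a in range(1 << len(c))]
        lut_entries += len(lut)
        acc = 0
        for b in range(n_bits):                    # bit-serial DA inside the slice
            addr = sum(((xk >> b) & 1) << k for k, xk in enumerate(x))
            p = lut[addr] << b
            acc += -p if b == n_bits - 1 else p
        total += acc                               # slice outputs are added together
    return total, lut_entries


coeffs = [(7 * k) % 11 - 5 for k in range(32)]     # hypothetical 32-tap filter
window = [(13 * k) % 256 - 128 for k in range(32)] # hypothetical 8-bit samples
y, entries = sliced_da_fir(coeffs, window, 8)
print(y == sum(c * x for c, x in zip(coeffs, window)), entries, 2 ** 32)  # True 1024 4294967296
```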
8-tap FIR filter slice building blocks
8-tap FIR filter slice
8-tap FIR filter slice
Very High-Speed Sampling Rates
Multiple parallel multipliers
Multiplying a variable by a constant
Multiplying a variable by a constant (1)
Multiplying a variable by a constant (2)
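A common way to multiply a variable by a constant in LUT-based FPGAs is a constant-coefficient multiplier (often called KCM): split the variable into 4-bit digits, let each digit address a table of c × digit (matching a 4-input LUT), and add the shifted table outputs. The slides' exact structure is not recoverable here, so treat this as a sketch of the general technique; the function name is hypothetical.

```python
def kcm_multiply(c: int, x: int, n_bits: int, digit_bits: int = 4) -> int:
    """Multiply n_bits two's complement x by constant c using look-up tables.

    x is split into digit_bits-wide digits; each digit addresses a table of
    c * digit (the top digit is signed). No general multiplier is needed.
    """
    assert n_bits % digit_bits == 0
    size = 1 << digit_bits
    unsigned_table = [c * v for v in range(size)]
    signed_table = [c * (v - size if v >= size // 2 else v) for v in range(size)]
    n_digits = n_bits // digit_bits
    result = 0
    for d in range(n_digits):
        digit = (x >> (d * digit_bits)) & (size - 1)
        table = signed_table if d == n_digits - 1 else unsigned_table
        result += table[digit] << (d * digit_bits)   # shift by the digit weight
    return result


print(kcm_multiply(93, -128, 8))  # -11904
```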
High-speed parallel FIR filter
Fully parallel distributed arithmetic
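In a fully parallel DA filter, every bit position gets its own copy of the partial-sum LUT, all copies are read in the same clock, and an adder tree combines the weighted outputs, so one output can be produced per clock. A behavioral sketch using 8 taps with 8-bit data and coefficients as in the slice that follows (names and values are hypothetical; CLB counts are not modeled):

```python
def parallel_da(coeffs: list[int], xs: list[int], n_bits: int) -> tuple[int, int]:
    """Fully parallel DA: one LUT copy per input bit, reduced by an adder tree.

    Returns (sum of products, number of adder-tree levels).
    """
    lut = [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
           for a in range(1 << len(coeffs))]
    terms = []
    for b in range(n_bits):                         # n_bits LUT copies in parallel
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        t = lut[addr] << b
        terms.append(-t if b == n_bits - 1 else t)  # MSB has negative weight
    levels = 0
    while len(terms) > 1:                           # one adder-tree level per pass
        terms = [terms[i] + terms[i + 1] if i + 1 < len(terms) else terms[i]
                 for i in range(0, len(terms), 2)]
        levels += 1
    return terms[0], levels


coeffs = [-128, 127, 5, -3, 64, -77, 10, 0]
xs = [1, -1, 2, -2, 3, -3, 127, -128]
print(parallel_da(coeffs, xs, 8))  # (1454, 3)
```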
8-tap parallel DA slice (1)
8-tap parallel DA slice (2)
• Supports sampling rates of 50 to 70 Msps
• Data and coefficient sizes are independent of each other
• 8-bit data and 8-bit coefficients require 122 CLBs per 8-