An FPGA Based Adaptive Viterbi Decoder
Sriram Swaminathan, Russell Tessier
Department of ECE, University of Massachusetts, Amherst
04/18/23
Overview
  Introduction
  Objectives
  Background
  Adaptive Viterbi Algorithm
  Architecture and Implementation Issues
  Results
  Related Work
  Summary and Future Work
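Introduction: A Digital Data Communication System
[Figure: block diagram of the transmission chain. Source -> Source Encoder -> Channel Encoder (convolutional encoder) -> Modulator -> channel, where noise is added -> Demodulator -> Channel Decoder (Viterbi) -> Source Decoder -> Sink. Signal labels along the transmit path: information, bitstream, bitstream with redundancy.]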
Goals
  Implement the Adaptive Viterbi Algorithm in hardware
  Constraints
    Data rate (or throughput): 20 Kbps
    Probability of error, or Bit Error Rate (BER) < 10^-5
      BER = # of errors / length of sequence
  Minimize
    Design time
    Area
Convolutional Encoder
  Accepts information bits as a continuous stream
  Operates on the current b-bit input (b ranges from 1 to 6) and some number of immediately preceding b-bit inputs to produce V output bits, where V > b
[Figure: encoder built from two flip-flops (FF) and two modulo-2 adders (+), shown with example bit values; b = 1, V = 2]
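The encoder described above can be sketched in a few lines of Python. This is only an illustration: the function name `conv_encode` and the generator masks 0b111 and 0b101 are assumptions, chosen because they reproduce the branch labels and the ENC IN / ENC OUT example (0 1 0 -> 00 11 10) shown in the state and trellis diagrams of these slides. The slides do not state the generator polynomials explicitly.

```python
def conv_encode(bits, generators=(0b111, 0b101), k=3):
    """Encode a bit list with a rate 1/len(generators) convolutional code.

    generators and k are illustrative assumptions (K = 3, rate 1/2).
    """
    reg = 0  # shift register: current input in the top bit, older inputs below
    out = []
    for bit in bits:
        # Shift the new bit in at the top, dropping the oldest bit.
        reg = ((bit & 1) << (k - 1)) | (reg >> 1)
        for g in generators:
            # Each output bit is the modulo-2 sum of the tapped register bits.
            out.append(bin(reg & g).count("1") % 2)
    return out


encoded = conv_encode([0, 1, 0])
print(encoded)
```

With b = 1 and V = 2, every input bit produces two output bits, so the three-bit input yields six output bits.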
Definitions
  Constraint length
    Number of successive b-bit groups of information bits used in each encoding operation
    Denoted by K
  Code rate (or rate)
    b/V
  Typical values
    K: 7
    Rate: 1/2, 1/3
The Viterbi Algorithm
  Finds the bit sequence, in the set of all possible transmitted bit sequences, that most closely resembles the received data
  Maximum likelihood algorithm
  Each bit received by the decoder is associated with a measure of correctness
  Practical for short constraint length convolutional codes
State diagram
  State: encoder memory
  Branch: k/ij, where i and j represent the output bits associated with input bit k
[Figure: state diagram with states 00, 10, 11, 01 and branches labeled 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 1/00, 0/10]
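Trellis Diagram
  K = 3, rate 1/2
  Total number of states = 2^(K-1)
  Example over time steps T = 0 to T = 3
    ENC IN:   0  1  0
    ENC OUT:  00 11 10
    RECEIVED: 00 11 11
  Accumulated metric at T = 3 (candidate sums : selected minimum)
    State 00: 2+2, 3+0 : 3
    State 01: 0+1, 3+1 : 1
    State 10: 2+0, 3+1 : 2
    State 11: 0+1, 3+1 : 1
[Figure: trellis with states 00, 01, 10, 11, branches labeled with encoder output bits, and nodes annotated with accumulated metrics]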
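The accumulated metrics in the trellis example can be reproduced with a small add-compare-select (ACS) sketch. It is a sketch under stated assumptions: hard-decision Hamming distance as the branch metric, the K = 3, rate 1/2 code whose outputs are u^s1^s2 and u^s2 (consistent with the branch labels in the state diagram), and a start in state 00. The helper names are hypothetical, and the sketch tracks path metrics only, not the survivor bit sequences.

```python
INF = float("inf")


def branch_output(state, bit):
    # state = (s1, s2) packed as s1*2 + s2, with s1 the most recent input bit.
    s1, s2 = state >> 1, state & 1
    return (bit ^ s1 ^ s2, bit ^ s2)


def next_state(state, bit):
    # The new input becomes s1; the old s1 becomes s2.
    return (bit << 1) | (state >> 1)


def acs_step(metrics, received):
    """One trellis step: add branch metrics, compare, select the minimum."""
    new = [INF] * 4
    for state, metric in enumerate(metrics):
        if metric == INF:
            continue  # state not reachable yet
        for bit in (0, 1):
            out = branch_output(state, bit)
            dist = sum(o != r for o, r in zip(out, received))  # Hamming distance
            ns = next_state(state, bit)
            new[ns] = min(new[ns], metric + dist)
    return new


def path_metrics(received_pairs):
    metrics = [0, INF, INF, INF]  # assume the encoder starts in state 00
    for pair in received_pairs:
        metrics = acs_step(metrics, pair)
    return metrics


# Received sequence from the slide: 00 11 11
print(path_metrics([(0, 0), (1, 1), (1, 1)]))
```

The returned list is indexed by state in the order 00, 01, 10, 11, so it can be compared directly with the selected minima at T = 3.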
Adaptive Viterbi Algorithm
  Motivation
    The Viterbi algorithm requires extremely large memory and logic
    Retaining fewer paths reduces memory and computation
  Definitions
    Path: bit sequence
    Path metric (or cost): accumulated error metric of a path
    Survivor: path that is retained for the subsequent time step
Adaptive Viterbi Algorithm: Criterion for path survival
  1. A threshold T is introduced such that a path is retained if and only if its current path metric is less than dm + T, where dm is the minimum cost among all survivors of the previous time step.
  2. The total number of survivors per time step is limited to a critical number, Nmax, selected by the user.
  Only the best Nmax paths have to be retained at any time.
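The two survival criteria can be written down directly. The sketch below is a straightforward, sort-based reading of the rule, not the hardware method; the function name `select_survivors` and the representation of a candidate as a (metric, path) tuple are assumptions made for illustration.

```python
def select_survivors(candidates, dm, threshold, nmax):
    """Apply the two survival criteria to a list of (path_metric, path) tuples.

    dm is the minimum cost among the survivors of the previous time step.
    """
    # Criterion 1: keep a path only if its metric is below dm + T.
    kept = [c for c in candidates if c[0] < dm + threshold]
    # Criterion 2: keep at most Nmax paths, preferring the lowest metrics.
    kept.sort(key=lambda c: c[0])
    return kept[:nmax]


candidates = [(3, "a"), (1, "b"), (7, "c"), (2, "d"), (4, "e")]
survivors = select_survivors(candidates, dm=1, threshold=4, nmax=3)
print(survivors)
```

In this example path "c" fails the threshold test (7 is not below 1 + 4), and path "e" passes it but is dropped because only the best three paths may survive.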
Parameters in the algorithm
  Constraint length, K
  Truncation length, TL
  Rate, R
  Threshold, T
  Maximum # of paths per time step, Nmax
Influence of Threshold T and Nmax
  Threshold T
    Smaller T: low average # of survivors, increased BER
    Larger T: high average # of survivors, reduced BER
  Nmax
    Smaller Nmax
      Possibility of discarding the best path => high BER
      Smaller area
    Larger Nmax
      Reduced BER
      Larger area
  Selection of Nmax and T is crucial
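Variation of BER with T and Nmax for K = 9 and K = 14
[Figure: two plots of BER versus T and Nmax. Left: K = 9, SNR = 3.1 dB, TL = 45, marked point T = 18, Nmax = 9. Right: K = 14, SNR = 2.5 dB, TL = 70, marked point T = 24, Nmax = 41.]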
Optimal values of Nmax, T and TL for different values of K

K    TL   Nmax  T
4    20   4     14
5    25   7     14
6    30   8     18
7    35   8     17
8    40   8     17
9    45   9     18
10   50   21    20
11   55   25    23
12   60   25    23
14   70   41    24
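Simplified View of Adaptive Viterbi Decoder
[Figure: block diagram. Symbols from the channel enter the branch metric generator; the branch metrics feed the Add-Compare-Select (ACS) unit, which works with the logic for di < dm + T; decision bits go to the survivor memory, which produces the decoded output.]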
Survivor Memory
  Stores all candidate bit sequences (paths) before a decision is made
  Size of survivor memory
    Rows: Nmax
    Columns: truncation length, about (3 to 5) * K
  Two schemes
    Traceback: large latency, small area, low power
    Register exchange: fast, large area, large power
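As a rough worked example of the memory sizing, the sketch below compares an Nmax-row survivor memory with one that keeps a row for every trellis state. It assumes one bit per memory entry (b = 1) and that a non-adaptive Viterbi decoder stores one survivor per state, i.e. 2^(K-1) rows; the function names are hypothetical. The (K, TL, Nmax) values are taken from the table of optimal parameters.

```python
def survivor_memory_bits(rows, truncation_length):
    # One bit per entry: rows x columns.
    return rows * truncation_length


def full_viterbi_rows(k):
    # One survivor per trellis state.
    return 2 ** (k - 1)


for k, tl, nmax in [(9, 45, 9), (14, 70, 41)]:
    adaptive = survivor_memory_bits(nmax, tl)
    full = survivor_memory_bits(full_viterbi_rows(k), tl)
    print(f"K={k}: adaptive {adaptive} bits, full {full} bits")
```

Under these assumptions the gap widens quickly with K, because the full decoder's row count doubles with each increment of K while Nmax grows far more slowly.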
Practical Considerations
  Serial implementation
    Same ACS used repeatedly for all states
    Small area, inexpensive
    Slow, low throughput (data rate)
  Parallel implementation
    Each state has its own ACS (2^(K-1) ACS units)
    Fast, high throughput (data rate)
    Large area, bottleneck for large K values
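Architecture (contd.)
  Elimination of sorting
[Figure: flowchart. Two adders add the branch metrics b1 and b2 to produce sum1 and sum2; each sum is tested with di < dm + T; paths that pass (yes) are counted; if Count < Nmax (yes), the memory is updated; otherwise (no), T = T - 2 and the test is repeated.]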
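The flowchart replaces sorting with repeated threshold tests: if too many paths pass, T is lowered by 2 and the test is run again. A minimal software sketch of that loop follows. The function name is hypothetical, the comparison is written as `<=` so that exactly Nmax survivors are allowed (the flowchart writes Count < Nmax), and starting from the nominal T on every call is an assumption, since the slides do not say how T is restored.

```python
def prune_without_sorting(metrics, dm, threshold, nmax, step=2):
    """Return (indices of surviving paths, threshold actually used)."""
    t = threshold
    while True:
        # Threshold test for every candidate path metric di.
        kept = [i for i, d in enumerate(metrics) if d < dm + t]
        if len(kept) <= nmax:
            return kept, t  # few enough survivors: update memory
        t -= step           # too many survivors: tighten the threshold


metrics = [5, 3, 9, 4, 6, 3]
kept, used_t = prune_without_sorting(metrics, dm=3, threshold=6, nmax=3)
print(kept, used_t)
```

The loop always terminates, because a sufficiently small threshold admits no paths. Note that ties in the metrics can leave fewer than Nmax survivors, which is the price paid for avoiding a sort.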
FPGA Implementation
  FPGA can exploit the parallelism
  Dynamic reconfiguration for performance enhancement
  Implementation platform
    WildOne-XL FPGA board from Annapolis Microsystems Inc.
    2 XC4036 FPGAs, one for user application
    Simulation on Virtex XCV1000
Hardware implementation
[Figure: design flow, each step shown with its tool]
  RTL description in VHDL
  HDL simulation: Cadence Affirma tools
  Synthesis: Synplicity Synplify Pro
  FPGA mapping, place and route: Xilinx Foundation 2.1i
  Target FPGA: XC4036XL-08
XC4036XL FPGA Resource utilization

K    CLBs   4-i/p LUTs   3-i/p LUTs   FFs
4    553    978          196          278
5    1194   2046         340          540
6    1206   2081         482          724
7    1215   2087         537          756
8    1284   2119         654          788
9    1296   2213         615          820
Decoding rate on XC4036 FPGA
  Overheads
    32-bit, 33 MHz PCI bus
    Execution of WildOne API using VC++
    Slowdown: 1.5-2 times
[Figure: decoding rate in Kbps versus constraint length K, with and without overhead; plotted data below]

K                      4        5        6        7        8        9
FPGA freq. (MHz)       40.455   20.089   19.857   19.674   17.576   17.316
No overhead (Kbps)     333.743  164.168  162.273  160.773  143.632  141.141
With overhead (Kbps)   185.994  117.689  116.28   114.231  109.392  107.775
Issues in Reconfiguration
  Reconfigurable units
    Number of ACS units (depends on number of survivors)
    Run-time survivor memory
  Reconfiguration types
    Fine-grained: infeasible
    Coarse-grained: feasible
  Motivation
    Performance improvement
  Tradeoff
    Small SNR (noisy channel): large K, slow decoding
    Large SNR (less noisy channel): small K, fast decoding
    Maintain approximately the same BER
Coarse-timescale reconfiguration
  20.9% performance improvement over static
[Figure: decoding rates (Kbps) versus constraint length K. Series: individual decoding rates without reconfiguration, and average decoding rate with reconfiguration. The small-K end is labeled "less noisy channel" and the large-K end "noisy channel".]
Coarse-timescale reconfiguration: Experimental Approach
  Vary channel noise during transmission
  Noise changes every ~250,000 bits, or every ~1.5 to 2.5 seconds
  If a noise change is detected
    Download a new decoder configuration to the FPGA on the WildOne board
  Reconfiguration overhead: ~40 ms
    PCI bus transfer + noise change detection + bitstream download
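To put the ~40 ms overhead in perspective, the following back-of-the-envelope sketch estimates the fraction of wall-clock time spent reconfiguring. It assumes decoding is paused during each reconfiguration and that exactly one reconfiguration happens per noise change; both are assumptions for illustration, and the function name is hypothetical.

```python
def reconfig_time_fraction(interval_s, overhead_s=0.040):
    """Fraction of total time spent reconfiguring.

    interval_s: decoding time between two noise changes, in seconds.
    overhead_s: time for one reconfiguration, in seconds.
    """
    return overhead_s / (interval_s + overhead_s)


for interval in (1.5, 2.5):
    print(f"{interval} s between changes: "
          f"{100 * reconfig_time_fraction(interval):.1f}% of time reconfiguring")
```

Under these assumptions the overhead stays at a few percent of the total time for the quoted noise-change intervals.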
Comparison with microprocessor
  Intel Celeron 366 MHz, 128 MB RAM
  Speed-up: up to 7.5X for XC4036 (incl. overheads)
[Figure: decoding rate in Kbps with PCI overhead (logarithmic axis, 10 to 1000) versus constraint length K (4 to 9), comparing the FPGA coprocessor with the Celeron processor]