By: Daniel Barsky, Natalie Pistunovich
Supervisors: Rolf Hilgendorf, Ina Rivkin
Characterization
Sub Nyquist Implementation Optimization
11/04/2010
Jan 19, 2016
Agenda
- Project Overview
- Hardware Features
- Project Goals & Agenda
- Design Overview – End to End
- Expander Module
- CTF Module
- DSP & SCD Modules
- Memory Module
- Debug Module
- Gantt Chart
Project Overview
- This project is part of the Sub-Nyquist Sampling & Reconstruction card.
- The design is to be implemented on a card consisting of 4 Altera Stratix-III FPGAs, as well as a set of DDR memories.
- The currently suggested implementation requires significant resources and is implemented on 3 FPGAs.
- The design consists of 5 separate blocks, designed by 8 groups.
- Data is represented in 18-bit fixed point with 16 fractional bits.
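To make the data format concrete, here is a minimal sketch of an 18-bit word with 16 fractional bits. It assumes signed two's-complement storage and saturation on overflow, neither of which is stated on the slide; the function names are illustrative only.

```python
WORD_BITS = 18                 # total word length (from the slide)
FRAC_BITS = 16                 # fractional bits (from the slide)
SCALE = 1 << FRAC_BITS

# Assumption: signed two's-complement words, saturating on overflow.
MIN_RAW = -(1 << (WORD_BITS - 1))
MAX_RAW = (1 << (WORD_BITS - 1)) - 1


def saturate(raw: int) -> int:
    """Clamp a raw integer to the 18-bit signed range."""
    return max(MIN_RAW, min(MAX_RAW, raw))


def to_fixed(x: float) -> int:
    """Quantize a real value to the nearest representable raw word."""
    return saturate(round(x * SCALE))


def to_float(raw: int) -> float:
    """Interpret a raw word as a real value."""
    return raw / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two raw words; the wide product is shifted back to
    16 fractional bits (truncation) and saturated."""
    return saturate((a * b) >> FRAC_BITS)
```

Under these assumptions the representable range is -2 up to 2 - 2^-16, in steps of 2^-16.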
Hardware Features
Hardware Features (cont.)
Project Goals & Agenda
- Reducing the design to 2 FPGAs, at the expense of latency:
  - Studying each group's activity: algorithms, implementations, resources utilized, etc.
  - Pointing out possible efficiency improvements:
    - Resources that can be reused
    - Implementations that exceed requirements
    - Hardware idleness
  - Implementing improvements
- Ultimately, suggesting the optimal architecture to be implemented in an ASIC
Design Overview – End to End
[Block diagram. Blocks: Analog System + A/D; Expand Sequences 4:12 (Morad, Amir); Memory Architecture; CTF; Support Change Detector (Omer, Daniel); DSP (Omer, Daniel); Controller; Analog Back-End. Signal labels: Samples Bundle, Support, A†. Other names on the diagram: Eli, Tzvika; Yoni.]
Expander Module
Description:
- In Normal Operation Mode: receives 4 channels at 60 MHz, expands each into 3 slices of 20 MHz (12 channels in total), and sends them to the Memory block (for later reconstruction) as well as to the CTF and the Support Change Detector.
- In Iteration Mode: creates 10 slices of 2 MHz out of each 20 MHz slice and sends them to the CTF block for support calculation in iterations, a different slice each cycle: 12 slices per iteration, 10 iterations required.
Expander Module
Algorithm:
- Modulate (if needed): multiply by sine/cosine coefficients
- LPF: using a FIR polyphase Kaiser filter, 240 taps
  - FIR filters are used for added stability and linear phase
  - Polyphase filters are used for efficient filtering and decimation using minimal resources (multipliers)
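The saving from the polyphase structure can be illustrated with a small sketch. It compares a direct "filter, then discard samples" decimator with a polyphase one that only evaluates the outputs that are kept. The tap values and function names are placeholders for illustration, not the 240-tap Kaiser design from the slides.

```python
def decimate_direct(x, h, m):
    """Filter at the input rate, then keep every m-th output sample."""
    full = [
        sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
        for n in range(len(x))
    ]
    return full[::m]


def decimate_polyphase(x, h, m):
    """Same result, computed only at the output rate.

    The taps are split into m sub-filters e_p[n] = h[n*m + p]; each kept
    output uses every tap exactly once, so the multiply rate drops by m.
    """
    phases = [h[p::m] for p in range(m)]
    out = []
    for i in range(0, len(x), m):            # i = m * (output index)
        acc = 0
        for p, taps in enumerate(phases):
            for n, tap in enumerate(taps):
                idx = i - n * m - p          # input sample paired with h[n*m + p]
                if 0 <= idx < len(x):
                    acc += tap * x[idx]
        out.append(acc)
    return out
```

With a decimation factor of 3, each output costs one pass over the taps at the 20 MHz output rate instead of the 60 MHz input rate, which appears consistent with the 240/3 factor used in the resource count.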
Expander Module (cont.)
Total Resource Utilization:
- 4·8 18x18 multipliers at the modulators
- 4·3·240/3 18x18 multipliers at the 60 MHz → 20 MHz filters
- 4·3·2·400/10 18x18 multipliers at the 20 MHz → 2 MHz filters
Total: 960 + 960 + 32 = 1952 multipliers
There are 448 multipliers per FPGA!
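The arithmetic behind these totals can be checked directly. The short script below only restates the figures from the slide; the variable names are illustrative.

```python
import math

MULTIPLIERS_PER_FPGA = 448                    # figure quoted on the slide

modulators = 4 * 8                            # 4 channels x 8 multipliers
filters_60_to_20 = 4 * 3 * 240 // 3           # 4 channels x 3 slices, 240 taps / decimation by 3
filters_20_to_2 = 4 * 3 * 2 * 400 // 10       # 12 slices x 2, 400 taps / decimation by 10

total = modulators + filters_60_to_20 + filters_20_to_2
fpgas_for_expander_alone = math.ceil(total / MULTIPLIERS_PER_FPGA)

print(total, fpgas_for_expander_alone)
```

Taken at face value, the Expander multipliers alone exceed four FPGAs' worth of 18x18 multipliers, which is why the following slides look for ways to cut the count.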
Expander Module (cont.)
Possible improvements:
- Reducing the number of filter taps by widening the transition band: 0.044π
- Reducing the stop band ripple: -70dB
- Operating at a higher clock frequency: the filter hardware can then run several times per input sample, so the same filter can be reused (time-shared) for several parallel channels
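How the tap count depends on these two parameters can be estimated with Kaiser's empirical formula for windowed FIR design, N ≈ (A - 7.95) / (2.285·Δω) + 1. The sketch below assumes that -70 dB is the stopband attenuation target and that 0.044π is the transition width in radians per sample; the slides do not say whether these are the current or the proposed values.

```python
import math


def kaiser_num_taps(stopband_atten_db: float, transition_width_rad: float) -> int:
    """Kaiser's empirical estimate of the FIR length for a given
    stopband attenuation (dB) and transition width (rad/sample)."""
    order = (stopband_atten_db - 7.95) / (2.285 * transition_width_rad)
    return math.ceil(order) + 1


# Figures from the slide, under the assumptions stated above.
taps_slide_spec = kaiser_num_taps(70.0, 0.044 * math.pi)

# Doubling the transition width roughly halves the estimate.
taps_wider_band = kaiser_num_taps(70.0, 2 * 0.044 * math.pi)

print(taps_slide_spec, taps_wider_band)
```

The estimate is inversely proportional to the transition width and grows linearly with attenuation, so both knobs listed above trade filter quality for multipliers.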
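CTF – Q-Frame
Description:
- Calculates the Q frame out of y:
  $$Q = \sum_{n=0}^{N_f} y[n]\, y^H[n]$$
- Multiplies $Y$ by its conjugate transpose $Y^H$, where $Y$ stacks the four channel vectors $y_1, y_2, y_3, y_4$ and each channel holds three slices (right slice, center slice, left slice), with $\mathrm{i}$ the imaginary unit:
  $$y_i = \begin{bmatrix} a_i + \mathrm{i} b_i & c_i & a_i - \mathrm{i} b_i \end{bmatrix}^T$$
- Q is Hermitian:
  $$Q_{i,j} = Q_{j,i}^H$$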
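CTF – Q-Frame
Algorithm:
- For non-diagonal elements $Q_{i,j} = y_i\, y_j^H$:
  - Calculates the 9 required products:
    $$a_i a_j,\ a_i b_j,\ b_i a_j,\ b_i b_j,\ c_i a_j,\ c_i b_j,\ a_i c_j,\ b_i c_j,\ c_i c_j$$
  - Calculates the elements of Q using the above products ($\mathrm{i}$ is the imaginary unit):
    $$Q_{i,j} = \begin{bmatrix} a_i + \mathrm{i} b_i \\ c_i \\ a_i - \mathrm{i} b_i \end{bmatrix} \begin{bmatrix} a_j - \mathrm{i} b_j & c_j & a_j + \mathrm{i} b_j \end{bmatrix} = \begin{bmatrix} (a_i + \mathrm{i} b_i)(a_j - \mathrm{i} b_j) & c_j (a_i + \mathrm{i} b_i) & (a_i + \mathrm{i} b_i)(a_j + \mathrm{i} b_j) \\ c_i (a_j - \mathrm{i} b_j) & c_i c_j & c_i (a_j + \mathrm{i} b_j) \\ (a_i - \mathrm{i} b_i)(a_j - \mathrm{i} b_j) & c_j (a_i - \mathrm{i} b_i) & (a_i - \mathrm{i} b_i)(a_j + \mathrm{i} b_j) \end{bmatrix}$$
- For diagonal elements $Q_{i,i} = y_i\, y_i^H = |y_i|^2$, using 6 products:
    $$Q_{i,i} = \begin{bmatrix} a_i + \mathrm{i} b_i \\ c_i \\ a_i - \mathrm{i} b_i \end{bmatrix} \begin{bmatrix} a_i - \mathrm{i} b_i & c_i & a_i + \mathrm{i} b_i \end{bmatrix} = \begin{bmatrix} a_i^2 + b_i^2 & a_i c_i + \mathrm{i} b_i c_i & a_i^2 - b_i^2 + 2 \mathrm{i} a_i b_i \\ a_i c_i - \mathrm{i} b_i c_i & c_i^2 & a_i c_i + \mathrm{i} b_i c_i \\ a_i^2 - b_i^2 - 2 \mathrm{i} a_i b_i & a_i c_i - \mathrm{i} b_i c_i & a_i^2 + b_i^2 \end{bmatrix}$$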
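The following sketch shows how one off-diagonal 3x3 block can be assembled from nine real products and checks it against a direct complex outer product. It assumes each channel is described by real values (a, b, c) with slices [a + ib, c, a - ib] and a real center slice; the function names are illustrative.

```python
def q_block_from_products(yi, yj):
    """Build the 3x3 block y_i * y_j^H from nine real products.

    yi and yj are (a, b, c) tuples; the slices are [a + ib, c, a - ib].
    """
    ai, bi, ci = yi
    aj, bj, cj = yj
    # The nine real multiplications.
    aa, ab, ba, bb = ai * aj, ai * bj, bi * aj, bi * bj
    ca, cb, ac, bc, cc = ci * aj, ci * bj, ai * cj, bi * cj, ci * cj
    # Everything below uses additions and sign changes only.
    return [
        [complex(aa + bb, ba - ab), complex(ac, bc), complex(aa - bb, ab + ba)],
        [complex(ca, -cb), complex(cc, 0.0), complex(ca, cb)],
        [complex(aa - bb, -(ab + ba)), complex(ac, -bc), complex(aa + bb, ab - ba)],
    ]


def q_block_direct(yi, yj):
    """Reference: explicit complex outer product y_i * y_j^H."""
    vi = [complex(yi[0], yi[1]), complex(yi[2], 0.0), complex(yi[0], -yi[1])]
    vj = [complex(yj[0], yj[1]), complex(yj[2], 0.0), complex(yj[0], -yj[1])]
    return [[u * v.conjugate() for v in vj] for u in vi]
```

Because the third row is the element-wise conjugate of the first row in reverse order, a hardware implementation could plausibly reuse results further, although the slides do not say whether it does.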
CTF – Q-Frame (cont.)
Total Resource Utilization:
- Total multiplier requirements:
  - 42 basic multipliers
  - 36 two-multiply-adders
[Figures: Basic Multiplier; Two-Multiply Adder]
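As a consistency check, the two totals add up to the number of product terms one gets if only the diagonal blocks and the upper-triangle blocks of Q are computed for 4 channels. This reading is an assumption inferred from the numbers; the slides do not state how the 42/36 split was derived.

```python
from math import comb

CHANNELS = 4
PRODUCTS_DIAGONAL = 6          # per Q[i][i] block (from the algorithm slide)
PRODUCTS_OFF_DIAGONAL = 9      # per Q[i][j] block, i != j (from the algorithm slide)

diagonal_blocks = CHANNELS
# Assumption: blocks below the diagonal follow from Hermitian symmetry,
# so only the i < j blocks need to be computed.
upper_blocks = comb(CHANNELS, 2)

units = diagonal_blocks * PRODUCTS_DIAGONAL + upper_blocks * PRODUCTS_OFF_DIAGONAL
slide_total = 42 + 36          # basic multipliers + two-multiply-adders

print(units, slide_total)
```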
CTF – OMP
Description:
- Receives the Q frame:
  $$Q = \sum_{n=0}^{N_f} y[n]\, y^H[n]$$
- Calculates $U_Q$ using Orthogonal Matching Pursuit (OMP)
- Gets the support from $U_Q$
CTF – OMP (cont.)
Algorithm:
- A residue matrix R is loaded with Q
- $U_Q$ is calculated using iterations as follows:
  - The matrix A is projected on the residue matrix R: $Z = A^H R$
  - The energy of each row in the projection is calculated: $\|Z_i\|^2$
  - The column $A_i$ with the maximum projection energy (row $i$ of $Z$) is added to the support
  - An orthogonal vector is constructed from $A_i$ using the Gram-Schmidt process
  - The projection of R on the orthogonal vector is subtracted from R
  - The energy of the residue matrix R is calculated: $\|R\|^2$
  - If the energy of the residue is greater than a predefined threshold, continue to the next iteration
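The iteration described above can be sketched in plain Python as follows. This is a floating-point illustration of the steps, not the fixed-point hardware design: the function name, the argument layout (matrices as lists of rows) and the stopping parameters are assumptions, and the columns of A are assumed to be linearly independent with comparable norms.

```python
import math


def omp_support(a, q, threshold, max_iter):
    """Greedy support recovery following the listed steps.

    a: m x n matrix A, q: m x k matrix loaded into the residue R.
    max_iter must not exceed the number of columns of A.
    """
    m, n, k = len(a), len(a[0]), len(q[0])
    r = [row[:] for row in q]                                   # R <- Q
    cols = [[a[t][j] for t in range(m)] for j in range(n)]      # columns of A
    support, basis = [], []
    for _ in range(max_iter):
        # Z = A^H R and the energy of each row of Z.
        energy = []
        for j in range(n):
            z_row = [sum(cols[j][t].conjugate() * r[t][c] for t in range(m))
                     for c in range(k)]
            energy.append(sum(abs(z) ** 2 for z in z_row))
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: energy[j])
        support.append(best)
        # Gram-Schmidt: orthogonalize the chosen column against the basis.
        v = cols[best][:]
        for u in basis:
            coef = sum(u[t].conjugate() * v[t] for t in range(m))
            v = [v[t] - coef * u[t] for t in range(m)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        u = [x / norm for x in v]
        basis.append(u)
        # Subtract the projection of R on the new orthonormal vector.
        for c in range(k):
            coef = sum(u[t].conjugate() * r[t][c] for t in range(m))
            for t in range(m):
                r[t][c] -= coef * u[t]
        # Stop once the residue energy is at or below the threshold.
        if sum(abs(x) ** 2 for row in r for x in row) <= threshold:
            break
    return sorted(support)
```

For example, with a dictionary whose first three columns are the unit vectors and a residue built from columns 0 and 2, the sketch selects exactly those two indices and stops after two iterations.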
CTF – OMP (cont.)
Total Resource Utilization:
- Row by matrix multiplier: 144 18x18 complex multipliers
- 18-bit [...], [...] operations
- [...] operation: 12 18x18 complex multipliers, 12 18-bit adders
Approximate total hardware requirements:
- 18x18 complex multipliers: 156
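Complex multipliers are built from real 18x18 multipliers, so the count above translates into a larger number of physical multipliers. The sketch below shows the two usual options: the schoolbook form with 4 real multiplications and Gauss's form with 3 real multiplications and extra additions. The slides do not say which form the design uses, and the 3-multiplication form needs slightly wider operands for the sums.

```python
def complex_mul_4(a, b, c, d):
    """(a + ib)(c + id) using 4 real multiplications."""
    return a * c - b * d, a * d + b * c


def complex_mul_3(a, b, c, d):
    """(a + ib)(c + id) using 3 real multiplications (Gauss's trick)."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


COMPLEX_MULTIPLIERS = 156                    # total from the slide
real_with_4 = COMPLEX_MULTIPLIERS * 4        # schoolbook form
real_with_3 = COMPLEX_MULTIPLIERS * 3        # Gauss's form

print(real_with_4, real_with_3)
```

Either way the real-multiplier equivalent is above the 448 multipliers per FPGA quoted for the Expander module, which motivates the sharing and time-multiplexing ideas that follow.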
CTF
Possible improvements:
- Increasing the clock frequency to speed up support calculation
- Using fewer multipliers for the calculations at the cost of additional latency (pipelining)
- Sharing multipliers with the DSP pseudo-inverse block (the two never work simultaneously)
DSP & SCD
Description:
- Receives the support from CTF
- Calculates A†, the Moore-Penrose pseudo-inverse of A
- Reconstructs the original signal: $z[n] = A^\dagger y[n]$
- Detects a change in the support
DSP
Algorithm – Pseudo inverse & Reconstruction:
- Receive the support S from the CTF block
- Create $A_S$ from the columns of A that are in the support
- Decompose $A_S$ into an orthogonal matrix Q and an upper-triangular matrix R using QR decomposition (computed using Householder reflections)
- Invert R using the upper-triangular matrix inversion algorithm
- Calculate the pseudo-inverse by $A^\dagger = R^{-1} Q^T$ (the conjugate transpose $Q^H$ when $A_S$ is complex)
- Reconstruct z[n] by matrix multiplication: $z[n] = A^\dagger y[n]$
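The chain QR decomposition, triangular inversion and multiplication can be sketched as below. For brevity the sketch works on real-valued matrices in floating point (an assumption; for complex data the transposes become conjugate transposes), assumes $A_S$ has full column rank, and uses illustrative function names.

```python
import math


def householder_qr(a):
    """Thin QR of an m x n matrix (m >= n) via Householder reflections."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n):
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])     # avoids cancellation
        vv = sum(t * t for t in v)
        for j in range(n):                      # R <- H R
            f = 2.0 * sum(v[i] * r[k + i][j] for i in range(m - k)) / vv
            for i in range(m - k):
                r[k + i][j] -= f * v[i]
        for i in range(m):                      # Q <- Q H
            f = 2.0 * sum(q[i][k + t] * v[t] for t in range(m - k)) / vv
            for t in range(m - k):
                q[i][k + t] -= f * v[t]
    return [row[:n] for row in q], [r[i][:n] for i in range(n)]


def invert_upper(r):
    """Invert an upper-triangular matrix column by column (back substitution)."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, -1, -1):
            rhs = 1.0 if i == j else 0.0
            rhs -= sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = rhs / r[i][i]
    return inv


def pseudo_inverse(a_s):
    """A_S^+ = R^-1 Q^T for a full-column-rank real matrix."""
    q, r = householder_qr(a_s)
    r_inv = invert_upper(r)
    n, m = len(r), len(a_s)
    return [[sum(r_inv[i][k] * q[j][k] for k in range(n)) for j in range(m)]
            for i in range(n)]


def reconstruct(pinv, y):
    """z[n] = A^+ y[n] for one sample vector y."""
    return [sum(row[j] * y[j] for j in range(len(y))) for row in pinv]
```

Only the last function runs per sample; the decomposition and inversion are recomputed only when the support changes.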
SCD
Algorithm – Support Change Detection:
- Add an extra support to the matrix $A_S$
- After the pseudo-inverse, create a control vector from $A^\dagger$
- Multiply the control vector by 12 samples and sum up the result. If the energy level is high, a support change has occurred:
  - Instruct the CTF to calculate a new support
  - If the support has failed several times in Normal Operation Mode, instruct the CTF to switch to Iteration Mode
  - If the support has failed several times in Iteration Mode, indicate that there is a problem.
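The escalation logic can be pictured as a small state machine. The sketch below is one possible reading of the steps above: the class name, the action strings, the energy threshold, the failure limit and the choice to reset the failure counter on a mode switch are all assumptions, since the slides only say "high" and "several times".

```python
class SupportChangeDetector:
    """Tracks detection failures and decides what to ask of the CTF."""

    def __init__(self, control_vector, threshold, max_failures):
        self.control = control_vector      # taken from the pseudo-inverse
        self.threshold = threshold         # hypothetical energy threshold
        self.max_failures = max_failures   # hypothetical failure limit
        self.mode = "normal"
        self.failures = 0

    def step(self, samples):
        """Process one bundle of 12 samples and return the requested action."""
        acc = sum(c * s for c, s in zip(self.control, samples))
        if abs(acc) ** 2 <= self.threshold:
            self.failures = 0
            return "ok"
        # High energy on the control output: the support has changed.
        self.failures += 1
        if self.failures < self.max_failures:
            return "recalculate_support"
        self.failures = 0
        if self.mode == "normal":
            self.mode = "iteration"
            return "switch_to_iteration_mode"
        return "report_problem"
```

A driver would call `step` once per sample bundle and forward the returned action to the CTF block or to the controller.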
DSP & SCD (cont.)
Total Resource Utilization:
- QR decomposition: 51 18x18 complex multipliers
- Matrix pseudo-inverse: 20 18x18 complex multipliers
- Matrix multiplication: 24 18x18 complex multipliers
- Sample multiplication: 48 18x18 complex multipliers
DSP & SCD (cont.)
Possible Improvements:
- Increasing the clock frequency to speed up non-real-time calculations (pseudo-inverse, matrix multiplication)
- Using fewer multipliers for the calculations at the cost of additional latency (pipelining)
- Sharing multipliers with the CTF block (the two never work simultaneously)
- Examining other decompositions (SVD, LQ, Cholesky, etc.)
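A rough estimate of what sharing could save is sketched below, using the complex-multiplier counts quoted for the DSP & SCD blocks and for the CTF OMP block. Which DSP sub-blocks can actually be shared is an assumption here: the sketch treats the three non-real-time ones as shareable and keeps the per-sample multiplication separate.

```python
CTF_OMP = 156                                  # complex multipliers, CTF OMP slide

dsp = {
    "qr_decomposition": 51,
    "pseudo_inverse": 20,
    "matrix_multiplication": 24,
    "sample_multiplication": 48,
}

# Assumption: only the non-real-time blocks can share with the CTF.
shareable = ["qr_decomposition", "pseudo_inverse", "matrix_multiplication"]

dsp_total = sum(dsp.values())
dsp_shareable = sum(dsp[name] for name in shareable)
dsp_fixed = dsp_total - dsp_shareable

without_sharing = CTF_OMP + dsp_total
with_sharing = max(CTF_OMP, dsp_shareable) + dsp_fixed

print(without_sharing, with_sharing, without_sharing - with_sharing)
```

Under these assumptions the shared pool is sized by the larger of the two users, so the saving equals the size of the smaller one.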
Memory
Description:
- Memory block designed as a FIFO to store sampled channels
- Designed to delay the input long enough to calculate a new support and a new A†
Possible Improvements:
- If there is a shortage of on-chip memory, using an external DDR memory chip can be considered
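The delaying behaviour of such a FIFO can be modelled in a few lines. The class name, the initial fill value and the depth are illustrative; the required depth depends on how long the support and A† calculations take, which the slides do not quantify.

```python
from collections import deque


class DelayFifo:
    """Fixed-depth FIFO: every pushed sample comes back out `depth` pushes later."""

    def __init__(self, depth, fill=0):
        # depth must be at least 1; the buffer starts filled with `fill`.
        self._buf = deque([fill] * depth, maxlen=depth)

    def push(self, sample):
        """Store a new sample and return the one that falls out."""
        oldest = self._buf[0]
        self._buf.append(sample)    # maxlen silently drops the oldest entry
        return oldest


fifo = DelayFifo(depth=3)
delayed = [fifo.push(s) for s in [1, 2, 3, 4, 5]]
print(delayed)
```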
Debug Modules
Description:
- Designed to debug each block of the design separately
- Consists of a signal generator for the input of the block, and a FIFO memory to hold the output
Possible Improvements:
- If these modules are expensive in hardware, two firmware versions can be prepared: a compact version without the debug modules, and a complete one with them
Gantt Chart
Thank You!