By: Daniel Barsky, Natalie Pistunovich Supervisors: Rolf Hilgendorf, Ina Rivkin Characterization Sub Nyquist Implementation Optimization 11/04/2010.


Agenda

Project Overview
Hardware Features
Project Goals & Agenda
Design Overview – End to End
Expander Module
CTF Module
DSP & SCD Modules
Memory Module
Debug Module
Gantt Chart

Project Overview

This project is part of the Sub-Nyquist Sampling & Reconstruction card.

The design is to be implemented on a card consisting of 4 Altera Stratix-III FPGAs, as well as a set of DDR memories.

The currently suggested implementation requires significant resources, and is implemented on 3 FPGAs.

The design consists of 5 separate blocks, designed by 8 groups.

Data is represented in 18-bit fixed point, with 16 fraction bits.
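As a rough illustration of the data format above, the sketch below quantizes a real value to an 18-bit word with 16 fraction bits. The slides do not say whether the word is signed, how rounding is done, or how overflow is handled; signed two's complement, round-to-nearest and saturation are assumptions made here, and the names `to_fixed` and `to_float` are hypothetical.

```python
def to_fixed(x, total_bits=18, frac_bits=16):
    """Quantize x to a signed fixed-point integer code (assumed format)."""
    scale = 1 << frac_bits                 # 2**16 steps per unit
    lo = -(1 << (total_bits - 1))          # most negative code
    hi = (1 << (total_bits - 1)) - 1       # most positive code
    q = int(round(x * scale))              # round to the nearest step
    return max(lo, min(hi, q))             # saturate instead of wrapping


def to_float(q, frac_bits=16):
    """Convert an integer code back to its real value."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    for value in (0.5, 0.1, -1.25, 5.0):
        code = to_fixed(value)
        print(value, code, to_float(code))
```

Under these assumptions the representable range is [-2, 2) with a step of 2^-16, so values outside that range saturate.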

Hardware Features

Hardware Features (cont.)

Project Goals & Agenda

Reducing the design to 2 FPGAs, at the expense of latency:
Studying each group's activity – algorithms, implementations, resources utilized, etc.
Pointing out possible efficiency improvements:
  Resources that can be reused
  Implementations that exceed requirements
  Hardware idleness
Implementing improvements
Ultimately, suggesting the optimal architecture to be implemented in an ASIC

Design Overview – End to End

[Block diagram. Blocks: Analog System + A/D; Expand Sequences 4:12 (Morad, Amir); Memory Architecture; CTF; Support Change Detector (Omer, Daniel); DSP (Omer, Daniel); Analog Back-End; Controller. Signal labels: Samples Bundle, Support, A†. Other names on the diagram: Eli, Tzvika; Yoni.]

Expander Module

Description:

In Normal Operation Mode:
Receives 4 channels at 60 MHz, expands each to 3 slices of 20 MHz (a total of 12 channels) and sends them to the Memory block (for later reconstruction) as well as to the CTF & Support Change Detector.

In Iteration Mode:
Creates 10 slices of 2 MHz out of each 20 MHz slice, and sends them to the CTF block for support calculation, in iterations, with a different slice each cycle. A total of 12 slices per iteration; 10 iterations are required.

Expander Module

Algorithm:
Modulate (if needed) – multiply by Sine/Cosine coefficients
LPF – using a FIR polyphase Kaiser filter, 240 taps
FIR filters are used for added stability and linear phase
Polyphase filters are used for efficient filtering and decimation using minimal resources (multipliers)
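The saving from the polyphase structure can be sketched as follows. A 240-tap filter that decimates by 3 is split into 3 sub-filters of 80 taps, each running at the output rate, and the result matches filtering at the input rate and then discarding two of every three outputs. The cutoff, the Kaiser `beta` and the function names are illustrative assumptions, not values taken from the design.

```python
import numpy as np


def kaiser_lowpass(num_taps, cutoff, beta):
    """Windowed-sinc low-pass; cutoff is a fraction of the Nyquist frequency."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = cutoff * np.sinc(cutoff * n) * np.kaiser(num_taps, beta)
    return h / h.sum()                      # normalize to unity DC gain


def polyphase_decimate(x, h, m):
    """Filter x with h and keep every m-th output, using m sub-filters."""
    n_out = (len(x) + m - 1) // m
    y = np.zeros(n_out)
    for p in range(m):
        h_p = h[p::m]                       # p-th polyphase branch of the filter
        if p == 0:
            x_p = x[0::m]
        else:
            # x_p[n] = x[n*m - p], with zero for the sample before the start
            x_p = np.concatenate(([0.0], x[m - p::m]))
        y += np.convolve(x_p, h_p)[:n_out]  # branch runs at the low rate
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(600)
    h = kaiser_lowpass(240, 1.0 / 3.0, 7.0)   # assumed cutoff and beta
    direct = np.convolve(x, h)[:len(x)][::3]  # filter, then downsample
    print(np.allclose(polyphase_decimate(x, h, 3), direct))
```

Each branch only ever multiplies samples that contribute to a kept output, which is why the multiplier count scales with taps divided by the decimation factor.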

Expander Module (cont.)

Total Resource Utilization:
4·8 = 32 18x18 multipliers at the modulators
4·3·240/3 = 960 18x18 multipliers at the 60 MHz → 20 MHz filters
4·3·2·400/10 = 960 18x18 multipliers at the 20 MHz → 2 MHz filters
Total: 960 + 960 + 32 = 1952 multipliers
There are 448 multipliers per FPGA!
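The arithmetic above can be checked in a few lines. The factors are copied from the slide as given; the last step assumes every multiplier is instantiated in parallel with no time sharing, which is the situation the proposed optimizations aim to avoid. The function name is hypothetical.

```python
import math


def expander_multiplier_budget(per_fpga=448):
    """Recompute the Expander multiplier totals quoted on the slide."""
    modulators = 4 * 8                    # 32
    stage_60_to_20 = 4 * 3 * 240 // 3     # 960
    stage_20_to_2 = 4 * 3 * 2 * 400 // 10 # 960
    total = modulators + stage_60_to_20 + stage_20_to_2
    # FPGAs needed for multipliers alone, assuming no sharing at all
    fpgas = math.ceil(total / per_fpga)
    return total, fpgas


if __name__ == "__main__":
    print(expander_multiplier_budget())
```

With the slide's numbers the Expander alone would need the multipliers of more than four devices, which is what motivates reusing filters at a higher clock rate.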

Expander Module (cont.)

Possible improvements:
Reducing the number of filter taps by widening the transition band: 0.044π
Reducing the stop band ripple: -70dB
Operating at a higher clock frequency: each channel can be sampled several times, and thus the same filter can be reused for several parallel channels
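To get a feel for how the transition band and stop band figures affect the tap count, Kaiser's empirical length estimate can be evaluated. This sketch assumes that 0.044π is the transition width in radians per sample and that 70 dB is the required stop band attenuation; the slide does not state either interpretation explicitly, and `kaiser_taps` is a hypothetical helper.

```python
import math


def kaiser_taps(atten_db, transition_width_rad):
    """Kaiser's empirical estimate of the FIR length for a given spec."""
    order = (atten_db - 7.95) / (2.285 * transition_width_rad)
    return math.ceil(order + 1)


if __name__ == "__main__":
    print(kaiser_taps(70.0, 0.044 * math.pi))
```

Under these assumptions the estimate comes out at roughly 200 taps, somewhat below the 240 taps of the current filter.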

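CTF – Q-Frame

Description:
Calculates the Q frame out of y:

$$Q = \sum_{n=0}^{N_f} y[n]\, y^H[n]$$

Multiplies $Y$ by $Y^H$, where

$$Y = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 \end{bmatrix}^T, \qquad y_i = \begin{bmatrix} a_i + i\,b_i & c_i & a_i - i\,b_i \end{bmatrix}^T$$

(the slide labels the three entries of $y_i$ as the right, center and left slices).

Q is Hermitian:

$$Q_{i,j} = Q_{j,i}^H$$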

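CTF – Q-Frame

Algorithm:
For non-diagonal elements $Q_{i,j} = y_i y_j^H$:

Calculates 9 required products:

$$a_i a_j,\; a_i b_j,\; b_i a_j,\; b_i b_j,\; c_i a_j,\; c_i b_j,\; a_i c_j,\; b_i c_j,\; c_i c_j$$

Calculates elements of Q using the above products:

$$Q_{ij} = \begin{bmatrix} a_i + i\,b_i \\ c_i \\ a_i - i\,b_i \end{bmatrix} \begin{bmatrix} a_j - i\,b_j & c_j & a_j + i\,b_j \end{bmatrix} = \begin{bmatrix} (a_i + i\,b_i)(a_j - i\,b_j) & c_j (a_i + i\,b_i) & (a_i + i\,b_i)(a_j + i\,b_j) \\ c_i (a_j - i\,b_j) & c_i c_j & c_i (a_j + i\,b_j) \\ (a_i - i\,b_i)(a_j - i\,b_j) & c_j (a_i - i\,b_i) & (a_i - i\,b_i)(a_j + i\,b_j) \end{bmatrix}$$

For diagonal elements $Q_{i,i} = y_i y_i^H = |y_i|^2$, using 6 products:

$$Q_{ii} = \begin{bmatrix} a_i + i\,b_i \\ c_i \\ a_i - i\,b_i \end{bmatrix} \begin{bmatrix} a_i - i\,b_i & c_i & a_i + i\,b_i \end{bmatrix} = \begin{bmatrix} a_i^2 + b_i^2 & a_i c_i + i\,b_i c_i & a_i^2 - b_i^2 + 2i\,a_i b_i \\ a_i c_i - i\,b_i c_i & c_i^2 & a_i c_i + i\,b_i c_i \\ a_i^2 - b_i^2 - 2i\,a_i b_i & a_i c_i - i\,b_i c_i & a_i^2 + b_i^2 \end{bmatrix}$$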
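The nine-product scheme for an off-diagonal block can be checked numerically. The sketch below assumes each channel vector has the form [a + ib, c, a - ib] with real a, b and c, and builds the 3x3 block from the nine real products only; the function name `q_block` and the sign convention for the first and last entries are assumptions.

```python
import numpy as np


def q_block(yi, yj):
    """Build the 3x3 block y_i y_j^H from nine real products.

    yi and yj are (a, b, c) triples describing [a + ib, c, a - ib].
    """
    a_i, b_i, c_i = yi
    a_j, b_j, c_j = yj
    # The nine real multiplications
    aa, ab, ba, bb = a_i * a_j, a_i * b_j, b_i * a_j, b_i * b_j
    ca, cb, ac, bc, cc = c_i * a_j, c_i * b_j, a_i * c_j, b_i * c_j, c_i * c_j
    # Everything below is additions and subtractions of those products
    return np.array([
        [(aa + bb) + 1j * (ba - ab), ac + 1j * bc, (aa - bb) + 1j * (ab + ba)],
        [ca - 1j * cb,               cc,           ca + 1j * cb],
        [(aa - bb) - 1j * (ab + ba), ac - 1j * bc, (aa + bb) + 1j * (ab - ba)],
    ])


if __name__ == "__main__":
    yi, yj = (0.3, -0.7, 1.1), (-0.2, 0.5, 0.9)
    vi = np.array([yi[0] + 1j * yi[1], yi[2], yi[0] - 1j * yi[1]])
    vj = np.array([yj[0] + 1j * yj[1], yj[2], yj[0] - 1j * yj[1]])
    print(np.allclose(q_block(yi, yj), np.outer(vi, np.conj(vj))))
```

Calling `q_block(yi, yi)` reproduces the diagonal case, where the nine products collapse to six distinct ones because several of them coincide.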

CTF – Q-Frame (cont.)

Total Resource Utilization:
Total multiplier requirements:
42 basic multipliers
36 two-multiply-adders

[Figures: Basic Multiplier; Two-Multiply Adder]

CTF – OMP

Description:
Receives the Q frame:

$$Q = \sum_{n=0}^{N_f} y[n]\, y^H[n]$$

Calculates $U$ using Orthogonal Matching Pursuit (OMP)
Gets the support from $U$

CTF – OMP (cont.)

Algorithm:
A residue matrix R is loaded with Q
U is calculated using iterations as follows:
  The matrix A is projected on the residue matrix R: $Z = A^H R$
  The energy of each row in the projection is calculated: $\|Z_i\|_2$
  The row $A_i$ with the max projection energy is added to the support
  An orthogonal vector is constructed from $A_i$ using the Gram-Schmidt process
  The projection of R on the orthogonal vector is subtracted from R
  The energy of the residue matrix R is calculated: $\|R\|_2$
  If the energy of the residue is greater than a predefined threshold, continue to the next iteration
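The iteration above can be sketched in NumPy as follows. This is a floating-point illustration of the listed steps, not the hardware implementation; the function name `omp_support`, the `max_iter` guard and the choice to index support entries by column of A (which correspond to rows of A^H R) are assumptions.

```python
import numpy as np


def omp_support(A, Q, threshold, max_iter):
    """Estimate the support by the residue iteration described above."""
    R = Q.astype(complex)                     # residue starts as Q
    support, basis = [], []
    for _ in range(max_iter):
        Z = A.conj().T @ R                    # project A on the residue
        energy = np.linalg.norm(Z, axis=1)    # energy of each row of Z
        if support:
            energy[support] = 0.0             # never pick an entry twice
        i = int(np.argmax(energy))
        support.append(i)
        q = A[:, i].astype(complex)
        for b in basis:                       # Gram-Schmidt step
            q = q - b * (b.conj() @ q)
        q = q / np.linalg.norm(q)
        basis.append(q)
        R = R - np.outer(q, q.conj() @ R)     # remove projection from residue
        if np.linalg.norm(R) <= threshold:    # residue energy small: stop
            break
    return sorted(support)


if __name__ == "__main__":
    A = np.fft.fft(np.eye(8)) / np.sqrt(8)    # toy matrix with orthonormal columns
    U = np.zeros((8, 3), dtype=complex)
    U[2] = [1.0, 2.0, -1.0]
    U[5] = [0.5, -1.0, 3.0]
    print(omp_support(A, A @ U, threshold=1e-9, max_iter=8))
```

The toy matrix is chosen only so the outcome is easy to reason about; the real sensing matrix A is defined elsewhere in the project.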

CTF – OMP (cont.)

Total Resource Utilization:
Row by matrix multiplier: 144 18x18 complex multipliers
18-bit [...], [...] operations
[...] operation: 12 18x18 complex multipliers, 12 18-bit adders
Total hardware requirements approximation:
18x18 complex multipliers: 156

CTF

Possible improvements:
Increasing clock frequency to speed up support calculation
Using fewer multipliers for the calculations at the cost of additional latency (pipelining)
Sharing multipliers with the DSP pseudo-inverse block (the two never work simultaneously)

DSP & SCD

Description:
Receives the support from CTF
Calculates A†, the Moore-Penrose pseudo-inverse of A
Reconstructs the original signal: $z[n] = A^{\dagger} y[n]$
Detects a change in the support

DSP

Algorithm – Pseudo inverse & Reconstruction:
Receive the support S from the CTF block
Create $A_S$ from the columns of A that are in the support
Decompose $A_S$ into an orthogonal matrix Q and an upper-triangular matrix R using QR decomposition (computed using Householder reflections)
Invert R using the upper-triangular matrix inversion algorithm
Calculate the pseudo inverse by $A^{\dagger} = R^{-1} Q^T$
Reconstruct z[n] by matrix multiplication: $z[n] = A^{\dagger} y[n]$
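The pseudo-inverse steps above can be sketched with NumPy. `np.linalg.qr` stands in for the QR stage, R is inverted by back substitution, and the result is compared with `np.linalg.pinv`. The sketch uses the conjugate transpose of Q, which equals the transpose written above when the matrices are real; the function names and the test matrix are assumptions.

```python
import numpy as np


def invert_upper_triangular(R):
    """Invert an upper-triangular matrix column by column (back substitution)."""
    n = R.shape[0]
    X = np.zeros_like(R)
    for j in range(n):                       # solve R x = e_j
        for i in range(j, -1, -1):
            rhs = (1.0 if i == j else 0.0) - R[i, i + 1:j + 1] @ X[i + 1:j + 1, j]
            X[i, j] = rhs / R[i, i]
    return X


def pinv_via_qr(A, support):
    """Pseudo-inverse of the support columns of A via QR."""
    A_S = A[:, support]                      # keep only the support columns
    Q, R = np.linalg.qr(A_S)                 # reduced QR: A_S = Q R
    return invert_upper_triangular(R) @ Q.conj().T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((12, 20)) + 1j * rng.standard_normal((12, 20))
    support = [1, 4, 7]
    P = pinv_via_qr(A, support)
    z = np.array([1.0, -2.0, 0.5 + 1j])
    y = A[:, support] @ z
    print(np.allclose(P @ y, z))             # reconstruction z = A_dagger y
```

This only works when the support columns are linearly independent, so that R has a nonzero diagonal.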

SCD

Algorithm – Support Change Detection:
Add an extra support to the matrix $A_S$
After the pseudo inverse, create a control vector from A†
Multiply the control vector by 12 samples and sum up the result. If the energy level is high, a support change has occurred:
  Instruct the CTF to calculate a new support
  If the support has failed several times in Normal Operation Mode, instruct the CTF to switch to Iteration Mode
  If the support has failed several times in Iteration Mode, indicate that there is a problem
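The escalation logic above can be sketched as a small state machine. The slides do not give the energy threshold, the number of failures that counts as "several", or what resets the failure count; `energy_threshold`, `max_failures`, the reset on a quiet check and all class and method names are hypothetical.

```python
import numpy as np
from enum import Enum


class Mode(Enum):
    NORMAL = 1
    ITERATION = 2
    FAULT = 3


class SupportChangeDetector:
    def __init__(self, energy_threshold, max_failures):
        self.energy_threshold = energy_threshold   # assumed parameter
        self.max_failures = max_failures           # assumed meaning of "several"
        self.mode = Mode.NORMAL
        self.failures = 0

    def step(self, control_vector, samples):
        """Return True when the CTF should calculate a new support."""
        if self.mode is Mode.FAULT:
            return False
        # Multiply the control vector by the 12 samples and sum the result
        energy = abs(np.dot(control_vector, samples)) ** 2
        if energy <= self.energy_threshold:
            self.failures = 0                      # assumed: a quiet check resets
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            # Escalate: Normal -> Iteration -> problem indication
            self.mode = Mode.ITERATION if self.mode is Mode.NORMAL else Mode.FAULT
        return self.mode is not Mode.FAULT


if __name__ == "__main__":
    det = SupportChangeDetector(energy_threshold=1.0, max_failures=2)
    control, loud = np.ones(12), np.ones(12)
    for _ in range(4):
        print(det.step(control, loud), det.mode)
```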

DSP & SCD (cont.)

Total Resource Utilization:
QR decomposition – 51 18x18 complex multipliers
Matrix Pseudo-Inverse – 20 18x18 complex multipliers
Matrix Multiplication – 24 18x18 complex multipliers
Sample Multiplication – 48 18x18 complex multipliers

DSP & SCD (cont.)

Possible Improvements:
Increasing clock frequency to speed up non-realtime calculations (pseudo-inverse, matrix multiplication)
Using fewer multipliers for the calculations at the cost of additional latency (pipelining)
Sharing multipliers with the CTF block (the two never work simultaneously)
Examining other decompositions (SVD, LQ, Cholesky, etc.)

Memory

Description:
Memory block designed as a FIFO to store sampled channels
Designed to delay the input long enough to calculate a new support and a new A†

Possible Improvements:
If there is a shortage of on-chip memory, using an external DDR memory chip can be considered

Debug Modules

Description:
Designed to debug each block of the design separately
Consists of a signal generator for the input of the block, and a FIFO memory to hold the output

Possible Improvements:
If these modules are expensive in hardware, two firmware versions can be prepared: a compact version without the debug modules, and a complete one with them

Gantt Chart

Thank You!
