
Aug 26, 2020


FPGAs: Enabling the Software/Reconfigurable Radio

Dr Chris Dick, DSP Chief Architect

Agenda

• Device Technology
• Software
• Design Methodologies
• Example


Why FPGA DSP?

• Flexibility
• High performance
• Time to market
• Functional extensions to existing equipment
• Standard part (no NRE/inventory issues)
• Early system bring-up on hardware

The Impact of Moore’s Law

[Figure: transistor count (log scale, up to 100M) versus year, 1960 to 2000. Labeled points:]
• RCA, 1962: first MOSFET, 1 transistor
• Intel, 1972: first 8-bit µP (8080), 4,500 transistors
• HP, 1981: 32-bit µP, 450,000 transistors
• Intel Pentium II, 1995: 32-bit µP, 7.4M transistors
• Xilinx/UMC Group, 1999: Virtex-1000, 75M transistors


Process Drives Density & Performance

[Figure: line width (nm, 0 to 900) versus year, 1990 to 2010, with three curves: SIA 1994, SIA 1997, and Xilinx. Source: SIA ‘94, SIA ‘97, Xilinx]

Virtex-II Platform FPGA

[Figure: CLB, IOB, and DCM blocks, each paired with a switch matrix, with the callouts listed below.]

Active Interconnect™
• Fully buffered
• Fast, predictable

Multipliers
• 18b x 18b multiplier
• 200 MHz pipelined

Block RAM (BRAM)
• 18 Kbit true dual port
• Up to 3.5 Mbits / device

Powerful CLB
• Four slices (S0 to S3) attached to a switch matrix
• 8 LUTs
• 128b distributed RAM
• Wide input functions (32:1)


A Decade of Progress

[Figure: relative capacity, speed, and price (log scale, 1x to 1000x) versus year, 1/91 to 1/01, across the XC4000, Spartan, Virtex & Virtex-E (excl. Block RAM), and Virtex-II (excl. Block RAM) families.]

Virtex®-II Family

Part Number   Logic Cells   BRAM (Kb)   Multipliers   DCM Units
XC2V40                576          72             4           4
XC2V80              1,152         144             8           4
XC2V250             3,456         432            24           8
XC2V500             6,912         576            32           8
XC2V1000           11,520         720            40           8
XC2V1500           17,280         864            48           8
XC2V2000           24,192       1,008            68           8
XC2V3000           32,256       1,728            96          12
XC2V4000           51,840       2,160           120          12
XC2V6000           76,032       2,592           144          12
XC2V8000          104,832       3,024           168          12

Packages and the values listed for each:

CS144    88, 92, 92
FG256    88, 120, 172, 172, 172
FG456    200, 264, 324
FG676    392, 456, 484
FF896    432, 528, 624
FF1152   720, 824, 824, 824
FF1517   912, 1104, 1108
BG575    328, 392, 408
BG728    456, 516
BF957    624, 684, 684, 684, 684

11 Devices, 10 Packages, 37 combinations


FPGA to ASIC Crossover Improves with Process

[Figure: cumulative NRE + unit cost versus cumulative volume (K units), with cost lines for ASIC .25µ, ASIC .15µ, FPGA .25µ, and FPGA .15µ.]

• ASIC costs start higher, but slope is flatter
• For each technology advance, crossover volume moves higher
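The crossover argument can be sketched with a linear cost model, where cumulative cost is the one-time NRE plus unit cost times volume. The dollar figures below are hypothetical placeholders chosen only to mimic the shape of the curves; they are not taken from the slides.

```python
def cumulative_cost(nre, unit_cost, volume):
    """Cumulative cost of shipping `volume` units: one-time NRE plus per-unit cost."""
    return nre + unit_cost * volume


def crossover_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume at which the ASIC and FPGA cumulative-cost lines intersect.

    Returns None when the lines do not cross at a positive volume.
    """
    if fpga_unit <= asic_unit or asic_nre <= fpga_nre:
        return None
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical numbers: the newer process raises ASIC NRE and lowers both unit costs.
older = crossover_volume(asic_nre=300_000, asic_unit=10.0, fpga_nre=0, fpga_unit=40.0)
newer = crossover_volume(asic_nre=900_000, asic_unit=6.0, fpga_nre=0, fpga_unit=24.0)

print(f"older process crossover: {older:,.0f} units")
print(f"newer process crossover: {newer:,.0f} units")
```

With these assumed inputs the crossover moves to a higher volume on the newer process, which is the trend the slide describes. Real NRE and unit prices would have to come from vendor quotes.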

Problem Today: Integrator’s Dilemma

[Figure: integration versus applications space. ASIC SoC: highly specialized point solution.]


New Era of Platform FPGAs

[Figure: integration versus applications space. Platform FPGA: broad range of applications, shown alongside the ASIC SoC.]

FPGA Customized Datapaths

• Design tradeoffs and optimization in real (design)-time

[Figure: surface plot of FPGA area versus signal processing complexity and performance, with both axes marked "increase".]


Example: FIR Filter

• Use optimum precisions at each node in the computation graph
• ‘Right-size’ the datapath
• Design surface for a FIR filter: area vs sample rate vs length

[Figure: surface plot of log10(slices) versus sample rate (MSPS) and filter length (taps), for 14-bit coefficients.]

Adding Parallelism in Conventional DSP Solutions

• New DSP architectures such as VLIW and super-scalar have one goal: provide higher degrees of parallelism
• Architecture evolution along the same design axis is not scalable
  – Too many MAC functional units make programming, compilers and scheduling an issue
• The effective computing per chip area decreases
  – Memories grow geometrically while the datapath does not


The Power of Parallelism

• In FPGAs we can exploit the large amounts of parallelism inherent in many DSP data paths

[Figure: parallel function blocks: 4 x DDC, Rake Farm, 3 x Viterbi, 3 x Turbo Decoder, Multi-user Detect, Adaptive Interference Canceller, Beam Former, Space-Time Coding.]

FPGAs = Performance (1)

Virtex-II XC2V3000-5 with 14,336 slices

• 12 concurrently operating 64-tap filters
• 8-bit MACs: 8-bit data, 8-bit coefficients†
• Sample rate (fs) = 154 MHz
• 13,704 slices (95% of device)
• 118 billion MACs/s
• I/O bandwidth = 237 Gbytes/s

† Optimized for coefficient set
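The headline figures on this slide can be cross-checked with a few lines of arithmetic. The filter count, tap count, sample rate, and slice counts come from the slide; the reading of "I/O bandwidth" as two operand bytes per MAC is an assumption that happens to reproduce the quoted number.

```python
filters = 12            # concurrently operating filters (from the slide)
taps = 64               # taps per filter (from the slide)
fs_hz = 154e6           # sample rate of each filter (from the slide)

# A fully parallel FIR performs one MAC per tap per input sample.
macs_per_second = filters * taps * fs_hz

# Assumption: each MAC consumes one 8-bit data byte and one 8-bit coefficient byte.
bytes_per_mac = 2
operand_bandwidth = macs_per_second * bytes_per_mac

slices_used, slices_total = 13_704, 14_336
utilization = slices_used / slices_total

print(f"{macs_per_second / 1e9:.1f} GMAC/s")
print(f"{operand_bandwidth / 1e9:.1f} Gbytes/s")
print(f"{utilization:.1%} of slices")
```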


FPGAs = Performance (2)

• 1024-point complex FFT
  – 9 microsecond execution time (@fclk = 115 MHz)
  – 2,500† logic slices
• Viterbi decoder at OC3 data rates: 155 Mbps
• Interleaver/de-interleaver @fclk > 200 MHz
• RS decoding @10 Gbps
  – 16 parallel RS decoders in a single XC2V3000-4
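A rough sanity check on two of these figures, using only the numbers quoted above. The interpretation of the FFT result (about one clock per point, which would be consistent with a streaming architecture) is an inference, not something the slide states.

```python
fft_points = 1024
fft_time_s = 9e-6        # execution time from the slide
fclk_hz = 115e6          # clock rate from the slide

fft_cycles = fft_time_s * fclk_hz
cycles_per_point = fft_cycles / fft_points

rs_aggregate_bps = 10e9  # aggregate Reed-Solomon throughput from the slide
rs_decoders = 16         # parallel decoders from the slide
rs_per_decoder_bps = rs_aggregate_bps / rs_decoders

print(f"FFT: about {fft_cycles:.0f} clocks, {cycles_per_point:.2f} clocks per point")
print(f"RS: {rs_per_decoder_bps / 1e6:.0f} Mbps per decoder")
```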

SDR System Diagram

[Figure: receive chains RF/IF RX1 to RX3, each with a wideband A/D (WB-A/D), feed an RX smart antenna block; a TX smart antenna block drives wideband D/As (WB-D/A) and RF/IF TX1 to TX3. The digital signal processing engine (programmable DSP fabric) sits between them and connects to a network, e.g. PSTN, over wire/fiber.]

Radio Personality Database
• Defined by software for DSP µP or bit-stream for FPGA
• Air interface, digital filters, RF access mode, synchronization, equalization, FEC, network interface


Wideband BTS - Receiver

[Figure: receiver block diagram. Analog front end: LNA, AGC, LO, ADC. Four parallel channels follow, each consisting of mixer/channel selection, filter + decimation, filter + decimation, and a programmable filter.]

• FPGA front-end signal processor: channel selection, rate adjust, matched filter, DDS
• FPGA configurable signal processor: rake processor (search, track), adaptive rake, demodulator, FEC (Turbo, Viterbi), MUD, ICU, beam forming
• RISC micro.: protocol and control, applications/applets
• Control signals: channel control, digital GC; sample rate selection, filter coefficients; digitally controlled analog loop (Σ∆ based)

Building the System

• Device technology is part of the solution
• The software/IP is getting harder than the hardware
• Design methodologies for
  – Productivity
  – Rapid design exploration
  – Hardware abstraction
  – Single source for all aspects of the design & development cycle
    • Verification
    • Implementation


SDR Personality: QAM Receiver

[Figure: QAM receiver block diagram. Blocks: RECEIVER INPUT, ANALOG FE, AGC, BPF, PWR EST, OSC 1, TIMING RECOVERY, FIR EQUALIZER, EQUAL ADJUST, DFE, CARRIER TRACKING, OSC 2, HARD DEC, HARD DEC W/FEC, FORWARD ERROR CORRECTION, CONTROL BUS.]

Example will consider implementation of the equalizer and carrier loop

The Design Space is Rich

• Decision directed T/2 adaptive equalizer, LMS based update
• Using FPGAs there are multiple architectural choices available to meet a desired area/performance objective
• Fully parallel
  – N MAC processing elements (PEs)
  – N LMS PEs
• Folded architecture
  – 1 MAC PE & 1 LMS PE for each polyphase segment
• … Many others
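As a behavioral reference for the equalizer named above, here is a minimal floating-point sketch of a decision-directed, T/2-spaced LMS equalizer. It models the algorithm only, not the FPGA architecture. The channel taps, step size `mu`, centre-spike start, and noise-free setup are all assumptions made for illustration. The update used is the standard complex LMS rule, w <- w + mu * e * conj(x), applied once per symbol.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)  # per-axis 16-QAM amplitudes


def slice_16qam(y):
    """Hard decision: nearest point of the 16-QAM grid."""
    re = min(LEVELS, key=lambda a: abs(a - y.real))
    im = min(LEVELS, key=lambda a: abs(a - y.imag))
    return complex(re, im)


def run_fse(num_symbols=3000, num_taps=8, mu=0.002, seed=1):
    """Run a decision-directed LMS T/2 equalizer.

    Returns (list of squared errors, number of wrong decisions).
    """
    rng = random.Random(seed)
    symbols = [complex(rng.choice(LEVELS), rng.choice(LEVELS))
               for _ in range(num_symbols)]

    # Two samples per symbol (zero stuffing), then a hypothetical T/2-spaced channel.
    upsampled = [v for s in symbols for v in (s, 0j)]
    channel = (0.2, 1.0, 0.3, -0.1)  # assumed mild ISI, not taken from the slides
    rx = [sum(c * upsampled[n - j] for j, c in enumerate(channel) if n - j >= 0)
          for n in range(len(upsampled))]

    weights = [0j] * num_taps
    weights[2] = 1 + 0j  # centre-spike start: output initially tracks symbol k-1

    squared_errors, wrong = [], 0
    for k in range(num_taps // 2 - 1, num_symbols):
        window = [rx[2 * k + 1 - i] for i in range(num_taps)]  # newest sample first
        y = sum(w * x for w, x in zip(weights, window))        # one output per symbol
        decision = slice_16qam(y)
        err = decision - y
        # LMS coefficient update at the symbol rate.
        weights = [w + mu * err * x.conjugate() for w, x in zip(weights, window)]
        squared_errors.append(abs(err) ** 2)
        wrong += decision != symbols[k - 1]
    return squared_errors, wrong


sq_err, wrong = run_fse()
print("mean squared error, first 200 symbols:", sum(sq_err[:200]) / 200)
print("mean squared error, last 200 symbols: ", sum(sq_err[-200:]) / 200)
print("wrong decisions:", wrong)
```

The fully parallel and folded architectures on the slide compute this same recursion; they differ in how many multiply-accumulate units are instantiated and how many clock cycles each symbol takes.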


Equalized Receiver Example

Transmitter Model
- 16-QAM source
- Matched filter
- Sample rate change

Passband Adaptive Equalizer
- Fractionally spaced (T/2)
- Polyphase decimator structure
- LMS coefficient update
- Coefficients updated at the symbol rate

Carrier Recovery
- CORDIC based PD

Instrumentation

System Generator Simulation

[Figure: simulation scope panels: TX constellation; transition diagram; Rx signal with ISI & Doppler; equalized constellation; equalized, no carrier lock.]


Implementation

• Parallel T/2 FSE
• Polyphase decomposition
• 8 taps total
  – 4 taps in each polyphase segment
• 8 LMS PEs
• Coefficients updated at the symbol rate

Pipelined Parallel T/2 DD FSE

• Design components are based on a library of highly optimized module generators

[Figure: equalizer datapath with an input sample commutator; the polyphase filter produces samples at the symbol rate.]
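The polyphase decomposition can be illustrated with a small check: filtering at the T/2 rate and keeping every second output gives the same result as two 4-tap branches fed by an input commutator and run at the symbol rate. The tap and sample values below are arbitrary integers chosen so the comparison is exact; the function names are illustrative.

```python
def fir_then_decimate(x, h):
    """Reference: full-rate FIR, then keep every second output (odd indices)."""
    y = [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
         for n in range(len(x))]
    return y[1::2]


def polyphase_fse(x, h):
    """Same outputs, computed at the symbol rate with two polyphase branches."""
    h_even, h_odd = h[0::2], h[1::2]   # two segments of len(h) / 2 taps each
    x_even, x_odd = x[0::2], x[1::2]   # input sample commutator
    out = []
    for k in range(len(x_odd)):
        acc = 0
        for m in range(len(h_even)):
            if k - m >= 0:
                acc += h_even[m] * x_odd[k - m]   # h[2m]   * x[2k + 1 - 2m]
                acc += h_odd[m] * x_even[k - m]   # h[2m+1] * x[2k - 2m]
        out.append(acc)
    return out


taps = [1, -2, 3, 4, -5, 6, 7, -8]               # 8 taps total, 4 per segment
samples = [3, 1, -4, 1, 5, -9, 2, 6, -5, 3, 5, 8, -9, 7, 9, 3]

reference = fir_then_decimate(samples, taps)
polyphase = polyphase_fse(samples, taps)
print(reference == polyphase)
```

The polyphase form never computes the discarded outputs, so each branch only has to run at the symbol rate.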


Pipelined Parallel T/2 FSE

• One polyphase segment
  – 4 FIR PEs & 4 LMS PEs

† software version 4.1.03i, speedfile version 1.93, par -rl 5 -pl 5 -xe 2

Pipelined Parallel T/2 FSE

• Design statistics for 8 tap equalizer
  – 2674 logic slices
  – 66 multipliers
    • 64 used for FIR + LMS PEs, 2 for rate adaption
  – fclk = 149.5 MHz (-6 speed grade part)
• Computation rate: 9.6 Giga-MACs/s
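The quoted computation rate follows from the multiplier count and clock rate. The breakdown of the 64 multipliers as 16 complex PEs with four real multipliers each is an assumption; the slide gives only the total.

```python
fir_pes = 8                  # one FIR PE per tap (from the slide)
lms_pes = 8                  # one LMS PE per tap (from the slide)
real_mults_per_pe = 4        # assumption: complex multiply built from 4 real multipliers

pe_multipliers = (fir_pes + lms_pes) * real_mults_per_pe
fclk_hz = 149.5e6            # from the slide

# Every multiplier in the pipelined datapath is busy on every clock.
mac_rate = pe_multipliers * fclk_hz

print(pe_multipliers, "multipliers in the FIR + LMS PEs")
print(f"{mac_rate / 1e9:.2f} GMAC/s")
```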


Folded FSE

• Benchmark data
  – 2093 logic slices
  – 16 embedded multipliers
  – fclk† = 100 MHz (XC2V3000bf957-6)
• For fclk = 100 MHz and N = 8 T/2 FSE the symbol rate is 25 Msym/s
• For 16-QAM this is 100 Mbps

† software version 4.1.03i, speedfile version 1.93, par -rl 5 -pl 5 -xe 2
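The symbol and bit rates of the folded design can be reproduced as follows, assuming (as the folded-architecture description suggests) that the single MAC PE in each polyphase segment spends one clock per tap of its segment.

```python
import math

fclk_hz = 100e6          # from the slide
total_taps = 8           # N = 8 T/2 FSE
segments = 2             # T/2 spacing gives two polyphase segments
taps_per_segment = total_taps // segments

# Assumption: one MAC PE per segment, one tap processed per clock.
clocks_per_symbol = taps_per_segment
symbol_rate = fclk_hz / clocks_per_symbol

bits_per_symbol = int(math.log2(16))   # 16-QAM
bit_rate = symbol_rate * bits_per_symbol

print(f"{symbol_rate / 1e6:.0f} Msym/s, {bit_rate / 1e6:.0f} Mbps")
```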

Carrier Recovery Loop

[Figure: carrier recovery loop with gain blocks G1 to G4.]

• Mixer using Virtex-II embedded multipliers: 3 multipliers / 5 additions
• CORDIC based phase detector
• PI loop filter using 2 embedded multipliers
• Look-up table based DDS
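One well-known way to build a complex mixer from 3 multipliers and 5 additions is sketched below. The slide does not say which arrangement the design uses, so treat this as an illustration of the operation count rather than the actual implementation.

```python
def complex_mult_3m5a(a, b, c, d):
    """Compute (a + jb) * (c + jd) with 3 real multiplications and 5 additions."""
    k1 = c * (a + b)      # addition 1, multiplication 1
    k2 = a * (d - c)      # addition 2, multiplication 2
    k3 = b * (c + d)      # addition 3, multiplication 3
    real = k1 - k3        # addition 4: ac - bd
    imag = k1 + k2        # addition 5: ad + bc
    return real, imag


# Compare against Python's built-in complex arithmetic using integer inputs.
a, b, c, d = 3, -7, 5, 2
print(complex_mult_3m5a(a, b, c, d))
print(complex(a, b) * complex(c, d))
```

Trading one multiplier for three extra adders suits a device where embedded multipliers are the scarcer resource.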


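CORDIC Phase Detector

[Figure: pipelined CORDIC array. Processing elements PE 0, PE 1, PE 2, PE 3, ..., PE N are cascaded; the inputs are x_0, y_0, z_0 and the outputs are x_N, y_N, z_N. Stage i uses a pair of 2^-i shifters (2^0, 2^-1, 2^-2, 2^-3, ..., 2^-N).]

[Figure: detail of PE i. Inputs x_i, y_i, z_i; outputs x_{i+1}, y_{i+1}, z_{i+1}. The element contains a SGN block, two 2^-i shifters, add/subtract units, a constant k, and z^-1 pipeline registers.]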
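The processing elements in the figure match the usual vectoring-mode CORDIC recursion, which drives y toward zero while accumulating the rotation angle in z. The sketch below is a floating-point model under that assumption; in hardware the multiplications by 2^-i are wire shifts and each arctangent would be a per-stage constant (possibly the k shown in the figure, though the slide does not say).

```python
import math


def cordic_phase(x, y, iterations=16):
    """Estimate atan2(y, x) for x > 0 using shift-and-add CORDIC (vectoring mode)."""
    z = 0.0
    for i in range(iterations):
        d = -1.0 if y > 0 else 1.0          # SGN block: rotate toward the x axis
        x, y, z = (x - d * y * 2.0 ** -i,   # x_{i+1}
                   y + d * x * 2.0 ** -i,   # y_{i+1}
                   z - d * math.atan(2.0 ** -i))  # z_{i+1}
    return z


for angle in (-1.2, -0.3, 0.5, 1.0):
    estimate = cordic_phase(math.cos(angle), math.sin(angle))
    print(f"true {angle:+.4f} rad, CORDIC {estimate:+.4f} rad")
```

Each loop iteration corresponds to one PE in the pipeline, so the iteration count sets both the angular resolution and the pipeline depth.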

System Generator Implementation

[Figure: System Generator block diagram built around z^-1 pipeline registers.]

CRL Resources

Function         Slice Count   Block RAMs   Embedded Multipliers
Heterodyne               111            -                      3
DDS                        5            1                      -
Loop Filter               32            -                      2
Phase Detector           270            -                      3
Total                   413†            1                      8

† The small slice count discrepancy is due to logic optimizations that occur when the individual CRL components are integrated into the complete system.


DIME - Modular System Building

[Figure: modular board populated with DAC, ADC, RISC, DSP µP, network, and analog modules.]

Board image courtesy of Nallatech http://www.nallatech.com/

The Signal Processing Platform

• Embedded RISC CPU
• Synchronous dual-port RAM (BRAM)
• Up to 8 million gates
• 3.125 Gb serial
• Active super-fast interconnect
• CLB, IOB, and DCM blocks, each with a switch matrix
• Programmable I/Os with LVDS
• XCITE impedance control (50 Ω impedance controller)
• Multipliers: 18b x 18b, 200 MHz pipelined


Platform Based Design

• Hardware/Software partitioning

[Figure: design partitioned between logic fabric and processor.]

The Future

• Trends
  – Increasing levels of system integration
  – Pervasive DSP enabling anywhere anytime connectivity
  – Increasingly complex systems
  – Decreasing market windows
• FPGA DSP systems
  – Device technology supporting highly parallel DSP engines
  – Design methodologies
    • Abstraction that permits working in the language of the problem
    • Enables effective integration of re-usable components (cores)