High-Performance SEE-Hardened Programmable DSP Array
Larry McMurchie, Carl Sechen
Students: Victor Tang, James Lan, Duncan Lam
Dept. of EE, Univ. of Washington
Tang P187_MAPLD2004

Dec 17, 2015



Outline

• The RADAR architecture
  – Why coarse-grained programmable architectures
  – Features of the RADAR architecture
  – Examples of FIR filter
  – Benchmarks
• Radiation Hardening of RADAR
  – SETs in combinational logic and pipeline registers
  – Register filtering technique


Current Commercial FPGAs – “One Size Fits All”

• Flexibility: they can implement any digital function
• Commodities: not cheap ones, but not nearly as expensive as ASICs to design and fabricate
• Fewer man hours to design than ASICs
• Reprogrammable in situ, allowing updates and bug fixes to be made easily


Downside of “One Size Fits All”

• Power can be 10X that of an ASIC that performs the same function
• Area/weight can be many times that of an equivalent ASIC
• Performance may not meet requirements
• Varying degrees of susceptibility to radiation effects
  – Particularly as process feature sizes decrease


A Critical Observation!

An FPGA in a given system will generally be used only for a limited set of related functions.

Example: an FPGA that performs high-throughput DSP applications, e.g. a FIR filter

– May be reprogrammed to perform a variant of the FIR, e.g. a different number of taps, or an IIR
– But not a totally different operation, e.g. random logic required for a control block
– Result for this example is that all the fine-grained, bit-level flexibility in an FPGA is wasted


Is There a Better Way?

If we can identify the domain of applications that will be used in a given environment, then we can create a customized programmable device (CPD) that will:

• Approach ASIC performance in terms of power, area and throughput
• Retain sufficient programmability to enable all applications within the domain


ASIC/CPD/FPGA Comparison

[Figure: two scales, Flexibility and Area/Power, each marked with ASICs, Customized PD and FPGAs.]


RADAR is a Programmable Device Customized for DSP

• Based upon Reconfigurable Pipelined Datapaths (RaPiD)
• Linear bus-based datapath (as opposed to crossbar)
  – Provides efficient local interconnect, which is dominant in DSP applications
• Many registers (in the right places) to allow intensive pipelining
• Combination of static and dynamic control
  – Static to determine the particular application
  – Dynamic to control multiple phases within the application

D. Cronquist, P. Franklin, C. Fisher, M. Figueroa and C. Ebeling, “Architecture Design of Reconfigurable Pipelined Datapaths,” 20th Anniversary Conf. on Advanced Research in VLSI, 1999.


Example of RADAR Datapath

[Figure: linear datapath with Input Streams, Output Streams and Cells #1 to #4, each cell containing Memory, Multiply and ALU blocks on w-bit buses. Legend: multiplexor, driver, bus connector.]

4 cells, each containing local memory, multiply, ALU and register, plus input and output streams


Bus Multiplexor and Drivers

[Figure: bus multiplexor and bus driver circuits. Labels: GND, s, w, dynamic control, "To functional unit", "From functional unit".]


Bus Connectors

[Figure: bus connector circuit. Labels: w, s.]


Example #1 – 4 Tap FIR Filter

• Given a vector of coefficient weights
• Compute the dot product of the coefficient weights and a vector of inputs
• Easily maps to a linear pipeline

[Figure: input stream … x9 … x0 passing four multipliers with weights w0, w1, w2 and w3 to produce the output y7.]
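The dot-product description above can be sketched in software as follows. This is a plain reference model, not the RADAR mapping: the function name `fir_filter` and the indexing convention y[n] = w0·x[n] + w1·x[n-1] + w2·x[n-2] + w3·x[n-3] are assumptions, since the slide's diagram does not survive in this transcript.

```python
def fir_filter(weights, samples):
    """Direct-form FIR: y[n] = sum over k of weights[k] * samples[n - k].

    Only outputs with a full window of inputs are produced, so the
    first output corresponds to n = len(weights) - 1.
    The index convention is an assumption, not taken from the slides.
    """
    taps = len(weights)
    outputs = []
    for n in range(taps - 1, len(samples)):
        acc = 0
        # One multiply-accumulate per tap, i.e. one per pipeline cell.
        for k, w in enumerate(weights):
            acc += w * samples[n - k]
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    weights = [1, 2, 3, 4]        # hypothetical w0..w3
    samples = list(range(10))     # hypothetical x0..x9
    print(fir_filter(weights, samples))
```

Each tap performs one multiply and one add, which is why a 4-tap filter lines up naturally with four multiply/ALU cells in a linear pipeline.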

Following slides courtesy of Carl Ebeling, Dept. of CSE, UW


RADAR Datapath Programmed for 4-tap FIR filter

[Figure: the four-cell RADAR datapath (Input Streams, Output Streams, and Memory, Multiply and ALU blocks in each cell) configured for the 4-tap FIR filter.]


RADAR Performance Benchmarks

• Assume 16 RaPiD cells, each containing a 16×16 multiplier, and 16-bit buses in the communication network
• Applications: 8×8 DCT, motion estimation, FIR filter, matrix multiply, 2D convolution
• Experiments so far in 0.18 micron CMOS show that 1 GHz is achievable, giving 16 GOPS
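The 16 GOPS figure is consistent with each of the 16 cells completing one operation per clock cycle at 1 GHz. The slides do not state how operations are counted, so the one-operation-per-cell-per-cycle rate in this small sketch is an assumption, as are the names used.

```python
def peak_gops(cells, clock_hz, ops_per_cell_per_cycle=1):
    """Peak throughput in giga-operations per second.

    ops_per_cell_per_cycle=1 is an assumed accounting, not stated in the slides.
    """
    return cells * clock_hz * ops_per_cell_per_cycle / 1e9


if __name__ == "__main__":
    print(peak_gops(cells=16, clock_hz=1e9))
```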


Common Techniques for SETs

• TMR-in-Hardware for logic and memory
  – 3X in power/area
  – Voting circuitry must be hardened
• Using larger gate widths
  – Increased current flow suppresses transients
  – Also increases power/area
  – Equivalent to using feature sizes of previous generation processes
• Adding resistors and capacitors
  – Low pass filtering of SETs
  – Increases power/area

In general, circuit design techniques such as these increase area, delay and power, are difficult to design, and do not transfer well between processes!


TMR-in-Time and SETs

A single event transient in a pipelined computation may be filtered using TMR-in-time, a simple temporal voting scheme:

[Figure: three clocked computation stages C1, C2 and C3 feeding a clocked majority function M.]

The same data is applied on successive clock cycles, resulting in three threads of computation followed by a majority function.
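A behavioral sketch of this temporal voting scheme is shown below. It is a software model under stated assumptions, not the circuit itself: the same input is evaluated three times, a transient is modeled as a bit mask XORed into one pass, and the names `tmr_in_time`, `upset_pass` and `upset_mask` are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority vote of three integers."""
    return (a & b) | (a & c) | (b & c)


def tmr_in_time(compute, x, upset_pass=None, upset_mask=0):
    """Apply the same input on three successive passes and vote.

    upset_pass / upset_mask are hypothetical knobs that model a single
    event transient flipping bits in the result of one pass.
    """
    results = []
    for i in range(3):
        r = compute(x)
        if i == upset_pass:
            r ^= upset_mask  # transient corrupts this pass only
        results.append(r)
    return majority(*results)


if __name__ == "__main__":
    square = lambda v: (v * v) & 0xFFFF  # stand-in for the pipelined logic
    print(tmr_in_time(square, 7))                                   # clean run
    print(tmr_in_time(square, 7, upset_pass=1, upset_mask=0b100))   # one pass upset
```

Because only one of the three passes is corrupted, the majority function returns the correct result in both runs.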


TMR-in-Time

This simple scheme works, provided that transients are no longer than the clock period.

It suffers from ~3X latency relative to the singlet (unhardened) circuit, but requires one third the hardware of the TMR-in-hardware approach.

In the RADAR architecture (where throughput is determined by the number of clock cycles that critical functional units are busy), throughput is the same for TMR-in-time and TMR-in-hardware.

TMR-in-time approaches the singlet (unhardened) case in energy consumption per computation.

• Data switching activity occurs only during the first of three cycles!
• Of course, clock power is 3X that of the singlet.


Filtering SETs at Registers

[Figure: combinational blocks C1 and C2, each followed by three registers clocked by clk1a, clk1b and clk1c and a majority function M; DT marks the separation between the clocks.]

Sampling data at every register and applying the majority function yields an optimized form of TMR-in-time.

D. Mavis and P. Eaton, “Soft Error Rate Mitigation Techniques for Modern Microcircuits,” Proc. of the 40th Annual Int. Reliability Physics Symposium, 2002, pp. 216-225.


Filtering SETs at Registers (cont.)

• If the duration of the transient is less than the clock separation time DT, only one of the three registers will latch incorrect data and the majority function will filter it out.
• Note that SETs created in the majority function itself will be filtered out at the following register.
• By increasing DT, the circuit can be made immune to transients caused by radiation of increasing LET values.

The means of generating clocks delayed by DT can be made a programmable feature in the architecture, i.e. the degree of radiation hardening is programmable!
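The effect of DT can be illustrated with a small timing model. This is a sketch under simplifying assumptions: a one-bit data line, a transient modeled as an inversion over a half-open time window, ideal registers sampling at the clock edge, at DT later and at 2·DT later, and integer time units. All function and parameter names are hypothetical.

```python
def line_value(t, good, set_start, set_width):
    """Value on the data line at time t; inverted while the transient is active."""
    in_transient = set_start <= t < set_start + set_width
    return good ^ 1 if in_transient else good


def filtered_register(good, clock_edge, dt, set_start, set_width):
    """Sample the line with three clocks spaced dt apart and take the majority."""
    a, b, c = (
        line_value(clock_edge + i * dt, good, set_start, set_width)
        for i in range(3)
    )
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    # Transient shorter than DT: at most one sample is wrong, so it is filtered.
    print(filtered_register(good=1, clock_edge=1000, dt=100,
                            set_start=980, set_width=50))
    # Transient longer than DT: two samples are wrong, so the error gets through.
    print(filtered_register(good=1, clock_edge=1000, dt=100,
                            set_start=980, set_width=150))
```

In this model, widening `dt` raises the longest transient that is guaranteed to be filtered, which is the sense in which the degree of hardening is adjustable.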


Power/Throughput Comparisons of Hardening Techniques

[Figure: power versus throughput for Singlet, TMR-in-hardware, TMR-in-time and Register Filtering.]

• As applied to a fixed size RADAR array
• Implementation assumes static CMOS
• Throughput is measured in output data values / unit time


Application of Register Filtering to RADAR

Register filtering is well suited to the RADAR architecture:

• Better power/throughput characteristics than other methods
• The degree of radiation hardening can be programmable through adjustment of DT


Summary

• RADAR
  – Programmable architecture customized for DSP applications
  – Capable of 16 GOPS in 0.18 micron CMOS
• Radiation hardening of combinational logic
  – Using register filtering
  – Achieves near-ideal power/throughput characteristics
  – Degree of radiation hardness programmable