Lecture 7 FPGA technology
Lecture 7 FPGA technology. 2 Implementation Platform Comparison.

Dec 14, 2015

Joana Shellito

Lecture 7 FPGA technology

Implementation Platform Comparison

FPGA main components and features

Logic block architecture
Interconnect architecture
Programming technology
Power dissipation
Reconfiguration model

FPGA model

Interconnect Network Topologies

Island style
Row-based
Sea-of-gates
Hierarchical
One-dimensional structures

Island-Style Architecture

Row-Based Architecture

Sea-of-Gates Architecture

Hierarchical Architecture

One-Dimensional Architecture

Logic Cluster Parameters

The size of (number of inputs to) a LUT.
The number of CLBs in a cluster.
The number of inputs to the cluster for use as inputs by the LUTs.
The number of clock inputs to a cluster (for use by the registers).
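
The first parameter can be made concrete with a small model. A k-input LUT is essentially a truth table with 2^k entries that is addressed by its inputs; the sketch below assumes that view. The names `make_lut`, `and4` and `xor2` are illustrative and do not come from the slides.

```python
def make_lut(k, truth_table):
    """Return a function that evaluates a k-input LUT.

    truth_table holds 2**k output bits; entry i is the output for the
    input combination whose bits encode i (first input = least significant bit).
    """
    if len(truth_table) != 2 ** k:
        raise ValueError("a k-input LUT needs exactly 2**k configuration bits")

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError("wrong number of inputs")
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return truth_table[index]

    return lut


# A 4-input AND: only the all-ones entry of the 16-entry table is 1.
and4 = make_lut(4, [0] * 15 + [1])
# A 2-input XOR needs only a 4-entry table.
xor2 = make_lut(2, [0, 1, 1, 0])
```

Each extra input doubles the number of configuration bits, which is one reason LUT size is traded off against area.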

Studies on the CLB structure

Area optimal: 3-4 input LUTs
For multiple-output LUTs:
  Optimal area: 4-input LUTs
  Optimal delay: 5-6 input LUTs
4-input LUT clusters show a 10% gain in area efficiency over single 4-input LUTs

Programming Technology

Volatile (SRAM)
Irreversible (Antifuse)
EPROM, EEPROM and Flash
The programming technology affects the FPGA area

SRAM Programming Technology

Configuration storage on SRAM cells
Volatile (FPGA has to be reprogrammed on power-up)
Large area (SRAM cells)
Allows dynamic and partial reconfiguration

Antifuse Programming Technology

Programming element is an antifuse: high impedance (open circuit) until programmed; applying a high programming voltage turns it into a permanent low-impedance connection
Small area
Non-volatile (no need for reprogramming on power-up)
Irreversible (design errors cannot be corrected)

EPROM, EEPROM and Flash Programming Technology

Non-volatile
Erased for reprogramming by exposure to ultraviolet light (EPROM) or by electrical signals (EEPROM/Flash)
Slower programming than SRAM

FPGA Power Consumption

FPGA power dissipation components:
  Interconnection network
  Clock network
  Input/Output
  Logic block

FPGA Power Consumption Breakdown (XC4003)

Dynamic vs Static Power Consumption

Dynamic power consumption is still dominant, even though the static power consumption component increases with the decrease in feature size.
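
For reference, the two components are commonly modelled with the first-order CMOS expressions below. This is a textbook approximation rather than something taken from the slides, and the symbols are introduced here for illustration: α is the switching activity, C_L the switched capacitance, V_DD the supply voltage, f the clock frequency and I_leak the leakage current.

```latex
P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}}, \qquad
P_{\text{dynamic}} \approx \alpha \, C_L \, V_{DD}^{2} \, f, \qquad
P_{\text{static}} \approx I_{\text{leak}} \, V_{DD}
```

In this model the static term grows with leakage current, which tends to rise as feature size shrinks.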

Reconfiguration Models

Static Reconfiguration
Dynamic Reconfiguration
Single Context
Multi-Context
Partial Reconfiguration
Pipeline Reconfiguration

Static Reconfiguration

Compile-time Reconfiguration
Most common approach
One configuration per application
System must be halted and then restarted with new program

Dynamic Reconfiguration

Run-time Reconfiguration
Based on virtual hardware
Trade-off between time and space

Single Context

One configuration at a time
Programming using a serial bitstream
High overhead for small configuration changes
Not suitable for run-time reconfiguration

Multi-Context

Multiple memory bits for each programming bit location

Multiplexed set of single context devices

One context can be reprogrammed when another is active
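
The three points above can be sketched as a toy model: several configuration planes, one of which is selected as active. The class and method names (`MultiContextFPGA`, `load`, `switch`) are hypothetical, and refusing to write the active plane is a simplifying assumption of this sketch, not a statement about any particular device.

```python
class MultiContextFPGA:
    """Toy model: one configuration plane per context, one plane active."""

    def __init__(self, num_bits, num_contexts):
        self.planes = [[0] * num_bits for _ in range(num_contexts)]
        self.active = 0

    def load(self, context, bits):
        """Program an inactive plane while the active one keeps running."""
        if context == self.active:
            raise RuntimeError("this model only reprograms inactive contexts")
        if len(bits) != len(self.planes[context]):
            raise ValueError("bitstream length does not match the plane size")
        self.planes[context] = list(bits)

    def switch(self, context):
        """Select another plane; in this model nothing is transferred."""
        if not 0 <= context < len(self.planes):
            raise IndexError("no such context")
        self.active = context

    def active_configuration(self):
        return self.planes[self.active]


device = MultiContextFPGA(num_bits=4, num_contexts=2)
device.load(1, [1, 0, 1, 1])          # context 0 is still the active one
before = device.active_configuration()
device.switch(1)
after = device.active_configuration()
```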

Partial Reconfiguration

Addresses used to specify the target location of the configuration data

Undisturbed portions of the array can continue execution during reconfiguration

Reduces the amount of data that must be transferred to the FPGA
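
A minimal sketch of the idea, assuming the configuration memory is split into equally sized, individually addressable frames. The frame granularity and the names `FrameAddressedArray`, `write_frame` and `write_all` are assumptions made for illustration.

```python
class FrameAddressedArray:
    """Toy configuration memory made of individually addressable frames."""

    def __init__(self, num_frames, frame_bits):
        self.frame_bits = frame_bits
        self.frames = [[0] * frame_bits for _ in range(num_frames)]
        self.bits_transferred = 0

    def write_frame(self, address, data):
        """Overwrite one frame; all other frames are left untouched."""
        if not 0 <= address < len(self.frames):
            raise IndexError("frame address out of range")
        if len(data) != self.frame_bits:
            raise ValueError("frame data has the wrong length")
        self.frames[address] = list(data)
        self.bits_transferred += self.frame_bits

    def write_all(self, frames):
        """Full reconfiguration: every frame is rewritten."""
        if len(frames) != len(self.frames):
            raise ValueError("expected one data block per frame")
        for address, data in enumerate(frames):
            self.write_frame(address, data)


array = FrameAddressedArray(num_frames=8, frame_bits=16)
array.write_all([[0] * 16 for _ in range(8)])
full_cost = array.bits_transferred                  # all 8 frames
array.write_frame(3, [1] * 16)
partial_cost = array.bits_transferred - full_cost   # a single frame
```

Here the partial update moves one frame's worth of data instead of the whole array, and the other seven frames are never touched.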

Pipeline Reconfiguration

Partial reconfiguration in increments of pipeline stages

Used in datapath-style computations
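
One way to picture this is a schedule in which a single pipeline stage is configured per cycle, with the stages of a longer "virtual" pipeline mapped round-robin onto fewer physical stages. The slides do not specify the mapping, so the round-robin policy and the function name `stage_schedule` are assumptions of this sketch.

```python
def stage_schedule(virtual_stages, physical_stages, cycles):
    """List (cycle, virtual stage, physical stage) for each configuration step.

    Assumes one stage is configured per cycle and that both virtual and
    physical stages are visited in round-robin order.
    """
    if virtual_stages < 1 or physical_stages < 1:
        raise ValueError("stage counts must be positive")
    schedule = []
    for cycle in range(cycles):
        virtual = cycle % virtual_stages
        physical = cycle % physical_stages
        schedule.append((cycle, virtual, physical))
    return schedule


# Five virtual stages on three physical stages: in cycle 3 the fourth
# virtual stage reuses physical stage 0.
plan = stage_schedule(virtual_stages=5, physical_stages=3, cycles=4)
```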

Run-Time Reconfiguration

Algorithmic Reconfiguration
Architectural Reconfiguration
Functional Reconfiguration
Fast Configuration
Configuration Prefetching
Configuration Compression
Relocation and Defragmentation in Partially Reconfigurable Systems
Configuration Caching

Algorithmic Reconfiguration

Reconfigure the system with an algorithm which performs the same functionality but with different requirements

Adapt dynamically to environment or operational changes

Architectural Reconfiguration

Modify hardware topology by reallocating resources to computations

Functional Reconfiguration

Execute different functions on the same resources

Time-share resources across computational tasks

Fast Configuration

Reconfigure the device as fast as possible in order to minimize reconfiguration overhead

Configuration Prefetching

Loading a configuration onto a device in advance, in order to overlap reconfiguration with useful computation

The challenge is to determine future configurations
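
The benefit of overlapping can be estimated with a simple timing model. The sketch below assumes that tasks run in a known order, that the next configuration can be loaded while the current task computes, and that a task cannot start before its configuration is complete. The function names and the sample numbers are made up for illustration.

```python
def time_without_prefetch(tasks):
    """Each task is (configuration time, compute time); nothing overlaps."""
    return sum(config + compute for config, compute in tasks)


def time_with_prefetch(tasks):
    """Load the next configuration while the current task computes."""
    if not tasks:
        return 0
    total = tasks[0][0]  # the first configuration cannot be hidden
    for i, (_, compute) in enumerate(tasks):
        next_config = tasks[i + 1][0] if i + 1 < len(tasks) else 0
        # The step ends when both the computation and the next load are done.
        total += max(compute, next_config)
    return total


tasks = [(4, 10), (6, 3), (2, 8)]          # arbitrary time units
sequential = time_without_prefetch(tasks)  # 33
overlapped = time_with_prefetch(tasks)     # 25
```

The model presumes the next configuration is known in advance, which is exactly the difficulty noted above.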

Configuration Compression

Minimize the data that must be loaded to the device in a multi-context environment
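
The slides do not name a compression scheme. As a stand-in, the sketch below uses plain run-length encoding, which works on bitstreams that contain long runs of identical bits; `rle_encode` and `rle_decode` are illustrative names.

```python
def rle_encode(bits):
    """Collapse a bit sequence into (bit value, run length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (bit value, run length) pairs back into the bit sequence."""
    bits = []
    for value, length in runs:
        bits.extend([value] * length)
    return bits


bitstream = [0] * 12 + [1] * 3 + [0] * 5
encoded = rle_encode(bitstream)   # three runs instead of twenty bits
restored = rle_decode(encoded)
```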

Configuration Caching

Reducing the amount of configuration data that must be transferred to the device

The challenge is to determine which configuration to retain and which to flush
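
A minimal sketch of the retain-or-flush decision, assuming a least-recently-used (LRU) replacement policy and configurations of equal size. LRU is only one possible policy and is not prescribed by the slides; `ConfigurationCache` and `request` are hypothetical names.

```python
from collections import OrderedDict


class ConfigurationCache:
    """Keeps up to `capacity` configurations resident, evicting by LRU."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.resident = OrderedDict()  # oldest entry first
        self.loads = 0                 # transfers from off-chip storage

    def request(self, name):
        """Return True on a hit; on a miss, load and possibly evict."""
        if name in self.resident:
            self.resident.move_to_end(name)    # mark as most recently used
            return True
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # flush least recently used
        self.resident[name] = True
        self.loads += 1
        return False


cache = ConfigurationCache(capacity=2)
hits = [cache.request(name) for name in ["A", "B", "A", "C", "B"]]
```

In this run only the second request for "A" is a hit; "B" is flushed to make room for "C" and has to be loaded again.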

Commercial Fine-Grain Reconfigurable Architectures

Xilinx: Spartan-3/Spartan-3L, Virtex-4, Virtex-5
Altera: Cyclone, Cyclone II, Stratix II/Stratix II GX
Actel: Fusion, ProASIC3/ProASICPLUS, Axcelerator, Varicore
Atmel: AT40K/AT40KLV, AT6000
Quicklogic: PolarPro, Eclipse II
Lattice: LatticeECP2, LatticeXP

Xilinx Spartan-3

CLB
  Four slices
  Two logic function generators/slice
  Two storage elements/slice
Interconnect
  Long lines (one out of every six CLBs)
  Hex lines (one out of every three CLBs)
  Double lines (every other CLB)
  Direct lines (each CLB with its neighbours)
Advanced features
  BlockRAM
  Dedicated Multipliers
  Digital Clock Managers
Configuration
  SRAM

Xilinx Spartan-3

Xilinx Virtex-4

Three variations (LX, FX, SX)
CLB
  Four slices
  Two logic function generators/slice
  Two storage elements/slice
Advanced features
  BlockRAM
  XtremeDSP slices
  Digital Clock Managers
Additional features in the FX family
  8–24 RocketIO Multi-Gigabit serial Transceivers
  One or Two PowerPC cores
  Two or Four Tri-MAC Cores
Configuration
  SRAM

Xilinx Virtex-5

65 nm ExpressFabric
  6-input LUTs
Interconnect
  Diagonal symmetric interconnect
Advanced features
  DCM and PLLs
  BlockRAM
  DSP48E slices
Configuration
  SRAM
  Advanced Encryption Standard technology for bitstream protection

Altera Cyclone/Cyclone II

Essentially the same architecture in 130 nm (Cyclone) and 90 nm (Cyclone II)
LE (10 per LAB):
  4-input LUT
  Register
  Carry chain
MultiTrack Interconnect
  Row and column interconnects spanning fixed distances
Advanced Features:
  Embedded Memory
  PLLs
  External RAM interfacing
  Embedded multipliers (Cyclone II only)

Cyclone Logic Element

Altera Stratix II/Stratix II GX

Adaptive Logic Modules:
MultiTrack Interconnect
Advanced Features:
  TriMatrix Memory

Adaptive Logic Module

Review Questions

Can you partially reconfigure a single-context FPGA?

How often do you need to reconfigure an FPGA device with SRAM configuration memory?

Two designs, one comprising 200 CLBs and one comprising 400 CLBs, are to be downloaded to the same device, which doesn’t support dynamic reconfiguration. How large is the bitstream of the second design compared with that of the first?