7/30/2019 L16-fpgas
6.888 PARALLEL AND HETEROGENEOUS COMPUTER ARCHITECTURE
SPRING 2013
LECTURE 16: FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAS
JOEL EMER AND DANIEL SANCHEZ
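Field Programmable Gate Arrays (FPGA)
Figure: logic is built from lookup tables (LUTs) whose truth tables are stored in RAM, each LUT followed by a latch. Two example LUT configurations:
Inputs   And   Or
00       0     0
01       0     1
10       0     1
11       1     1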
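To make the lookup-table idea concrete, here is a minimal C sketch that models a 2-input LUT as a 4-entry truth table indexed by the input bits; "reprogramming" the LUT just means loading different table contents. The type and function names (`lut2`, `lut2_program`, `lut2_eval`) are hypothetical, and real FPGA LUTs usually have more than two inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 2-input LUT: bit i of `table` holds the output
 * for input pattern i, where i = (a << 1) | b. */
typedef struct {
    uint8_t table; /* only the low 4 bits are used */
} lut2;

/* "Configure" the LUT by loading its truth table.
 * Outputs are given for inputs 00, 01, 10, 11 in that order. */
static lut2 lut2_program(int out00, int out01, int out10, int out11) {
    lut2 l;
    l.table = (uint8_t)((out00 & 1) | ((out01 & 1) << 1) |
                        ((out10 & 1) << 2) | ((out11 & 1) << 3));
    return l;
}

/* Evaluate the LUT: the inputs simply select one stored bit. */
static int lut2_eval(lut2 l, int a, int b) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    unsigned index = ((unsigned)a << 1) | (unsigned)b;
    return (l.table >> index) & 1;
}
```

For example, `lut2_program(0, 0, 0, 1)` yields an And gate and `lut2_program(0, 1, 1, 1)` an Or gate: the evaluation logic is identical, and only the stored contents differ, which is the essence of field programmability.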
Evolution of FPGA applications
Logic replacement: low design cost and effort; low-volume applications; often replaced with an ASIC as volume increases
Algorithmic computation: offloads a general-purpose processor; used for multiple algorithms; ASIC replacement not expected
6.888 Spring 2013 - Sanchez and Emer L16
Benefits of FPGA computation
Custom operations/data types
Flexible flow control: control flow based on arbitrary state machines
Local state access: local state elements allow parallel state access
Fine-grain parallelism: replicated logic permits easy parallelism
Custom communication: explicit, direct inter-module communication
Reduced memory references: more direct reuse of data
Better power efficiency: more activity directly applied to computation
Better area efficiency: more area directly applied to computation
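QPI-attached FPGA platform: Intel QuickAssist QPI-based FPGA Accelerator Platform (QAP)
Figure: four-socket QuickAssist platform topology. Processors (P) with local memory (M) are connected by QPI links, with chipsets (CS) providing I/O. An FPGA Accelerator Hardware Module (AHM), available as an Altera Stratix IV module or a Xilinx Virtex 6 module, attaches directly to the QPI links alongside Intel Xeon processor 7000 series parts. The figure contrasts this QuickAssist FPGA with a PCIe-attached FPGA.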
Reed Solomon Results
                                                 Xilinx IP  Catapult-C    Bluespec
Equivalent gate count (lower is better)            297,409     596,730     267,741
Frequency (MHz)                                      145.3        91.2       108.5
Steady state (cycles/block, lower is better)           660        2073         276
Data rate (Mbps, higher is better)                   392.8        89.7       701.3
The WiMAX requirement is to support a throughput of 134 Mbps.
Source: MIT, Abhinav Agarwal, Alfred Ng CSG
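The rows are related by simple arithmetic: steady-state data rate is roughly clock frequency × payload bits per block ÷ cycles per block. The sketch below assumes 1784 payload bits (223 bytes) per block, a figure back-computed from the Xilinx IP row rather than stated on the slide. With it, the formula also reproduces the Bluespec entry (701.3 Mbps); for the Catapult-C row it gives about 78.5 Mbps instead of the listed 89.7, so that entry may rest on figures not shown here. Either way, only the Xilinx IP and Bluespec designs clear the 134 Mbps WiMAX requirement. The names `data_rate_mbps` and `report` are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: each block carries 1784 payload bits (223 bytes). This value
 * is back-computed from the Xilinx IP row, not given on the slide. */
#define PAYLOAD_BITS_PER_BLOCK 1784.0

/* Data rate (Mbps) = clock (MHz) * payload bits per block / cycles per block. */
static double data_rate_mbps(double freq_mhz, double cycles_per_block) {
    assert(cycles_per_block > 0.0);
    return freq_mhz * PAYLOAD_BITS_PER_BLOCK / cycles_per_block;
}

/* Print each design's estimated rate and compare it with the WiMAX target. */
static void report(void) {
    const char *names[] = {"Xilinx IP", "Catapult-C", "Bluespec"};
    const double freq_mhz[] = {145.3, 91.2, 108.5};
    const double cycles[] = {660.0, 2073.0, 276.0};
    const double wimax_mbps = 134.0;
    for (int i = 0; i < 3; i++) {
        double rate = data_rate_mbps(freq_mhz[i], cycles[i]);
        printf("%-10s %6.1f Mbps  %s\n", names[i], rate,
               rate >= wimax_mbps ? "meets WiMAX" : "misses WiMAX");
    }
}
```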
BORPH
Berkeley Operating system for ReProgrammable Hardware
OS for reconfigurable computers: treats reconfigurable hardware as computational resources
UNIX interface to HW designs: familiar to both software and hardware engineers; design-language independent
Goal: make FPGA-based reconfigurable computers easy to use
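Conventional View of FPGA Systems
Figure: software user processes (SW) communicate with one another through files, IPC, sockets, and pipes provided by the OS kernel. The FPGA coprocessor sits on the hardware platform (network, UART, HD) and is reached only through a user library and a device driver, so software and the FPGA stand in a master-slave relationship.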
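BORPH Layers
Figure: software user processes (SW) and hardware user processes (HW) both run on the BORPH kernel. SW processes use the software user library and HW processes use the hardware user library; the kernel reaches the hardware platform (network, UART, HD) through a device driver. Both kinds of process communicate through files, IPC, sockets, and pipes, and HW processes additionally expose an ioreg virtual file, so SW and HW stand in a peer-to-peer relationship.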
Overview of BORPH Concepts
Hardware process
Hardware syscall interface
Interacting with an FPGA: ioreg virtual file interface; hardware file I/O
Hardware Process (1)
An executing instance of a hardware design (cf. SW: an executing instance of a program)
Normal UNIX process: has a pid; check its status with ps, terminate it with kill, etc.
Unit of management: created when a BORPH Object File (BOF) is exec-ed; the kernel selects and configures a hardware region automatically
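Because a hardware process is an ordinary UNIX process, software can start and supervise one with the same POSIX calls it uses for any program. The sketch below is a generic fork/execv/waitpid launcher; under BORPH, `path` would name a BOF file (for example a hypothetical `./design.bof`), and the kernel, not this code, would select and configure the FPGA region. The helper name `run_and_wait` is illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `path` as a new process and wait for it to finish.
 * Returns the child's exit status, or -1 on error. Under BORPH, `path`
 * could be a BOF file; exec-ing it is what creates the hardware process. */
static int run_and_wait(const char *path, char *const argv[]) {
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execv(path, argv);
        perror("execv"); /* reached only if exec failed */
        _exit(127);
    }
    /* Parent: the child has a normal pid, visible to ps and kill. */
    printf("started pid %d\n", (int)pid);
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Nothing in the launcher is FPGA-specific, which is the point of the process model: the same pid could be passed to `kill(pid, SIGTERM)` or inspected with ps exactly as for a software process.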
Benefits of UNIX Process Model
Very easy for users to reason about
Enables FPGA designs to become active components of the system, e.g., an FIR filter:
Conventional: a passive entity to which software sends data and from which it receives results
BORPH: an active entity that pulls/pushes data as needed
Enables multiple instances of the same FPGA design to run in the system: no more fixed-accelerator concept; works well in truly reconfigurable computing systems
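The FIR example can be pictured in software terms. In the BORPH view the filter behaves like a UNIX filter program that pulls samples from its input and pushes results to its output on its own, rather than waiting for a host to hand it each sample. The C sketch below models that behavior over stdio streams; the function name `fir_stream`, the integer sample format, and the `MAX_TAPS` limit are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TAPS 16 /* assumed upper bound for this sketch */

/* Direct-form FIR filter: y[n] = sum_k h[k] * x[n-k].
 * Pulls whitespace-separated integers from `in` until EOF and pushes one
 * output per input to `out`. Returns the number of samples processed. */
static long fir_stream(FILE *in, FILE *out, const long *h, int ntaps) {
    assert(ntaps > 0 && ntaps <= MAX_TAPS);
    long delay[MAX_TAPS] = {0}; /* delay[k] holds x[n-k] */
    long x, count = 0;
    while (fscanf(in, "%ld", &x) == 1) {
        /* Shift the delay line and insert the newest sample. */
        for (int k = ntaps - 1; k > 0; k--)
            delay[k] = delay[k - 1];
        delay[0] = x;
        long y = 0;
        for (int k = 0; k < ntaps; k++)
            y += h[k] * delay[k];
        fprintf(out, "%ld\n", y);
        count++;
    }
    return count;
}
```

Called as `fir_stream(stdin, stdout, taps, ntaps)`, the filter becomes a program that can sit anywhere in a shell pipeline, which is the role BORPH gives the hardware version.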
HW Processes I/O
I/O managed by the kernel, similar to SW
Hides details from users, e.g., HW-SW and HW-HW communication via UNIX files and pipes
Standard UNIX I/O mechanisms: file I/O, pipe, signal
HW-specific service: ioreg virtual file system
Don't ask "How do I ... in HW?" Think: "What if it were SW?"
ioreg Virtual File System
Maps user-defined hardware constructs to virtual files under the process's /proc/<pid>/hw/ioreg/ directory: single-word registers; memory (on-chip and off-chip); FIFOs
Example: /proc/123/hw/ioreg/COUNTERVAL
ioreg information is embedded in the executing BOF file; read and write system calls are translated into message packets by the kernel
Any UNIX program can communicate with hardware processes:
Shell: echo 1 > /proc/123/hw/ioreg/enable
C: FILE *MEM_FILE = fopen("/proc/123/hw/ioreg/MyMemory", "r");
   fread(swbuf, 1, MEM_SIZE, MEM_FILE);
Python, Java, etc.
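The C access pattern above can be wrapped in two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and the register is assumed to be exchanged as one raw 32-bit word in host byte order (the slide does not specify the encoding; the shell example, for instance, writes ASCII text). On a BORPH system the path would be an ioreg entry such as /proc/123/hw/ioreg/COUNTERVAL; because the helpers use only standard file calls, they work unchanged on an ordinary file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write one 32-bit word to an ioreg-style file (e.g. a control register).
 * ASSUMPTION: the register is accessed as a raw word in host byte order.
 * Returns 0 on success, -1 on failure. */
static int ioreg_write_word(const char *path, uint32_t value) {
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(&value, sizeof value, 1, f);
    /* fclose flushes buffered data, so its result matters too. */
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read one 32-bit word from an ioreg-style file into *value.
 * Returns 0 on success, -1 on failure. */
static int ioreg_read_word(const char *path, uint32_t *value) {
    assert(value != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(value, sizeof *value, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

A read of a hypothetical counter register would then be `ioreg_read_word("/proc/123/hw/ioreg/COUNTERVAL", &count)`, with the kernel turning the underlying read system call into a message to the hardware process.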
Hardware File I/O
Access to the general file system from hardware processes:
Debug by printing (printf)
Read test vectors, record output
SW/HW processes chained by file pipe
Figure: a receiver chain (A/D analog front end, baseband process, upper layer) and a video chain (video.in, Decode, Resize, Edge Detect, Encode, video.out)
bash$ decode video.in | resize | edgdet.bof | encode > video.out
bash$ receiver.bof < file.in > file.out
Latency-Insensitive Design: A Higher Semantic
Inter-module communication by latency-insensitive (LI) channels: changing the timing behavior of a module does not affect the functional correctness of the program
Many HW designs use this methodology: improved modularity; simplified design-space exploration
Implemented with guarded FIFOs in current RTLs
Figure: a processor model (Fetch, Decode, Exe) on an FPGA, split into a timing partition and a functional partition, each containing Fetch, Decode, and Exe, plus a control partition
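The latency-insensitive property can be checked with a small software model. The C sketch below connects a producer and a consumer through a guarded FIFO: each side acts only when its guard (`can_enq`, `can_deq`) is true. The activity pattern is an assumption chosen for illustration: the producer may fire only in cycles 0-3, 8-11, ..., and the consumer only in cycles 4-7, 12-15, .... Changing the FIFO depth changes how many cycles the transfer takes but not the sequence of values received. All names are hypothetical; this models the idea, not any particular RTL or Bluespec library.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8

/* Guarded FIFO: enq is legal only when can_enq, deq only when can_deq. */
typedef struct {
    int buf[MAX_DEPTH];
    int head, count, depth;
} fifo;

static bool can_enq(const fifo *f) { return f->count < f->depth; }
static bool can_deq(const fifo *f) { return f->count > 0; }

static void enq(fifo *f, int v) {
    assert(can_enq(f));
    f->buf[(f->head + f->count) % f->depth] = v;
    f->count++;
}

static int deq(fifo *f) {
    assert(can_deq(f));
    int v = f->buf[f->head];
    f->head = (f->head + 1) % f->depth;
    f->count--;
    return v;
}

/* Simulate until n values reach the consumer. ASSUMED activity pattern:
 * producer active in cycles 0-3, 8-11, ...; consumer in 4-7, 12-15, ...
 * Received values go to out[]; returns the number of cycles used. */
static int simulate(int depth, int n, int out[]) {
    assert(depth >= 1 && depth <= MAX_DEPTH);
    fifo ch = {.head = 0, .count = 0, .depth = depth};
    int sent = 0, received = 0, cycle = 0;
    while (received < n) {
        bool producer_phase = (cycle / 4) % 2 == 0;
        /* Consumer fires only if active and its guard holds. */
        if (!producer_phase && can_deq(&ch))
            out[received++] = deq(&ch);
        /* Producer sends i*i for i = 0..n-1 if active and its guard holds. */
        if (producer_phase && sent < n && can_enq(&ch)) {
            enq(&ch, sent * sent);
            sent++;
        }
        cycle++;
    }
    return cycle;
}
```

With this pattern, `simulate(1, 4, out)` and `simulate(4, 4, out)` both deliver 0, 1, 4, 9; only the cycle count differs (29 versus 8). That is the freedom a tool could exploit when resizing LI channels, provided it knows which FIFOs are LI.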
Latency-Insensitive Design: A Higher Semantic
Figure: the same timing, functional, and control partitions (Fetch, Decode, Exe) distributed across two FPGAs, FPGA0 and FPGA1
Behavior of LI channels does not affect functional correctness.
Latency-Insensitive Design: A Higher Semantic
There are many FIFOs in the design; it may not be safe to modify some of them
Compilers see only wires and registers; reasoning about cycle accuracy is difficult
Figure: the partitioned Fetch/Decode/Exe design on an FPGA (timing, functional, and control partitions)
But the programmer knows about the LI property
A Syntax for LI Design
Programmer needs to differentiate LI channels from normal FIFOs
Latency-insensitive Send/Recv endpoints: implementation chosen by the compiler; FIFO order; guaranteed delivery
Explicit programmer contract: unspecified buffering and unspecified latency; programmer responsible for correct annotation
module mkTimeP;
Send#(Inst) send
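Figure: a connected user application partitioned across two platforms, Platform_0 and Platform_1, each with its own Comms layer. User modules mkApplication, mkA, mkB, and mkC are each placed on one platform, and generated stubs (mkApplication_Stub, mkA_Stub, mkB_Stub, mkC_Stub) stand in for them on the other. Key: Library, Generated, User.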