HAsim FPGA-Based Processor Models: Fast, Accurate and Flexible
Michael Adler, Elliott Fleming, Michael Pellauer, Joel Emer

Feb 22, 2016

Page 1: HAsim FPGA-Based Processor Models: Fast, Accurate and  Flexible

HAsim FPGA-Based Processor Models: Fast, Accurate and Flexible

Michael Adler, Elliott Fleming, Michael Pellauer, Joel Emer

Page 2

Outline

• Problem & goals
• Basic model structure
• Modeling a pipelined microarchitecture
• Modeling memory hierarchies
• Modeling multiprocessors
• FPGA implementation details

Page 3

Standard Scaling Problem Slide

• Single core targets: model performance scaled with processor speed
• Multi-core targets: problem size grows with each generation
• Solutions:
  – Reduce fidelity:
    • Shorter runs
    • Subset of available cores
    • Lightweight model
  – Structural simulator change:
    • Parallelize it
    • Find a new method

Page 4

Dependence Problems in Parallel Software Models

Option 1: Target CPUs ➞ Simulator Threads
  – Uncore causes dependence between simulator threads
  – High performance models (e.g. Graphite) relax the dependence

Option 2: Target Pipeline Stages ➞ Simulator Threads
  – Lots of data movement
  – Cyclic pipelines impose complex dependence

[Diagram labels: Core 0, Core 1, Uncore; Fetch, Decode, Execute]

Page 5

Why is Hardware Difficult to Model in Software?

• Constant data movement through pipelines
• Many points of dependence between “parallel” regions
• Large, irregular memory footprint
• Difficult to vectorize
• Branchy

Page 6

Software Model Compromises

• Speed: Detailed model
  – Slow
  – Studies limited by run-time (e.g. large cache replacement policy)
• Accuracy: Simplified model
  – Model writer makes decisions about fidelity, hoping not to affect predictions
  – Multi-core interactions remain difficult to parallelize

Find a new method?

Page 7

FPGAs

• Share the same properties as the target machine
  – Abundant wires
  – Abundant parallelism
  – Abundant registers
• Obvious mapping of pipelines
• Already ubiquitous for RTL verification
• Fast

Detailed FPGA models are often faster than simple models!

Page 8

Aggregate Simulator Throughput (Parsec Black-Scholes)

Page 9

Classification of FPGA-Based Designs

Page 10

Prototype

• Final RTL, mapped to a different technology
  – E.g. an ASIC emulated on an FPGA
• This is what most people imagine for FPGA-based models

Characteristics:
• Useful for verification before producing final hardware
  – Shorter debugging loop
  – Internal state is more visible than final hardware
  – Masks are expensive
• Too late to make big micro-architectural decisions
• Often too large to fit on a single FPGA
• Often too late or too slow to be useful for software development

Page 11

Functional Emulator

• Model architectural semantics
• No prediction of run-time

Characteristics:
• Can be written faster than prototypes
• Potentially more FPGA-area efficient
  – Use FPGA-friendly structures (e.g. no big CAMs)
  – Multiplex functional pipelines (like SMT)
• Useful as a software development platform
• Not useful for microarchitectural research

Page 12

Model

• Project metrics of interest (e.g. timing, power, reliability)
• Emulate functional behavior as needed to compute metrics

Characteristics:
• Metric may be computed algorithmically (even time)
• An extension of functional emulators: function + metrics
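
The "function + metrics" idea can be sketched in a few lines of software. The Python below is only an illustration, not HAsim code: the three-operation instruction set, the `LATENCY` table and its cycle counts, and the function names are all invented for this example. The functional part computes architectural state, while the time metric is computed algorithmically by summing per-operation costs instead of simulating each cycle.

```python
# Hypothetical per-operation cycle costs (assumed values, for illustration only).
LATENCY = {"load": 4, "mul": 3, "add": 1}


def functional_step(regs, inst):
    """Functional emulation: update architectural state, no notion of time."""
    op, dst, a, b = inst
    if op == "load":
        regs[dst] = a  # an immediate stands in for a memory read
    elif op == "mul":
        regs[dst] = regs[a] * regs[b]
    elif op == "add":
        regs[dst] = regs[a] + regs[b]
    else:
        raise ValueError(f"unknown op: {op}")


def run_model(program):
    """Model = function + metric: emulate each instruction, then charge its cost."""
    regs = {}
    cycles = 0
    for inst in program:
        functional_step(regs, inst)
        cycles += LATENCY[inst[0]]  # time is computed, not simulated
    return regs, cycles


PROGRAM = [
    ("load", "r1", 6, None),
    ("load", "r2", 7, None),
    ("mul", "r3", "r1", "r2"),
    ("add", "r3", "r3", "r1"),
]
```

Calling `run_model(PROGRAM)` returns the final register values together with the projected cycle count. A real timing model would account for overlap and stalls; the flat sum here only shows the separation between computing function and computing a metric.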

Page 13

Model Terminology

Modeling hardware on hardware leads to terminology confusion:
  – Both have caches, pipelines, memories…

• Target machine means the microarchitecture being studied
• FPGA, functional-model and timing-model all refer to implementation details. (E.g. functional memory cache is an FPGA structure.)
• Host is the general purpose machine to which FPGAs are connected

Page 14

Why isn’t everyone building timing models with FPGAs?

Page 15

Fast, Accurate or Now?

[Diagram labels: Accuracy, Development Time, Model Speed]

Page 16

FPGA Picture is Different

[Diagram labels: Accuracy, Development Time, Model Speed]

Page 17

Reducing Development Time: Managing Complexity

How do I:
• Use FPGAs while focusing on my algorithm? LEAP
• Model time? A-Ports
• Re-use components? Split functional / timing models, AWB
• Fit a large problem on FPGAs? Multiplexing, Latency Insensitivity, Multiple FPGAs

[Diagram labels: HAsim, Development Time]

Page 18

STDIO on General Purpose Machines

FILE *f = fopen(path, "w");
const char *name = "Kenneth";
fprintf(f, "%s, what is the frequency?\n", name);

Page 19

I/O In Hardware Description Languages (System Verilog)

integer f = $fopen(path, "w");
string name = "Kenneth";
$fwrite(f, "%s, what is the frequency?\n", name);

Page 20

Nothing Comes from Nothing

FPGAs have:
• No standard physical device
• No standard device model
• No standard system interface
• No standard API

Page 21

What Makes Hardware General Purpose?

The software!

• Compilers and library APIs make code “universal”
• Hardware standards (ACPI, PCIe) make OS development and compiler writing easier. Little impact on user programs.
• ISA matters if you want to avoid recompiling. ISA is part of the software API, along with standard libraries.

Page 22

LEAP Platform

[Diagram: LEAP platform, with an FPGA side and a software side joined by a Channel.
FPGA side labels: Timing Partition (Fetch, Decode, Exe), Functional Partition, Platform Interface, Virtual Platform, RRR, STDIO, Scratchpad Memory, Remote Memory, Control, FPGA Physical Platform.
Software side labels: Software Services (Streams, Memory, State, Emulate), Virtual Platform, RRR, Control, Software Physical Platform.]

Page 23

Hello World in LEAP

module [CONNECTED_MODULE] mkConnectedApplication ();

    STDIO#(Bit#(32)) stdio <- mkStdIO();
    let msg <- getGlobalStringUID("Hello, World!\n");

    Reg#(STATE) state <- mkReg(STATE_start);

    rule hello (state == STATE_start);
        stdio.printf(msg, List::nil);
        state <= STATE_finish;
    endrule

endmodule

Page 24

Bluespec on One Foot

• Functional language derived from Haskell
• Generates Verilog
• Modules – the analog of C++ classes
  – May be polymorphic (types are abstract)
• Methods are the callable routines exposed by modules
  – Inlined statically at compile time into a calling rule
• Rules are:
  – Executed atomically
  – Guarded (predicated)
    • Guard is both explicit (user specified) and implicit
    • Implicit guards come from guards on methods called in a rule
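
The rule semantics above can be approximated in ordinary software. The Python below is an analogy only: it is not Bluespec and not how the Bluespec compiler schedules hardware, and the names `Fifo`, `Rule`, `step` and `demo` are invented for this sketch. A rule is ready only when its explicit guard and the implicit guards of the methods it calls all hold, and a ready rule runs its whole body as one atomic step. As a simplifying assumption, at most one rule fires per step.

```python
from collections import deque


class Fifo:
    """Bounded FIFO whose methods are guarded by can_enq / can_deq."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def can_enq(self):
        return len(self.items) < self.capacity

    def can_deq(self):
        return len(self.items) > 0

    def enq(self, value):
        assert self.can_enq()
        self.items.append(value)

    def deq(self):
        assert self.can_deq()
        return self.items.popleft()


class Rule:
    """A rule: explicit guard, implicit guards from called methods, atomic body."""

    def __init__(self, name, explicit_guard, implicit_guards, body):
        self.name = name
        self.explicit_guard = explicit_guard
        self.implicit_guards = implicit_guards
        self.body = body

    def ready(self):
        return self.explicit_guard() and all(g() for g in self.implicit_guards)


def step(rules):
    """Fire the first ready rule atomically; return its name, or None if none is ready."""
    for rule in rules:
        if rule.ready():
            rule.body()
            return rule.name
    return None


def demo():
    state = {"count": 0}
    q = Fifo(capacity=2)
    out = []

    def produce_body():
        q.enq(state["count"])
        state["count"] += 1

    # Explicit guard: count < 3. Implicit guard: the FIFO must have room.
    produce = Rule("produce", lambda: state["count"] < 3, [q.can_enq], produce_body)
    # No explicit condition. Implicit guard: the FIFO must hold data.
    consume = Rule("consume", lambda: True, [q.can_deq], lambda: out.append(q.deq()))

    fired = []
    while True:
        name = step([produce, consume])
        if name is None:
            break
        fired.append(name)
    return fired, out
```

In `demo()`, `produce` stalls once the two-entry FIFO is full even though its explicit guard still holds, which is the effect of an implicit guard inherited from `enq`.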

Page 25

Hello World in LEAP

(Same code as on page 23, annotated with the following callout.)

main()

Page 26

Hello World in LEAP

(Same code as on page 23, annotated with the following callout.)

Control Logic

Page 27

Hello World in LEAP

(Same code as on page 23, annotated with the following callout.)

STDIO

Page 28

LEAP Gives FPGAs Key “General Purpose” Properties

Virtual Platform
  – I/O
  – Virtual memory abstraction (scratchpads)

Topology
  – Named channels (FIFOs) instead of hard-coded wires
  – Host/FPGA remote procedure calls
  – Automated mapping to multiple FPGAs

Debugging Aids
  – Deadlock detection
  – Automated scan chains
  – User scan chains

Page 29

LEAP Platform Users

• HAsim timing models
• Prototypes
  – SSD Functional Model
  – AirBlue wireless network stack
• Algorithmic accelerators
  – H.264 decoder
  – Matrix multiplication
  – …

Page 30

Key Concept: Latency Insensitivity

Page 31

Latency Insensitive Channel Semantics

• Guaranteed:
  – FIFO
  – Accurate
  – Always allow at least one message to be in flight
• Not guaranteed:
  – Latency

Why?
  – Allows for replacement of algorithms – even to software
  – Permits use of hierarchical memories (caches)
  – Simplifies communication – especially off-chip

This is a common software strategy (pipes, TCP/IP, pthread mutex)
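
A small software sketch may help make these guarantees concrete. The Python below is an illustration under stated assumptions, not LEAP's implementation: the class `LatencyInsensitiveChannel`, its parameters, and the `run` driver are invented here. The channel delivers every message in FIFO order and accepts at least one message in flight, but assigns each message an arbitrary delay. A consumer that only asks "is a message available?" sees the same data whatever the latency.

```python
import random
from collections import deque


class LatencyInsensitiveChannel:
    """FIFO channel: ordered, lossless delivery with unspecified latency."""

    def __init__(self, capacity=1, max_latency=5, seed=0):
        assert capacity >= 1  # at least one message may always be in flight
        self.capacity = capacity
        self.max_latency = max_latency
        self.rng = random.Random(seed)
        self.in_flight = deque()  # entries are (ready_time, message)
        self.now = 0

    def can_send(self):
        return len(self.in_flight) < self.capacity

    def send(self, msg):
        assert self.can_send()
        delay = self.rng.randint(1, self.max_latency)
        self.in_flight.append((self.now + delay, msg))

    def can_recv(self):
        # Only the head may be received, so order is preserved even if a
        # later message happens to become ready sooner.
        return bool(self.in_flight) and self.in_flight[0][0] <= self.now

    def recv(self):
        assert self.can_recv()
        return self.in_flight.popleft()[1]

    def tick(self):
        self.now += 1


def run(messages, max_latency, seed):
    """Push all messages through a channel; return what arrived and the elapsed ticks."""
    chan = LatencyInsensitiveChannel(capacity=2, max_latency=max_latency, seed=seed)
    pending = deque(messages)
    received = []
    while len(received) < len(messages):
        if pending and chan.can_send():
            chan.send(pending.popleft())
        if chan.can_recv():
            received.append(chan.recv())
        chan.tick()
    return received, chan.now
```

Running `run` with different `max_latency` values changes only the elapsed time, not the received sequence, which is why an endpoint written against this interface can be moved behind a cache, onto another FPGA, or into software.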

Page 32

Named Channels

• Name both endpoints of a FIFO
• Software builds the connection
• Replaces user’s hand-routed Verilog channels
• Automatically routed, even across FPGAs

Common in software:
  – Named ports in software timing models
  – UUCP has been dead for a long time (for a reason)
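
The naming scheme can be sketched as a registry that pairs endpoints after all modules have declared them. This Python is a hypothetical illustration, not the LEAP compiler: `ChannelRegistry`, its methods, and the channel name `fetch_to_decode` are all assumptions made for the example. Each module declares only its own end by name; a separate connection pass matches names and reports any endpoint left dangling.

```python
from collections import deque


class ChannelRegistry:
    """Collects named endpoints, then connects senders and receivers by name."""

    def __init__(self):
        self.senders = {}
        self.receivers = {}

    def declare_send(self, name):
        if name in self.senders:
            raise ValueError(f"duplicate send endpoint: {name}")
        endpoint = {"name": name, "queue": None}
        self.senders[name] = endpoint
        return endpoint

    def declare_recv(self, name):
        if name in self.receivers:
            raise ValueError(f"duplicate receive endpoint: {name}")
        endpoint = {"name": name, "queue": None}
        self.receivers[name] = endpoint
        return endpoint

    def connect(self):
        """Pair endpoints with the same name; fail on any unmatched endpoint."""
        unmatched = set(self.senders) ^ set(self.receivers)
        if unmatched:
            raise ValueError(f"unmatched endpoints: {sorted(unmatched)}")
        for name, sender in self.senders.items():
            fifo = deque()  # stands in for whatever route the tool chooses
            sender["queue"] = fifo
            self.receivers[name]["queue"] = fifo


def build_pipeline():
    """Two 'modules' that never reference each other, only a shared channel name."""
    registry = ChannelRegistry()
    to_decode = registry.declare_send("fetch_to_decode")   # declared in "fetch"
    from_fetch = registry.declare_recv("fetch_to_decode")  # declared in "decode"
    registry.connect()
    return to_decode, from_fetch
```

Because neither module holds a direct reference to the other, the connection pass is free to implement the FIFO however it likes, including across chips, without changes to the modules.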

Page 33

Page 34

Finally, an Explanation of our Project’s Name

LINC: Latency-Insensitive Named Channel

LEAP: LINC-based Environment for Application Programming

HAsim: Hardware-based micro-Architecture Simulator

Page 35

http://asim.csail.mit.edu/redmine