Compiling Application-Specific Hardware Mihai Budiu Seth Copen Goldstein Carnegie Mellon University

Dec 21, 2015

Page 1

Compiling Application-Specific Hardware

Mihai Budiu

Seth Copen Goldstein

Carnegie Mellon University

Page 2

Resources

Page 3

Problems

• Complexity

• Power

• Global Signals

• Limited issue window => limited ILP

We propose a scalable architecture

Page 4

Outline

• Introduction

• ASH: Application Specific Hardware

• Compiling for ASH

• Conclusions

Page 5

Application-Specific Hardware

[Figure: compilation flow from C program through the Compiler and Dataflow IR to Reconfigurable hardware]

Page 6

Our Solution

General: applicable to today’s software

- programming languages

- applications

Automatic: compiler-driven

Scalable:

- run-time: with clock, hardware

- compile-time: with program size

Parallelism: exploit application parallelism

Page 7

Asynchronous Computation

[Figure: an adder (+) node with data, data valid, and ack signals]
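As a rough illustration of the data / data valid / ack signalling named on this slide, the following C sketch models a channel in software. It is a minimal sketch under stated assumptions, not the circuit from the talk: the `channel` struct and the functions `chan_send`, `chan_recv`, `adder_fire`, and `handshake_demo` are hypothetical names, and clearing the `valid` flag stands in for the ack wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one point-to-point channel: a data word plus a
   "data valid" flag. Clearing the flag plays the role of the ack. */
typedef struct {
    int  data;
    bool valid;
} channel;

/* Producer side: succeeds only if the previous value has been acked. */
bool chan_send(channel *c, int value) {
    assert(c != NULL);
    if (c->valid) return false;   /* consumer has not acked yet */
    c->data = value;
    c->valid = true;              /* raise data valid */
    return true;
}

/* Consumer side: reads the value and acks it, freeing the channel. */
bool chan_recv(channel *c, int *out) {
    assert(c != NULL && out != NULL);
    if (!c->valid) return false;  /* nothing to consume */
    *out = c->data;
    c->valid = false;             /* ack */
    return true;
}

/* The adder fires only when both inputs are valid and its output channel
   is free; it then acks both inputs and raises valid on the output. */
bool adder_fire(channel *a, channel *b, channel *sum) {
    assert(a != NULL && b != NULL && sum != NULL);
    if (!a->valid || !b->valid || sum->valid) return false;
    sum->data = a->data + b->data;
    sum->valid = true;
    a->valid = false;
    b->valid = false;
    return true;
}

/* Drives one addition through the handshake. Returns the sum, or -1 if a
   step behaves unexpectedly (so a genuine sum of -1 is ambiguous here). */
int handshake_demo(int x, int y) {
    channel a = {0, false}, b = {0, false}, s = {0, false};
    int out = 0;
    if (!chan_send(&a, x)) return -1;
    if (adder_fire(&a, &b, &s)) return -1;   /* must stall: b not valid */
    if (!chan_send(&b, y)) return -1;
    if (!adder_fire(&a, &b, &s)) return -1;  /* both inputs valid now */
    if (!chan_recv(&s, &out)) return -1;
    return out;
}
```

For example, `handshake_demo(2, 3)` returns 5. The first call to `adder_fire` is refused because only one operand has arrived, which is the local, per-node synchronization that replaces a global clock in this model.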

Page 8

New

• Entire C applications

• Dynamically scheduled circuits

• Custom dataflow machines

- application-specific

- direct execution (no interpretation)

- spatial computation

Page 9

Outline

• Scalability

• Application Specific Hardware

• CASH: Compiling in ASH

• Conclusions

Page 10

CASH: Compiling for ASH

[Figure labels: C Program, Circuits, Memory partitioning, Interconnection net, RH]

Page 11

Primitives

• Arithmetic/logic (+)

• Multiplexors

• Merge

• Eta (gateway)

• Memory (ld, st)

[Figure labels on the node symbols: data, predicates; data, predicate]

Page 12

Forward Branches

if (x > 0) y = -x;
else y = b*x;

[Figure: dataflow graph with inputs x, b, and 0, operators *, -, >, and !, and a decoded mux producing y]

Conditionals => Speculation
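The idea that a conditional becomes speculation plus a decoded mux can be mimicked in C. In this sketch both arms are computed unconditionally and the predicates only select the result. The function names `decoded_mux2` and `forward_branch` are hypothetical, and the mapping of statements to graph nodes is an interpretation of the figure labels rather than something the transcript states.

```c
#include <assert.h>
#include <stdbool.h>

/* Decoded mux: one predicate per data input, exactly one of them true. */
static int decoded_mux2(bool p0, int d0, bool p1, int d1) {
    assert(p0 != p1);            /* predicates must be mutually exclusive */
    return p0 ? d0 : d1;
}

/* Speculative form of: if (x > 0) y = -x; else y = b*x;
   Assumes the arithmetic does not overflow (for instance x != INT_MIN). */
int forward_branch(int x, int b) {
    bool p    = (x > 0);         /* ">" node */
    int  neg  = 0 - x;           /* "-" node, computed regardless of p */
    int  prod = b * x;           /* "*" node, computed regardless of p */
    return decoded_mux2(p, neg, !p, prod);   /* "!" node supplies !p */
}
```

With these definitions, `forward_branch(5, 3)` returns -5 and `forward_branch(-2, 3)` returns -6. Neither arm has side effects, which is what makes it safe to evaluate both before the comparison resolves.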

Page 13

Critical Paths

if (x > 0) y = -x;
else y = b*x;

[Figure: dataflow graph with inputs x, b, and 0, operators *, -, >, and !, and output y]

Page 14

Lenient Operations

if (x > 0) y = -x;
else y = b*x;

[Figure: dataflow graph with inputs x, b, and 0, operators *, -, >, and !, and output y]

Solve the problem of unbalanced paths
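One way to see why lenient operations help with unbalanced paths is to compare when the mux output becomes available. The sketch below assumes a common definition of leniency: the mux may produce its output once the predicates and the selected data input have arrived, without waiting for the slower arm. The transcript does not spell this out, and the abstract time units and the function names `strict_mux_ready` and `lenient_mux_ready` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Strict mux: waits for the predicates and for every data input. */
int strict_mux_ready(int t_pred, int t_neg, int t_mul) {
    assert(t_pred >= 0 && t_neg >= 0 && t_mul >= 0);
    return max2(t_pred, max2(t_neg, t_mul));
}

/* Lenient mux: needs only the predicates and the selected data input.
   take_neg is true when the "-x" arm is the one selected. */
int lenient_mux_ready(bool take_neg, int t_pred, int t_neg, int t_mul) {
    assert(t_pred >= 0 && t_neg >= 0 && t_mul >= 0);
    return max2(t_pred, take_neg ? t_neg : t_mul);
}
```

Suppose, purely as an example, that the predicate arrives at time 1, the negation at time 2, and the multiplication at time 8. Then `strict_mux_ready(1, 2, 8)` is 8, while `lenient_mux_ready(true, 1, 2, 8)` is 2, so the slow multiplier delays the result only when its arm is actually selected.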

Page 15

Loops

int sum=0, i;
for (i=0; i < 100; i++)
    sum += i*i;
return sum;

[Figure: dataflow graph of the loop, with nodes i, +1, < 100, 0, *, +, sum, 0, !, and ret]

Control flow => data flow
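To make the control-flow-to-dataflow idea concrete, the loop can be rewritten so that each statement corresponds to one node that fires once per iteration. This is a hedged sketch: the comments tying variables to merge and eta nodes are an interpretation of the primitives listed earlier in the talk, and `loop_as_dataflow` with its bound parameter `n` is a hypothetical generalization of the slide's constant 100.

```c
#include <assert.h>
#include <stdbool.h>

/* Sum of i*i for i in [0, n), written as explicit value circulation. */
int loop_as_dataflow(int n) {
    assert(n >= 0 && n <= 1000);     /* keeps the sum within int range */
    int i = 0;                       /* merge node for i, seeded with 0 */
    int sum = 0;                     /* merge node for sum, seeded with 0 */
    for (;;) {
        bool p = (i < n);            /* "<" node: loop predicate */
        if (!p) return sum;          /* eta on !p releases sum to ret */
        int sq       = i * i;        /* "*" node */
        int next_sum = sum + sq;     /* "+" node on sum's loop */
        int next_i   = i + 1;        /* "+1" node on i's loop */
        sum = next_sum;              /* eta on p feeds sum's merge */
        i   = next_i;                /* eta on p feeds i's merge */
    }
}
```

`loop_as_dataflow(100)` returns 328350, the same value as the original loop. There is no program counter in this view: the only thing deciding whether values circulate again or leave through `ret` is the predicate `p`.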

Page 16

Compilation

• Translate C to dataflow machines

• Optimizations: software-, hardware-, dataflow-specific

• Expose parallelism

– predication

– speculation

– localized synchronization

– pipelining

Page 17

Pipelining

[Figure: dataflow graph of the loop, with nodes i, +, <=, 100, 1, *, +, and sum; the multiplier is labeled "pipelined multiplier"]

Page 21

Pipelining

[Figure: dataflow graph of the loop, with nodes i, +, <=, 100, 1, *, +, and sum; marked regions: i’s loop, sum’s loop, long-latency pipe]

Page 23

Pipelining

[Figure: dataflow graph of the loop, with nodes i, +, <=, 100, 1, *, +, and sum; marked: i’s loop, sum’s loop, long-latency pipe, predicate]

Page 24

Pipelining

Predicate ack edge is on the critical path.

[Figure: dataflow graph of the loop, with nodes i, +, <=, 100, 1, *, +, and sum; marked: critical path, i’s loop, sum’s loop]

Page 25

Pipelining

[Figure: dataflow graph of the loop, with nodes i, +, <=, 100, 1, *, +, and sum; marked: i’s loop, sum’s loop, decoupling FIFO]

Page 26

Pipelining

[Figure: dataflow graph of the loop, with nodes i, +, <=, 100, 1, *, +, and sum; marked: i’s loop, sum’s loop, critical path, decoupling FIFO]

Page 27

ASH Features

• What you code is what you get

– no hidden control logic

– lean hardware (no CAM, multi-ported files, etc.)

– no global signals

• Compiler has complete control

• Dynamic scheduling => latency tolerant

• Natural ILP and loop pipelining

Page 28

Conclusions

• ASH: compiler-synthesized hardware from HLL

• Exposes program parallelism

• Dataflow techniques applied to hardware

• ASH promises to scale with:

– circuit speed

– transistors

– program size

Page 29

Backup slides

• Hyperblocks

• Predication

• Speculation

• Memory access

• Procedure calls

• Recursive calls

• Resources

• Performance

Page 30

Hyperblocks

[Figure label: Procedure]

Page 31

Predication

[Figure: a hyperblock with predicates p, !p, and q guarding its operations: if (p) ......., if (!p) .......]

Page 32

Speculation

[Figure labels: q; if (!p) ......; ops w/ side-effects]

Page 33

Memory Access

[Figure: load and store operations, labeled with address, predicate, token, and data signals, connected through a load-store queue and an interconnection network to Memory]

Page 34

Procedure calls

[Figure labels: caller, call P, args, result, interconnection network, Procedure P, Extract args, ret]

Page 35

Recursion

[Figure labels: hyperblock, recursive call, save live values, restore live values, stack]
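The save/restore labels suggest that values live across a recursive call are kept on an explicit stack. As a hedged illustration only (the talk shows no code for this, and `fact_explicit_stack` and `STACK_MAX` are hypothetical names), here is a recursive factorial rewritten in C so that the saving and restoring are visible:

```c
#include <assert.h>

#define STACK_MAX 16

/* Equivalent to: int fact(int n) { return n <= 1 ? 1 : n * fact(n - 1); }
   The only value live across the recursive call is n itself. */
int fact_explicit_stack(int n) {
    assert(n >= 0 && n <= 12);       /* 12! still fits in a 32-bit int */
    int stack[STACK_MAX];
    int sp = 0;

    /* "Call" phase: save the live value, then enter the callee. */
    while (n > 1) {
        stack[sp++] = n;             /* save live values */
        n = n - 1;                   /* recursive call with n - 1 */
    }

    int result = 1;                  /* base case */

    /* "Return" phase: restore the saved value and finish each call. */
    while (sp > 0) {
        int saved_n = stack[--sp];   /* restore live values */
        result = saved_n * result;
    }
    return result;
}
```

For instance, `fact_explicit_stack(5)` returns 120. At most 11 values are pushed for the allowed inputs, so the fixed-size array is sufficient in this sketch.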

Page 36

Resources

• Estimated SpecINT95 and Mediabench

• Average < 100 bit-operations/line of code

• Routing resources harder to estimate

• Detailed data in paper


Page 37

Performance

• Preliminary comparison with 4-wide OOO

• Assumed same FU latencies

• Speed-up on kernels from Mediabench

[Figure: bar chart of speed-up (axis from 0 to 3.5) for adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_d, jpeg_e, pegwit_e, pegwit_d, g721_e, g721_d]