Computer Architecture Dataflow Machines

Dec 31, 2015

Page 1: Computer Architecture Dataflow Machines

Computer Architecture

Dataflow Machines

Page 2: Computer Architecture Dataflow Machines

Data Flow

• Conventional programming models are control driven
• Instruction sequence is precisely specified
• Sequence specifies control
  • which instruction the CPU will execute next
• Execution rule:
  • Execute an instruction when its predecessor has completed

s1: r = a*b;
s2: s = c*d;
s3: y = r + s;

s2 executes when s1 is complete
s3 executes when s2 is complete

Page 3: Computer Architecture Dataflow Machines

Data Flow

• Consider the calculation
  • y = a*b + c*d
• Represent it by a graph
  • Nodes represent computations
  • Data flows along arcs
• Execution rule:
  • Execute an instruction when its data is available
  • Data driven rule

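[Figure: dataflow graph for y = a*b + c*d. Inputs a, b feed one multiply node and c, d feed another; both products feed an add node that produces y.]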

Page 4: Computer Architecture Dataflow Machines

Data Flow

• Dataflow firing rule
  • An instruction fires (executes) when its data is available
• Exposes all possible parallelism
  • Either multiplication can fire as soon as data arrives
  • Addition must wait
• Data dependence analysis!
• Instruction issue units:
  • Fire (issue) each instruction when its operands (registers) have been written

[Figure: the dataflow graph for y = a*b + c*d, with two multiply nodes feeding an add node]
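The firing rule can be illustrated with a minimal Python sketch. The graph encoding, the function name `run_dataflow` and the input values are assumptions made for this example, not part of any of the machines described here. Tokens arrive in an arbitrary order and each node fires as soon as both of its operands are present.

```python
from collections import deque
import operator

# Hypothetical encoding of the graph for y = a*b + c*d:
# node name -> (operation, left input, right input)
GRAPH = {
    "r": (operator.mul, "a", "b"),
    "s": (operator.mul, "c", "d"),
    "y": (operator.add, "r", "s"),
}


def run_dataflow(graph, arrivals):
    """Fire each node as soon as both of its operands are available.

    `arrivals` is a list of (name, value) input tokens in arrival order.
    Returns the computed values and the order in which nodes fired.
    """
    values = {}
    fired = []
    pending = deque(arrivals)
    while pending:
        name, value = pending.popleft()
        values[name] = value
        # Data driven rule: fire any unfired node whose operands are present.
        for node, (op, left, right) in graph.items():
            if node not in fired and left in values and right in values:
                fired.append(node)
                # The result is itself a token that may enable other nodes.
                pending.append((node, op(values[left], values[right])))
    return values, fired


# c and d arrive first, so the second multiplication fires before the first.
values, fired = run_dataflow(GRAPH, [("c", 4), ("d", 5), ("a", 2), ("b", 3)])
print(values["y"], fired)
```

No instruction order is specified anywhere in the sketch: the firing order follows from the arrival order of the data, and the addition always fires last.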

Page 5: Computer Architecture Dataflow Machines

Data Flow - Realisations

• Several Experimental Machines built
  • Manchester (Gurd & Watson)
  • Tagged Token (Arvind, MIT)
  • Sigma (ETL, Tsukuba)
  • EMC-4 (ETL, Tsukuba)
  • Monsoon (Arvind, MIT)
  • EMX (ETL, Tsukuba)
  • RAPID (Osaka/Sharp/Mitsubishi) (Asynchronous!)
  • Naiad (Tasmania)
  • and some others

Page 6: Computer Architecture Dataflow Machines

Data Flow - Realisations

• Manchester

Page 7: Computer Architecture Dataflow Machines

Data Flow - Program

• Program word
• Matching Store Entry
• When both Presence Flags are Y, this packet is despatched to a PE (any PE!)

[Figure: entry fields: Operation (+, -, *, / etc) | Left, Right Operands | Presence Flags | Destination Address | Destination Left or Right]

Page 8: Computer Architecture Dataflow Machines

Data Flow - Matching Store

• Special purpose memory
• Limited processing capability
• Detects full slots
• Despatches operation packets to any idle PE

[Figure: entry fields: Operation (+, -, *, / etc) | Left, Right Operands | Presence Flags | Destination Address | Destination Left or Right]
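A rough Python sketch of the matching store behaviour follows. The class names `Entry` and `MatchingStore`, the functions `processing_element` and `run`, and the three-entry program are all hypothetical, chosen only to mirror the fields named on the slide. It is not a description of the Manchester hardware.

```python
from dataclasses import dataclass, field
from typing import Optional
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class Entry:
    """One matching store entry, using the field names from the slide."""
    operation: str
    dest_address: Optional[int]   # entry that receives the result (None = program output)
    dest_side: Optional[str]      # "L" or "R" operand slot at the destination
    operands: dict = field(default_factory=dict)  # a key being present = flag "Y"


class MatchingStore:
    def __init__(self, program):
        self.entries = program

    def receive(self, address, side, value):
        """Store a token; return an operation packet once both flags are Y."""
        entry = self.entries[address]
        entry.operands[side] = value
        if "L" in entry.operands and "R" in entry.operands:
            packet = (entry.operation, entry.operands["L"], entry.operands["R"],
                      entry.dest_address, entry.dest_side)
            entry.operands = {}   # clear both presence flags
            return packet
        return None


def processing_element(packet):
    """Any idle PE: compute the result and form a result packet."""
    op, left, right, dest, side = packet
    return dest, side, OPS[op](left, right)


def run(store, tokens):
    queue = list(tokens)
    results = []
    while queue:
        address, side, value = queue.pop(0)
        packet = store.receive(address, side, value)
        if packet is not None:
            dest, dest_side, result = processing_element(packet)
            if dest is None:
                results.append(result)
            else:
                queue.append((dest, dest_side, result))  # back to the matching store
    return results


# y = a*b + c*d with a=2, b=3, c=4, d=5 (illustrative values)
program = {0: Entry("*", 2, "L"), 1: Entry("*", 2, "R"), 2: Entry("+", None, None)}
results = run(MatchingStore(program), [(0, "L", 2), (1, "L", 4), (1, "R", 5), (0, "R", 3)])
print(results)
```

The store only compares presence flags; all arithmetic happens in `processing_element`, which matches the split between a memory with limited processing capability and the PEs.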

Page 9: Computer Architecture Dataflow Machines

Data Flow - Processing Elements

• Receive operation packets
• Generate result
• Form result packet
• Despatch to matching store

Page 10: Computer Architecture Dataflow Machines

Data Flow - EM4

• Architects
  • Yamaguchi, Sakai, Kodama, Sato et al
  • ElectroTechnical Laboratory, Tsukuba, Japan
• PE (EM-Y)
  • CMOS Gate Array
  • 80k gates / 1.0
  • f = 20MHz
  • ~1992

Page 11: Computer Architecture Dataflow Machines

Data Flow - Monsoon

• Architects
  • Papadopoulos, Culler et al
  • MIT, Cambridge
• PE
  • f = 10MHz
  • ~1990
• I-Structure Processor

Page 12: Computer Architecture Dataflow Machines

Data Flow - I-Structures

• Memory with a presence bit
  • Tag each memory location with a bit indicating its validity
  • Valid bit set -> normal read (no wait)
  • Data not yet written (valid bit not set) -> wait
    • Read requests queued
• Data driven execution
  • Operations proceed when data is available

[Figure: memory locations, each holding a data word and a valid bit]
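The presence-bit behaviour can be sketched in a few lines of Python. The class name `IStructure` and its methods are hypothetical, and treating each location as write-once is an assumption of this sketch rather than something the slide states.

```python
class IStructure:
    """Memory in which every location carries a valid bit."""

    def __init__(self, size):
        self.data = [None] * size
        self.valid = [False] * size
        self.deferred = [[] for _ in range(size)]  # queued read requests per location

    def read(self, index, consumer):
        """Deliver the value to `consumer` now if valid, else queue the request."""
        if self.valid[index]:
            consumer(self.data[index])
        else:
            self.deferred[index].append(consumer)

    def write(self, index, value):
        """Store the value, set the valid bit and release any queued reads."""
        if self.valid[index]:
            # Assumption of this sketch: a location is written only once.
            raise ValueError(f"location {index} already written")
        self.data[index] = value
        self.valid[index] = True
        waiting, self.deferred[index] = self.deferred[index], []
        for consumer in waiting:
            consumer(value)


log = []
mem = IStructure(4)
mem.read(2, lambda v: log.append(("early read", v)))  # queued: not yet written
mem.write(2, 42)                                       # releases the queued read
mem.read(2, lambda v: log.append(("late read", v)))   # valid bit set: no wait
print(log)
```

The early reader never polls the location; it simply resumes when the data arrives, which is the data driven rule applied to memory.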

Page 13: Computer Architecture Dataflow Machines

Data Flow - Monsoon Pipeline

• 8 stage pipeline
• “Presence bits” check operand availability
• Frame (coarse grain) basis

Page 14: Computer Architecture Dataflow Machines

Data Flow - Summary

• Fine-Grain Dataflow
  • Suffered from comms network overload!
• Coarse-Grain Dataflow
  • Monsoon ...
  • Overtaken by commercial technology!!
• A sad “fact-of-life”
  • It’s almost impossible to generate the funds for non-“mainstream” computer architecture research
  • $n x 10^8 required
  • Non-mainstream = interesting!

Page 15: Computer Architecture Dataflow Machines

Data Flow - Summary

• As a software model …
  • Functional languages
    • Dataflow in a different guise!
    • Theoretically important
    • Practically? Inefficient ( = slow!!)
    • ….. Ask your CS colleagues!
  • Cilk - based on C
    • Used on CIIPS Myrmidons
    • Uses a dataflow model
      • Threads become ready for execution when their data is generated
    • Message passing efficiency
      • Without explicit data transfer & synchronisation!

Page 16: Computer Architecture Dataflow Machines

Networks

• Network Topology (or shape)
  • Vital to efficient parallel algorithms
  • Communication is the limiting factor!
• Ideal
  • Cross-bar
    • Any-to-any
    • Non-blocking
      • Except two sources to same receiver
    • Realisable
      • But only for limited order (number of ports)

Page 17: Computer Architecture Dataflow Machines

Networks

• Cross-bars
  • Achilles
    • 8 x 8
    • Full duplex
      • Simultaneous input and output at each port
    • 32 bit data-path
    • Target: 1 Gbyte / second total throughput
  • but we needed the 3-D arrangement to achieve
    • bandwidth
    • high order

Page 18: Computer Architecture Dataflow Machines

Networks

• Cross-bars
  • Achilles
    • Hardware almost trivial!
      • Single FPGA on each level
    • Programmable
      • VHDL Models
      • Several topologies
      • Just by changing the software!

Page 19: Computer Architecture Dataflow Machines

Networks - More than 8 PEs

• Simple
  • Use 2 8x8 routers!
• but ….
  • This link gets a lot of traffic!

Page 20: Computer Architecture Dataflow Machines

Networks - Fat tree

• Problem:
  • High-traffic links between PEs can become a bottleneck
• Solution: Fat-tree
  • Links higher up the tree are “fatter”
  • Sustainable bandwidth between all PEs is the same

Page 21: Computer Architecture Dataflow Machines

Networks - Performance Metrics

• Metrics for comparing network topologies
  • Diameter
    • Maximum distance between any pair of nodes
    • Determines latency
  • Bisection Bandwidth
    • Aggregate bandwidth over any “cut” which divides the network in half
    • Determines throughput
• Crossbar
  • Diameter: 1
    • Every PE is directly connected to the router, so a single “hop” suffices
  • Bisection Bandwidth: b bytes/sec
    • b is the bandwidth of a single link

Page 22: Computer Architecture Dataflow Machines

Networks - Performance Metrics

• Metrics for comparing network topologies
  • To connect n PEs with m x m crossbars
  • Single link bandwidth b bytes/s
• Simple: n = 14 (2 switches)
  • Diameter: 3
  • Bisection Bandwidth: b

[Figure: two linked crossbar switches, with the hops of a path numbered 1, 2, 3]

Page 23: Computer Architecture Dataflow Machines

Networks - Performance Metrics

• Fat-tree
  • Diameter: 2 log_m(n)
    • Height is log_m(n)
    • Worst case distance - up and down
  • Bisection Bandwidth: b n/2 bytes/sec
    • Links are fatter higher up the tree

[Figure: fat tree of height log_m(n)]

Page 24: Computer Architecture Dataflow Machines

Networks - Performance Metrics

• Mesh
  • Diameter: 2n-2
  • Bisection Bandwidth: b n bytes/sec
  • Order: 4
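The mesh figures can be checked by brute force. The sketch below assumes that n on this slide is the side length of a square n x n mesh without wrap-around links; the slide does not say so, but that reading makes both formulas come out. The function names are hypothetical.

```python
from collections import deque


def mesh_neighbours(node, side):
    """Neighbours of (row, col) in a side x side mesh without wrap-around."""
    row, col = node
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < side and 0 <= c < side:
            yield (r, c)


def mesh_diameter(side):
    """Longest shortest path between any two nodes, by BFS from every node."""
    longest = 0
    for start in [(r, c) for r in range(side) for c in range(side)]:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in mesh_neighbours(node, side):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


def mesh_bisection_links(side):
    """Links crossed by a straight cut down the middle of the mesh."""
    cut = side // 2
    crossing = 0
    for r in range(side):
        for c in range(side):
            for _, n_col in mesh_neighbours((r, c), side):
                if c < cut <= n_col:
                    crossing += 1
    return crossing


side = 4
print(mesh_diameter(side), mesh_bisection_links(side))
```

With link bandwidth b, the cut links give a bisection bandwidth of b times the printed link count. Interior PEs have four neighbours, which is the "Order: 4" entry.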

Page 25: Computer Architecture Dataflow Machines

Networks - Performance Metrics

• Hypercube
  • Hypercube of order m
    • Link 2 order m-1 hypercubes with 2^(m-1) links
  • Number of PEs: n = 2^m
  • Order: log_2(n) = m

[Figure: two order 2 hypercubes linked to form an order 3 hypercube]

Page 26: Computer Architecture Dataflow Machines

Networks - Hypercubes

• Embedding property
  • In an n PE hypercube, we have hypercubes of size n/2, n/4, …
• Number PEs with binary numbers
  • 000, 001, 010, 011, 100, …
• Joining two hypercubes
  • add one binary digit to the numbering
• Each PE is connected to every PE whose index differs in only one bit
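The one-bit rule translates directly into bit operations. A small Python sketch, with hypothetical function names:

```python
def hypercube_neighbours(pe, order):
    """PEs whose index differs from `pe` in exactly one bit."""
    return [pe ^ (1 << bit) for bit in range(order)]


def hops(src, dst):
    """Minimum hop count between two PEs: the number of differing index bits."""
    return bin(src ^ dst).count("1")


order = 3
for pe in range(2 ** order):
    neighbours = [format(n, "03b") for n in hypercube_neighbours(pe, order)]
    print(format(pe, "03b"), "->", neighbours)

# Links that join the two order 2 sub-cubes: those that flip the top bit.
joining = [(pe, pe ^ 0b100) for pe in range(2 ** order) if pe < (pe ^ 0b100)]
print(len(joining), "joining links")
```

Each PE has `order` neighbours, and the worst case `hops` value is also `order`, reached between a PE and its bitwise complement.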

Page 27: Computer Architecture Dataflow Machines

Networks - Hypercubes

• Embedding property
  • Partitioning tasks
    • Allocate to sub-cubes
    • Sub-tasks allocated to sub-cubes of that cube, etc
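One way to picture this allocation: fixing the leading bits of the PE index selects a sub-cube, and fixing one more bit halves it again. The function `subcube` and the task names below are hypothetical, used only to illustrate the idea.

```python
def subcube(prefix_bits, order):
    """PE indices in an order-`order` hypercube whose leading bits equal `prefix_bits`."""
    free = order - len(prefix_bits)          # bits left to vary inside the sub-cube
    base = int(prefix_bits, 2) << free if prefix_bits else 0
    return [base + i for i in range(2 ** free)]


order = 3
allocation = {
    "task A": subcube("0", order),      # half of the machine
    "task A.1": subcube("00", order),   # a sub-cube of task A's cube
    "task A.2": subcube("01", order),
    "task B": subcube("1", order),      # the other half
}
for task, pes in allocation.items():
    print(task, [format(pe, "03b") for pe in pes])
```

Because the PEs of a sub-cube differ only in the free bits, every sub-cube is itself a hypercube, so a sub-task keeps the same connectivity pattern at a smaller size.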