Selling an executable data flow graph based IR John Yates

MathWorks Interview Lecture

Apr 16, 2017


John Yates
Page 1: MathWorks Interview Lecture

Selling an executable data flow graph based IR

John Yates

Page 2: MathWorks Interview Lecture

Order of presentation

• Who am I and why am I here?
• 2010: Netezza needs a new architecture
• A family of statically typed acyclic DFG IRs
• (Time permitting: Some engineering details)
• Q&A

Page 3: MathWorks Interview Lecture

“Who am I and why am I here?”

(with apologies to Adm. Stockdale)

Page 4: MathWorks Interview Lecture

1970: Maybe I’ll be a programmer

• NYC hippie, ponytail, curled handlebar mustache
• Liberal arts high school, lousy student
• Wanted to build things, real things
• Computers seemed interesting and intuitive
• Luckily in 1970 programmers were scarce

Page 5: MathWorks Interview Lecture

40 years…

– 1970: learning the craft, various jobs (all in assembler)
– 1978: Digital Equipment Corp
• Pascal frontend, dynamic programming code selector
– 1983: Apollo Computer
• Designed RISC ISP w/ explicit parallel dispatch (pre-VLIW)
• Lead architect for RISC backend optimizer; built team
• 1st commercial: SSA IR, SW pipeliner, lattice const prop
– 1992: Binary translation: DEC (sw), Chromatic (hw-support)
• More SSA IR, lowering; built teams; lot of patents (many hw)
– 1999: Everfile - NFS-like Win32 internet file system
– 2002: Netezza, badge #26
• Storage: compression, indices, access methods, txns, CBTs

[Side bracket on slide: 20+ years]

Page 6: MathWorks Interview Lecture

2010: Netezza needs a new architecture

Page 7: MathWorks Interview Lecture

Data parallel analytics engine

• Data partitioned across a cluster of nodes
– Multiple “slices” per node to exploit multi-core
• Execution model:
– Leader accepts query, produces an execution plan
– Leader broadcasts plan’s parallel components
– Cluster performs data parallel work
– Leader performs work requiring a single locus
• Competition: Teradata, Greenplum, DB2, …

Page 8: MathWorks Interview Lecture

Netezza’s architecture

[Diagram boxes: PG Plan; Split 1; Split 2; Gen FPGA; Gen C++ (twice); Compile (twice); Load DLL (twice); Bcast; Load FPGA; Execute (twice); N workers]

Page 9: MathWorks Interview Lecture

Netezza’s problems

[Diagram boxes: PG Plan; Split 1; Split 2; Gen FPGA; Gen C++ (twice); Compile (twice); Load DLL (twice); Bcast; Load FPGA; Execute (twice); N workers]

Callout: Latency

Callout: Very simplistic code generator:
- Lowering across an enormous semantic gulf
- No intermediate representation
- Very complex, very fragile
- Difficult to implement much more than general case code patterns

Callout: Hardware development time scales

Page 10: MathWorks Interview Lecture

Garth’s incomplete Marlin vision

• What is the real input to the interpreter?
• How do we get from query plan to that form?

[Diagram boxes: PG Plan; Split; Bcast; Interpret (faster?) (twice); N workers. Callouts: Unspecified miracle; Multi-core?]

Page 11: MathWorks Interview Lecture

A family of statically typed acyclic data flow graph IRs

Page 12: MathWorks Interview Lecture

Working backwards

• Graph
• Dataflow
• Acyclic
• Statically typed
• A family of … IRs

Page 13: MathWorks Interview Lecture

Graph

• Operators
– Label names a function
– Edge connections in and out
• Edges
– Directed (“dataflow”)

Page 14: MathWorks Interview Lecture

Dataflow

• Dataflow machines
– Apply history, wisdom, insights to the interpreter
• Value semantics
– All edges carry data
– No other kinds of edges (i.e. no anti-dependence)
– No updatable shared state (i.e. no store)
• Expose all opportunities for concurrency

Page 15: MathWorks Interview Lecture

Acyclic

• No backedges ≡ no cycles :)
• Can exploit topological ordering
– Fact propagation: rDFS (forward) or DFS (reverse)
– No iteration, guaranteed termination
– Linear algorithms, O(graph)
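A minimal sketch of the "no iteration, linear time" claim. It swaps in Kahn's algorithm for the DFS-based ordering named on the slide, and the adjacency-list form (`Adjacency`, `topoOrder`, `depths`) is a hypothetical stand-in for the real graph object. Once operators are in topological order, a forward fact (here, longest-path depth) is computed in a single pass.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical adjacency form: succ[i] lists the operators fed by operator i.
using Adjacency = std::vector<std::vector<std::uint32_t>>;

// Kahn's algorithm: returns operators in topological order.
// If the result is shorter than succ.size(), the graph contained a cycle.
std::vector<std::uint32_t> topoOrder(const Adjacency& succ) {
    std::vector<std::uint32_t> inDegree(succ.size(), 0);
    for (const auto& outs : succ)
        for (std::uint32_t dst : outs) {
            assert(dst < succ.size());
            ++inDegree[dst];
        }

    std::queue<std::uint32_t> ready;
    for (std::uint32_t i = 0; i < succ.size(); ++i)
        if (inDegree[i] == 0) ready.push(i);

    std::vector<std::uint32_t> order;
    order.reserve(succ.size());
    while (!ready.empty()) {
        std::uint32_t op = ready.front();
        ready.pop();
        order.push_back(op);
        for (std::uint32_t dst : succ[op])
            if (--inDegree[dst] == 0) ready.push(dst);
    }
    return order;
}

// Forward fact propagation in one pass: longest-path depth of each operator.
std::vector<unsigned> depths(const Adjacency& succ) {
    std::vector<unsigned> depth(succ.size(), 0);
    for (std::uint32_t op : topoOrder(succ))
        for (std::uint32_t dst : succ[op])
            if (depth[dst] < depth[op] + 1) depth[dst] = depth[op] + 1;
    return depth;
}
```

Each operator and each edge is touched a constant number of times, which is where the O(graph) bound comes from.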

Page 16: MathWorks Interview Lecture

Statically typed

• Edges initially have unknown type
• A well-formed graph can be statically typed
– Linear pass over topologically ordered Operators
– Assign edge types per Operator descriptors
– Inconsistencies can be diagnosed and reported
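The typing pass might look roughly like the sketch below. Everything here is assumed for illustration: the `Type` vocabulary, the `Descriptor` layout, and the simplification that each operator has a single output. Operators are taken to be in topological order already, so one loop suffices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Type { Unknown, Int, Bool };

// Hypothetical operator descriptor: expected input types and the output type.
struct Descriptor {
    std::vector<Type> inputs;
    Type output;
};

// Hypothetical operator: a descriptor plus the producers of its inputs.
struct Op {
    const Descriptor* desc;
    std::vector<std::size_t> producers;  // operator feeding each input
};

// Assigns a type to each operator's single output edge in one linear pass.
// Returns diagnostics; an empty result means the graph is well typed.
std::vector<std::string> typeCheck(const std::vector<Op>& ops,
                                   std::vector<Type>& outType) {
    std::vector<std::string> errors;
    outType.assign(ops.size(), Type::Unknown);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        const Op& op = ops[i];
        assert(op.desc != nullptr);
        if (op.producers.size() != op.desc->inputs.size()) {
            errors.push_back("op " + std::to_string(i) + ": wrong input count");
            continue;
        }
        bool ok = true;
        for (std::size_t k = 0; k < op.producers.size(); ++k) {
            std::size_t src = op.producers[k];
            // Topological order guarantees producers precede consumers.
            if (src >= i || outType[src] != op.desc->inputs[k]) {
                errors.push_back("op " + std::to_string(i) + ": input " +
                                 std::to_string(k) + " has the wrong type");
                ok = false;
            }
        }
        if (ok) outType[i] = op.desc->output;
    }
    return errors;
}
```

An operator whose inputs fail to check keeps the Unknown output type, so the inconsistency surfaces again at its consumers rather than being silently absorbed.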

Page 17: MathWorks Interview Lecture

A family of … IRs

• Well-nested subsets of edge type vocabularies
• Constraining edge types constrains operators

IR levels:
– High level tree - tuples
– High level graph - tuples
– Mid level graph - nullable values
– Low level graph - values

[Diagram boxes: PG Plan; Split; Lower and Opt (three times); Tree patterns; Graph 1 patterns; Graph 2 patterns; Topo expand, insert CLONEs (twice); Bcast; Interpret (twice); N workers. Callout: Common pattern notation]

Page 18: MathWorks Interview Lecture

Nothing convinces like working code

• First delivery
– Table-driven operator semantics
– Utilities: build, edit & expand
– Topologically sort
– Type check & report errors

[Diagram boxes: Graph assembly program; Graph assembler; Split; Topo expand, insert CLONEs (twice); Bcast; Interpret (twice); N workers]

Page 19: MathWorks Interview Lecture

Sold!

• Working code rendered my successive lowerings idea credible
• Overall Marlin added ~10 engineers; I got 3
• My team got its first end-to-end test case working

[Diagram boxes: PG Plan; Split; Lower and Opt (three times); Tree patterns; Graph 1 patterns; Graph 2 patterns; Topo expand, insert CLONEs (twice); Bcast; Interpret (twice); N workers]

Page 20: MathWorks Interview Lecture

IBM killed the Marlin program…

• Marlin was a clean up project promising…
– Performance and shorter development cycles
– But no new features nor functionality
• It is always hard to fund significant clean up
– Especially if not legitimately tied to a coveted feature
• Harder if your company is under duress
• Harder still if DB2 is gunning for your headcount

Page 21: MathWorks Interview Lecture

Question?

Page 22: MathWorks Interview Lecture

Some engineering details

Page 23: MathWorks Interview Lecture

Why clone?

• After expansion all edges are point-to-point
– No output is multiply-consumed
• Chunk handoff along an edge becomes trivial
– Think C++11’s new move semantics
• So only clones implement reference counting
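One way to picture the handoff, as a sketch only: `Chunk`, `Edge`, and `clone` are hypothetical names, and the standard smart pointers stand in for whatever ownership scheme the real interpreter used. A point-to-point edge moves sole ownership; only the clone operator converts to shared, counted ownership.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical unit of data travelling along an edge.
struct Chunk {
    std::vector<int> rows;
};

// Point-to-point edge: exactly one producer and one consumer, so sole
// ownership can simply be moved; no counting and no copying of rows.
struct Edge {
    std::unique_ptr<Chunk> slot;
    void put(std::unique_ptr<Chunk> c) {
        assert(!slot);  // the previous chunk must have been taken
        slot = std::move(c);
    }
    std::unique_ptr<Chunk> take() { return std::move(slot); }
};

// A clone operator is the one place where a chunk gains several readers,
// so it is the one place that pays for reference counting.
std::vector<std::shared_ptr<const Chunk>> clone(std::unique_ptr<Chunk> in,
                                                std::size_t consumers) {
    std::shared_ptr<const Chunk> shared = std::move(in);
    return std::vector<std::shared_ptr<const Chunk>>(consumers, shared);
}
```

The clone's outputs are read-only (`const Chunk`), which keeps the value semantics of the dataflow model intact even though the storage is shared.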

Page 24: MathWorks Interview Lecture

Broadcast

• Serialize / deserialize
• On the network, size matters
• Graph object
– Small number of scalar members
– Handful of C++ vectors (some ephemeral)
– Position independent (no pointers in vectors)

Page 25: MathWorks Interview Lecture

No pointers

• Pointers index the linear address space
– Implicit context (there is only one address space)
• Unsigned as vector index
– User must provide explicit context (vector base)
– 32 bit indices are ½ the size of 64 bit pointers
– Position independence simplifies serialization
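A small sketch of why index links are position independent. `Node`, `kNone`, `sumChain`, and `relocate` are made up for the illustration; the byte copy stands in for a trip through a serializer and across the network.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical element that refers to a sibling by index, not by pointer.
struct Node {
    std::uint32_t value;
    std::uint32_t next;  // index into the owning vector; kNone ends the chain
};
constexpr std::uint32_t kNone = 0xFFFFFFFFu;

// Following a link requires the explicit context: the vector being indexed.
std::uint32_t sumChain(const std::vector<Node>& vec, std::uint32_t head) {
    std::uint32_t sum = 0;
    for (std::uint32_t i = head; i != kNone; i = vec[i].next) {
        assert(i < vec.size());
        sum += vec[i].value;
    }
    return sum;
}

// Byte-for-byte relocation into fresh storage. Index links survive
// unchanged; raw pointers would have needed fix-ups after the copy.
std::vector<Node> relocate(const std::vector<Node>& src) {
    std::vector<unsigned char> wire(src.size() * sizeof(Node));
    if (!wire.empty()) std::memcpy(wire.data(), src.data(), wire.size());
    std::vector<Node> dst(src.size());
    if (!wire.empty()) std::memcpy(dst.data(), wire.data(), wire.size());
    return dst;
}
```

The cost is the one named on the slide: every dereference must be handed the vector base, where a pointer carries its context implicitly.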

Page 26: MathWorks Interview Lecture

The graph object

• Exposed read-only data
– Vector of Operator objects
– Vector of EdgeIn objects
– Vector of EdgeOut objects
– Literal table and pool
• Private data (may be missing or elided)
– Vector of EdgeIn next links
– Vector of Operator BreadCrumbs

Page 27: MathWorks Interview Lecture

Discardable elements

• vecBc: BreadCrumbs vector
• vecNxt: EdgeIn sibling links
• LiteralPool hash table array

Page 28: MathWorks Interview Lecture

Graph vector details

Vector     Index Type     Element Type  Element Size
g.vecOp    OperatorIndex  Operator      16 bytes
g.vecOut   EdgeOutIndex   EdgeOut       8 bytes
g.vecIn    EdgeInIndex    EdgeIn        8 bytes
g.lit      LiteralKey     Literal       multiple of 8 bytes
g.vecNxt   EdgeInIndex    EdgeInIndex   4 bytes
g.vecBc    OperatorIndex  BreadCrumb    4 bytes
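The table pairs each vector with its own index type. One plausible reading, sketched here under assumptions, is a 32-bit index tagged by the vector it addresses. The `Index` template, the tag structs, and `IndexedVector` are hypothetical; only the member names `dstOp_`, `src_`, `srcOp_`, and `dst_` and the 8-byte edge sizes come from the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical strong index: a 32-bit value tagged with the vector it
// indexes, so an EdgeInIndex cannot be passed as an OperatorIndex.
template <class Tag>
struct Index {
    std::uint32_t value;
};

struct OperatorTag {};
struct EdgeInTag {};
struct EdgeOutTag {};
using OperatorIndex = Index<OperatorTag>;
using EdgeInIndex = Index<EdgeInTag>;
using EdgeOutIndex = Index<EdgeOutTag>;

struct EdgeIn {
    OperatorIndex dstOp_;  // sink operator
    EdgeOutIndex src_;     // source EdgeOut
};
struct EdgeOut {
    OperatorIndex srcOp_;  // source operator
    EdgeInIndex dst_;      // sink EdgeIn
};

static_assert(sizeof(EdgeInIndex) == 4, "indices are 32 bits");
static_assert(sizeof(EdgeIn) == 8, "matches the 8-byte EdgeIn in the table");
static_assert(sizeof(EdgeOut) == 8, "matches the 8-byte EdgeOut in the table");

// A vector wrapper that accepts only its own index type.
template <class T, class Tag>
class IndexedVector {
public:
    Index<Tag> push(const T& t) {
        data_.push_back(t);
        return Index<Tag>{static_cast<std::uint32_t>(data_.size() - 1)};
    }
    const T& operator[](Index<Tag> i) const {
        assert(i.value < data_.size());
        return data_[i.value];
    }

private:
    std::vector<T> data_;
};
```

With this shape, indexing `vecIn` with an `OperatorIndex` is a compile-time error, which recovers some of the type safety that raw pointers would have provided.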

Page 29: MathWorks Interview Lecture

Connectivity: Operator objects

• Operator private members
– Operator’s edges are sub-vectors of g.vecIn, g.vecOut
– Start of EdgeIn objects: EdgeInIndex baseIn_;
– Start of EdgeOut objects: EdgeOutIndex baseOut_;
• Number of connections
– Inputs: vecOp[x+1].baseIn_ - vecOp[x].baseIn_
– Outputs: vecOp[x+1].baseOut_ - vecOp[x].baseOut_
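The subtraction `vecOp[x+1].baseIn_ - vecOp[x].baseIn_` only works for the last operator if something follows it. The sketch below assumes a trailing sentinel entry, which the slides do not state; the `Graph` struct and its `add`, `numIn`, and `numOut` members here are simplified stand-ins, not the real API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using EdgeInIndex = std::uint32_t;
using EdgeOutIndex = std::uint32_t;
using OperatorIndex = std::uint32_t;

// Only the two base members come from the slides.
struct Operator {
    EdgeInIndex baseIn_;    // first EdgeIn owned by this operator
    EdgeOutIndex baseOut_;  // first EdgeOut owned by this operator
};

// Assumption: vecOp ends with a sentinel entry whose bases equal the total
// edge counts, so that vecOp[x + 1] is valid for the last real operator.
struct Graph {
    std::vector<Operator> vecOp{Operator{0, 0}};

    OperatorIndex add(std::uint32_t nIn, std::uint32_t nOut) {
        const Operator& sentinel = vecOp.back();
        Operator next{sentinel.baseIn_ + nIn, sentinel.baseOut_ + nOut};
        vecOp.push_back(next);  // the old sentinel becomes the new operator
        return static_cast<OperatorIndex>(vecOp.size() - 2);
    }
    std::uint32_t numIn(OperatorIndex x) const {
        assert(x + 1 < vecOp.size());
        return vecOp[x + 1].baseIn_ - vecOp[x].baseIn_;
    }
    std::uint32_t numOut(OperatorIndex x) const {
        assert(x + 1 < vecOp.size());
        return vecOp[x + 1].baseOut_ - vecOp[x].baseOut_;
    }
};
```

Storing only the base and deriving the count from the neighbor saves a field per operator, in the same way a compressed sparse row layout does.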

Page 30: MathWorks Interview Lecture

Connectivity: EdgeIn objects

• EdgeIn private members
– Sink Operator: OperatorIndex dstOp_;
– Source EdgeOut: EdgeOutIndex src_;
• EdgeIn connection position
– Use pointer arithmetic: this - (vecIn + vecOp[dstOp_].baseIn_);
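The pointer-arithmetic expression can be spelled out as a free function. This is a sketch with plain `std::uint32_t` indices and a hypothetical `position` helper; the edge must be passed by reference to an element that actually lives in `vecIn`, since its address is the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Operator {
    std::uint32_t baseIn_;  // first EdgeIn owned by this operator
};
struct EdgeIn {
    std::uint32_t dstOp_;  // sink operator
    std::uint32_t src_;    // source EdgeOut
};

// Which input slot of its sink operator does this EdgeIn occupy?
// An edge stores no position; it is recovered from the edge's own address:
// offset within vecIn minus the sink operator's base.
std::size_t position(const EdgeIn& e,
                     const std::vector<EdgeIn>& vecIn,
                     const std::vector<Operator>& vecOp) {
    const EdgeIn* first = vecIn.data() + vecOp[e.dstOp_].baseIn_;
    assert(&e >= first && &e < vecIn.data() + vecIn.size());
    return static_cast<std::size_t>(&e - first);
}
```

The same computation applies to EdgeOut objects with `vecOut`, `srcOp_`, and `baseOut_` substituted.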

Page 31: MathWorks Interview Lecture

Connectivity: EdgeOut objects

• EdgeOut private members
– Source Operator: OperatorIndex srcOp_;
– Sink EdgeIn: EdgeInIndex dst_;
• EdgeOut connection position
– Use pointer arithmetic: this - (vecOut + vecOp[srcOp_].baseOut_);

Page 32: MathWorks Interview Lecture

Working with XG

Page 33: MathWorks Interview Lecture

Thin graph construction

Method: graph.add(BreadCrumb, Op, Locus, Expansion, unsigned nVarIn = 0, unsigned nVarOut = 0);
Effect: Add an Operator and its Edge resources

Method: graph.connect(OperatorIndex srcOp, unsigned srcPos, OperatorIndex dstOp, unsigned dstPos);
Effect: Guarantee a srcOp[srcPos] to dstOp[dstPos] edge exists

Page 34: MathWorks Interview Lecture

Whole graph operations

Operation                                            Effect
Graph();                                             Construct an empty Graph
void done();                                         Topo sort and type check
Graph(Graph const& thinGraph, bool forSpu);          Partitioning constructor
BinStream& operator << (BinStream&, Graph const&);   Put to a BinStream (cheap)
BinStream& operator >> (BinStream&, Graph&);         Get from a BinStream (cheap)
void expand(bool forSpu, Environment const& env);    Expand, insert clones, etc.

Page 35: MathWorks Interview Lecture

Graph states and conversions

• Start with a “thin” graph
• Leader plus one representative node and dataslice
• Operators tagged with a locus and expansion rule
• Outputs can have multiple consumers
• Partition into leader-side & node-side subsets
• Expand based on loci and system topology
• Duplicate operators, adjust in and out arities, add sites
• Expand edges: fan-in, fan-out, parallel
• Introduce clones as needed
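The "introduce clones as needed" step can be sketched on a bare edge list. The `Edge` record and `insertClones` function are hypothetical and ignore loci and replication; they show only how a multiply-consumed output becomes a set of point-to-point edges through a new clone operator.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical thin-graph edge: one output port feeding one consumer.
struct Edge {
    std::uint32_t srcOp, srcPos;  // producer and its output position
    std::uint32_t dstOp, dstPos;  // consumer and its input position
};

// Rewrites edges so that every output port has exactly one consumer.
// A port with n > 1 consumers gets a new clone operator (numbered from
// nextOp upward) with one input and n outputs. Returns the new edge list.
std::vector<Edge> insertClones(const std::vector<Edge>& thin,
                               std::uint32_t& nextOp) {
    using Port = std::pair<std::uint32_t, std::uint32_t>;
    std::map<Port, std::vector<Edge>> byPort;
    for (const Edge& e : thin)
        byPort[std::make_pair(e.srcOp, e.srcPos)].push_back(e);

    std::vector<Edge> out;
    for (const auto& entry : byPort) {
        const std::vector<Edge>& uses = entry.second;
        if (uses.size() == 1) {
            out.push_back(uses.front());
            continue;
        }
        std::uint32_t cloneOp = nextOp++;
        out.push_back(Edge{entry.first.first, entry.first.second, cloneOp, 0});
        for (std::uint32_t k = 0; k < uses.size(); ++k)
            out.push_back(Edge{cloneOp, k, uses[k].dstOp, uses[k].dstPos});
    }
    return out;
}
```

After the rewrite every (operator, output position) pair appears as a source exactly once, which is the point-to-point property the clone discussion relies on.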

Page 36: MathWorks Interview Lecture

Graph overlay

• Template object publicly derived from Graph
• Macro hides lots of template boilerplate
• User supplied types for parallel vectors
– MyOperator ovOp[OperatorIndex]
– MyEdgeIn ovIn[EdgeInIndex]
– MyEdgeOut ovOut[EdgeOutIndex]
• Constructor shares vectors and LiteralTable
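A stripped-down sketch of the overlay idea, with a stand-in `Graph` that holds one shared vector. The real template takes three user types and is wrapped in a macro; this version takes one, and the use of `std::shared_ptr` for the sharing is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the real Graph; only the operator count matters here.
struct Graph {
    std::shared_ptr<const std::vector<int>> vecOp;  // shared, read-only
};

// Hypothetical overlay: shares the base graph's vectors and adds a parallel
// vector of user data, indexed by the same operator index.
template <class MyOperator>
struct Overlay : public Graph {
    explicit Overlay(const Graph& g) : Graph(g), ovOp(g.vecOp->size()) {
        assert(vecOp.get() == g.vecOp.get());  // shared, not copied
    }
    std::vector<MyOperator> ovOp;
};
```

A pass can then keep its per-operator scratch data in `ovOp[x]` without widening the 16-byte Operator or copying the graph.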

Page 37: MathWorks Interview Lecture

1973: Began 2-axis controller. I wrote every line of code (in assembler).

Page 38: MathWorks Interview Lecture

1975: First installation. 0.5 megawatt torch cutting up to ¾” steel plate at Marion Power Shovel

Page 39: MathWorks Interview Lecture

1975: Torch on… I was hooked!