Wide-SIMD Parallelization of Streaming Dataflow, with Applications to Bioinformatics Jeremy Buhler For CSE 591 NSF Awards CNS-1500173, CNS-1763503

Wide-SIMD Parallelization of Streaming Dataflow, with ...jain/cse591-18/ftp/buhler591.pdf · Application graph consists of nodes (computations), edges (data transfer) •Data flows

Jun 05, 2020


Wide-SIMD Parallelization of Streaming Dataflow, with

Applications to Bioinformatics

Jeremy Buhler For CSE 591

NSF Awards CNS-1500173, CNS-1763503


Take-Home Message

• Biological sequence analysis is a source of high-impact computational problems

• Using SIMD parallel computing for these problems requires dealing with irregularity

• MERCATOR is an ongoing research effort to make irregular application development on SIMD platforms easier.


Who Am I?

• I study how to accelerate high-impact bioinformatics problems.

• One way to do this is via parallelization on modern architectures (FPGAs, GPUs, …)

• Along the way, many interesting CS questions…
– Streaming computation [FCCM’07, JVSP’07, M&M’09]
– Systolic array design [FPL’09, FCCM’10, ASAP’10]
– Deadlock avoidance [SPAA’10, PPoPP’12, DFM’13, JPDC’17]
– SIMD mapping [ISPDC’14, DFM’15, HPCS’17]


Talk Overview

• Problems: DNA comparison and read mapping

• Algorithmic approach – Why SIMD?

• MERCATOR overview and performance

• Research challenges


Molecular Biology is Fundamental

• Genetic basis of disease and disease risk

• Systems biology – what are your cells doing?

• Studying natural history and evolution

• Engineering cells’ behavior for medicine, industry, agriculture


The First Step: DNA Sequencing

• Sequencing can tell us what is in a genome…

• … but it is also the basis of experiments to probe gene expression, protein binding, chromosome conformation, epigenetic marks, polymorphism, copy number variants …

…acaggatagtaccgataccat cacccggataggacctatgag ggacacaggacttatggcattt…


[Figure: NIH Sequence Read Archive Size (Open Access Only). Vertical axis: billions of DNA bases, log scale from 1.E+00 to 1.E+07.]

http://www.ncbi.nlm.nih.gov/Traces/sra/


Problem: Classical Similarity Search

• Given
– a genome-sized or larger DNA sequence database D
– a “query” sequence q of some length L << |D|

• Does q appear in D with at most k differences, and if so, where?


Typical Parameters

• Database D has size 10^9 – 10^10 bases

• Query q has size 10^2 – 10^4 bases

• # differences k is 5-25% of |q| (bases added, deleted, changed)


Tools for Similarity Search

• BLAST [Altschul et al. 1990, 1996]

• BLAT [Kent 2002]

These tools use variants of the same basic search algorithm.


Problem: Short-Read Mapping

• Given
– a genome-sized or larger DNA sequence database D
– N “reads”: DNA seqs of some length L << |D|

• For each read, does it appear in D with at most k differences, and if so, where?


Typical Parameters

• Database D has size 10^9 – 10^10 bases

• Number of reads N is 10^6 – 10^8

• Length L is 75-150 (may vary among reads)

• # differences k is 0-3 (added, deleted, changed)


Tools for Mapping

• Bowtie [Langmead et al. 2009, 2012]

• BWA [Li & Durbin 2009, 2010]

• SOAP2 [Li et al. 2009]

All these tools use variants of the same basic search algorithm.


Why Short-Read Mapping?

• Some experimental procedure selects a subset of everything in the database

• Reads are sampled from this subset by your sequencing machine

• Mapping tells you which parts of the database are present in your sample


Problem: Alignment-Free Organism ID

• Given
– a metagenome-sized DNA sequence database D
– N microbial genomes: DNA seqs of some length L << |D|

• For each genome, do (some of) its sequences appear in D?


Alignment-Free Techniques

• Min-Hash Sketching – convert a seq to a small sample (m ~ 1000) of hash values

• Approximate Containment: how much of (the sketch for) a genome overlaps (the sketch for) a metagenome?

• MASH (Ondov et al. 2016)

• SourMASH (Brown et al. 2016)
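A minimal Python sketch of sketching and approximate containment, under stated assumptions: k-mers of length `k = 21`, a bottom-`m` sketch built from a truncated SHA-1 hash, and containment estimated as the fraction of the genome's sketch hashes found among the metagenome's k-mer hashes. These choices and all function names are illustrative; they are not taken from MASH or SourMASH.

```python
import hashlib


def kmer_hashes(seq, k=21):
    """Return the set of 64-bit hash values of all k-mers in seq."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.sha1(seq[i:i + k].encode()).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def minhash_sketch(seq, m=1000, k=21):
    """Bottom-m sketch: the m smallest k-mer hash values of seq."""
    return set(sorted(kmer_hashes(seq, k))[:m])


def containment(genome_sketch, metagenome_hashes):
    """Estimate the fraction of the genome's k-mers present in the metagenome."""
    if not genome_sketch:
        return 0.0
    return len(genome_sketch & metagenome_hashes) / len(genome_sketch)
```

A genome embedded verbatim in the metagenome should score 1.0, and an unrelated genome should score close to 0.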


Talk Overview

• Problems: DNA alignment and read mapping

• Algorithmic approach – Why SIMD?

• MERCATOR overview and performance

• Research challenges


How BLAST Works

[Figure: pipeline Substring Matching → Ungapped Filter → Gapped Filter, annotated with SEQUENCES REMAINING and COMPUTATIONAL COST]

BLAST operates as a pipeline of computational stages.


Stages of BLAST

• Stage 1: identify potential match locations between q, D

• Stage 2: keep only those locations that look somewhat promising

• Stage 3: keep only those locations that actually yield high-similarity alignments


Generating Possible Matches

• Every place where some 11-mer from q matches an 11-mer from D exactly is a candidate.

• Can rapidly find all such matching locations using a hash table of the 11-mers in sequence q

accagatacatagcactcgctacgtcagatgggtaca
gttaagtcagatgggtagactcaggatgacagtggaca
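The hash-table lookup can be sketched as follows. This is a minimal illustration, not BLAST's actual implementation: the function name `find_seed_matches` is hypothetical, and treating the first example sequence as q and the second as D is an assumption.

```python
from collections import defaultdict


def find_seed_matches(query, database, k=11):
    """Return (query_pos, db_pos) pairs where a k-mer of query equals a k-mer of database."""
    # Index every k-mer of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Stream over the database, looking up each k-mer in the index.
    matches = []
    for j in range(len(database) - k + 1):
        for i in index.get(database[j:j + k], ()):
            matches.append((i, j))
    return matches


q = "accagatacatagcactcgctacgtcagatgggtaca"
d = "gttaagtcagatgggtagactcaggatgacagtggaca"
seeds = find_seed_matches(q, d)
```

The two example sequences share the 12-base substring `gtcagatgggta`, which contributes the seeds (23, 5) and (24, 6) in 0-based coordinates.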


Filtering Candidates

• Uses explicit edit distance computation between q and part of D (Smith-Waterman algorithm)

• Expensive dynamic programming!

• “Easy” version (substitutions only), followed by hard version (add/delete chars allowed)
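The two filters can be sketched as scoring functions. This is a hedged illustration only: the scoring parameters (match +1, mismatch -1, gap -2) and the function names are assumptions, and Smith-Waterman is written here in its usual form, which maximizes a local similarity score. The "easy" version scores each diagonal independently; the "hard" version runs the full dynamic program.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (gaps allowed)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def ungapped_score(a, b, match=1, mismatch=-1):
    """Best local score when only substitutions are allowed (no gaps)."""
    best = 0
    # Each diagonal (fixed offset between a and b) is scored independently.
    for offset in range(-(len(b) - 1), len(a)):
        run = 0
        for i in range(max(0, offset), min(len(a), len(b) + offset)):
            sub = match if a[i] == b[i - offset] else mismatch
            run = max(0, run + sub)
            best = max(best, run)
    return best
```

The ungapped version costs time linear in the length of each diagonal, while the gapped version fills a full |a| x |b| table, which is why it is reserved for the candidates that survive the cheaper filter.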


BLAST Parallelization

• Can generate candidates in parallel at each DB location, then filter them in parallel.

[Figure: Gen Candidates → Filter Ungapped → Filter Gapped]


What About Read Mapping?

• Uses an index (virtual suffix tree) of the database

• Matching involves tracing a path down the index tree for each read

• (must try several paths if differences are allowed)

• Can do in parallel for many reads at once!


Suffix Tree Example

D = a c a g a c c a g a $
    0 1 2 3 4 5 6 7 8 9 10

Sorted suffixes of D (start position, suffix):

10 $
9 a$
0 aca…
4 acc…
7 aga$
2 agac…
6 caga$
1 cagac…
5 cca…
8 ga$
3 gac…

A = 9 0 4 7 2 6 1 5 8 3

[Figure: suffix tree T for D, shown together with the array A]

Rapid Matching vs Suffix Tree

• Can find all matches to a read in D in time proportional to read length L.

[Figure: suffix tree T over the array 9 0 4 7 2 6 1 5 8 3, repeated across slides 26-30 with D = acagaccaga$]

Where is cag?
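The query "Where is cag?" can be answered with a small sketch. Note the substitution: instead of a suffix tree, this uses binary search over a sorted suffix array, which takes O(L log |D|) character comparisons rather than the O(L) of a tree descent. The function names are hypothetical and the construction is deliberately naive.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in sorted order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """All start positions of pattern in text, via binary search on the suffix array."""
    m = len(pattern)

    def first_index(strictly_greater):
        # Find the first suffix whose m-character prefix is >= pattern
        # (or > pattern when strictly_greater is True).
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[suffix_array[mid]:suffix_array[mid] + m]
            if prefix < pattern or (strictly_greater and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return sorted(suffix_array[first_index(False):first_index(True)])


D = "acagaccaga$"
A = build_suffix_array(D)
hits = find_occurrences(D, A, "cag")  # positions 1 and 6
```

All suffixes that begin with the pattern are contiguous in sorted order, so the two binary searches bracket exactly the matching block.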


Extension to Inexact Matching

• To permit matches with k substitutions, try multiple paths, but charge for each mismatch.

• To permit matches with k differences, we do dynamic programming to compute edit distance of read against each path in tree.

• Descent stops for a read when we hit bottom of tree or find that path requires > k differences.
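The substitution-only case can be sketched as a budgeted descent. This is a minimal illustration under assumptions: it builds a naive, uncompressed suffix trie (quadratic size, fine only for toy inputs) rather than a real index, the function names are hypothetical, and the edit-distance case with dynamic programming is not shown.

```python
def build_suffix_trie(text):
    """Naive suffix trie: each node records the start positions of suffixes passing through it."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def search_with_mismatches(root, read, k):
    """Start positions where read matches the text with at most k substitutions."""
    hits = []

    def descend(node, depth, budget):
        if depth == len(read):
            hits.extend(node["starts"])
            return
        for ch, child in node["children"].items():
            cost = 0 if ch == read[depth] else 1  # charge for each mismatch
            if ch != "$" and cost <= budget:  # never match the end marker
                descend(child, depth + 1, budget - cost)

    descend(root, 0, k)
    return sorted(hits)
```

With D = acagaccaga$, the read `cac` has no exact match, but with k = 1 it matches at positions 1 and 6 (`cag`) and at position 3 (`gac`).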


Parallel Alignment is a SIMD Computation

• We process every BLAST starting loc / every read through the same filtering computation

• Single Instruction stream, Multiple Data items

[Figure: threads numbered 1 to 18]


SIMD Targets

• Our work: NVIDIA GPUs (32 SIMD lanes x 4+ threads x 2-64 cores)

• Other possibilities: any multicore with wide vector instructions (Intel Xeon, AMD, ARM, …)

• ~All modern processors have wide SIMD!


Batched Traversal (Short Reads)

[Figure sequence, slides 34-42: a batch of reads traverses the tree over the array 9 0 4 7 2 6 1 5 8 3; x marks appear on individual reads as the traversal proceeds]

• Some reads may accumulate > k diffs before others

• Stop descending when all reads either have > k diffs or are completely matched with at most k diffs

• Continue on next branch starting from batch on top of stack

Performance?

• Each stage of BLAST costs more but processes less input.

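• 98% of threads idle for 110/111 ms

• 1.99% of threads idle for 100/111 ms

• SIMD EFFICIENCY: 1.1%

[Figure: BLAST pipeline annotated with per-stage time and percentage of inputs reaching each stage]

Substring Matching: 1 ms, 100%
Ungapped Filter: 10 ms, 2%
Gapped Filter: 100 ms, 0.01%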
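The 1.1% figure follows from the per-stage numbers. A small sketch of the arithmetic, assuming every lane is held for the full 111 ms pipeline regardless of when its input is discarded (the function name is illustrative):

```python
def simd_efficiency(stage_cost_ms, fraction_reaching):
    """Fraction of lane-time spent on useful work when every lane waits for the full pipeline."""
    total_time = sum(stage_cost_ms)
    useful = sum(c * f for c, f in zip(stage_cost_ms, fraction_reaching))
    return useful / total_time


# Stage costs of 1, 10, and 100 ms; 100%, 2%, and 0.01% of inputs reach each stage.
eff = simd_efficiency([1, 10, 100], [1.0, 0.02, 0.0001])
print(f"{eff:.1%}")  # prints 1.1%
```

The average lane does 1 + 0.2 + 0.01 = 1.21 ms of useful work out of 111 ms, which is about 1.1%.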


Irregular Computations

• DNA alignment is an irregular computation: different inputs (i.e. DB locations, reads) require different amounts of work to process.

• Antithesis of, e.g., linear algebra calculations that are easily vectorized

• Irregular computations are highly inefficient if naively implemented on SIMD processors.


The Key Problem

• How can we efficiently map irregular computations onto a SIMD architecture?


Talk Overview

• Problems: DNA alignment and read mapping

• Algorithmic approach – Why SIMD?

• MERCATOR overview and performance

• Research challenges


Pause for MERCATOR demo


MERCATOR Paradigm

• Application processes a long stream of inputs

• Application graph consists of nodes (computations), edges (data transfer)

• Data flows through graph of computations

• Irregularity: paths differ per input, and each input to a node generates 0, 1, or multiple outputs


Handling Irregularity

• Each edge between nodes has a queue

• MERCATOR queues inputs to a node until there are enough to fill all its SIMD lanes

• A node is fired only when it has a “full ensemble” of inputs in all lanes.
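The queueing idea can be sketched as a toy simulation. This is not MERCATOR's scheduler: it assumes a linear chain of nodes processed one node at a time, a SIMD width of 4, and a final partial firing to flush the tail of each queue. The name `run_pipeline` is hypothetical.

```python
from collections import deque


def run_pipeline(inputs, stages, width=4):
    """Push inputs through a chain of stages, firing each stage on ensembles of `width` items.

    Each stage maps one item to a list of 0, 1, or more outputs.
    Returns (outputs, firings), where firings lists (stage_index, lanes_used).
    """
    queue = deque(inputs)
    firings = []
    for s, stage in enumerate(stages):
        next_queue = deque()
        while queue:
            # Take a full ensemble; only the final flush of a queue may be partial.
            lanes = [queue.popleft() for _ in range(min(width, len(queue)))]
            firings.append((s, len(lanes)))
            for item in lanes:  # on a GPU these would run in lockstep, one item per lane
                next_queue.extend(stage(item))
        queue = next_queue
    return list(queue), firings


outputs, firings = run_pipeline(
    range(16),
    [lambda x: [x] if x % 2 == 0 else [],   # keeps half the inputs
     lambda x: [x] if x % 4 == 0 else []],  # keeps half of the survivors
)
```

Because survivors are requeued between nodes, the second stage fires twice with all 4 lanes occupied, rather than four times with half of its lanes idle.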


Illustration of Queues

[Figure sequence, slides 50-59: illustration of queues; x marks appear on some items in several frames]

A Few Complications

• Shared Code – two or more nodes may do same thing (e.g. Viola-Jones)

• Overhead – queueing isn’t free

• Asynchrony – must use multiple processors, each with multiple SIMD lanes

• Ordering – are inputs processed “in order”?


Exploiting Shared Code

• “Module type” CUDA code

• Multiple nodes with same function have same module type

• We execute all nodes of a given module type in parallel!

• [Requires pulling data from each node’s queue concurrently]


Minimizing Overhead

• Queue manipulation is itself parallelized

• Easy case: “read next k inputs from queue into threads 1..k.”

• More fun: “read k total inputs from all queues combined into threads 1..k, and remember which queue each input came from.”
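The "more fun" case can be sketched with a scan over queue lengths followed by a per-thread binary search. This is an illustrative sequential sketch, not MERCATOR's CUDA code: the function name is hypothetical, threads are numbered from 0 rather than 1, and the loop stands in for lanes that would run in parallel.

```python
from bisect import bisect_right
from itertools import accumulate


def gather_from_queues(queues, k):
    """Assign up to k queued items to threads 0..k-1, tagging each with its source queue."""
    # Inclusive scan of queue lengths: ends[q] = number of items in queues 0..q.
    ends = list(accumulate(len(q) for q in queues))
    total = ends[-1] if ends else 0
    assignments = []
    for thread in range(min(k, total)):  # each iteration is independent, one per SIMD lane
        # Binary search for the first queue whose running total exceeds this thread's index.
        q = bisect_right(ends, thread)
        offset = thread - (ends[q - 1] if q > 0 else 0)
        assignments.append((q, queues[q][offset]))
    return assignments
```

For queues `[["a", "b"], [], ["c", "d", "e"]]` and k = 4, threads 0 and 1 read from queue 0 and threads 2 and 3 read from queue 2; the empty queue is skipped without any special case.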


Sneaky Tricks

• Parallel scan

• Branch-free binary search

• Parallel output compaction

• [exploits, maintains input ordering]
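Output compaction via a scan can be sketched as follows. This is a sequential illustration under assumptions (the function name is hypothetical, and the loops stand in for parallel lanes); it shows why an exclusive scan of per-lane output counts lets every lane write without conflicts while preserving input order.

```python
from itertools import accumulate


def compact_outputs(lane_outputs):
    """Pack per-lane output lists into one dense array, preserving lane order."""
    counts = [len(out) for out in lane_outputs]
    # Exclusive scan: offsets[i] is the slot where lane i starts writing.
    offsets = [0] + list(accumulate(counts))[:-1] if counts else []
    dense = [None] * sum(counts)
    for lane, out in enumerate(lane_outputs):  # lanes write to disjoint slots
        for j, item in enumerate(out):
            dense[offsets[lane] + j] = item
    return dense
```

Lanes that produce no output simply contribute a count of zero, so the dense array has no holes.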


Results of Synthetic Trial


Dealing with Asynchrony

• Shared input / output buffers

• Output order with multiple processors?

• [Need stream-synchronized signaling]

• Associative (and commutative?) reductions


Applications with Cycles

• App graph can have back edges

• Issue: deadlock prevention

• [topology restrictions, queueing policy]

• Order preservation?


Optimization Opportunities

• Parameter tuning (queue sizes, scheduler, …)

• Latency-sensitive applications vs occupancy

• Fusing nodes to elide queueing (at what cost to occupancy?)


Want to Play?

• https://github.com/jdbuhler/mercator

• MERCATOR will be a testing ground for SIMD-aware irregular streaming computation

• Many interesting problems still to be solved!

• thesis topics
