
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines: Porting the Barcelona OpenMP Tasks Suite to Go

Aug 08, 2015


Artjom Simon
Page 1: Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines: Porting the Barcelona OpenMP Tasks Suite to Go

Scalability comparison: Traditional fork-join-based parallelism vs. Goroutines

Porting the Barcelona OpenMP Tasks Suite to Go

Artjom Simon
https://github.com/artjomsimon/go-bots

Know Your Gophers

2015-05-12

Page 2

Traditional approach in C

Cilk:

    cilk_spawn task();
    [...]
    cilk_sync;

OpenMP:

    #pragma omp parallel
    {
        #pragma omp task
        [...]
        #pragma omp taskwait
        [...]
    }

Page 3

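Go: Parallel For Loop Pattern [1]

    // Requires import "runtime"; n and work are supplied by the caller.
    queue := make(chan int)
    done := make(chan bool)
    NP := runtime.GOMAXPROCS(0) // an argument of 0 only queries the setting

    // Producer: feed the loop indices into the queue, then close it.
    go func() {
        for i := 0; i < n; i++ { queue <- i }
        close(queue)
    }()

    // Workers: NP goroutines pull indices until the queue is closed.
    for i := 0; i < NP; i++ {
        go func() {
            for i := range queue { work(i) }
            done <- true
        }()
    }

    // Join: wait for every worker to signal completion.
    for i := 0; i < NP; i++ { <-done }

[1] Benchmarking Usability and Performance of Multicore Languages, PDF: http://arxiv.org/pdf/1302.2837v2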

Page 4

Barcelona OpenMP Tasks Suite [2]

[2] https://github.com/alcides/bots

Page 5

...used in academic publications [3]

[3] http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf

Page 6

Micro benchmarks

[Plot: spc (opteron). x-axis: OMP_NUM_THREADS (1, 8, 16, 32, 48); y-axis: speedup relative to sequential (1 to 48); series: n=1000µs, n=100µs, n=10µs]

Figure: Speedup spc (icc), 10 000 Tasks

Page 7

Task pools: Variations

• notaskpool: Starts goroutines as needed, without any limit; uses a WaitGroup for synchronization

• simple-queue: A buffered channel of func()s holds the task queue; n goroutines receive the func()s and execute them

• goroutines-dispatcher: A dispatcher function executes a task in a goroutine only if a global counter of running goroutines is < n

• const-goroutines: n goroutines remove tasks from a doubly linked list
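
The first three variations can be sketched in Go roughly as follows. This is not the code from the go-bots repository: all function names are hypothetical, and running a task inline when the dispatcher's limit is reached is an assumption, since the slide does not say what happens in that case.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runNoTaskPool: one goroutine per task, no limit; a WaitGroup joins them.
func runNoTaskPool(tasks []func()) {
	var wg sync.WaitGroup
	for _, t := range tasks {
		wg.Add(1)
		go func(t func()) {
			defer wg.Done()
			t()
		}(t)
	}
	wg.Wait()
}

// runSimpleQueue: a buffered channel holds the tasks; n workers drain it.
func runSimpleQueue(tasks []func(), n int) {
	queue := make(chan func(), len(tasks))
	for _, t := range tasks {
		queue <- t
	}
	close(queue)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range queue {
				t()
			}
		}()
	}
	wg.Wait()
}

// runDispatcher: a task gets its own goroutine only while fewer than n
// are running; otherwise it runs inline in the caller (assumption).
func runDispatcher(tasks []func(), n int64) {
	var running int64
	var wg sync.WaitGroup
	for _, t := range tasks {
		if atomic.AddInt64(&running, 1) <= n {
			wg.Add(1)
			go func(t func()) {
				defer wg.Done()
				defer atomic.AddInt64(&running, -1)
				t()
			}(t)
		} else {
			atomic.AddInt64(&running, -1) // give the reserved slot back
			t()
		}
	}
	wg.Wait()
}

// makeTasks returns count tasks; task i adds i+1 to *sum atomically.
func makeTasks(count int, sum *int64) []func() {
	tasks := make([]func(), count)
	for i := range tasks {
		v := int64(i + 1)
		tasks[i] = func() { atomic.AddInt64(sum, v) }
	}
	return tasks
}

func main() {
	var a, b, c int64
	runNoTaskPool(makeTasks(100, &a))
	runSimpleQueue(makeTasks(100, &b), 4)
	runDispatcher(makeTasks(100, &c), 4)
	fmt.Println(a, b, c) // each sum is 1+2+...+100 = 5050
}
```

The const-goroutines variant is left out because the slide gives no detail on how its workers detect that the list is finally empty.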

Page 8

Micro benchmarks

[Plot: spc (opteron). x-axis: OMP_NUM_THREADS (1, 8, 16, 32, 48); y-axis: speedup relative to sequential (1 to 48); series: gcc, icc, clang, go-notaskpool, go-simple-queue, go-const-goroutines, go-goroutine-dispatch]

Figure: Speedup spc, n=100µs, 10 000 Tasks

Page 9

BOTS: nqueens

• N-Queens problem with n=12

• Recursive backtracking search

• No cut-off when creating tasks
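
"No cut-off" means that a new task is created at every level of the recursion, however small the remaining subproblem is. A minimal Go sketch of that idea is shown below. It is not the port from the go-bots repository: the function names are hypothetical, and joining everything through a single WaitGroup (instead of one wait per recursion level, as a taskwait would do) is a simplification.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// safe reports whether a queen can be placed in row len(cols) at column c,
// given the columns of the queens already placed in the earlier rows.
func safe(cols []int, c int) bool {
	row := len(cols)
	for r, qc := range cols {
		if qc == c || qc-c == row-r || c-qc == row-r {
			return false
		}
	}
	return true
}

// solve places a queen in the next row. Every valid placement is explored
// in its own goroutine, with no cut-off at any recursion depth.
func solve(n int, cols []int, count *int64, wg *sync.WaitGroup) {
	defer wg.Done()
	if len(cols) == n {
		atomic.AddInt64(count, 1)
		return
	}
	for c := 0; c < n; c++ {
		if !safe(cols, c) {
			continue
		}
		// Copy the partial board so that sibling tasks do not share state.
		next := make([]int, len(cols)+1)
		copy(next, cols)
		next[len(cols)] = c
		wg.Add(1)
		go solve(n, next, count, wg)
	}
}

// countSolutions returns the number of solutions on an n×n board.
func countSolutions(n int) int64 {
	var count int64
	var wg sync.WaitGroup
	wg.Add(1)
	go solve(n, nil, &count, &wg)
	wg.Wait()
	return count
}

func main() {
	fmt.Println(countSolutions(8)) // prints 92
}
```

Each child is registered with wg.Add before its parent calls wg.Done, so the counter cannot reach zero while work is still outstanding.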

Page 10

Results: BOTS (nqueens)

[Plot: nqueens (opteron). x-axis: CPU cores (1, 8, 16, 32, 48); y-axis: speedup (0 to 10); series: gcc, icc, clang, go-const-goroutines, go-dispatch, go-notaskpool, gccgo-const-goroutines, gccgo-dispatch, gccgo-notaskpool]

Figure: Speedup for nqueens -n 12, parallel

Page 11

BOTS: sparselu

• LU factorization of a sparse block matrix

• 50x50 matrix, 100x100 sub-block matrices

Page 12

Results: BOTS (sparselu)

[Plot: sparselu (opteron). x-axis: CPU cores (1, 8, 16, 32, 48); y-axis: speedup (0 to 30); series: gcc, icc, clang, go-const-goroutines, go-dispatch, go-notaskpool, go-simplequeue, gccgo-const-goroutines, gccgo-dispatch, gccgo-notaskpool]

Figure: Speedup sparselu -n 50 -m 100, parallel

Page 13

Problem: Recursion (dependencies!)

Page 14

Memory

[Plot: spc-par 10000 1000, 4 threads (opteron). y-axis: RSS [KBytes], 0 to 1.5·10^5; series: gcc, icc, clang, go-notaskpool, go-const-goroutines, gccgo-notaskpool, gccgo-const-goroutines]

Figure: Memory comparison (Resident Set Size), spc parallel

Page 15

Side effect: Possible heap corruption bug in Go 1.4?

Page 16

Questions?

Page 17

Thank you!

Page 18

Image credits

Icon N-Queens problem: Colin M.L. Burnett, Wikimedia Commons (GFDL & BSD & GPL), http://commons.wikimedia.org/wiki/File:Chess_d45.svg (2015-03-09)