Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators

Azzam Haidar, Yulu Jia, Piotr Luszczek, Stanimire Tomov, Asim YarKhan, Jack Dongarra

Feb 06, 2018




Problem Statement: Factorization

11/16/15, Austin, TX, USA ScalA 2015 2

$Ax = b$, $PA = LU$, $Ly = Pb$, $Ux = y$

GETF2 (panel factorization), TRSM (triangular solve), GEMM (matrix multiply)
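The solve sequence on this slide can be traced on a small system. The following is a minimal pure-Python sketch, assuming dense lists of floats and partial (row) pivoting; the function names `lu_factor` and `lu_solve` are illustrative and are not taken from the talk. With the convention PA = LU, the right-hand side receives the same row swaps as A before the forward substitution.

```python
def lu_factor(a):
    """Return (lu, perm): L and U packed in one matrix, plus the row order."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))  # row i of P*A is row perm[i] of A
    for k in range(n):
        # Partial pivoting: take the largest entry in column k.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # trailing (Schur) update
    return lu, perm


def lu_solve(lu, perm, b):
    n = len(lu)
    # Forward substitution with the permuted right-hand side (unit-diagonal L).
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution: U x = y.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


a = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
b = [5.0, -2.0, 9.0]
lu, perm = lu_factor(a)
x = lu_solve(lu, perm, b)  # close to [1.0, 1.0, 2.0]
```

In the blocked algorithm discussed in the talk, the column elimination corresponds to GETF2, the computation of a block row of U to TRSM, and the trailing update to GEMM.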


Problem Statement: Algorithm


$Ax = b$, $PA = LU$, $Ly = Pb$, $Ux = y$

GETF2, TRSM, GEMM


Problem Statement: Mapping to Hardware


$Ax = b$, $PA = LU$, $Ly = Pb$, $Ux = y$

GETF2, TRSM, GEMM


From Code to DAG


Start with canonical code:

    GETRF(A[1:n, 1:n]) {
      for i = 1, nb, 2*nb, … {
        GETF2(A[i:n, i:i+nb])
        TRSM(A[i:i+nb, i:i+nb], A[i:i+nb, i:n])
        GEMM(A[i:n, i:i+nb], A[i:i+nb, i:n], A[i+nb:n, i+nb:n])
      }
    }

Unwind the function calls:

    GETF2(A[1,1], A[2,1], …)
    TRSM(A[1,1], A[1,2])
    TRSM(A[1,1], A[1,3])
    GEMM(A[2,1], A[1,2], A[2,2])

Track dependences:

The function calls and their dependences form a DAG.


Matrix-View of Dependences


[Figure: matrix view of the dependences, with labels GPU and Xeon Phi]


From Code to DAG


Asynchronous Algorithm


Panel factorization

Pivoting (row swaps)

Triangular solve

Matrix multiply (Schur complement)

Schedule tasks according to:
- data sizes
- hardware weights
- affinity
- perform lookahead
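To make the lookahead criterion concrete, the sketch below compares two priority rules on a small hand-written fragment of the LU task graph. It is a toy model under stated assumptions: a single worker, a hypothetical `run_order` helper, and priorities chosen for illustration (by step, or by tile column); none of this is the authors' actual policy. Smaller priority values run first.

```python
import heapq


def run_order(deps, priority):
    """Return the order in which one worker would execute the tasks."""
    remaining = {t: set(d) for t, d in deps.items()}
    ready, order, queued = [], [], set()
    seq = 0  # tie-breaker: first-come, first-served

    def enqueue_ready():
        nonlocal seq
        for task, needs in remaining.items():
            if not needs and task not in queued:
                heapq.heappush(ready, (priority[task], seq, task))
                queued.add(task)
                seq += 1

    enqueue_ready()
    while ready:
        _, _, task = heapq.heappop(ready)
        order.append(task)
        for needs in remaining.values():
            needs.discard(task)
        enqueue_ready()
    return order


# Fragment of a tiled LU task graph (two steps, three tile columns).
deps = {
    "GETF2(0)": [],
    "TRSM(0,1)": ["GETF2(0)"],
    "TRSM(0,2)": ["GETF2(0)"],
    "GEMM(1,1,0)": ["GETF2(0)", "TRSM(0,1)"],
    "GEMM(1,2,0)": ["GETF2(0)", "TRSM(0,2)"],
    "GETF2(1)": ["GEMM(1,1,0)"],
    "TRSM(1,2)": ["GETF2(1)", "GEMM(1,2,0)"],
}
by_step = {"GETF2(0)": 0, "TRSM(0,1)": 0, "TRSM(0,2)": 0, "GEMM(1,1,0)": 0,
           "GEMM(1,2,0)": 0, "GETF2(1)": 1, "TRSM(1,2)": 1}
by_column = {"GETF2(0)": 0, "TRSM(0,1)": 1, "TRSM(0,2)": 2, "GEMM(1,1,0)": 1,
             "GEMM(1,2,0)": 2, "GETF2(1)": 1, "TRSM(1,2)": 2}

no_lookahead = run_order(deps, by_step)
lookahead = run_order(deps, by_column)
```

With the column-based priorities, `GETF2(1)` is the fourth task to run; with the step-based priorities it is the sixth. Starting the next panel earlier is what lets it overlap with the remaining trailing updates once more than one device is available.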


Main Features of the Scheduler

•  Resource capability weight for the task
   – w(kernel, device)
•  Adaptive scheduling with weights
•  Dynamic lookahead
•  Enhanced task priorities
   – Regulates lookahead (DAG exploration order)
•  Transparent data movement
   – Uses (and tracks) asynchronous data transfers
•  Dynamic data redistribution
   – Data is moved as needed and only if needed
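One way to read the capability weight w(kernel, device) is as a relative speed that turns an amount of work into an estimated run time on each device. The sketch below assigns each task to the device with the earliest estimated finish time. The weight values, the device names, and the functions `pick_device` and `assign` are all assumptions made for illustration; the slides do not specify how the scheduler combines weights with queue load.

```python
# Hypothetical relative speeds w(kernel, device); larger means faster.
WEIGHTS = {
    ("GEMM", "cpu"): 1.0, ("GEMM", "gpu"): 10.0, ("GEMM", "phi"): 6.0,
    ("GETF2", "cpu"): 1.0, ("GETF2", "gpu"): 0.2, ("GETF2", "phi"): 0.3,
}


def pick_device(kernel, work, busy_until):
    """Choose the device with the earliest estimated finish time."""
    def finish(dev):
        return busy_until[dev] + work / WEIGHTS[(kernel, dev)]
    return min(busy_until, key=finish)


def assign(tasks):
    """Place (kernel, work) pairs one by one and return (kernel, device) pairs."""
    busy_until = {"cpu": 0.0, "gpu": 0.0, "phi": 0.0}
    placement = []
    for kernel, work in tasks:
        dev = pick_device(kernel, work, busy_until)
        busy_until[dev] += work / WEIGHTS[(kernel, dev)]
        placement.append((kernel, dev))
    return placement


tasks = [("GETF2", 1.0)] + [("GEMM", 10.0)] * 4
placement = assign(tasks)
```

With these made-up numbers the panel factorization stays on the CPU, while the matrix multiplies go to the GPU, then the Xeon Phi, then the GPU twice more, because the accelerator queues fill at different rates.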


Runtime Scheduler Loop


    def main_thread_loop(user_code, queues, threshold):
        # if there are enough tasks for cores
        if queues.total_length() > threshold:
            # resume user’s code for submitting tasks
            task = user_code.get_next_task()
            q = queues.find_closest_queue(task.devices())
            q.insert(task, task.priority())
        else:
            task = queues.steal_task(main_cpu)
            # if tasks available for stealing
            if not task.is_empty():
                task.execute()  # execute single task
            else:
                queues.wait_for_tasks()


Effect of Dynamic Scheduling


Performance: Kepler K20


Performance: Xeon Phi


Performance: Xeon Phi + Kepler K40


Summary of Contributions

•  Fine- and coarse-grained tasks for scheduling
•  Capability weights for hardware description
•  Unified scheduling across CPUs, GPUs, and coprocessors
•  Synchronous memory-transfer model with transparent asynchronous progress
