
CILK/CILK++ and Reducers

Jul 14, 2015


Yunming Zhang
Page 1: CILK/CILK++ and Reducers

CILK/CILK++

AND

REDUCERS

YUNMING ZHANG

RICE UNIVERSITY


Page 2: CILK/CILK++ and Reducers

OUTLINE

• CILK and CILK++ Language Features and Usage

• Work stealing runtime

• CILK++ Reducers

• Conclusions


Page 3: CILK/CILK++ and Reducers

IDEALIZED SHARED

MEMORY ARCHITECTURE


• Hardware model

• Processors

• Shared global memory

• Software model

• Threads

• Shared variables

• Communication

• Synchronization

Slide from Comp 422 Rice University Lecture 4

Page 4: CILK/CILK++ and Reducers

CILK AND CILK++

DESIGN GOALS

• Programmer friendly

• Dynamic tasking

• Parallel extension to C

• Scalable performance

• Efficient runtime system

• Minimum program overhead


Page 5: CILK/CILK++ and Reducers

CILK KEYWORDS

• cilk: marks a function as a Cilk function

• spawn: the call can execute asynchronously in a concurrent thread

• sync: the current thread waits for all locally spawned functions to finish

Page 6: CILK/CILK++ and Reducers

CILK EXAMPLE

cilk int fib(int n) {
    if (n < 2)
        return n;
    else {
        int n1, n2;
        n1 = spawn fib(n-1);
        n2 = spawn fib(n-2);
        sync;
        return (n1 + n2);
    }
}

Borrowed from Comp 422 Rice University Lecture 4

Page 7: CILK/CILK++ and Reducers

CILK++ EXAMPLE

int fib(int n) {
    if (n < 2)
        return n;
    else {
        int n1, n2;
        n1 = cilk_spawn fib(n-1);
        n2 = fib(n-2);
        cilk_sync;
        return (n1 + n2);
    }
}

Borrowed from Comp 422 Rice University Lecture 4
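The fork-join shape of the fib example can be imitated in standard C++, which has no cilk_spawn. The sketch below is only an analogy: std::async stands in for the spawn, future::get() stands in for the sync, and the depth cutoff and the names fib_serial and fib_parallel are assumptions of this example. It starts real threads instead of using a work-stealing scheduler.

```cpp
#include <cassert>
#include <future>

// Serial elision: what remains when the spawn/sync keywords are deleted.
long fib_serial(int n) {
    assert(n >= 0);
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join version. 'depth' bounds how many recursion levels fork a thread,
// because std::async (unlike a Cilk spawn) is expensive.
long fib_parallel(int n, int depth) {
    assert(n >= 0);
    if (n < 2) return n;
    if (depth == 0) return fib_serial(n);
    // Stand-in for "n1 = cilk_spawn fib(n-1)": the child may run concurrently.
    std::future<long> n1 =
        std::async(std::launch::async, fib_parallel, n - 1, depth - 1);
    // The continuation runs in the calling thread, as on the slide.
    long n2 = fib_parallel(n - 2, depth - 1);
    // Stand-in for "cilk_sync": wait for the spawned child.
    return n1.get() + n2;
}
```

A call such as fib_parallel(20, 3) forks on the top three recursion levels only and computes the rest serially.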

Page 8: CILK/CILK++ and Reducers

CILK++ EXAMPLE

WITH DAG

Pictures from the talk “Reducers and Other Cilk++ Hyperobjects” by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), and Stephen Lewin-Berlin (Intel).

Page 9: CILK/CILK++ and Reducers

OUTLINE

• CILK and CILK++ Language Features and Usage

• Work stealing runtime

• CILK++ Reducers

• Conclusions


Page 10: CILK/CILK++ and Reducers

WORK FIRST

PRINCIPLE

• Work: T_1

• Critical path length: T_∞

• Number of processors: P

• Expected time

• T_P = T_1/P + O(T_∞)

• Parallel slackness assumption

• T_1/P >> C_∞ T_∞

Page 11: CILK/CILK++ and Reducers

WORK FIRST

PRINCIPLE

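• Minimize the scheduling overhead borne by the work, even at the expense of a longer critical path

• T_P ≤ C_1 T_S/P + C_∞ T_∞ ≈ C_1 T_S/P, where T_S is the running time of the serial program and C_1 = T_1/T_S is the work overhead

Minimize C_1 even at the expense of a larger C_∞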
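To make the trade-off concrete, the bound can be evaluated for sample numbers. The sketch below only does the arithmetic of T_P ≤ C_1 T_S/P + C_∞ T_∞; the function names and the sample values are assumptions of this example, not measurements.

```cpp
#include <cassert>

// Work term of the bound: C_1 * T_S / P.
double work_term(double c1, double ts, int p) {
    assert(p > 0);
    return c1 * ts / p;
}

// Critical-path term of the bound: C_inf * T_inf.
double span_term(double cinf, double tinf) {
    return cinf * tinf;
}

// Upper bound on T_P from the slide (arbitrary time units).
double time_bound(double c1, double ts, double cinf, double tinf, int p) {
    return work_term(c1, ts, p) + span_term(cinf, tinf);
}
```

With T_S = 1024, T_∞ = 4, and P = 8, the bound is 160 + 32 = 192 for C_1 = 1.25 and C_∞ = 8. Raising C_1 from 1.25 to 1.5 costs as much as doubling C_∞ from 8 to 16 (both give 224), and the gap widens as T_S/P grows relative to T_∞, which is why the work overhead is the term to minimize.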

Page 12: CILK/CILK++ and Reducers

WORK STEALING

DESIGN GOALS

• Minimizing contention

• Decentralized task deque

• Doubly linked deque

• Minimizing communication

• Steal work rather than push work

• Load balance across cores

• Lazy task creation

• Steal from the top of the deque

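The deque discipline in the list above can be sketched as follows: each worker owns one deque, works at the bottom, and thieves take from the top. This is a simplified illustration. The class name WorkDeque, the int task type, and the single mutex are assumptions of this sketch; the mutex is used only to keep the example short and is not how the Cilk runtime synchronizes.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// One deque per worker (decentralized). front = top, back = bottom.
class WorkDeque {
public:
    // Owner: push newly created work at the bottom.
    void push_bottom(int task) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push_back(task);
    }
    // Owner: take the most recently pushed work from the bottom.
    bool pop_bottom(int& task) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        task = tasks_.back();
        tasks_.pop_back();
        return true;
    }
    // Thief: take the oldest work from the top.
    bool steal_top(int& task) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        task = tasks_.front();
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<int> tasks_;
};
```

Because the owner and the thieves work at opposite ends, they only meet when the deque is nearly empty, which is the intuition behind stealing from the top.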

Page 13: CILK/CILK++ and Reducers

CILK WORK STEALING

SCHEDULER

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Pages 14-26: CILK/CILK++ and Reducers

CILK WORK STEALING SCHEDULER (continued)

[Thirteen further picture-only slides with the same title.] Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 27: CILK/CILK++ and Reducers

TWO CLONE

STRATEGY

• Fast clone

• Identical in most respects to the C elision of the Cilk program

• Very little execution overhead

• Sync statements compile to no-ops

• Allocates a continuation

• Program variables and instruction pointer

• Slow clone

• A spawned procedure is converted to its slow clone only when it is stolen

• Restores program state from the activation frame, which contains the local variables, the program counter, and other parts of the procedure instance
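The fast-clone idea above can be sketched in plain C++ as a single-threaded toy: before each "spawn" the procedure records a continuation on the worker's deque, runs the child as an ordinary call, and then pops the record. The Frame and Worker types, the entry numbering, and the absence of any thief are assumptions of this sketch, so the "stolen" branch is never taken here.

```cpp
#include <cassert>
#include <deque>

// Continuation record: where to resume (entry) and the saved local (n).
struct Frame {
    int entry;
    int n;
};

struct Worker {
    std::deque<Frame*> dq;  // back = bottom, front = top
    void push(Frame* f) { dq.push_back(f); }
    // Returns false if the frame is gone, i.e. a thief took it.
    bool pop() {
        if (dq.empty()) return false;
        dq.pop_back();
        return true;
    }
};

// Fast clone of fib: close to the serial code plus push/pop bookkeeping.
long fib_fast(Worker& w, int n) {
    if (n < 2) return n;
    Frame f{1, n};
    w.push(&f);                    // expose the continuation before "spawn"
    long x = fib_fast(w, n - 1);   // the child runs as a normal call
    if (!w.pop()) return 0;        // continuation was stolen: abandon it
    f.entry = 2;
    w.push(&f);
    long y = fib_fast(w, n - 2);
    if (!w.pop()) return 0;
    // sync: nothing to do, all children of a fast clone have completed
    return x + y;
}
```

With no thief present every pop succeeds, so the function behaves like the serial program and leaves the deque empty.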

Page 28: CILK/CILK++ and Reducers

FAST CLONE


Page 29: CILK/CILK++ and Reducers

SLOW CLONE

slow_fib(frame *_cilk_frame) {
    /* restore the state of the program from the frame */
    switch (_cilk_frame->header.entry) {   /* jump to the saved continuation */
        case 1: goto _cilk_sync1;
        case 2: goto _cilk_sync2;
        case 3: goto _cilk_sync3;
    }
    fast_fib(_cilk_frame->n - 1);          /* children run as fast clones */
    fast_fib(_cilk_frame->n - 2);
    sync;                                  /* not a no-op in the slow clone */
}

Page 30: CILK/CILK++ and Reducers

EXTENDED DEQUE

WITH CALL STACKS

[Figure: an extended deque of call stacks. Legend: stack frame, full frame.]

Page 31: CILK/CILK++ and Reducers

FRAMES

• C++ Main Frame

• Local variables of the procedure instance

• Temporary variables

• Linkage information for return values


Page 32: CILK/CILK++ and Reducers

FRAMES

• CILK++ Stack Frame

• Everything in C++ Main Frame

• Continuation

• Parent pointer

• Has exactly one child

• Used by Fast Clone

• A worker can have multiple Stack Frames


Page 33: CILK/CILK++ and Reducers

FRAMES

• CILK++ Full Frame (used by slow clone)

• Everything in CILK++ Stack Frame

• Lock

• Join counter

• List of children (may have more than one child)

• A worker has at most one Full Frame
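The two frame kinds listed on these slides can be written down as data structures. The sketch below is an assumption-laden illustration: the field types, the attach_child helper, and the reading of the join counter as the number of outstanding children are choices of this example, not the runtime's actual layout.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Stack frame: the ordinary C++ frame contents plus a continuation
// and a parent pointer.
struct StackFrame {
    StackFrame* parent = nullptr;
    int continuation = 0;   // stand-in for the saved resume point
};

// Full frame: everything in a stack frame plus a lock, a join counter,
// and a list of children.
struct FullFrame : StackFrame {
    std::mutex lock;
    int join_counter = 0;   // assumed: number of outstanding children
    std::vector<FullFrame*> children;
};

// Hypothetical helper: register a child under the parent's lock.
void attach_child(FullFrame& parent, FullFrame& child) {
    std::lock_guard<std::mutex> guard(parent.lock);
    child.parent = &parent;
    parent.children.push_back(&child);
    ++parent.join_counter;
}
```

The extra fields are what make a full frame costlier than a stack frame, which fits the slides' point that fast clones use stack frames and only stolen work is promoted to full frames.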

Page 34: CILK/CILK++ and Reducers

FUNCTION CALL

[Figure: extended deque before a function call. Legend: stack frame, full frame.]

Runtime operations walked through on the following slides: Function call, Spawn, Call return, Spawn return, Sync, Randomly steal, Provably good steal, Unconditionally steal, Resume full frame.

Page 35: CILK/CILK++ and Reducers

FUNCTION CALL

[Figure: extended deque after a function call. Label: new stack frame.]

Page 36: CILK/CILK++ and Reducers

SPAWN

[Figure: extended deque before a spawn call.]

Page 37: CILK/CILK++ and Reducers

SPAWN

[Figure: extended deque after a spawn call.]

Set continuation in last stack frame

Page 38: CILK/CILK++ and Reducers

RESUME FULL FRAME

[Figure: extended deque.]

Set the full frame to be the only frame in the call stack, then resume execution on the continuation

Page 39: CILK/CILK++ and Reducers

RANDOMLY STEAL

[Figure: extended deque. Label: steal this call stack.]

Page 40: CILK/CILK++ and Reducers

RANDOMLY STEAL

[Figure: extended deque. Labels: steal this call stack; 1, 1, 1.]

Page 41: CILK/CILK++ and Reducers

RANDOMLY STEAL

[Figure: extended deque. Labels: 1, 1, 1.]

Page 42: CILK/CILK++ and Reducers

PROVABLY GOOD

STEAL

[Figure: extended deque. Label: 0.]

Page 43: CILK/CILK++ and Reducers

UNCONDITIONALLY

STEAL

[Figure: extended deque. Label: 2.]

Page 44: CILK/CILK++ and Reducers

FUNCTION CALL

RETURN

[Figure: extended deque before return from a call, case 1.]

Page 45: CILK/CILK++ and Reducers

FUNCTION CALL

RETURN

[Figure: extended deque, return from a call, case 1.]

Page 46: CILK/CILK++ and Reducers

FUNCTION CALL

RETURN

[Figure: extended deque, return from a call, case 2.]

Worker executes an unconditional steal

Page 47: CILK/CILK++ and Reducers

SPAWN RETURN

[Figure: extended deque before spawn return, case 1.]

Page 48: CILK/CILK++ and Reducers

SPAWN RETURN

[Figure: extended deque after spawn return, case 1.]

Page 49: CILK/CILK++ and Reducers

SPAWN RETURN

[Figure: extended deque, return from a spawn, case 2.]

Worker executes a provably good steal

Page 50: CILK/CILK++ and Reducers

SYNC

[Figure: extended deque, sync case 1.]

Do nothing if it is a stack frame (no-op)

Page 51: CILK/CILK++ and Reducers

SYNC

[Figure: extended deque, sync case 2.]

Pop the frame, provably good steal

Page 52: CILK/CILK++ and Reducers

OUTLINE

• CILK and CILK++ Language Features and Usage

• Work stealing runtime

• CILK++ Reducers

• Conclusions


Page 53: CILK/CILK++ and Reducers

PROBLEMS WITH

NON-LOCAL VARIABLES

bool has_property(Node *);
List<Node *> output_list;

void walk(Node *x)
{
    if (x) {
        if (has_property(x))
            output_list.push_back(x);   // race: parallel strands update the same list
        cilk_spawn walk(x->left);
        walk(x->right);
        cilk_sync;
    }
}

Page 54: CILK/CILK++ and Reducers

REDUCER

DESIGN GOALS

• Support parallelization of programs containing global variables

• Enable efficient parallel scaling by avoiding a single point of contention

• Provide deterministic results for associative reduce operations

• Operate independently of any control constructs

Page 55: CILK/CILK++ and Reducers

REDUCER EXAMPLE

bool has_property(Node *);
List_append_reducer<Node *> output_list;

void walk(Node *x)
{
    if (x) {
        if (has_property(x))
            output_list.push_back(x);
        cilk_spawn walk(x->left);
        walk(x->right);
        cilk_sync;
    }
}
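What the list-append reducer achieves can be imitated by hand in standard C++: every parallel branch appends to its own view, and the views are concatenated left before right so the result matches the serial order. The sketch below is not the reducer library. The Node layout, the even-key predicate, the depth cutoff, and the use of std::async are all assumptions of this example.

```cpp
#include <cassert>
#include <future>
#include <list>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Hypothetical predicate for the example: keep nodes with even keys.
bool has_property(const Node* x) { return x->key % 2 == 0; }

// Returns the view built by this subtree. An empty list is the identity.
std::list<const Node*> walk(const Node* x, int depth) {
    std::list<const Node*> view;
    if (!x) return view;
    if (has_property(x)) view.push_back(x);
    std::list<const Node*> left, right;
    if (depth > 0) {
        // "spawn": the left subtree fills its own private view
        auto child = std::async(std::launch::async, walk, x->left, depth - 1);
        right = walk(x->right, depth - 1);
        left = child.get();            // "sync"
    } else {
        left = walk(x->left, 0);
        right = walk(x->right, 0);
    }
    // reduce: concatenation is associative, so the order is deterministic
    view.splice(view.end(), left);
    view.splice(view.end(), right);
    return view;
}
```

No lock is needed because no two branches ever touch the same list, and the output equals that of a serial preorder walk no matter how the branches are scheduled.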

Page 56: CILK/CILK++ and Reducers

HYPER OBJECTS

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 57: CILK/CILK++ and Reducers

REDUCER

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 58: CILK/CILK++ and Reducers

SEMANTICS OF

REDUCERS

• The child strand owns the view that the parent function owned before the cilk_spawn

• The parent strand owns a new view, initialized to the identity view e

• A special optimization ensures that if a view is unchanged when combined with the identity view

• The parent strand P owns the views from completed child strands

Page 59: CILK/CILK++ and Reducers

REDUCING OVER LIST

CONCATENATION

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 60: CILK/CILK++ and Reducers

REDUCING OVER LIST

CONCATENATION

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 61: CILK/CILK++ and Reducers

IMPLEMENTATION OF

REDUCER

• Each worker maintains a hypermap

• Hypermap

• Maps reducers to their views

• User

• The view of the current procedure

• Children

• The view of the children procedures

• Right

• The view of right sibling

• Identity

• The default value of a view


Page 62: CILK/CILK++ and Reducers

UNDERSTANDING

HYPERMAPS

bool has_property(Node *);
List_append_reducer<Node *> output_list;

void walk(Node *x)                      // Proc A
{
    if (x) {
        if (has_property(x))
            output_list.push_back(x);
        cilk_spawn walk(x->left);       // Proc B
        cilk_spawn walk(x->right);      // Proc C
        cilk_sync;
    }
}

Page 63: CILK/CILK++ and Reducers

HYPERMAP CREATION

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Pages 64-67: CILK/CILK++ and Reducers

HYPERMAP CREATION (continued)

[Four further picture-only slides with the same title.] Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 68: CILK/CILK++ and Reducers

LOOK UP FAILURE

• Inserts a view containing an identity element for the reducer into the hypermap.

• Follows the lazy principle

• Look up returns the newly inserted identity view
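The lookup-failure rule above amounts to lazy insertion of an identity view. A minimal sketch follows, assuming an integer reducer key and a vector as the view type. Both are stand-ins chosen for this example, and the Hypermap struct is hypothetical rather than the runtime's own class.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

using ReducerId = int;           // hypothetical key identifying a reducer
using View = std::vector<int>;   // list-append view; identity is the empty list

struct Hypermap {
    std::unordered_map<ReducerId, View> views;

    // Look up the view for a reducer. On failure, insert an identity view
    // and return it, so views exist only for reducers actually accessed.
    View& lookup(ReducerId r) {
        auto it = views.find(r);
        if (it == views.end())
            it = views.emplace(r, View{}).first;
        return it->second;
    }
};
```

A freshly created hypermap therefore stays empty until a strand first touches a reducer, which keeps the cost of creating one low.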

Page 69: CILK/CILK++ and Reducers

RANDOM WORK

STEALING

A random steal operation steals a full frame P and replaces it with a new full frame C in the victim.

USER_C ← USER_P;

USER_P ← ∅;

CHILDREN_P ← ∅;

RIGHT_P ← ∅.

Page 70: CILK/CILK++ and Reducers

RANDOM WORK

STEALING

Pictures from “Reducers and Other Cilk++ Hyperobjects” (Frigo, Halpern, Leiserson, Lewin-Berlin).

Page 71: CILK/CILK++ and Reducers

RETURN FROM A CALL

Let C be a child frame of the parent frame P that originally called C, and suppose that C returns.

• If C is a stack frame, do nothing.

• If C is a full frame:

• Transfer ownership of the view

• Children and Right are empty

• USER_P ← USER_C

Page 72: CILK/CILK++ and Reducers

RETURN FROM A

SPAWN

Let C be a child frame of the parent frame P that originally spawned C, and suppose that C returns.

• Always do USER_C ← REDUCE(USER_C, RIGHT_C)

• If C is a stack frame, do nothing

• If C is a full frame

• If C has a left sibling L,

• RIGHT_L ← REDUCE(RIGHT_L, USER_C)

• Otherwise C is the leftmost child

• CHILDREN_P ← REDUCE(CHILDREN_P, USER_C)

Page 73: CILK/CILK++ and Reducers

SYNC

A cilk_sync statement waits until all children have completed. When frame P executes a cilk_sync, one of the following two cases applies:

• If P is a stack frame, do nothing.

• If P is a full frame,

• USER_P ← REDUCE(CHILDREN_P, USER_P).
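The hypermap updates for a random steal, a return from a spawn, and a sync can be traced on a tiny example. The sketch below makes strong simplifying assumptions: there is a single reducer, a view is a std::string, REDUCE is concatenation, each hypermap collapses to one view, and the function names on_random_steal, on_spawn_return, and on_sync are invented for this example.

```cpp
#include <cassert>
#include <string>

using View = std::string;   // identity view is the empty string

// REDUCE for a list-append style reducer: left operand first.
View reduce(const View& left, const View& right) { return left + right; }

// The three hypermaps of a full frame, collapsed to one view each.
struct FullFrame {
    View user, children, right;
};

// Random steal: P is stolen and replaced by a new frame C in the victim.
void on_random_steal(FullFrame& p, FullFrame& c) {
    c.user = p.user;
    p.user.clear();
    p.children.clear();
    p.right.clear();
}

// Return from a spawn of full frame C, whose parent is P.
void on_spawn_return(FullFrame& p, FullFrame& c, FullFrame* left_sibling) {
    c.user = reduce(c.user, c.right);
    if (left_sibling)
        left_sibling->right = reduce(left_sibling->right, c.user);
    else                                  // C is the leftmost child
        p.children = reduce(p.children, c.user);
}

// Sync in full frame P.
void on_sync(FullFrame& p) { p.user = reduce(p.children, p.user); }
```

Starting with USER_P = "a", a steal moves "a" to C. If C then appends "b" while the thief appends "c" to P's fresh view, the spawn return and the sync rebuild "abc", the same order a serial run would produce.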

Page 74: CILK/CILK++ and Reducers

BENEFITS OF

REDUCERS


Page 75: CILK/CILK++ and Reducers

OUTLINE

• CILK and CILK++ Language Features and Usage

• Work stealing runtime

• CILK++ Reducers

• Conclusions


Page 76: CILK/CILK++ and Reducers

CONCLUSIONS

• CILK and CILK++ provide a programmer-friendly programming model

• Extension to C

• Incremental parallelism

• Scaling on future machines

• Non-compromising performance

• Work stealing runtime

• Minimizing overheads

• Reducers


Page 77: CILK/CILK++ and Reducers

FINAL NOTES

• Designed for an idealized shared memory model

• Today’s architectures are typically NUMA

• Task creation can be lazier

• http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6012915&tag=1

• cilk_for

• Divide and conquer parallelization
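The last two notes pair a parallel loop with divide and conquer. One way to picture that pairing is to split the iteration range recursively until the pieces are small. The sketch below illustrates the idea in standard C++ and is not the cilk_for implementation: parallel_for, the grain parameter, and the use of std::async are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Run body(i) for every i in [lo, hi), splitting the range in half
// recursively until a piece holds at most 'grain' iterations.
template <typename F>
void parallel_for(std::size_t lo, std::size_t hi, std::size_t grain, F body) {
    assert(grain >= 1);
    if (hi - lo <= grain) {
        for (std::size_t i = lo; i < hi; ++i) body(i);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    // "spawn" the left half, run the right half in the calling thread
    auto left = std::async(std::launch::async,
                           [=] { parallel_for(lo, mid, grain, body); });
    parallel_for(mid, hi, grain, body);
    left.get();   // "sync"
}
```

For example, parallel_for(0, v.size(), 128, [&v](std::size_t i) { v[i] = static_cast<int>(i) * 2; }) fills a vector v in pieces of at most 128 elements. The grain size trades scheduling overhead against available parallelism.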