Parallel Inclusion-based Points-to Analysis
Mario Méndez-Lojo, Augustine Mathew, Keshav Pingali
The University of Texas at Austin (USA)

Dec 17, 2015


Parallel Inclusion-based Points-to Analysis

Mario Méndez-Lojo Augustine Mathew

Keshav Pingali

The University of Texas at Austin (USA)


Points-to analysis

• Static analysis technique
  – approximates the set of locations a pointer variable may point to
  – useful for optimization, verification, …

• Dimensions
  – flow sensitivity
      What is the set of locations pointed to by x after  x=&a; x=&b;  ?
      flow sensitive: {b}, flow insensitive: {a,b}
  – context sensitivity

• Focus: context-insensitive + flow-insensitive solutions
  – inclusion-based, not unification-based
  – available in modern compilers (gcc, LLVM, …)
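The flow-sensitivity example on this slide can be sketched in a few lines of Python. This is only a toy under stated assumptions: the program is straight-line code made of address-of assignments, and the function names and the (variable, location) encoding are hypothetical, not taken from the slides.

```python
def flow_insensitive(stmts):
    """Ignore statement order: a variable may point to every location ever assigned to it."""
    pts = {}
    for var, loc in stmts:
        pts.setdefault(var, set()).add(loc)
    return pts


def flow_sensitive(stmts):
    """Respect statement order: in straight-line code only the last assignment survives."""
    pts = {}
    for var, loc in stmts:
        pts[var] = {loc}  # strong update: the new target replaces the old ones
    return pts


# x=&a; x=&b;  encoded as (variable, location) pairs
program = [("x", "a"), ("x", "b")]
print("flow sensitive:  ", sorted(flow_sensitive(program)["x"]))
print("flow insensitive:", sorted(flow_insensitive(program)["x"]))
```

The flow-insensitive version is the one the rest of the talk is concerned with: it never removes a target, so the solution only grows.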


Inclusion-based points-to analysis

• First proposed by Andersen [Andersen thesis’94]

• Much research focused on performance improvements
  – heuristics for cycle detection [Fahndrich PLDI’98; Hardekopf PLDI’07]
  – offline preprocessing [Rountev PLDI’00]
  – better ordering [Magno CGO’09]
  – BDD-based representation [Lhotak PLDI’04]
  – …

• What about parallelization?
  – “future work” [Khalon PLDI’07, Magno CGO’09, …]
  – never done before


Parallel inclusion-based points-to analysis

• Challenges
  – highly irregular code
      • BDD, sparse bit vectors, etc.
  – some phases of the algorithms are difficult to parallelize
      • SCC detection/DFS

• Contributions
  1. novel graph formulation
  2. parallelization of Andersen’s algorithm
      • exploits algorithmic structure
      • up to 5x speedup vs. Hardekopf & Lin’s state-of-the-art implementation [PLDI’07]


Agenda

• inclusion-based points-to analysis
• graph formulation
• parallel inclusion-based points-to analysis
• efficient parallelization
• parallelization of graph (irregular) algorithms


Andersen’s algorithm for C programs

1. Extract pointer assignments: a=&b, a=b, a=*b, *a=b

2. Transform statements into set constraints

3. Iteratively solve system of constraints

C code     name          constraint
a = &b     address of    pts(a) ⊇ {b}
a = b      copy          pts(a) ⊇ pts(b)
a = *b     load          ∀v ∈ pts(b) : pts(a) ⊇ pts(v)
*a = b     store         ∀v ∈ pts(a) : pts(v) ⊇ pts(b)


Example

program

a=&v;

*a=b;

b=x;

x=&w;


constraints

∀v ∈ pts(a) : pts(v) ⊇ pts(b)
pts(b) ⊇ pts(x)
pts(a) ⊇ {v}
pts(x) ⊇ {w}

points-to sets (initial)

     pts
a    {}
b    {}
v    {}
w    {}
x    {}


points-to sets (solved)

     pts
a    {v}
b    {w}
v    {w}
w    {}
x    {w}
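The three steps of Andersen's algorithm can be checked against this example with a naive fixpoint solver. The sketch below is an illustration, not the authors' implementation: the tuple encoding of statements and the function name `solve` are assumptions, and no optimization (cycle detection, worklists, BDDs) is attempted.

```python
from collections import defaultdict


def solve(constraints):
    """Naive fixpoint solver for the four constraint kinds.

    Each constraint is a tuple (kind, a, b) standing for one C statement:
      ("addr",  a, b)   a = &b    pts(a) ⊇ {b}
      ("copy",  a, b)   a = b     pts(a) ⊇ pts(b)
      ("load",  a, b)   a = *b    ∀v ∈ pts(b) : pts(a) ⊇ pts(v)
      ("store", a, b)   *a = b    ∀v ∈ pts(a) : pts(v) ⊇ pts(b)
    """
    pts = defaultdict(set)
    changed = True
    while changed:  # repeat until no points-to set grows
        changed = False
        for kind, a, b in constraints:
            if kind == "addr":
                updates = [(a, {b})]
            elif kind == "copy":
                updates = [(a, pts[b])]
            elif kind == "load":
                updates = [(a, pts[v]) for v in list(pts[b])]
            elif kind == "store":
                updates = [(v, pts[b]) for v in list(pts[a])]
            else:
                raise ValueError(f"unknown constraint kind: {kind}")
            for target, new in updates:
                if not new <= pts[target]:  # constraint currently violated
                    pts[target] |= new
                    changed = True
    return pts


# a=&v; *a=b; b=x; x=&w;
program = [("addr", "a", "v"), ("store", "a", "b"),
           ("copy", "b", "x"), ("addr", "x", "w")]
result = solve(program)
for var in "abvwx":
    print(var, sorted(result[var]))
```

Because sets only grow and the number of variables is finite, the loop terminates; the order in which constraints are visited affects the number of sweeps but not the final solution.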


Constraint representation shortcomings

• Difficult to reason about the algorithm
  – separate representations
      • constraints
      • points-to sets
  – in parallel
      • which constraints can be processed simultaneously?
      • which points-to sets can be modified simultaneously?

• Cycle collapsing complicates things
  – representative table

• Need a simpler representation


Proposed graph representation

1. Extract pointer assignments: a=&b, a=b, a=*b, *a=b

2. Create initial constraint graph
  – nodes ≡ variables
  – edges ≡ statements

3. Apply graph rewrite rules until fixpoint (next slide)

C code     name          edge
a = &b     address of    points-to edge [diagram]
a = b      copy          copy edge [diagram]
a = *b     load          load edge [diagram]
*a = b     store         store edge [diagram]


Graph rewrite rules

name     rule         ensures
copy     [diagram]    pts(a) ⊇ pts(b)

Example:

program

b=&v;
a=&x;
a=b;
b=&w;


Graph rewrite rules

name     rule         ensures
copy     [diagram]    pts(a) ⊇ pts(b)
load     [diagram]    ∀v ∈ pts(b) : pts(a) ⊇ pts(v)
store    [diagram]    ∀v ∈ pts(a) : pts(v) ⊇ pts(b)
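The rule diagrams did not survive in this transcript, so the sketch below shows one plausible way to realize the three rules on a graph with typed edges. The edge encoding, the edge directions, and the choice to let the load and store rules add copy edges are assumptions of this sketch; only the "ensures" column is taken from the slide.

```python
def rewrite_to_fixpoint(p, copy, load, store):
    """Apply the copy, load and store rewrite rules until no edge can be added.

    All arguments are sets of (src, dst) pairs; p and copy are updated in place.
    Edge encoding (an assumption of this sketch):
      p     (a, v): a points to v            from a = &v
      copy  (b, a): values flow from b to a  from a = b
      load  (b, a): a reads through b        from a = *b
      store (b, a): b is written through a   from *a = b
    """
    changed = True
    while changed:
        changed = False
        new_p, new_copy = set(), set()
        for b, a in copy:    # copy rule: p-edge <b,v> gives p-edge <a,v>
            new_p |= {(a, v) for (x, v) in p if x == b}
        for b, a in load:    # load rule: p-edge <b,v> gives copy edge v -> a
            new_copy |= {(v, a) for (x, v) in p if x == b}
        for b, a in store:   # store rule: p-edge <a,v> gives copy edge b -> v
            new_copy |= {(b, v) for (x, v) in p if x == a}
        if not new_p <= p or not new_copy <= copy:
            p |= new_p
            copy |= new_copy
            changed = True
    return p


def pts(p, node):
    """Points-to set of a node, read off its outgoing p-edges."""
    return {v for (x, v) in p if x == node}


# Copy-rule example: b=&v; a=&x; a=b; b=&w;
g1 = rewrite_to_fixpoint({("b", "v"), ("a", "x"), ("b", "w")}, {("b", "a")}, set(), set())
print(sorted(pts(g1, "a")))

# Running example: a=&v; *a=b; b=x; x=&w;
g2 = rewrite_to_fixpoint({("a", "v"), ("x", "w")}, {("x", "b")}, set(), {("b", "a")})
print({n: sorted(pts(g2, n)) for n in "abvwx"})
```

Note that every rule only adds edges. Nothing is ever removed from the graph, which is the property the later conflict-detection slides rely on.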


Example revisited

program

a=&v;

*a=b;

b=x;

x=&w;

constraints

∀v ∈ pts(a) : pts(v) ⊇ pts(b)
pts(b) ⊇ pts(x)
pts(a) ⊇ {v}
pts(x) ⊇ {w}

     init    solve
     pts     pts
a    {}      {v}
b    {}      {w}
v    {}      {w}
w    {}      {}
x    {}      {w}

[figure: constraint graph, init → solve]


Advantages of graph formulation

• Solving process entirely expressed as graph rewriting

• Merging can be easily incorporated
  – equivalent edge
  – new rules

• Leverage existing techniques for parallelizing graph algorithms [next few slides]

[figure labels: push, equivalent]


Agenda

• inclusion-based points-to analysis
• graph formulation
• parallel inclusion-based points-to analysis
• efficient parallelization
• parallelization of irregular algorithms


Graph algorithms – Galois approach

• Active node
  – node where computation is needed
  – Andersen: node violating a rule’s invariant

• Activity
  – application of certain code to active node
  – Andersen: rewrite rule

• Neighborhood
  – set of nodes/edges read/written by activity
  – Andersen: 3 nodes involved in rule
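The three terms map naturally onto a worklist loop. The sketch below is a sequential illustration restricted to the copy rule; it does not use the Galois API, and the function name and the dictionary-based graph encoding are hypothetical.

```python
from collections import deque


def propagate(pts, copy_succ):
    """Worklist version of the copy rule.

    pts:       dict node -> set of pointed-to nodes (updated in place)
    copy_succ: dict node -> nodes that copy from it (b -> {a, ...} for a = b)
    """
    worklist = deque(pts)           # initially every node with a points-to set is active
    while worklist:
        b = worklist.popleft()      # pick an active node
        for a in copy_succ.get(b, ()):
            # Activity: apply the copy rule to the copy edge b -> a.
            # Neighborhood: b (read), a (written) and the targets v in pts(b).
            missing = pts[b] - pts.setdefault(a, set())
            if missing:
                pts[a] |= missing
                worklist.append(a)  # a changed, so rules leaving a may now be violated
    return pts


# x=&w; b=x; c=b;
solution = propagate({"x": {"w"}}, {"x": {"b"}, "b": {"c"}})
print({n: sorted(s) for n, s in solution.items()})
```

A parallel runtime would hand different worklist items to different threads; the question on the following slides is when two such activities may safely run at the same time.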


Parallelization of graph algorithms


• Correct parallel execution
  – neighborhoods do not overlap → activities can be executed in parallel
  – baseline conflict detection policy

• Implementation
  – use speculation
  – each node has an associated exclusive abstract lock
  – graph operations → acquire locks on read/written nodes
  – lock already owned → conflict → activity rolled back
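The baseline policy can be illustrated with a deterministic, single-threaded simulation of two activities in flight. This is a simplified sketch, not the Galois implementation: the class `AbstractLocks` is hypothetical, and locks are taken up front for the whole neighborhood rather than lazily during graph operations, so the "rollback" only has to undo lock acquisitions.

```python
class AbstractLocks:
    """One exclusive abstract lock per graph node, owned by at most one activity."""

    def __init__(self):
        self.owner = {}  # node -> id of the activity that holds its lock

    def try_acquire(self, activity, neighborhood):
        """Lock every node of the neighborhood, or roll back and report a conflict."""
        taken = []
        for node in neighborhood:
            holder = self.owner.get(node)
            if holder is not None and holder != activity:
                for n in taken:          # conflict: undo the locks taken so far
                    del self.owner[n]
                return False
            if holder is None:
                self.owner[node] = activity
                taken.append(node)
        return True

    def release(self, activity):
        """Drop all locks held by an activity once it has committed."""
        self.owner = {n: a for n, a in self.owner.items() if a != activity}


locks = AbstractLocks()
p_edges = set()

# Two copy-rule activities that both read node b and both target node v.
started_1 = locks.try_acquire(1, ["b", "a", "v"])   # would add p-edge <a,v>
started_2 = locks.try_acquire(2, ["b", "x", "v"])   # would add p-edge <x,v>; overlaps on b and v

if started_1:
    p_edges.add(("a", "v"))
    locks.release(1)

retried_2 = locks.try_acquire(2, ["b", "x", "v"])   # the rolled-back activity is retried
if retried_2:
    p_edges.add(("x", "v"))
    locks.release(2)

print(started_1, started_2, retried_2, sorted(p_edges))
```

The second activity is rejected even though the two edge additions do not interfere with each other, which is the sense in which this policy is correct but restrictive.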


Parallelizing Andersen’s algorithm

• Baseline conflict detection
  – activity acquires 3 locks, processes rule
  – conflict when rules sharing nodes are processed simultaneously

• Correct but too restrictive

[figure] activity 1 adds p-edge <a,v>; activity 2 adds p-edge <x,v>


Optimal conflict detection

• Avoid abstract locks on read nodes
  – edges never removed from graph

[figure] activity 1 adds p-edge <a,v>; activity 2 adds p-edge <x,v>


Optimal conflict detection

• Avoid abstract locks on read nodes
  – edges never removed from graph

• Avoid abstract locks on written nodes
  – edge additions commute with each other
  – concrete implementation guarantees consistency

• No conflicts → no abstract locks → no rollbacks
  – speculation not necessary!

[figure] activity 1 adds p-edge <b,v>; activity 2 adds p-edge <b,v>
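The commutativity argument can be demonstrated directly. The sketch below assumes an add-only edge set whose insertions are made atomic by a short internal lock (standing in for the "concrete implementation"); the class name `EdgeSet` is hypothetical, and Python threads are used only to show that no abstract locks or rollbacks are needed for the result to be well defined.

```python
import itertools
import threading


class EdgeSet:
    """Add-only set of p-edges; an internal lock keeps each insertion atomic."""

    def __init__(self):
        self._edges = set()
        self._lock = threading.Lock()

    def add(self, edge):
        with self._lock:      # concrete (data-structure level) synchronization only
            self._edges.add(edge)

    def snapshot(self):
        with self._lock:
            return frozenset(self._edges)


additions = [("b", "v"), ("b", "v"), ("a", "v"), ("x", "v")]

# Commutativity: every ordering of the additions yields the same edge set.
outcomes = set()
for order in itertools.permutations(additions):
    edges = EdgeSet()
    for e in order:
        edges.add(e)
    outcomes.add(edges.snapshot())

# The same additions issued from concurrent threads, with no abstract locks.
shared = EdgeSet()
threads = [threading.Thread(target=shared.add, args=(e,)) for e in additions]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(outcomes), sorted(shared.snapshot()))
```

Two activities adding the same p-edge <b,v> are harmless for the same reason: adding an edge twice leaves the set unchanged.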


Implementation

• Implemented in Galois system [Kulkarni PLDI’07]
  – graph implementations, scheduling policies, etc.
  – conflict detection turned off → no speculation overheads

• Key data structures
  – Binary Decision Diagram
      • points-to edges
      • lock-based hash set
  – sparse bit vector
      • copy/store/load edges
      • lock-free linked list

• Download at http://www.ices.utexas.edu/~marioml
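For readers unfamiliar with sparse bit vectors, the idea can be sketched as follows. This is a sequential, dictionary-based variant written for illustration; the slide describes a lock-free linked list, which this sketch does not attempt to reproduce, and the class name and word size are assumptions.

```python
class SparseBitVector:
    """Set of non-negative integers stored as {word index: bit mask}.

    Only words that contain at least one set bit are kept, so memory follows
    the number of populated words rather than the largest element.
    """

    WORD_BITS = 64  # arbitrary choice for this sketch

    def __init__(self, items=()):
        self.words = {}
        for i in items:
            self.add(i)

    def add(self, i):
        """Set bit i; return True if the vector changed."""
        word, bit = divmod(i, self.WORD_BITS)
        old = self.words.get(word, 0)
        new = old | (1 << bit)
        self.words[word] = new
        return new != old

    def __contains__(self, i):
        word, bit = divmod(i, self.WORD_BITS)
        return bool((self.words.get(word, 0) >> bit) & 1)

    def union_update(self, other):
        """In-place union; return True if any new bit was added."""
        changed = False
        for word, mask in other.words.items():
            old = self.words.get(word, 0)
            if mask & ~old:
                self.words[word] = old | mask
                changed = True
        return changed

    def __iter__(self):
        for word in sorted(self.words):
            mask = self.words[word]
            for bit in range(self.WORD_BITS):
                if (mask >> bit) & 1:
                    yield word * self.WORD_BITS + bit


# Node ids of the successors of two graph nodes (ids are made up for the example).
a = SparseBitVector([3, 1_000_000])
b = SparseBitVector([3, 70])
first = a.union_update(b)    # adds 70
second = a.union_update(b)   # nothing new
print(list(a), first, second, len(a.words))
```

The boolean returned by `union_update` is what a solver would use to decide whether a node became active again.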


Results: runtimes

• Intel Xeon machine, 8 cores
  – (our) sequential vs parallel
  – whole analysis
  – JVM 1.6, 64 bits, Linux 2.6.30

• Input: suite of C programs
  – gcc: 120K vars, 156K stmts
  – tshark: 1500K vars, 1700K stmts

• Low parallelization overheads
  – not more than 30%

• Good scalability
  – ↑cores → ↓runtime

[figure: two runtime plots, gcc and tshark; x-axis: 1, 2, 4, 6, 8; y-axis: time (sec.)]


Results: speedups

• Reference (sequential) analysis
  – Hardekopf & Lin [PLDI’07, SAS’07]
  – written in C++
  – state-of-the-art, publicly available implementation

• Xeon machine, 8 cores
  – reference: phase within LLVM 2.6
  – parallel: standalone, JVM 1.6

• Speedup wrt C++ version
  – whole analysis
  – 2-5x
  – can be > 1 with 1 thread (tshark)

• Sequential phases limit speedup
  – SCC detection, BDD init, etc.

[figure: bar chart; x-axis: benchmark (gcc, vim, svn, pine, php, mplayer, gimp, linux, tshark); y-axis: speedup wrt C++ sequential, 0 to 5; series: 1 thread, 8 threads]

[figure annotation] ≈150K vars, 150K stmts; seq: 7% (1 thread), seq: 26% (8 threads)

[figure annotation] ≈150K vars, 150K stmts; seq: 9% (1 thread), seq: 32% (8 threads)
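The "seq" annotations give a feel for how sequential phases limit speedup. The calculation below is a back-of-the-envelope reading, not a result reported on the slides: it assumes that "seq" is the share of total runtime spent in sequential phases and that those phases take the same absolute time with 1 and with 8 threads. The function name is hypothetical.

```python
def implied_speedups(seq_share_1, seq_share_n):
    """Derive speedups from the share of runtime spent in sequential phases.

    Assumes the sequential phases take the same absolute time in both runs.
    Runtimes are normalized so that the 1-thread run takes 1.0.
    """
    seq_time = seq_share_1               # sequential time in the 1-thread run
    total_n = seq_time / seq_share_n     # n-thread runtime, from its sequential share
    parallel_1 = 1.0 - seq_time          # parallelizable time with 1 thread
    parallel_n = total_n - seq_time      # parallelizable time with n threads
    return 1.0 / total_n, parallel_1 / parallel_n


# First annotation: seq 7% with 1 thread, 26% with 8 threads.
overall, parallel_only = implied_speedups(0.07, 0.26)
print(f"overall: {overall:.2f}x, parallel phases only: {parallel_only:.2f}x")
```

Under these assumptions the parallel phases scale better than the whole analysis, and the gap widens as the sequential share grows, which is the point of the last bullet.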


Conclusions

• Inclusion-based points-to analysis
  – widely used technique

• Contributions
  1. Novel graph representation
  2. First parallelization of a points-to algorithm
      • correctness: exploit graph abstraction
      • efficiency: exploit algorithm structure

• Good results
  – 2-5x speedup wrt state-of-the-art


Thank you!

implementation + slides available at http://www.ices.utexas.edu/~marioml