PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries
Atanas (Nasko) Rountev, Mariana Sharp, Guoqing (Harry) Xu
Ohio State University
Supported by NSF Career grant CCF-0546040 and IBM Eclipse Innovation grant
Interprocedural Analysis with Large Libraries
All programs are built with reusable components
- Standard libraries in C++, Java, C#
- Domain-specific libraries
Whole-program analysis: complete client program C, together with all libraries it uses
- Solutions for all program points in C and in the libraries
Summary-based analysis: pre-analyze the library and record reusable library summary information
- Solutions for all program points in C
Goal: reduce the cost without losing any precision
- e.g., the solutions inside C should be the same
This may be low-hanging fruit
Interprocedural Distributive Environment (IDE) Problems
Defined by Sagiv, Reps, and Horwitz [TheorCompSci96]
- Subsumes the interprocedural finite distributive subset (IFDS) problems from their [POPL95] work
- Versions of constant propagation, slicing, alias analysis, side-effect analysis, reaching definitions, liveness, etc.
An environment is a map e : D → L; e ∈ Env(D,L)
- D is a set of symbols, L is a meet semi-lattice
- Environment meet: (e1 ⊓ e2)(d) = e1(d) ⊓ e2(d)
Environment transformer t : Env(D,L) → Env(D,L)
- Distributive: e.g. t(e1 ⊓ e2) = t(e1) ⊓ t(e2)
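The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that L is a powerset lattice whose meet is set union; the names `meet_env` and `copy_y_to_x` and the symbols `x`, `y`, `a`, `b`, `c` are hypothetical and do not come from the slides. It checks the distributivity equation on one pair of environments, which illustrates the property but does not prove it.

```python
from typing import Dict, FrozenSet

Value = FrozenSet[str]      # an element of L (assumed here: a set of names)
Env = Dict[str, Value]      # an environment e : D -> L


def meet_value(a: Value, b: Value) -> Value:
    """Meet in the assumed powerset lattice: set union."""
    return a | b


def meet_env(e1: Env, e2: Env) -> Env:
    """Pointwise environment meet: (e1 meet e2)(d) = e1(d) meet e2(d)."""
    return {d: meet_value(e1[d], e2[d]) for d in e1}


def copy_y_to_x(e: Env) -> Env:
    """Hypothetical transformer for x := y, i.e. t(e) = e[x -> e(y)]."""
    out = dict(e)
    out["x"] = e["y"]
    return out


e1: Env = {"x": frozenset(), "y": frozenset({"a"})}
e2: Env = {"x": frozenset({"b"}), "y": frozenset({"c"})}

# Distributivity on this pair: t(e1 meet e2) == t(e1) meet t(e2)
lhs = copy_y_to_x(meet_env(e1, e2))
rhs = meet_env(copy_y_to_x(e1), copy_y_to_x(e2))
print(lhs == rhs)
```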
Dependence Analysis and Type Analysis for Java
Dependencies: for a local variable v at CFG node n, which formal parameters of n's method influence v?
- Restricted form of dep. analysis; useful for SDG building
D = { v1, …, vk }: locals vi
L = powerset of { f1, …, fm }: formals fj; meet is ∪
Transformer for v1:=f2: t(e) = e[v1 ↦ {f2}]
Transformer for v1:=v2+v3: t(e) = e[v1 ↦ e(v2) ∪ e(v3)]
Call v1:=meth(v2): composition of v2-to-formal, valid same-level paths in meth, return-to-v1
0-CFA type analysis: D = { v1, …, vk, fld1, …, fldm }: locals and fields; L = powerset of the set of types
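The two dependence transformers can be sketched as plain functions on environments. This is a minimal illustration, assuming the meet is set union and covering data dependencies only; the helper names `assign_formal` and `assign_binary` are hypothetical.

```python
from typing import Dict, FrozenSet

Env = Dict[str, FrozenSet[str]]   # local variable -> formals that influence it


def assign_formal(e: Env, target: str, formal: str) -> Env:
    """Transformer for target := formal: t(e) = e[target -> {formal}]."""
    out = dict(e)
    out[target] = frozenset({formal})
    return out


def assign_binary(e: Env, target: str, left: str, right: str) -> Env:
    """Transformer for target := left + right: t(e) = e[target -> e(left) U e(right)]."""
    out = dict(e)
    out[target] = e[left] | e[right]
    return out


# Straight-line example: v2 := f1; v3 := f2; v1 := v2 + v3
env: Env = {"v1": frozenset(), "v2": frozenset(), "v3": frozenset()}
env = assign_formal(env, "v2", "f1")
env = assign_formal(env, "v3", "f2")
env = assign_binary(env, "v1", "v2", "v3")
print(sorted(env["v1"]))
```

In this run, v1 ends up depending on both formals because each operand carries its own dependence set into the union.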
Representation of Environment Transformers
Key issue for any summary-based analysis: how do we represent and manipulate dataflow functions?
- For IDE: composition/meet of environment transformers
Sagiv et al.: a transformer can be represented by a bipartite directed graph with 2(|D|+1) nodes
- Edges labeled with functions L → L
Composition of Transformers
Graph reachability + composition of edge labels
[Figure: graph representations over d1, d2, d3 of the transformers t(env) = env[d2 ↦ env(d1) ∪ env(d3)] and t(env) = env[d1 ↦ {f}], and of their composition; edges are labeled λl.l or λl.{f}]
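Composition by reachability can be sketched as follows. This is an illustrative model, not the representation used by the authors: edge labels are Python callables, the extra node is named `LAMBDA`, the meet is assumed to be set union (so the top element is the empty set), and every graph is assumed to carry an identity edge from `LAMBDA` to itself. The names `compose` and `apply` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Value = FrozenSet[str]
Label = Callable[[Value], Value]          # an edge label: a function L -> L
Graph = Dict[Tuple[str, str], Label]      # (source node, target node) -> label
Env = Dict[str, Value]

LAMBDA = "Λ"                              # the extra node besides the symbols in D
TOP: Value = frozenset()                  # top element when the meet is set union


def identity(l: Value) -> Value:
    return l


def constant(value: Value) -> Label:
    return lambda l: value


def then(first: Label, second: Label) -> Label:
    """Label of a two-edge path: apply `first`, then `second`."""
    return lambda l: second(first(l))


def meet_labels(labels: List[Label]) -> Label:
    """Pointwise meet (union) of the labels of parallel paths."""
    def met(l: Value) -> Value:
        result = TOP
        for f in labels:
            result = result | f(l)
        return result
    return met


def compose(g1: Graph, g2: Graph) -> Graph:
    """Graph of 'g1, then g2': two-edge paths through a shared middle node."""
    paths: Dict[Tuple[str, str], List[Label]] = {}
    for (a, b), f in g1.items():
        for (b2, c), g in g2.items():
            if b == b2:
                paths.setdefault((a, c), []).append(then(f, g))
    return {edge: meet_labels(fs) for edge, fs in paths.items()}


def apply(graph: Graph, env: Env) -> Env:
    """Evaluate the transformer that the graph represents."""
    out = {d: TOP for d in env}
    for (src, dst), f in graph.items():
        if dst == LAMBDA:
            continue
        arg = TOP if src == LAMBDA else env[src]
        out[dst] = out[dst] | f(arg)
    return out


# t1(env) = env[d1 -> {f}]
t1: Graph = {(LAMBDA, LAMBDA): identity, (LAMBDA, "d1"): constant(frozenset({"f"})),
             ("d2", "d2"): identity, ("d3", "d3"): identity}
# t2(env) = env[d2 -> env(d1) U env(d3)]
t2: Graph = {(LAMBDA, LAMBDA): identity, ("d1", "d1"): identity, ("d3", "d3"): identity,
             ("d1", "d2"): identity, ("d3", "d2"): identity}

composed = compose(t1, t2)
env: Env = {"d1": frozenset({"a"}), "d2": frozenset({"b"}), "d3": frozenset({"c"})}
print(apply(composed, env) == apply(t2, apply(t1, env)))
```

Using opaque callables keeps the sketch short; a practical implementation would need a finite, comparable encoding of edge labels so that fixed points can be detected.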
Precise Whole-Program Analysis
Graph reachability along valid interprocedural paths
Phase 1: a summary function for each CFG node n
- Represents the solution at n as a function of the solution at the entry of the procedure containing n
- Computed through composition and meet of transformers
- Summary function at proc exit used at call sites to proc
- Partial functions: defined only for the subset of the domain that is relevant to callers of n's procedure
Phase 2: Top-down propagation of actual environments (e.g., dependence sets, type sets)
Adapt to library summary generation?
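The two phases can be illustrated on a toy dependence analysis. The sketch below is heavily simplified and rests on stated assumptions: method bodies are straight-line (no control-flow merges, so no meet is needed), there is no recursion, only data dependencies are tracked, and transformers are ordinary Python functions rather than graphs. The names `phase1`, `phi`, `call`, and the example methods `ident` and `main` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

Env = Dict[str, FrozenSet[str]]           # local -> formals that influence it
Transformer = Callable[[Env], Env]


def assign_formal(target: str, formal: str) -> Transformer:
    """target := formal, i.e. env[target -> {formal}]."""
    def t(env: Env) -> Env:
        out = dict(env)
        out[target] = frozenset({formal})
        return out
    return t


def assign_locals(target: str, *sources: str) -> Transformer:
    """target := op(sources), i.e. env[target -> union of env(s)]."""
    def t(env: Env) -> Env:
        out = dict(env)
        out[target] = frozenset().union(*(env[s] for s in sources))
        return out
    return t


def call(target: str, actual_of: Dict[str, str], ret_deps: FrozenSet[str]) -> Transformer:
    """target := callee(...): actual-to-formal, callee exit summary, return-to-target."""
    def t(env: Env) -> Env:
        out = dict(env)
        out[target] = frozenset().union(*(env[actual_of[p]] for p in ret_deps))
        return out
    return t


def sequence(ts: List[Transformer]) -> Transformer:
    """Composition of transformers, applied left to right."""
    def composed(env: Env) -> Env:
        for t in ts:
            env = t(env)
        return env
    return composed


def phase1(body: List[Transformer]) -> List[Transformer]:
    """One summary function per node: the entry-to-node composition."""
    return [sequence(body[: i + 1]) for i in range(len(body))]


# Callee ident(p) { r := p; return r }: its exit summary says r depends on {p}.
callee_phi = phase1([assign_formal("r", "p")])
ret_deps = callee_phi[-1]({"r": frozenset()})["r"]

# main(f1, f2) { a := f1; b := ident(a); d := f2; c := b + d }
main_body = [assign_formal("a", "f1"), call("b", {"p": "a"}, ret_deps),
             assign_formal("d", "f2"), assign_locals("c", "b", "d")]
phi = phase1(main_body)                   # Phase 1: summary functions

# Phase 2: propagate the actual entry environment through each summary function.
entry: Env = {v: frozenset() for v in ("a", "b", "c", "d")}
solutions = [f(entry) for f in phi]
print(sorted(solutions[-1]["c"]))
```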
Phase 1: Intraprocedural Summary Generation
Produce a set of summary functions, one for each pair of nodes (n, m) such that
- n is the entry or a call site
- m is the exit or a call site
- there exists a call-free path from n to m
Similar to the per-node summary functions from the whole-program analysis, but
- complete functions instead of partial functions
- all possible compositions and meets of transformers (as graph operations), until a fixed point is reached
After this, some elements of D are filtered away
- e.g., for dependence analysis: locals that are not actuals of calls and are not assigned the return values of calls
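The filtering step can be pictured as dropping every edge that touches an irrelevant local. The sketch below is an assumption-laden illustration: edge labels are omitted, a summary is just a set of (source symbol, target symbol) pairs, and the method fragment and the name `filter_summary` are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]   # (symbol at node n, symbol at node m); labels omitted


def filter_summary(edges: Set[Edge], relevant: Set[str]) -> Set[Edge]:
    """Keep only edges whose endpoints are both relevant to callers or callees."""
    return {(src, dst) for (src, dst) in edges if src in relevant and dst in relevant}


# Hypothetical fragment between two call sites:
#   x := h();  t := x;  u := t;  g(u)
# After composition, the summary from the first call site to the second has:
edges: Set[Edge] = {("x", "x"), ("x", "t"), ("x", "u")}

# x is assigned the return value of a call and u is an actual of a call;
# t is neither, so it is filtered away.
relevant = {"x", "u"}
filtered = filter_summary(edges, relevant)
print(sorted(filtered))
```

Filtering is safe only after composition has reached its fixed point, because by then the effect of the intermediate local t is already captured by the direct edge from x to u.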
Soot]
- Both data and control dependencies
- Simple optimizations: def-use chains, sparse graphs
Cost: 90 minutes time, 1.2GB memory
- Includes all Soot-related costs and all I/O
Final summary on disk: 18MB
Measurements: number of edges in the graph representation of transformers
- [1]: before any composition or meet
- [2]: after intraprocedural composition and meet
- [3]: after [2] and intraprocedural filtering: remove elements that are irrelevant for callers and callees
Intraprocedural Propagation
[Figure: two bar charts, one per analysis, showing the number of edges at measurement points [1], [2], and [3]]
Dependence analysis: reduction in # edges from [2] to [3]: 53%
Type analysis: reduction in # edges from [2] to [3]: 55%
Interprocedural Propagation for Dep. Analysis
Fixed methods: 25490 (33% of methods); 7195 of them (9% of methods) are eliminated because their only callers are in the library
Summary functions for fixed methods
- Instantiate at fixed calls within non-fixed methods: eliminates 21% of all library call sites
- Additional intraprocedural propagation and filtering
[Figure: bar chart showing the number of edges at measurement points [1], [2], [3], and [4]]
Reduction in # edges from [3] to [4]: 32%
Summary-Based Analysis of Clients
[Figure: bar chart, y-axis from 0% to 70%, one entry per client program: compress, db, fractal, jack, javac, javacup-0.10j, jb-6.1, jess, jflex-1.4.1, jlex-1.2.6, jtar-1.21, mindterm-1.1.5, mpegaudio, muffin-0.9.3a, rabbit2, raytrace, sablecc-2.18.2, socksecho, socksproxy, violet]
Reduction in start-to-end time: IR building, type analysis + call graph, dependence analysis
Only Dependence Analysis
Reduction in analysis time: actual analysis and a hypothetical best case with no library dependencies
[Figure: bar chart, y-axis from 0% to 90%, one entry per client program: compress, db, fractal, jack, javac, javacup-0.10j, jb-6.1, jess, jflex-1.4.1, jlex-1.2.6, jtar-1.21, mindterm-1.1.5, mpegaudio, muffin-0.9.3a, rabbit2, raytrace, sablecc-2.18.2, socksecho, socksproxy, violet]
Overview of Results
Start-to-end cost: IR, type analysis, dep. analysis
- Average time reduction 51%
- Average memory reduction 33%
Only dependence analysis
- Average time reduction 69%
- Average memory reduction 90%
- Very close to a conservative upper bound
Conclusions
- Summary generation has reasonable cost
- Summary size is small (# edges and total disk size)
- Significant savings for analysis running time and memory usage, compared to whole-program analysis
Future Work
This is a very preliminary study
- Promising initial results, but just the tip of the iceberg
More IDE analyses, with different characteristics
- e.g. points-to analysis, side-effect analysis, constant propagation, typestate properties, etc.
Beyond IDE analyses
- e.g. recent [POPL08] paper by Yorsh et al.
Better handling of callbacks and polymorphic calls
- e.g. take advantage of behavioral subtyping
Reusable API for storing and retrieving summary information – generality for many different analyses
- Open-source API implementation based on Soot