Compactly Representing First-Order Structures for Static Analysis
Roman Manevich, Mooly Sagiv (Tel-Aviv University)
Ganesan Ramalingam, John Field, Deepak Goyal (IBM T.J. Watson)
Jan 22, 2016
Motivation
TVLA is a powerful and general abstract interpretation system
Abstract interpretation in TVLA
  Operational semantics is expressed with first-order logic formulae
  Program states are represented as sets of evolving first-order structures
Space is a major bottleneck
Desired Properties
Sparse data structures
Share common sub-structures
  Inherited sharing
  Incidental sharing due to program invariants
But feasible time performance
  Phase-sensitive data structures
Outline
Background
First-order structure representations
  Base representation (TVLA 0.91)
  BDD representation
Empirical evaluation
Conclusion
First-Order Logical Structures
Generalize shape graphs
  Arbitrary set of individuals
  Arbitrary set of predicates on individuals
Dynamically evolving
  Usually small changes
Properties are extracted by evaluating first-order formulae, e.g. ∃v1, v: x(v1) ∧ n(v1, v)
Join operator requires isomorphism testing
First-Order Structure ADT
Structure : new() /* empty structure */
SetOfNodes : nodeSet(Structure)
Node : newNode(Structure)
removeNode(Structure, node)
Kleene : eval(Structure, p(r), <u1, ..., ur>)
update(Structure, p(r), <u1, ..., ur>, Kleene)
Structure : copy(Structure)
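The ADT operations listed on this slide can be sketched in C as follows. This is only an illustration under stated assumptions, not TVLA's implementation: the dense arrays, the fixed bounds (MAX_NODES, NUM_UNARY, NUM_BINARY), the split of eval/update into unary and binary variants, and all function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Kleene truth values: 0, 1/2 (unknown) and 1. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

/* Illustrative fixed bounds (assumption); a real system grows dynamically. */
enum { MAX_NODES = 8, NUM_UNARY = 4, NUM_BINARY = 2 };

typedef struct {
    int    live[MAX_NODES];                          /* 1 if the slot holds a node */
    Kleene unary[NUM_UNARY][MAX_NODES];              /* p(u) */
    Kleene binary[NUM_BINARY][MAX_NODES][MAX_NODES]; /* p(u1, u2) */
} Structure;

/* new(): empty structure; every predicate value starts at 0. */
Structure structure_new(void) {
    Structure s;
    memset(&s, 0, sizeof s);
    return s;
}

/* nodeSet(): writes the live node ids to out and returns how many there are. */
int node_set(const Structure *s, int out[MAX_NODES]) {
    int count = 0;
    for (int u = 0; u < MAX_NODES; u++)
        if (s->live[u]) out[count++] = u;
    return count;
}

/* newNode(): claims the first free slot. */
int new_node(Structure *s) {
    for (int u = 0; u < MAX_NODES; u++)
        if (!s->live[u]) { s->live[u] = 1; return u; }
    assert(!"structure is full");
    return -1;
}

/* removeNode(): frees the slot and resets every value that mentions the node. */
void remove_node(Structure *s, int u) {
    assert(0 <= u && u < MAX_NODES && s->live[u]);
    s->live[u] = 0;
    for (int p = 0; p < NUM_UNARY; p++) s->unary[p][u] = K_FALSE;
    for (int p = 0; p < NUM_BINARY; p++)
        for (int v = 0; v < MAX_NODES; v++)
            s->binary[p][u][v] = s->binary[p][v][u] = K_FALSE;
}

/* eval() and update() for predicates of arity 1 and 2. */
Kleene eval1(const Structure *s, int p, int u) { return s->unary[p][u]; }
void update1(Structure *s, int p, int u, Kleene k) { s->unary[p][u] = k; }
Kleene eval2(const Structure *s, int p, int u, int v) { return s->binary[p][u][v]; }
void update2(Structure *s, int p, int u, int v, Kleene k) { s->binary[p][u][v] = k; }

/* copy(): a full copy, so later updates do not affect the original. */
Structure structure_copy(const Structure *s) { return *s; }
```

With this sketch, an assignment such as x = y amounts to copying the structure and then, for every node u returned by node_set, calling update1 on the copy with the value eval1 reports for y(u) in the original.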
print_all Example
/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *L;

/* print.c */
#include <stdio.h>
#include "list.h"

void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;
    }
}
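print_all Example
x = y
x'(v) := y(v)
copy(S0) : S1
nodeSet(S0) : {u1, u}
eval(S0, y, u1) : 1
update(S1, x, u1, 1)
eval(S0, y, u) : 0
update(S1, x, u, 0)
[Figure: structure S0 has node u1 with y=1 and summary node u with sm=½, with n edges of value ½; S1 is the same structure with x=1 added on u1.]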
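print_all Example
x = x->n
focus : ∃v1: x(v1) ∧ n(v1, v)
x'(v) := ∃v1: x(v1) ∧ n(v1, v)
while (x != NULL)
precondition : ∃v: x(v)
[Figure: applying x = x->n to S1 (u1 with x=1, y=1; summary node u with sm=½; n edges of value ½) yields the structures S2.0, S2.1 and S2.2. S2.0: u1 with y=1, u with sm=½, an n edge of value ½. S2.1: u1 with y=1, u with x=1, n edges of value 1 and ½. S2.2: u1 with y=1, u.1 with x=1, u.0 with sm=½, one n edge of value 1 and n edges of value ½.]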
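To make the update formula x'(v) := ∃v1: x(v1) ∧ n(v1, v) concrete, here is a minimal sketch of its evaluation in Kleene's 3-valued logic, where ∧ is the minimum and ∃ is the maximum over all nodes. The encoding 0, 1, 2 for the values 0, ½, 1 and the function names are assumptions made for this example. In load_s1, the endpoints of the n edges of S1 (u1 to u, and u to itself) are also an assumption, since the slide only gives their values.

```c
#include <assert.h>

/* Kleene values ordered so that min is AND and max is OR. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }
Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }

enum { N = 2, U1 = 0, U = 1 };   /* the two nodes of S1 */

/* x'(v) = exists v1: x(v1) AND n(v1, v) */
Kleene eval_x_next(const Kleene x[N], Kleene n[N][N], int v) {
    assert(0 <= v && v < N);
    Kleene result = K_FALSE;
    for (int v1 = 0; v1 < N; v1++)
        result = k_or(result, k_and(x[v1], n[v1][v]));
    return result;
}

/* S1 from the example: x(u1)=1, n(u1,u)=1/2, n(u,u)=1/2 (edge endpoints assumed). */
void load_s1(Kleene x[N], Kleene n[N][N]) {
    x[U1] = K_TRUE;       x[U] = K_FALSE;
    n[U1][U1] = K_FALSE;  n[U1][U] = K_HALF;
    n[U][U1] = K_FALSE;   n[U][U] = K_HALF;
}
```

Evaluated directly on S1 under these assumptions, the formula gives 0 for u1 and ½ for the summary node u. The focus step on the slide exists to replace such indefinite answers by a case split into structures where the formula is definite.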
Overview and Main Results
1. Two novel representations of first-order structures
  New BDD representation
  New representation using functional maps
2. Implementation techniques
3. Empirical evaluation
  Comparison of different representations
  Space is reduced by a factor of 4–10
  New representations scale better
Base Representation (Tal Lev-Ami SAS 2000)
Two-level map: Predicate → (Node Tuple → Kleene)
Sparse representation
Limited inherited sharing by “copy-on-write”
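The copy-on-write idea behind the two-level map can be sketched as follows. This is a simplified illustration, not the TVLA 0.91 code: the second level is a dense array indexed by an already flattened tuple number instead of a sparse map, tables are never freed, and the names PredTable, structure_copy and structure_update are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

enum { NUM_PREDS = 3, MAX_TUPLES = 16 };   /* illustrative bounds (assumption) */

/* Second level: node tuple (flattened to an index) -> Kleene value. */
typedef struct {
    int    refs;                 /* number of structures sharing this table */
    Kleene value[MAX_TUPLES];
} PredTable;

/* First level: predicate -> table. */
typedef struct {
    PredTable *pred[NUM_PREDS];
} Structure;

PredTable *table_new(void) {
    PredTable *t = calloc(1, sizeof *t);   /* all values start as K_FALSE */
    assert(t != NULL);
    t->refs = 1;
    return t;
}

Structure structure_new(void) {
    Structure s;
    for (int p = 0; p < NUM_PREDS; p++) s.pred[p] = table_new();
    return s;
}

/* copy(): shares every predicate table with the source structure. */
Structure structure_copy(const Structure *src) {
    Structure dst = *src;
    for (int p = 0; p < NUM_PREDS; p++) dst.pred[p]->refs++;
    return dst;
}

Kleene structure_eval(const Structure *s, int p, int tuple) {
    return s->pred[p]->value[tuple];
}

/* update(): clones a table only when it is shared and about to change. */
void structure_update(Structure *s, int p, int tuple, Kleene k) {
    PredTable *t = s->pred[p];
    if (t->refs > 1) {
        PredTable *own = table_new();
        memcpy(own->value, t->value, sizeof own->value);
        t->refs--;
        s->pred[p] = own;
        t = own;
    }
    t->value[tuple] = k;
}
```

Sharing in this scheme is inherited only: a table stays shared between a structure and its copy until one of them writes to it, and two tables that later become equal again are not re-merged.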
BDDs in a Nutshell (Bryant 86)
Ordered Binary Decision Diagrams
Data structure for Boolean functions
Functions are represented as (unique) DAGs

x1 x2 x3 | f
 0  0  0 | 0
 0  0  1 | 0
 0  1  0 | 0
 0  1  1 | 1
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 0
 1  1  1 | 1

[Figure: the decision tree for f over x1, x2, x3 is reduced to a BDD in three steps: duplicate terminals, duplicate nonterminals, redundant tests.]
Also achieve sharing across functions
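The reduction rules and the sharing across functions can be reproduced with a very small BDD store. The sketch below is a toy written for this example (linear-scan unique table, fixed capacity, hypothetical function names), not the BDD package used in the work. It builds the function f from the truth table on the BDD slide, with x1 as the first variable in the order.

```c
#include <assert.h>

enum { MAX_BDD_NODES = 64, NUM_VARS = 3 };

typedef struct { int var, low, high; } BddNode;

/* Ids 0 and 1 are the terminals; nonterminals start at id 2. */
static BddNode nodes[MAX_BDD_NODES];
static int node_count = 2;

int bdd_size(void) { return node_count; }

/* Returns the unique node for (var, low, high), applying the reduction rules. */
int bdd_mk(int var, int low, int high) {
    if (low == high) return low;                    /* redundant test */
    for (int i = 2; i < node_count; i++)            /* duplicate nonterminal */
        if (nodes[i].var == var && nodes[i].low == low && nodes[i].high == high)
            return i;
    assert(node_count < MAX_BDD_NODES);
    nodes[node_count].var = var;
    nodes[node_count].low = low;
    nodes[node_count].high = high;
    return node_count++;
}

/* Builds the BDD of a truth table with 2^NUM_VARS entries (each 0 or 1).
   var is the 1-based variable tested at this level; index holds the bits
   chosen so far, x1 being the most significant. */
int bdd_from_table(const int table[], int var, int index) {
    if (var > NUM_VARS) return table[index];        /* terminals are shared ids */
    int low  = bdd_from_table(table, var + 1, index << 1);
    int high = bdd_from_table(table, var + 1, (index << 1) | 1);
    return bdd_mk(var, low, high);
}

/* Evaluates the function rooted at root for the assignment x[0..NUM_VARS-1]. */
int bdd_eval(int root, const int x[NUM_VARS]) {
    int n = root;
    while (n >= 2)
        n = x[nodes[n].var - 1] ? nodes[n].high : nodes[n].low;
    return n;
}
```

Building f from the table {0,0,0,1,0,1,0,1} creates three nonterminal nodes, one per variable. Building g = x3 afterwards adds no node at all, because its BDD already exists as a sub-graph of f; two functions are equal exactly when their root ids are equal.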
Encoding Structures Using Integers
Static encoding of
  Predicates
  Kleene values
Dynamic encoding of nodes
  0, 1, …, n-1
Encode predicate p’s values as
  ep(p) . en(u1) . en(u2) . … . en(un) . ek(Kleene)
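The concatenation ep(p) . en(u1) . … . ek(Kleene) can be illustrated by packing bit fields into one integer. The field widths below (4 bits for the predicate, 8 bits per node, 2 bits for the Kleene value), the padding of unused node fields with zeros, and the name encode_entry are assumptions of this sketch; the slide does not specify them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths; the total is 4 + 2*8 + 2 = 22 bits. */
enum { PRED_BITS = 4, NODE_BITS = 8, KLEENE_BITS = 2, MAX_ARITY = 2 };

/* Packs one entry "predicate pred holds with value kleene on nodes[0..arity-1]"
   into an integer: predicate code first, then the node codes, then the value. */
uint32_t encode_entry(unsigned pred, const unsigned nodes[], int arity,
                      unsigned kleene) {
    assert(pred < (1u << PRED_BITS));
    assert(kleene < (1u << KLEENE_BITS));
    assert(0 <= arity && arity <= MAX_ARITY);
    uint32_t code = pred;
    for (int i = 0; i < MAX_ARITY; i++) {
        /* Node fields beyond the predicate's arity are padded with 0, so all
           codes have the same width and different predicates cannot collide. */
        unsigned node = i < arity ? nodes[i] : 0u;
        assert(node < (1u << NODE_BITS));
        code = (code << NODE_BITS) | node;
    }
    return (code << KLEENE_BITS) | kleene;
}
```

Under this encoding a structure becomes a set of such integers, one per predicate entry it stores, which is what makes a BDD over the bits of the integers applicable.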
BDD Representation of Integer Sets
Characteristic function
S = {1, 5}   1 = <001>   5 = <101>
S = (¬x1 ∧ ¬x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ x3)
[Figure: BDD of the characteristic function of S over the variables x1, x2, x3.]
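As a quick check of the characteristic function for S = {1, 5}, the formula can be written directly as a Boolean function over the three bits, with x1 as the most significant bit. The function names are made up for this example.

```c
#include <assert.h>

/* Characteristic function of S = {1, 5}: (!x1 & !x2 & x3) | (x1 & !x2 & x3). */
int chi_S(int x1, int x2, int x3) {
    return (!x1 && !x2 && x3) || (x1 && !x2 && x3);
}

/* Membership test for a 3-bit integer; x1 is the most significant bit. */
int in_S(unsigned value) {
    assert(value < 8u);
    return chi_S((value >> 2) & 1u, (value >> 1) & 1u, value & 1u);
}
```

Only the inputs <001> and <101> satisfy the formula, so in_S holds exactly for 1 and 5. The two disjuncts differ only in x1, which means the function is equivalent to ¬x2 ∧ x3.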
BDD Representation Example
x = y
x = x->n
[Figure: the structures S0, S1 (after x = y) and S2.2 (after x = x->n) from the print_all example, each with its own BDD root (S0, S1, S2.2). S0: u1 with y=1, summary node u with sm=½, n edges of value ½. S1: as S0, with x=1 on u1. S2.2: u1 with y=1, u.1 with x=1, u.0 with sm=½, one n edge of value 1 and n edges of value ½.]
Improved BDD Representation
Using this representation directly doesn’t save space
Observation
  Node names can be arbitrarily remapped without affecting the ADT semantics
Our heuristics
  Use canonical node names to encode nodes
  Increases incidental sharing
  Reduces isomorphism test to pointer comparison
Space is reduced by a factor of 4–10
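One way to picture the node-renaming heuristic is the sketch below. It assumes that a node's canonical name is the vector of its unary predicate values and that no two nodes of a structure share a canonical name; the base-3 key, the insertion sort and all identifiers are choices made for this example, and the actual heuristics may differ.

```c
#include <assert.h>
#include <string.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

enum { N = 3, NUM_UNARY = 3 };   /* 3 nodes; unary predicates x, y, sm */

typedef struct {
    Kleene unary[NUM_UNARY][N];  /* unary[p][u] */
    Kleene n[N][N];              /* the binary predicate n */
} Structure;

/* Canonical name of node u: its unary values read as a base-3 number. */
int canonical_key(const Structure *s, int u) {
    int key = 0;
    for (int p = 0; p < NUM_UNARY; p++)
        key = key * 3 + (int)s->unary[p][u];
    return key;
}

/* Renames nodes so that node i is the one with the i-th smallest key. */
Structure normalize(const Structure *s) {
    int order[N];
    for (int i = 0; i < N; i++) order[i] = i;
    for (int i = 1; i < N; i++)          /* insertion sort by canonical key */
        for (int j = i; j > 0 &&
             canonical_key(s, order[j - 1]) > canonical_key(s, order[j]); j--) {
            int tmp = order[j];
            order[j] = order[j - 1];
            order[j - 1] = tmp;
        }
    Structure out;
    for (int i = 0; i < N; i++) {
        for (int p = 0; p < NUM_UNARY; p++)
            out.unary[p][i] = s->unary[p][order[i]];
        for (int j = 0; j < N; j++)
            out.n[i][j] = s->n[order[i]][order[j]];
    }
    return out;
}

/* After normalization, isomorphic structures have identical contents. */
int same_contents(const Structure *a, const Structure *b) {
    return memcmp(a->unary, b->unary, sizeof a->unary) == 0 &&
           memcmp(a->n, b->n, sizeof a->n) == 0;
}
```

Two structures that differ only in how their nodes are numbered normalize to the same contents. Once equal contents are stored once, as a BDD store does for equal functions, comparing two structures reduces to comparing two pointers.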
Reducing Time Overhead
Current implementation not optimized
  Expensive formula evaluation
Hybrid representation
  Distinguish between phases: mutable phase, join, immutable phase
  Dynamically switch representations
Functional Representation
Alternative representation for first-order structures
Structures represented by maps from integers to Kleene values
Tailored for representing first-order structures
Achieves better results than BDDs
Techniques similar to the BDD representation
More details in the paper
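The slide leaves the data structure to the paper, so the following is only a generic illustration of what a functional map from integers to Kleene values can look like: a persistent binary trie in which an update copies the path to the changed key and shares everything else with the previous version. It is not claimed to be the paper's representation; the names are hypothetical and nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

/* Trie over the bits of the key, most significant bit first. */
typedef struct Trie {
    const struct Trie *child[2];
    Kleene leaf;                  /* meaningful only at depth == key width */
} Trie;

/* Returns a new version of the map in which key maps to k. The old version t
   stays valid; only the nodes on the path to key are copied. */
const Trie *trie_set(const Trie *t, unsigned key, int bits, Kleene k) {
    Trie *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    if (t != NULL) {
        *copy = *t;
    } else {
        copy->child[0] = copy->child[1] = NULL;
        copy->leaf = K_FALSE;
    }
    if (bits == 0) {
        copy->leaf = k;
    } else {
        int b = (int)((key >> (bits - 1)) & 1u);
        copy->child[b] = trie_set(copy->child[b], key, bits - 1, k);
    }
    return copy;
}

/* Looks up key; keys that were never set map to 0. */
Kleene trie_get(const Trie *t, unsigned key, int bits) {
    while (t != NULL && bits > 0) {
        bits--;
        t = t->child[(key >> bits) & 1u];
    }
    return t != NULL ? t->leaf : K_FALSE;
}
```

In this sketch two versions of a map that differ in one key share every sub-trie off the path to that key, which is the kind of inherited sharing between a structure and its slightly modified copy that the desired properties call for.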
Empirical Evaluation
Benchmarks:
  Cleanness Analysis (SAS 2000)
  Garbage Collector
  CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks
  Mobile Ambients (ESOP 2000)
Stress testing the representations
  We use “relational analysis”
  Save structures in every CFG location
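Space Results
[Figure: bar chart of space consumption for the benchmarks JFE, KERNEL, CA, MA and GC; series: Base, OBDD total, Functional; vertical axis from 0 to 450. Data labels shown on the chart: 12.8, 22.7, 168.2, 187.7, 402.8, 5.5, 16.7, 12.9, 9.6, 51.6.]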
Abstract Counters
Ignore language/implementation details
A more reliable measurement technique
Count only crucial space information
Independent of C/Java
Abstract Counters Results
[Figure: bar chart of abstract counter values for the benchmarks JFE, KERNEL, CA, MA and GC; series: Base, OBDD, Functional; vertical axis from 0 to 45,000,000.]
Trends in the Cleanness Analysis Benchmark
[Figure: chart over the horizontal axis values 1 to 10; series: Base, OBDD, Functional; vertical axis from 0 to 600. Data labels shown on the chart: 505, 564, 74, 54, 50, 42.]
What’s Missing from this Work?
Investigate other node mapping heuristics
Compactly represent sets of structures
Time optimizations
Conclusions
Two novel representations of first-order structures
  New BDD representation
  New representation using functional maps
Implementation techniques
  Normalization techniques are crucial
Empirical evaluation
  Comparison of different representations
  Space is reduced by a factor of 4–10
  New representations scale better
Conclusions
The use of BDDs for static analysis is not a panacea for space saving
Domain-specific encoding is crucial for saving space
Failed attempts
  Original implementation of Veith’s encoding
  PAG
The End