Compactly Representing First-Order Structures for Static Analysis


Tel-Aviv University

Roman Manevich

Mooly Sagiv

IBM T.J. Watson

Ganesan Ramalingam, John Field

Deepak Goyal

Motivation

TVLA is a powerful and general abstract interpretation system

Abstract interpretation in TVLA:

Operational semantics is expressed with first-order logic formulae

Program states are represented as sets of evolving first-order structures

Space is a major bottleneck

Desired Properties

Sparse data structures

Share common sub-structures: inherited sharing, and incidental sharing due to program invariants

But feasible time performance: phase-sensitive data structures

Outline

Background

First-order structure representations: base representation (TVLA 0.91), BDD representation

Empirical evaluation

Conclusion

First-Order Logical Structures

Generalize shape graphs: an arbitrary set of individuals and an arbitrary set of predicates on individuals

Dynamically evolving, usually by small changes

Properties are extracted by evaluating first-order formulae, e.g. ∃v1, v: x(v1) ∧ n(v1, v)

The join operator requires isomorphism testing

First-Order Structure ADT

Structure : new()   /* empty structure */
SetOfNodes : nodeSet(Structure)
Node : newNode(Structure)
removeNode(Structure, Node)
Kleene : eval(Structure, p(r), <u1, . . . , ur>)
update(Structure, p(r), <u1, . . . , ur>, Kleene)
Structure : copy(Structure)
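To make the ADT concrete, here is a minimal sketch in C. It assumes a dense representation with a fixed capacity, supports only unary and binary predicates, and encodes Kleene values as 0 = false, 1 = true, 2 = ½. The C names (new_structure, node_set, new_node, remove_node, eval, update, copy) are hypothetical spellings of the slide's operations, not TVLA's actual API. Note that copy() here duplicates everything and shares nothing between structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Kleene values in this sketch: 0 = false, 1 = true, 2 = 1/2 */
typedef enum { K_FALSE = 0, K_TRUE = 1, K_HALF = 2 } Kleene;

#define MAX_NODES 8   /* hypothetical fixed capacity */
#define MAX_PREDS 4   /* predicates are small integers of arity 1 or 2 */

typedef struct {
  int arity[MAX_PREDS];
  unsigned char live[MAX_NODES];                      /* membership in nodeSet */
  unsigned char val[MAX_PREDS][MAX_NODES][MAX_NODES]; /* unary predicates use column 0 */
} Structure;

/* new(): an empty structure in which every predicate is false */
Structure *new_structure(const int arity[MAX_PREDS]) {
  Structure *s = calloc(1, sizeof *s);
  assert(s != NULL);
  memcpy(s->arity, arity, sizeof s->arity);
  return s;
}

/* nodeSet(): writes the live nodes to out and returns how many there are */
int node_set(const Structure *s, int out[MAX_NODES]) {
  int count = 0;
  for (int u = 0; u < MAX_NODES; u++)
    if (s->live[u]) out[count++] = u;
  return count;
}

/* newNode(): claims a free node name, or returns -1 when the structure is full */
int new_node(Structure *s) {
  for (int u = 0; u < MAX_NODES; u++)
    if (!s->live[u]) { s->live[u] = 1; return u; }
  return -1;
}

/* removeNode(): frees u and resets every predicate value that mentions it */
void remove_node(Structure *s, int u) {
  assert(u >= 0 && u < MAX_NODES);
  s->live[u] = 0;
  for (int p = 0; p < MAX_PREDS; p++) {
    if (s->arity[p] == 1) { s->val[p][u][0] = K_FALSE; continue; }
    for (int w = 0; w < MAX_NODES; w++)
      s->val[p][u][w] = s->val[p][w][u] = K_FALSE;
  }
}

/* eval(): value of p on the tuple t (only t[0] is read for unary p) */
Kleene eval(const Structure *s, int p, const int *t) {
  return (Kleene)(s->arity[p] == 1 ? s->val[p][t[0]][0] : s->val[p][t[0]][t[1]]);
}

/* update(): sets the value of p on the tuple t */
void update(Structure *s, int p, const int *t, Kleene k) {
  if (s->arity[p] == 1) s->val[p][t[0]][0] = (unsigned char)k;
  else s->val[p][t[0]][t[1]] = (unsigned char)k;
}

/* copy(): an independent deep copy; nothing is shared */
Structure *copy(const Structure *s) {
  Structure *c = malloc(sizeof *c);
  assert(c != NULL);
  *c = *s;
  return c;
}
```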

print_all Example

/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *L;

/* print.c */
#include <stdio.h>
#include "list.h"

void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d\n", x->data);
        x = x->n;
    }
}

print_all Example

copy(S0) : S1

x = y    x'(v) := y(v)

nodeSet(S0) : {u1, u}

eval(S0, y, u1) : 1    update(S1, x, u1, 1)

eval(S0, y, u) : 0    update(S1, x, u, 0)

[Figure: S0 has node u1 with y = 1 and summary node u (sm = ½), with n(u1, u) = ½ and n(u, u) = ½; S1 is S0 with x = 1 at u1.]
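The steps above can be sketched in C for the two-node structure S0. The layout below (unary predicates x, y, sm plus a binary n, with 0 = false, 1 = true, 2 = ½) is an assumption made for illustration. All values are read from S0 and written into the copy S1, so the update behaves as a simultaneous assignment over nodeSet(S0).

```c
#include <assert.h>

/* Kleene values: 0 = false, 1 = true, 2 = 1/2 */
enum { NODES = 2 };         /* node 0 is u1, node 1 is the summary node u */
enum { X, Y, SM, UNARY };   /* hypothetical predicate ids */

typedef struct {
  unsigned char unary[UNARY][NODES];
  unsigned char n[NODES][NODES];
} Structure;

/* Transformer for x = y, i.e. x'(v) := y(v) */
Structure assign_x_y(const Structure *s0) {
  Structure s1 = *s0;                    /* copy(S0) : S1 */
  for (int v = 0; v < NODES; v++)        /* v ranges over nodeSet(S0) */
    s1.unary[X][v] = s0->unary[Y][v];    /* update(S1, x, v, eval(S0, y, v)) */
  return s1;
}
```

Reading from the old structure rather than the one being written matters for formulas such as x'(v) := ∃v1 x(v1) ∧ n(v1, v), where x appears on both sides.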

print_all Example

x = x->n    focus: ∃v1 x(v1) ∧ n(v1, v)    x'(v) := ∃v1 x(v1) ∧ n(v1, v)

[Figure: focusing S1 on ∃v1 x(v1) ∧ n(v1, v) yields S2.0, S2.1 and S2.2; in S2.2 the summary node u is split into u.1 (x = 1, with n(u1, u.1) = 1) and the summary node u.0 (sm = ½).]

while (x != NULL)    precondition: ∃v x(v)
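Evaluating the update formula directly on S1 shows why focus is applied first. In Kleene logic, ordering the values false < ½ < true makes ∧ the minimum and ∃ the maximum over nodes. This sketch encodes ½ as 1 (between 0 and 2) so that min/max work directly; the encoding and function names are assumptions for illustration.

```c
#include <assert.h>

/* Kleene values ordered false < 1/2 < true: conjunction is min, existential is max */
enum { F = 0, HALF = 1, T = 2 };
enum { NODES = 2 };   /* node 0 is u1, node 1 is the summary node u, as in S1 */

static int kmin(int a, int b) { return a < b ? a : b; }
static int kmax(int a, int b) { return a > b ? a : b; }

/* x'(v) := ∃v1 x(v1) ∧ n(v1, v) */
int next_x(const int x[NODES], int n[NODES][NODES], int v) {
  int r = F;
  for (int v1 = 0; v1 < NODES; v1++) r = kmax(r, kmin(x[v1], n[v1][v]));
  return r;
}

/* precondition of while (x != NULL): ∃v x(v) */
int x_not_null(const int x[NODES]) {
  int r = F;
  for (int v = 0; v < NODES; v++) r = kmax(r, x[v]);
  return r;
}
```

On S1 (x(u1) = 1, n(u1, u) = n(u, u) = ½) the precondition is definitely true, but x'(u) comes out as ½. Focusing first splits S1 into structures where n(u1, ·) is definite, so the update yields definite values.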

Overview and Main Results

1. Two novel representations of first-order structures: a new BDD representation and a new representation using functional maps

2. Implementation techniques

3. Empirical evaluation: comparison of different representations; space is reduced by a factor of 4–10; the new representations scale better

Base Representation (Tal Lev-Ami, SAS 2000)

Two-level map: Predicate → (Node Tuple → Kleene)

Sparse representation

Limited inherited sharing by "Copy-On-Write"
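A minimal sketch of the two-level map with copy-on-write, under assumptions of our own: three hypothetical predicates (0 = x, 1 = y, 2 = n), a small array as the inner map, and Kleene values 1 = true, 2 = ½. False entries are simply not stored, which keeps the inner maps sparse. copy() shares every inner map; the first update to a shared inner map duplicates just that map, so the sharing inherited from the parent structure is lost only for the predicate being written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NPREDS 3        /* hypothetical predicate ids: 0 = x, 1 = y, 2 = n */
#define MAX_ENTRIES 16

typedef struct { int u1, u2; unsigned char k; } Entry;  /* u2 = -1 for unary */

/* Inner map: node tuple -> Kleene (1 = true, 2 = 1/2); false is not stored */
typedef struct {
  int refcount;
  int size;
  Entry e[MAX_ENTRIES];
} PredMap;

/* Outer map: predicate -> inner map; NULL means false everywhere */
typedef struct { PredMap *pred[NPREDS]; } Structure;

unsigned char eval(const Structure *s, int p, int u1, int u2) {
  const PredMap *m = s->pred[p];
  for (int i = 0; m != NULL && i < m->size; i++)
    if (m->e[i].u1 == u1 && m->e[i].u2 == u2) return m->e[i].k;
  return 0;
}

/* copy(): shares every inner map and only bumps the reference counts */
Structure copy(const Structure *s) {
  for (int p = 0; p < NPREDS; p++)
    if (s->pred[p] != NULL) s->pred[p]->refcount++;
  return *s;
}

/* update(): copy-on-write, so only the inner map being written is duplicated */
void update(Structure *s, int p, int u1, int u2, unsigned char k) {
  PredMap *m = s->pred[p];
  if (m == NULL) {
    m = calloc(1, sizeof *m);
    assert(m != NULL);
    m->refcount = 1;
  } else if (m->refcount > 1) {
    PredMap *c = malloc(sizeof *c);
    assert(c != NULL);
    memcpy(c, m, sizeof *c);
    c->refcount = 1;
    m->refcount--;
    m = c;
  }
  s->pred[p] = m;
  for (int i = 0; i < m->size; i++)
    if (m->e[i].u1 == u1 && m->e[i].u2 == u2) {
      if (k != 0) m->e[i].k = k;
      else m->e[i] = m->e[--m->size];   /* drop the entry to stay sparse */
      return;
    }
  if (k != 0) {
    assert(m->size < MAX_ENTRIES);
    m->e[m->size++] = (Entry){u1, u2, k};
  }
}
```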

BDDs in a Nutshell (Bryant 86)

Ordered Binary Decision Diagrams: a data structure for Boolean functions, in which functions are represented as (unique) DAGs

x1 x2 x3 | f
0  0  0  | 0
0  0  1  | 0
0  1  0  | 0
0  1  1  | 1
1  0  0  | 0
1  0  1  | 1
1  1  0  | 0
1  1  1  | 1

[Figure: the decision tree for f (testing x1, then x2, then x3) is reduced by merging duplicate terminals, merging duplicate nonterminals and removing redundant tests, leaving one node each for x1, x2 and x3 above the terminals 0 and 1.]


BDDs also achieve sharing across functions.
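The reduction rules can be sketched with a unique table. In this C sketch, mk() refuses to create a redundant test (lo == hi) and returns an existing node instead of creating a duplicate nonterminal; terminals are never duplicated because leaves are just the indices 0 and 1. The table is a linear array for brevity (a real BDD package would use hashing), and all names are illustrative. Building f = (x1 ∨ x2) ∧ x3 from the truth table yields three nonterminals, and building g = x3 afterwards reuses f's x3 node, which is the sharing across functions.

```c
#include <assert.h>

#define MAX_BDD 64

/* Index 0 is the 0-terminal and index 1 the 1-terminal; every other entry is a
   nonterminal testing x_var, with child lo for x_var = 0 and hi for x_var = 1 */
typedef struct { int var, lo, hi; } BddNode;
static BddNode nodes[MAX_BDD];
static int n_nodes = 2;

/* mk: return the unique node for (var, lo, hi), applying the reduction rules */
static int mk(int var, int lo, int hi) {
  if (lo == hi) return lo;                       /* redundant test */
  for (int i = 2; i < n_nodes; i++)              /* duplicate nonterminal */
    if (nodes[i].var == var && nodes[i].lo == lo && nodes[i].hi == hi) return i;
  assert(n_nodes < MAX_BDD);
  nodes[n_nodes] = (BddNode){var, lo, hi};
  return n_nodes++;
}

/* Build the reduced OBDD of a truth table over x1..x_nvars (x1 = most significant bit);
   prefix accumulates the bits chosen for x1..x_(var-1) */
static int build(const int *table, int nvars, int var, int prefix) {
  if (var > nvars) return table[prefix];
  int lo = build(table, nvars, var + 1, prefix << 1);
  int hi = build(table, nvars, var + 1, (prefix << 1) | 1);
  return mk(var, lo, hi);
}

/* Follow one path from f to a terminal; x[i] holds the value of x_i */
static int eval_bdd(int f, const int *x) {
  while (f > 1) f = x[nodes[f].var] ? nodes[f].hi : nodes[f].lo;
  return f;
}
```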

Encoding Structures Using Integers

Static encoding of predicates and Kleene values

Dynamic encoding of nodes as 0, 1, …, n-1

Encode the value of a predicate p of arity r on <u1, …, ur> as the bit string ep(p) . en(u1) . en(u2) . … . en(ur) . ek(Kleene)
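A minimal sketch of the concatenation in C. The field widths (4 bits per predicate, 8 bits per node, 2 bits per Kleene value) are assumptions chosen for illustration; the slides do not specify them. Each fact "p(u1, …, ur) = k" becomes one integer, so a structure becomes a set of integers, which a BDD can store through its characteristic function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths: up to 16 predicates, 256 nodes, 3 Kleene values */
enum { P_BITS = 4, N_BITS = 8, K_BITS = 2 };

/* Concatenate ep(p) . en(u1) . ... . en(ur) . ek(k) into one bit string,
   with ep(p) in the most significant position and ek(k) in the lowest bits */
uint64_t encode(unsigned p, const unsigned *tuple, int r, unsigned k) {
  assert(p < (1u << P_BITS) && k < 3 && r >= 0);
  assert(P_BITS + r * N_BITS + K_BITS <= 64);
  uint64_t code = p;
  for (int i = 0; i < r; i++) {
    assert(tuple[i] < (1u << N_BITS));
    code = (code << N_BITS) | tuple[i];
  }
  return (code << K_BITS) | k;
}
```

For example, a binary predicate numbered 2 on <0, 1> with value ½ (encoded as 2) yields ((2 << 16) | (0 << 8) | 1) << 2 | 2.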

BDD Representation of Integer Sets

Characteristic function: S = {1, 5}, with 1 = <001> and 5 = <101>

S = (¬x1 ∧ ¬x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ x3)

[Figure: BDD for S over the variables x1, x2, x3]
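The characteristic function can be checked directly in C. This sketch decodes an integer into its bits <x1 x2 x3> (x1 most significant) and applies the formula above. Because x1 appears with both polarities, the formula simplifies to ¬x2 ∧ x3, so a reduced BDD for S needs no test on x1.

```c
#include <assert.h>
#include <stdbool.h>

/* chi_S for S = {1, 5}; codes are written <x1 x2 x3> with x1 the most significant bit */
bool chi_S(bool x1, bool x2, bool x3) {
  return (!x1 && !x2 && x3) || (x1 && !x2 && x3);
}

/* i is in S exactly when chi_S holds on the bits of i */
bool in_S(unsigned i) {
  assert(i < 8);
  return chi_S((i >> 2) & 1u, (i >> 1) & 1u, i & 1u);
}
```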


BDD Representation Example

[Figure: the structures S0, S1 (after x = y) and S2.2 (after x = x->n) and their BDD representations, drawn side by side.]

Improved BDD Representation

Using this representation directly doesn't save space.

Observation: node names can be arbitrarily remapped without affecting the ADT semantics.

Our heuristics: use canonic node names to encode nodes. This increases incidental sharing and reduces the isomorphism test to a pointer comparison.

Space is reduced by a factor of 4–10.
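The slides do not spell out the naming heuristic, so the following is only one simple possibility, not the authors' method: sort nodes by their vector of unary predicate values and rename them in that order. When every node has a distinct unary vector, isomorphic structures become byte-for-byte identical; in a hash-consed BDD, identical encodings map to the same node, which is what turns isomorphism testing into a pointer comparison. Here memcmp stands in for that comparison, and all names and sizes are hypothetical. qsort is not stable, so the result is canonical only when the unary vectors are distinct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N 3   /* nodes per structure in this sketch */
#define U 2   /* unary predicates, e.g. x and y */

typedef struct {
  unsigned char unary[U][N];  /* Kleene values: 0 = false, 1 = true, 2 = 1/2 */
  unsigned char n[N][N];      /* binary predicate n */
} Structure;

static const Structure *cmp_ctx;  /* qsort's comparator takes no context argument */

/* Order nodes lexicographically by their vector of unary predicate values */
static int by_unary_key(const void *a, const void *b) {
  int u = *(const int *)a, w = *(const int *)b;
  for (int p = 0; p < U; p++)
    if (cmp_ctx->unary[p][u] != cmp_ctx->unary[p][w])
      return cmp_ctx->unary[p][u] - cmp_ctx->unary[p][w];
  return 0;
}

/* Rename nodes so that node i of the result is the i-th node in key order */
static Structure canonicalize(const Structure *s) {
  int order[N];
  for (int i = 0; i < N; i++) order[i] = i;
  cmp_ctx = s;
  qsort(order, N, sizeof order[0], by_unary_key);
  Structure c;
  for (int p = 0; p < U; p++)
    for (int i = 0; i < N; i++) c.unary[p][i] = s->unary[p][order[i]];
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) c.n[i][j] = s->n[order[i]][order[j]];
  return c;
}

/* With distinct unary keys, isomorphic structures become identical byte strings */
int same_up_to_renaming(const Structure *a, const Structure *b) {
  Structure ca = canonicalize(a), cb = canonicalize(b);
  return memcmp(&ca, &cb, sizeof ca) == 0;
}
```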

Reducing Time Overhead

The current implementation is not optimized

Expensive formula evaluation

Hybrid representation: distinguish between phases (mutable phase, join, immutable phase) and dynamically switch representations
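The slide gives only the idea of phase-sensitive representations. One way to realize it, sketched here under our own assumptions: a structure is a plain mutable array while a transformer updates it, and at the join it is frozen into an interned, immutable copy, so equal structures share one copy and compare by pointer. The names freeze and thaw are hypothetical, the linear search stands in for a hash table, and frozen copies are never freed in this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N 4
typedef struct { unsigned char val[N][N]; } Dense;   /* mutable phase: cheap in-place updates */

/* Immutable phase: frozen structures are interned, so equal ones share storage */
#define MAX_FROZEN 32
static Dense *frozen[MAX_FROZEN];
static int n_frozen = 0;

const Dense *freeze(const Dense *d) {
  for (int i = 0; i < n_frozen; i++)
    if (memcmp(frozen[i], d, sizeof *d) == 0) return frozen[i];  /* reuse existing copy */
  assert(n_frozen < MAX_FROZEN);
  Dense *copy = malloc(sizeof *copy);
  assert(copy != NULL);
  *copy = *d;
  frozen[n_frozen++] = copy;
  return copy;
}

/* Back to the mutable representation for the next transformer */
Dense thaw(const Dense *f) { return *f; }
```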

Functional Representation

An alternative representation for first-order structures: structures are represented by maps from integers to Kleene values

Tailored for representing first-order structures

Achieves better results than BDDs

Uses techniques similar to the BDD representation

More details in the paper
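The paper's functional maps are not described on the slide, so the following is only a generic example of a functional (persistent) map from integers to Kleene values: a binary trie with path copying. An update copies the nodes on one root-to-leaf path and shares every other subtree with the previous version, which is the kind of sharing between structures a functional representation can exploit. The key width, names and value encoding (0 = false, 1 = true, 2 = ½) are assumptions; nodes are never freed here.

```c
#include <assert.h>
#include <stdlib.h>

#define KEY_BITS 8   /* keys are 0..255 in this sketch */

/* Internal nodes branch on one key bit; leaves hold a Kleene value */
typedef struct Trie { const struct Trie *child[2]; unsigned char kleene; } Trie;

/* Value at key, or 0 (false) when absent: absent entries keep the map sparse */
unsigned char lookup(const Trie *t, unsigned key) {
  for (int b = KEY_BITS - 1; b >= 0 && t != NULL; b--) t = t->child[(key >> b) & 1];
  return t != NULL ? t->kleene : 0;
}

/* Return a new version with key := k, branching on bit b; only the path is copied */
static const Trie *insert(const Trie *t, unsigned key, unsigned char k, int b) {
  Trie *copy = calloc(1, sizeof *copy);
  assert(copy != NULL);
  if (t != NULL) *copy = *t;
  if (b < 0) { copy->kleene = k; return copy; }    /* leaf level */
  int dir = (key >> b) & 1;
  copy->child[dir] = insert(t != NULL ? t->child[dir] : NULL, key, k, b - 1);
  return copy;
}

const Trie *put(const Trie *t, unsigned key, unsigned char k) {
  assert(key < (1u << KEY_BITS));
  return insert(t, key, k, KEY_BITS - 1);
}
```

Old versions stay valid after put, so copy() of a structure is just copying a root pointer.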

Empirical Evaluation

Benchmarks:
Cleanness Analysis (SAS 2000)
Garbage Collector
CMP (PLDI 2002) of Java Front-End and Kernel
Mobile Ambients (ESOP 2000)

Stress testing the representations: we use "relational analysis" and save structures in every CFG location

Space Results

[Chart: space for the JFE, KERNEL, CA, MA and GC benchmarks under the Base, OBDD total and Functional representations (y-axis 0 to 450); bar values shown: 12.8, 22.7, 168.2, 187.7, 402.8, 5.5, 16.7, 12.9, 9.6, 51.6]

Abstract Counters

Ignore language and implementation details, giving a more reliable measurement technique

Count only crucial space information, independent of C/Java

Abstract Counters Results

[Chart: abstract counter values (y-axis 0 to 45,000,000) for JFE, KERNEL, CA, MA and GC under the Base, OBDD and Functional representations]

Trends in the Cleanness Analysis Benchmark

[Chart: Base, OBDD and Functional curves over x-axis values 1 to 10 (y-axis 0 to 600)]

What’s Missing from this Work?

Investigate other node mapping heuristics

Compactly represent sets of structures

Time optimizations

Conclusions

Two novel representations of first-order structures: a new BDD representation and a new representation using functional maps

Implementation techniques: normalization techniques are crucial

Empirical evaluation: comparison of different representations; space is reduced by a factor of 4–10; the new representations scale better

Conclusions

The use of BDDs for static analysis is not a panacea for space saving

A domain-specific encoding is crucial for saving space

Failed attempts: the original implementation of Veith's encoding; PAG

The End
