Program Analysis in Datalog - Purdue

Posted Mar 10, 2018 by dinhhuong

Program Analysis in Datalog

Using the tool

CS 510/08

CS 510 Software Engineering

bddbddb System Overview

[Diagram: components labeled Java bytecode, Joeq frontend, Input relations, Datalog program, Output relations]

Compiler Frontend

Convert IR into tuples.

Tuples format:

    # V0:16 F0:11 V1:16      (header line)
    0 0 1                    (one tuple per line)
    0 1 2
    1470 0 1464
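As a rough illustration of the tuples format above, the following Python sketch reads a header line and the tuples that follow it. It assumes the header lists one `Name:Number` field per attribute and that every later line holds one integer per attribute; the meaning of the number after the colon is not interpreted here, and the name `parse_tuples` is hypothetical, not part of bddbddb.

```python
def parse_tuples(text):
    """Parse a tuples file: one '#' header line, then one tuple per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("expected a header line starting with '#'")

    # Header fields look like 'V0:16'; the number is kept as an opaque
    # integer because its exact meaning is an assumption in this sketch.
    attributes = []
    for field in lines[0][1:].split():
        name, size = field.split(":")
        attributes.append((name, int(size)))

    tuples = []
    for ln in lines[1:]:
        values = tuple(int(tok) for tok in ln.split())
        if len(values) != len(attributes):
            raise ValueError(f"wrong arity in line: {ln!r}")
        tuples.append(values)
    return attributes, tuples


SAMPLE = """# V0:16 F0:11 V1:16
0 0 1
0 1 2
1470 0 1464
"""

attributes, tuples = parse_tuples(SAMPLE)
```

Checking the arity of each line against the header catches the most likely corruption (a truncated or merged line) early.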

Compiler Frontend

Robust frontends:
  Joeq compiler
  Soot compiler
  SUIF compiler (for C code)

Still experimental:
  Eclipse frontend
  gcc frontend
  …

Extracting Relations

Idea: Iterate through the compiler IR, numbering and dumping the relations of interest.

  Types
  Methods
  Fields
  Variables
  …
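The numbering-and-dumping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "IR" is a toy list of variable assignments rather than a real compiler IR, and the names `Numbering` and `dump_relation` are hypothetical.

```python
class Numbering:
    """Assigns consecutive integer ids to names in first-seen order."""

    def __init__(self):
        self.ids = {}
        self.names = []

    def id_of(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def dump_relation(pairs, numbering):
    """Return one 'id id' tuple line per (destination, source) pair."""
    return [f"{numbering.id_of(d)} {numbering.id_of(s)}" for d, s in pairs]


# Toy stand-in for a compiler IR: (destination, source) assignments.
toy_assignments = [("x", "y"), ("z", "x"), ("y", "z")]

variables = Numbering()
assign_lines = dump_relation(toy_assignments, variables)
map_lines = variables.names  # entry i holds the name of variable i
```

Keeping the id-to-name list alongside the tuples is what lets numeric results be mapped back to source-level names later.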

joeq.Main.GenRelations

Generate initial relations for points-to analysis.
Does initial pass to discover call graph.

Options:
  -fly : dump on-the-fly call graph info
  -cs : dump context-sensitive info
  -ssa : dump SSA representation
  -partial : no call graph discovery
  -Dpa.dumppath= : where to save files
  -Dpa.icallgraph= : location of initial call graph
  -Dpa.dumpdotgraph : dump call graph in dot

Format of a Datalog file

Domains
  Name Size ( map file )
  V 65536 var.map
  H 32768

Relations
  Name ( <attribute list> ) flags
  Store (v1 : V, f : F, v2 : V) input
  PointsTo (v : V, h : H) input, output

Rules
  Head :- Body .
  PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).
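To make the meaning of the example rule concrete, here is a minimal Python sketch that applies `PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).` repeatedly until nothing new is derived. It is a naive set-based fixpoint for illustration only, not how bddbddb evaluates rules, and the function name and sample facts are assumptions.

```python
def solve_points_to(assign, points_to):
    """Naive fixpoint for: PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h)."""
    result = set(points_to)
    changed = True
    while changed:
        changed = False
        for v1, v in assign:
            # Iterate over a snapshot so the set can grow inside the loop.
            for w, h in list(result):
                if w == v and (v1, h) not in result:
                    result.add((v1, h))
                    changed = True
    return result


assign = {("b", "a"), ("c", "b")}   # Assign(b, a) and Assign(c, b)
initial = {("a", "h1")}             # PointsTo(a, h1)
solved = solve_points_to(assign, initial)
```

The loop stops because the set of possible (variable, heap object) pairs is finite and facts are only ever added.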

Demo

Program Analysis in Datalog

Context Sensitivity

CS 510/08

Existing Solution

Call strings based

Cloning-Based Solution

Simple brute force technique.
Clone every path through the call graph.
Run context-insensitive algorithm on expanded call graph.

The catch: exponential blowup
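A small Python sketch shows where the blowup comes from. Cloning every path means each method gets one clone per distinct call path reaching it, so the count doubles whenever two callers both reach the same callee. The layered graph built by `diamond_chain` is a hypothetical worst case chosen for illustration, not a measured program.

```python
from functools import lru_cache


def clone_counts(callees, root):
    """Count call paths from root to every method of an acyclic call graph."""
    callers = {m: [] for m in callees}
    for caller, targets in callees.items():
        for t in targets:
            callers[t].append(caller)

    @lru_cache(maxsize=None)
    def paths(method):
        if method == root:
            return 1
        return sum(paths(c) for c in callers[method])

    return {m: paths(m) for m in callees}


def diamond_chain(layers):
    """Hypothetical graph: two methods per layer, each calling both below."""
    graph = {"main": ["m0a", "m0b"]}
    for i in range(layers):
        below = [f"m{i + 1}a", f"m{i + 1}b"] if i + 1 < layers else []
        graph[f"m{i}a"] = list(below)
        graph[f"m{i}b"] = list(below)
    return graph


counts = clone_counts(diamond_chain(40), "main")
deepest = counts["m39a"]  # paths reaching a method in the last layer
```

With only 81 methods, a method in the last layer already has 2**39 clones, which is why explicit cloning does not scale.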

Cloning is exponential!

Recursion

Actually, cloning is unbounded in the presence of recursive cycles.
Technique: We treat all methods within a strongly-connected component as a single node.
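One way to picture the technique is the following Python sketch, which finds strongly-connected components with Tarjan's algorithm and builds the collapsed, acyclic graph. The choice of Tarjan's algorithm, the function names, and the sample call graph are all assumptions made for illustration; the slides do not say how the components are computed.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; maps each node to an integer component id."""
    index, low, component = {}, {}, {}
    stack, on_stack = [], set()

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            comp_id = len(set(component.values()))
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component[w] = comp_id
                if w == v:
                    break

    for v in graph:
        if v not in index:
            visit(v)
    return component


def collapse(graph):
    """Build the acyclic graph whose nodes are the components of `graph`."""
    comp = strongly_connected_components(graph)
    collapsed = {c: set() for c in set(comp.values())}
    for v, targets in graph.items():
        for w in targets:
            if comp[v] != comp[w]:
                collapsed[comp[v]].add(comp[w])
    return comp, collapsed


# Hypothetical call graph in which f and g are mutually recursive.
calls = {"main": ["f"], "f": ["g"], "g": ["f", "leaf"], "leaf": []}
comp, collapsed = collapse(calls)
```

After collapsing, `f` and `g` share one node, so the number of paths through the graph is finite again.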

Recursion

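[Figure: a call graph over methods A through G, shown before and after cloning. In the cloned version, E, F, and G each appear three times.]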

Top 20 Sourceforge Java Apps

[Figure: Number of Clones. Number of clones (logarithmic axis, 10^0 to 10^16) plotted against size of program in variable nodes (logarithmic axis, 1,000 to 1,000,000).]

Cloning is infeasible (?)

Typical large program has ~10^14 paths.
If you need 1 byte to represent a clone:

Would require 256 terabytes of storage
>12 times size of Library of Congress
Registered ECC 1GB DIMMs: $41.7 million
  Power: 96.4 kilowatts = power for 128 homes
500 GB hard disks: 564 x $195 = $109,980
  Time to read sequentially: 70.8 days

Key Insight

There are many similarities across contexts.
Many copies of nearly-identical results.

BDDs can represent large sets of redundant data efficiently.

Need a BDD encoding that exploits the similarities.

Context-sensitive Pointer Analysis Algorithm

1. First, do context-insensitive pointer analysis to get call graph.
2. Number clones.
3. Do context-insensitive algorithm on the cloned graph.

Results explicitly generated for every clone.
Individual results retrievable with Datalog query.

Counting rule

0<=i<=k-1

Expanded Call Graph

[Figure: a call graph over methods A through H, next to its expanded version with clones A0; B0, C0, D0; E0 to E2; F0 to F2; G0 to G2; H0 to H5.]

Numbering Clones

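[Figure: the call graph over methods A through H with edges labeled by clone numbers (0, 0, 0; 0, 1, 2; 0-2, 0-2; 0-2, 3-5), next to the expanded graph with clones A0; B0, C0, D0; E0 to E2; F0 to F2; G0 to G2; H0 to H5.]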
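The numbering in this figure can be reproduced with a short Python sketch. Each call edge receives a contiguous range of callee clone numbers whose width equals the caller's clone count, so a method's clone count is the sum over its incoming edges. The edge set below (A calls B, C, D; each of those calls E; E calls F and G; both call H) is inferred from the figure's labels and is an assumption, as is the function name `number_clones`; this is a sketch of the idea, not bddbddb's implementation.

```python
def number_clones(callees, root):
    """Give each call edge a contiguous range of callee clone numbers."""
    # Topological order via depth-first search (graph assumed acyclic).
    # Visiting callees in reverse keeps siblings in their listed order.
    order, seen = [], set()

    def dfs(method):
        seen.add(method)
        for target in reversed(callees[method]):
            if target not in seen:
                dfs(target)
        order.append(method)

    dfs(root)
    order.reverse()

    clones = {m: 0 for m in order}
    clones[root] = 1
    edge_range = {}
    for caller in order:
        # All callers of `caller` were processed earlier, so its count is final.
        for callee in callees[caller]:
            start = clones[callee]
            clones[callee] += clones[caller]
            edge_range[(caller, callee)] = (start, clones[callee] - 1)
    return clones, edge_range


# Call graph assumed from the figure.
call_graph = {
    "A": ["B", "C", "D"],
    "B": ["E"], "C": ["E"], "D": ["E"],
    "E": ["F", "G"],
    "F": ["H"], "G": ["H"],
    "H": [],
}
clones, edge_range = number_clones(call_graph, "A")
```

Under these assumptions the sketch gives E, F, and G three clones each and H six, and it labels the edges into H with the ranges 0-2 and 3-5, matching the figure.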

Context Sensitive PointsTo

vP0(v,h) means there is an invocation site h that assigns a newly allocated object to variable v