Nanjing University Tian Tan 2020 Datalog-Based Static Program Analysis Program Analysis

Datalog-Based Program Analysis

Mar 11, 2022

Datalog-Based Program Analysis

Static Program Analysis

Tian Tan

Nanjing University

2020

Contents

1. Motivation
2. Introduction to Datalog
3. Pointer Analysis via Datalog
4. Taint Analysis via Datalog

Tian Tan @ Nanjing University 2

1. Motivation

Imperative vs Declarative

Goal: select adults from a set of persons

• Imperative: how to do (~implementation)

    import java.util.HashSet;
    import java.util.Set;

    // Spell out *how*: iterate over all persons, test each age, collect matches
    Set<Person> selectAdults(Set<Person> persons) {
        Set<Person> result = new HashSet<>();
        for (Person person : persons)
            if (person.getAge() >= 18)
                result.add(person);
        return result;
    }

• Declarative: what to do (~specification)

    SELECT * FROM Persons WHERE Age >= 18;

How to Implement Program Analyses?

Kind     Statement              Rule
New      i: x = new T()         $\frac{}{o_i \in pt(x)}$
Assign   x = y                  $\frac{o_i \in pt(y)}{o_i \in pt(x)}$
Store    x.f = y                $\frac{o_i \in pt(x),\ o_j \in pt(y)}{o_j \in pt(o_i.f)}$
Load     y = x.f                $\frac{o_i \in pt(x),\ o_j \in pt(o_i.f)}{o_j \in pt(y)}$
Call     l: r = x.k(a1,…,an)    $\frac{o_i \in pt(x),\ m = \mathrm{Dispatch}(o_i, k),\ o_u \in pt(a_j), 1 \le j \le n,\ o_v \in pt(m_{ret})}{o_i \in pt(m_{this}),\ o_u \in pt(m_{p_j}), 1 \le j \le n,\ o_v \in pt(r)}$

These rules are the specification.

Pointer Analysis, Imperative Implementation

Solve(m_entry)
    WL = [ ], PFG = {}, S = {}, RM = {}, CG = {}
    AddReachable(m_entry)
    while WL is not empty do
        remove ⟨n, pts⟩ from WL
        Δ = pts - pt(n)
        Propagate(n, Δ)
        if n represents a variable x then
            foreach o_i ∈ Δ do
                foreach x.f = y ∈ S do
                    AddEdge(y, o_i.f)
                foreach y = x.f ∈ S do
                    AddEdge(o_i.f, y)
                ProcessCall(x, o_i)

AddReachable(m)
    if m ∉ RM then
        add m to RM
        S ∪= S_m
        foreach i: x = new T() ∈ S_m do
            add ⟨x, {o_i}⟩ to WL
        foreach x = y ∈ S_m do
            AddEdge(y, x)

AddEdge(s, t)
    if s → t ∉ PFG then
        add s → t to PFG
        if pt(s) is not empty then
            add ⟨t, pt(s)⟩ to WL

ProcessCall(x, o_i)
    foreach l: r = x.k(a1,…,an) ∈ S do
        m = Dispatch(o_i, k)
        add ⟨m_this, {o_i}⟩ to WL
        if l → m ∉ CG then
            add l → m to CG
            AddReachable(m)
            foreach parameter p_i of m do
                AddEdge(a_i, p_i)
            AddEdge(m_ret, r)

Propagate(n, pts)
    if pts is not empty then
        pt(n) ⋃= pts
        foreach n → s ∈ PFG do
            add ⟨s, pts⟩ to WL

• How to implement worklist?
    • Array list or linked list?
    • Which worklist entry should be processed first?
• How to implement points-to set (pt)?
    • Hash set or bit vector?
• How to connect PFG nodes and pointers?
• How to associate variables to the relevant statements?
• …

So many implementation details
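To make these questions concrete, here is a minimal Java sketch of the same worklist loop for the single-method case only (New, Assign, Store and Load; ProcessCall and the RM/CG bookkeeping are left out). It fixes one arbitrary answer to each question above, purely as an assumption: a FIFO `ArrayDeque` worklist, `HashSet` points-to sets, pointers named by plain strings (`"x"` for a variable, `"o3.f"` for field `f` of object `o3`), and a linear scan over all statements to find the stores and loads on a variable. All record and class names are hypothetical.

```java
import java.util.*;

// Hypothetical statement records for the single-method subset (no calls).
interface Stmt {}
record NewStmt(String x, String obj) implements Stmt {}           // x = new T(); obj names the allocation site
record AssignStmt(String x, String y) implements Stmt {}          // x = y
record StoreStmt(String x, String f, String y) implements Stmt {} // x.f = y
record LoadStmt(String y, String x, String f) implements Stmt {}  // y = x.f

class WorklistPTA {
    private final List<Stmt> stmts;
    private final Map<String, Set<String>> pointsTo = new HashMap<>(); // pt(n)
    private final Map<String, Set<String>> pfg = new HashMap<>();      // PFG edges s -> {t}
    private final Deque<Map.Entry<String, Set<String>>> wl = new ArrayDeque<>(); // FIFO of <n, pts>

    WorklistPTA(List<Stmt> stmts) { this.stmts = stmts; }

    Set<String> pt(String n) { return pointsTo.computeIfAbsent(n, k -> new HashSet<>()); }

    private void addEdge(String s, String t) {
        // Add s -> t once; if s already points to something, push it along the new edge.
        if (pfg.computeIfAbsent(s, k -> new HashSet<>()).add(t) && !pt(s).isEmpty())
            wl.add(Map.entry(t, new HashSet<>(pt(s))));
    }

    private void propagate(String n, Set<String> pts) {
        if (pts.isEmpty()) return;
        pt(n).addAll(pts);
        for (String s : pfg.getOrDefault(n, Set.of()))
            wl.add(Map.entry(s, pts));
    }

    Map<String, Set<String>> solve() {
        // What AddReachable does for the only method.
        for (Stmt st : stmts) {
            if (st instanceof NewStmt n) wl.add(Map.entry(n.x(), Set.of(n.obj())));
            else if (st instanceof AssignStmt a) addEdge(a.y(), a.x());
        }
        while (!wl.isEmpty()) {
            Map.Entry<String, Set<String>> entry = wl.poll();
            String n = entry.getKey();
            Set<String> delta = new HashSet<>(entry.getValue());
            delta.removeAll(pt(n));                   // Δ = pts - pt(n)
            propagate(n, delta);
            // Field nodes such as "o3.f" never equal a variable name, so only variables match here.
            for (String o : delta)
                for (Stmt st : stmts) {
                    if (st instanceof StoreStmt s && s.x().equals(n)) addEdge(s.y(), o + "." + s.f());
                    if (st instanceof LoadStmt l && l.x().equals(n)) addEdge(o + "." + l.f(), l.y());
                }
        }
        return pointsTo;
    }
}
```

On the seven-statement example analyzed later in this lecture (`b = new C(); a = b; c = new C(); c.f = a; d = c; c.f = d; e = d.f;`), `solve()` ends with pt(e) = {o1, o3} and pt(o3.f) = {o1, o3}. Each of the choices above could be swapped independently (a priority queue, bit vectors, an index from variables to statements), which is the kind of detail the Datalog version hands over to the engine.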

Pointer Analysis, Declarative Implementation (via Datalog)

    VarPointsTo(x, o) <-
      Reachable(m),
      New(x, o, m).

    VarPointsTo(x, o) <-
      Assign(x, y),
      VarPointsTo(y, o).

    FieldPointsTo(oi, f, oj) <-
      Store(x, f, y),
      VarPointsTo(x, oi),
      VarPointsTo(y, oj).

    VarPointsTo(y, oj) <-
      Load(y, x, f),
      VarPointsTo(x, oi),
      FieldPointsTo(oi, f, oj).

    VarPointsTo(this, o),
    Reachable(m),
    CallGraph(l, m) <-
      VCall(l, x, k),
      VarPointsTo(x, o),
      Dispatch(o, k, m),
      ThisVar(m, this).

    VarPointsTo(pi, o) <-
      CallGraph(l, m),
      Argument(l, i, ai),
      Parameter(m, i, pi),
      VarPointsTo(ai, o).

    VarPointsTo(r, o) <-
      CallGraph(l, m),
      MethodReturn(m, ret),
      VarPointsTo(ret, o),
      CallReturn(l, r).

• Succinct
• Readable (logic-based specification)
• Easy to implement

2. Introduction to Datalog

Datalog

• Datalog is a declarative logic programming language that is a subset of Prolog.
• It emerged as a database language (mid-1980s)*
• Now it has a variety of applications
    • Program analysis
    • Declarative networking
    • Big data
    • Cloud computing
    • …

* David Maier, K. Tuncay Tekle, Michael Kifer, and David S. Warren, "Datalog: Concepts, History, and Outlook". Chapter, 2018.

Datalog

Datalog = Data + Logic (and, or, not)

• No side-effects
• No control flows
• No functions
• Not Turing-complete

Predicates (Data)

• In Datalog, a predicate (relation) is a set of statements
• Essentially, a predicate is a table of data
• A fact asserts that a particular tuple (a row) belongs to a relation (a table), i.e., it represents a predicate being true for a particular combination of values

Age
person      age
Xiaoming    18
Xiaohong    23
Alan        16
Abao        31

Age is a predicate, which states the age of some persons. For Age:
• ("Xiaoming",18) means "Xiaoming is 18", which is a fact
• ("Abao",23) means "Abao is 23", which is not a fact
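Because a predicate is essentially a table, one way to picture it in Java (the language of the earlier imperative example) is as a set of immutable tuples, so that "is this tuple a fact?" becomes set membership. A minimal sketch; `AgeFact` and `AgePredicate` are hypothetical names.

```java
import java.util.Set;

// One tuple (row) of the Age predicate: (person, age).
record AgeFact(String person, int age) {}

class AgePredicate {
    // The Age relation: a predicate is just a set of tuples.
    static final Set<AgeFact> AGE = Set.of(
        new AgeFact("Xiaoming", 18),
        new AgeFact("Xiaohong", 23),
        new AgeFact("Alan", 16),
        new AgeFact("Abao", 31));

    // A tuple is a fact exactly when it belongs to the relation
    // (records compare by value, so contains() checks the row's contents).
    static boolean isFact(String person, int age) {
        return AGE.contains(new AgeFact(person, age));
    }

    public static void main(String[] args) {
        System.out.println(isFact("Xiaoming", 18)); // true
        System.out.println(isFact("Abao", 23));     // false
    }
}
```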

Atoms

• Atoms are basic elements of Datalog, which represent predicates of the form

    P(X1,X2,…,Xn)

  where P is the name of the predicate and X1,X2,…,Xn are its arguments (terms)
• Terms
    • Variables: stand for any values
    • Constants
• Examples
    • Age(person,age)
    • Age("Xiaoming",18)

Atoms (Cont.)

• P(X1,X2,…,Xn) is called a relational atom
• P(X1,X2,…,Xn) evaluates to true when predicate P contains the tuple described by X1,X2,…,Xn
    • Age("Xiaoming",18) is true
    • Age("Alan",23) is false
• In addition to relational atoms, Datalog also has arithmetic atoms
    • E.g., age >= 18

Age
person      age
Xiaoming    18
Xiaohong    23
Alan        16
Abao        31

Datalog Rules (Logic)

• A rule is a way of expressing logical inferences
• Rules also serve to specify how facts are deduced
• The form of a rule is

    H <- B1,B2,…,Bn.

    • Head (consequent): H is an atom
    • Body (antecedent): each Bi is a (possibly negated) atom, called a subgoal

The meaning of a rule is "head is true if body is true".

Datalog Rules (Cont.)

    H <- B1,B2,…,Bn.

"," can be read as (logical) and, i.e., body B1,B2,…,Bn is true if all subgoals B1, B2, …, and Bn are true.

For example, we can deduce adults via a Datalog rule:

    Adult(person) <-
      Age(person,age),
      age >= 18.

How to interpret the rules?

Interpretation of Datalog Rules

    H(X1,X2) <- B1(X1,X3),B2(X2,X4),…,Bn(Xm).

• Consider all possible combinations of values of the variables in the subgoals
• If a combination makes all subgoals true, then the head atom (with corresponding values) is also true
• The head predicate consists of all true atoms

Rule Interpretation: An Example

Age
person      age
Xiaoming    18
Xiaohong    23
Alan        16
Abao        31

    Adult(person) <-
      Age(person,age),
      age >= 18.

    Adult("Xiaoming") <- Age("Xiaoming",18), 18>=18.
    Adult("Xiaohong") <- Age("Xiaohong",23), 23>=18.
                         Age("Alan",16), 16>=18.     (16>=18 is false: no head atom is deduced)
    Adult("Abao")     <- Age("Abao",31), 31>=18.

The head predicate Adult consists of all true atoms:

Adult
person
Xiaoming
Xiaohong
Abao

Datalog program = Facts + Rules
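The interpretation steps can be mimicked in a few lines of Java. In this sketch (the `Age` record and `AdultRule` class are hypothetical names), the only value combinations that make the relational subgoal Age(person,age) true are the tuples of Age, so the code walks those tuples, checks the arithmetic subgoal, and collects each true head atom into Adult.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One tuple of the EDB predicate Age(person, age).
record Age(String person, int age) {}

class AdultRule {
    // Adult(person) <- Age(person, age), age >= 18.
    static Set<String> adult(List<Age> ageFacts) {
        Set<String> result = new LinkedHashSet<>(); // keeps deduction order for printing
        // Combinations that make Age(person, age) true are exactly its tuples.
        for (Age a : ageFacts) {
            if (a.age() >= 18) {          // arithmetic subgoal
                result.add(a.person());   // head atom Adult(person) is true
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Age> age = List.of(
            new Age("Xiaoming", 18), new Age("Xiaohong", 23),
            new Age("Alan", 16), new Age("Abao", 31));
        System.out.println(adult(age)); // [Xiaoming, Xiaohong, Abao]
    }
}
```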

Where does initial data come from?

EDB and IDB Predicates

Conventionally, predicates in Datalog are divided into two kinds:

1. EDB (extensional database)
    • The predicates that are defined a priori
    • Relations are immutable
    • Can be seen as input relations
2. IDB (intensional database)
    • The predicates that are established only by rules
    • Relations are inferred by rules
    • Can be seen as output relations

    H <- B1,B2,…,Bn.

• H can only be IDB
• Bi can be EDB or IDB

Logical Or

There are two ways to express logical or in Datalog:

1. Write multiple rules with the same head

    SportFan(person) <- Hobby(person,"jogging").
    SportFan(person) <- Hobby(person,"swimming").

2. Use the logical or operator ";"

    SportFan(person) <-
      Hobby(person,"jogging");
      Hobby(person,"swimming").

Hobby
person      hobby
Xiaoming    cooking
Xiaoming    singing
Xiaohong    jogging
Abao        sleeping
Alan        swimming
…           …

The precedence of ";" (or) is lower than "," (and), so disjunctions may be enclosed in parentheses, e.g., H <- A,(B;C).
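Both forms define the same relation: with several rules, the head relation is the union of what each rule deduces. A minimal Java sketch over the listed Hobby rows (the elided "…" rows are simply left out; `Hobby` and `SportFanRules` are hypothetical names).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// One tuple of the EDB predicate Hobby(person, hobby).
record Hobby(String person, String hobby) {}

class SportFanRules {
    // Form 1: two rules with the same head; SportFan collects what both deduce.
    static Set<String> viaTwoRules(List<Hobby> hobbies) {
        Set<String> fan = new TreeSet<>();
        for (Hobby h : hobbies)   // SportFan(person) <- Hobby(person,"jogging").
            if (h.hobby().equals("jogging")) fan.add(h.person());
        for (Hobby h : hobbies)   // SportFan(person) <- Hobby(person,"swimming").
            if (h.hobby().equals("swimming")) fan.add(h.person());
        return fan;
    }

    // Form 2: a single rule whose body uses ";" (or).
    static Set<String> viaOrOperator(List<Hobby> hobbies) {
        Set<String> fan = new TreeSet<>();
        for (Hobby h : hobbies)
            if (h.hobby().equals("jogging") || h.hobby().equals("swimming")) fan.add(h.person());
        return fan;
    }

    public static void main(String[] args) {
        List<Hobby> hobby = List.of(
            new Hobby("Xiaoming", "cooking"), new Hobby("Xiaoming", "singing"),
            new Hobby("Xiaohong", "jogging"), new Hobby("Abao", "sleeping"),
            new Hobby("Alan", "swimming"));
        System.out.println(viaTwoRules(hobby));   // [Alan, Xiaohong]
        System.out.println(viaOrOperator(hobby)); // [Alan, Xiaohong]
    }
}
```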

Negation

• In Datalog rules, a subgoal can be a negated atom, which negates its meaning
• A negated subgoal is written as !B(…), and read as not B(…)

    H(X1,X2) <- B1(X1,X3),!B2(X2,X4),…,Bn(Xm).

• For example, to compute the students who need to take a make-up exam, we can write

    MakeupExamStd(student) <-
      Student(student),
      !PassedStd(student).

  where Student stores all students, and PassedStd stores the students who passed the exam.
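Operationally, the negated subgoal acts as a filter (a set difference) over the values already supplied by the positive subgoal Student(student). A minimal Java sketch; the class name is hypothetical, and the student names and exam results in `main` are made-up sample data, not from the slides.

```java
import java.util.Set;
import java.util.TreeSet;

class MakeupExamRule {
    // MakeupExamStd(student) <- Student(student), !PassedStd(student).
    // Student(student) supplies the candidate values; !PassedStd(student) removes some of them.
    static Set<String> makeupExamStd(Set<String> student, Set<String> passedStd) {
        Set<String> result = new TreeSet<>();
        for (String s : student)
            if (!passedStd.contains(s)) result.add(s);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Set<String> student = Set.of("Xiaoming", "Xiaohong", "Alan", "Abao");
        Set<String> passedStd = Set.of("Xiaoming", "Abao");
        System.out.println(makeupExamStd(student, passedStd)); // [Alan, Xiaohong]
    }
}
```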

Recursion

• Datalog supports recursive rules, which allow an IDB predicate to be deduced (directly or indirectly) from itself
• For example, we can compute the reachability information (i.e., transitive closure) of a graph with recursive rules:

    Reach(from, to) <-
      Edge(from, to).

    Reach(from, to) <-
      Reach(from, node),
      Edge(node, to).

  where Edge(a,b) means that the graph has an edge from node a to node b, and Reach(a,b) means that b is reachable from a.
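A Datalog engine evaluates these two rules by re-applying them until no new Reach fact appears. The Java sketch below uses semi-naive evaluation, a standard technique that the slide does not describe: each round joins only the Reach facts that were new in the previous round with Edge. The `Edge` record (reused for Reach tuples) and the `Reachability` class are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// A pair (from, to); used for both Edge and Reach tuples.
record Edge(String from, String to) {}

class Reachability {
    // Reach(from, to) <- Edge(from, to).
    // Reach(from, to) <- Reach(from, node), Edge(node, to).
    static Set<Edge> reach(Set<Edge> edge) {
        Set<Edge> reachFacts = new HashSet<>(edge); // first rule
        Set<Edge> delta = new HashSet<>(edge);      // facts new in the last round
        while (!delta.isEmpty()) {                  // stop when a round deduces nothing new
            Set<Edge> next = new HashSet<>();
            for (Edge r : delta)
                for (Edge e : edge)
                    if (r.to().equals(e.from())) {  // join on node
                        Edge fact = new Edge(r.from(), e.to());
                        if (!reachFacts.contains(fact)) next.add(fact);
                    }
            reachFacts.addAll(next);
            delta = next;
        }
        return reachFacts;
    }

    public static void main(String[] args) {
        Set<Edge> edge = Set.of(new Edge("a", "b"), new Edge("b", "c"), new Edge("c", "d"));
        System.out.println(reach(edge).size()); // 6
    }
}
```

Termination follows the argument given later for Datalog in general: facts are only added, and there are finitely many (from, to) pairs over the graph's nodes, so cycles such as a→b→a also settle.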

Recursion (Cont.)

• Without recursion, Datalog can only express the queries of basic relational algebra
    • Basically SQL with SELECT-FROM-WHERE
• With recursion, Datalog becomes much more powerful, and is able to express sophisticated program analyses, such as pointer analysis

Rule Safety

Are these rules ok?

    A(x) <- B(y), x > y.
    A(x) <- B(y), !C(x,y).

For both rules, infinitely many values of x can satisfy the rule, which makes A an infinite relation.

• A rule is safe if every variable appears in at least one non-negated relational atom
• The above two rules are unsafe
• In Datalog, only safe rules are allowed
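The safety condition is purely syntactic, so it can be checked without evaluating anything. In this Java sketch a rule is abstracted (as an assumption) to the variable sets of its head, its non-negated relational atoms, its negated atoms and its arithmetic atoms; constants are dropped. `Rule` and `RuleSafety` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A rule reduced to the variables occurring in each of its parts.
record Rule(Set<String> head,
            List<Set<String>> positiveAtoms,   // non-negated relational atoms
            List<Set<String>> negatedAtoms,
            List<Set<String>> arithmeticAtoms) {}

class RuleSafety {
    // Safe iff every variable of the rule appears in at least one non-negated relational atom.
    static boolean isSafe(Rule r) {
        Set<String> bound = new HashSet<>();
        r.positiveAtoms().forEach(bound::addAll);
        Set<String> used = new HashSet<>(r.head());
        r.negatedAtoms().forEach(used::addAll);
        r.arithmeticAtoms().forEach(used::addAll);
        return bound.containsAll(used);
    }

    public static void main(String[] args) {
        // A(x) <- B(y), x > y.
        Rule r1 = new Rule(Set.of("x"), List.of(Set.of("y")), List.of(), List.of(Set.of("x", "y")));
        // A(x) <- B(y), !C(x,y).
        Rule r2 = new Rule(Set.of("x"), List.of(Set.of("y")), List.of(Set.of("x", "y")), List.of());
        // Adult(person) <- Age(person,age), age >= 18.
        Rule r3 = new Rule(Set.of("person"), List.of(Set.of("person", "age")), List.of(), List.of(Set.of("age")));
        System.out.println(isSafe(r1) + " " + isSafe(r2) + " " + isSafe(r3)); // false false true
    }
}
```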

Recursion and Negation

Is this rule ok?

    A(x) <- B(x), !A(x).

Suppose B(1) is true. If A(1) is false, then A(1) is true. If A(1) is true, A(1) should not be true. …

The rule is contradictory and makes no sense.

In Datalog, recursion and negation of an atom must be separated. Otherwise, the rules may contain contradictions and the inference fails to converge.
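The sketch below makes the non-convergence visible by naively recomputing A from its previous value. This is deliberately not how Datalog evaluates rules (it discards facts, which a monotone engine never does); engines that support only stratified negation would typically reject such a program outright. With B = {1}, A alternates between {} and {1} and never reaches a fixed point. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NonConvergence {
    // Repeatedly recomputes A from A(x) <- B(x), !A(x). using the previous A,
    // recording each value of A along the way.
    static List<Set<Integer>> iterate(Set<Integer> b, int rounds) {
        List<Set<Integer>> history = new ArrayList<>();
        Set<Integer> a = new HashSet<>();
        for (int i = 0; i < rounds; i++) {
            history.add(Set.copyOf(a));
            Set<Integer> next = new HashSet<>(b);
            next.removeAll(a);   // keep x with B(x) and !A(x)
            a = next;
        }
        return history;
    }

    public static void main(String[] args) {
        System.out.println(iterate(Set.of(1), 4)); // [[], [1], [], [1]]
    }
}
```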

Execution of Datalog Programs

EDB, Rules → Datalog engine → IDB

• A Datalog engine deduces facts from the given rules and EDB predicates until no new facts can be deduced. Some modern Datalog engines: LogicBlox, Soufflé, XSB, Datomic, Flora-2, …
• Monotonicity: Datalog is monotone, as facts cannot be deleted
• Termination: a Datalog program always terminates, as
    1) Datalog is monotone
    2) the possible values of IDB predicates are finite (rule safety)

3. Pointer Analysis via Datalog

Page 61: Datalog-Based Program Analysis

Pointer Analysis via Datalog

• EDB: pointer-relevant information that can be extracted from the program syntactically

• IDB: pointer analysis results

• Rules: pointer analysis rules

New      x = new T()
Assign   x = y
Store    x.f = y
Load     y = x.f
Call     r = x.k(a, …)

First focus on the New, Assign, Store and Load statements (suppose the program has just one method); method calls are discussed later.

Page 62: Datalog-Based Program Analysis

Datalog Model for Pointer Analysis

Kind     Statement
New      i: x = new T()
Assign   x = y
Store    x.f = y
Load     y = x.f

Domains: Variables: V, Fields: F, Objects: O

EDB
New(x : V, o : O)
Assign(x : V, y : V)
Load(y : V, x : V, f : F)
Store(x : V, f : F, y : V)

IDB
VarPointsTo(v : V, o : O)
FieldPointsTo(oi : O, f : F, oj : O)

e.g., fact VarPointsTo($x$, $o_i$) represents $o_i \in pt(x)$

e.g., fact FieldPointsTo($o_i$, $f$, $o_j$) represents $o_j \in pt(o_i.f)$

Page 68: Datalog-Based Program Analysis

An Example

1  b = new C();
2  a = b;
3  c = new C();
4  c.f = a;
5  d = c;
6  c.f = d;
7  e = d.f;

Variables: V, Fields: F, Objects: O

EDB
New(x : V, o : O):           (b, o1), (c, o3)
Assign(x : V, y : V):        (a, b), (d, c)
Store(x : V, f : F, y : V):  (c, f, a), (c, f, d)
Load(y : V, x : V, f : F):   (e, d, f)
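
The EDB tables above can be produced mechanically from the statements. The following is a minimal Java sketch of that extraction step. It makes two hypothetical assumptions: statements arrive as text in exactly the four forms used here, and the allocation site at label i is named o<i>. A real front end would read facts from a compiler IR rather than pattern-matching source text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts New/Assign/Store/Load EDB facts from the toy statement forms used in the example. */
class FactExtractor {
    private static final Pattern NEW = Pattern.compile("(\\w+) = new (\\w+)\\(\\);");
    private static final Pattern STORE = Pattern.compile("(\\w+)\\.(\\w+) = (\\w+);");
    private static final Pattern LOAD = Pattern.compile("(\\w+) = (\\w+)\\.(\\w+);");
    private static final Pattern ASSIGN = Pattern.compile("(\\w+) = (\\w+);");

    /** statements.get(k) is the statement labelled k + 1; its allocation site is named o<label>. */
    static List<String> extract(List<String> statements) {
        List<String> facts = new ArrayList<>();
        for (int k = 0; k < statements.size(); k++) {
            String s = statements.get(k).trim();
            int label = k + 1;
            Matcher m;
            if ((m = NEW.matcher(s)).matches()) {            // x = new T()
                facts.add("New(" + m.group(1) + ", o" + label + ")");
            } else if ((m = STORE.matcher(s)).matches()) {   // x.f = y  ->  Store(x, f, y)
                facts.add("Store(" + m.group(1) + ", " + m.group(2) + ", " + m.group(3) + ")");
            } else if ((m = LOAD.matcher(s)).matches()) {    // y = x.f  ->  Load(y, x, f)
                facts.add("Load(" + m.group(1) + ", " + m.group(2) + ", " + m.group(3) + ")");
            } else if ((m = ASSIGN.matcher(s)).matches()) {  // x = y
                facts.add("Assign(" + m.group(1) + ", " + m.group(2) + ")");
            } else {
                throw new IllegalArgumentException("unsupported statement: " + s);
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        List<String> program = List.of(
            "b = new C();", "a = b;", "c = new C();", "c.f = a;",
            "d = c;", "c.f = d;", "e = d.f;");
        extract(program).forEach(System.out::println);
    }
}
```

On the seven-statement example this prints one fact per line, in statement order. The facts match the New, Assign, Store and Load tables above.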

Page 73: Datalog-Based Program Analysis

Datalog Rules for Pointer Analysis

VarPointsTo(x, o) <- New(x, o).

VarPointsTo(x, o) <- Assign(x, y), VarPointsTo(y, o).

FieldPointsTo(oi, f, oj) <- Store(x, f, y), VarPointsTo(x, oi), VarPointsTo(y, oj).

VarPointsTo(y, oj) <- Load(y, x, f), VarPointsTo(x, oi), FieldPointsTo(oi, f, oj).

Kind     Statement         Rule
New      i: x = new T()    $o_i \in pt(x)$
Assign   x = y             $\dfrac{o_i \in pt(y)}{o_i \in pt(x)}$
Store    x.f = y           $\dfrac{o_i \in pt(x),\ o_j \in pt(y)}{o_j \in pt(o_i.f)}$
Load     y = x.f           $\dfrac{o_i \in pt(x),\ o_j \in pt(o_i.f)}{o_j \in pt(y)}$
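
The four rules can be evaluated with a hand-written naive fixpoint loop. In the Java sketch below, each predicate is a record and each rule is a nested-loop join. Class and method names are illustrative and do not come from any Datalog engine. Run on the seven-statement example, it should derive pt(e) = {o1, o3} and the two FieldPointsTo facts for o3.f.

```java
import java.util.HashSet;
import java.util.Set;

/** Naive fixpoint evaluation of the four intra-procedural pointer-analysis rules. */
class PointerRules {
    // EDB predicates
    record New(String x, String o) {}                 // i: x = new T()
    record Assign(String x, String y) {}              // x = y
    record Store(String x, String f, String y) {}     // x.f = y
    record Load(String y, String x, String f) {}      // y = x.f
    // IDB predicates
    record VarPointsTo(String v, String o) {}
    record FieldPointsTo(String oi, String f, String oj) {}
    record Result(Set<VarPointsTo> varPointsTo, Set<FieldPointsTo> fieldPointsTo) {}

    /** Snapshot of { o | VarPointsTo(v, o) }; safe to iterate while vpt grows. */
    static Set<String> pt(Set<VarPointsTo> vpt, String v) {
        Set<String> objs = new HashSet<>();
        for (VarPointsTo p : vpt) {
            if (p.v().equals(v)) objs.add(p.o());
        }
        return objs;
    }

    static Result solve(Set<New> news, Set<Assign> assigns, Set<Store> stores, Set<Load> loads) {
        Set<VarPointsTo> vpt = new HashSet<>();
        Set<FieldPointsTo> fpt = new HashSet<>();
        boolean changed = true;
        while (changed) { // stop once a full pass derives no new fact
            changed = false;
            // VarPointsTo(x, o) <- New(x, o).
            for (New n : news) {
                changed |= vpt.add(new VarPointsTo(n.x(), n.o()));
            }
            // VarPointsTo(x, o) <- Assign(x, y), VarPointsTo(y, o).
            for (Assign a : assigns) {
                for (String o : pt(vpt, a.y())) changed |= vpt.add(new VarPointsTo(a.x(), o));
            }
            // FieldPointsTo(oi, f, oj) <- Store(x, f, y), VarPointsTo(x, oi), VarPointsTo(y, oj).
            for (Store s : stores) {
                for (String oi : pt(vpt, s.x()))
                    for (String oj : pt(vpt, s.y())) changed |= fpt.add(new FieldPointsTo(oi, s.f(), oj));
            }
            // VarPointsTo(y, oj) <- Load(y, x, f), VarPointsTo(x, oi), FieldPointsTo(oi, f, oj).
            for (Load l : loads) {
                for (String oi : pt(vpt, l.x()))
                    for (FieldPointsTo fp : fpt)
                        if (fp.oi().equals(oi) && fp.f().equals(l.f()))
                            changed |= vpt.add(new VarPointsTo(l.y(), fp.oj()));
            }
        }
        return new Result(vpt, fpt);
    }

    public static void main(String[] args) {
        Result r = solve(
            Set.of(new New("b", "o1"), new New("c", "o3")),
            Set.of(new Assign("a", "b"), new Assign("d", "c")),
            Set.of(new Store("c", "f", "a"), new Store("c", "f", "d")),
            Set.of(new Load("e", "d", "f")));
        System.out.println("pt(e) = " + pt(r.varPointsTo(), "e")); // o1 and o3, in unspecified order
        System.out.println(r.fieldPointsTo().size() + " FieldPointsTo facts"); // 2
    }
}
```

Rule order inside a pass does not affect the final result. It only changes how many passes are needed before nothing new is derived.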

Page 74: Datalog-Based Program Analysis

An Example

1  b = new C();
2  a = b;
3  c = new C();
4  c.f = a;
5  d = c;
6  c.f = d;
7  e = d.f;

EDB
New(x:V, o:O):          (b, o1), (c, o3)
Assign(x:V, y:V):       (a, b), (d, c)
Store(x:V, f:F, y:V):   (c, f, a), (c, f, d)
Load(y:V, x:V, f:F):    (e, d, f)

IDB (initially empty)
VarPointsTo(v:V, o:O):            (empty)
FieldPointsTo(oi:O, f:F, oj:O):   (empty)

Page 75: Datalog-Based Program Analysis

An Example

VarPointsTo(b, o1) <- New(b, o1).
VarPointsTo(c, o3) <- New(c, o3).

VarPointsTo:    (b, o1), (c, o3)
FieldPointsTo:  (empty)

Page 76: Datalog-Based Program Analysis

An Example

VarPointsTo(a, o1) <- Assign(a, b), VarPointsTo(b, o1).

VarPointsTo:    (b, o1), (c, o3), (a, o1)
FieldPointsTo:  (empty)

Page 77: Datalog-Based Program Analysis

An Example

VarPointsTo(d, o3) <- Assign(d, c), VarPointsTo(c, o3).

VarPointsTo:    (b, o1), (c, o3), (a, o1), (d, o3)
FieldPointsTo:  (empty)

Page 78: Datalog-Based Program Analysis

An Example

FieldPointsTo(o3, f, o1) <- Store(c, f, a), VarPointsTo(c, o3), VarPointsTo(a, o1).

VarPointsTo:    (b, o1), (c, o3), (a, o1), (d, o3)
FieldPointsTo:  (o3, f, o1)

Page 79: Datalog-Based Program Analysis

An Example

FieldPointsTo(o3, f, o3) <- Store(c, f, d), VarPointsTo(c, o3), VarPointsTo(d, o3).

VarPointsTo:    (b, o1), (c, o3), (a, o1), (d, o3)
FieldPointsTo:  (o3, f, o1), (o3, f, o3)

Page 80: Datalog-Based Program Analysis

An Example

VarPointsTo(e, o1) <- Load(e, d, f), VarPointsTo(d, o3), FieldPointsTo(o3, f, o1).
VarPointsTo(e, o3) <- Load(e, d, f), VarPointsTo(d, o3), FieldPointsTo(o3, f, o3).

VarPointsTo:    (b, o1), (c, o3), (a, o1), (d, o3), (e, o1), (e, o3)
FieldPointsTo:  (o3, f, o1), (o3, f, o3)

Page 81: Datalog-Based Program Analysis

An Example

No rule can derive a new fact any more, so evaluation stops with the final results:

VarPointsTo:    (b, o1), (c, o3), (a, o1), (d, o3), (e, o1), (e, o3)
FieldPointsTo:  (o3, f, o1), (o3, f, o3)

i.e., pt(a) = pt(b) = {o1}, pt(c) = pt(d) = {o3}, pt(e) = {o1, o3}, pt(o3.f) = {o1, o3}

Page 83: Datalog-Based Program Analysis

Handle Method Calls

Kind   Statement                 Rule
Call   l: r = x.k(a1, …, an)     $\dfrac{o_i \in pt(x),\ m = \mathrm{Dispatch}(o_i, k) \qquad o_u \in pt(a_j),\ 1 \le j \le n \qquad o_v \in pt(m_{ret})}{o_i \in pt(m_{this}) \qquad o_u \in pt(m_{p_j}),\ 1 \le j \le n \qquad o_v \in pt(r)}$

VarPointsTo(this, o), Reachable(m), CallGraph(l, m) <-
    VCall(l, x, k), VarPointsTo(x, o), Dispatch(o, k, m), ThisVar(m, this).

EDB
• VCall(l:S, x:V, k:M)
• Dispatch(o:O, k:M, m:M)
• ThisVar(m:M, this:V)

IDB
• Reachable(m:M)
• CallGraph(l:S, m:M)

Statements (Labels): S
Methods: M

Page 84: Datalog-Based Program Analysis

Handle Method Calls

VarPointsTo(pi, o) <- CallGraph(l, m), Argument(l, i, ai), Parameter(m, i, pi), VarPointsTo(ai, o).

EDB
• Argument(l:S, i:N, ai:V)
• Parameter(m:M, i:N, pi:V)

Statements (Labels): S
Methods: M
Natural numbers (indexes): N

Page 85: Datalog-Based Program Analysis

Handle Method Calls

VarPointsTo(r, o) <- CallGraph(l, m), MethodReturn(m, ret), VarPointsTo(ret, o), CallReturn(l, r).

EDB
• MethodReturn(m:M, ret:V)
• CallReturn(l:S, r:V)

Statements (Labels): S
Methods: M

Page 86: Datalog-Based Program Analysis

Handle Method Calls

VarPointsTo(this, o), Reachable(m), CallGraph(l, m) <-
    VCall(l, x, k), VarPointsTo(x, o), Dispatch(o, k, m), ThisVar(m, this).

VarPointsTo(pi, o) <- CallGraph(l, m), Argument(l, i, ai), Parameter(m, i, pi), VarPointsTo(ai, o).

VarPointsTo(r, o) <- CallGraph(l, m), MethodReturn(m, ret), VarPointsTo(ret, o), CallReturn(l, r).

Page 87: Datalog-Based Program Analysis

Whole-Program Pointer Analysis

Reachable(m) <- EntryMethod(m).

VarPointsTo(x, o) <- Reachable(m), New(x, o, m).

VarPointsTo(x, o) <- Assign(x, y), VarPointsTo(y, o).

FieldPointsTo(oi, f, oj) <- Store(x, f, y), VarPointsTo(x, oi), VarPointsTo(y, oj).

VarPointsTo(y, oj) <- Load(y, x, f), VarPointsTo(x, oi), FieldPointsTo(oi, f, oj).

VarPointsTo(this, o), Reachable(m), CallGraph(l, m) <-
    VCall(l, x, k), VarPointsTo(x, o), Dispatch(o, k, m), ThisVar(m, this).

VarPointsTo(pi, o) <- CallGraph(l, m), Argument(l, i, ai), Parameter(m, i, pi), VarPointsTo(ai, o).

VarPointsTo(r, o) <- CallGraph(l, m), MethodReturn(m, ret), VarPointsTo(ret, o), CallReturn(l, r).
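
The whole-program rules can be evaluated the same way: one loop per rule, repeated until nothing changes. Below is a naive-fixpoint sketch in Java that rests on loud assumptions:
- The EDB facts are hand-written for a hypothetical two-method program. The names A.id, A.id/this, A.unused and the call-site label "3" are invented for illustration.
- All EDB facts are kept in one set and filtered by record type.

Reachable starts from the entry method and grows only through the call rule. As a result, the allocation inside the never-called method A.unused produces no VarPointsTo fact.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Naive fixpoint evaluation of the whole-program pointer-analysis rules over in-memory facts. */
class WholeProgramPta {
    // EDB predicates
    record EntryMethod(String m) {}
    record New(String x, String o, String m) {}
    record Assign(String x, String y) {}
    record Store(String x, String f, String y) {}
    record Load(String y, String x, String f) {}
    record VCall(String l, String x, String k) {}
    record Dispatch(String o, String k, String m) {}
    record ThisVar(String m, String thisVar) {}
    record Argument(String l, int i, String ai) {}
    record Parameter(String m, int i, String pi) {}
    record MethodReturn(String m, String ret) {}
    record CallReturn(String l, String r) {}
    // IDB predicates (Reachable is kept as a plain set of method names)
    record VarPointsTo(String v, String o) {}
    record FieldPointsTo(String oi, String f, String oj) {}
    record CallGraph(String l, String m) {}

    final Set<Object> edb = new HashSet<>();
    final Set<String> reachable = new HashSet<>();
    final Set<VarPointsTo> varPointsTo = new HashSet<>();
    final Set<FieldPointsTo> fieldPointsTo = new HashSet<>();
    final Set<CallGraph> callGraph = new HashSet<>();

    <T> List<T> facts(Class<T> type) { // all EDB facts of one predicate
        List<T> out = new ArrayList<>();
        for (Object f : edb) if (type.isInstance(f)) out.add(type.cast(f));
        return out;
    }

    Set<String> pt(String v) { // snapshot of { o | VarPointsTo(v, o) }, safe while varPointsTo grows
        Set<String> objs = new HashSet<>();
        for (VarPointsTo p : varPointsTo) if (p.v().equals(v)) objs.add(p.o());
        return objs;
    }

    void solve() { // apply all eight rules until a whole pass derives nothing new
        boolean changed = true;
        while (changed) {
            changed = false;
            for (EntryMethod e : facts(EntryMethod.class)) changed |= reachable.add(e.m());
            for (New n : facts(New.class))  // allocations count only in reachable methods
                if (reachable.contains(n.m())) changed |= varPointsTo.add(new VarPointsTo(n.x(), n.o()));
            for (Assign a : facts(Assign.class))
                for (String o : pt(a.y())) changed |= varPointsTo.add(new VarPointsTo(a.x(), o));
            for (Store s : facts(Store.class))
                for (String oi : pt(s.x()))
                    for (String oj : pt(s.y())) changed |= fieldPointsTo.add(new FieldPointsTo(oi, s.f(), oj));
            for (Load ld : facts(Load.class))
                for (String oi : pt(ld.x()))
                    for (FieldPointsTo fp : fieldPointsTo)
                        if (fp.oi().equals(oi) && fp.f().equals(ld.f()))
                            changed |= varPointsTo.add(new VarPointsTo(ld.y(), fp.oj()));
            // Call rule: dispatch on each receiver object, bind this, mark the callee reachable, add the edge.
            for (VCall c : facts(VCall.class))
                for (String o : pt(c.x()))
                    for (Dispatch d : facts(Dispatch.class))
                        if (d.o().equals(o) && d.k().equals(c.k()))
                            for (ThisVar t : facts(ThisVar.class))
                                if (t.m().equals(d.m())) {
                                    changed |= varPointsTo.add(new VarPointsTo(t.thisVar(), o));
                                    changed |= reachable.add(d.m());
                                    changed |= callGraph.add(new CallGraph(c.l(), d.m()));
                                }
            // Parameter passing and return values along each call-graph edge.
            for (CallGraph e : callGraph) {
                for (Argument a : facts(Argument.class))
                    for (Parameter p : facts(Parameter.class))
                        if (a.l().equals(e.l()) && p.m().equals(e.m()) && a.i() == p.i())
                            for (String o : pt(a.ai())) changed |= varPointsTo.add(new VarPointsTo(p.pi(), o));
                for (MethodReturn mr : facts(MethodReturn.class))
                    for (CallReturn cr : facts(CallReturn.class))
                        if (mr.m().equals(e.m()) && cr.l().equals(e.l()))
                            for (String o : pt(mr.ret())) changed |= varPointsTo.add(new VarPointsTo(cr.r(), o));
            }
        }
    }

    /** Hypothetical program: main runs x = new A(); y = new B(); r = x.id(y); A.unused is never called. */
    static WholeProgramPta example() {
        WholeProgramPta p = new WholeProgramPta();
        p.edb.addAll(List.of(new EntryMethod("main"),
            new New("x", "o1", "main"), new New("y", "o2", "main"),                          // 1, 2: allocations
            new VCall("3", "x", "id"), new Argument("3", 0, "y"), new CallReturn("3", "r"),  // 3: r = x.id(y);
            new Dispatch("o1", "id", "A.id"), new ThisVar("A.id", "A.id/this"), new Parameter("A.id", 0, "A.id/p"),
            new Store("A.id/this", "f", "A.id/p"), new Load("A.id/q", "A.id/this", "f"),     // 4: this.f = p; 5: q = this.f;
            new MethodReturn("A.id", "A.id/q"),                                              // 6: return q;
            new New("A.unused/z", "o7", "A.unused")));                                       // 7: z = new C(); in A.unused
        return p;
    }
}
```

Calling `example()` and then `solve()` should give the following results:
- Reachable = {main, A.id}.
- A single call edge from label 3 to A.id.
- pt(r) = {o2}. The object o2 flows into p, is stored in o1.f, is loaded back into q and is returned.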

Page 89: Datalog-Based Program Analysis

Datalog Model for Taint Analysis

On top of pointer analysis:

• EDB predicates
  • Source(m : M)   // source methods
  • Sink(m : M)   // sink methods
  • Taint(l : S, t : T)   // associates each call site with the tainted data from that call site
• IDB predicate
  • TaintFlow(t : T, m : M)   // detected taint flows, e.g., TaintFlow(t, m) denotes that tainted data t may flow to sink method m

Page 90: Datalog-Based Program Analysis

Taint Analysis via Datalog

• Handles sources (generates tainted data)

Kind   Statement                 Rule
Call   l: r = x.k(a1, …, an)     $\dfrac{l \to m \in CG,\ m \in Sources}{t_l \in pt(r)}$

VarPointsTo(r, t) <- CallGraph(l, m), Source(m), CallReturn(l, r), Taint(l, t).

• Handles sinks (generates taint flow information)

Kind   Statement                 Rule
Call   l: r = x.k(a1, …, an)     $\dfrac{l \to m \in CG,\ m \in Sinks,\ \exists i, 1 \le i \le n: t_j \in pt(a_i)}{\langle t_j, m \rangle \in TaintFlows}$

TaintFlow(t, m) <- CallGraph(l, m), Sink(m), Argument(l, _, ai), VarPointsTo(ai, t), Taint(_, t).
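
The two taint rules can be sketched in Java as well. The sketch uses hypothetical source and sink methods (getSecret, log), hand-written call-graph facts, and one assumed pointer-analysis fact for an ordinary object o4. Only the Assign rule of the pointer analysis is included. That is enough to show tainted data t1 travelling like any other object. The Taint(_, t) check in the sink rule is what keeps the ordinary object o4 out of the report.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Taint rules evaluated together with the Assign rule of a (much reduced) pointer analysis. */
class TaintRules {
    record CallGraph(String l, String m) {}
    record CallReturn(String l, String r) {}
    record Argument(String l, int i, String ai) {}
    record Taint(String l, String t) {}
    record Assign(String x, String y) {}
    record VarPointsTo(String v, String o) {}
    record TaintFlow(String t, String m) {}

    static Set<TaintFlow> analyze(Set<CallGraph> callGraph, Set<String> sources, Set<String> sinks,
                                  Set<CallReturn> callReturns, Set<Argument> arguments, Set<Taint> taints,
                                  Set<Assign> assigns, Set<VarPointsTo> initialPointsTo) {
        Set<VarPointsTo> vpt = new HashSet<>(initialPointsTo);
        Set<TaintFlow> flows = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            // VarPointsTo(r, t) <- CallGraph(l, m), Source(m), CallReturn(l, r), Taint(l, t).
            for (CallGraph e : callGraph)
                if (sources.contains(e.m()))
                    for (CallReturn cr : callReturns)
                        for (Taint tn : taints)
                            if (cr.l().equals(e.l()) && tn.l().equals(e.l()))
                                changed |= vpt.add(new VarPointsTo(cr.r(), tn.t()));
            // VarPointsTo(x, o) <- Assign(x, y), VarPointsTo(y, o).  (tainted data flows like any object)
            for (Assign a : assigns)
                for (VarPointsTo p : List.copyOf(vpt))
                    if (p.v().equals(a.y())) changed |= vpt.add(new VarPointsTo(a.x(), p.o()));
            // TaintFlow(t, m) <- CallGraph(l, m), Sink(m), Argument(l, _, ai), VarPointsTo(ai, t), Taint(_, t).
            for (CallGraph e : callGraph)
                if (sinks.contains(e.m()))
                    for (Argument arg : arguments)
                        if (arg.l().equals(e.l()))
                            for (VarPointsTo p : vpt)
                                if (p.v().equals(arg.ai()) && isTaintData(taints, p.o()))
                                    changed |= flows.add(new TaintFlow(p.o(), e.m()));
        }
        return flows;
    }

    /** Taint(_, t): true if o is the tainted data generated at some source call site. */
    static boolean isTaintData(Set<Taint> taints, String o) {
        for (Taint tn : taints) if (tn.t().equals(o)) return true;
        return false;
    }

    /** Hypothetical program: l1: s = getSecret(); l2: u = s; l3: log(u); l4: w = new Obj(); l5: log(w); */
    static Set<TaintFlow> example() {
        return analyze(
            Set.of(new CallGraph("l1", "getSecret"), new CallGraph("l3", "log"), new CallGraph("l5", "log")),
            Set.of("getSecret"), Set.of("log"),
            Set.of(new CallReturn("l1", "s")),
            Set.of(new Argument("l3", 0, "u"), new Argument("l5", 0, "w")),
            Set.of(new Taint("l1", "t1")),
            Set.of(new Assign("u", "s")),
            Set.of(new VarPointsTo("w", "o4")));   // assumed pointer-analysis result for l4
    }

    public static void main(String[] args) {
        System.out.println(example()); // a single flow: t1 reaches log; the ordinary object o4 is not reported
    }
}
```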

Page 91: Datalog-Based Program Analysis

Datalog-Based Program Analysis

• Pros
  • Succinct and readable
  • Easy to implement
  • Benefit from off-the-shelf optimized Datalog engines

• Cons
  • Restricted expressiveness, i.e., some logic is impossible or inconvenient to express
  • Cannot fully control performance

Page 92: Datalog-Based Program Analysis

The X You Need To Understand in This Lecture

โ€ข Datalog language

โ€ข How to implement pointer analysis via Datalog

โ€ข How to implement taint analysis via Datalog

Page 93: Datalog-Based Program Analysis

ๅ—ไบฌๅคงๅญฆ

ๆŽๆจพ

่ฐญๆทป

่ฎก็ฎ—ๆœบ็ง‘ๅญฆไธŽๆŠ€ๆœฏ็ณป

็จ‹ๅบ่ฎพ่ฎก่ฏญ่จ€

้™ๆ€ๅˆ†ๆž็ ”็ฉถ็ป„

ไธŽ

่ฝฏไปถๅˆ†ๆž