Transcript

Yoyak: static analysis framework

Heejong Lee

ScalaDays 2015

Speaker Introduction

• Has been working in the static analysis industry since 2008

• Studied programming language theory in graduate school

• Has developed several static analyzers, most of them commercial

• Began using Scala six years ago and still uses it actively in everyday development

Agenda

• Static analysis

• Theory of abstract interpretation

• Yoyak framework: implementation highlights

• Yoyak framework: Scala experience

• Yoyak framework: Roadmap

Static Analysis

What is Static Analysis?

• Analyzes source code without actually running it

• Some people prefer to call it white-box testing

• Used for finding bugs, optimizing a compiled binary, calculating a software metric, proving safety properties, etc.

Examples of Static Analysis

• Finding bugs : symbolic execution

• Optimizing a compiled binary: data flow analysis

• Calculating a software metric: syntactic analysis

• Proving safety properties: model checking, abstract interpretation, type system

Two important terms in Static Analysis

• Soundness

• The analysis result should contain all possibilities that can happen at runtime

• If the analysis uses an over-approximation, it is sound

• Completeness

• The analysis result should not contain any possibility that cannot happen at runtime

• If the analysis uses an under-approximation, it is complete

Two important terms in Static Analysis

Over-approximation of Semantics

Program Semantics

Under-approximation of Semantics

Abstract Interpretation

The beauty of abstraction

http://cargocollective.com/carlyfox/Design

What is the result of this expression?

19224 × 7483919 × (11952 − 20392)

What is the result of this expression?

19224 × 7483919 × (11952 − 20392)

= −1214270048744640

How long does it take without a calculator?

What is the result of this expression?

19224 × 7483919 × (11952 − 20392)

= −1214270048744640

What if we do not have an interest in the exact number, rather we just want to know whether it is positive or negative?

What is the result of this expression?

19224 × 7483919 × (11952 − 20392)

+ × + × − = −

= n (n ∈ ℤ ∧ n < 0)

What is the result of this expression?

19224 × 7483919 × (11952 − 20392)

= −1214270048744640 (takes 30 seconds)

= n (n ∈ ℤ ∧ n < 0) (takes 3 seconds)

• inaccurate but not incorrect

• accurate enough for a specific purpose

• much faster than a real calculation

This is abstract interpretation
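The sign calculation above can be sketched in a few lines of Scala. This is only an illustration: the names `Sign`, `alpha` and `times` are assumptions, not part of Yoyak. Note that a pure sign domain cannot decide the sign of "positive minus positive", so, like the slide, the sketch abstracts the parenthesized difference as a whole.

```scala
object SignSketch {
  // Abstract values of a small sign domain (illustrative names).
  sealed trait Sign
  case object Neg  extends Sign
  case object Zero extends Sign
  case object Pos  extends Sign

  // Abstraction function: map a concrete integer to its sign.
  def alpha(n: Long): Sign =
    if (n < 0) Neg else if (n == 0) Zero else Pos

  // Abstract multiplication: the usual rule of signs.
  def times(a: Sign, b: Sign): Sign = (a, b) match {
    case (Zero, _) | (_, Zero)   => Zero
    case (Pos, Pos) | (Neg, Neg) => Pos
    case _                       => Neg
  }

  // Sign of 19224 * 7483919 * (11952 - 20392), without computing the product.
  val result: Sign =
    times(times(alpha(19224L), alpha(7483919L)), alpha(11952L - 20392L))
}
```

Evaluating `SignSketch.result` only combines three signs, which is why the abstract answer is so much cheaper than the exact product.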

Is this program safe from buffer overruns?

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    while (x > 0) {
        index = index + 1;
        x = x - 1;
    }
    strs[index] = "hello!";
}

No, ArrayIndexOutOfBoundsException may occur at the last line

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;              // index = [0,0]
    while (x > 0) {
        index = index + 1;      // index = [1,∞]
        x = x - 1;
    }
    strs[index] = "hello!";     // index = [0,∞]
}

• Roughly but soundly execute the program

Abstract interpretation for dummies

?

Abstract interpretation for brains

First, we need to define precisely what “domain” and “semantics” mean in a mathematical way

Let me introduce the Javar language

1

1

What does this program mean?

Javar-1

C → n (n ∈ ℤ)

Javar-1 semantic domain

n ∈ Value = ℤ
⟦C⟧ ∈ Value

Javar-1 semantics

⟦n⟧ = n

1+1

Javar-2

C → n op n (n ∈ ℤ, op ∈ {+, −, *, /})

Javar-{1,2} semantic domain

n ∈ Value = ℤ
⟦C⟧ ∈ Value

Javar-2 semantics

⟦n⟧ = n
⟦n1 + n2⟧ = ⟦n1⟧ + ⟦n2⟧
⟦n1 − n2⟧ = ⟦n1⟧ − ⟦n2⟧
⟦n1 * n2⟧ = ⟦n1⟧ × ⟦n2⟧
⟦n1 / n2⟧ = ⟦n1⟧ ÷ ⟦n2⟧

x := x + 1

Javar-3

C → x := E

E → n (n ∈ ℤ)
  | x
  | E op E (op ∈ {+, −, *, /})

Javar-3 semantic domain

M ∈ Memory = Var → Value
n ∈ Value = ℤ
x ∈ Var = Variables
⟦C⟧ ∈ Memory → Memory
⟦E⟧ ∈ Memory → ℤ

Javar-3 semantics

⟦x := E⟧ M = M{x → ⟦E⟧ M}
⟦n⟧ M = n
⟦x⟧ M = M(x)
⟦E1 {+, −, *, /} E2⟧ M = ⟦E1⟧ M {+, −, ×, ÷} ⟦E2⟧ M

x := 100 + 2; if(x) x := x * 10 else x := x / 2; while(x) x := x - 1

Javar-4

C → x := E
  | if (E) C else C
  | while (E) C
  | C;C

E → n (n ∈ ℤ)
  | x
  | E op E (op ∈ {+, −, *, /})

Javar-{3,4} semantic domain

M ∈ Memory = Var → Value
n ∈ Value = ℤ
x ∈ Var = Variables
⟦C⟧ ∈ Memory → Memory
⟦E⟧ ∈ Memory → ℤ

Javar-4 semantics

⟦x := E⟧ M = M{x → ⟦E⟧ M}
⟦if (E) C1 else C2⟧ M = if ⟦E⟧ M ≠ 0 then ⟦C1⟧ M else ⟦C2⟧ M
⟦while (E) C⟧ M = if ⟦E⟧ M ≠ 0 then ⟦while (E) C⟧ (⟦C⟧ M) else M
⟦n⟧ M = n
⟦x⟧ M = M(x)
⟦E1 {+, −, *, /} E2⟧ M = ⟦E1⟧ M {+, −, ×, ÷} ⟦E2⟧ M
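The Javar-4 semantics translate almost line by line into a small interpreter. The following is a sketch under assumptions: the constructor names (`Num`, `BinOp`, `Sequence`, and so on) are invented for illustration, values are Scala `Int`s rather than unbounded integers, and the rule for `C;C`, which the slide omits, is taken to be "run the first command, then the second".

```scala
object JavarSketch {
  // E -> n | x | E op E
  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Var(x: String) extends Expr
  case class BinOp(op: Char, l: Expr, r: Expr) extends Expr

  // C -> x := E | if (E) C else C | while (E) C | C;C
  sealed trait Cmd
  case class Assign(x: String, e: Expr) extends Cmd
  case class If(e: Expr, thenC: Cmd, elseC: Cmd) extends Cmd
  case class While(e: Expr, body: Cmd) extends Cmd
  case class Sequence(c1: Cmd, c2: Cmd) extends Cmd

  type Memory = Map[String, Int]

  // [[E]] M: the value of an expression in memory M.
  def eval(e: Expr, m: Memory): Int = e match {
    case Num(n)           => n
    case Var(x)           => m(x)
    case BinOp('+', l, r) => eval(l, m) + eval(r, m)
    case BinOp('-', l, r) => eval(l, m) - eval(r, m)
    case BinOp('*', l, r) => eval(l, m) * eval(r, m)
    case BinOp('/', l, r) => eval(l, m) / eval(r, m)
    case BinOp(op, _, _)  => sys.error(s"unknown operator: $op")
  }

  // [[C]] M: the memory after running a command in memory M.
  def exec(c: Cmd, m: Memory): Memory = c match {
    case Assign(x, e)     => m + (x -> eval(e, m))
    case If(e, c1, c2)    => if (eval(e, m) != 0) exec(c1, m) else exec(c2, m)
    case While(e, body)   => if (eval(e, m) != 0) exec(c, exec(body, m)) else m
    case Sequence(c1, c2) => exec(c2, exec(c1, m))
  }

  // x := 100 + 2; if(x) x := x * 10 else x := x / 2; while(x) x := x - 1
  val program: Cmd =
    Sequence(
      Assign("x", BinOp('+', Num(100), Num(2))),
      Sequence(
        If(Var("x"),
          Assign("x", BinOp('*', Var("x"), Num(10))),
          Assign("x", BinOp('/', Var("x"), Num(2)))),
        While(Var("x"), Assign("x", BinOp('-', Var("x"), Num(1))))))

  val finalMemory: Memory = exec(program, Map.empty)
}
```

The `While` case calls `exec` on the same command again, which mirrors the self-referential equation for `while`.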

This is not a definition

⟦while (E) C⟧ M = if ⟦E⟧ M ≠ 0 then ⟦while (E) C⟧ (⟦C⟧ M) else M

GNU = GNU’s Not Unix

The existence and uniqueness of the least fixed point are guaranteed by domain theory

⟦while (E) C⟧ M = if ⟦E⟧ M ≠ 0 then ⟦while (E) C⟧ (⟦C⟧ M) else M

⟦while (E) C⟧ = λM. if ⟦E⟧ M ≠ 0 then ⟦while (E) C⟧ (⟦C⟧ M) else M

F = λM. if ⟦E⟧ M ≠ 0 then F(⟦C⟧ M) else M

F = H(F)

⟦while (E) C⟧ = fix(λF. λM. if ⟦E⟧ M ≠ 0 then F(⟦C⟧ M) else M)

Abstract interpretation revisited

• Safely estimates program semantics in finite time

• Abstraction is not omission: it guarantees soundness

• Most static analysis techniques can be formulated as abstract interpretation

Key Elements of Abstract Interpretation

• Domain : concrete domain, abstract domain

• Semantics : concrete semantics, abstract semantics

• Galois connection : pair of abstraction and concretization functions

• CPO : complete partial order

• Continuous function : preserves least upper bounds of chains

Galois Connection

∀x ∈ D, x̂ ∈ D̂ : α(x) ⊑ x̂ ⟺ x ⊑ γ(x̂)

[Diagram: x in the concrete domain D and x̂ in the abstract domain D̂]
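As a concrete instance, take D to be finite sets of integers ordered by inclusion and D̂ to be intervals ordered by containment. The sketch below checks both sides of the Galois connection condition on examples. All names are assumptions made for illustration; γ is represented by a membership test, since the set it denotes can be large.

```scala
object GaloisSketch {
  // Abstract element: a closed integer interval [lo, hi].
  case class Interval(lo: Int, hi: Int)

  // alpha: the smallest interval that contains a non-empty finite set.
  def alpha(xs: Set[Int]): Interval = Interval(xs.min, xs.max)

  // Membership in gamma(i), the set of integers the interval stands for.
  def inGamma(n: Int, i: Interval): Boolean = i.lo <= n && n <= i.hi

  // Abstract order: a is below b when b contains a.
  def leq(a: Interval, b: Interval): Boolean = b.lo <= a.lo && a.hi <= b.hi

  // Left side of the condition: alpha(x) is below xh.
  def left(x: Set[Int], xh: Interval): Boolean = leq(alpha(x), xh)

  // Right side of the condition: x is included in gamma(xh).
  def right(x: Set[Int], xh: Interval): Boolean = x.forall(inGamma(_, xh))
}
```

For any non-empty set `x` and interval `xh`, `left(x, xh)` and `right(x, xh)` should agree, which is exactly what the equivalence states.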

CPO

A partial order ⊑ exists on D

A least element ⊥ exists, where ⊥ ⊑ y (for all y ∈ D)

Every totally ordered subset (chain) of D has a least upper bound in D

Lattices

Partially ordered set in which every two elements have a unique LUB(⊔)

and a unique GLB(⊓)

Continuous Function

∀ ordered subset S ⊆ D, F(⊔_{x∈S} x) = ⊔_{x∈S} F(x)

[Diagram: a chain x, y, z in D mapped to F(x), F(y), F(z) in D]

Abstract Interpretation in a Nutshell

Program Semantics     Concrete                      Abstract
Domain                D should be a CPO             D̂ should be a CPO
Galois Connection     α : D → D̂                     γ : D̂ → D
Semantic Function     F : D → D, continuous         F̂ : D̂ → D̂, monotonic
Program Execution     lfp F = ⊔_{i∈ℕ} F^i(⊥)        ⊔_{i∈ℕ} F̂^i(⊥) ⊑ X̂

Performing analysis using abstract interpretation = calculating X̂ in a finite time

And the following formula is always satisfied (soundness guarantee)

lfp F ⊑ γ X̂

Abstract Interpretation in a Nutshell

lfp F ⊑ γ X̂

α ∘ F ⊑ F̂ ∘ α

[Diagram: lfp F in D, lfp F̂ and X̂ in D̂; the part of γ X̂ outside lfp F is labeled "false positives"]

Is this program safe from buffer overruns?

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if (x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;              // index = [0,0]
    if (x > 0) {
        index = 1;              // index = [1,1]
    } else {
        index = 10;             // index = [10,10]
    }
    strs[index] = "hello!";     // index = [1,10]
}

Interval analysis based on abstract interpretation

• Concrete domain: the domain in the real world

Memory = Var → Value
Value = 2^ℤ
𝒞 ∈ C → Memory → Memory
𝒱 ∈ E → Memory → Value

Interval analysis based on abstract interpretation

• Concrete semantics: the semantics in the real world

𝒞⟦x := E⟧ m = m{x ↦ 𝒱⟦E⟧ m}
𝒞⟦if (E) C1 C2⟧ m = 𝒱⟦E⟧ m ? 𝒞⟦C1⟧ m : 𝒞⟦C2⟧ m
𝒞⟦while (E) C⟧ m = 𝒱⟦E⟧ m ? 𝒞⟦while (E) C⟧ (𝒞⟦C⟧ m) : m
𝒞⟦C1; C2⟧ m = 𝒞⟦C2⟧ (𝒞⟦C1⟧ m)
𝒱⟦x⟧ m = m x
𝒱⟦n⟧ m = {n}
𝒱⟦E1 + E2⟧ m = (𝒱⟦E1⟧ m) + (𝒱⟦E2⟧ m)

Interval analysis based on abstract interpretation

• Concrete execution of a program

⊥ ⊏ F(⊥) ⊏ F(F(⊥)) ⊏ F(F(F(⊥))) ... ⊏ F^i(⊥) = F^{i+1}(⊥)

F^i(⊥) ∈ Memory is the execution result of a program

F = λm. 𝒞⟦C⟧ m

lfp F = ⊔_{i∈ℕ} F^i({})

Interval analysis based on abstract interpretation

• Abstract domain: the domain we will use in an analysis

M̂emory = Var → V̂alue
V̂alue = ℤ̂ ∪ {⊥}
ℤ̂ = {[a, b] | a ∈ ℤ ∪ {−∞}, b ∈ ℤ ∪ {∞}, a ≤ b}
𝒞̂ ∈ C → M̂emory → M̂emory
𝒱̂ ∈ E → M̂emory → V̂alue

[Figure: Lattice of Interval Domain. Singleton intervals such as [-1,-1], [0,0], [1,1] form the lowest level, wider intervals such as [-1,0], [0,1], [-1,1] sit above them, half-bounded intervals such as [0,∞] and [-∞,0] sit higher still, and [-∞,∞] is at the top.]

Interval analysis based on abstract interpretation

• Abstract semantics: the semantics we will use in an analysis

𝒞̂⟦x := E⟧ m = m{x ↦ 𝒱̂⟦E⟧ m}
𝒞̂⟦if (E) C1 C2⟧ m = 𝒞̂⟦C1⟧ m ⊔ 𝒞̂⟦C2⟧ m
𝒞̂⟦while (E) C⟧ m = m ⊔ 𝒞̂⟦while (E) C⟧ (𝒞̂⟦C⟧ m)
𝒞̂⟦C1; C2⟧ m = 𝒞̂⟦C2⟧ (𝒞̂⟦C1⟧ m)
𝒱̂⟦x⟧ m = m x
𝒱̂⟦n⟧ m = α{n}
𝒱̂⟦E1 + E2⟧ m = (𝒱̂⟦E1⟧ m) +̂ (𝒱̂⟦E2⟧ m)

Interval analysis based on abstract interpretation

• Abstract execution of a program

F̂ = λm. 𝒞̂⟦C⟧ m

⊔_{i∈ℕ} F̂^i({}) ⊑ X̂

⊥ ⊏ F̂(⊥) ⊏ F̂(F̂(⊥)) ⊏ F̂(F̂(F̂(⊥))) ... ⊏ F̂^i(⊥) ⊑ X̂

X̂ is the analysis result of a program

Interval analysis based on abstract interpretation

• Widening

What if this chain has infinite length?

⊥ ⊏ F̂(⊥) ⊏ F̂(F̂(⊥)) ⊏ F̂(F̂(F̂(⊥))) ... ⊏ F̂^i(⊥) ⊑ X̂

⊥ ⊏ F̂(⊥) ⊏ F̂(F̂(⊥)) ⊏ F̂(F̂(F̂(⊥))) ... ⊏ F̂^{i−1}(⊥) ∇ F̂^i(⊥) ⊑ X̂

We need a widening operator ∇

Interval analysis based on abstract interpretation

• Widening

⊥ ⊏ [0, 0] ⊏ [0, 1] ⊏ [0, 2] ... ⊏ [0, i−1] ∇ [0, i] ⊑ [0, ∞]

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;              // index = [0,0]
    while (x > 0) {
        index = index + 1;      // index = [1,∞]
        x = x - 1;
    }
    strs[index] = "hello!";     // index = [0,∞]
}
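The widening step for this loop can be sketched as follows. This is an illustrative model, not Yoyak's `Interval` or `Widening` API: bounds are `Double`s only so that infinities are available, the loop condition on `x` is ignored, and the widening shown is the textbook one that pushes any growing bound to infinity.

```scala
object WideningSketch {
  case class Interval(lo: Double, hi: Double) {
    // Least upper bound of two intervals.
    def join(that: Interval): Interval =
      Interval(math.min(lo, that.lo), math.max(hi, that.hi))
    // Abstract addition.
    def +(that: Interval): Interval = Interval(lo + that.lo, hi + that.hi)
    // Widening: a bound that is still moving jumps straight to infinity.
    def widen(next: Interval): Interval = Interval(
      if (next.lo < lo) Double.NegativeInfinity else lo,
      if (next.hi > hi) Double.PositiveInfinity else hi)
  }

  val one  = Interval(1, 1)
  val init = Interval(0, 0) // index = 0 before the loop

  // One abstract pass: the entry value joined with the value after one iteration.
  def step(atHead: Interval): Interval = init.join(atHead + one)

  // Iterate with widening until the loop-head value stops changing.
  @annotation.tailrec
  def solve(current: Interval): Interval = {
    val next = current.widen(step(current))
    if (next == current) current else solve(next)
  }

  val atLoopHead: Interval = solve(init)

  // Valid indices of strs are 0..9, so any upper bound above 9 is a possible overrun.
  val mayOverrun: Boolean = atLoopHead.hi > 9
}
```

Without `widen`, the iteration would climb through [0,1], [0,2], [0,3] and never stabilize; with it, the second pass already confirms [0,∞].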

Is this program safe from buffer overruns?

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if (x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}

Interval analysis based on abstract interpretation

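[Diagram: syntax tree of the program below, with nodes numbered 0 to 6. C0 = C1; C2, C2 = C3; C4, C3 = if (x > 0) C5 else C6]

index = 0; if(x > 0) index = 1 else index = 10; result = index

𝒞̂⟦C0⟧ m = 𝒞̂⟦C2⟧ (𝒞̂⟦C1⟧ m)
𝒞̂⟦C1⟧ m = m{index ↦ α{0}}
𝒞̂⟦C2⟧ m = 𝒞̂⟦C4⟧ (𝒞̂⟦C3⟧ m)
𝒞̂⟦C3⟧ m = 𝒞̂⟦C5⟧ m ⊔ 𝒞̂⟦C6⟧ m
𝒞̂⟦C4⟧ m = m{result ↦ m index}
𝒞̂⟦C5⟧ m = m{index ↦ α{1}}
𝒞̂⟦C6⟧ m = m{index ↦ α{10}}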

Interval analysis based on abstract interpretation

𝒞̂⟦C0⟧ {} = 𝒞̂⟦C2⟧ (𝒞̂⟦C1⟧ {})
𝒞̂⟦C1⟧ {} = {index ↦ [0, 0]}
𝒞̂⟦C2⟧ {index ↦ [0, 0]} = 𝒞̂⟦C4⟧ (𝒞̂⟦C3⟧ {index ↦ [0, 0]})
𝒞̂⟦C3⟧ {index ↦ [0, 0]} = 𝒞̂⟦C5⟧ {index ↦ [0, 0]} ⊔ 𝒞̂⟦C6⟧ {index ↦ [0, 0]}
𝒞̂⟦C4⟧ {index ↦ [1, 10]} = {index ↦ [1, 10], result ↦ [1, 10]}
𝒞̂⟦C5⟧ {index ↦ [0, 0]} = {index ↦ [1, 1]}
𝒞̂⟦C6⟧ {index ↦ [0, 0]} = {index ↦ [10, 10]}

𝒞̂⟦C0⟧ {} = {index ↦ [1, 10], result ↦ [1, 10]}

void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if (x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}

index may hold any integer between 1 and 10

Since the size of the buffer strs is 10 (valid indices are 0 to 9), ArrayIndexOutOfBoundsException may occur here
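The join at the end of the `if` and the final bounds check can be sketched like this. The abstract memory is modeled as a plain `Map` from variable names to intervals, which is an assumption for illustration and far simpler than Yoyak's `MemDom`.

```scala
object BranchJoinSketch {
  case class Interval(lo: Int, hi: Int) {
    def join(that: Interval): Interval =
      Interval(math.min(lo, that.lo), math.max(hi, that.hi))
  }

  type Memory = Map[String, Interval]

  // Pointwise join of two abstract memories; a variable missing on one side counts as bottom.
  def join(a: Memory, b: Memory): Memory =
    (a.keySet ++ b.keySet).map { k =>
      k -> (a.get(k).toList ++ b.get(k).toList).reduce(_ join _)
    }.toMap

  val beforeIf: Memory   = Map("index" -> Interval(0, 0))
  val thenBranch: Memory = beforeIf + ("index" -> Interval(1, 1))
  val elseBranch: Memory = beforeIf + ("index" -> Interval(10, 10))

  // After the if, both branches are possible, so their memories are joined.
  val afterIf: Memory = join(thenBranch, elseBranch)

  // strs has length 10, so an index is safe only if it lies within 0..9.
  val arrayLength = 10
  val index: Interval = afterIf("index")
  val mayOverrun: Boolean = index.lo < 0 || index.hi >= arrayLength
}
```

The joined interval [1,10] also covers values such as 5 that no execution produces; that loss of precision is the price of staying sound with a single interval.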

Is this program safe from buffer overruns?

Yoyak

Do not reinvent the wheel

https://trimaps.com/assets/website/dontreinventthemap-6ba62b8ba05d4957d2ed772584d7e4cd.png

Motivation

• Do not reinvent the wheel : many components that static analyzers often use are reusable

• CFG data types : construction, optimization, visualization

• Graph algorithms : unrolling loops, finding loop heads, finding topological order

• Intermediate language data types : construction, optimization, pretty printing

• Common abstract domains : integer interval, abstract object, abstract memory

• Common abstract semantics : assignment, invoking methods, evaluating binary expressions

Motivation

• Perfect to be a framework : the theory of abstract interpretation guarantees soundness and termination of the analysis if a user supplies valid abstract domain and semantics

Generic fixed point computation engine

Abstract domain D Abstract semantics F

Fixed point x = F(x) (x∈D)

Overview

[Diagram: Yoyak overview, grouped under Abstract Domain, Fixed Point Computation, Abstract Semantics and IL. Components shown: MapDom, MemDom, Interval, ArithmeticOps, LatticeOps, Widening, Galois, StdSemantics, AbstractTransferable, ForwardAnalysis, FlowSensitiveFixedPointComputation, Worklist, WideningAtLoopHeads, InterproceduralIteration, DoWidening, CommonIL, Attachable, Typable]

Fixed-point Computation in Yoyak

Built-in work-list algorithm

[Diagram: control-flow graph with nodes ENTRY; x := 10; Assume (y == 0), println("0"); Assume (y != 0); Assume (y == 1), println("0"); Assume (y != 1); println("2"); Assume (z), throw new Ex(); Assume (!z), println("done"), return; EXIT]

def computeFixedPoint(startNodes: List[BasicBlock])(implicit widening: Option[Widening[D]] = None) : MapDom[BasicBlock,D] = {
  worklist.add(startNodes:_*)
  var map = MapDom.empty[BasicBlock,D]
  while(worklist.size() > 0) {
    val bb = worklist.pop().get
    val prevInputs = memoryFetcher(map,bb)
    val prev = getInput(map,prevInputs)
    val (mapOut,next) = work(map,prev,bb)
    val orig = map.get(bb)
    val isStableOpt = ops.<=(next,orig)
    if(isStableOpt.isEmpty) {
      println("error: abs. transfer func. is not distributive")
    }
    if(!isStableOpt.get) {
      val widened =
        if(widening.nonEmpty) { doWidening(widening.get)(orig,next,bb) }
        else next
      map = mapOut.update(bb->widened)
      val nextWork = getNextBlocks(bb)
      worklist.add(nextWork:_*)
    }
  }
  map
}
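To see the shape of a work-list iteration in isolation, here is a self-contained sketch over a toy domain. It does not use Yoyak's types: the CFG is a `Map` of successor lists, the abstract value is a set of variable names (a finite lattice, so no widening is needed), and every name in it is hypothetical.

```scala
object WorklistSketch {
  type Node = Int
  type Dom  = Set[String] // finite powerset lattice: join is union, order is subset

  def fixedPoint(
      succs: Map[Node, List[Node]],
      entry: Node,
      transfer: (Node, Dom) => Dom): Map[Node, Dom] = {
    // Invert the successor relation once so inputs can be joined from predecessors.
    val preds: Map[Node, List[Node]] =
      succs.toList
        .flatMap { case (p, ss) => ss.map(s => s -> p) }
        .groupBy(_._1)
        .map { case (s, ps) => s -> ps.map(_._2) }

    var out      = Map.empty[Node, Dom]
    var worklist = List(entry)
    while (worklist.nonEmpty) {
      val n = worklist.head
      worklist = worklist.tail
      val in: Dom = preds.getOrElse(n, Nil)
        .map(p => out.getOrElse(p, Set.empty[String]))
        .foldLeft(Set.empty[String])(_ union _)
      val next = transfer(n, in)
      // Re-queue successors on the first visit or when the output grew.
      val changed = out.get(n).forall(old => !next.subsetOf(old))
      if (changed) {
        out = out.updated(n, out.getOrElse(n, Set.empty[String]) union next)
        worklist = worklist ++ succs.getOrElse(n, Nil)
      }
    }
    out
  }

  // Example CFG: 0 -> 1, 1 -> 2, 2 -> 1 (loop), 1 -> 3.
  val cfg: Map[Node, List[Node]] = Map(0 -> List(1), 1 -> List(2, 3), 2 -> List(1))

  // Toy transfer function: node 0 assigns x, node 2 assigns y, other nodes change nothing.
  def assigned(n: Node, in: Dom): Dom = n match {
    case 0 => in + "x"
    case 2 => in + "y"
    case _ => in
  }

  val result: Map[Node, Dom] = fixedPoint(cfg, 0, assigned)
}
```

Node 1 is processed twice: once with only `x` known, and again after the loop body at node 2 contributes `y`. That second visit is what the stability check in a work-list algorithm is for.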

Fixed-point Computation in Yoyak

Built-in work-list algorithm

trait FlowSensitiveFixedPointComputation[D<:Galois]
    extends FlowSensitiveIteration[D] with CfgNavigator[D] with DoWidening[D] {

  def computeFixedPoint(startNodes: List[BasicBlock])(implicit widening: Option[Widening[D]] = None) : MapDom[BasicBlock,D] = {

class FlowSensitiveForwardAnalysis[D<:Galois](val cfg: CFG)(
    implicit val ops: LatticeOps[D],
    val absTransfer: AbstractTransferable[D],
    val widening: Option[Widening[D]] = None)
    extends FlowSensitiveFixedPointComputation[D] with WideningAtLoopHeads[D] {

Abstract Semantics in Yoyak

Built-in work-list algorithm

trait AbstractTransferable[D<:Galois] {
  protected def transferIdentity(stmt: Identity, input: D#Abst)(implicit context: Context) : D#Abst = input
  protected def transferAssign(stmt: Assign, input: D#Abst)(implicit context: Context) : D#Abst = input
  protected def transferInvoke(stmt: Invoke, input: D#Abst)(implicit context: Context) : D#Abst = input
  protected def transferIf(stmt: If, input: D#Abst)(implicit context: Context) : D#Abst = input
  protected def transferAssume(stmt: Assume, input: D#Abst)(implicit context: Context) : D#Abst = input

  // so on

Abstract Semantics in Yoyak

Built-in standard semantic

trait StdSemantics[A<:Galois,D,Mem<:MemDomLike[A,D,Mem]] extends AbstractTransferable[GaloisIdentity[Mem]] {
  val arithOps : ArithmeticOps[A]

  override protected def transferAssign(stmt: Assign, input: Mem)(implicit context: Context) : Mem = {
    val (rv,output) = eval(stmt.rv,input)
    output.update(stmt.lv,rv)
  }

Abstract Domain in Yoyak

Composable abstract domains

class MapDom[K,V <: Galois : LatticeOps] {

trait LatticeOps[D <: Galois] extends ParOrdOps[D] {
  def \/(lhs: D#Abst, rhs: D#Abst) : D#Abst
  def bottom : D#Abst

trait ParOrdOps[D <: Galois] {
  def <=(lhs: D#Abst, rhs: D#Abst) : Option[Boolean]

trait Galois {
  type Conc
  type Abst

Abstract Domain in Yoyak

Built-in Interval Domain

scala> import com.simplytyped.yoyak.framework.domain.arith._
import com.simplytyped.yoyak.framework.domain.arith._

scala> import com.simplytyped.yoyak.framework.domain.arith.Interval._
import com.simplytyped.yoyak.framework.domain.arith.Interval._

scala> val intv1 = Interv.of(10)
intv1: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(10),IInt(10))

scala> val intv2 = Interv.in(IInt(-10),IInt(10))
intv2: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(-10),IInt(10))

scala> val intv3 = Interv.in(IInfMinus,IInf)
intv3: com.simplytyped.yoyak.framework.domain.arith.Interval = IntervTop

scala> val intv4 = Interv.in(IInt(-10),IInf)
intv4: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(-10),IInf)

Abstract Domain in Yoyak

Built-in Interval Domain

scala> import IntervalInt.arithOps
import IntervalInt.arithOps

scala> arithOps.+(intv1,intv2) // [10,10] + [-10,10]
res1: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(0),IInt(20))

scala> arithOps.-(intv1,intv2) // [10,10] - [-10,10]
res2: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(0),IInt(20))

scala> arithOps.+(intv2,intv3) // [-10,10] + [-∞,∞]
res3: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = IntervTop

scala> arithOps.*(intv2,intv4) // [-10,10] * [-10,∞]
res4: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = IntervTop

scala> arithOps.*(intv1,intv4) // [10,10] * [-10,∞]
res5: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(-100),IInf)

Abstract Domain in Yoyak

Built-in Standard Object Model

trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]]
    extends MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {

  implicit val arithOps : ArithmeticOps[A]
  implicit val boxedOps : LatticeWithTopOps[D]

  def update(kv: (Loc,AbsValue[A,D])) : This
  def remove(loc: Local) : This
  def alloc(from: Stmt) : (AbsRef,This)
  def get(k: Loc) : AbsValue[A,D]
  def isStaticAddr(addr: AbsAddr) : Boolean
  def isDynamicAddr(addr: AbsAddr) : Boolean

class MemDom[A <: Galois : ArithmeticOps, D <: Galois : LatticeWithTopOps] extends StdObjectModel[A,D,MemDom[A,D]] {

Abstract Domain in Yoyak

Built-in Memory Domain

scala> import com.simplytyped.yoyak.framework.domain.mem.MemDom
scala> import com.simplytyped.yoyak.framework.domain.mem.MemElems._
scala> import com.simplytyped.yoyak.framework.domain.Galois._
scala> import com.simplytyped.yoyak.framework.domain.arith.Interv
scala> import com.simplytyped.yoyak.framework.domain.arith.IntervalInt
scala> import com.simplytyped.yoyak.il.CommonIL.Value._

scala> val memory = new MemDom[IntervalInt,SetAbstraction[String]]
memory: com.simplytyped.yoyak.framework.domain.mem.MemDom[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = com.simplytyped.yoyak.framework.domain.mem.MemDom@8443a1

Abstract Domain in Yoyak

Built-in Memory Domain

scala> val memory2 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(1)))

scala> val memory3 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(10)))

scala> val memory4 = MemDom.ops[IntervalInt,SetAbstraction[String]].\/(memory2,memory3)

scala> memory4.get(Local("x"))
res1: com.simplytyped.yoyak.framework.domain.mem.MemElems.AbsValue[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = AbsArith(Interv(IInt(1),IInt(10)))

IL in Yoyak

CommonIL

abstract class Stmt extends Attachable {
  override def equals(that: Any): Boolean = this eq that.asInstanceOf[AnyRef]
  override def hashCode() : Int = System.identityHashCode(this)

  private[Stmt] def copyAttr(stmt: Stmt) : this.type = {sourcePos = stmt.pos; this}
}

IL in Yoyak

CommonIL

case class Block(stmts: StatementContainer) extends Stmt
case class Switch(v: Value.Loc, keys: List[Value.t], targets: List[Target]) extends Stmt
case class Placeholder(x: AnyRef) extends Stmt

sealed trait CoreStmt extends Stmt
case class If(cond: Value.CondBinExp, target: Target) extends CoreStmt
case class Goto(target: Target) extends CoreStmt

sealed trait CfgStmt extends CoreStmt
case class Identity(lv: Value.Local, rv: Value.Param) extends CfgStmt
case class Assign(lv: Value.Loc, rv: Value.t) extends CfgStmt
case class Invoke(ret: Option[Value.Local], callee: Type.InvokeType) extends CfgStmt
case class Assume(cond: Value.CondBinExp) extends CfgStmt
case class Return(v: Option[Value.Loc]) extends CfgStmt
case class Nop() extends CfgStmt
case class EnterMonitor(v: Value.Loc) extends CfgStmt
case class ExitMonitor(v: Value.Loc) extends CfgStmt
case class Throw(v: Value.Loc) extends CfgStmt

IL in Yoyak

Stmt

x := 10;
switch (y) {
  case 0: println("0"); break;
  case 1: println("1");
  default: println("2");
}
if (z) {
  throw new Exception();
} else {
  println("done");
}
return 0;

CoreStmt

x := 10;
if (y == 0) { println("0"); goto D; }
if (y == 1) { println("1"); }
D: println("2");
if (z) {
  throw new Exception();
} else {
  println("done");
}
return 0;

CfgStmt

[Diagram: control-flow graph with nodes ENTRY; x := 10; Assume (y == 0), println("0"); Assume (y != 0); Assume (y == 1), println("0"); Assume (y != 1); println("2"); Assume (z), throw new Ex(); Assume (!z), println("done"), return; EXIT]

Simple Interval Analysis in Yoyak

class IntervalAnalysis(cfg: CFG) {
  def run() = {
    import IntervalAnalysis.{memDomOps,absTransfer,widening}
    val analysis = new FlowSensitiveForwardAnalysis[GMemory](cfg)
    val output = analysis.compute
    output
  }
}

object IntervalAnalysis {
  type Memory = MemDom[IntervalInt,SetAbstraction[Any]]
  type GMemory = GaloisIdentity[Memory]
  implicit val absTransfer : AbstractTransferable[GMemory] = new StdSemantics[IntervalInt,SetAbstraction[Any],Memory] {
    val arithOps: ArithmeticOps[IntervalInt] = IntervalInt.arithOps
  }

  implicit val memDomOps : LatticeOps[GMemory] = MemDom.ops[IntervalInt,SetAbstraction[Any]]
  implicit val widening : Option[Widening[GMemory]] = {
    implicit val NoWideningForSetAbstraction = Widening.NoWidening[SetAbstraction[Any]]
    Some(MemDom.widening[IntervalInt,SetAbstraction[Any]])
  }
}

Simple Interval Analysis in Yoyak

[Diagram: component structure of the interval analysis. Domain side: MemDom, StdObjectModel, MapDom, AbsValue (AbsRef, AbsArith with IntervalInt, AbsBox with SetAb[Any], AbsBottom, AbsTop), AbsObject, AbsAddr. Analysis side: IntervalAnalysis, FlowSensitiveForwardAnalysis, FlowSensitiveFixedPointComputation, Worklist, LatticeOps, FlowSensitiveIteration, AbstractTransferable, CfgNavigator, WideningAtLoopHeads, Widening, MapDom, BasicBlock, MemDom, MemDom.op, IntervalInt.widening, IntervalAnalysisTransferFunction, CFG, Fixed-point result, StdSemantics, ArithmeticOps, IntervalInt.arithOps]

Yoyak : Scala Experience

• Scala is a very good language to implement a static analyzer

• Function is a first class citizen

• Type class support

• Algebraic data type support

• Native support for mutable and immutable values

• Excellent support for parallelization

Yoyak : Scala Experience

• Function is a first class citizen

Natural way to express mathematical logic

// optimize Cfg
(insertAssume _ andThen removeIfandGoto) apply rawCfg

Yoyak : Scala Experience

• Type class support

Can avoid F-bounded polymorphism which is the fast lane to overworking

• F-bounded polymorphism

• Commonly arises when inheritance meets immutability

• Seriously hurts code readability

Yoyak : Scala Experience

• F-bounded polymorphism

trait Queue[T, This <: Queue[T, This]] {
  def push(elem: T) : This
}
trait GoodQueue[T, This <: GoodQueue[T, This]] extends Queue[T, This] {
  def pop : (T, This)
}
trait BetterQueue[T, R, This <: BetterQueue[T, R, This]] extends GoodQueue[T, This] {
  def giveMeSomethingNew : R
}
trait QueueUnited[T, R, Q <: Queue[T, Q], G <: GoodQueue[T, G], B <: BetterQueue[T, R, B], This <: QueueUnited[T, R, Q, G, B, This]] extends BetterQueue[T, R, This] {
  def giveUp : Unit
}

• Always need the type of concrete subclass

• Reiterate all type variables again in subclass reference

• Type class liberates methods from inheritance

Yoyak : Scala Experience

• Type class

trait QueueLike[T,This] {
  def push(elem: T) : This
}
trait GoodQueueLike[T,This] {
  implicit val queueLike : QueueLike[T,This]
  def push(elem: T) : This = queueLike.push(elem)
  def pop(q: This) : (T,This)
}
trait BetterQueueLike[T,R,This] {
  implicit val goodQueueLike : GoodQueueLike[T,This]
  def push(elem: T) : This = goodQueueLike.push(elem)
  def pop(q: This) : (T,This) = goodQueueLike.pop(q)
  def giveMeSomethingNew : R
}
class QueueUnited[T,R,This](implicit val q : QueueLike[T,This], g : GoodQueueLike[T,This], b : BetterQueueLike[T,R,This]) {
  def push(elem: T) : This = b.push(elem)
  def pop(q: This) : (T,This) = b.pop(q)
  def giveMeSomethingNew : R = b.giveMeSomethingNew
  def giveUp : Unit = {}
}
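The same idea applies to abstract domains: lattice operations can live in a type class instead of a base class, so existing types such as `Set` and `Map` become domains without inheriting from anything. The sketch below is a simplified stand-in written for this transcript. The trait `Lattice` and the function `joinAll` are hypothetical and only loosely resemble Yoyak's `LatticeOps`.

```scala
object TypeClassSketch {
  // Type class for a join-semilattice with a bottom element (hypothetical).
  trait Lattice[A] {
    def bottom: A
    def join(a: A, b: A): A
  }

  object Lattice {
    // Sets form a lattice under union.
    implicit def setLattice[T]: Lattice[Set[T]] = new Lattice[Set[T]] {
      def bottom: Set[T] = Set.empty
      def join(a: Set[T], b: Set[T]): Set[T] = a union b
    }

    // Maps form a lattice pointwise whenever their values do.
    implicit def mapLattice[K, V](implicit v: Lattice[V]): Lattice[Map[K, V]] =
      new Lattice[Map[K, V]] {
        def bottom: Map[K, V] = Map.empty
        def join(a: Map[K, V], b: Map[K, V]): Map[K, V] =
          b.foldLeft(a) { case (acc, (k, x)) =>
            acc.updated(k, v.join(acc.getOrElse(k, v.bottom), x))
          }
      }
  }

  // Generic code only asks for a Lattice instance, not for a common supertype.
  def joinAll[A](xs: List[A])(implicit l: Lattice[A]): A =
    xs.foldLeft(l.bottom)(l.join)
}
```

Calling `joinAll` on a list of `Map[String, Set[Int]]` values resolves `mapLattice` and `setLattice` implicitly, so composing domains requires no extra type parameters at the call site.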

Yoyak : Scala Experience

• Type class in Yoyak

trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]]
    extends MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {
  implicit val arithOps : ArithmeticOps[A]
  implicit val boxedOps : LatticeWithTopOps[D]

Use both methods in an appropriate place

Yoyak : Scala Experience

• Algebraic data type support

Natural way to express an abstract syntax tree of a program

;

if(x)

a = 1 a = 2

println(a)

Seq(
  If("x", Assign("a", 1), Assign("a", 2)),
  Invoke("println", List("a")))

Yoyak : Scala Experience

• Algebraic data type support

Easy to navigate the abstract syntax tree

def eval(v: Value.t, input: Mem)(implicit context: Context) : (AbsValue[A,D],Mem) = {
  v match {
    case x : Value.Constant => evalConstant(x,input)
    case x : Value.Loc => evalLoc(x,input)
    case x : Value.BinExp => evalBinExp(x,input)
    case Value.This => (AbsRef(Set("$this")),input)
    case Value.CaughtExceptionRef => (AbsRef(Set("$caughtex")),input)
    case Value.CastExp(v, ofTy) => evalLoc(v,input)
    case Value.InstanceOfExp(v, ofTy) => (AbsTop,input)
    case Value.LengthExp(v) => (AbsTop,input)
    case Value.NewExp(ofTy) => input.alloc(context.stmt)
    case Value.NewArrayExp(ofTy, size) => input.alloc(context.stmt)

Yoyak : Scala Experience

• Native support for mutable and immutable values

[Diagram: Memory with variables x, y, z, and an Object with fields f = 1, g = "A"]

In some cases, mutability is more important than immutability

Yoyak : Scala Experience

• Native support for mutable and immutable values

[Diagram: Memory with variables x, y, z; Object with fields f = 1, g = "A"; NewObject with fields f = 2, g = "A"]

memory.filter{_._2 == object}.foldLeft(memory) { case (m,(k,_)) => m + (k -> newObject)}

O(n)

Yoyak : Scala Experience

• Native support for mutable and immutable values

[Diagram: Memory with variables x, y, z, and NewObject with fields f = 2, g = "A"]

object.update(newObject) O(1)

Yoyak : Scala Experience

• Native support for mutable and immutable values

[Diagram: Memory with variables x, y, z; Object with fields f = 1, g = "A"; NewObject with fields f = 2, g = "A"]

If we frequently update immutable objects in a big memory, it may result in severe inefficiency

Yoyak : Scala Experience

• Excellent support for parallelization

• Static analysis does not sufficiently utilize today’s advancement of computing scalability (multicore machines, big data technologies, cloud computing)

• Scala has an excellent platform for experimenting with parallelization: Akka

• Many fun things to try with Yoyak powered by Akka

Yoyak : Scala Experience

• Excellent support for parallelization

Worklist parallelization can be naturally implemented with Akka’s Actor model

Yoyak : Roadmap

• Add more built-in abstract domains

• Optimize analysis performance

• Visualize analysis details

• Build Scala compiler plug-in

Yoyak : Roadmap

• Add more built-in abstract domains

Interval domain cannot represent the relation between two variables

x = [2,8], y = [1,7] produce 49 combinations of (x,y) pairs

[Plot: X axis and Y axis, each ranging from 0 to 10]

Yoyak : Roadmap

• Add more built-in abstract domains

Octagon domain can represent the relation between two variables

[Plot: X axis and Y axis, each ranging from 0 to 10]

http://www.di.ens.fr/~mine/publi/article-mine-HOSC06.pdf

Yoyak : Roadmap

• Add more built-in abstract domains

2-interval domain is more precise than interval domain

[Plot: X axis and Y axis, each ranging from 0 to 10]

Yoyak : Roadmap

• Optimize analysis performance

• {Worklist, Method, Class}-level parallelization

• Reduce abstract memory size by removing unused variables (faster join operation for abstract memory)

• Optional faster but unsound analysis

Yoyak : Roadmap

• Visualize analysis details

It is hard to know what a static analyzer is doing at a specific moment because…

• Static analyzer’s behavior is very different for each input program

• Often need to inspect and compare a map with thousands of entries

• Unable to look over the big picture by ordinary Java debuggers

Yoyak : Roadmap

• Visualize analysis details

Example from SAT solvers

Visualization of the search tree generated by a basic DPLL

algorithm

DPVis

Yoyak : Roadmap

• Build Scala compiler plug-in

• Programming language researchers foresee that semantic program analyzers will be merged into compiler systems in the near future, as type systems were

Syntactic Analysis Grammar Checking Type System Semantic Analysis

Yoyak : Roadmap

• Build Scala compiler plug-in

• The Scala compiler is well modularized and cleanly coded (as compared to other compiler systems), so it is an excellent platform for experimenting with new ideas

• Pure Scala code is safe from null; however, linked Java libraries are not

• It would be great if the Scala compiler could detect possible null dereferences at compile time and issue a warning

Thank you!

Further Questions,

ScalaDays 2015

twitter @heejongl

gmail heejong@gmail.com
