IBM Haifa Research Lab: Software Asset Management Group
IBM PLE Seminar 2005, © 2005 IBM Corporation
Safety Checks and Semantic Understanding via Program Analysis Techniques
Nurit Dor
Joint Work: Eran Yahav, Inbal Ronen, Sara Porat
Page 1

IBM Haifa Research Lab: Software Asset Management Group

IBM PLE Seminar 2005 © 2005 IBM Corporation

Safety Checks and Semantic Understanding via Program Analysis Techniques

Nurit Dor

Joint Work: Eran Yahav, Inbal Ronen, Sara Porat

Page 2

Goal

• Find properties of a program
  - Anti-patterns that indicate potential bugs
  - Semantic patterns that have a meaning of interest

• Technology
  - Lightweight specifications
  - Conservative (sound) static analysis
  - Combining static and dynamic analyses

• Challenges
  - Scale to real programs
  - Produce a reasonable number of false positives
  - Utilize dynamic information as much as possible

Page 3

“Finding Bugs is Easy”

    FileComponent f = new FileComponent();
    f.close();
    f.read();

Not really…

    public class SimpleExample1 {
        public static void main(String[] args) {
            FileComponent f1 = new FileComponent();
            foo(f1);
            ...
            bar(f1);
        }

        public static void foo(FileComponent f) {
            ...
            f.close();
            ...
        }

        public static void bar(FileComponent f) {
            ...
            f.read();
            ...
        }
    }

Page 4

Understanding program dependency is easy?

    private Connection getConnection() {
        if (…)
            return DriverManager.getConnection(DBUrl);
        else {
            Context initial = new InitialContext();
            DataSource dataSource = (DataSource) initial.lookup(DSName);
            return dataSource.getConnection();
        }
    }

    public void execute(String query) {
        Connection conn = getConnection();
        Statement stmt = conn.createStatement();
        stmt.execute(query); // which DB and table is accessed?
    }

Not really…

Page 5

Finding Properties is Hard

• Handling “non-local” properties
• Interprocedural analysis
• Producing a reasonable number of false positives
• “Not finding non-bugs” is hard
• Correlating statements, e.g. which SQL statement relates to which database connection
• Inferring values and not just control and data flow
• Determining values that can occur at runtime
• Scaling to real programs

Page 6

Agenda

• Motivation
• IBM Research Projects
  - CARDS
  - SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  - Inferring pointer aliasing
  - Handling multiple objects
• Combining Static and Dynamic Analyses

Page 7

CAPA – Common Architecture for Program Analysis

• IBM Research cross-lab project
• Goal: a program analysis infrastructure effort to help Research
  - quickly create software lifecycle applications that exploit various flavors of program analysis
  - foster sharing and collaboration between groups
  - speed technology transfer to product groups

Page 8

CARDS (Combining Analyses: Runtime, Dynamic and Static)

HRL

Goal: End-to-end impact analysis

What happens if I change a database table?

Page 9

• Scalable and flexible error detection (“bug finding”) and verification
  - Detecting violations of simple correctness properties
  - Verifying the absence of such violations
• Wide range of techniques
  - Detect common bug patterns based on an XML representation of a program
  - Integrated pointer analysis and interprocedural typestate checking
• More precise than existing tools (= fewer false alarms)
• Experimental version deployed to early adopters within IBM SWG

Watson Research project

Page 10

Agenda

• Motivation
• IBM Research Projects
  - CARDS
  - SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  - Inferring pointer aliasing
  - Handling multiple objects
• Combining Static and Dynamic Analyses

Page 11

Specifying pattern

• Used for modeling runtime properties of interest
  - Sequences of method invocations that have a specific semantic meaning
    - Data flow relationships: the result of one invocation is the target/parameter of a second invocation
    - Control flow relationships: order may or may not be meaningful
    - Some method invocations are semantically equivalent: usage of abstract patterns and inheritance
• List of values (parameters, return values, ...) to resolve
• Patterns are written in XML and converted into an automaton
• A pattern instance is a set of specific method calls that are detected in the code
  - The same method call can be part of several pattern instances (of the same or of different patterns)

Page 12

    public class SimpleExample1 {
        public static void main(String[] args) {
            FileComponent f1 = new FileComponent();
            foo(f1);
            ...
            bar(f1);
        }

        public static void foo(FileComponent f) {
            ...
            f.close();
            ...
        }

        public static void bar(FileComponent f) {
            ...
            f.read();
            ...
        }
    }

Finite State Automata

FileComponent may not be read after being closed

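[Automaton diagram]
States: open, closed, err
Transitions:
  open -> open on read
  open -> closed on close
  closed -> closed on close
  closed -> err on read
  err -> err on * (any event)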
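The property above can be sketched as a small transition function. This is only an illustration, not the SAFE implementation: the class name `FileTypestate`, the string event names, and the choice to treat every event other than `read` and `close` as a self-loop are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "no read after close" automaton. */
final class FileTypestate {
    enum State { OPEN, CLOSED, ERR }

    /** One transition; events other than "read" and "close" are assumed to be self-loops. */
    static State step(State s, String event) {
        switch (s) {
            case OPEN:
                return event.equals("close") ? State.CLOSED : State.OPEN;
            case CLOSED:
                return event.equals("read") ? State.ERR : State.CLOSED;
            default:
                return State.ERR; // err absorbs every event (the "*" edge)
        }
    }

    /** Runs a whole event sequence starting from the open state. */
    static State run(List<String> events) {
        State s = State.OPEN;
        for (String e : events) {
            s = step(s, e);
        }
        return s;
    }

    public static void main(String[] args) {
        // Mirrors the example: foo(f1) closes the file, bar(f1) then reads it.
        System.out.println(run(Arrays.asList("close", "read"))); // prints ERR
    }
}
```

Running the automaton over a concrete event sequence is the easy part; the static analysis has to decide which events can reach which object without executing the program.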

Page 13

Finite State Stack Automata

Identify database and table access statements

    private Connection getConnection() {
        return DriverManager.getConnection(DBUrl);
    }

    public void execute(String query) {
        Connection conn = getConnection();
        Statement stmt = conn.createStatement();
        stmt.execute(query);
    }

[Automaton diagram]
States: init, connected, gotStat, executed
Transitions:
  init -> connected on res = DriverManager.getConnection(s)
    actions: DBName := s, conn := res
  connected -> gotStat on res = Connection.createStatement()
    precondition: conn == target
    action: stat := res
  gotStat -> executed on Statement.execute(s)
    precondition: stat == target
    action: SQLstmt := s
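A minimal sketch of how one instance of this pattern could be tracked, assuming calls are delivered as (method name, receiver, argument, result) tuples. The class `DbAccessPattern`, the `onCall` signature, and the placeholder JDBC URL are hypothetical and not taken from CARDS or SAFE. The sketch works on concrete objects, whereas the static analysis works on abstractions of them.

```java
/** Hypothetical sketch: one instance of the database-access pattern automaton. */
final class DbAccessPattern {
    enum State { INIT, CONNECTED, GOT_STAT, EXECUTED }

    State state = State.INIT;
    String dbName;   // bound from the getConnection argument
    Object conn;     // bound from the getConnection result
    Object stat;     // bound from the createStatement result
    String sqlStmt;  // bound from the execute argument

    /**
     * Feeds one observed call. target is the receiver (null for static calls),
     * arg the string argument (or null), res the returned object (or null).
     * Returns true if the call advanced the automaton.
     */
    boolean onCall(String method, Object target, String arg, Object res) {
        if (state == State.INIT && method.equals("DriverManager.getConnection")) {
            dbName = arg;
            conn = res;
            state = State.CONNECTED;
            return true;
        }
        if (state == State.CONNECTED && method.equals("Connection.createStatement")
                && conn == target) { // precondition: conn == target
            stat = res;
            state = State.GOT_STAT;
            return true;
        }
        if (state == State.GOT_STAT && method.equals("Statement.execute")
                && stat == target) { // precondition: stat == target
            sqlStmt = arg;
            state = State.EXECUTED;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Object c = new Object();
        Object s = new Object();
        DbAccessPattern p = new DbAccessPattern();
        p.onCall("DriverManager.getConnection", null, "jdbc:example:orders", c); // placeholder URL
        p.onCall("Connection.createStatement", c, null, s);
        p.onCall("Statement.execute", s, "SELECT * FROM T", null);
        System.out.println(p.state + " " + p.dbName + " " + p.sqlStmt);
    }
}
```

The preconditions are what correlate statements: an `execute` call only completes the instance if its receiver is the statement created from the tracked connection.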

Page 14

Agenda

• Motivation
• IBM Research Projects
  - CARDS
  - SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  - Inferring pointer aliasing
  - Handling multiple objects
• Combining Static and Dynamic Analyses

Page 15

TypeState Base algorithm – Single Objects

• Based on flow-insensitive global pointer analysis
• Concrete objects are represented by a finite set of abstract objects, e.g. one for each allocation site
• Iterative algorithm that tracks <o, state>
• Each object is handled separately
• Handles pointer aliasing conservatively, i.e. with weak updates

[Diagram: the open/closed/err automaton from the Finite State Automata slide, applied to f1.close … f2.read]
Initially: <o, open>
After f1.close: <o, open>, <o, closed>
After f2.read: <o, open>, <o, err>
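A weak update can be sketched as follows: because the abstract object may stand for several runtime objects, the analysis keeps every state it had before the call and adds the successor states. The class and method names are assumptions for illustration, and the transition function is the simplified one from the automaton slide.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of a weak update over the typestates of one abstract object. */
final class WeakUpdateSketch {
    enum State { OPEN, CLOSED, ERR }

    static State step(State s, String event) {
        if (s == State.OPEN && event.equals("close")) return State.CLOSED;
        if (s == State.CLOSED && event.equals("read")) return State.ERR;
        return s; // every other transition is a self-loop in this sketch
    }

    /** Weak update: keep every old state and add every successor state. */
    static Set<State> weakUpdate(Set<State> states, String event) {
        Set<State> out = EnumSet.noneOf(State.class);
        out.addAll(states);
        for (State s : states) {
            out.add(step(s, event));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<State> o = EnumSet.of(State.OPEN); // <o, open>
        o = weakUpdate(o, "close");            // f1.close
        o = weakUpdate(o, "read");             // f2.read
        // err is now among the possible states, so a potential violation is reported
        System.out.println(o.contains(State.ERR));
    }
}
```

The cost of this conservatism is precision: the open state is never dropped, so even a correct close-then-stop sequence leaves the object possibly open.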

Page 16

TypeState Uniqueness algorithm

• Compute which abstract objects may represent at most one runtime object
• If a pointer may only point to a single unique abstract object, perform a strong update

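[Diagram: pointers f1 and f2, abstract objects o and o’; statements f1.close, f2.read]
Initially: <o, open>
After f1.close: <o, closed>
After f2.read: <o, closed>, <o, err>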
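The choice between a strong and a weak update can be sketched like this. It assumes points-to sets and the set of unique abstract objects are already available as plain sets of names; the points-to facts used in `main` (f1 pointing only to o, f2 pointing to o or o') are an assumption chosen to be consistent with the states shown on the slide.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: strong update when the receiver points to one unique abstract object. */
final class UniquenessSketch {
    enum State { OPEN, CLOSED, ERR }

    static State step(State s, String event) {
        if (s == State.OPEN && event.equals("close")) return State.CLOSED;
        if (s == State.CLOSED && event.equals("read")) return State.ERR;
        return s;
    }

    /** Updates the state set of the tracked object for a call through a receiver. */
    static Set<State> update(Set<State> states, String event,
                             Set<String> pointsTo, String tracked, Set<String> unique) {
        if (!pointsTo.contains(tracked)) {
            return states; // the receiver cannot refer to the tracked object
        }
        boolean strong = pointsTo.size() == 1 && unique.contains(tracked);
        Set<State> out = EnumSet.noneOf(State.class);
        if (!strong) {
            out.addAll(states); // weak update keeps the old states
        }
        for (State s : states) {
            out.add(step(s, event));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> unique = new HashSet<>(Arrays.asList("o"));
        Set<String> f1 = new HashSet<>(Arrays.asList("o"));        // assumed points-to set
        Set<String> f2 = new HashSet<>(Arrays.asList("o", "o'"));  // assumed points-to set
        Set<State> states = EnumSet.of(State.OPEN);
        states = update(states, "close", f1, "o", unique); // strong update
        states = update(states, "read", f2, "o", unique);  // weak update
        System.out.println(states); // prints [CLOSED, ERR]
    }
}
```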

Page 17

TypeState with Access Path

• Track which access paths definitely point to the tracked abstract object
  - Perform a strong update on calls through those paths

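[Diagram: pointers f1 and f2, abstract objects o and o’; statements f2 = f1, f1.close, f2.read]
Initially: <o, open, {f1}>
After f2 = f1: <o, open, {f1, f2}>
After f1.close: <o, closed, {f1, f2}>
After f2.read: <o, err, {f1, f2}>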
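The trace above can be reproduced with a small sketch that carries a must-alias set next to the typestate. It is a simplification under stated assumptions: access paths are plain variable names, only copy assignments are modelled, and the names `Fact`, `assign`, and `call` are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of typestate facts <o, state, must-alias set>. */
final class AccessPathSketch {
    enum State { OPEN, CLOSED, ERR }

    /** One fact: every path in must definitely points to the tracked object o. */
    static final class Fact {
        final State state;
        final Set<String> must;

        Fact(State state, Set<String> must) {
            this.state = state;
            this.must = new TreeSet<>(must);
        }

        @Override
        public String toString() {
            return "<o, " + state + ", " + must + ">";
        }
    }

    static State step(State s, String event) {
        if (s == State.OPEN && event.equals("close")) return State.CLOSED;
        if (s == State.CLOSED && event.equals("read")) return State.ERR;
        return s;
    }

    /** lhs = rhs: lhs joins the must set iff rhs is in it; otherwise lhs is dropped. */
    static Fact assign(Fact f, String lhs, String rhs) {
        Set<String> must = new TreeSet<>(f.must);
        if (must.contains(rhs)) {
            must.add(lhs);
        } else {
            must.remove(lhs);
        }
        return new Fact(f.state, must);
    }

    /** Call through receiver: strong update if receiver is a must-alias, else keep the old fact too. */
    static List<Fact> call(Fact f, String receiver, String event) {
        List<Fact> out = new ArrayList<>();
        if (!f.must.contains(receiver)) {
            out.add(f); // weak: the tracked object may be untouched
        }
        out.add(new Fact(step(f.state, event), f.must));
        return out;
    }

    public static void main(String[] args) {
        Fact f = new Fact(State.OPEN, new TreeSet<>(Arrays.asList("f1")));
        f = assign(f, "f2", "f1");           // f2 = f1
        f = call(f, "f1", "close").get(0);   // f1.close, strong update
        f = call(f, "f2", "read").get(0);    // f2.read, strong update
        System.out.println(f);               // prints <o, ERR, [f1, f2]>
    }
}
```

With the must-alias set, the read through f2 is known to hit the same object that was closed through f1, so the analysis reports a definite error state instead of a mixed one.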

Page 18

TypeState for Multiple Objects

• Track memory <{conn = o, stat = o’}, typestate>
• On statements
  - Check precondition
  - Update memory

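[Automaton diagram]
States: init, connected, gotStat, executed
Transitions:
  init -> connected on res = DriverManager.getConnection(s)
    actions: DBName := s, conn := res
  ……
  connected -> gotStat on res = Connection.createStatement()
    precondition: conn == target
    action: stat := res
  gotStat -> executed on Statement.execute(s)
    precondition: stat == target
    action: SQLstmt := s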

Page 19

Agenda

• Motivation
• IBM Research Projects
  - CARDS
  - SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  - Inferring pointer aliasing
  - Handling multiple objects
• Combining Static and Dynamic Analyses

Page 20

Inferring values by utilizing dynamic information

• For some properties, data values are of interest
• Sparsely log execution (data and control) of a set of predefined method invocations
  - Methods indicated by the properties
  - Common external input methods
• Correlate runtime method invocations to the source code according to the level of existing monitoring precision
  - Caller-callee
  - Line number
  - Byte code offset

Page 21

Static and Dynamic Combination

• Execute the program and obtain log files of method invocations
• Statically perform the typestate algorithm
  - Report pattern instances
• Statically perform data value flow of static and dynamic values
  - Report all possible values that may reach program points of interest
• Report pattern instances with values
• Limitations
  - May report values on a program point that can never reach this point
  - Is not (and can never be) sound
  - May lose precision due to the two-phase approach: typestate and value resolution
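One way to picture the value-resolution step is as a union, per call site, of statically inferred values and values seen in the logs. The sketch below assumes line-number precision and a made-up log record shape; `LogEntry`, the `callee@line` key format, and the sample values are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: merging static and logged values per call site. */
final class ValueMergeSketch {
    /** Assumed log record: a monitored call seen at a source line with one argument value. */
    static final class LogEntry {
        final String callee;
        final int line;
        final String value;

        LogEntry(String callee, int line, String value) {
            this.callee = callee;
            this.line = line;
            this.value = value;
        }
    }

    /** Union of static and dynamic values, keyed by "callee@line". */
    static Map<String, Set<String>> merge(Map<String, Set<String>> staticValues,
                                          List<LogEntry> log) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : staticValues.entrySet()) {
            out.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        for (LogEntry le : log) {
            out.computeIfAbsent(le.callee + "@" + le.line, k -> new TreeSet<>()).add(le.value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> staticValues = new TreeMap<>();
        staticValues.put("DriverManager.getConnection@12",
                new TreeSet<>(Arrays.asList("jdbc:example:orders"))); // placeholder value
        List<LogEntry> log = Arrays.asList(
                new LogEntry("DriverManager.getConnection", 12, "jdbc:example:archive"),
                new LogEntry("Statement.execute", 20, "SELECT * FROM T"));
        System.out.println(merge(staticValues, log));
    }
}
```

Because the logs only cover the executions that were monitored, a merged set like this can miss values, which is one reason the combined result cannot be sound.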

Page 22

Empirical results

• CARDS dependency analysis
  - Detects database accesses in J2EE and Java applications
  - Infers the call graph from dynamic logging
• SAFE error detection
  - Verifies usages of Socket, Vector, Iterator, ...
  - Scaling is good: ~10 min for 100,000 LOC
  - Best typestate checking algorithm verifies 95.6% of candidate statements (i.e. statements that may reach an error state)
  - False alarms are due to
    - imprecision in pointer aliasing
    - program logic that implies safety, e.g. a flag indicating whether a vector is empty or not