Static Analysis of Memory Errors
Mooly Sagiv
Tel Aviv University
Feb 02, 2016
Project Goals
• Statically determine that data are used in a sound way
– No unexpected software behavior
• In C
– No undefined semantics (ANSI C)
– Prevent bad programming styles
• In Java
– Certain exceptions will never be raised
• Sound analysis
• Minimal false alarms
Sample Cleanness Problems
1. C string-related errors
   1. Unsafe calls to strcpy(), strcat()…
   2. Out-of-bound references
   3. Pointer arithmetic
2. Java interface requirements for library usages
Nurit Dor & Greta Yorsh
http://www.cs.tau.ac.il/~nurr
Are String Violations Common?
FUZZ study (1995)
• Random test programs on various systems
– 9 different UNIX systems
– 18%–23% hang or crash
– 80% are string-related errors
CERT advisory
• 50% of attacks are abuses of buffer overflows
Example – unsafe call to strcpy()
    simple() {
      char s[20];
      char *p;
      char t[10];
      strcpy(s, "Hello");
      p = s + 5;
      strcpy(p, " world!");
      strcpy(t, s);
    }
Cleanness is always violated: alloc(t) = 10, len(s) = 12
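The numbers above can be reproduced by tracking only integers (allocation size and string length) through the three calls. The following is a minimal sketch under assumptions: `buf_info`, `model_strcpy`, and `model_simple` are hypothetical names introduced for illustration, and this is not how CSSV represents programs internally.

```c
#include <stddef.h>

/* Integer summary of a buffer: allocation size and current string length. */
struct buf_info {
    size_t alloc;
    size_t len;
};

/* Models strcpy(dst + offset, src) on the integer summary only.
   Returns 0 and updates dst->len when alloc(dst) - offset > len(src);
   returns -1 (a cleanness violation) otherwise. */
int model_strcpy(struct buf_info *dst, size_t offset, size_t src_len)
{
    if (offset >= dst->alloc || dst->alloc - offset <= src_len)
        return -1;
    dst->len = offset + src_len;
    return 0;
}

/* Replays simple() on the summaries and returns the result of the last copy.
   Returns -2 if one of the first two copies is unexpectedly rejected. */
int model_simple(void)
{
    struct buf_info s = { 20, 0 };
    struct buf_info t = { 10, 0 };

    if (model_strcpy(&s, 0, 5) != 0) return -2;  /* strcpy(s, "Hello")            */
    if (model_strcpy(&s, 5, 7) != 0) return -2;  /* strcpy(p, " world!"), p = s+5 */
    return model_strcpy(&t, 0, s.len);           /* strcpy(t, s): 10 > 12 fails   */
}
```

Calling `model_simple()` reports the violation of the last copy without ever touching a real buffer, which is the point of reasoning about lengths instead of contents.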
Example – unsafe pointer arithmetic
    /* from web2c [strpascal.c] */
    void null_terminate(char *s)
    {
      while ( *s != ' ' )
        s++;
      *s = 0;
    }
Cleanness is potentially violated: offset(s) = alloc(buff(s))
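One way to rule out this violation is to pass the allocation size explicitly, so that the scan stops before offset(s) reaches alloc(buff(s)). This is a sketch only: `null_terminate_bounded` and its `alloc` parameter are assumptions for illustration and do not appear in web2c.

```c
#include <stddef.h>

/* Replaces the first blank in s with NUL, scanning at most alloc bytes.
   Returns 0 on success, or -1 if no blank exists within the buffer
   (the case in which the original loop would run past the end). */
int null_terminate_bounded(char *s, size_t alloc)
{
    size_t i;

    for (i = 0; i < alloc; ++i) {
        if (s[i] == ' ') {
            s[i] = 0;
            return 0;
        }
    }
    return -1;
}
```

The return value lets the caller react to a buffer without a blank, a case the original version cannot detect.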
Complicated Example
    /* from web2c [fixwrites.c] */
    #define BUFSIZ 1024
    char buf[BUFSIZ];
    char *insert_long(char *cp)
    {
      char temp[BUFSIZ];
      …
      for (i = 0; &buf[i] < cp; ++i)
        temp[i] = buf[i];
      strcpy(&temp[i], "(long)");
      strcpy(&temp[i+6], cp);
      …
[Figure: buffer layout with labels cp, buf, "(long)", temp]
Cleanness is potentially violated: 7 + offset(cp) ≥ BUFSIZ
Cleanness is potentially violated: offset(cp) + 7 + len(cp) ≥ BUFSIZ
(with 7 + offset(cp) < BUFSIZ)
Vulnerable String Manipulation
• Pointers to buffers
– char *p = buffer; … while ( ) p++;
• Standard string manipulation functions
– strcpy(), strcat(), …
• NULL termination
– strncpy(), …
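The NULL-termination pitfall is easy to reproduce: strncpy() writes no terminator when the source holds n or more characters. The sketch below restores the terminator explicitly; `bounded_copy` is a hypothetical wrapper name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most alloc - 1 characters of src into dst and always
   NUL-terminates the result. Returns 1 if src was truncated, 0 otherwise.
   Requires alloc >= 1. */
int bounded_copy(char *dst, size_t alloc, const char *src)
{
    strncpy(dst, src, alloc - 1);   /* may leave dst unterminated */
    dst[alloc - 1] = '\0';          /* restore the string invariant */
    return strlen(src) >= alloc;
}
```

Returning the truncation flag keeps the loss of data visible to the caller instead of hiding it behind a well-formed string.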
C Static String Verifier (CSSV) Objectives
• Modular analysis
– Procedure pre-condition/post-condition/mod
• Automatically generate procedure specification
• Handle full C
– Multi-level pointers
– Structures
• Reduce complexity of transformation
– Linear in the number of variables
CSSV
[Figure: CSSV architecture. Labels: C files, Procedure name, Pointer Analysis, Procedure's pointer info, C2IP, Pre/Mod/Post, Integer Proc, Integer Analysis, Potential Error Messages, AWP]
Advantages of Procedure Specification
• Modular analysis
– Not all the code is available
– Enables more expensive analyses
• User control of the verification
– Detect errors at point of logical error
– Improve the precision of the analysis
– Check additional properties
• Beyond ANSI-C
Specification and Soundness
• All errors are detected
• Violation of procedure's precondition
– Call
• Violation of procedure's postcondition
– Return
• Violation of statement's precondition
– …a[i]…
Specification – strcpy
    char* strcpy(char* dst, char *src)
      requires ( string(src) ∧ alloc(dst) > len(src) )
      mod len(dst), is_nullt(dst)
      ensures ( len(dst) == pre@len(src) ∧ return == pre@dst )
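The contract can also be read as run-time checks placed around the call. The sketch below is an illustration under assumptions: `strcpy_contract` is a hypothetical wrapper, the caller supplies alloc(dst) because C cannot recover it from a pointer, and string(src) is taken for granted and not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper that enforces the strcpy contract dynamically.
   Returns NULL when the precondition alloc(dst) > len(src) fails. */
char *strcpy_contract(char *dst, size_t alloc_dst, const char *src)
{
    size_t pre_len_src = strlen(src);      /* pre@len(src) */
    char *ret;

    if (!(alloc_dst > pre_len_src))        /* requires */
        return NULL;

    ret = strcpy(dst, src);                /* mod: len(dst), is_nullt(dst) */

    assert(strlen(dst) == pre_len_src);    /* ensures len(dst) == pre@len(src) */
    assert(ret == dst);                    /* ensures return == pre@dst */
    return ret;
}
```

A static verifier aims to prove the same conditions for every call site without executing the program, so no wrapper is needed at run time.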
Specification – insert_long()
    /* insert_long.c */
    #include "insert_long.h"
    char buf[BUFSIZ];
    char * insert_long (char *cp) {
      char temp[BUFSIZ];
      int i;
      for (i=0; &buf[i] < cp; ++i) {
        temp[i] = buf[i];
      }
      strcpy (&temp[i],"(long)");
      strcpy (&temp[i + 6], cp);
      strcpy (buf, temp);
      return cp + 6;
    }

    char * insert_long(char *cp)
      requires ( string(cp) ∧ buf ≤ cp < buf + BUFSIZ )
      mod cp.strlen
      ensures ( len(cp) == pre[len(cp) + 6] ∧ return_value == cp + 6 );
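For comparison, here is a sketch of a defensive variant that checks the stated precondition, plus the bound needed by the three copies, before touching temp. Everything that is not on the slide is an assumption: the names `insert_long_checked` and `BUF_CAP` (standing in for BUFSIZ), the NULL return on failure, and the use of `uintptr_t` to compare pointers that might not point into buf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_CAP 1024            /* stands in for BUFSIZ on the slide */

char buf[BUF_CAP];

/* Hypothetical checked variant of insert_long(). Returns NULL instead of
   overflowing when the contract or the buffer bound does not hold. */
char *insert_long_checked(char *cp)
{
    char temp[BUF_CAP];
    uintptr_t lo = (uintptr_t)buf;
    uintptr_t p = (uintptr_t)cp;
    size_t offset, len;

    /* requires: buf <= cp < buf + BUFSIZ */
    if (p < lo || p - lo >= BUF_CAP)
        return NULL;
    offset = (size_t)(p - lo);

    /* requires: string(cp), i.e. a NUL exists before the end of buf */
    if (memchr(cp, '\0', BUF_CAP - offset) == NULL)
        return NULL;
    len = strlen(cp);

    /* bound needed by the copies: offset + 6 + len + 1 <= BUFSIZ */
    if (offset + len + 7 > BUF_CAP)
        return NULL;

    memcpy(temp, buf, offset);            /* the for loop on the slide */
    strcpy(&temp[offset], "(long)");
    strcpy(&temp[offset + 6], cp);
    strcpy(buf, temp);
    return cp + 6;                        /* ensures return_value == cp + 6 */
}
```

With buf holding "int x;" and cp = buf + 4, the call rewrites buf to "int (long)x;" and returns a pointer to "x;".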
CSSV
[Figure: CSSV architecture (variant). Labels: C files, Leaf Procedure, Pointer Analysis, Procedure's pointer info, C2IP side effect, Mod, Integer proc, AWP, Pre]
CSSV
[Figure: CSSV architecture (variant). Labels: C files, Leaf Procedure, Pre, Mod, Post, Pointer Analysis, Procedure's pointer info, C2IP, Integer proc, Integer Analysis, Potential Error Messages]
C2IP
Source procedure:
    char * insert_long (char *cp) {
      char temp[BUFSIZ];
      int i;
      require string(cp);
      for (i=0; &buf[i] < cp; ++i) { temp[i]=cp[i]; }
      strcpy(&temp[i],"(long)");
      …
Generated integer procedure:
    int cp.offset;
    int temp.offset = 0;
    int stemp.msize = BUFSIZ;
    int stemp.len;
    int stemp.is_nullt;
    int i;
    assume(sbuf.is_nullt ∧ 0 ≤ cp.offset ≤ sbuf.len < sbuf.alloc);
    for (i=0; i < cp.offset; ++i) {
      assert(0 ≤ i < stemp.msize ∧ (stemp.is_nullt ⇒ i ≤ stemp.len));
      assert(-i ≤ cp.offset < -i + sbuf.len);
      if (sbuf.is_nullt ∧ sbuf.len == i) { stemp.len = i; stemp.is_nullt = true; } else …
    }
    assert(0 ≤ i < stemp.msize - 6);
    assume(stemp.len == i + 6);
    …
AWP
• Approximate the Weakest Precondition
• Backward integer analysis
• Generates a precondition
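As a hand-worked illustration (not the output of AWP itself), the strcpy precondition alloc(dst) > len(src) can be pushed backward through the two strcpy calls into temp in insert_long(). The derivation assumes that the loop exits with i = offset(cp) and that BUFSIZ = 1024, as in the fixwrites.c excerpt.

```latex
\begin{align*}
\text{strcpy(\&temp[i], "(long)")}:&\quad
  \mathit{BUFSIZ} - i > 6 \\
\text{strcpy(\&temp[i+6], cp)}:&\quad
  \mathit{BUFSIZ} - (i + 6) > \mathit{len}(cp) \\
\text{with } i = \mathit{offset}(cp):&\quad
  \mathit{offset}(cp) + \mathit{len}(cp) + 7 \le \mathit{BUFSIZ} \\
\text{with } \mathit{BUFSIZ} = 1024:&\quad
  \mathit{offset}(cp) + \mathit{len}(cp) \le 1017
\end{align*}
```

The second condition implies the first because len(cp) ≥ 0, so a single inequality over the integer variables covers both calls.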
AWP – insert_long()
• Generate the following precondition:
    string(cp) ∧ len(buf) ≤ offset(cp) + 1017
• Not the weakest precondition:
    string(cp) ∧ len(buf) ≤ 1017
Implementation
• Using:
– ASToolKit [Microsoft]
– GOLF [Microsoft – Manuvir Das]
– New Polka [IMAG – Bertrand Jeannet]
• Main steps:
– Simplifier
– Pointer analysis
– C2IP
– Integer Analysis
Preliminary results (web2C)
Proc                   line  coreC line  time (sec)  space (Mb)  errors  FA
insert_long              14          64         2.0          13       2   0
fprintf_pascal_string    10          25         0.1         0.3       2   0
space_terminate           9          23         0.1         0.2       0   0
external_file_name       14          28         0.2         1.7       2   0
join                     15          53         0.6         5.2       2   1
remove_newline           25         105         0.6         4.6       0   0
null_terminate            9          23         0.1         0.2       2   0
Up to four times faster than SAS01
Preliminary results (EADS/RTC_Si)
Proc                   line  coreC line  time (sec)  space (Mb)  errors  FA
FiltrerCarNonImp         19          34         1.6         0.5       0   0
SkipLine                 12          42         0.8         1.9       0   0
StoreIntInBuffer         37         134         7.9          21       0   0
The Canvas Project
Component ANnotation, Verification And Stuff
J. Field, D. Goyal, G. Ramalingam
IBM Research
http://www.research.ibm.com/menage/canvas
The problem
• Class libraries and software components are supposed to
– make building complex applications from "parts" easier
– make a market for pre-packaged code...
• ...but in practice
– programming with components is hard
– inadequate documentation
– lack of source code
– increased API complexity (to allow for customization)
• Programmers often resort to iterative trial-and-error methods to get components to work in their application
Canvas Goals
• The component designers specify component conformance constraints
• Develop automated certification tools to determine whether the client satisfies the component's conformance constraints
• Focus on Java™ libraries and JavaBeans™
Our Approach
• Specify component behavior in a Java-like language (EASL)
• Use TVLA for statically analyzing Java heap
• Specialize the algorithm for the component
The Concurrent Modification Problem (PLDI'02, Berlin)
• Static analysis of Java programs manipulating Java 2 collections
• Inconsistent usages of iterators
– An Iterator object i defined on a collection object c
– No use of i may be preceded by an update to the contents of c, unless the update was also made via i
    class Make {
      private Worklist worklist;
      public static void main (String[] args) {
        Make m = new Make();
        m.initializeWorklist(args);
        m.processWorklist();
      }
      void initializeWorklist(String[] args) {
        ...; worklist = new Worklist(); ... // add some items to worklist
      }
      void processWorklist() {
        Set s = worklist.unprocessedItems();
        for (Iterator i = s.iterator(); i.hasNext(); ) {
          Object item = i.next();
          if (...) processItem(item);
        }
      }
      void processItem(Object i) {
        ...; doSubproblem(...);
      }
      void doSubproblem(...) {
        ... worklist.addItem(newitem); ...
      }
    }

    public class Worklist {
      Set s;
      public Worklist() { ...; s = new HashSet(); ... }
      public void addItem(Object item) { s.add(item); }
      public Set unprocessedItems() { return s; }
    }
EASL Specification
    class Collection {
      Version version;
      Collection() { version = new Version(); }
      boolean add(Object o) { version = new Version(); }
      Iterator iterator() { return new Iterator(this); }
    }

    class Iterator {
      Collection set;
      Version definingVersion;
      Iterator (Collection s) { definingVersion = s.version; set = s; }
      void remove() {
        requires (definingVersion == set.version);
        set.version = new Version();
        definingVersion = set.version;
      }
      Object next() { requires (definingVersion == set.version); }
    }

    class Version {}
Prototype
[Figure: prototype architecture. Labels: Java, Soot, Jimple AST, EASL, Specialize, action definition, J2TVP Translator, CFG + actions, Three Value Logic Analyzer, Analysis result, Potential cleanness violations]
Empirical Results
Benchmark      Loc   Err.  FA  Time (sec)  Space (MB)  Structs.
Kernel         683     15   0          60          19      4363
MapTest        335      1   0          61          20      4937
IteratorTest   126      0   0        0.23           4       208
JFE           2896      1   1         236          49      9878
Conclusion
• Ambitious sound analyses
• Very few false alarms
• Scaling is an issue
– Use staged analyses
– Use modular analysis
– Use encapsulation