Learning about Bugs in Systems Laune Harris, Le Hoang Anh, Guy Lichtman, Nikos Michalakis


Dec 17, 2015


Learning about Bugs in Systems

Laune Harris, Le Hoang Anh, Guy Lichtman, Nikos Michalakis


Bugs and bug finding techniques


Bugs

• Bugs are software defects that cause
  – Incorrect behaviour
  – Unintended behaviour

• Bugs are expensive
  – Development costs
  – Maintenance costs
  – Repair/recovery costs
  – Downtime
  – Sometimes lethal


Bug causes

• Design flaws

• Errors in source

• Mistakes in translation from design to source

• Unforeseen interactions with users

• Unforeseen interactions with other components


Bug Taxonomy

• Resource Management Bugs
  – Memory
    • Leak, stack smashing, access violations, NULL dereference, using uninitialised memory
  – File/Socket
    • Leak, double close, etc.


Examples

• // stack buffer overrun for sizes greater than 42
  void stack_buf( void* src, int size ) {
      char buffer[42];
      memcpy( buffer, src, size );
  }

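• // null dereference
  import java.io.*;

  public class nullDeref {
      public static void main(String argv[]) throws IOException {
          MyObject o = null;
          BufferedReader keyb = new BufferedReader(new InputStreamReader(System.in));
          String inp = keyb.readLine();
          if( !"quit".equals(inp) ){
              o.construct( inp );   // o is still null here
          }
          System.out.println(o.toString());   // o is null on every path
      }
  }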


Bug Taxonomy cont’d

• Concurrent Bugs
  – Data race
  – Inconsistent synchronisation
  – Deadlock
  – Orphaned Threads
  – Livelock


Example

• // Data race
  int cnt = 0;

  void thread1() {
      int y = cnt;
      cnt = y + 1;
  }

  void thread2() {
      int y = cnt;
      cnt = y + 1;
  }
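
The lost update in the data race example can be replayed by hand. The sketch below is an illustration, not part of the original slides: `racy_interleaving` fixes one bad schedule of the two threads so the outcome is deterministic, and the hypothetical `Counter` class shows one common repair, guarding the read-modify-write with a lock.

```python
import threading


def racy_interleaving():
    """Replay one bad schedule of thread1/thread2 by hand (deterministic)."""
    cnt = 0
    y1 = cnt        # thread1 reads 0
    y2 = cnt        # thread2 reads 0 before thread1 has written
    cnt = y1 + 1    # thread1 writes 1
    cnt = y2 + 1    # thread2 also writes 1: one increment is lost
    return cnt


class Counter:
    """Counter whose read-modify-write is protected by a lock."""

    def __init__(self):
        self.cnt = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:      # no other thread can interleave in here
            y = self.cnt
            self.cnt = y + 1


def run_locked(n_threads=2, per_thread=1000):
    """Run n_threads workers that each increment the shared counter."""
    counter = Counter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.cnt
```

With the lock in place, every increment is preserved regardless of how the scheduler interleaves the workers.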


Bug Taxonomy

• Semantic Bugs
  – Bugs that contradict programmer intent
  – Often specific to APIs and domains
    • Eg: not releasing an acquired lock


Examples

• /* from sys/kern/disk.c */
  int sys_disk_request( u_int sn, struct Xn_name *xn_user, struct buf* reqbp, u_int k )
      if( reqbp->bflags & B_SCSICMD )
          return sys_disk_scsicmd( sn, k, reqbp );

• // SQL injection
  public void authenticate(HttpServletRequest request) {
      String username = request.getParameter("user");
      java.sql.Statement stmt = con.createStatement();
      // user input is concatenated straight into the query text
      String query = "select * from users where username = '" + username + "' and password = '" + pwd + "'";
      stmt.execute(query);
      ... // process the result of SELECT
  }
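
The SQL injection example can be reproduced end to end with an in-memory database. This is a minimal sketch, assuming a hypothetical `users` table and the helper names `make_db`, `authenticate_unsafe` and `authenticate_safe`, none of which come from the slides. It contrasts the string-concatenation query with a parameterised one.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one (hypothetical) user."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (username TEXT, password TEXT)")
    con.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return con


def authenticate_unsafe(con, username, pwd):
    # Vulnerable: input is spliced into the SQL text, as in the slide's query.
    query = ("select * from users where username = '" + username +
             "' and password = '" + pwd + "'")
    return con.execute(query).fetchone() is not None


def authenticate_safe(con, username, pwd):
    # Placeholders keep the input as data; it is never parsed as SQL.
    query = "select * from users where username = ? and password = ?"
    return con.execute(query, (username, pwd)).fetchone() is not None
```

Passing `' OR '1'='1` as the password makes the concatenated query match every row, while the parameterised version treats the same text as an ordinary (wrong) password.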


Bug Taxonomy cont’d

• Arithmetic bugs
  – Divide by 0, floating point errors, range wraps, truncation errors

• Performance bugs
  – Copy instead of mmap

• Misc
  – Off by one, short-circuiting conditionals, unhandled exceptions
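
Range wraps and truncation errors are easy to see with fixed-width integers. Python integers are unbounded, so the sketch below models the fixed widths with bit masks. The helper names `wrap_int32` and `truncate_uint8` are illustrative, not from the slides. It mirrors the wraparound of a Java `int`; in C, signed overflow is undefined behaviour, so this is only what typical two's-complement hardware tends to produce.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(x):
    """Interpret x modulo 2**32 as a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def truncate_uint8(x):
    """Keep only the low 8 bits, as a narrowing cast to an unsigned byte would."""
    return x & 0xFF


# Range wrap: one past the largest 32-bit value becomes the smallest.
wrapped = wrap_int32(INT32_MAX + 1)

# Truncation: 300 does not fit in 8 bits, so the high bits are silently dropped.
truncated = truncate_uint8(300)
```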


Finding bugs

• Lexical Analysis (source code based)
  – Lint, splint, clint, jlint, etc

• Semantic Analysis (AST/IR based)
  – ASTLog, CodeQuest, JQuery, etc

• Dynamic Runtime Analysis
  – Purify, Valgrind, DynaMine

• Program Trace Analysis
  – PQL, PTQL


Bug patterns

• Bugs found by error patterns
• Eg. Null dereference pattern

  import java.io.*;

  public class nullDeref {
      public static void main(String argv[]) throws IOException {
          MyObject o = null;
          BufferedReader keyb = new BufferedReader(new InputStreamReader(System.in));
          String inp = keyb.readLine();
          if( !"quit".equals(inp) ){
              o.construct( inp );   // o is still null here
          }
          System.out.println(o.toString());   // o is null on every path
      }
  }


Bug patterns

• CVS Logs
  – Check revised code for patterns

• Bug reports/databases
  – Match bug report to source code

• Statistical analysis to extract patterns
  – Pattern A is correlated to bug X

• Inferred from domain/API


Bug indicators

• Associate with poorly written code
  – Sections with dead/redundant code may indicate errors

• Associate with style violations
  – Eg: failing to put the constant on the left side of an equality comparison may indicate (but is not necessarily) an error
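
The constant-on-the-left style rule exists because `if (x = 5)` compiles in C but assigns instead of comparing. A purely lexical check for that indicator might look like the sketch below. It is a line-based heuristic written for illustration (the regular expressions and the function name are assumptions), not the behaviour of lint or any other real tool, and it will miss conditions that span several lines.

```python
import re

# Text between "if (" and the last ")" on the line.
IF_COND = re.compile(r"\bif\s*\((.*)\)")
# A "=" that is not part of "==", "!=", "<=" or ">=".
LONE_EQ = re.compile(r"(?<![=!<>])=(?!=)")


def flag_assignments_in_conditions(source):
    """Return 1-based line numbers whose if-condition contains an assignment."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = IF_COND.search(line)
        if match and LONE_EQ.search(match.group(1)):
            findings.append(lineno)
    return findings


SAMPLE = "\n".join([
    "if (x == 5) { a(); }",   # fine: comparison
    "if (x = 5) { b(); }",    # suspicious: assignment inside the condition
    "if (5 == x) { c(); }",   # fine: constant on the left
    "if (y <= 3) { d(); }",   # fine: relational operator
])
```

A finding is only an indicator: an assignment inside a condition is sometimes intentional, which is why such reports are flagged for review rather than treated as certain bugs.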


~Bug Patterns

• Infer models of proper behaviour. Violations flagged as possible bugs
  – Ammons et al: Mining Specifications
  – Livshits and Zimmermann: DynaMine (revision history mining)
  – Engler et al: Bugs as deviant behaviour
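
The idea of inferring a model and flagging violations can be sketched with simple counting. The code below is an illustration in the spirit of these approaches, not an implementation of any of the cited systems: it assumes each trace is a list of hypothetical call names, infers rules of the form "a is later followed by b" that hold in most traces, and reports the traces that deviate. The thresholds are arbitrary.

```python
from collections import Counter


def infer_pairing_rules(traces, min_support=3, min_confidence=0.75):
    """Infer rules (a, b): 'a is later followed by b' in most traces containing a."""
    has_a = Counter()
    a_then_b = Counter()
    for trace in traces:
        pairs = set()
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                if a != b:
                    pairs.add((a, b))
        has_a.update(set(trace))    # count each call once per trace
        a_then_b.update(pairs)      # count each ordered pair once per trace
    return {
        (a, b) for (a, b), n in a_then_b.items()
        if n >= min_support and n / has_a[a] >= min_confidence
    }


def find_deviants(traces, rules):
    """Return (trace_index, rule) for traces that contain a but no later b."""
    deviants = []
    for idx, trace in enumerate(traces):
        for a, b in sorted(rules):
            if a in trace and b not in trace[trace.index(a) + 1:]:
                deviants.append((idx, (a, b)))
    return deviants


TRACES = [
    ["lock", "work", "unlock"],
    ["lock", "unlock"],
    ["lock", "work", "unlock"],
    ["lock", "read", "unlock"],
    ["lock", "work"],             # never releases the lock
]
```

On `TRACES`, `lock` is followed by `unlock` in four of five traces, so that rule is inferred and the fifth trace is reported as the deviant.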


Case: Linux Kernel


Linux kernel overview

Sources: www.coverity.com, www.kernel.org

Latest version: 2.6.15.4
$ uname -a
/usr/src/kernels


Bug distribution

Linux kernels: 0.17 bugs per 1000 lines of code
Commercial software: 20-30 bugs per 1000 lines of code
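
Defect density is simply defects divided by thousands of lines of code. The sketch below shows the arithmetic. The defect and line counts are illustrative inputs chosen to land near the 0.17 figure; they are assumptions, not numbers given on the slides.

```python
def defects_per_kloc(defects, lines_of_code):
    """Defect density: defects per 1000 lines of code."""
    return defects / (lines_of_code / 1000)


# Illustrative inputs (assumed): 985 defects in 5.7 million lines.
linux_density = defects_per_kloc(985, 5_700_000)

# Ratio between the low end of the commercial range (20) and that density.
ratio_to_commercial_low = 20 / linux_density
```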


Files and functions in linux kernel

Version 2.6.15.4 has 16,249 files


Bug types

• General bug types: (version 2.6.9)
  – Forward null - null pointer dereference
  – Reverse null - null pointer dereference or spurious null check
  – Dead code - unused code due to logic flaws
  – Overrun static - buffer overrun in the stack
  – Resource leak - leak of memory or other system resources
  – Free - use of resources that are no longer available
  – Null returns - null pointer dereference
  – Negative returns - buffer overrun using a negative offset in stack/heap
  – Reverse negative - negative returns or spurious check against a negative value
  – Overrun dynamic - buffer overrun in the heap


Security defects

– Tainted scalar – unsafe usage of tainted scalars
– Tainted string – unsafe usage of tainted strings
– User pointer – unsafe dereference of user-land pointers
– String null – unsafe usage of tainted, potentially non-terminated strings
– String size – unsafe usage of tainted strings with a potentially unbounded size
– Overrun static – buffer overrun on stack
– Overrun dynamic – buffer overrun on heap


Future bugs

– Most of the detected bugs have been fixed
– Linux is growing, there are more and more contributors
– New code, new bugs
– Distributed development could decrease the overall security


Two Case Studies of Open Source Software Development: Apache and Mozilla

Audris Mockus, Avaya Labs Research
Roy T. Fielding, Day Software
James D. Herbsleb, Carnegie Mellon University


• Apache httpd: open source HTTP server
• Mozilla: open source browser including supporting tools (such as Bugzilla)
• Compared to 5 commercial projects related to telecommunications
• Data extracted using scripts (with manual intervention) from CVS, bug tracking system and email lists


Apache Code Contribution


Mozilla Code Contribution


Mozilla Modules, Apache and Commercial Projects Comparison


Apache Fixes


Apache Defect Density


Mozilla, Apache and Commercial Defect Density


Summary

• Many developers (~400)

• A small core contributes most of the code development (Apache: 15 contribute over 80%)

• High participation of the non-core team in fixes (Apache: core responsible for only 66%)

• Relatively low defect density


ML related work and OS error characteristics


DynaMine

• Problem:
  – Want to find patterns whose violation causes errors
  – Want to find patterns for program understanding

• Technique:
  – Look at revision histories

• Crucial observation:
  – Things that are frequently checked in together often form a pattern

• Use data mining techniques to find methods that are often added at the same time

• Mining uses the Apriori algorithm to compute patterns.
• Patterns found: method pairs, state machines, more complex
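
The observation that methods added together tend to form a pattern can be sketched as frequent-pair mining. The code below covers only the pair-counting step of an Apriori-style search (frequent 2-itemsets with support and confidence), not the full algorithm or the tool's actual implementation. The check-in data and thresholds are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_method_pairs(checkins, min_support=2, min_confidence=0.6):
    """Return {(a, b): (support, confidence)} for methods often added together.

    support    = number of check-ins that add both a and b
    confidence = support / number of check-ins that add a
    """
    single = Counter()
    pair = Counter()
    for added in checkins:
        methods = sorted(set(added))
        single.update(methods)
        pair.update(combinations(methods, 2))

    patterns = {}
    for (a, b), support in pair.items():
        if support < min_support:
            continue    # prune infrequent pairs
        # Evaluate the rule in both directions: a => b and b => a.
        for x, y in ((a, b), (b, a)):
            confidence = support / single[x]
            if confidence >= min_confidence:
                patterns[(x, y)] = (support, confidence)
    return patterns


# Each inner list holds the method calls added in one (hypothetical) check-in.
CHECKINS = [
    ["lock", "unlock"],
    ["lock", "unlock", "log"],
    ["open", "close"],
    ["open", "close", "log"],
    ["log"],
]
```

On `CHECKINS`, only `lock`/`unlock` and `open`/`close` survive the thresholds. `log` appears often but never consistently alongside any one method, so it forms no pattern.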


Cluster Filtering

• Problem: when beta testing software, inputs that cause errors are sparse.

• Idea: cluster execution profiles and select test cases based on clusters.

• Non-faulty profiles will cluster together, so sampling across clusters is more likely to pick faulty profiles.

• Profile vector = function caller/callee counts (large vector).
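
The sampling argument can be made concrete with a toy example. The sketch below uses a simple leader-clustering pass as a stand-in for whatever clustering method the original work uses; that substitution is an assumption. It then picks one execution per cluster. The profile vectors and the radius are invented for illustration.

```python
import math


def distance(p, q):
    """Euclidean distance between two profile vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def leader_cluster(profiles, radius):
    """Assign each profile to the first cluster whose leader is within radius."""
    clusters = []   # lists of profile indices; the first index is the leader
    for i, profile in enumerate(profiles):
        for members in clusters:
            if distance(profile, profiles[members[0]]) <= radius:
                members.append(i)
                break
        else:
            clusters.append([i])    # no leader close enough: start a new cluster
    return clusters


def one_per_cluster(clusters):
    """Pick one representative execution from every cluster."""
    return [members[0] for members in clusters]


# Call-count profiles of five executions; the fourth behaves unusually.
PROFILES = [(10, 0, 5), (11, 0, 5), (10, 1, 4), (0, 9, 0), (10, 0, 6)]
```

Here the unusual execution is one of five in the population but one of two in the per-cluster sample, which is the effect the slide describes.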


Fault Invariant Classifier

• Finds latent code errors by learning over program executions

• Targeted programs:
  – Test inputs easy to generate, outputs hard to compute (e.g. complex computations, GUIs)

• 2 steps: create model and classification


OS error characteristics

• 2001 study showed:
  – Most OS errors are in drivers
  – Errors increase as functions become larger.
  – Distribution: few files have several errors, long tail has just one or two
  – Bugs live on average 1.8 years between versions.
  – Errors cluster where programmer ignorance of interface or system rules combines with copy-and-paste.


Discussion

• Extracting the right program properties is still an art
  – Predicates, invariants, function calls for dynamic analysis
  – CVS transactions for static analysis

• What do you need for training? Labels?
• What features are good for learning bugs?
• Do we need to run the code?
