Parsing and Type checking all 2^10000 configurations of the Linux kernel

Jul 09, 2015

In many projects, lexical preprocessors are used to manage different configurations of the project with conditional compilation. Although conditional compilation is a simple way to implement variability, it and lexical preprocessor macros hinder automatic analysis, even though such analysis is urgently needed to combat variability-induced complexity. To analyze code with its variability, we need to parse it without preprocessing it. However, current parsing solutions use unsound heuristics, support only a subset of the language, or suffer from exponential explosion. We introduce a novel variability-aware parser that can parse unpreprocessed code without heuristics. On top of parsing, which detects syntax errors in all configurations, we have constructed a variability-aware type system and module system that additionally detect other compile-time errors. With this infrastructure, we are in the process of checking the entire Linux kernel, with 10,000 compile-time configuration options and hence up to 2^10000 configurations, for syntax and type errors.
Transcript
Page 1: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Parsing and Type checking all 2^10000 configurations of the Linux kernel

10,000 features, 6 million lines of C code

Christian Kästner

Page 2: Parsing and Type checking all 2^10000 configurations of the Linux kernel
Page 3: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Feature-Oriented Product Lines

Page 4: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Database Engine

Page 5: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Printer Firmware

Page 6: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Linux Kernel

Page 7: Parsing and Type checking all 2^10000 configurations of the Linux kernel


Page 8: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Software Product Lines in Industry

Boeing; Bosch Group; Cummins, Inc.; Ericsson; General Dynamics; General Motors; Hewlett Packard; Lockheed Martin; Lucent; NASA; Nokia; Philips; Siemens; …

Page 9: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability ~ Complexity

Page 10: Parsing and Type checking all 2^10000 configurations of the Linux kernel

33 optional, independent features:

a unique configuration for every person on this planet

Page 11: Parsing and Type checking all 2^10000 configurations of the Linux kernel

320 optional, independent features:

more configurations than the estimated number of atoms in the universe
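The counting argument behind the 33-feature and 320-feature slides is plain arithmetic: n optional, independent features yield 2^n configurations. The following is a minimal C sketch of that arithmetic. The function names are hypothetical, and the comparison figures in the comments (roughly 7 to 8 billion people, on the order of 10^80 atoms) are commonly cited estimates rather than numbers taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* log10(2), used to get the decimal order of magnitude of 2^n. */
#define LOG10_OF_2 0.30102999566398120

/* Exact number of configurations for n optional, independent features.
   Only valid for n < 64, because the result must fit into 64 bits. */
uint64_t config_count(unsigned n_features) {
    assert(n_features < 64);
    return UINT64_C(1) << n_features;
}

/* Decimal exponent of the configuration count: log10(2^n) = n * log10(2).
   Works for any n, e.g. 320 or 10000, where the exact count overflows. */
double config_count_log10(unsigned n_features) {
    return n_features * LOG10_OF_2;
}
```

With these helpers, config_count(33) is 8,589,934,592, which exceeds a world population of 7 to 8 billion, and config_count_log10(320) is about 96.3, well above the exponent 80 usually quoted for atoms in the universe. For the 10,000 features of the Linux kernel the exponent is about 3010.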

Page 12: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Correctness?

Page 13: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Product-Line Implementation

Page 14: Parsing and Type checking all 2^10000 configurations of the Linux kernel

static int _rep_queue_filedone(
    DB_ENV *dbenv,
    REP *rep,
    __rep_fileinfo_args *rfp) {
#ifdef NO_QUEUE
    COMPQUIET(rep, NULL);
    COMPQUIET(rfp, NULL);
    return (__db_no_queue_am(dbenv));
#else
    db_pgno_t first, last;
    u_int32_t flags;
    int empty, ret, t_ret;
#ifdef DIAGNOSTIC
    DB_MSGBUF mb;
#endif
    // over 100 lines of additional code
#endif
}

Conditional Compilation

Excerpt from Oracle’s Berkeley DB

Page 15: Parsing and Type checking all 2^10000 configurations of the Linux kernel

#ifdef X
void foo();
#endif

void bar() {
    foo();
}

Page 16: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Objections / Criticism

“#ifdef considered harmful” “#ifdef hell”

Designed in the 1970s and hardly evolved since

“preprocessor diagnostics are poor”

“is difficult to determine if the code being viewed is actually compiled into the system”

“programming errors are easy to make and difficult to detect”

“incomprehensible source texts”

“maintenance becomes a ‘hit or miss’ process”

“CPP makes maintenance difficult”

“source code rapidly becomes a maze”

Page 17: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Virtual Separation of Concerns

Views Visual representation Disciplined mapping

Consistency checking Refactorings

open source, fosd.net/cide

[ICSE’08, ASE’08, Tools’09, GPCE’09, EASE’11, ...]

Page 18: Parsing and Type checking all 2^10000 configurations of the Linux kernel

10,000 features, 6 million lines of C code

Page 19: Parsing and Type checking all 2^10000 configurations of the Linux kernel

apache, berkeley db, cherokee, clamav, dia,

emacs, freebsd, gcc, ghostscript, gimp, glibc, gnumeric, gnuplot, irssi,

libxml, lighttpd, linux, lynx, minix, mplayer, mpsolve,

openldap, opensolaris, openvpn, parrot, php, pidgin,

postgresql, privoxy, python, sendmail, sqlite, subversion,

sylpheed, tcl, vim, xfig, xine-lib, xorg-server, xterm

[ICSE’10, AOSD’11]

40 Open-Source C Projects

[Plot: number of features vs. lines of code]

Page 20: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Correctness?

Page 21: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Printer Firmware

Page 22: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Checking Products

2000 Features 100 Printers 30 New Printers per Year

Printer Firmware

Page 23: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Checking Products

10,000 Features, 2^10000 Configurations

Linux Kernel

Page 24: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Linux Kernel

Checking Product Line

Implementation with 10000 Features

+ Generator

Page 25: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis: Parser, Type System, Static Analysis, Bug Finding, Testing, Model Checking, Theorem Proving, …

Page 26: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis

Product Generation

Conventional Analysis

Page 27: Parsing and Type checking all 2^10000 configurations of the Linux kernel

We aim for a sound and complete approach

Page 28: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis: Parser, Type System, Static Analysis, Bug Finding, Testing, Model Checking, Theorem Proving, …

Page 29: Parsing and Type checking all 2^10000 configurations of the Linux kernel
Page 30: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Conflicts

References

Page 31: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Presence Conditions: true, true, WORLD, BYE

Page 32: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Reachability: pc(caller) -> pc(target)

Conflicts: ¬(pc(def1) ∧ pc(def2))

true -> true

true -> (WORLD ∨ BYE)

¬(WORLD ∧ BYE)

Page 33: Parsing and Type checking all 2^10000 configurations of the Linux kernel


Found 2 type errors:
- [WORLD & BYE] file hello.c:8:8 redefinition of msg
- [!WORLD & !BYE] file hello.c:11:8 msg undeclared
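The two reported errors follow directly from the reachability and conflict constraints over the presence conditions. The sketch below checks those constraints by brute force over the four configurations of WORLD and BYE. It is only an illustration: the function names are hypothetical, the shape of hello.c in the comment is an assumption (the slide image is not part of this transcript), and enumeration is used here only because the example has two features, whereas the talk frames the general check as a SAT problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed shape of the example program (not recoverable from the transcript):
 *   #ifdef WORLD   char *msg = "Hello World";   #endif
 *   #ifdef BYE     char *msg = "Bye bye!";      #endif
 *   main() { printf(msg); }
 */

/* Logical implication a -> b. */
static bool implies(bool a, bool b) { return !a || b; }

/* Reachability: pc(caller) -> pc(target).
   main has presence condition true; msg is declared under WORLD or BYE. */
bool msg_reachable(bool world, bool bye) {
    return implies(true, world || bye);
}

/* Conflicts: not (pc(def1) and pc(def2)) for the two definitions of msg. */
bool msg_conflict_free(bool world, bool bye) {
    return !(world && bye);
}

/* Enumerate all 2^2 configurations and count those violating a constraint. */
int count_ill_typed(void) {
    int ill_typed = 0;
    for (int cfg = 0; cfg < 4; cfg++) {
        bool world = (cfg & 1) != 0;
        bool bye = (cfg & 2) != 0;
        if (!msg_reachable(world, bye) || !msg_conflict_free(world, bye))
            ill_typed++;
    }
    assert(ill_typed <= 4); /* sanity check: at most one error class per configuration counted */
    return ill_typed;
}
```

count_ill_typed() yields 2: the configuration with both WORLD and BYE violates the conflict constraint (redefinition of msg), and the configuration with neither violates reachability (msg undeclared).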

Page 34: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability Model:

VM -> (true -> true)

VM -> (true -> (WORLD v BYE))

P

WORLD BYE

VM ->¬ (WORLD ˄ BYE)
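Prefixing each constraint with "VM ->" means that a violation only counts if the variability model allows the offending configuration. The sketch below illustrates this under a loudly stated assumption: the feature diagram is taken to mean "exactly one of WORLD and BYE", which the transcript does not confirm. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A variability model maps a configuration to "valid" or "invalid". */
typedef bool (*vm_fn)(bool world, bool bye);

/* ASSUMPTION: the feature diagram denotes an alternative group,
   i.e. exactly one of WORLD and BYE is selected. */
bool vm_exactly_one(bool world, bool bye) { return world != bye; }

/* No variability model: every configuration is considered valid. */
bool vm_unconstrained(bool world, bool bye) {
    (void)world;
    (void)bye;
    return true;
}

/* The product line is well typed if, for every configuration,
   VM -> (WORLD or BYE) and VM -> not (WORLD and BYE) both hold. */
bool product_line_well_typed(vm_fn vm) {
    assert(vm != NULL);
    for (int cfg = 0; cfg < 4; cfg++) {
        bool world = (cfg & 1) != 0;
        bool bye = (cfg & 2) != 0;
        bool reachable = world || bye;
        bool conflict_free = !(world && bye);
        if (vm(world, bye) && !(reachable && conflict_free))
            return false;
    }
    return true;
}
```

Under the assumed model, product_line_well_typed(vm_exactly_one) holds because both ill-typed configurations are excluded, while product_line_well_typed(vm_unconstrained) fails.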

Page 35: Parsing and Type checking all 2^10000 configurations of the Linux kernel

AST with Variability Information

Extended Lookup Mechanism

[Figure: AST of greet.c with nodes main, printf, and msg; the two msg declarations sit under choice nodes on WORLD and BYE, each with an empty (ε) alternative]

true -> true

true -> (WORLD ∨ BYE)

¬(WORLD ∧ BYE)

Page 36: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Formalization: CFJ [ASE’08, TOSEM’11]

Theorem (Product Generation Preserves Typing): All products that are generated for valid feature selections from a well-typed product line are well typed.

Page 37: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis

Product Generation

Conventional Analysis

Page 38: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Surface Complexity

SAT Problem

Inherent Complexity

Page 39: Parsing and Type checking all 2^10000 configurations of the Linux kernel

[TOSEM’11]

Product Line        LOC      Features  Products     Time per Product (sec)  Time for entire Product Line (sec)
MobileMedia         5700     14        2784         0.3                     2
Mobile RSS Reader   20 000   14        2048         1                       8
Lampiro             45 000   11        2048         2                       19
Berkeley DB         70 000   42        3.6 billion  3                       21

Page 40: Parsing and Type checking all 2^10000 configurations of the Linux kernel

apache, berkeley db, cherokee, clamav, dia,

emacs, freebsd, gcc, ghostscript, gimp, glibc, gnumeric, gnuplot, irssi,

libxml, lighttpd, linux, lynx, minix, mplayer, mpsolve,

openldap, opensolaris, openvpn, parrot, php, pidgin,

postgresql, privoxy, python, sendmail, sqlite, subversion,

sylpheed, tcl, vim, xfig, xine-lib, xorg-server, xterm

[ICSE’10, AOSD’11]

40 Open-Source C Projects

[Plot: variable code in C files (in %) vs. lines of code]

Page 41: Parsing and Type checking all 2^10000 configurations of the Linux kernel
Page 42: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Parse, Type Check, Linker checks

[Figure: three .c files, each shown with its AST with variability information (the greet.c example), passing through the steps Parse, Type Check, and Linker checks]

Page 43: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Challenges: Real-world C code, C preprocessor, Huge size, Module system / linker checks

Page 44: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis: Parser, Type System, Static Analysis, Bug Finding, Testing, Model Checking, Theorem Proving, …

Page 45: Parsing and Type checking all 2^10000 configurations of the Linux kernel

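AST with Variability Information

[Figure: AST of greet.c with nodes main, printf, and msg; the two msg declarations sit under choice nodes on WORLD and BYE, each with an empty (ε) alternative]

[OOPSLA’11]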

Page 46: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Parsing C without Preprocessing

Page 47: Parsing and Type checking all 2^10000 configurations of the Linux kernel

[Figure: AST of greet.c with choice nodes on WORLD and BYE, with question marks over how to obtain it from unpreprocessed code]

Macro expansion needed for parsing

Alternative macros

Undisciplined annotations

Page 48: Parsing and Type checking all 2^10000 configurations of the Linux kernel
Page 49: Parsing and Type checking all 2^10000 configurations of the Linux kernel
Page 50: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Previous Solutions

Disciplined Subset: Requires Code Preparation

Heuristics and Partial Analysis: Inaccurate, False Positives

Brute Force: Infeasible Effort

Page 51: Parsing and Type checking all 2^10000 configurations of the Linux kernel

TypeChef

https://github.com/ckaestne/TypeChef

Variability-Aware Lexer: ( 2 * 3 ) + 4_A 5_¬A   (token 4 is present under A, token 5 under ¬A)

Variability-Aware Parser: + ( * (2, 3), V_A (4, 5) )   (V_A is a choice node on A)

Variability-Aware Analysis
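The lexer and parser example on this slide can be read as follows: the token stream ( 2 * 3 ) + 4 5 carries presence conditions on the last two tokens (4 under A, 5 under ¬A), and the parser yields a single AST in which a choice node on A holds both alternatives. The C sketch below models only that resulting data structure and evaluates it for a given selection of A. It is not TypeChef's implementation (which the slides describe as a parser-combinator library in Scala); all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LIT, ADD, MUL, CHOICE } Kind;

/* AST node. For CHOICE, `left` is the subtree used when feature A is
   selected and `right` the subtree used when it is not. */
typedef struct Node {
    Kind kind;
    int value;                /* only meaningful for LIT */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Evaluate the expression for one concrete configuration of feature A. */
int eval(const Node *n, bool feature_a) {
    assert(n != NULL);
    switch (n->kind) {
    case LIT:    return n->value;
    case ADD:    return eval(n->left, feature_a) + eval(n->right, feature_a);
    case MUL:    return eval(n->left, feature_a) * eval(n->right, feature_a);
    case CHOICE: return eval(feature_a ? n->left : n->right, feature_a);
    }
    return 0; /* unreachable for well-formed trees */
}

/* AST for the slide's example: + ( * (2, 3), choice_A (4, 5) ). */
static const Node two     = { LIT, 2, NULL, NULL };
static const Node three   = { LIT, 3, NULL, NULL };
static const Node four    = { LIT, 4, NULL, NULL };
static const Node five    = { LIT, 5, NULL, NULL };
static const Node product = { MUL, 0, &two, &three };
static const Node choice  = { CHOICE, 0, &four, &five };
static const Node sum     = { ADD, 0, &product, &choice };

const Node *example_ast(void) { return &sum; }
```

Evaluating example_ast() with A selected gives (2 * 3) + 4, and with A deselected gives (2 * 3) + 5. The shared subtree 2 * 3 is represented once, which is the point of keeping variability local in choice nodes instead of duplicating the whole tree per configuration.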

Page 52: Parsing and Type checking all 2^10000 configurations of the Linux kernel

[Figure: variability-aware parsing of a conditional token stream whose tokens carry presence conditions (A, ¬A, ¬A∧B), producing an AST with choice nodes on A and B]

Library of Variability-Aware Parser Combinators in Scala

[OOPSLA’11]

Page 53: Parsing and Type checking all 2^10000 configurations of the Linux kernel

X86 2.6.33.3

7665 C files (x86)

353 included header files per C file

8590 distinct macros per C file

72 % conditional

30 seconds per file (median)

0 syntax errors

Page 54: Parsing and Type checking all 2^10000 configurations of the Linux kernel

X86 2.6.33.3

Type Checking

20 seconds per file

Page 55: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Type Checking BusyBox

511 files

260,000 lines of C code

811 features

51 minutes parsing

6 minutes type checking

4 seconds linker checking

Page 56: Parsing and Type checking all 2^10000 configurations of the Linux kernel

//… skipped 260 lines
struct globals {
    double cur_time;
    //… skipped 11 lines
#if ENABLE_FEATURE_NTPD_SERVER
    int listen_fd;
#endif
    unsigned verbose;
    //… skipped 73 lines
};
//… skipped 1761 lines
int ntpd_main(int argc UNUSED_PARAM, char **argv)
{
#undef G
    struct globals G;
    //… skipped 81 lines
    if (i > (ENABLE_FEATURE_NTPD_SERVER && G.listen_fd != -1)) {
    }
}

ntpd.c: 2128 [CONFIG_NTPD && !CONFIG_FEATURE_NTPD_SERVER] field listen_fd unknown in struct globals
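The reported error arises because the condition ENABLE_FEATURE_NTPD_SERVER && G.listen_fd != -1 is ordinary C: even when the macro is 0 and the right-hand side is never evaluated, the compiler still has to type check G.listen_fd, and that field does not exist when the feature is disabled. One possible way to avoid this class of error is to guard the field access with the same preprocessor condition as the field declaration. The sketch below shows that idea on a reduced struct; it is an illustrative assumption, not the fix applied in BusyBox, and the helper name server_fd_count is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION for this sketch: the feature macro is always defined as 0 or 1,
   and defaults to disabled if the build does not set it. */
#ifndef ENABLE_FEATURE_NTPD_SERVER
#define ENABLE_FEATURE_NTPD_SERVER 0
#endif

/* Reduced version of the struct from the slide. */
struct globals {
    double cur_time;
#if ENABLE_FEATURE_NTPD_SERVER
    int listen_fd;   /* exists only when the server feature is enabled */
#endif
    unsigned verbose;
};

/* Same value as (ENABLE_FEATURE_NTPD_SERVER && g->listen_fd != -1), but the
   field access is removed by the preprocessor in the disabled configuration,
   so both configurations type check. */
int server_fd_count(const struct globals *g) {
    assert(g != NULL);
#if ENABLE_FEATURE_NTPD_SERVER
    return g->listen_fd != -1;
#else
    return 0;
#endif
}
```

Compiling this once as is and once with -DENABLE_FEATURE_NTPD_SERVER=1 exercises both configurations; a variability-aware type checker reaches the same conclusion without compiling each configuration separately.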

Page 57: Parsing and Type checking all 2^10000 configurations of the Linux kernel

FUTURE DIRECTIONS

Page 58: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Correctness?

Page 59: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis

Product Generation

Conventional Analysis

Page 60: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Variability-Aware Analysis: Parser, Type System, Static Analysis, Bug Finding, Testing, Model Checking, Theorem Proving, …

Page 61: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Product-Line Evolution

615 trillion config. 553 quintillion config.

Page 62: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Reengineering Variability

Legacy System: #ifdef, parameters, branches in VCS, clones, domain knowledge

refactoring

Disciplined Variability Implementation: plug-ins, feature modules, aspects, disciplined annotations, runtime variability

[GPCE’09; Grant Prop.]

Page 63: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Compositional Approaches

class Stack {
    void push(Object o) {
        elementData[size++] = o;
    }
    ...
}

refines class Stack {
    void push(Object o) {
        Lock l = lock(o);
        Super.push(o);
        l.unlock();
    }
    ...
}

Base / Platform

Feature: Queue

Feature: Diagnostic

aspect Diagnostics { ... }

class Stack {
    void push(Object o) {
        Lock l = lock(o);
        elementData[size++] = o;
        l.unlock();
    }
    ...
}

Composition

Module

Components

Frameworks, Plug-ins

Feature-Modules / Mixin Layers / …

Aspects / Subjects, Hyper/J, Deltas

[ICSE’09, J.ASE’10, SCP’10, TSE’12]

Page 64: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Predicting Nonfunctional Properties [SPLC’11, SQJ’11, ICSE’12]

Page 65: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Empirical methods & human factors

[AOSD’11, ESEM’11, EASE’11]

Page 66: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Domain-Specific Languages: SugarJ

[GPCE’11, OOPSLA’11]

Runtime Updates for Java

[APSEC’08, J. SP&E’11]

Page 67: Parsing and Type checking all 2^10000 configurations of the Linux kernel

Parsing and Type Checking all 2^10000 Configurations of the Linux Kernel

https://github.com/ckaestne/TypeChef

[Figure: AST of greet.c with choice nodes on WORLD and BYE, next to the variability-aware parsing of a conditional token stream with presence conditions (¬A, ¬A∧B)]

Page 68: Parsing and Type checking all 2^10000 configurations of the Linux kernel

AOP Compiler Extensions

Aspect-Oriented Decomposition of Berkeley DB

2006 2007 2008 2009 2010 2011 2012 1982…

M.Sc. Thesis

Ph.D. Thesis Post Doc

Austin, Texas Magdeburg, Germany Marburg, Germany

Virtual Separation of Concerns (Tool Support for Annotation-Based Variability Implementation)

TypeChef (Analyzing Real-World C Code)