Parsing and Type Checking all 2^10000 Configurations of the Linux Kernel: 10,000 features, 6 million lines of C code. Christian Kästner
Jul 09, 2015
Parsing and Type Checking all 2^10000 Configurations
of the Linux kernel
10,000 features, 6 million lines of C code
Christian Kästner
Feature-Oriented
Product Lines
Database Engine
Printer Firmware
Linux Kernel
Software Product Lines in Industry
Boeing, Bosch Group, Cummins, Inc., Ericsson, General Dynamics, General Motors, Hewlett Packard, Lockheed Martin, Lucent, NASA, Nokia, Philips, Siemens, …
Variability ~ Complexity
33 optional, independent features: a unique configuration for every person on this planet
320 optional, independent features: more configurations than the estimated number of atoms in the universe
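The two claims can be checked with a few lines of arithmetic. The sketch below assumes a world population of roughly 7.3 billion (a 2015 figure) and the common order-of-magnitude estimate of 10^80 atoms in the observable universe; neither number appears on the slide.

```python
# n optional, independent features yield 2**n configurations.
WORLD_POPULATION = 7_300_000_000   # assumption: rough 2015 figure
ATOMS_IN_UNIVERSE = 10 ** 80       # assumption: common order-of-magnitude estimate


def configurations(optional_features: int) -> int:
    """Number of configurations for independent, optional features."""
    return 2 ** optional_features


# 33 features already exceed the assumed world population.
print(configurations(33) > WORLD_POPULATION)
# 320 features exceed the assumed number of atoms.
print(configurations(320) > ATOMS_IN_UNIVERSE)
# Number of decimal digits of 2**10000, the Linux kernel figure from the title.
print(len(str(configurations(10_000))))
```

Python integers have arbitrary precision, so the 2^10000 case is computed exactly rather than approximated.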
Correctness?
Product-Line Implementation
static int _rep_queue_filedone(
    DB_ENV *dbenv,
    REP *rep,
    __rep_fileinfo_args *rfp) {
#ifdef NO_QUEUE
    COMPQUIET(rep, NULL);
    COMPQUIET(rfp, NULL);
    return (__db_no_queue_am(dbenv));
#else
    db_pgno_t first, last;
    u_int32_t flags;
    int empty, ret, t_ret;
#ifdef DIAGNOSTIC
    DB_MSGBUF mb;
#endif
    // over 100 lines of additional code
#endif
}
Conditional Compilation
Excerpt from Oracle’s Berkeley DB
#ifdef X
void foo();
#endif
void bar() {
foo();
}
Objections / Criticism
“#ifdef considered harmful” “#ifdef hell”
Designed in the 1970s and hardly evolved since
“preprocessor diagnostics are poor”
“is difficult to determine if the code being viewed is actually compiled into the system”
“programming errors are easy to make and difficult to detect”
“incomprehensible source texts”
“maintenance becomes a ‘hit or miss’ process”
“CPP makes maintenance difficult”
“source code rapidly becomes a maze”
Virtual Separation of Concerns
Views, visual representation, disciplined mapping, consistency checking, refactorings, …
open source, fosd.net/cide
[ICSE’08, ASE’08, TOOLS’09, GPCE’09, EASE’11, …]
10,000 features, 6 million lines of C code
apache, berkeley db, cherokee, clamav, dia,
emacs, freebsd, gcc, ghostscript, gimp, glibc, gnumeric, gnuplot, irssi,
libxml, lighttpd, linux, lynx, minix, mplayer, mpsolve,
openldap, opensolaris, openvpn, parrot, php, pidgin,
postgresql, privoxy, python, sendmail, sqlite, subversion,
sylpheed, tcl, vim, xfig, xine-lib, xorg-server, xterm
[ICSE’10, AOSD’11]
40 Open-Source C Projects
[Figure: number of features plotted against lines of code]
Correctness?
Checking Products
Printer Firmware: 2000 features, 100 printers, 30 new printers per year
Linux Kernel: 10000 features, 2^10000 configurations
Checking Product Line
Implementation with 10000 Features
+ Generator
Variability-Aware Analysis: Parser, Type System, Static Analysis, Bug Finding, Testing, Model Checking, Theorem Proving, …
Variability-Aware Analysis
Product Generation
Conventional Analysis
We aim for a sound and complete approach
[Figure: code annotated with presence conditions (true, true, WORLD, BYE); labels: References, Conflicts, Presence Conditions]
Reachability: pc(caller) -> pc(target)
Conflicts: ¬(pc(def1) ∧ pc(def2))
true -> true
true -> (WORLD ∨ BYE)
¬(WORLD ∧ BYE)
Found 2 type errors:
- [WORLD & BYE] file hello.c:8:8 redefinition of msg
- [!WORLD & !BYE] file hello.c:11:8 msg undeclared
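The reachability and conflict constraints can be evaluated mechanically. The following sketch enumerates all configurations of WORLD and BYE by brute force and reports those that violate a constraint. The function names are hypothetical, and brute force is an assumption made for brevity: it only works for a handful of features, whereas the deck frames the real check as a SAT problem.

```python
from itertools import product

FEATURES = ("WORLD", "BYE")


def all_configs():
    """Every assignment of the features to True/False."""
    for bits in product([False, True], repeat=len(FEATURES)):
        yield dict(zip(FEATURES, bits))


def violations(constraint):
    """Configurations in which the given constraint does not hold."""
    return [c for c in all_configs() if not constraint(c)]


def implies(a, b):
    return (not a) or b


# Reachability: pc(caller) -> pc(target)
def printf_reachable(c):
    return implies(True, True)                  # main and printf are unconditional


def msg_reachable(c):
    return implies(True, c["WORLD"] or c["BYE"])  # msg is defined under WORLD or BYE


# Conflicts: not (pc(def1) and pc(def2))
def msg_unique(c):
    return not (c["WORLD"] and c["BYE"])


print(violations(printf_reachable))  # no violating configuration
print(violations(msg_reachable))     # msg undeclared when both features are off
print(violations(msg_unique))        # msg redefined when both features are on
```

The two non-empty results correspond to the two reported type errors: `[!WORLD & !BYE]` for the undeclared `msg` and `[WORLD & BYE]` for the redefinition.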
Variability Model:
VM -> (true -> true)
VM -> (true -> (WORLD ∨ BYE))
VM -> ¬(WORLD ∧ BYE)
[Figure: feature diagram with root P and features WORLD and BYE]
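With a variability model, a constraint only has to hold in configurations the model allows. The sketch below assumes a hypothetical model in which exactly one of WORLD and BYE is selected (the slide does not spell out the model) and checks validity of `VM -> constraint` by enumeration.

```python
from itertools import product

FEATURES = ("WORLD", "BYE")


def is_valid(formula):
    """True if the formula holds under every assignment of the features."""
    return all(
        formula(dict(zip(FEATURES, bits)))
        for bits in product([False, True], repeat=len(FEATURES))
    )


def implies(a, b):
    return (not a) or b


def vm(c):
    # Hypothetical variability model: exactly one of WORLD and BYE.
    return c["WORLD"] != c["BYE"]


def msg_reachable(c):
    return implies(True, c["WORLD"] or c["BYE"])


def msg_unique(c):
    return not (c["WORLD"] and c["BYE"])


# Without the model the constraints are not valid in general ...
print(is_valid(msg_reachable), is_valid(msg_unique))
# ... but they are valid in every configuration the assumed model permits.
print(is_valid(lambda c: implies(vm(c), msg_reachable(c))))
print(is_valid(lambda c: implies(vm(c), msg_unique(c))))
```

Under this assumed model both errors disappear, because neither "both features" nor "no feature" is a valid selection.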
AST with Variability Information
Extended Lookup Mechanism
[Figure: AST of greet.c with variability nodes for WORLD and BYE; nodes: printf, msg, main; ε marks an empty alternative]
true -> true
true -> (WORLD ∨ BYE)
¬(WORLD ∧ BYE)
Formalization: CFJ [ASE’08, TOSEM’11]
Theorem (Product Generation Preserves Typing): All products that are generated for valid feature selections from a well-typed product line are well typed.
Variability-Aware Analysis
Product Generation
Conventional Analysis
Surface Complexity
SAT Problem
Inherent Complexity
[TOSEM’11]
Product Line         LOC      Features   Products      Time per Product (sec)   Time for Entire Product Line (sec)
MobileMedia          5700     14         2784          0.3                      2
Mobile RSS Reader    20 000   14         2048          1                        8
Lampiro              45 000   11         2048          2                        19
Berkeley DB          70 000   42         3.6 billion   3                        21
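To put the last two columns in perspective, the sketch below estimates the cost of checking every product separately. It assumes that brute-force effort is simply the number of products times the time per product, run sequentially; the table itself does not report such a figure.

```python
# (name, products, seconds per product, seconds for the entire product line),
# copied from the table above.
ROWS = [
    ("MobileMedia", 2784, 0.3, 2),
    ("Mobile RSS Reader", 2048, 1, 8),
    ("Lampiro", 2048, 2, 19),
    ("Berkeley DB", 3.6e9, 3, 21),
]

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_seconds(products, sec_per_product):
    """Assumed cost model: check each product one after another."""
    return products * sec_per_product


for name, products, per_product, whole_line in ROWS:
    brute = brute_force_seconds(products, per_product)
    print(f"{name}: brute force ~{brute:,.0f} s, "
          f"product-line check {whole_line} s, "
          f"ratio ~{brute / whole_line:,.0f}x")

berkeley_years = brute_force_seconds(3.6e9, 3) / SECONDS_PER_YEAR
print(f"Berkeley DB brute force: ~{berkeley_years:.0f} years")
```

Under this cost model, checking all Berkeley DB products individually would take centuries, against 21 seconds for the product-line check.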
40 Open-Source C Projects [ICSE’10, AOSD’11]
[Figure: variable code in C files (in %) plotted against lines of code]
[Figure: each .c file is parsed and type checked, followed by linker checks; every file yields an AST with variability information, as shown for greet.c]
Challenges: real-world C code, C preprocessor, huge size, module system / linker checks
AST with Variability Information
[Figure: AST of greet.c with variability nodes for WORLD and BYE]
[OOPSLA’11]
Parsing C without Preprocessing
[Figure: AST of greet.c with variability nodes for WORLD and BYE, annotated with open questions (?)]
Macro expansion needed for parsing
Alternative macros
Undisciplined annotations
Previous Solutions
Disciplined subset: requires code preparation
Heuristics and partial analysis: inaccurate, false positives
Brute force: infeasible effort
TypeChef
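Variability-Aware Lexer, Variability-Aware Parser, Variability-Aware Analysis
https://github.com/ckaestne/TypeChef
Lexer output: a conditional token stream in which each token carries a presence condition, e.g. ( 2 * 3 ) + 4[A] 5[¬A]
Parser output: an AST with choice nodes, here + over (2 * 3) and a choice on A between 4 and 5
Larger example: 3 + 4[A] ([¬A] 4[¬A∧B] +[¬A∧B] 6[¬A] )[¬A]
[Figure: step-by-step parse of the conditional token stream; the resulting AST has choice nodes on A and B]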
Library of Variability-Aware Parser Combinators in Scala [OOPSLA’11]
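A conditional token stream can be pictured as a list of tokens, each guarded by a presence condition. The sketch below is a reading of the token stream on the slide (the guards on A and B are taken from it; the function names are hypothetical). It projects the stream onto single configurations, which is the brute-force view that a variability-aware parser avoids by handling all guards in one pass. It does not implement the parser itself.

```python
from itertools import product


# Presence conditions as predicates over a configuration (dict: feature -> bool).
def always(c):
    return True


def a(c):
    return c["A"]


def not_a(c):
    return not c["A"]


def not_a_and_b(c):
    return (not c["A"]) and c["B"]


# Token stream: 3 + 4[A] ([¬A] 4[¬A∧B] +[¬A∧B] 6[¬A] )[¬A]
TOKENS = [
    ("3", always), ("+", always),
    ("4", a),
    ("(", not_a),
    ("4", not_a_and_b), ("+", not_a_and_b),
    ("6", not_a),
    (")", not_a),
]


def project(tokens, config):
    """Token sequence a conventional lexer would produce for one configuration."""
    return " ".join(text for text, condition in tokens if condition(config))


for a_on, b_on in product([False, True], repeat=2):
    config = {"A": a_on, "B": b_on}
    print(config, "->", project(TOKENS, config))
```

Four configurations yield only three distinct token sequences, since B is irrelevant once A is selected. Sharing of this kind is what makes a single variability-aware pass cheaper than parsing every configuration.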
Linux x86 2.6.33.3
7665 C files (x86)
353 included header files per C file
8590 distinct macros per C file
72 % conditional
30 seconds per file (median)
0 syntax errors
Type Checking
20 seconds per file
Type Checking BusyBox
511 files, 260,000 lines of C code, 811 features
51 minutes parsing, 6 minutes type checking, 4 seconds linker checking
//… skipped 260 lines
struct globals {
double cur_time;
//… skipped 11 lines
#if ENABLE_FEATURE_NTPD_SERVER
int listen_fd;
#endif
unsigned verbose;
//… skipped 73 lines
};
//… skipped 1761 lines
int ntpd_main(int argc UNUSED_PARAM, char **argv)
{
#undef G
struct globals G;
//… skipped 81 lines
if (i > (ENABLE_FEATURE_NTPD_SERVER && G.listen_fd != -1)) {
…
}
…
}
ntpd.c: 2128 [CONFIG_NTPD && !CONFIG_FEATURE_NTPD_SERVER] field listen_fd unknown in struct globals
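The reported error follows from comparing the presence condition of the field access with that of the field declaration. The sketch below models `struct globals` as a conditional field table and looks for configurations where `G.listen_fd` is compiled but `listen_fd` is not declared. It rests on two assumptions: that ntpd.c is compiled only when `CONFIG_NTPD` is selected (as the reported condition suggests), and that `ENABLE_FEATURE_NTPD_SERVER` always expands to 0 or 1, so the access is type checked even when the feature is off. The function names are hypothetical.

```python
from itertools import product

FEATURES = ("CONFIG_NTPD", "CONFIG_FEATURE_NTPD_SERVER")

# Conditional field table for struct globals: field name -> presence condition.
STRUCT_GLOBALS = {
    "cur_time": lambda c: True,
    "listen_fd": lambda c: c["CONFIG_FEATURE_NTPD_SERVER"],  # inside #if ... #endif
    "verbose": lambda c: True,
}


def file_compiled(c):
    # Assumption: ntpd.c is part of the build only when CONFIG_NTPD is selected.
    return c["CONFIG_NTPD"]


def unknown_field_configs(field, access_condition):
    """Configurations in which the access is compiled but the field is missing."""
    declared = STRUCT_GLOBALS.get(field, lambda c: False)
    errors = []
    for bits in product([False, True], repeat=len(FEATURES)):
        config = dict(zip(FEATURES, bits))
        if access_condition(config) and not declared(config):
            errors.append(config)
    return errors


# G.listen_fd sits behind a C-level `&&`, not an #if, so it is compiled
# whenever the file is compiled.
print(unknown_field_configs("listen_fd", file_compiled))
print(unknown_field_configs("verbose", file_compiled))
```

The single failing configuration matches the condition in the error message, `CONFIG_NTPD && !CONFIG_FEATURE_NTPD_SERVER`.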
FUTURE DIRECTIONS
Correctness?
Variability-Aware Analysis
Product Generation
Conventional Analysis
Product-Line
Evolution
615 trillion config. 553 quintillion config.
Reengineering Variability
Legacy System: #ifdef, parameters, branches in VCS, clones, domain knowledge
-> refactoring ->
Disciplined Variability Implementation: plug-ins, feature modules, aspects, disciplined annotations, runtime variability
[GPCE’09; Grant Prop.]
Compositional Approaches
class Stack {
    void push(Object o) { elementData[size++] = o; }
    ...
}
refines class Stack {
    void push(Object o) {
        Lock l = lock(o);
        Super.push(o);
        l.unlock();
    }
    ...
}
Base / Platform
Feature: Queue
Feature: Diagnostic
aspect Diagnostics { ... }
class Stack {
    void push(Object o) {
        Lock l = lock(o);
        elementData[size++] = o;
        l.unlock();
    }
    ...
}
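The effect of composing a base class with a refinement can be imitated with mixin-style layering. The sketch below is only an analogy in Python, not the composition tooling behind the slide's `refines` syntax. The `trace` list and the `compose` helper are hypothetical additions that make the effect of the refinement observable.

```python
class Stack:
    """Base / platform: a plain stack."""

    def __init__(self):
        self.element_data = []
        self.trace = []  # hypothetical: records lock/unlock events

    def push(self, o):
        self.element_data.append(o)


def locking(base):
    """Refinement: wraps push with lock/unlock around the refined method."""

    class Locking(base):
        def push(self, o):
            self.trace.append(("lock", o))
            super().push(o)  # plays the role of Super.push(o)
            self.trace.append(("unlock", o))

    return Locking


def compose(base, *refinements):
    """Apply the selected refinements to the base class, in order."""
    for refine in refinements:
        base = refine(base)
    return base


# Product with the locking refinement selected.
LockedStack = compose(Stack, locking)
stack = LockedStack()
stack.push(1)
print(stack.element_data, stack.trace)

# Product without it: the base class alone.
plain = compose(Stack)()
plain.push(1)
print(plain.element_data, plain.trace)
```

Each feature selection maps to a different list of refinements passed to `compose`, so the products differ in behavior while sharing the base code.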
Composition
Module
Components
Frameworks, Plug-ins
Feature-Modules / Mixin Layers / …
Aspects / Subjects, Hyper/J, Deltas
[ICSE’09, J.ASE’10, SCP’10, TSE’12]
Predicting
Nonfunctional Properties [SPLC’11, SQJ’11, ICSE’12]
Empirical methods & human factors
[AOSD’11, ESEM’11, EASE’11]
Domain-Specific Languages: SugarJ
[GPCE’11, OOPSLA’11]
Runtime Updates for Java
[APSEC’08, J. SP&E’11]
Parsing and Type Checking all 2^10000 Configurations
of the Linux Kernel
https://github.com/ckaestne/TypeChef
AOP Compiler Extensions
Aspect-Oriented Decomposition of Berkeley DB
2006 2007 2008 2009 2010 2011 2012 1982…
M.Sc. Thesis
Ph.D. Thesis Post Doc
Austin, Texas Magdeburg, Germany Marburg, Germany
Virtual Separation of Concerns (Tool Support for Annotation-Based Variability Implementation)
TypeChef (Analyzing Real-World C Code)