
CIL: Infrastructure for C Program Analysis and Transformation

George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer

http://www.cs.berkeley.edu/~necula/cil

ETAPS – CC ’02 Friday, April 12


What is CIL?

Distills C language into a few key forms with precise semantics

- Parser + IR + Program Merger for C
- Maintains types, close ties to source
- Highly structured, clean subset of C
- Handles ANSI/GCC/MSVC


Why CIL?

Analyses and Transformations

- Easy to use
  - impersonates compiler & linker
  - $ make project CC=cil
- Easy to work with
  - converts away tricky syntax
  - leaves just the heart of the language
  - separates concepts


C Feature Separation

- CIL separates language components
  - pure expressions
  - statements with side-effects
  - control-flow
  - embedded CFG
- Keeps all programmer names
  - temps serialize side-effects
  - simplified scoping
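The idea of temporaries serializing side effects can be illustrated with a small hand-written C example. This is only a sketch of the style of rewriting, not actual CIL output: the function names, the `tmp_` variables, and the `next_value` helper are all invented for illustration.

```c
#include <assert.h>

/* Hypothetical helper with a visible side effect: bumps a counter. */
int next_value(int *counter)
{
    return ++*counter;
}

/* Original style: an increment and a call are buried inside one expression. */
int sum_original(int *a, int *i, int *counter)
{
    return a[(*i)++] + next_value(counter);
}

/* Serialized style: each side effect becomes its own statement, in a fixed
 * order, and the final expression is pure. Programmer names (a, i) are kept;
 * only the temporaries are new. */
int sum_serialized(int *a, int *i, int *counter)
{
    int tmp_index;
    int tmp_call;

    assert(a && i && counter);
    tmp_index = *i;                  /* read the old index           */
    *i = tmp_index + 1;              /* the ++ is now a statement    */
    tmp_call = next_value(counter);  /* the call is now a statement  */
    return a[tmp_index] + tmp_call;  /* pure expression              */
}
```

Both functions compute the same value and leave `*i` and `*counter` in the same state, but in the second one an analysis never has to look for side effects inside an expression.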


Example: C Lvalues

- An exp referring to a region of storage
- Example: rec[1].fld[2]
- May involve 1, 2, 3 memory accesses
  - 1 if rec and fld are both arrays
  - 2 if either one is a pointer
  - 3 if rec and fld are both pointers

Syntax (AST) is insufficient
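The access counts above can be seen by giving the same source expression four different typings. This is a minimal sketch: the type and variable names (`arr_fld`, `ptr_fld`, `rec_aa`, and so on) are made up for this illustration, and the counts in the comments are the loads needed to fetch the final value, assuming no compiler optimization.

```c
#include <assert.h>

struct arr_fld { int fld[3]; };  /* fld is an array   */
struct ptr_fld { int *fld;   };  /* fld is a pointer  */

static int storage[3] = {7, 8, 9};

/* rec is an array in the first two cases, a pointer in the last two. */
struct arr_fld  rec_aa[2] = { { {0, 0, 0} }, { {1, 2, 3} } };
struct ptr_fld  rec_ap[2] = { { storage }, { storage } };
struct arr_fld *rec_pa    = rec_aa;
struct ptr_fld *rec_pp    = rec_ap;

/* The expression text is identical every time; only the types differ. */
int get_aa(void) { return rec_aa[1].fld[2]; }  /* 1 load: the element          */
int get_ap(void) { return rec_ap[1].fld[2]; }  /* 2 loads: fld, then element   */
int get_pa(void) { return rec_pa[1].fld[2]; }  /* 2 loads: rec, then element   */
int get_pp(void) { return rec_pp[1].fld[2]; }  /* 3 loads: rec, fld, element   */

/* Small self-check that all four variants read the intended storage. */
void check_lvalue_reads(void)
{
    assert(get_aa() == 3);
    assert(get_ap() == 9);
    assert(get_pa() == 3);
    assert(get_pp() == 9);
}
```

A parser alone produces the same tree for all four bodies, which is why type information is needed before the number of memory accesses is known.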


CIL Lvalues

An exp referring to a region of storage

    lval   ::= <base, offset>
    base   ::= Var(varinfo) | Mem(exp)
    offset ::= None | Field(f, offset) | Index(exp, offset)


CIL Lvalues

Example: rec[1].fld[2] becomes either:

    <Var(rec), Index(1, Field(fld, Index(2, None)))>

or:

    <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>

Full static and operational semantics


Semantics

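- CIL gives syntax-directed semantics
- Example judgment:

\[
\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Rightarrow (\&x, \tau)}
\]

Callouts: \(\Gamma\) is the environment, \(\mathrm{Var}(x)\) is the lvalue form, and \((\&x, \tau)\) is the meaning.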


CIL Lvalue Semantics

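\[
\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Rightarrow (\&x, \tau)}
\qquad
\frac{\Gamma \vdash e : \mathrm{Ptr}(\tau)}{\Gamma \vdash \mathrm{Mem}(e) \Rightarrow (e, \tau)}
\]

\[
\frac{\Gamma \vdash b \Rightarrow (a, \tau)}{\Gamma \vdash \mathrm{None}@b \Rightarrow (a, \tau)}
\]

\[
\frac{\Gamma \vdash b \Rightarrow (a_1, \mathrm{Arr}(\tau_1)) \qquad \Gamma \vdash o@(a_1 + e \cdot |\tau_1|, \tau_1) \Rightarrow (a_2, \tau_2)}{\Gamma \vdash \mathrm{Index}(e, o)@b \Rightarrow (a_2, \tau_2)}
\]

\[
\frac{\Gamma \vdash o@b \Rightarrow (a, \tau)}{\Gamma \vdash \langle b, o \rangle \Rightarrow (a, \tau)}
\]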
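The None and Index offset rules can be read as a small address computation: start from the base's (address, type) pair and walk the offset chain. The C sketch below is one way to mirror that, under several assumptions: the `type`, `field`, `offset`, and `place` structs are hypothetical simplifications (CIL itself is written in OCaml and its types are richer), index expressions are taken as already evaluated, and the Field case is an assumed analogue of Index since no Field rule appears on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified type descriptors (hypothetical). */
enum type_kind { T_INT, T_ARR, T_STRUCT };
struct type {
    enum type_kind kind;
    size_t size;              /* |t|: size of the type in bytes   */
    const struct type *elem;  /* element type when kind == T_ARR  */
};

/* A struct field: byte offset inside the struct plus the field's type. */
struct field { size_t byte_off; const struct type *type; };

/* offset ::= None | Field(f, offset) | Index(exp, offset) */
enum offset_kind { O_NONE, O_FIELD, O_INDEX };
struct offset {
    enum offset_kind kind;
    const struct field *field;  /* O_FIELD                              */
    long index;                 /* O_INDEX: index, already evaluated    */
    const struct offset *rest;  /* remaining offset; unused for O_NONE  */
};

/* The meaning of an lvalue: an address paired with a type. */
struct place { char *addr; const struct type *type; };

/* Apply an offset chain to a base place. */
struct place apply_offset(struct place p, const struct offset *o)
{
    for (; o->kind != O_NONE; o = o->rest) {   /* None: (a, t) unchanged */
        if (o->kind == O_INDEX) {
            /* Index(e, o) on (a1, Arr(t1)) continues at (a1 + e*|t1|, t1). */
            assert(p.type->kind == T_ARR);
            p.addr += o->index * (long)p.type->elem->size;
            p.type = p.type->elem;
        } else {
            /* Field(f, o): assumed rule, add the field's byte offset. */
            assert(p.type->kind == T_STRUCT);
            p.addr += o->field->byte_off;
            p.type = o->field->type;
        }
    }
    return p;
}
```

For an array-of-structs `rec`, applying Index(1, Field(fld, Index(2, None))) to the base place of `rec` should land on the same address the compiler computes for `&rec[1].fld[2]`, with no memory reads along the way.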


CIL Source Fidelity

Original source:

    typedef struct { int fld[3]; } * Myptr;
    Myptr rec;
    rec[2].fld[1] = 'h';

CIL output:

    struct __anonstruct1 {
        int fld[3];
    };
    typedef struct __anonstruct1 * Myptr;
    Myptr rec;
    (rec + 2)->fld[1] = (int)'h';

SUIF 2.2.0-4 output:

    typedef int __ar_1[3];
    struct type_1 { __ar_1 fld; };
    struct type_1 * rec;
    (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);


Corner Cases

- Your analysis will not have to handle:

      return ({goto L; p;}) && ({L: 5;});
      return &(--x ? : z) - & (x++, x);

- Full handling of GNU-isms, MSVC-isms
  - attributes
  - initializers


Corner Cases

Your analysis will not have to handle:

    return ({goto L; p;}) && ({L: 5;});

Instead, it sees:

    int tmp;
    goto L;
    if (p) { L: tmp = 1; }
    else { tmp = 0; }
    return tmp;


StackGuard Transform

- Cowan et al., USENIX '98
- Buffer overrun defense
  - push return address on private stack
  - pop before returning
  - only change functions with local arrays
- 40 lines of commented code with CIL
- Quite easy: uses visitors for tree replacement, explicit returns, etc.
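To make the push/pop scheme concrete, here is a hand-written sketch of what one instrumented function might look like after such a transform. It is not the CIL transform itself and not its output: the `shadow_*` names and the `copy_length` function are invented, and the sketch relies on `__builtin_return_address`, a GCC/Clang builtin, so it is an assumption that such a compiler is used. The comparison at exit only marks where the inserted check would go.

```c
#include <assert.h>
#include <stdlib.h>

#define SHADOW_DEPTH 64

/* Private stack of return addresses, kept apart from the call stack. */
static void *shadow_stack[SHADOW_DEPTH];
static int shadow_top = 0;

static void shadow_push(void *ret)
{
    assert(shadow_top < SHADOW_DEPTH);
    shadow_stack[shadow_top++] = ret;
}

static void *shadow_pop(void)
{
    assert(shadow_top > 0);
    return shadow_stack[--shadow_top];
}

int shadow_depth(void)
{
    return shadow_top;
}

/* A function with a local array, so it qualifies for instrumentation. */
int copy_length(const char *src)
{
    char buf[16];
    size_t n = 0;
    int result;

    shadow_push(__builtin_return_address(0));   /* inserted at entry */

    while (n < sizeof buf - 1 && src[n] != '\0') {
        buf[n] = src[n];
        n++;
    }
    buf[n] = '\0';
    result = (int)n;

    /* Inserted before the single explicit return. */
    if (shadow_pop() != __builtin_return_address(0))
        abort();
    return result;
}
```

Having every exit expressed as an explicit `return` is what makes the "pop before returning" step easy to place.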


Other Transforms

- Instrument and log all calls: 150 lines
- Eliminate break, continue, switch: 110
- 1 memory access per assignment: 100
- Make each function have a single return statement: 90
- Make all stack arrays heap-allocated: 75
- Log all value/addr memory writes: 45
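As an illustration of one item in this list, the single-return transform can be pictured with a before/after pair. The code below is a hand-written sketch of the shape of the result, not output from CIL; the function names and the `retval` temporary are hypothetical.

```c
#include <assert.h>

/* Before: three return statements. */
int sign_multi(int x)
{
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    return 1;
}

/* After: each early exit stores its value and jumps to the one return. */
int sign_single(int x)
{
    int retval;

    if (x < 0) { retval = -1; goto done; }
    if (x == 0) { retval = 0; goto done; }
    retval = 1;
done:
    return retval;
}

/* Self-check: both versions agree on a small range of inputs. */
void check_single_return(void)
{
    int x;

    for (x = -3; x <= 3; x++)
        assert(sign_multi(x) == sign_single(x));
}
```

A single exit point gives later instrumentation, such as logging or a shadow-stack pop, exactly one place to hook into.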


Whole-Program Merger

C has incremental linking, compilation coupled with a weak module system!

Example (vortex / gcc / c++2c):

    /* foo.c */
    struct list {
        int head;
        struct list * tail;
    };
    struct list * mylist;

    /* bar.c */
    struct chain {
        int head;
        struct chain * tail;
    };
    extern struct chain * mylist;


Merging a Project

- Determine what files to merge
- Merge the files
  - handle file-scoped identifiers
  - C uses name equivalence for types
  - but modules need structural equivalence

Key: Each global identifier has 1 type!
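Structural equivalence over recursive types, such as a `struct list` in one file and a `struct chain` in another, can be checked by assuming a pair of structs equal while their fields are compared. The sketch below shows that general technique on a toy type representation; it is not CIL's merger algorithm, the `ty` descriptor and all names are hypothetical, and requiring matching field names is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 4
#define MAX_PAIRS  32

enum kind { K_INT, K_PTR, K_STRUCT };

/* Toy type descriptor (hypothetical). */
struct ty {
    enum kind kind;
    const char *tag;                     /* struct tag; ignored when comparing */
    const struct ty *target;             /* K_PTR: pointee type                */
    int nfields;                         /* K_STRUCT: number of fields         */
    const char *fname[MAX_FIELDS];
    const struct ty *ftype[MAX_FIELDS];
};

struct pair { const struct ty *a, *b; };

static int equal_rec(const struct ty *a, const struct ty *b,
                     struct pair *assumed, int *n)
{
    int i;

    if (a == b)
        return 1;
    if (a->kind != b->kind)
        return 0;
    if (a->kind == K_INT)
        return 1;
    if (a->kind == K_PTR)
        return equal_rec(a->target, b->target, assumed, n);

    /* K_STRUCT: a pair already under comparison is assumed equal,
     * which is what lets recursive types terminate. */
    for (i = 0; i < *n; i++)
        if (assumed[i].a == a && assumed[i].b == b)
            return 1;
    assert(*n < MAX_PAIRS);
    assumed[*n].a = a;
    assumed[*n].b = b;
    (*n)++;

    if (a->nfields != b->nfields)
        return 0;
    for (i = 0; i < a->nfields; i++) {
        if (strcmp(a->fname[i], b->fname[i]) != 0)
            return 0;
        if (!equal_rec(a->ftype[i], b->ftype[i], assumed, n))
            return 0;
    }
    return 1;
}

/* Returns 1 when the two types have the same structure, ignoring tags. */
int structurally_equal(const struct ty *a, const struct ty *b)
{
    struct pair assumed[MAX_PAIRS];
    int n = 0;

    return equal_rec(a, b, assumed, &n);
}
```

With descriptors built for `struct list *` and `struct chain *` as in the vortex-style example, the check succeeds, so a single type can be chosen for the global `mylist`.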


Other Merger Details

- Remove duplicate declarations
  - every file includes <stdio.h>
- Match struct pointer with no defined body in file A to defined body in file B
- Be careful when picking representatives


How Does it Work?

- Make project, pass all files through CIL
- Run your transform and analysis
- Emit simplified C
- Compile simplified C with GCC/MSVC
- … and it works!


Large Programs

    Program        #LOC *.[ch]    Notes
    SPECINT95      360K
    GIMP-1.2.2     800K           large libraries
    linux-2.4.5    2.5M           132% compile time
    ACE (in C)     2M             2000 files

Used in the CCured and BLAST projects


Merged Kernel Stats

- Stock monolithic Linux 2.4.5 kernel
- http://manju.cs.berkeley.edu/cil/vmlinux.c
- Statistics:

      Before                    | After
      324 files                 | One 12.5MB file
      11.3 M-words              | 1.5 M-words
      7.3 M-LOC (post-process)  | 470 K-LOC

- $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"


Conclusion

- CIL distills C to a precise, simple subset
  - easy to analyze
  - well-defined semantics
  - close to the original source
- Well-suited to complex analyses and source-to-source transforms
- Parses ANSI/GCC/MSVC C
- Rapidly merges large programs


Questions?

Try CIL out:

http://www.cs.berkeley.edu/~necula/cil

Complete source, documentation and test cases freely available