An Introduction to Proof-Carrying Code
Peter Lee, Carnegie Mellon University


An Introduction to Proof-Carrying Code

Peter Lee, Carnegie Mellon University

Lecture 1

October 29, 2001

ConCert Meeting

Plan

Today: Show and tell.
Cartoons
Some history
Special J compiler
Demo

Next time: Technical details.
LFi and Oracle-based checking
Safety policies
Compiler strategy and annotations
Engineering considerations
Ideas for ConCert-related projects

Ariane 5

On June 4, 1996, the Ariane 5 took off on its maiden flight.

40 seconds into its flight it veered off course and exploded.

The cause was later found to be an error in the reuse of a software component.

For the next two years, virtually every research presentation used this picture.

“Better, Faster, Cheaper”

In 1999, NASA lost both the Mars Polar Lander and the Climate Orbiter.

Later investigations determined software errors were to blame.

Orbiter: Component reuse error.

Lander: Precondition violation.

USS Yorktown

“After a crew member mistakenly entered a zero into the data field of an application, the computer system proceeded to divide another quantity by that zero. The operation caused a buffer overflow, in which data leaked from a temporary storage space in memory, and the error eventually brought down the ship's propulsion system. The result: the Yorktown was dead in the water for more than two hours.”

Programmable mobile devices

By 2003, one in five people will own a mobile communications device.

Nokia expects to sell 500M Java-enabled phones in 2003.

Most of these devices will be power and memory limited.

Security Attacks

According to CERT, the majority of security attacks exploit

input validation failure

buffer overflow

VBS

(http://www.cert.org/summaries/CS-2000-04.html)

BSOD embarrassments

Observations

Failures often due to simple problems “in the details.”

Reuse is critical but perilous.

Performance still matters a lot.

Safety Engineering

Small theorems about large programs would be useful.

Need clearly specified interfaces and checking of interface compliance.

Must not sacrifice performance.

The Code Safety Problem

Please install and execute this.

Code Safety

CPU

Code

Trusted Host

Is this safe to execute?

Theorem Prover

Approach 4: Formal Verification

CPU

Code

Flexible and powerful.

Trusted Host

But really really really hard and must be correct.

A Key Idea: Explicit Proofs

Certifying Prover

CPU

Proof Checker

Code

Proof

Trusted Host

A Key Idea: Explicit Proofs

Certifying Prover

CPU

Code

Proof

No longer need to trust this component.

Proof Checker

Proof-Carrying Code [Necula & Lee, OSDI’96]

A

B

Formal proof or “explanation” of safety

Typically native or VM code

rlrrllrrllrlrlrllrlrrllrrll…

Proof-Carrying Code

Certifying Prover

CPU

Code

Proof

Simple, small (<52KB), and fast.

No longer need to trust this component.

Proof Checker

Reasonable in size (0-10%).

Automation via Certifying Compilation

Certifying Compiler

CPU

Looks and smells like a compiler.

% spjc foo.java bar.class baz.c -ljdk1.2.2

Source code

Proof

Object code

Certifying Prover

Proof Checker

The Role of Programming Languages

Civilized programming languages can provide “safety for free”.

Well-formed/well-typed ⇒ safe.

Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.

The Role of Java in this Short Course

In recent years, Java has been the main focus of my work.

Java is just barely a civilized programming language.

We routinely do better than this.

Java

Java is probably a worthwhile subject of research.

However, it contains many outrageous and mostly inexcusable design errors.

As researchers, we should not forget that we have already done much better, and must continue to do better in the future.

Note

Our current approach seems to work for many problems.

But it is the only one we have tried — there are many others.

PCC is a general concept and we have just barely scratched the surface.

Overview of Our Approach

Please install and execute this.

OK, but let me quickly look over the instructions first.

Code producer Host

Overview of Our Approach

Code producer Host

Overview of Our Approach

This store instruction is dangerous!

Code producer Host

Overview of Our Approach

Can you prove that it is always safe?

Code producer Host

Overview of Our Approach

Can you prove that it is always safe?

Yes! Here’s the proof I got from my certifying Java compiler!

Code producer Host

Overview of Our Approach

Your proof checks out. I believe you because I believe in logic.

Code producer Host

Some History

History: early 90’s

Fox project starts building the FoxNet

Need to control memory layout of data:
Words, bytes, etc. (endianness? alignment?)
Boxed vs. unboxed data (efficiency? control?)
Packet headers (how to write packet filters?)

ML not expressive enough, and compiler technology is inadequate

Harper invents intensional polymorphism, typed intermediate languages, and type-directed compiling

Biagioni, et al., extend SML design

History: mid 90’s

Question: Can these ideas be used in a “production-quality” compiler for a big language like ML?

Morrisett and Tarditi build TIL:
General hints on IL design
Encouraging signs that optimizations are OK

Stone and Harper design the MIL

Lots of work, world-wide, on type-directed compiling

Work begins on TILT

History: mid 90’s

An easy observation in 1995: Types in TIL are not carried all the way down to the final target code.

The idea of enclosing LF encodings of proofs with code is “floating around”.

Lee and Necula work on this, but get nowhere. Many problems, such as optimizations.

Necula goes to DEC SRC to intern with Detlefs and Nelson

Works on extending ESC to catch memory leaks in Modula-3 programs

The next Fall, takes Frank’s Constructive Logic course

History: 1996

Necula and Lee write several standard BPF packet filters in hand-optimized Alpha assembly code.

Simple operational semantics for a core “safe Alpha”

– Checks safety conditions for each instruction execution

Proof system for “real Alpha”

– Encoded in LF
– Proofs generated and checked using Elf

Results in “self-certified code”, later “proof-carrying code”

Plus proof representations, certifying compilation, safety policies (incl. resource bounds)

Inspires significant follow-on and new work at Cornell, Princeton, INRIA, and many other places

History: 1999

CMU releases PCC to Cedilla Systems Incorporated.

Patent 6,128,774, Oct. 2000: Safe to execute verification of software (Necula and Lee)

Patent 6,253,370. June 2001, Method and apparatus for annotating a computer program to facilitate subsequent processing of the program (Abadi, Ghemawat, and Stata)

In less than 26 months, a complete optimizing “ahead-of-time” PCC compiler for Java.

“Applets, Not Craplets”

History: Today

Strong similarities in TILT, PCC, TAL, …

Compiler design is changing

Some day, all compilers will be certifying

History: Today

Are proofs really necessary?

Probably not

And they are messy, compared to types

But as a verification mechanism, proofchecking seems to have some possibly significant engineering advantages over typechecking

The primary contribution

“Proof engineering”.

PCC more clearly defined the proof-engineering problem

How to do checking with minimal overhead and restriction on programs, with minimal time and space overhead in checking, with minimal size and complexity of the checker, and with minimal need for changes when the proof system changes.

K Virtual Machine

Designed to support the CLDC.

Must fit into <128KB.

Must have fast bytecode verification.

kJava class files must be Java-compatible.

Divides bytecode verification into two stages.

kJava and KVM

kJava Compiler

CPU

Sourcecode

Annot

Bytecodes

kJava Preverifier

Verifier

KVM Verification

“Preverification” is performed by the code producer.

Uses global (iterative) analysis to compute the types of stack slots and local vars at every join point.

Second stage is performed by class loader.

Simple linear scan verifies correctness of join-point annotations.
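To make the two-stage division concrete, here is a minimal toy sketch of my own (not Sun's KVM code, and in Java rather than C): the producer is assumed to have already computed the type of a single local variable x at each join point, and the loader does one linear scan that only checks those annotations rather than iterating to a fixed point. All names here (LinearScanChecker, Insn, the opcode strings) are hypothetical.

import java.util.*;

public class LinearScanChecker {
    record Insn(String op, String arg) {}   // toy instructions for one local "x"

    // annotations: pc of a join point -> declared type of the local "x" there
    static boolean verify(List<Insn> code, Map<Integer, String> annotations) {
        String xType = "top";                               // unknown before first store
        for (int pc = 0; pc < code.size(); pc++) {
            String declared = annotations.get(pc);
            if (declared != null) {                         // at a join point:
                if (!xType.equals("top") && !xType.equals(declared)) return false;
                xType = declared;                           // continue with the declared type
            }
            Insn i = code.get(pc);
            switch (i.op()) {
                case "STORE_INT"  -> xType = "int";
                case "STORE_LONG" -> xType = "long";
                case "USE_INT"    -> { if (!xType.equals("int")) return false; }
                case "GOTO"       -> { /* a fuller checker would also match the state
                                          against the target's annotation */ }
                default           -> { return false; }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Insn> code = List.of(
                new Insn("STORE_INT", "x"),
                new Insn("GOTO", "3"),
                new Insn("STORE_LONG", "x"),
                new Insn("USE_INT", "x"));                  // pc 3 is a join point
        System.out.println(verify(code, Map.of(3, "int"))); // false: a long flows in
    }
}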

KVM Example [from Frank Yellin]

static void test(Long x) {
  Number y = x;
  while (y.intValue() != 0) {
    y = nextValue(y);
  }
  return;
}

 0. aload_0
 1. astore_1
 2. goto 10
    [Long, Number | <>]
 5. aload_1
 6. invokeStatic nextValue(Number)
 9. astore_1
    [Long, Number | <>]
10. aload_1
11. invokeVirtual intValue()
14. ifne 5
17. return

Join-point typing annotations

KVM Verification

The second stage verifier is a 10KB program that requires

a single scan of the code, and

<100 bytes of run-time storage.

Impressive!

This is Java verification done right.

Join-Point Annotations

All of these approaches to certified code make use of join-point typing annotations to reduce code verification to a simple problem.

They are essentially the classical loop invariants of the Dijkstra/Hoare program verification approach.
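For intuition only, here is a tiny Java example of my own (not from the lecture): the join-point annotation for this loop is just its classical loop invariant, written here as an assert at the loop head (run with -ea to enable it).

public class InvariantDemo {
    static int sumTo(int n) {
        int s = 0;
        int i = 0;
        while (i < n) {
            // the "join-point annotation" is the classical loop invariant
            assert 0 <= i && i <= n : "loop invariant at the join point";
            s += i;
            i++;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10)); // prints 45
    }
}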

Overheads

In TAL and PCC we observe relatively large annotation sizes (~10-20%), sometimes much more.

Unknown for kJava.

Research question:

Can we reduce this size?

Checking speed and storage space is also a problem.

The Special J Compiler

High-Level Architecture

Explanation

Code

Verification condition generator

Checker

Safety policy

Agent

Host

High-Level Architecture

Explanation

Code

Verification condition generator

Checker

Safety policy

Agent

Host

The VCGen

The verification condition generator (VCGen) examines each instruction.

It is a symbolic evaluator that essentially implements the operational semantics of a “safe” version of the machine language.

It checks some simple properties directly. E.g., direct jumps go to legal addrs.

Informally, it invokes the Checker when “dangerous” instructions are encountered.

The VCGen, cont’d

Examples of dangerous instructions:

memory operations

procedure calls

procedure returns

For each such instruction, VCGen creates a verification condition (VC).
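As a rough illustration of this structure, the sketch below is a toy of my own (not the Special J VCGen, and in Java for readability): it walks a list of toy instructions, keeps a symbolic register state, and emits a saferd4-style VC whenever it meets a memory load. Every name in it (TinyVCGen, Instr, the opcode strings) is hypothetical.

import java.util.*;

public class TinyVCGen {
    // Toy instruction forms: MOVE dst <- src, LOAD dst <- mem[src].
    record Instr(String op, String dst, String src) {}

    // Symbolic state: register name -> symbolic expression.
    private final Map<String, String> regs = new HashMap<>();
    private final List<String> vcs = new ArrayList<>();

    private String valueOf(String operand) {
        return regs.getOrDefault(operand, operand + "_0"); // initial symbolic value
    }

    public List<String> run(List<Instr> code) {
        for (Instr i : code) {
            switch (i.op()) {
                case "MOVE" -> regs.put(i.dst(), valueOf(i.src()));
                case "LOAD" -> {
                    String addr = valueOf(i.src());
                    vcs.add("(saferd4 " + addr + ")");        // dangerous: emit a VC
                    regs.put(i.dst(), "(sel4 rm " + addr + ")");
                }
                default -> throw new IllegalArgumentException("unknown op " + i.op());
            }
        }
        return vcs;
    }

    public static void main(String[] args) {
        TinyVCGen vcgen = new TinyVCGen();
        List<String> vcs = vcgen.run(List.of(
                new Instr("MOVE", "ebx", "esp"),
                new Instr("LOAD", "ecx", "ebx")));
        vcs.forEach(System.out::println);   // prints (saferd4 esp_0)
    }
}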

High-Level Architecture

Explanation

Code

Verification condition generator

Checker

Safety policy

Agent

Host

The Checker

When given a VC, the Checker attempts to determine its validity.

Sometimes, it consults the “explanation” for help with this.

If successful, it allows VCGen to proceed.

The set of allowable VCs and their valid proofs is defined by the safety policy.

High-Level Architecture

Explanation

Code

Verification condition generator

Checker

Safety policy

Agent

Host

The Safety Policy

The safety policy is defined by an inference system that defines:

the language of predicates (for VCs),

the axioms and inference rules for writing valid proofs of VCs, and

specifications (pre/post-conditions) for each required entry point in the code.

Operational Semantics

The VCGen is derived (by hand) directly from the operational semantics of a “safe machine”.

The calls to the checker establish that the code always makes progress (or halts normally) in the operational semantics.

This leads to a standard notion of soundness.

What Can’t Be Enforced?

Liveness properties currently cannot be enforced by this architecture.

In practice, however, safety properties are often “good enough”.

Architecture

Code producer Host

Ginseng

Native code

Proof

Special J

Java binary

~52KB, written in C

Written in OCaml

Annotations

Architecture

Code producer Host

Proof checker

VCGen

Axioms

Native code

Proof

VC

Special J

Java binary

Annotations

Architecture

Code producer Host

Java binary

Proof generator

Proof checker

VCGen

Axioms

Axioms

Certifying compiler

VCGen

VC

Native code

Proof

VC

Java Virtual Machine

JVM

Java Verifier

JNI

Class file Class file

Native code

Proof-carrying code

Checker

Show either the Mandelbrot or NBody3D demo.

Crypto Test Suite Results [Cedilla Systems]

[Bar chart: execution time (sec) for Cedilla, Java, and Java with a JIT]

On average, 72.8% faster than Java, 37.5% faster than Java with a JIT.

Java Grande Suite v2.0 [Cedilla Systems]

[Bar chart: execution time (sec) for Cedilla, Java, and Java with a JIT]

Java Grande Bench Suite [Cedilla Systems]

[Bar chart: operations per second on arith, assign, and method benchmarks for Cedilla, Java, and Java with a JIT]

Ginseng

VCGen

Checker

Safety Policy

Dynamic loading

Cross-platform support

~15KB, roughly similar to a KVM verifier (but with floating-point).

~4KB, generic.

~19KB, declarative and machine-generated.

~22KB, some optional.

Example: Source Code

public class Bcopy {
  public static void bcopy(int[] src, int[] dst) {
    int l = src.length;
    int i = 0;
    for (i = 0; i < l; i++) {
      dst[i] = src[i];
    }
  }
}

Example: Target Code

ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
.text
.align 4
.globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, 4(%esp)
    je L6
    movl 4(%esp), %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret

L22:
    xorl %edx, %edx
    cmpl $0, 8(%esp)
    je L6
    movl 8(%esp), %eax
    movl 4(%eax), %esi

L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)},
        MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret

L13:
    call __Jv_ThrowBadArrayIndex
    ANN_UNREACHABLE
    nop

L6:
    call __Jv_ThrowNullPointer
    ANN_UNREACHABLE
    nop

Cut Points

Each loop entry must be annotated as a cut point.

VCGen requires this so that checking can be performed in a single scan of the code.

As a convenience, the modified registers are also declared in the cut annotations.

Example: Source Code

public class Bcopy {
  public static void bcopy(int[] src, int[] dst) {
    int l = src.length;
    int i = 0;
    for (i = 0; i < l; i++) {
      dst[i] = src[i];
    }
  }
}

Example: Target Code

ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
.text
.align 4
.globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, 4(%esp)
    je L6
    movl 4(%esp), %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret

L22:
    xorl %edx, %edx
    cmpl $0, 8(%esp)
    je L6
    movl 8(%esp), %eax
    movl 4(%eax), %esi

L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)},
        MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret

L13:
    call __Jv_ThrowBadArrayIndex
    ANN_UNREACHABLE
    nop

L6:
    call __Jv_ThrowNullPointer
    ANN_UNREACHABLE
    nop

A Note about Memory

We define a type for valid heap memory states:

mem : exp

and operators for reading and writing heap memory:

(sel M A)

(upd M A E)
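The slide does not list them, but the intuition behind these operators is the standard McCarthy-style read/write laws for array-like memories; in the notation above they read as follows (whether the actual safety-policy signature states them in exactly this form is not shown here):

(sel (upd M A E) A) = E
(sel (upd M A E) B) = (sel M B)    when A <> B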

The VCGen Process (1)

_bcopy__6arrays5BcopyAIAI:
    cmpl $0, src
    je L6
    movl src, %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret
L22:
    xorl %edx, %edx
    cmpl $0, dst
    je L6
    movl dst, %eax
    movl 4(%eax), %esi
L7: ANN_LOOP(INV = …

A0 = (type src_1 (jarray jint))
A1 = (type dst_1 (jarray jint))
A2 = (type rm_1 mem)
A3 = (csubneq src_1 0)
ebx := src_1
ecx := (sel4 rm_1 (add src_1 4))

A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)

edx := 0

A5 = (csubneq dst_1 0)
eax := dst_1
esi := (sel4 rm_1 (add dst_1 4))

The VCGen Process (2)

L7: ANN_LOOP(INV = {
        (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)},
        MODREG = (EDI, EDX, EFLAGS, FFLAGS, RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx,%edx,4), %edi
    movl %edi, 8(%eax,%edx,4)
    …

A3, A5
A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))

edi := edi_1
edx := edx_1
rm := rm_2

A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))

!!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))

The Checker (1)

The checker is asked to verify that

(saferd4 (add src_1 (add (imul edx_1 4) 8)))

under assumptions

A0 = (type src_1 (jarray jint))
A1 = (type dst_1 (jarray jint))
A2 = (type rm_1 mem)
A3 = (csubneq src_1 0)
A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
A5 = (csubneq dst_1 0)
A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))

The checker looks in the PCC for a proof of this VC.

The Checker (2)

In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as

szint : pf (size jint 4)

rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} pf (type A (jarray T)) -> pf (type M mem) -> pf (nonnull A) -> pf (size T 4) -> pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (saferd4 (add A OFF)).

Checker (3)

A proof for

(saferd4 (add src_1 (add (imul edx_1 4) 8)))

in the Java specification looks like this (excerpt):

(rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7)))

This proof can be easily validated via LF type checking.

VCGen Summary

VCGen is a symbolic evaluator for the object language.

It essentially implements a reference interpreter, except:

it uses symbolic values in order to model all possible executions, and

instead of performing run-time checks, it asks a Checker to verify the safety of “dangerous” instructions.

Safety Policies

More formally, we begin by defining the small-step operational semantics of a machine (called the s86).

(Π, ρ, pc) ⊢ instr ⟶ (ρ’, pc’)

where Π is the program, ρ is the register state, and pc is the program counter.

We define the machine so that only safe executions are defined.

Safety Policies, cont’d

For convenience we choose the s86 to be a restriction of the x86.

Hence all s86 programs will execute faithfully on a real x86.

Except that on some programs in which the x86 does not execute, the x86 might do something weird.

The goal then is to prove that any given program always makes progress (or returns) in the s86.

With such a proof, the x86 is then just as good as an s86.

Verification Conditions

The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program.

In other words, a VC’s validity says that the corresponding instruction has a defined execution in the s86 operational semantics.

Symbolic Evaluator

We can define the verification condition generator (VCGen) via a symbolic evaluator

SE_{Σ, Π, ρ0, Post}(i, ρ0, L)

The result of symbolic evaluation is a conjunction of VCs, so the overall progress theorem is then

Pre ⊃ SE_{Σ, Π, ρ0, Post}(i, ρ0, L)

where Σ is the LF signature, Π the program, Post the postcondition, i the entry point, and L the annotations.

Soundness

For particular operational semantics (a safe x86 and a safe Alpha), we have presented theorems that say, essentially:

Thm: If Pre ⊃ SE_{Σ, Π, ρ0, Post}(i, ρ0, L), then execution of Π, given Pre and ρ0, and starting from entry point i, will always make progress (or return).

Getting from Concept to Implementation

In an actual implementation, it is also handy to have a bit more than just a VC generator.

Precise syntax for VCs.

Pre/post-conditions for each entry point expected by the host in any downloaded code.

Precisely specified logical system for proving the VCs.

Verifier for “meta-data.”

Safety Policy Implementations

Safety policies are thus given in four parts:

A verification-condition generator (VCGen).

A specification of the pre & post conditions for all required procedures.

A specification of the inference rules for constructing valid proofs.

Plug-ins for performing meta-data verification.

LF (Elf syntax) is used for the rule and pre/post specifications, C for the VCGen and plug-ins.

C?!@$#@!

The use of C to define and implement the VCGen is, at best, expedient and at worst dubious.

However, since any code-inspection system must parse object files (not trivial!) and understand the instruction set, this seems to have practical benefits.

Clearly, a more formal approach would be desirable.

How Do We Know That It’s Right?

How Do We Know That It’s Right?

Although the papers and dissertation follow a rigorous development leading to a soundness result, in practice it is tempting to hack in new things in the LF signature…

Example: Java Type-Safety Specification

Our largest example of a safety-policy specification is for the “SpecialJ” Java native-code compiler.

It contains about 140 inference rules.

Roughly speaking, these rules can be separated into 5 classes.

Safety Policy: Rule Excerpts

/\  : pred -> pred -> pred.
\/  : pred -> pred -> pred.
=>  : pred -> pred -> pred.
all : (exp -> pred) -> pred.

pf : pred -> type.

truei : pf true.
andi  : {P:pred} {Q:pred} pf P -> pf Q -> pf (/\ P Q).
andel : {P:pred} {Q:pred} pf (/\ P Q) -> pf P.
ander : {P:pred} {Q:pred} pf (/\ P Q) -> pf Q.

1. Standard syntax and rules for first-order logic.

Type of valid proofs, indexed by predicate.

Syntax of predicates.

Inference rules.

=  : exp -> exp -> pred.
<> : exp -> exp -> pred.

eq_le : {E:exp} {E':exp} pf (csubeq E E') -> pf (csuble E E').

moddist+ : {E:exp} {E':exp} {D:exp} pf (= (mod (+ E E') D) (mod (+ (mod E D) E') D)).

=sym  : {E:exp} {E':exp} pf (= E E') -> pf (= E' E).
<>sym : {E:exp} {E':exp} pf (<> E E') -> pf (<> E' E).

=tr : {E:exp} {E':exp} {E'':exp} pf (= E E') -> pf (= E' E'') -> pf (= E E'').

Safety Policy: Rule Excerpts

2. Syntax and rules for arithmetic and equality.

“csuble” means ≤ in the x86 machine.

Safety Policy: Rule Excerpts

jint    : exp.
jfloat  : exp.
jarray  : exp -> exp.
jinstof : exp -> exp.

of : exp -> exp -> pred.

faddf : {E:exp} {E':exp} pf (of E jfloat) -> pf (of E' jfloat) -> pf (of (fadd E E') jfloat).

ext : {E:exp} {C:exp} {D:exp} pf (jextends C D) -> pf (of E (jinstof C)) -> pf (of E (jinstof D)).

3. Syntax and rules for the Java type system.

Safety Policy: Sample Rules

aidxi : {I:exp} {LEN:exp} {SIZE:exp} pf (below I LEN) -> pf (arridx (add (imul I SIZE) 8) SIZE LEN).

wrArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} {E:exp} pf (of A (jarray T)) ->

pf (of M mem) -> pf (nonnull A) -> pf (size T 4) ->

pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (of E T) -> pf (safewr4 (add A OFF) E).

4. Rules describing the layout of data structures.

This “sel4” means the result of reading 4 bytes from heap M at address A+4.

Safety Policy: Sample Rules

nlt0_0 : pf (csubnlt 0 0).
nlt1_0 : pf (csubnlt 1 0).
nlt2_0 : pf (csubnlt 2 0).
nlt3_0 : pf (csubnlt 3 0).
nlt4_0 : pf (csubnlt 4 0).

5. Quick hacks.

Sometimes “unclean” things are put into the specification...

The Basic Trick

Recall the bcopy program:

public class Bcopy {
  public static void bcopy(int[] src, int[] dst) {
    int l = src.length;
    int i = 0;
    for (i = 0; i < l; i++) {
      dst[i] = src[i];
    }
  }
}

Unoptimized Loop Body

L11:
    movl 4(%ebx), %eax
    cmpl %eax, %edx
    jae L24

L17:
    cmpl $0, 12(%ebp)
    movl 8(%ebx, %edx, 4), %esi
    je L21

L20:
    movl 12(%ebp), %edi
    movl 4(%edi), %eax
    cmpl %eax, %edx
    jae L24

L23:
    movl %esi, 8(%edi, %edx, 4)
    movl %edi, 12(%ebp)
    incl %edx

L9:
    ANN_INV(ANN_DOM_LOOP,
        %LF_(/\ (of rm mem) (of loc1 (jarray jint)))%_LF,
        RB(EBP,EBX,ECX,ESP,FTOP,LOC4,LOC3))
    cmpl %ecx, %edx
    jl L11

Bounds check on src.

Bounds check on dst.

Note: L24 raises the ArrayIndex exception.

Unoptimized Code is Easy

In the absence of optimizations, proving the safety of array accesses is relatively easy.

Indeed, in this case it is reasonable for VCGen to verify the safety of the array accesses.

As the optimizer becomes more successful, verification gets harder.

Role of Loop Invariants

It is for this reason that the optimizer’s knowledge must be conveyed to the theorem prover.

Essentially, any facts about program values that were used to perform check-elimination and code-motion optimizations must be declared in an invariant.

Optimized Loop Body

L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)},
        MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx

Essential facts about live variables, used by the compiler to eliminate bounds-checks in the loop body.

Certifying Compiling and Proving

Intuitively, we will arrange for the Prover to be at least as powerful as the Compiler’s optimizer.

Hence, we will expect the Prover to be able to “reverse engineer” the reasoning process that led to the given machine code.

An informal concept, needing a formal understanding! (Type theory is essential here…)

What is Safety, Anyway?

If the compiler fails to optimize away a bounds-check, it will insert code to perform the check.

This means that programs may still abort at run-time, albeit with a well-defined exception.

Is this safe behavior?

Compiler Development

The PCC infrastructure catches many (probably most) compiler bugs early.

Our standard regression test does not execute the object code!

Principle: Most compiler bugs show up as safety violations.

Example Bug

…
L42:
    movl 4(%eax), %edx
    testl %edx, %edx
    jle L47
L46:
    … set up for loop …
L44:
    … enter main loop code …
    …
    jl L44
    jmp L32
L47:
    fldz
    fldz
L32:
    … return sequence …
    ret

Example Bug

…
L42:
    movl 4(%eax), %edx
    testl %edx, %edx
    jle L47
L46:
    … set up for loop …
L44:
    … enter main loop code …
    …
    jl L44
    jmp L32
L47:
    fldz
L32:
    … return sequence …
    ret

Error in rarely executed compensation code is caught by the Proof Generator.

Another Example Bug

Suppose bcopy’s inner loop is changed:

L7: ANN_LOOP( … )
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret

Another Example Bug

Suppose bcopy’s inner loop is changed:

L7: ANN_LOOP( … )
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    addl $2, %edx
    cmpl %ecx, %edx
    jl L7
    ret

Again, PCC spots the danger.

Yet Another

class Floatexc extends Exception {

  public static int f(int x) throws Floatexc { return x; }
  public static int g(int x) { return x; }

  public static float handleit(int x, int y) {
    float fl = 0;
    try { x = f(x); fl = 1; y = f(y); }
    catch (Floatexc b) { fl += fl; }
    return fl;
  }
}

Yet Another

…Install handler…
    pushl $_6except8Floatexc_C
    call __Jv_InitClass
    addl $4, %esp

…Enter try block…
L17:
    movl $0, -4(%ebp)
    pushl 8(%ebp)
    call _6except8Floatexc_MfI
    addl $4, %esp
    movl %eax, %ecx

……A handler…
L22:
    flds -4(%ebp)
    fadds -4(%ebp)
    jmp L18

Another Example [by George Necula]

void fir(int *data, int dlen, int *filter, int flen) {
  int i, j;

  for (i = 0; i <= dlen - flen; i++) {
    int s = 0;

    for (j = 0; j < flen; j++)
      s += filter[j] * data[i+j];

    data[i] = s;
  }
}

Compiled Example

ri = 0
sub t1 = rdl, rfl

L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0

L1: CUT(rj,rs,t2,t3,t4)
    lt t2 = rj, rfl
    jeq t2, L2
    ult t2 = rj, rfl
    jeq t2, Labort
    ld t3 = [rf + 4*rj]
    add t2 = ri, rj
    ult t4 = t2, rdl
    jeq t4, Labort
    ld t2 = [rd + 4*t2]
    mul t2 = t3, t2
    add rs = rs, t2
    add rj = rj, 1
    jmp L1

L2: ult t2 = ri, rdl
    jeq t2, Labort
    st [rd + 4*ri] = rs
    add ri = ri, 1
    jmp L0

L3: ret

Labort: call abort

/* rd=data, rdl=dlen, rf=filter, rfl=flen */

The Safety Policy

The safety policy defines verification conditions of the form:

true, E = E
saferd(M, E), safewr(M, E, E)
array(EA, ES, EL), vector(EA, ES, EL)

Pre_fir = array(rd, 4, rdl) /\ vector(rf, 4, rfl)
Post_fir = true

VCGen Example

ri = 0
sub t1 = rdl, rfl

L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    …

L3: ret

Assume precondition: array(cd,4,cdl) /\ vector(cf,4,cfl)

Set ri = 0

Set t1 = sub(cdl,cfl)

Set rd=cd; rdl=cdl; rf=cf; rfl=cfl; rm=cm

Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’

Set t2 = le(ci, sub(cdl,cfl))
Assume not(le(ci, sub(cdl,cfl)))

Check postcondition;

Check rd, rdl, rf, rfl have initial values

VCGen Example

ri = 0
sub t1 = rdl, rfl

L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0

L1: CUT(rj,rs,t2,t3,t4)
    lt t2 = rj, rfl
    jeq t2, L2
    …

L2: ult t2 = ri, rdl
    jeq t2, Labort
    st [rd + 4*ri] = rs

Set ri = 0

Set t1 = sub(cdl,cfl)
Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’

Set t2 = le(ci, sub(cdl,cfl))
Assume le(ci, sub(cdl,cfl))
Set rs = 0
Set rj = 0
Set rj=cj’; rs=cs’; t2=c2’; t3=c3’; t4=c4’

Set t2 = lt(cj’, cfl)
Assume not(lt(cj’, cfl))

Set t2 = ult(ci, cdl)
Assume ult(ci, cdl)
Check safewr(cm’, add(cd,mul(4,ci)), cs’)

More on the Safety Policy

Some of the inference rules in the LF signature:

rdarray : saferd(M,add(A,mul(S,I))) <- array(A,S,L), ult(I,L).

rdvector : saferd(M,add(A,mul(S,I))) <- vector(A,S,L), ult(I,L).

wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).

The Checker

When the Checker is invoked on safewr(cm’, add(cd,mul(4,ci)), cs’)

There are assumptions:

assume0 : ult(ci,cdl).
assume1 : not(lt(cj’,cfl)).
assume2 : le(ci, sub(cdl,cfl)).
assume3 : vector(cf,4,cfl).
assume4 : array(cd,4,cdl).

The Checker, cont’d

The VC safewr(cm’, add(cd,mul(4,ci)), cs’)

can be verified by using the rule wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).

and assumptions assume0 : ult(ci,cdl). assume4 : array(cd,4,cdl).

Proof Representation

A simple (but somewhat naïve) representation of the proof is simply the sequence of proof rules:

wrarray, assume4, assume0

Optimized Code

The previous example was somewhat simplified.

More realistic code is optimized, usually based on inferences about integer values.

Such optimizations require that arithmetic invariants be placed in the cut points.

Optimized Example

ri = 0
sub t1 = rdl, rfl

L0: CUT(ri>0, {ri,rj,…})
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0

L1: CUT(rj>0, {rj,rs,…})
    lt t2 = rj, rfl
    jeq t2, L2
    ld t3 = [rf + 4*rj]
    add t2 = ri, rj
    ld t2 = [rd + 4*t2]
    mul t2 = t3, t2
    add rs = rs, t2
    add rj = rj, 1
    jmp L1

L2: st [rd + 4*ri] = rs
    add ri = ri, 1
    jmp L0

L3: ret

/* rd=data, rdl=dlen, rf=filter, rfl=flen */

VCGen Example

ri = 0
sub t1 = rdl, rfl

L0: CUT(ri>0, {ri,rj,rs,t2,t3,t4,rm})
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0

Set ri = 0

Set t1 = sub(cdl,cfl)
Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’

Set t2 = le(ci, sub(cdl,cfl))
Assume le(ci, sub(cdl,cfl))

Assume >(ci,0)

Practical Considerations

Trusted Computing Base

The trusted computing base is the software infrastructure that is responsible for ensuring that only safe execution is possible.

Obviously, any bugs in the TCB can lead to unsafe execution.

Thus, we want the TCB to be simple, as well as fast and small.

VCGen’s Complexity

Fortunately, proofs can be quite small, and proofchecking can be quite simple, small, and fast.

VCGen, at core, is also simple and fast.

But in practice it gets to be quite complicated.

VCGen’s Complexity

Some complications:

If dealing with machine code, then VCGen must parse machine code.

Maintaining the assumptions and current context in a memory-efficient manner is not easy.

Note that Sun’s KVM does verification in a single pass and in only 8KB of RAM!

VC Explosion

a == b

a == c

f(a,c)

a := x c := x

a := y c := y

a=b => (x=c => safef(y,c) /\ x<>c => safef(x,y))

a<>b => (a=x => safef(y,x) /\ a<>x => safef(a,y))

Exponential growth in size of the VC is possible. And it actually happens in practice!

Precondition: safef(i,j)

VC Explosion

a == b

a == c

f(a,c)

a := x c := x

a := y c := y

INV: P(a,b,c,x)

(a=b => P(x,b,c,x) /\
 a<>b => P(a,b,x,x))

(∀a’,c’. P(a’,b,c’,x) =>
 a’=c’ => safef(y,c’) /\ a’<>c’ => safef(a’,y))

Growth can usually be controlled by careful placement of just the right “join-point” invariants.

Stack Slots

Each procedure will want to use the stack for local storage.

This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory.

Stack Slots

We avoid this problem by assuming that procedures use up to 256 words of stack as registers.

Main restriction:

No indirect addressing of stack slots.

Callee-save Registers

Standard calling conventions dictate that the contents of some registers be preserved.

These callee-save registers are specified along with the pre/post-conditions for each procedure.

The preservation of their values must be verified at every return instruction.

Postcondition

Precondition

ANN_FUNCTION(__Jv_instanceof,

%LF_(/\ (of loc3 (jinstof _4java4lang6Object_C))

(/\ (of loc2 jint)

(/\ (jelemtype loc1)

(of rm mem))))%_LF,

%LF_(/\ (of eax jbool)

(of rm mem))%_LF,

RB(ESP,EBP,FTOP),

3,4)

Function specifications

Callee-save registersStack spec

Annotations used by Special J

ANN_CLASS
ANN_FUNCTION
ANN_LOCALS
ANN_INV
ANN_DOM_LOOP
ANN_DOMINATOR
ANN_SYMBOLADDR
ANN_CALLJAVAVIRTUAL
ANN_CALLJAVAINTERFACE
ANN_JUMPTHROUGHTABLE
ANN_INSTALLEDJAVAHANDLER
ANN_UNINSTALLEDJAVAHANDLER
ANN_UNREACHABLE

ANN_CLASS and ANN_FUNCTION

Normally, ANN_FUNCTION is not used. Instead, ANN_CLASS declares that an object file implements a Java class.

public final class Factor1 { … }

ANN_CLASS(_7Factor1_vt)…

ANN_LOCALS

As a convenience for VCGen, the number of stack slots is declared for each method.

public static void combineTags(Node n, int i) {

}

ANN_LOCALS(__7Factor1_McombineTagsL4NodeXI, 8)
.text
.align 4
.globl __7Factor1_McombineTagsL4NodeXI
__7Factor1_McombineTagsL4NodeXI:
…

ANN_INV / ANN_DOM_LOOP

Loop invariants.

ANN_INV(ANN_DOM_LOOP,

%LF_(/\ (nonnull loc2 )

(/\ (of rm mem )

(of eax (jinstof

_4java4util12ListIterator_vt) )))%_LF,

RB(EBP,ESP,FTOP,LOC4,LOC3,LOC2))

Signifies loop invariant

Invariants

Modified registers

ANN_DOMINATOR

Dominating join points are marked.

ANN_DOMINATOR
.L536_dom:
    jle .L237

.L237:
    ANN_INV(.L536_dom,
        %LF_(/\ (nonnull loc3 ) (/\ (of rm mem ) (of loc3 (jinstof _4Node_vt) )))%_LF,
        RB(EBP,ESP,FTOP,LOC5,LOC4))

Invariants

Special J currently emits the following kinds of invariants:

true, false
x = y, x <> y (x, y regs or consts)
x < y (signed and unsigned)
x : t, where t is jint, jbool, …, Jclassdesc, jinstof(C), implSpecIntf(x,y,z), …

Virtual method invocation

public static void combineTags(Node n, int i) {
  if (i > 0) {
    if (!n.isString()) {
      Iterator iter = n.getSubtrees();
      while (iter.hasNext()) {
        combineTags((Node)(iter.next()), i-1);
      }

Virtual method invocation, cont’d

For the loop body:

    pushl $1                                         # vmethod
    ANN_SYMBOLADDR(0)
    pushl $_4java4util8Iterator_vt                   # class
    pushl -4(%ebp)                                   # object
    call __Jv_LookupInterfaceMethod
    addl $12, %esp
    pushl -4(%ebp)
    ANN_CALLJAVAVIRTUAL(_4java4util8Iterator_vt, 1)  # next method
    call *%eax
    addl $4, %esp
    ANN_SYMBOLADDR(0)
    pushl $_4Node_vt
    pushl $0
    pushl %eax
    call __Jv_checkCast

Jump tables

public static final void closeToString(int t) throws IOException {
  if (!isEmpty(t)) {
    switch (getColor(t)) {
      case -1: break;  // no color
      case 0: singleTagString('r', noSecond, false); break;
      case 1: singleTagString('g', noSecond, false); break;
      case 2: singleTagString('b', noSecond, false); break;
      case 3: singleTagString('c', noSecond, false); break;
      case 4: singleTagString('m', noSecond, false); break;
      case 5: singleTagString('y', noSecond, false); break;
      case 6: singleTagString('k', noSecond, false); break;
      case 7: singleTagString('w', noSecond, false); break;
    }
    …

Jump tables, cont’d

ANN_DOMINATOR
.L181_dom:
    jae .L23

.L33:
    ANN_JUMPTHROUGHTABLE(.L32, 9)
    ANN_SYMBOLADDR(0)
    jmp *.L32(, %ebx, 4)

.L24:
    pushl $0
    pushl $0
    pushl $119
    call __3Tag_MsingleTagStringCCZ
    addl $12, %esp
    jmp .L23

.L25
…

…

.L32:

.long .L23

.long .L31

.long .L30

.long .L29

.long .L28

.long .L27

.long .L26

.long .L25

.long .L24

Exception handlers

public Object clone() {
  try {
    return super.clone();
  } catch (CloneNotSupportedException e) {
    return null;
  }
}

Exception handlers, cont’d

__7Context_Mclone:
    pushl %ebp
    movl %esp, %ebp
    call __Jv_GetExcHandler
    ANN_SYMBOLADDR(0)
    pushl $.L11
    ANN_SYMBOLADDR(0)
    pushl $_4java4lang26CloneNotSupportedException_vt
    pushl %ebp
    pushl $1
    pushl (%eax)
    ANN_INSTALLJAVAHANDLER(.L11)
    movl %esp, (%eax)
    pushl 8(%ebp)

ANN_DOMINATOR
.L14_dom:
    call __4java4lang6Object_Mclone
    addl $4, %esp

.L9:
    movl %eax, 8(%ebp)
    call __Jv_GetExcHandler
    movl (%esp), %ebx
    ANN_UNINSTALLJAVAHANDLER(1)
    …

.L11:
    ANN_INV(.L14_dom,
        %LF_(of rm mem )%_LF,
        RB(EBP,ESP,FTOP,LOC3,LOC2))
    nop

.L12:
    xorl %eax, %eax
    movl %ebp, %esp
    popl %ebp
    ret

Efficient Representation and Validation of Proofs

Goals

We would like a representation for proofs that is

compact,
fast to check,
requires very little memory to check, and
is “canonical,” in the sense of accommodating many different logics without requiring a reimplementation of the checker.

Three Approaches

1. Direct representation of a logic.

2. Use of a Logical Framework.

3. Oracle strings.

We will reject (1). We consider only (2) and (3).

Logical Framework

For representation of proofs we use the Edinburgh Logical Framework (LF).

LFi

Skip?

LF Example in Elf Syntax

exp  : type
pred : type
pf   : pred -> type

true : pred
/\   : pred -> pred -> pred
=>   : pred -> pred -> pred
all  : (exp -> pred) -> pred

truei : pf true
andi  : {P:pred} {R:pred} pf P -> pf R -> pf (/\ P R)
andel : {P:pred} {R:pred} pf (/\ P R) -> pf P
impi  : {P:pred} {R:pred} (pf P -> pf R) -> pf (=> P R)
alli  : {P:exp -> pred} ({X:exp} pf (P X)) -> pf (all P)
alle  : {P:exp -> pred} {E:exp} pf (all P) -> pf (P E)

LF as a Proof Representation

LF is canonical, in that a single typechecker for LF can serve as a proofchecker for many different logics specified in LF. [See Avron, et al. ‘92]

But the efficiency of the representation is poor.

Size of LF Representation

Proofs in LF are extremely large, due to large amounts of repetition.

Consider the representation of P ⊃ P ∧ P for some predicate P:

(=> P (/\ P P))

The proof of this predicate has the following LF representation:

(impi P (/\ P P) ([X:pf P] andi P P X X))

Checking LF

The nice thing is that typechecking

(impi P (/\ P P) ([X:pf P] andi P P X X)) : pf (=> P (/\ P P))

is enough for proofchecking. [The theorem is in the LF paper.]

But the proofs are extremely large.

Implicit LF

A dramatic improvement can be achieved by using a variant of LF, called Implicit LF, or LFi.

In LFi, parts of the proof can be replaced by placeholders.

(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))

Soundness of LFi

The soundness of the LFi type system is given by a theorem that states:

If, in context Γ, a term M has type A in LFi (and Γ and A are placeholder-free), then there is a term M’ such that M’ has type A in LF.

Typechecking LFi

The typechecking algorithm for LFi is given in [Necula & Lee, LICS98].

A key aspect of the algorithm is that it avoids repeated typechecking of reconstructed terms.

Hence, the placeholders save not only space, but also time.

Effectiveness of LFi

In experiments with PCC, LFi leads to substantial reductions in proof size and checking time.

Improvements increase nonlinearly with proof size.

Experiment | Proof size, LF (bytes) | Proof size, LFi (bytes) | Checking time, LF (ms) | Checking time, LFi (ms)
unpack     | >10 x 10^6             | 23728                   | 8256                   | 42
simplex    | >2 x 10^6              | 23888                   | 1656                   | 42
sharpen    | 183444                 | 4816                    | 136                    | 7
qsort      | 92412                  | 3098                    | 74                     | 6
kmp        | 77246                  | 2092                    | 60                     | 3
bcopy      | 12466                  | 796                     | 11                     | 1

The Need for Improvement

Despite the great improvement of LFi, in our experiments we observe that, in practice, LFi proofs are 10%-200% the size of the code.

How Big is a Proof?

A basic question is how much essential information is in a proof?

In this proof,

there are only 2 uses of rules and in each case they were the only rule that could have been used.

(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))

Improving the Representation

We will now improve on the compactness of proof representation by making use of the observation that large parts of proofs are deterministically generated from the inference rules.

Additional References

For LF:

Harper, Honsell, & Plotkin. A framework for defining logics. Journal of the ACM, 40(1), 143-184, Jan. 1993.

Avron, Honsell, Mason, & Pollack. Using typed lambda calculus to implement formal systems on a machine. Journal of Automated Reasoning, 9(3), 309-354, 1992.

Additional References

For Elf:

Pfenning. Logic programming in the LF logical framework. Logical Frameworks, Huet & Plotkin (Eds.), 149-181, Cambridge Univ. Press, 1991.

Pfenning. Elf: A meta-language for deductive systems (system description). 12th International Conference on Automated Deduction, LNAI 814, 811-815, 1994.

Oracle-Based Checking

Necula’s Example: Syntax of Girard’s System F

ty  : type
int : ty
arr : ty -> ty -> ty
all : (ty -> ty) -> ty

exp : type
z   : exp
s   : exp -> exp
lam : (exp -> exp) -> exp
app : exp -> exp -> exp

of : exp -> ty -> type

Necula’s Example: Typing Rules for System F

tz : of z int

ts : {E:exp} of E int -> of (s E) int

tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)

tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)

LF Representation

Consider the lambda expression

(λf. (f (λx.x)) (f 0)) (λy.y)

It is represented in LF as follows:

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)

Necula’s Example

Now suppose that this term is an applet, with the safety policy that all applets must be well-typed in System F.

One way to make a PCC is to attach a typing derivation to the term.

Typing Derivation in LF

(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0))) (lam ([X:exp] X)) (all ([T:ty] arr T T)) int (tlam (all ([T:ty] arr T T)) int ([F:exp] (app (app F (lam [X:exp] X)) (app F 0))) ([F:exp][FT:of F (all ([T:ty] arr T T))] (tapp (app F (lam [X:exp] X)) (app F 0) int int (tapp F (lam [X:exp] X) (arr int int) (arr int int) (tins F ([T:ty] arr T T) (arr int int) FT) (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT))) (tapp F 0 int int (tins F ([T:ty] arr T T) int FT) t0)))) (tgen (lam [Y:exp] Y) ([T:ty] arr T T) ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))

Typing Derivation in LFi

(tapp * * (all ([T:*] arr T T)) int (tlam * * * ([F:*][FT:of F (all ([T:ty] arr T T))] (tapp * * int (tapp * * (arr int int) (arr int int) (tins * * * FT) (tlam * * * ([X:*][XT:*] XT))) (tapp * * int int (tins * * * FT) t0)))) (tgen * * ([T:*] (tlam * * * ([Y:*] [YT:*] YT)))))

I think. I did this by hand!

LF Representation

Using 16 bits per token, the LF representation of the typing derivation requires over 2,200 bits.

The LFi representation requires about 700 bits.

(The term itself requires only about 360 bits.)

Skip ahead

A Bit More about LFi

To convert an LF term into an LFi term, a representation algorithm is used. [Necula&Lee, LICS98]

Intuition: When typechecking a term: c M1 M2 … Mn : A (in a context Γ)

we know, if A has no placeholders, that some of the M1…Mn may appear in A.

A Bit More about LFi, cont’d

For example, when the rule

is applied at top level, the first two arguments are present in the term

and thus can be elided.

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)

A Bit More about LFi, cont’d

A similar trick works at lower levels by relying on the fact that typing constraints are solved in a certain order (e.g., right-to-left).

See the paper for complete details.

Can We Do Better?

tz : of z int

ts : {E:exp} of E int -> of (s E) int

tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)

tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)

Determinism

Looking carefully at the typing rules, we observe:

For any typing goal where the term is known but the type is not:

3 possibilities: tgen, tins, other.

If type structure is known, only 2 choices, tapp or other.

How Much Essential Information?

(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0))) (lam ([X:exp] X)) (all ([T:ty] arr T T)) int (tlam (all ([T:ty] arr T T)) int ([F:exp] (app (app F (lam [X:exp] X)) (app F 0))) ([F:exp][FT:of F (all ([T:ty] arr T T))] (tapp (app F (lam [X:exp] X)) (app F 0) int int (tapp F (lam [X:exp] X) (arr int int) (arr int int) (tins F ([T:ty] arr T T) (arr int int) FT) (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT))) (tapp F 0 int int (tins F ([T:ty] arr T T) int FT) t0)))) (tgen (lam [Y:exp] Y) ([T:ty] arr T T) ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))

How Much Essential Information?

There are 15 applications of rules in this derivation.

So, conservatively: ⌈log2 3⌉ × 15 = 2 × 15 = 30 bits

In other words, 30 bits should be enough to encode the choices made by a type inference engine for this term.

Oracle-based Checking

Idea: Implement the proofchecker as a nondeterministic logic interpreter whose

program consists of the derivation rules, and

initial goal is the judgment to be verified.

We will avoid backtracking by relying on the oracle string.
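To make the idea concrete, here is a toy sketch of my own (propositional Horn clauses only, nothing like the real LF-based checker, and in Java): whenever more than one clause could apply to a goal, the choice is read from the oracle instead of being searched for, so the interpreter never backtracks. All names here (OracleChecker, Clause, the example facts) are hypothetical.

import java.util.*;

public class OracleChecker {
    record Clause(String head, List<String> body) {}

    private final Map<String, List<Clause>> clausesByHead = new HashMap<>();
    private final Iterator<Integer> oracle;

    OracleChecker(List<Clause> clauses, Iterator<Integer> oracle) {
        for (Clause c : clauses) {
            clausesByHead.computeIfAbsent(c.head(), k -> new ArrayList<>()).add(c);
        }
        this.oracle = oracle;
    }

    // Prove a goal by reading which matching clause to use from the oracle,
    // then proving its body left to right. No backtracking is ever needed.
    boolean prove(String goal) {
        List<Clause> candidates = clausesByHead.getOrDefault(goal, List.of());
        if (candidates.isEmpty()) return false;
        int choice = candidates.size() == 1 ? 0 : oracle.next();  // fetch a choice
        Clause chosen = candidates.get(choice);
        for (String sub : chosen.body()) {
            if (!prove(sub)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Clause> rules = List.of(
                new Clause("safe", List.of("nonnull", "inbounds")),
                new Clause("safe", List.of("provedElsewhere")),
                new Clause("nonnull", List.of()),
                new Clause("inbounds", List.of()));
        // The oracle says: for "safe", use the first matching clause (choice 0).
        Iterator<Integer> oracle = List.of(0).iterator();
        System.out.println(new OracleChecker(rules, oracle).prove("safe")); // true
    }
}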

Skip ahead

Why Higher-Order?

The syntax of VCs for the Java type-safety policy is as follows:

The LF encodings are simple Horn clauses (and requiring only first-order unification). Higher-order features only for implication and universal quantification.

E ::= x | c E1 … En

F ::= true | F1 /\ F2 | ∀x.F | E | E ⊃ F

Why Higher-Order?

Perhaps first-order Horn logic (or perhaps first-order hereditary Harrop formulas) is enough.

Indeed, first-order expressions and formulas seem to be enough for the VCs in type-safety policies.

However, higher-order and modal logics would require higher-order features.

A Simplification: A Fragment of LF

Level-0 types. A ::= a | A1 A2

Level-1 types (-normal form). B ::= a M1 … Mn | B1 B2 | x:A.B

Level-0 kinds. K ::= Type | A K

Level-0 terms (-normal form). M ::= x:A.M | c M1 … Mn | x M1 … Mn

LF Fragment

This fragment simplifies matters considerably, without restricting the application to PCC.

Level-0 types to encode syntax.

Level-1 types to encode derivations.

No level-1 terms since we never reconstruct a derivation, only verify that one exists.

LF Fragment, cont’d

ty  : type
exp : type

of : exp -> ty -> type

Level-0 types.

Level-1 type family.

Disallowing level-2 and higher type families seems not to have any practical impact.

Logic Interpreter: Goals

G ::= B | M = M’ | ∀x:B.G | ∀x:A.G | T | G1 /\ G2

For Necula’s example, the interpreter will be started with the goal

∃t:ty. of E t

Naïve Interpreter

solve(B1 -> B2) = ∀x:B1. solve(B2)

solve(Πx:A.B) = ∀x:A. solve(B)

solve(a M1 … Mn) = subgoals(B, a M1 … Mn), where B is the type of a level-1 constant or a level-1 quantified variable (in scope), as selected by the oracle.

subgoals(B1 -> B2, B) = ∃x:B1. solve(B2)

subgoals(Πx:A.B’, B) = ∃x:A. solve(B)

subgoals(a M1’ … Mn’, a M1 … Mn) = (M1 = M1’) /\ … /\ (Mn = Mn’)

Back to the example

Consider

solve(of E t)

This consults the oracle.

Since there are 3 level-1 constants that could be used at this point, 2 bits are fetched from the oracle string (to select tapp).

Higher-Order Unification

The unification goals that remain after solve are higher-order and thus only semi-decidable.

A nondeterministic unification procedure (also driven by the oracle string) is used.

Some standard LP optimizations are also used.

Certifying Theorem Proving

Certifying Theorem Proving

Time does not allow a description here.

See: Necula and Lee. Proof generation in the Touchstone theorem prover. CADE’00.

Of particular interest: Proof-generating congruence-closure and simplex algorithms.

Resource Constraints

Bounds on certain resources can be enforced via counting.

In a Reference Interpreter:

Maintain a global counter.

Increment the count for each instruction executed.

Verify for each instruction that the limit is not exceeded.

Use the compiler to optimize away the counting operations.
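A minimal sketch of the counting idea (my own toy in Java, not from the lecture): a reference interpreter charges one tick per step and aborts when the bound is exceeded. The names BoundedInterpreter, tick, and limit are hypothetical.

public class BoundedInterpreter {
    private long count = 0;
    private final long limit;

    public BoundedInterpreter(long limit) { this.limit = limit; }

    // Each step of the toy program costs one "instruction".
    private void tick() {
        if (++count > limit) {
            throw new IllegalStateException("instruction limit exceeded at " + count);
        }
    }

    // Toy program: sum 0..n-1, counting each loop iteration as one instruction.
    public long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            tick();   // the check a compiler could later optimize away
            s += i;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(new BoundedInterpreter(1_000).sum(100));  // ok: 4950
        // new BoundedInterpreter(10).sum(100);  // would throw: limit exceeded
    }
}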

Ten Good Things About PCC

1. Someone else does all the really hard work.

2. The host system changes very little.

...

Logic as a lingua franca

Certifying Prover

CPU

Code

Proof

Proof Engine

Logic as a lingua franca

Certifying Prover

CPU

Proof

Proof Checker

Policy

VC

Code

Language/compiler/machine dependences isolated from the proof checker.

Expressed as predicates and derivations in a formal logic.

Logic as a lingua franca

Certifying Prover

CPU

… iadd
iaload ...

Proof

Proof Checker

Policy

VC

Code can be in any language

once a Safety Policy is supplied.

Logic as a lingua franca

Certifying Prover

CPU

…
addl %eax,%ebx
testl %ecx,%ecx
jz NULLPTR
movl 4(%ecx),%edx
cmpl %edx,%ebx
jae ARRAYBNDS
movl 8(%ecx,%ebx,4),%edx
...

Proof

Proof Checker

Policy

VC


Adequacy of dynamic checks and “wrappers” can be verified.

Logic as a lingua franca

Certifying Prover

CPU

…
add %eax,%ebx
movl 8(%ecx,%ebx,4)
...

Proof

Proof Checker

Policy

VC

Safety of optimized code can be verified.

Ten Good Things About PCC

3. You choose the language.

4. Optimized (“unsafe”) code is OK.

5. Verifies that your optimizer and dynamic checks are OK.

The Role of Programming Languages

Civilized programming languages can provide “safety for free”.

Well-formed/well-typed ⇒ safe.

Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.

Certifying Compilers [Necula & Lee, PLDI’98]

Intuition: The compiler “knows” why each translation step is semantics-preserving.

So, have it generate a proof that safety is preserved.

“Small theorems about big programs.”

Don’t try to verify the whole compiler, but only each output it generates.

Automation via Certifying Compilation

Certifying Compiler

CPU

Proof Checker

Policy

VC

Source code

Proof

Object code

Looks and smells like a compiler.

% spjc foo.java bar.class baz.c -ljdk1.2.2

Ten Good Things About PCC

6. Can sometimes be easy-to-use.

7. You can still be a “hero theorem hacker” if you want.

...

Ten Good Things About PCC

8. Proofs are a “semantic checksum”.

9. Possibility for richer safety policies.

10. Co-exists peacefully with crypto.

Acknowledgments

George Necula.

Robert Harper and Frank Pfenning.

Mark Plesko, Michael Donohue, and Guy Bialostocki.
