Languages of the future: Ωmega, the 701st programming language

Tim Sheard

Portland State University (formerly from OGI/OHSU)

What’s wrong with today’s languages?

• The semantic gap – What does the programmer know about the program? How is this expressed?

• The temporal gap – Systems are “configured” with new knowledge at many different times: compile-time, link-time, run-time. How is this expressed?

What will languages of the future be like?

• Support reasoning about a program from within the programming language.

• Within the reach of most programmers – No Ph.D. required.

• Support all of today’s capabilities but organize them in different ways.
  – Separate powerful but risky features from the rest of the program, spell out the obligations needed to control the risk, and ensure that the obligations are met.
  – Provide a flexible hierarchy of temporal stages. Track important attributes across stages.

How do we get there?

• In small steps, I’m afraid . . .

• Two small contributions
  – Putting the Curry-Howard isomorphism to work for regular programmers
  – Exploiting staged computation

• In this talk, I’ll only talk about the first one

Step 1- Putting Curry-Howard to work

• Programming by manipulating proofs of important semantic properties
  – What is a proof?
  – How do we exploit proofs?

• Ωmega is a new point in the design space, somewhere between
  – A programming language
  – A logic

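[Figure: systems arranged along a spectrum from logics and proof assistants (Isabelle, Coq, Elf, NuPRL, Alfa) to programming languages (Haskell, O’Caml, Python, Java, Pascal, C++, C)]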

We need something in between the two extremes!

Dimensions

Formal methods systems

– Have too few users. We can’t solve the world’s problems with a handful of users. And, for the most part, the users are “thinkers”, not “hackers”.

– The systems themselves are used to reason about systems, but aren’t designed to execute programs. For the most part, they don’t have rich libraries, I/O etc.

– Have a steep learning curve. “It takes a Ph.D. to learn to effectively use these tools.”

Steps between the “concrete” and the “clouds”

• Train more users to use formal systems, or add formal features to lower level languages so existing programmers can use formal methods.

• Design practical extensions for formal systems and build robust compilers for them, or add formal extensions to practical languages.

[Figure: the spectrum of systems again, from formal systems (Isabelle, Coq, Elf, NuPRL, Alfa) to programming languages (Haskell, O’Caml, Python, Java, Pascal, C++, C)]

Curry-Howard

• Types are properties

• Programs are proofs
  – A program with type T witnesses that there exists a program with type T (the type is inhabited).

• If all we have is simple types, like Int or (Bool,String) or [Tree Bool], then the properties are too simple for the programs to be very useful as proofs.

What is a proof?

3

Am I odd or even?

3 is odd, if

2 is even, if

1 is odd, if

0 is even

Requirements for a legal proof

• Even is always stacked above odd

• Odd is always stacked below even

• The numeral decreases by one in each stack

• Every stack ends with 0

[Figure: the proof stack “3 is odd”, “2 is even”, “1 is odd”, “0 is even”, with the steps annotated 3 – 1 = 2, 2 – 1 = 1, 1 – 1 = 0]

Algebraic Datatypes

• Inductively formed structured data
  – Generalizes enumerations, records & tagged variants

• data Color = Red | Blue | Green

• data Address = A Number Street Town Province MailCode

• data Person = Teacher [Class] | Student Major

• Types are used to prevent the construction of ill-formed data.

• Pattern matching allows abstract high level (yet still efficient) access

ADT’s provide an abstract interface to heap data.

data Tree a
  = Fork (Tree a) (Tree a)
  | Node a
  | Tip

• Fork :: Tree a -> Tree a -> Tree a

• Node :: a -> Tree a

• Tip :: Tree a

sum :: Tree Int -> Int
sum Tip = 0
sum (Node x) = x
sum (Fork m n) = sum m + sum n

Functions defined with pattern matching

Note the “data” declaration introduces values and functions that construct instances of the new type.

We can define parametric polymorphic data

Inductively defined data allows structures of unbounded size

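Fork (Fork (Node 5) Tip) Tip

[Figure: the same tree drawn as a diagram: a Fork at the root whose children are a Fork (with children Node 5 and Tip) and a Tip]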

ADT Type Restrictions

data Tree a
  = Fork (Tree a) (Tree a)
  | Node a
  | Tip

• Fork :: Tree a -> Tree a -> Tree a

• Node :: a -> Tree a

• Tip :: Tree a

Restriction: the range of every constructor matches exactly the type being defined

Integer Indexed Type-Constructors

Z:: Even 0

E:: Odd m -> Even (m+1)

O:: Even m -> Odd (m+1)

O(E (O Z))

:: Odd (1+1+1+0)

O(E(O Z)) :: Odd 3

E(O Z):: Even 2

O Z :: Odd 1

Z :: Even 0

Note Even and Odd are type constructors indexed by integers

Generalized Algebraic Data Structures

• Like ADT

• Remove the range-type restriction

• Allow type constructors to be indexed by things other than normal types.

The “kind” decl introduces new “types”

• Allow algebraic definitions to define new “kinds” as well as new “data types”

• Example of new type

data List a = Nil | Cons a (List a)

  • Nil and Cons are new values.
  • They are classified by type List
  • Nil :: List a
  • Cons :: a -> List a -> List a

• Example of new kind

kind Nat = Zero | Succ Nat

  • Zero and Succ are new types.
  • They are classified by the kind Nat
  • Zero :: Nat
  • Succ :: Nat ~> Nat
  • Succ Zero :: Nat

A hierarchy of values, types, kinds, sorts, …

[Figure: four levels labelled values, types, kinds, sorts. Values: 5, [5]. Types: Int, [Int], [ ], Zero, Succ. Kinds: *, * ~> *, Nat, Nat ~> Nat. Sorts: *1, *2.]

GADT in Ωmega

kind Nat = Zero | Succ Nat

data Even n = Z where n = Zero
            | ex m . E (Odd m) where n = Succ m

data Odd n = ex m . O (Even m) where n = Succ m

Even and Odd are proof constructors

Zero and Succ encode the natural numbers at the type level

Z:: Even Zero

E:: Odd m -> Even (Succ m)

O:: Even m -> Odd (Succ m)

• Note the different ranges in Z, E and O

• The types enforce well-formedness.

O(E(O Z)) :: Odd 3

E(O Z):: Even 2

O Z :: Odd 1

Z :: Even 0
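The Even/Odd proofs above can be approximated in Haskell. This is a minimal sketch, assuming GHC with the DataKinds and GADTs extensions standing in for Ωmega's "kind" declarations and "where" clauses; the helper names threeIsOdd, evenToInt and oddToInt are made up for the illustration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- With DataKinds, Zero and Succ are promoted to types of kind Nat,
-- which plays the role of Ωmega's "kind Nat = Zero | Succ Nat".
data Nat = Zero | Succ Nat

-- A value of type Even n (or Odd n) is a proof that n is even (or odd).
data Even (n :: Nat) where
  Z :: Even 'Zero
  E :: Odd m -> Even ('Succ m)

data Odd (n :: Nat) where
  O :: Even m -> Odd ('Succ m)

-- The proof that 3 is odd, built like the stack on the earlier slide.
threeIsOdd :: Odd ('Succ ('Succ ('Succ 'Zero)))
threeIsOdd = O (E (O Z))

-- Forget the index and recover the number a proof talks about.
evenToInt :: Even n -> Int
evenToInt Z     = 0
evenToInt (E o) = 1 + oddToInt o

oddToInt :: Odd n -> Int
oddToInt (O e) = 1 + evenToInt e
```

Changing the annotation on threeIsOdd to an even index makes the definition fail to type check, which is the sense in which the types enforce well-formedness.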

Removing the restriction allows indexed types

• The parameter of a type constructor (e.g. the “a” in “T a”) says something about the values with type “T a”
  – phantom types
  – indexed types

• Consider an expression language:

data Exp = Eint Int
         | Ebool Bool
         | Eplus Exp Exp
         | Eless Exp Exp
         | Eif Exp Exp Exp
         | Ex -- Int variable
         | Eb -- Bool variable

But what about terms like: (Eif (Eint 3) (Eint 0) (Eint 9))

if b then 3 else x+1

(Eif Eb (Eint 3) (Eplus Ex (Eint 1)))

Imagine a type-indexed Term datatype

Int :: Int -> Term Int

Bool :: Bool -> Term Bool

Plus :: Term Int -> Term Int -> Term Int

Less :: Term Int -> Term Int -> Term Bool

If :: Term Bool -> Term a -> Term a -> Term a

X :: Term Int

B :: Term Bool

Note the different range types!

Type-indexed Data

• Benefits
  – The type system disallows ill-formed Terms like: (If (Int 3) (Int 0) (Int 9))
  – Documentation
  – With the right types, such objects act like proofs

Why is (Term a) like a proof?

• A value “x” of type “Term a” is like a judgment

Γ ├ x : a

The type system ensures that only valid judgments can be constructed. Having a value of type “Term a” guarantees (i.e. is a proof) that the term is well typed.

if b then 3 else x+1

(If B (Int 3) (Plus X (Int 1)))

Derivation, read from the conclusion back to its premises:

Γ ├ if b then 3 else x+1 : Int
  Γ ├ b : Bool        (since Γ b = Bool)
  Γ ├ 3 : Int
  Γ ├ x+1 : Int
    Γ ├ x : Int       (since Γ x = Int)
    Γ ├ 1 : Int

Type-indexed Terms

data Term a
  = Int Int where a=Int
  | Bool Bool where a=Bool
  | Plus (Term Int) (Term Int) where a=Int
  | Less (Term Int) (Term Int) where a=Bool
  | If (Term Bool) (Term a) (Term a)
  | X where a = Int
  | B where a = Bool

Int :: forall a.(a=Int) => Int -> Term a

We can specialize this kind of type to the ones we want

Int :: Int -> Term Int
Bool :: Bool -> Term Bool
Plus :: Term Int -> Term Int -> Term Int
Less :: Term Int -> Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
X :: Term Int
B :: Term Bool

Problem – Type Checking

How do we type pattern matching?

case x of
  (Int n)::Term Int -> . . .
  (Bool b)::Term Bool -> . . .

What type is x? Is it Term Int, or is it Term Bool?

Obligations and Assumptions

Using a Constructor incurs an Obligation

(Int 3)::Term a {Show a=Int}
(Bool True)::Term a {Show a=Bool}

Pattern matching allows the system to make some Assumptions

case x::Term a of
  (Int n)::Term Int -> {Assume a=Int} . . .
  (Bool b)::Term Bool -> {Assume a=Bool} . . .

data Term a = Int Int where a=Int | Bool Bool where a=Bool | . . .

Programming

eval :: Term a -> (Int,Bool) -> a
eval (Int n) env = n
eval (Bool b) env = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If x y z) env = if (eval x env) then (eval y env) else (eval z env)
eval X (n,b) = n
eval B (n,b) = b
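For comparison, the same evaluator can be written with GHC's GADT syntax. This is a sketch under the assumption that Haskell GADTs are a fair stand-in for Ωmega's equality-qualified constructors; the constructors are renamed IntT and BoolT here only to avoid confusion with the built-in types, and the name example is made up.

```haskell
{-# LANGUAGE GADTs #-}

-- Each constructor fixes the index of its result type ("different ranges").
data Term a where
  IntT  :: Int  -> Term Int
  BoolT :: Bool -> Term Bool
  Plus  :: Term Int -> Term Int -> Term Int
  Less  :: Term Int -> Term Int -> Term Bool
  If    :: Term Bool -> Term a -> Term a -> Term a
  X     :: Term Int   -- the Int variable
  B     :: Term Bool  -- the Bool variable

-- The environment holds the values of X and B.
-- Matching on a constructor lets the checker assume a=Int or a=Bool
-- in that clause, so no tags or casts are needed.
eval :: Term a -> (Int, Bool) -> a
eval (IntT n)   _   = n
eval (BoolT b)  _   = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If c t e) env = if eval c env then eval t env else eval e env
eval X (n, _) = n
eval B (_, b) = b

-- if b then 3 else x+1
example :: Term Int
example = If B (IntT 3) (Plus X (IntT 1))
```

An ill-formed term such as If (IntT 3) (IntT 0) (IntT 9) is rejected at compile time, because the first argument of If must have type Term Bool.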

Type Checking

eval :: Term a -> (Int,Bool) -> a

eval (Less x y) env = {Assume a=Bool} eval x env < eval y env

Less :: (a=Bool) => Term Int -> Term Int -> Term a

x :: Term Int
y :: Term Int
(eval x env) :: Int
(eval y env) :: Int
(eval x env < eval y env) :: Bool

Assume a=Bool in this context

Basic approach

• Data is a parameterized generalized-algebraic datatype

• It is indexed by some semantic property

• New Kinds introduce new types that are used as indexes

• Programs use types to maintain semantic properties

• We construct values that are proofs of these properties

• The equality constrained types make it possible

Constructing proofs at runtime

• Suppose we want to read a string from the user, and interpret that string as an expression.

• What if the user types in an expression of the wrong type?

• Build a proof that the term is well typed for the context in which we use it

test :: IO ()
test = do { text <- readln
          ; exp::Exp <- parse text
          ; case typCheck exp of
              Pair Rint x -> print (show (eval x + 2))
              Pair Rbool y -> if (eval y) then print "True" else print "False"
              Fail -> error "Ill typed term"
          }

data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp | Eif Exp Exp Exp | Ex | Eb

A dynamic test of a static property!

Representation Types

data Rep t = Rint where t=Int | Rbool where t=Bool

• “Rep” is a representation type. It is a normal first class value (at run-time) that represents a static (compile-time) type.

• There is a 1-1 correspondence between Rint and Int, and Rbool and Bool. If x :: Rep t then
  – knowing the shape of x determines its type,
  – knowing its type determines its shape.
  – One can’t overemphasize the importance of this!
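A small Haskell sketch of why this correspondence matters: comparing two run-time representations can produce a compile-time proof that two types are equal. It assumes GHC's Data.Type.Equality module from base; the function names sameRep, showAs and castTo are invented for this illustration.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

import Data.Type.Equality ((:~:) (..))

-- Run-time values that represent compile-time types.
data Rep t where
  Rint  :: Rep Int
  Rbool :: Rep Bool

-- Knowing the shape of the Rep determines the type of the second argument.
showAs :: Rep t -> t -> String
showAs Rint  n = "Int: " ++ show n
showAs Rbool b = "Bool: " ++ show b

-- A dynamic comparison that, when it succeeds, yields a static proof a = b.
sameRep :: Rep a -> Rep b -> Maybe (a :~: b)
sameRep Rint  Rint  = Just Refl
sameRep Rbool Rbool = Just Refl
sameRep _     _     = Nothing

-- A type-safe cast built from that proof.
castTo :: Rep a -> Rep b -> a -> Maybe b
castTo ra rb x = case sameRep ra rb of
  Just Refl -> Just x   -- here the checker knows a and b are the same type
  Nothing   -> Nothing
```

For instance, castTo Rint Rint 3 returns Just 3, while castTo Rint Rbool 3 returns Nothing.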

Untyped Terms and Judgments

data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp | Eif Exp Exp Exp | Ex | Eb

data Judgment = Fail | exists t . Pair (Rep t) (Term t)

Constructing a Proof

typCheck :: Exp -> Judgment

typCheck (Eint n) = Pair Rint (Int n)
typCheck (Ebool b) = Pair Rbool (Bool b)
typCheck Ex = Pair Rint X
typCheck Eb = Pair Rbool B
typCheck (Eplus x y) =
  case (typCheck x, typCheck y) of
    (Pair Rint a, Pair Rint b) -> Pair Rint (Plus a b)
    _ -> Fail

More cases …

typCheck (Eless x y) =
  case (typCheck x, typCheck y) of
    (Pair Rint a, Pair Rint b) -> Pair Rbool (Less a b)
    _ -> Fail
typCheck (Eif x y z) =
  case (typCheck x, typCheck y, typCheck z) of
    (Pair Rbool a, Pair Rint b, Pair Rint c) -> Pair Rint (If a b c)
    (Pair Rbool a, Pair Rbool b, Pair Rbool c) -> Pair Rbool (If a b c)
    _ -> Fail
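The whole pipeline from untyped Exp to a checked, evaluated Term can be sketched in Haskell as follows. This is an approximation, not the Ωmega source: GADT syntax replaces the "where" clauses, an existential constructor replaces "exists t", the constructors Int and Bool are renamed IntT and BoolT, and runExp is a hypothetical driver added for the example.

```haskell
{-# LANGUAGE GADTs #-}

-- Untyped terms, e.g. as produced by a parser.
data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp
         | Eif Exp Exp Exp | Ex | Eb

-- Typed terms: only well-typed ones can be built.
data Term a where
  IntT  :: Int  -> Term Int
  BoolT :: Bool -> Term Bool
  Plus  :: Term Int -> Term Int -> Term Int
  Less  :: Term Int -> Term Int -> Term Bool
  If    :: Term Bool -> Term a -> Term a -> Term a
  X     :: Term Int
  B     :: Term Bool

data Rep t where
  Rint  :: Rep Int
  Rbool :: Rep Bool

-- Pair hides the type t but keeps its run-time representation.
data Judgment where
  Fail :: Judgment
  Pair :: Rep t -> Term t -> Judgment

typCheck :: Exp -> Judgment
typCheck (Eint n)  = Pair Rint (IntT n)
typCheck (Ebool b) = Pair Rbool (BoolT b)
typCheck Ex        = Pair Rint X
typCheck Eb        = Pair Rbool B
typCheck (Eplus x y) = case (typCheck x, typCheck y) of
  (Pair Rint a, Pair Rint b) -> Pair Rint (Plus a b)
  _                          -> Fail
typCheck (Eless x y) = case (typCheck x, typCheck y) of
  (Pair Rint a, Pair Rint b) -> Pair Rbool (Less a b)
  _                          -> Fail
typCheck (Eif x y z) = case (typCheck x, typCheck y, typCheck z) of
  (Pair Rbool a, Pair Rint b,  Pair Rint c)  -> Pair Rint  (If a b c)
  (Pair Rbool a, Pair Rbool b, Pair Rbool c) -> Pair Rbool (If a b c)
  _                                          -> Fail

eval :: Term a -> (Int, Bool) -> a
eval (IntT n)   _   = n
eval (BoolT b)  _   = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If c t e) env = if eval c env then eval t env else eval e env
eval X (n, _) = n
eval B (_, b) = b

-- Hypothetical driver: check dynamically, then evaluate with static guarantees.
runExp :: Exp -> (Int, Bool) -> Maybe String
runExp e env = case typCheck e of
  Pair Rint  t -> Just (show (eval t env))
  Pair Rbool t -> Just (show (eval t env))
  Fail         -> Nothing
```

The only place a type error can surface at run time is the Fail case; once a Pair is in hand, eval cannot go wrong.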

Our Original Goals

• Build heterogeneous meta-programming systems– Meta-language ≠ object-language

• Type system of the meta-language guarantees semantic properties of object-language

• Experiment with Omega
  – Finding new uses for the power of the type system
  – Translating existing language-based ideas into Omega
    • staged interpreters
    • proof carrying code
    • language-based security

Serendipity

Ωmega’s type system is good for statically guaranteeing all sorts of properties.
– Lists with statically known length
– Red–Black Trees
– Binomial Heaps
– Dynamic Typing
– Proof Carrying Code
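As one illustration of the first item, a list with statically known length can be sketched in Haskell with a Nat index. The names Vec, vhead, vzip and vtoList are choices made for this sketch, not taken from the talk.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level naturals used as the length index.
data Nat = Zero | Succ Nat

-- A list whose length is part of its type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Zero a
  VCons :: a -> Vec n a -> Vec ('Succ n) a

-- Total head: the index rules out the empty list, so no error case exists.
vhead :: Vec ('Succ n) a -> a
vhead (VCons x _) = x

-- Zipping requires both arguments to have the same length, checked statically.
vzip :: Vec n a -> Vec n b -> Vec n (a, b)
vzip VNil         VNil         = VNil
vzip (VCons x xs) (VCons y ys) = VCons (x, y) (vzip xs ys)

-- Forget the index to get an ordinary list back.
vtoList :: Vec n a -> [a]
vtoList VNil         = []
vtoList (VCons x xs) = x : vtoList xs
```

Calling vhead VNil or zipping vectors of different lengths is a compile-time type error rather than a run-time failure.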

Conclusion

• Stating static properties is a good way to think about programming

• It may lead to more reliable programs

• The compiler should ensure that programs maintain the stated properties

• Generalizing algebraic datatypes makes it all possible
  – Ranges other than “T a”
  – “a” becomes an index describing a static property of x::T a
  – New kinds let “a” have arbitrary structure
  – Computing over “a” is sometimes necessary

Contributions

• “Logical Framework” ideas translated into everyday programming idioms.

• Manipulating strongly-typed object languages in a semantics-preserving manner.

• Implementation of Cheney and Hinze’s equality qualified types in a functional programming language.

• Use of new kinds to build new kinds of index sets.

• Representation (or Singleton) Types as a way to seamlessly switch between static and dynamic typing.

• Demonstration
  – Show some practical techniques
  – Lots of examples

• Resource: www.cs.pdx.edu/~sheard
  – Including Emir Pasalic’s Thesis.

Related Work

• Logical Frameworks: LF – Bob Harper et al.

• Refinement types – Frank Pfenning

• Inductive Families
  – In type theory -- Peter Dybjer (http://www.cs.chalmers.se/~peterd/papers/inductive.html)
  – Epigram -- Zhaohui Luo, James McKinna, Paul Callaghan, and Conor McBride (http://www.dur.ac.uk/CARG/epigram.html)

• First-class phantom types -- Cheney and Hinze

• Guarded Recursive Data Types
  – Hongwei Xi and his students
    • Guarded Recursive Datatype Constructors
    • A Typeful Approach to Object-Oriented Programming with Multiple Inheritance
    • Meta-Programming through Typeful Code Representation
  – Constraint-based type inference for guarded algebraic data types -- Vincent Simonet and François Pottier
  – A Systematic Translation of Guarded Recursive Data Types to Existential Types -- Martin Sulzmann
  – Polymorphic typed defunctionalization -- Pottier and Gauthier
  – Towards efficient, typed LR parsers -- Pottier and Régis-Gianas

• First Class Type Equality
  – A Lightweight Implementation of Generics and Dynamics -- Hinze and Cheney
  – Typing Dynamic Typing -- Baars and Swierstra
  – Type-safe cast: Functional pearl -- Weirich

• Rogue-Sigma-Pi as a meta-language for LF -- Aaron Stump

• Wobbly types: type inference for generalised algebraic data types -- Peyton Jones, Washburn and Weirich

• Cayenne - A Language with Dependent Types -- Lennart Augustsson

Step 2 – Using Staging

• Suppose you are writing a document retrieval system.

• The user types in a query, and you want to retrieve all documents that meet the query.

• The query contains information not known until run-time, but which is constant across all accesses in the document base.

• E.g. Width – Indent < Depth && Keyword == “Naval”

Width – Indent < Depth && Keyword == “Naval”

• If Width and Indent are constant across all queries, but Depth and Keyword are fields of each document

• How can we efficiently build an execution engine that translates the user’s query (typed as a String) into executable code?

Code in Omega

prompt> [| 5 + 5 |]
[| 5 + 5 |] : Code Int

prompt> run [| 5 + 5 |]
10 : Int

prompt> let x = [| 23 |]
x

prompt> let y = [| 56 - $x |]
y

prompt> y
[| 56 - 23 |] : Code Int

Dynamic values

data Dyn x
  = Dint Int where x = Int
  | Dbool Bool where x = Bool
  | Dyn (Code x)

dynamize :: Dyn a -> Code a
dynamize (Dint n) = lift n
dynamize (Dbool b) = lift b
dynamize (Dyn x) = x

translation

trans :: Term a -> (Dyn Int,Dyn Int) -> Dyn a
trans (Int n) (x,y) = Dint n
trans (Bool b) (x,y) = Dbool b
trans X (x,y) = x
trans Y (x,y) = y
trans (Plus a b) xy =
  case (trans a xy, trans b xy) of
    (Dint m,Dint n) -> Dint(m+n)
    (m,n) -> Dyn [| $(dynamize m) + $(dynamize n) |]
trans (If a b c) xy =
  case trans a xy of
    (Dbool test) -> if test then trans b xy else trans c xy
    (Dyn test) -> Dyn [| if $test then $(dynamize (trans b xy)) else $(dynamize (trans c xy)) |]

Applying the translation

-- if 3 < 5 then (x + (5 + 2)) else y
x1 = If (Less (Int 3) (Int 5)) (Plus X (Plus (Int 5) (Int 2))) Y

w term = [| \ x y -> $(dynamize(trans term (Dyn [| x |],Dyn [| y |]))) |]

-- w x1
-- [| \ x y -> x + 7 |] : Code (Int -> Int -> Int)
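The translation can be imitated in plain Haskell if the Code type is replaced by a small syntax tree. That replacement is an assumption of this sketch: Ωmega's brackets and escapes build real code, whereas the Code, pretty and residual definitions below are hypothetical stand-ins that only make the residual program visible. A Less case, which the slide omits, is added so that x1 can be translated.

```haskell
{-# LANGUAGE GADTs #-}

-- Object-language terms with two run-time variables X and Y.
data Term a where
  IntT  :: Int  -> Term Int
  BoolT :: Bool -> Term Bool
  Plus  :: Term Int -> Term Int -> Term Int
  Less  :: Term Int -> Term Int -> Term Bool
  If    :: Term Bool -> Term a -> Term a -> Term a
  X     :: Term Int
  Y     :: Term Int

-- Stand-in for Ωmega's Code type (assumption): a typed tree of residual code.
data Code a where
  CInt  :: Int -> Code Int
  CBool :: Bool -> Code Bool
  CVar  :: String -> Code Int
  CAdd  :: Code Int -> Code Int -> Code Int
  CLess :: Code Int -> Code Int -> Code Bool
  CIf   :: Code Bool -> Code a -> Code a -> Code a

-- A value is either known now (Dint, Dbool) or only as code for later (Dyn).
data Dyn a where
  Dint  :: Int -> Dyn Int
  Dbool :: Bool -> Dyn Bool
  Dyn   :: Code a -> Dyn a

dynamize :: Dyn a -> Code a
dynamize (Dint n)  = CInt n
dynamize (Dbool b) = CBool b
dynamize (Dyn c)   = c

-- Compute what is already known; emit code only for the rest.
trans :: Term a -> (Dyn Int, Dyn Int) -> Dyn a
trans (IntT n)  _      = Dint n
trans (BoolT b) _      = Dbool b
trans X         (x, _) = x
trans Y         (_, y) = y
trans (Plus a b) xy = case (trans a xy, trans b xy) of
  (Dint m, Dint n) -> Dint (m + n)
  (m, n)           -> Dyn (CAdd (dynamize m) (dynamize n))
trans (Less a b) xy = case (trans a xy, trans b xy) of
  (Dint m, Dint n) -> Dbool (m < n)
  (m, n)           -> Dyn (CLess (dynamize m) (dynamize n))
trans (If a b c) xy = case trans a xy of
  Dbool test -> if test then trans b xy else trans c xy
  Dyn test   -> Dyn (CIf test (dynamize (trans b xy)) (dynamize (trans c xy)))

-- Display only; no parentheses are inserted for nested expressions.
pretty :: Code a -> String
pretty (CInt n)    = show n
pretty (CBool b)   = show b
pretty (CVar v)    = v
pretty (CAdd a b)  = pretty a ++ " + " ++ pretty b
pretty (CLess a b) = pretty a ++ " < " ++ pretty b
pretty (CIf c t e) = "if " ++ pretty c ++ " then " ++ pretty t ++ " else " ++ pretty e

-- if 3 < 5 then (x + (5 + 2)) else y
x1 :: Term Int
x1 = If (Less (IntT 3) (IntT 5)) (Plus X (Plus (IntT 5) (IntT 2))) Y

-- Translate with both variables unknown and show the residual code.
residual :: Term a -> String
residual term = pretty (dynamize (trans term (Dyn (CVar "x"), Dyn (CVar "y"))))
```

Here residual x1 evaluates to "x + 7": the test 3 < 5 and the sum 5 + 2 are computed during translation, and only the addition involving x remains.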

Examples we have done

• Typed, staged interpreters
  – For languages with binding, with patterns, algebraic datatypes

• Type preserving transformations
  – Simplify :: Exp t -> Exp t
  – Cps :: Exp t -> Exp {trans t}

• Proof carrying code

• Data Structures
  – Red-Black trees, Binomial Heaps, Static length lists

• Languages with security properties

• Typed self-describing databases, where meta data in the database describes the database schema

• Programs that slip easily between dynamic and statically typed sections. Type-case is easy to encode with no additional mechanism

Some other examples

• Typed Lambda Calculus

• A Language with Security Domains

• A Language which enforces an interaction protocol

Typed lambda Calculus

Exp with type t in environment s

data V s t
  = ex m . Z where s = (t,m)
  | ex m x . S (V m t) where s = (x,m)

data Exp s t
  = IntC Int where t = Int
  | BoolC Bool where t = Bool
  | Plus (Exp s Int) (Exp s Int) where t = Int
  | Lteq (Exp s Int) (Exp s Int) where t = Bool
  | Var (V s t)

Example Type:

Plus :: forall s t . (t=Int) => Exp s Int -> Exp s Int -> Exp s t
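A Haskell rendering of these environment-indexed expressions, as a sketch: the environment index s is a nested pair type, and a variable is a typed position in it. The functions lookupV and eval are additions for this example and do not appear on the slide.

```haskell
{-# LANGUAGE GADTs #-}

-- A variable of type t in an environment shaped like (t0, (t1, ...)).
data V s t where
  Z :: V (t, m) t            -- the first slot
  S :: V m t -> V (x, m) t   -- skip one slot

-- Expressions of type t that are well scoped in environment s.
data Exp s t where
  IntC  :: Int  -> Exp s Int
  BoolC :: Bool -> Exp s Bool
  Plus  :: Exp s Int -> Exp s Int -> Exp s Int
  Lteq  :: Exp s Int -> Exp s Int -> Exp s Bool
  Var   :: V s t -> Exp s t

-- Looking up a variable cannot fail: its type says the slot exists.
lookupV :: V s t -> s -> t
lookupV Z     (v, _)    = v
lookupV (S n) (_, rest) = lookupV n rest

-- The run-time environment has exactly the shape promised by the index s.
eval :: Exp s t -> s -> t
eval (IntC n)   _   = n
eval (BoolC b)  _   = b
eval (Plus a b) env = eval a env + eval b env
eval (Lteq a b) env = eval a env <= eval b env
eval (Var v)    env = lookupV v env
```

With the environment (5, (True, ())), Var Z has type Int and Var (S Z) has type Bool, so an expression like Plus (Var (S Z)) (IntC 1) is rejected by the type checker.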

Language with Security Domains

Exp with type t in env s in domain d

kind Domain = High | Low

data D t = Lo where t = Low | Hi where t = High

data Dless x y
  = LH where x = Low, y = High
  | LL where x = Low, y = Low
  | HH where x = High, y = High

data Exp s d t
  = Int Int where t = Int
  | Bool Bool where t = Bool
  | Plus (Exp s d Int) (Exp s d Int) where t = Int
  | Lteq (Exp s d Int) (Exp s d Int) where t = Bool
  | forall d2 . Var (V s d2 t) (Dless d2 d)

Language with interaction protocol

Command with store St starting in state x, ending in state y

kind State = Open | Closed

data V s t
  = forall st . Z where s = (t,st)
  | forall st t1 . S (V st t) where s = (t1,st)

data Com st x y
  = forall t . Set (V st t) (Exp st t) where x=y
  | forall a . Seq (Com st x a) (Com st a y)
  | If (Exp st Bool) (Com st x y) (Com st x y)
  | While (Exp st Bool) (Com st x y) where x = y
  | forall t . Declare (Exp st t) (Com (t,st) x y)
  | Open where x = Closed, y = Open
  | Close where x = Open, y = Closed
  | Write (Exp st Int) where x = Open, y = Open

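[Figure: state diagram with two states, Closed and Open; “open” leads from Closed to Open, “close” leads from Open to Closed, and “write” loops on Open]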
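The protocol idea can be tried out in Haskell with a cut-down command type. This sketch drops the store and the expression language to keep it short, which is a simplification of the slide's Com type; the constructors are renamed DoOpen and DoClose so they do not clash with the promoted states, and session and run are invented for the example.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- The two protocol states, promoted to the type level by DataKinds.
data State = Open | Closed

-- A command that starts in state x and ends in state y.
data Com (x :: State) (y :: State) where
  Seq     :: Com x a -> Com a y -> Com x y   -- end state of the first must match
  DoOpen  :: Com 'Closed 'Open
  DoClose :: Com 'Open 'Closed
  Write   :: Int -> Com 'Open 'Open          -- writing requires an open channel

-- A complete session must begin and end in the Closed state.
session :: Com 'Closed 'Closed
session = DoOpen `Seq` (Write 1 `Seq` (Write 2 `Seq` DoClose))

-- A toy interpreter: collect the values written, in order.
run :: Com x y -> [Int]
run (Seq a b) = run a ++ run b
run DoOpen    = []
run DoClose   = []
run (Write n) = [n]

-- Rejected by the type checker, since Write ends Open but DoOpen starts Closed:
-- bad = Write 1 `Seq` DoOpen
```

Protocol violations such as writing before opening, or opening twice, become type errors in the program that builds the command.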
