Languages of the future: Ωmega, the 701st programming language
Tim Sheard
Portland State University (formerly from OGI/OHSU)
What’s wrong with today’s languages?
• The semantic gap
  – What does the programmer know about the program? How is this expressed?
• The temporal gap
  – Systems are “configured” with new knowledge at many different times: compile-time, link-time, run-time. How is this expressed?
What will languages of the future be like?
• Support reasoning about a program from within the programming language.
• Within the reach of most programmers – No Ph.D. required.
• Support all of today’s capabilities but organize them in different ways.
  – Separate powerful but risky features from the rest of the program, spell out the obligations needed to control the risk, and ensure that the obligations are met.
  – Provide a flexible hierarchy of temporal stages. Track important attributes across stages.
How do we get there?
• In small steps, I’m afraid . . .
• Two small contributions
  – Putting the Curry-Howard isomorphism to work for regular programmers
  – Exploiting staged computation
• In this talk, I’ll only talk about the first one
Step 1 – Putting Curry-Howard to work
• Programming by manipulating proofs of important semantic properties
  – What is a proof?
  – How do we exploit proofs?
• Ωmega is a new point in the design space, somewhere between
  – a programming language
  – a logic
[Figure: languages placed between a logic and a programming language: Isabelle, Coq, Elf, NuPRL, Alfa, Haskell, Python, O’Caml, Pascal, Java, C++, C]
We need something in between the two extremes!
Dimensions
Formal methods systems
– Have too few users. We can’t solve the world’s problems with a handful of users. And, for the most part, the users are “thinkers”, not “hackers”.
– The systems themselves are used to reason about systems, but aren’t designed to execute programs. For the most part, they don’t have rich libraries, I/O etc.
– Have a steep learning curve. “It takes a Ph.D. to learn to effectively use these tools.”
Steps between the “concrete” and the “clouds”
• Train more users to use formal systems, or add formal features to lower level languages so existing programmers can use formal methods.
• Design practical extensions for formal systems and build robust compilers for them, or add formal extensions to practical languages.
[Figure: the same languages arranged between the “concrete” and the “clouds”: Isabelle, Coq, Elf, NuPRL, Alfa, Python, O’Caml, Pascal, Java, C++, C, Haskell]
Curry-Howard
• Types are properties
• Programs are proofs
  – A program with type T witnesses that there exists a program with type T.
• If all we have is simple types – like Int or (Bool,String) or [Tree Bool], then the properties are too simple to think of them as very useful proofs.
What is a proof?
3
Am I odd or even?
3 is odd, if
2 is even, if
1 is odd, if
0 is even
Requirements for a legal proof
• Even is always stacked above odd
• Odd is always stacked below even
• The numeral decreases by one in each stack
• Every stack ends with 0
3 is odd (3 – 1 = 2)
2 is even (2 – 1 = 1)
1 is odd (1 – 1 = 0)
0 is even
Algebraic Datatypes
• Inductively formed structured data
  – Generalizes enumerations, records & tagged variants
• data Color = Red | Blue | Green
• data Address = A Number Street Town Province MailCode
• data Person = Teacher [Class] | Student Major
• Types are used to prevent the construction of ill-formed data.
• Pattern matching allows abstract high level (yet still efficient) access
ADTs provide an abstract interface to heap data.
• data Tree a
    = Fork (Tree a) (Tree a)
    | Node a
    | Tip
• Fork :: Tree a -> Tree a -> Tree a
• Node :: a -> Tree a
• Tip :: Tree a
sum :: Tree Int -> Int
sum Tip = 0
sum (Node x) = x
sum (Fork m n) = sum m + sum n
• Functions are defined with pattern matching.
• Note that the “data” declaration introduces values and functions that construct instances of the new type.
• We can define parametrically polymorphic data.
• Inductively defined data allows structures of unbounded size.
Fork (Fork (Node 5) Tip) Tip
[Figure: the value Fork (Fork (Node 5) Tip) Tip drawn as a tree]
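The Tree slide can be tried directly in Haskell. The following is a minimal sketch; the name sumTree is an assumption, chosen only to avoid a clash with the Prelude's sum.

```haskell
-- The Tree type from the slide: data lives at Node, Tip is an empty leaf.
data Tree a
  = Fork (Tree a) (Tree a)
  | Node a
  | Tip

-- sumTree is a hypothetical name for the slide's "sum".
-- One equation per constructor, selected by pattern matching.
sumTree :: Tree Int -> Int
sumTree Tip        = 0
sumTree (Node x)   = x
sumTree (Fork m n) = sumTree m + sumTree n

main :: IO ()
main = print (sumTree (Fork (Fork (Node 5) Tip) Tip))
```

Every constructor here has range Tree a, which is the restriction that generalized algebraic datatypes lift.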
ADT Type Restrictions
• data Tree a
    = Fork (Tree a) (Tree a)
    | Node a
    | Tip
• Fork :: Tree a -> Tree a -> Tree a
• Node :: a -> Tree a
• Tip :: Tree a
Restriction: the range of every constructor matches exactly the type being defined.
Integer Indexed Type-Constructors
Z :: Even 0
E :: Odd m -> Even (m+1)
O :: Even m -> Odd (m+1)
O (E (O Z)) :: Odd (1+1+1+0)
O (E (O Z)) :: Odd 3
E (O Z) :: Even 2
O Z :: Odd 1
Z :: Even 0
Note: Even and Odd are type constructors indexed by integers.
Generalized Algebraic Data Structures
• Like ADT
• Remove the range-type restriction
• Allow type constructors to be indexed by things other than normal types.
The “kind” declaration introduces new “types”
• Allow algebraic definitions to define new “kinds” as well as new “data types”
• Example of a new type
  data List a = Nil | Cons a (List a)
  – Nil and Cons are new values.
  – They are classified by the type List
  – Nil :: List a
  – Cons :: a -> List a -> List a
• Example of a new kind
  kind Nat = Zero | Succ Nat
  – Zero and Succ are new types.
  – They are classified by the kind Nat
  – Zero :: Nat
  – Succ :: Nat ~> Nat
  – Succ Zero :: Nat
A hierarchy of values, types, kinds, sorts, …
[Figure: each level classifies the one below it; the levels are values, types, kinds, sorts (*1, *2)]
5 :: Int :: * :: *1
[5] :: [ Int ] :: *
[ ] :: * ~> *
Zero :: Nat
Succ :: Nat ~> Nat
GADT in Ωmega
kind Nat = Zero | Succ Nat
data Even n = Z where n = Zero
            | ex m . E (Odd m) where n = Succ m
data Odd n = ex m . O (Even m) where n = Succ m
• Even and Odd are proof constructors.
• Zero and Succ encode the natural numbers at the type level.
Z :: Even Zero
E :: Odd m -> Even (Succ m)
O :: Even m -> Odd (Succ m)
• Note the different ranges in Z, E and O
• The types enforce well-formedness.
O (E (O Z)) :: Odd 3
E (O Z) :: Even 2
O Z :: Odd 1
Z :: Even 0
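For readers without Ωmega at hand, the same proof objects can be approximated in GHC Haskell. This is a sketch under assumptions: it uses the DataKinds extension in place of Ωmega's kind declaration, and the helper names threeIsOdd, evenToInt and oddToInt are hypothetical.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- With DataKinds, Zero and Succ are promoted to type-level indexes of kind Nat.
data Nat = Zero | Succ Nat

-- Each constructor has a different range, so only valid proofs type check.
data Even (n :: Nat) where
  Z :: Even 'Zero
  E :: Odd m -> Even ('Succ m)

data Odd (n :: Nat) where
  O :: Even m -> Odd ('Succ m)

-- A proof that three is odd: the stack "3 odd, 2 even, 1 odd, 0 even".
threeIsOdd :: Odd ('Succ ('Succ ('Succ 'Zero)))
threeIsOdd = O (E (O Z))

-- Count constructors to recover the ordinary number a proof is about.
evenToInt :: Even n -> Int
evenToInt Z     = 0
evenToInt (E o) = 1 + oddToInt o

oddToInt :: Odd n -> Int
oddToInt (O e) = 1 + evenToInt e

main :: IO ()
main = print (oddToInt threeIsOdd)
```

Changing the signature of threeIsOdd to claim Even of the same index is rejected by the type checker, which is the sense in which the value is a proof.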
Removing the restriction allows indexed types
• The parameter of a type constructor (e.g. the “a” in “T a”) says something about the values with type “T a”
  – phantom types
  – indexed types
• Consider an expression language:
data Exp = Eint Int
         | Ebool Bool
         | Eplus Exp Exp
         | Eless Exp Exp
         | Eif Exp Exp Exp
         | Ex -- Int variable
         | Eb -- Bool variable
But what about terms like: (Eif (Eint 3) (Eint 0) (Eint 9))
if b then 3 else x+1
(Eif Eb (Eint 3) (Eplus Ex (Eint 1)))
Imagine a type-indexed Term datatype
Int :: Int -> Term Int
Bool :: Bool -> Term Bool
Plus :: Term Int -> Term Int -> Term Int
Less :: Term Int -> Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
X :: Term Int
B :: Term Bool
Note the different range types!
Type-indexed Data
• Benefits
  – The type system disallows ill-formed Terms like: (If (Int 3) (Int 0) (Int 9))
  – Documentation
  – With the right types, such objects act like proofs
Why is (Term a) like a proof?
• A value “x” of type “Term a” is like a judgment Γ ⊢ x : a
• The type system ensures that only valid judgments can be constructed. Having a value of type “Term a” guarantees (i.e. is a proof) that the term is well typed.
if b then 3 else x+1
(If B (Int 3) (Plus X (Int 1)))
Derivation, where Γ b = Bool and Γ x = Int:
  Γ ⊢ x : Int and Γ ⊢ 1 : Int give Γ ⊢ x+1 : Int
  Γ ⊢ b : Bool, Γ ⊢ 3 : Int and Γ ⊢ x+1 : Int give Γ ⊢ if b then 3 else x+1 : Int
Type-indexed Terms
data Term a
  = Int Int where a = Int
  | Bool Bool where a = Bool
  | Plus (Term Int) (Term Int) where a = Int
  | Less (Term Int) (Term Int) where a = Bool
  | If (Term Bool) (Term a) (Term a)
  | X where a = Int
  | B where a = Bool
Int :: forall a . (a = Int) => Int -> Term a
We can specialize this kind of type to the ones we want:
Int :: Int -> Term Int
Bool :: Bool -> Term Bool
Plus :: Term Int -> Term Int -> Term Int
Less :: Term Int -> Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
X :: Term Int
B :: Term Bool
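The “where a = Int” clauses have a close counterpart in GHC Haskell, where an equality can be written as a constructor context. The sketch below assumes GHC's a ~ Int notation stands in for Ωmega's equality qualification; the function literal is a hypothetical helper added for illustration.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- Each constructor stores its equality constraint, mirroring "where a = Int".
-- Constructor names may reuse type names because they live in separate namespaces.
data Term a
  = (a ~ Int)  => Int Int
  | (a ~ Bool) => Bool Bool
  | (a ~ Int)  => Plus (Term Int) (Term Int)
  | (a ~ Bool) => Less (Term Int) (Term Int)
  | If (Term Bool) (Term a) (Term a)
  | (a ~ Int)  => X
  | (a ~ Bool) => B

-- Matching on a constructor brings its equality into scope,
-- so n :: Int may be returned where a value of type a is expected.
literal :: Term a -> Maybe a
literal (Int n)  = Just n
literal (Bool b) = Just b
literal _        = Nothing

main :: IO ()
main = print (literal (Int 3))
```

Using Int 3 at type Term Int discharges the obligation a ~ Int; matching on Int n supplies the same fact as an assumption.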
Problem – Type Checking
How do we type pattern matching?
case x of
  (Int n) :: Term Int -> . . .
  (Bool b) :: Term Bool -> . . .
What type is x? Is it Term Int, or is it Term Bool?
Obligations and Assumptions
Using a constructor incurs an obligation
(Int 3) :: Term a {Show a=Int}
(Bool True) :: Term a {Show a=Bool}
Pattern matching allows the system to make some assumptions
case x :: Term a of
  (Int n) :: Term Int -> {Assume a=Int} . . .
  (Bool b) :: Term Bool -> {Assume a=Bool} . . .
data Term a = Int Int where a=Int | Bool Bool where a=Bool | . . .
Programming
eval :: Term a -> (Int,Bool) -> a
eval (Int n) env = n
eval (Bool b) env = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If x y z) env = if (eval x env) then (eval y env) else (eval z env)
eval X (n,b) = n
eval B (n,b) = b
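The evaluator above carries over to GHC Haskell almost unchanged. This is a sketch using GADT syntax, which states each constructor's range directly instead of through a where clause; the environment (10, False) in main is an arbitrary choice for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- The type index records the object-level type of each term.
data Term a where
  Int  :: Int  -> Term Int
  Bool :: Bool -> Term Bool
  Plus :: Term Int -> Term Int -> Term Int
  Less :: Term Int -> Term Int -> Term Bool
  If   :: Term Bool -> Term a -> Term a -> Term a
  X    :: Term Int   -- the Int variable
  B    :: Term Bool  -- the Bool variable

-- The environment supplies the values of X and B.
-- No run-time tags or checks are needed: each clause is typed
-- under the assumption introduced by its pattern.
eval :: Term a -> (Int, Bool) -> a
eval (Int n)    _   = n
eval (Bool b)   _   = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If x y z) env = if eval x env then eval y env else eval z env
eval X (n, _)       = n
eval B (_, b)       = b

main :: IO ()
main = print (eval (If B (Int 3) (Plus X (Int 1))) (10, False))
```

The ill-formed term If (Int 3) (Int 0) (Int 9) does not compile, since Int 3 has type Term Int where Term Bool is required.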
Type Checking
eval :: Term a -> (Int,Bool) -> a
eval (Less x y) env = {Assume a=Bool} eval x env < eval y env
Less :: (a=Bool) => Term Int -> Term Int -> Term a
x :: Term Int
y :: Term Int
(eval x env) :: Int
(eval y env) :: Int
(eval x env < eval y env) :: Bool
Assume a=Bool in this context
Basic approach
• Data is a parameterized generalized-algebraic datatype
• It is indexed by some semantic property
• New kinds introduce new types that are used as indexes
• Programs use types to maintain semantic properties
• We construct values that are proofs of these properties
• The equality constrained types make it possible
Constructing proofs at runtime
• Suppose we want to read a string from the user, and interpret that string as an expression.
• What if the user types in an expression of the wrong type?
• Build a proof that the term is well typed for the context in which we use it
test :: IO ()
test = do { text <- readln
          ; exp::Exp <- parse text
          ; case typCheck exp of
              Pair Rint x -> print (show (eval x + 2))
              Pair Rbool y -> if (eval y) then print "True" else print "False"
              Fail -> error "Ill typed term"
          }
data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp | Eif Exp Exp Exp | Ex | Eb
A dynamic test of a static property!
Representation Types
data Rep t = Rint where t=Int | Rbool where t=Bool
• “Rep” is a representation type. It is a normal first class value (at run-time) that represents a static (compile-time) type.
• There is a 1-1 correspondence between Rint and Int, and between Rbool and Bool. If x :: Rep t then
  – knowing the shape of x determines its type,
  – knowing its type determines its shape.
  – One can’t overemphasize the importance of this!
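The correspondence between shape and type can be seen in a small GHC Haskell sketch. The functions render and sameRep are hypothetical helpers added for illustration; sameRep uses the standard Data.Type.Equality module to return a type-equality proof.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

import Data.Maybe (isJust)
import Data.Type.Equality ((:~:)(Refl))

-- A run-time value whose shape mirrors a compile-time type.
data Rep t where
  Rint  :: Rep Int
  Rbool :: Rep Bool

-- Knowing the shape determines the type: after matching Rint, t is Int.
render :: Rep t -> t -> String
render Rint  n = "Int " ++ show n
render Rbool b = "Bool " ++ show b

-- Comparing two shapes at run time yields a static proof that the types agree.
sameRep :: Rep a -> Rep b -> Maybe (a :~: b)
sameRep Rint  Rint  = Just Refl
sameRep Rbool Rbool = Just Refl
sameRep _     _     = Nothing

main :: IO ()
main = do
  putStrLn (render Rint 42)
  putStrLn (render Rbool True)
  print (isJust (sameRep Rint Rbool))
```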
Untyped Terms and Judgments
data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp | Eif Exp Exp Exp | Ex | Eb
data Judgment = Fail | exists t . Pair (Rep t) (Term t)
Constructing a Proof
typCheck :: Exp -> Judgment
typCheck (Eint n) = Pair Rint (Int n)
typCheck (Ebool b) = Pair Rbool (Bool b)
typCheck Ex = Pair Rint X
typCheck Eb = Pair Rbool B
typCheck (Eplus x y) =
  case (typCheck x, typCheck y) of
    (Pair Rint a, Pair Rint b) -> Pair Rint (Plus a b)
    _ -> Fail
More cases …
typCheck (Eless x y) =
  case (typCheck x, typCheck y) of
    (Pair Rint a, Pair Rint b) -> Pair Rbool (Less a b)
    _ -> Fail
typCheck (Eif x y z) =
  case (typCheck x, typCheck y, typCheck z) of
    (Pair Rbool a, Pair Rint b, Pair Rint c) -> Pair Rint (If a b c)
    (Pair Rbool a, Pair Rbool b, Pair Rbool c) -> Pair Rbool (If a b c)
    _ -> Fail
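Putting the pieces together, the checker can be sketched as one self-contained GHC Haskell program. The function report is a hypothetical stand-in for the slide's test, and the fixed environment (10, False) is an assumption made so that eval has values for the two variables.

```haskell
{-# LANGUAGE GADTs #-}

-- Untyped terms, as a parser would produce them.
data Exp = Eint Int | Ebool Bool | Eplus Exp Exp | Eless Exp Exp
         | Eif Exp Exp Exp | Ex | Eb

-- Typed terms, indexed by their object-level type.
data Term a where
  Int  :: Int  -> Term Int
  Bool :: Bool -> Term Bool
  Plus :: Term Int -> Term Int -> Term Int
  Less :: Term Int -> Term Int -> Term Bool
  If   :: Term Bool -> Term a -> Term a -> Term a
  X    :: Term Int
  B    :: Term Bool

data Rep t where
  Rint  :: Rep Int
  Rbool :: Rep Bool

-- Either failure, or a typed term packaged with a run-time witness of its type.
data Judgment where
  Fail :: Judgment
  Pair :: Rep t -> Term t -> Judgment

typCheck :: Exp -> Judgment
typCheck (Eint n)  = Pair Rint (Int n)
typCheck (Ebool b) = Pair Rbool (Bool b)
typCheck Ex        = Pair Rint X
typCheck Eb        = Pair Rbool B
typCheck (Eplus x y) = case (typCheck x, typCheck y) of
  (Pair Rint a, Pair Rint b) -> Pair Rint (Plus a b)
  _                          -> Fail
typCheck (Eless x y) = case (typCheck x, typCheck y) of
  (Pair Rint a, Pair Rint b) -> Pair Rbool (Less a b)
  _                          -> Fail
typCheck (Eif x y z) = case (typCheck x, typCheck y, typCheck z) of
  (Pair Rbool a, Pair Rint b,  Pair Rint c)  -> Pair Rint (If a b c)
  (Pair Rbool a, Pair Rbool b, Pair Rbool c) -> Pair Rbool (If a b c)
  _                                          -> Fail

eval :: Term a -> (Int, Bool) -> a
eval (Int n)    _   = n
eval (Bool b)   _   = b
eval (Plus x y) env = eval x env + eval y env
eval (Less x y) env = eval x env < eval y env
eval (If x y z) env = if eval x env then eval y env else eval z env
eval X (n, _)       = n
eval B (_, b)       = b

-- Hypothetical driver: check dynamically, then evaluate with static guarantees.
report :: Exp -> String
report e = case typCheck e of
  Pair Rint x  -> show (eval x (10, False))
  Pair Rbool y -> show (eval y (10, False))
  Fail         -> "Ill typed term"

main :: IO ()
main = do
  putStrLn (report (Eif Eb (Eint 3) (Eplus Ex (Eint 1))))
  putStrLn (report (Eif (Eint 3) (Eint 0) (Eint 9)))
```

Matching on Rint or Rbool in report is what lets show pick the right instance: the run-time tag tells the type checker which type the hidden index is.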
Our Original Goals
• Build heterogeneous meta-programming systems
  – Meta-language ≠ object-language
• Type system of the meta-language guarantees semantic properties of object-language
• Experiment with Omega
  – Finding new uses for the power of the type system
  – Translating existing language-based ideas into Omega
    • staged interpreters
    • proof carrying code
    • language-based security
Serendipity
Ωmega’s type system is good for statically guaranteeing all sorts of properties.
  – Lists with statically known length
  – Red–Black Trees
  – Binomial Heaps
  – Dynamic Typing
  – Proof Carrying Code
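As an illustration of the first item, here is a sketch of lists with statically known length in GHC Haskell. The slides do not show this code; the names Vec, vhead, vzip and toList are all assumptions, and DataKinds again stands in for Ωmega's kind declaration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Zero | Succ Nat

-- A list whose length is tracked in its type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Zero a
  VCons :: a -> Vec n a -> Vec ('Succ n) a

-- Total by construction: the index rules out the empty vector.
vhead :: Vec ('Succ n) a -> a
vhead (VCons x _) = x

-- The shared index n forces both arguments to have the same length.
vzip :: Vec n a -> Vec n b -> Vec n (a, b)
vzip VNil         VNil         = VNil
vzip (VCons x xs) (VCons y ys) = VCons (x, y) (vzip xs ys)

-- Forget the index to inspect the contents.
toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs

main :: IO ()
main = print (toList (vzip (VCons 1 (VCons 2 VNil)) (VCons 'a' (VCons 'b' VNil))))
```

Calling vhead VNil or zipping vectors of different lengths is a compile-time error rather than a run-time failure.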
Conclusion
• Stating static properties is a good way to think about programming
• It may lead to more reliable programs
• The compiler should ensure that programs maintain the stated properties
• Generalizing algebraic datatypes makes it all possible
  – Ranges other than “T a”
  – “a” becomes an index describing a static property of x :: T a
  – New kinds let “a” have arbitrary structure
  – Computing over “a” is sometimes necessary
Contributions
• “Logical Framework” ideas translated into everyday programming idioms.
• Manipulating strongly-typed object languages in a semantics-preserving manner.
• Implementation of Cheney and Hinze’s equality qualified types in a functional programming language.
• Use of new kinds to build new kinds of index sets.
• Representation (or Singleton) Types as a way to seamlessly switch between static and dynamic typing.
• Demonstration
  – Show some practical techniques
  – Lots of examples
• Resource: www.cs.pdx.edu/~sheard
  – Including Emir Pasalic’s Thesis.
Related Work
• Logical Frameworks: LF -- Bob Harper et al.
• Refinement types -- Frank Pfenning
• Inductive Families
  – In type theory -- Peter Dybjer
  – Epigram -- Zhaohui Luo, James McKinna, Paul Callaghan, and Conor McBride
• First-class phantom types -- Cheney and Hinze
• Guarded Recursive Data Types
  – Hongwei Xi and his students
    • Guarded Recursive Datatype Constructors
    • A Typeful Approach to Object-Oriented Programming with Multiple Inheritance
    • Meta-Programming through Typeful Code Representation
  – Constraint-based type inference for guarded algebraic data types -- Vincent Simonet and François Pottier
  – A Systematic Translation of Guarded Recursive Data Types to Existential Types -- Martin Sulzmann
  – Polymorphic typed defunctionalization -- Pottier and Gauthier
  – Towards efficient, typed LR parsers -- Pottier and Régis-Gianas
• First Class Type Equality
  – A Lightweight Implementation of Generics and Dynamics -- Hinze and Cheney
  – Typing Dynamic Typing -- Baars and Swierstra
  – Type-safe cast: Functional pearl -- Weirich
• Rogue-Sigma-Pi as a meta-language for LF -- Aaron Stump
• Wobbly types: type inference for generalised algebraic data types -- Peyton Jones, Washburn and Weirich
• Cayenne - A Language with Dependent Types -- Lennart Augustsson
Step 2 – Using Staging
• Suppose you are writing a document retrieval system.
• The user types in a query, and you want to retrieve all documents that meet the query.
• The query contains information not known until run-time, but which is constant across all accesses in the document base.
• E.g. Width – Indent < Depth && Keyword == “Naval”
Width – Indent < Depth && Keyword == “Naval”
• Width and Indent are constant across all queries, but Depth and Keyword are fields of each document.
• How can we efficiently build an execution engine that translates the user’s query (typed as a String) into executable code?
Code in Omega
prompt> [| 5 + 5 |]
[| 5 + 5 |] : Code Int
prompt> run [| 5 + 5 |]
10 : Int
prompt> let x = [| 23 |]
x
prompt> let y = [| 56 - $x |]
y
prompt> y
[| 56 - 23 |] : Code Int
Dynamic values
data Dyn x = Dint Int where x = Int
           | Dbool Bool where x = Bool
           | Dyn (Code x)
dynamize :: Dyn a -> Code a
dynamize (Dint n) = lift n
dynamize (Dbool b) = lift b
dynamize (Dyn x) = x
Translation
trans :: Term a -> (Dyn Int,Dyn Int) -> Dyn a
trans (Int n) (x,y) = Dint n
trans (Bool b) (x,y) = Dbool b
trans X (x,y) = x
trans Y (x,y) = y
trans (Plus a b) xy =
  case (trans a xy, trans b xy) of
    (Dint m,Dint n) -> Dint(m+n)
    (m,n) -> Dyn [| $(dynamize m) + $(dynamize n) |]
trans (If a b c) xy =
  case trans a xy of
    (Dbool test) -> if test then trans b xy else trans c xy
    (Dyn test) -> Dyn [| if $test then $(dynamize (trans b xy)) else $(dynamize (trans c xy)) |]
Applying the translation
-- if 3 < 5 then (x + (5 + 2)) else y
x1 = If (Less (Int 3) (Int 5)) (Plus X (Plus (Int 5) (Int 2))) Y
w term = [| \ x y -> $(dynamize (trans term (Dyn [| x |], Dyn [| y |]))) |]
-- w x1
-- [| \ x y -> x + 7 |] : Code (Int -> Int -> Int)
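The effect of the translation can be imitated in plain Haskell without code brackets. In the sketch below, Code is an assumed stand-in that holds generated source text as a string (it is not Ωmega's Code type and cannot be run), and src and binop are hypothetical helpers. The Term type has two Int variables X and Y, as the translation slide uses, and a Less case is added so that the example term can be translated.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  Int  :: Int -> Term Int
  Plus :: Term Int -> Term Int -> Term Int
  Less :: Term Int -> Term Int -> Term Bool
  If   :: Term Bool -> Term a -> Term a -> Term a
  X    :: Term Int
  Y    :: Term Int

-- ASSUMPTION: generated code is modelled as source text with a phantom type.
newtype Code a = Code String

src :: Code a -> String
src (Code s) = s

-- A value is either known now (Dint, Dbool) or only known later (Dyn).
data Dyn a where
  Dint  :: Int  -> Dyn Int
  Dbool :: Bool -> Dyn Bool
  Dyn   :: Code a -> Dyn a

dynamize :: Dyn a -> Code a
dynamize (Dint n)  = Code (show n)
dynamize (Dbool b) = Code (show b)
dynamize (Dyn c)   = c

-- Hypothetical helper: residual code for a binary operator.
binop :: String -> Dyn a -> Dyn b -> Code c
binop op m n = Code ("(" ++ src (dynamize m) ++ op ++ src (dynamize n) ++ ")")

-- Compute what is static now; emit code only for what depends on x or y.
trans :: Term a -> (Dyn Int, Dyn Int) -> Dyn a
trans (Int n) _      = Dint n
trans X (x, _)       = x
trans Y (_, y)       = y
trans (Plus a b) xy  = case (trans a xy, trans b xy) of
  (Dint m, Dint n) -> Dint (m + n)
  (m, n)           -> Dyn (binop " + " m n)
trans (Less a b) xy  = case (trans a xy, trans b xy) of
  (Dint m, Dint n) -> Dbool (m < n)
  (m, n)           -> Dyn (binop " < " m n)
trans (If a b c) xy  = case trans a xy of
  Dbool test -> if test then trans b xy else trans c xy
  Dyn test   -> Dyn (Code ("if " ++ src test
                           ++ " then " ++ src (dynamize (trans b xy))
                           ++ " else " ++ src (dynamize (trans c xy))))

-- Wrap the residual code in a lambda over the two dynamic inputs.
w :: Term a -> String
w term = "\\x y -> " ++ src (dynamize (trans term (Dyn (Code "x"), Dyn (Code "y"))))

x1 :: Term Int
x1 = If (Less (Int 3) (Int 5)) (Plus X (Plus (Int 5) (Int 2))) Y

main :: IO ()
main = putStrLn (w x1)
```

The test 3 < 5 and the sum 5 + 2 are computed during translation, so only the addition involving x survives in the generated text.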
Examples we have done
• Typed, staged interpreters
  – For languages with binding, with patterns, algebraic datatypes
• Type preserving transformations
  – Simplify :: Exp t -> Exp t
  – Cps :: Exp t -> Exp {trans t}
• Proof carrying code
• Data Structures
  – Red-Black trees, Binomial Heaps, Static length lists
• Languages with security properties
• Typed self-describing databases, where meta data in the database describes the database schema
• Programs that slip easily between dynamic and statically typed sections. Type-case is easy to encode with no additional mechanism
Some other examples
• Typed Lambda Calculus
• A Language with Security Domains
• A Language which enforces an interaction protocol
Typed lambda Calculus
Exp with type t in environment s
data V s t = ex m . Z where s = (t,m)
           | ex m x . S (V m t) where s = (x,m)
data Exp s t = IntC Int where t = Int
             | BoolC Bool where t = Bool
             | Plus (Exp s Int) (Exp s Int) where t = Int
             | Lteq (Exp s Int) (Exp s Int) where t = Bool
             | Var (V s t)
Example Type:
Plus :: forall s t . (t=Int) => Exp s Int -> Exp s Int -> Exp s t
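A GHC Haskell sketch of the same idea follows, with environments modelled as nested pairs. The functions lookupV and eval are assumptions added for illustration; the slide gives only the datatypes.

```haskell
{-# LANGUAGE GADTs #-}

-- V s t: a variable of type t in an environment of shape s.
-- Z names the first slot; S skips one slot.
data V s t where
  Z :: V (t, m) t
  S :: V m t -> V (x, m) t

-- Exp s t: an expression of type t, well scoped in environment s.
data Exp s t where
  IntC  :: Int  -> Exp s Int
  BoolC :: Bool -> Exp s Bool
  Plus  :: Exp s Int -> Exp s Int -> Exp s Int
  Lteq  :: Exp s Int -> Exp s Int -> Exp s Bool
  Var   :: V s t -> Exp s t

-- Lookup cannot fail: the index s guarantees the slot exists
-- and holds a value of type t.
lookupV :: V s t -> s -> t
lookupV Z     (v, _)    = v
lookupV (S n) (_, rest) = lookupV n rest

eval :: Exp s t -> s -> t
eval (IntC n)   _   = n
eval (BoolC b)  _   = b
eval (Plus x y) env = eval x env + eval y env
eval (Lteq x y) env = eval x env <= eval y env
eval (Var v)    env = lookupV v env

main :: IO ()
main = print (eval (Lteq (Plus (Var Z) (IntC 1)) (Var (S Z))) (3 :: Int, (10 :: Int, ())))
```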
Language with Security Domains
Exp with type t in env s in domain d
kind Domain = High | Low
data D t = Lo where t = Low | Hi where t = High
data Dless x y = LH where x = Low, y = High
               | LL where x = Low, y = Low
               | HH where x = High, y = High
data Exp s d t = Int Int where t = Int
               | Bool Bool where t = Bool
               | Plus (Exp s d Int) (Exp s d Int) where t = Int
               | Lteq (Exp s d Int) (Exp s d Int) where t = Bool
               | forall d2 . Var (V s d2 t) (Dless d2 d)
Language with interaction protocol
Command with store St starting in state x, ending in state y
kind State = Open | Closed
data V s t = forall st . Z where s = (t,st)
           | forall st t1 . S (V st t) where s = (t1,st)
data Com st x y
  = forall t . Set (V st t) (Exp st t) where x=y
  | forall a . Seq (Com st x a) (Com st a y)
  | If (Exp st Bool) (Com st x y) (Com st x y)
  | While (Exp st Bool) (Com st x y) where x = y
  | forall t . Declare (Exp st t) (Com (t,st) x y)
  | Open where x = Closed, y = Open
  | Close where x = Open, y = Closed
  | Write (Exp st Int) where x = Open, y = Open
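[Figure: state diagram with states Closed and Open; open goes from Closed to Open, close goes from Open to Closed, write stays in Open]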
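The protocol part of Com can be sketched in GHC Haskell on its own. This version is simplified under stated assumptions: the store index and the Set, If, While and Declare commands are left out, Write takes a plain Int, and the constructors are renamed DoOpen and DoClose (hypothetical names) because Haskell would otherwise confuse them with the promoted state Open. The names actions and session are also assumptions.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Promoted by DataKinds to the two protocol states.
data State = Open | Closed

-- Com x y: a command that starts in state x and ends in state y.
data Com (x :: State) (y :: State) where
  DoOpen  :: Com 'Closed 'Open
  DoClose :: Com 'Open 'Closed
  Write   :: Int -> Com 'Open 'Open
  Seq     :: Com x a -> Com a y -> Com x y  -- states must line up at the join

-- Interpret a command as the list of actions it performs.
actions :: Com x y -> [String]
actions DoOpen    = ["open"]
actions DoClose   = ["close"]
actions (Write n) = ["write " ++ show n]
actions (Seq c d) = actions c ++ actions d

-- A complete session: starts Closed, ends Closed.
session :: Com 'Closed 'Closed
session = DoOpen `Seq` (Write 1 `Seq` (Write 2 `Seq` DoClose))

main :: IO ()
main = mapM_ putStrLn (actions session)
```

A command such as Write 1 followed by DoOpen is rejected at compile time, because Write ends in state Open while DoOpen must start in state Closed.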