Engineering Large Projects in a Functional Language
Lessons from a Decade of Haskell at Galois
Don Stewart | 2010-07-10 | DevNation PDX
© 2010 Galois, Inc. All rights reserved.
This talk made possible by...
Aaron Tomb
Adam Wick
Andy Adams-Moran
Andy Gill
David Burke
Dylan McNamee
Eric Mertens
Iavor Diatchki
Isaac Potoczny-Jones
Jef Bell
Peter White
Trevor Elliott
Phil Weaver
Jason Dagit
Jeff Lewis
Joe Hurd
Joel Stanley
John Launchbury
John Matthews
Jonathan Daugherty
Josh Hoyt
Laura McKinney
Ledah Casburn
Lee Pike
Levent Erkok
Louis Testa
Magnus Carlsson
Matt Sottile
Paul Heinlein
Rogan Creswick
Sally Browning
Sigbjorn Finne
Thomas Nordin
Brett Letner
… and many others
What does Galois do?
Information assurance for critical systems
Building systems that are trustworthy and secure
Mixture of government and industry clients
R&D with our favorite tools:
• Formal methods
• Typed functional languages
• Languages, compilers, DSLs
Kernels, file systems, networks, servers, compilers, security, desktop apps, ...
Haskell for pretty much everything
Haskell is ...
A purely functional language
Strongly statically typed
20 years old
Open source
Compiled and interpreted
Used in research, open source and industry
http://haskell.org
http://haskell.org/platform
http://hackage.haskell.org
Yes. Haskell can do that.
Many 20 – 200k LOC Haskell projects
Oldest commercial projects over 10 years of development now (e.g. Cryptol)
Teams of 1 – 6 developers at a time
Much pair programming, whiteboards, code reviews
20 – 30 devs over longer project lifetime
Have built many tools and libraries to support Haskell development on this scale
Haskell essential to keeping clients happy with:
• Deadlines, performance(!), maintainability
Themes
Languages matter!
Writing correct software is difficult!
Programming languages vary wildly in how well they support robust, secure, safe coding practices
Languages and tools can aid or hinder our efforts:
• Type systems
• Purity
• Modularity / compositionality
• Abstraction support
• Tools: analyses, provers, model checking
• Buggy implementations
Detect errors early!
Detecting problems before executing the program is critical
• Debugging is hard
• Debugging low level systems is harder
• Debugging low level critical systems is ...
Culture of error prevention
• “How could we rule out this class of errors?”
• “How could we be more precise?”
The toolchain matters!
Can't build anything without a good tool chain
• Native code, optimizing compiler
• Libraries, libraries, libraries
• Debugging, tracing
• Profiling, inspection, runtime analysis
• Testing, analysis
• Need open, modifiable tools
  – Particularly when pushing the boundaries (Haskell on bare metal...)
Community matters!
Soup of ideas in a large, open research community:
• Rapid adoption of new ideas
Support, maintenance and help
• Can't build everything we need in-house!
Give back via:
• Workshops: CUFP, ICFP, Haskell Symposium
• Hackathons
• Industrial Haskell Group
• Open source code and infrastructure
• Teaching: papers, blogs, talks
How Galois Uses Haskell
1. The Type System
Types make our lives easier
Cheap way to verify properties
• Cheaper than theorem proving
• More assurance than testing
• Saves debugging in hostile environments
Typical conversation:
• Engineer A: “Spec says this must never happen”
• Engineer B: “Can we enforce that in the type system?”
Kinds of things types enforce
Simple things:
• Correct arguments to a function
• Function f does not touch the disk
• No null pointers
• No mixing up of similar concepts:
  – Virtual / physical addresses
Serious things:
• Information flow policies
• Correct component wiring and integration
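A minimal Haskell sketch of the virtual / physical address point. The `VirtAddr` and `PhysAddr` wrappers, the toy `translate` (a fixed offset standing in for a real page-table walk) and `dmaTarget` are all hypothetical names, not taken from the talk.

```haskell
import Data.Word (Word64)

-- Hypothetical address types: same machine representation, distinct types.
newtype VirtAddr = VirtAddr Word64 deriving (Eq, Show)
newtype PhysAddr = PhysAddr Word64 deriving (Eq, Show)

-- Toy translation: a fixed offset stands in for a real page-table walk.
translate :: VirtAddr -> PhysAddr
translate (VirtAddr v) = PhysAddr (v + 0x1000)

-- Only physical addresses may be handed to this (hypothetical) DMA layer.
dmaTarget :: PhysAddr -> Word64
dmaTarget (PhysAddr p) = p

-- dmaTarget (VirtAddr 0x10)   -- would be rejected by the type checker
```

Because `newtype` has no runtime cost, the distinction exists only at compile time, which is where the slide wants the error to be caught.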
Recent experience
First demo of a new system
Six engineers
50k lines of code, in 5 components, developed over a number of months
Integrated, tested, demo'd in only a week, two months ahead of schedule, significantly above performance spec.
1 space leak, spotted and fixed on first day of testing via the heap profiler
2 bugs found (typos from spec)
Purity is fundamental
Difficult to show safety without purity
Code should be pure by default
Makes large systems easier to glue:
• Pure code is “safe” by default to call
Effects are “code smells”, and have to be treated carefully
The world has too many impure languages: don't add to that
Types aren't enough though
Still not expressive enough for a lot of the properties we want to enforce
We care a lot about sizes in types
• “Input must only be 128, 192 or 256 bits”
• “Type T should be represented with 7 bits”
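One common approximation in plain Haskell, sketched here under assumptions: the permitted sizes become a closed sum type, and untrusted input is validated once at the boundary. This does not put the size itself in the type, so it only partly addresses the slide's complaint. `KeySize`, `mkKeySize` and `keyBytes` are illustrative names.

```haskell
-- The only input sizes the (hypothetical) spec allows.
data KeySize = Bits128 | Bits192 | Bits256
  deriving (Eq, Show)

-- Validate an untrusted bit count once; everything downstream
-- takes a KeySize and cannot see an illegal size.
mkKeySize :: Int -> Maybe KeySize
mkKeySize 128 = Just Bits128
mkKeySize 192 = Just Bits192
mkKeySize 256 = Just Bits256
mkKeySize _   = Nothing

-- Total function: the compiler can warn if a case is missing.
keyBytes :: KeySize -> Int
keyBytes Bits128 = 16
keyBytes Bits192 = 24
keyBytes Bits256 = 32
```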
Other tools in the bag
Extended static analysis tools
Model checking
• SAT, SMT, …
Theorem proving
• Isabelle, Agda, Coq
How much assurance do you need?
2. Abstractions
Monads
Constantly rolling new monads
• Captures critical facts about the execution environment in the type
Directly encodes semantics we care about
• “Computed keys are not visible outside the M component”
• “Function f has read-only access to memory”
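A small sketch of how a hand-rolled monad might encode “read-only access to memory”. It assumes a toy `Memory` made of a single `IORef`. The `ReadOnly` type and `readMem` are hypothetical, and in a real module the `ReadOnly` constructor would not be exported, so `readMem` would be the only primitive available.

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Hypothetical "memory": a single mutable cell.
type Memory = IORef Int

-- Computations that are handed the memory but expose no write primitive.
newtype ReadOnly a = ReadOnly { runReadOnly :: Memory -> IO a }

instance Functor ReadOnly where
  fmap f (ReadOnly g) = ReadOnly (fmap f . g)

instance Applicative ReadOnly where
  pure x = ReadOnly (\_ -> pure x)
  ReadOnly f <*> ReadOnly g = ReadOnly (\m -> f m <*> g m)

instance Monad ReadOnly where
  ReadOnly g >>= k = ReadOnly (\m -> g m >>= \a -> runReadOnly (k a) m)

-- The only operation on memory available inside ReadOnly.
readMem :: ReadOnly Int
readMem = ReadOnly readIORef

-- A function whose type says it can read memory but not change it.
doubled :: ReadOnly Int
doubled = do
  x <- readMem
  pure (x * 2)

-- Run the read-only computation against a fresh memory cell.
demo :: IO Int
demo = do
  mem <- newIORef 21
  runReadOnly doubled mem
```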
Algebraic Data Types
Every system is either an interpreter or a compiler
• Abstract syntax trees are ubiquitous
• Represent processes symbolically, via ADTs, then evaluate them in a safe (monadic) context
• Precise, concise control over possible values
• But need precise representation control
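As an illustration of representing a process symbolically and then evaluating it in a safe monadic context, here is a toy expression language. The `Expr` type and its constructors are invented for this sketch, and `Either String` stands in for whatever monad a real system would use.

```haskell
-- A tiny abstract syntax tree: the only values that can exist
-- are the ones these constructors allow.
data Expr
  = Lit Int
  | Add Expr Expr
  | Div Expr Expr
  deriving (Show)

-- Evaluate inside Either, so failure is a value rather than a crash.
eval :: Expr -> Either String Int
eval (Lit n)   = Right n
eval (Add a b) = (+) <$> eval a <*> eval b
eval (Div a b) = do
  x <- eval a
  y <- eval b
  if y == 0
    then Left "division by zero"
    else Right (x `div` y)
```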
Laziness
Captures some concepts perfectly
• “A stream of 4k packets from the wire”
Critical for control abstractions in DSLs
Useful for prototyping:
• error "M.F.foo: not implemented"
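Both uses can be sketched in a few lines. The `Packet` type, the infinite `packets` list and the stubbed `checksum` are hypothetical. The point is only that a conceptually endless stream and an unimplemented function are both harmless until something demands their values.

```haskell
-- Hypothetical packet type, identified here only by a sequence number.
newtype Packet = Packet Int deriving (Eq, Show)

-- A conceptually infinite stream of packets; only the demanded
-- prefix is ever constructed.
packets :: [Packet]
packets = map Packet [0 ..]

firstPackets :: Int -> [Packet]
firstPackets n = take n packets

-- Prototype stub: type checks today, fails only if actually evaluated.
checksum :: Packet -> Int
checksum = error "Net.Packet.checksum: not implemented"

-- `length` walks the list spine but never forces the elements,
-- so the stub above is not triggered.
countChecked :: Int -> Int
countChecked n = length (map checksum (firstPackets n))
```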
Laziness
Makes time and space reasoning harder!
• Mostly harmless in practice
• Stress testing tends to reveal retainers
• Graphical profiling knocks it dead
Must be able to precisely enable/disable
Be careful with exceptions and mutation
whnf/rnf/! are your friends
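A sketch of the `!` part of that advice, assuming GHC's `BangPatterns` extension. The accumulators in the hypothetical `mean` below are forced to weak head normal form at every step, so no chain of suspended additions builds up. For deep structures, `rnf` from the `deepseq` package plays the analogous role, but it is not shown here.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Mean of a list in a single pass. The bangs on `s` and `n` make the
-- running sum and count strict, avoiding a thunk per element.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []       = if n == 0 then 0 else s / fromIntegral n
    go !s !n (x : xs) = go (s + x) (n + 1) xs
```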
Type classes
We use type classes
• Well defined interfaces between large components (sets of modules)
• Natural code reuse
• Capture general concepts in a natural way
• Capture interface in a clear way
• Kick butt EDSLs (see Lennart's blog)
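A minimal sketch of a type class acting as the interface between two components. `SessionStore`, `ListStore` and `register` are hypothetical names. A real system would presumably have richer operations and more than one instance.

```haskell
-- Hypothetical interface between a protocol layer and whatever
-- component stores its sessions.
class SessionStore s where
  emptyStore    :: s
  insertSession :: Int -> String -> s -> s
  lookupSession :: Int -> s -> Maybe String

-- One possible implementation: a plain association list.
newtype ListStore = ListStore [(Int, String)]

instance SessionStore ListStore where
  emptyStore = ListStore []
  insertSession k v (ListStore xs) = ListStore ((k, v) : xs)
  lookupSession k (ListStore xs) = lookup k xs

-- Client code written against the interface only; it works
-- unchanged with any other instance.
register :: SessionStore s => [(Int, String)] -> s
register = foldr (\(k, v) -> insertSession k v) emptyStore
```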
Concurrency and Parallelism
forkIO rocks
• Cheap, very fast, precise threads
MVars rock
STM rocks (safely composable locks!)
Result: not shy introducing concurrency when appropriate
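The three pieces fit together roughly as follows. This is a sketch that assumes the `stm` package that ships with GHC. The account-transfer scenario and the names `transfer` and `demo` are illustrative only.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_, when)

-- Move `n` units between two shared counters as one atomic step.
-- `retry` blocks the transaction until the source has enough.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  bal <- readTVar from
  when (bal < n) retry
  writeTVar from (bal - n)
  modifyTVar' to (+ n)

-- Ten lightweight threads each perform ten transfers of one unit;
-- an MVar is used to wait for all of them to finish.
demo :: IO (Int, Int)
demo = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  done <- newEmptyMVar
  forM_ [1 .. 10 :: Int] $ \_ -> forkIO $ do
    replicateM_ 10 (atomically (transfer a b 1))
    putMVar done ()
  replicateM_ 10 (takeMVar done)
  (,) <$> readTVarIO a <*> readTVarIO b
```

Two `transfer` calls can be sequenced inside a single `atomically` and still behave as one indivisible step, which is the sense in which STM composes where explicit locks do not.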
3. Foreign Function Interface
Foreign Function Interface
The world is a messy place
A good FFI means we can always call someone else's code if necessary
Have to talk to weird bits of hardware and weird proof systems
ForeignPtr is a great abstraction tool
Must have clear API into the runtime system (hot topic at the moment)
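A small sketch of calling C through the FFI with a `ForeignPtr`-managed buffer. It binds the C standard library's `strlen`. The helpers `newCBuffer` and `cLength` are hypothetical, and the input is assumed to be ASCII.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, castCharToCChar)
import Foreign.C.Types (CChar, CSize (..))
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (pokeArray0)

-- Bind the C library's strlen.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Copy an ASCII string into a NUL-terminated C buffer. The ForeignPtr
-- ties the buffer's lifetime to the garbage collector, so there is
-- no explicit free.
newCBuffer :: String -> IO (ForeignPtr CChar)
newCBuffer s = do
  fp <- mallocForeignPtrArray (length s + 1)
  withForeignPtr fp $ \p -> pokeArray0 0 p (map castCharToCChar s)
  pure fp

-- Measure the string's length on the C side.
cLength :: String -> IO Int
cLength s = do
  fp <- newCBuffer s
  withForeignPtr fp $ \p -> fromIntegral <$> c_strlen p
```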
4. Meta programming
There's always boilerplate
Abstractions get rid of a lot of repetitive code, but there's always something that's not automated
We use a little Template Haskell
Other generics:
• Hinze-style generics
• SYB generics
Particularly useful for generating instance code for marshalling
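To give a flavour of the SYB style, the sketch below uses `Data.Data` from `base`, the class that SYB builds on. One generic function works for any type that derives `Data`, with no per-type instance code. The `Message` type and `describe` are hypothetical, and a real marshalling layer would do considerably more than report a constructor name and field count.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data, gmapQ, showConstr, toConstr)

-- Hypothetical wire messages; `deriving Data` is the only
-- per-type boilerplate needed.
data Message
  = Hello Int Bool
  | Ack Int
  deriving (Show, Data)

-- Generic over every Data instance: the constructor's name and the
-- number of its immediate fields, as a marshaller might need for a header.
describe :: Data a => a -> (String, Int)
describe x = (showConstr (toConstr x), length (gmapQ (const ()) x))
```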
5. Performance
Fast enough for majority of things
Vast majority of code is fast enough
• GHC -O2 -funbox-strict-fields
• Happy with 1 – 2x C for low level code
Last few drops get squeezed out:
• Profiling
• Low level Haskell
• Cycle-level measurement
• EDSLs to generate better code
• Calling into C
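As a sketch of what “low level Haskell” can look like, the hypothetical `V3` type below uses strict, unpacked fields. The `UNPACK` pragmas request per field what `-funbox-strict-fields` requests for the whole module. The loop uses the strict `foldl'` so the accumulator never becomes a chain of thunks.

```haskell
import Data.List (foldl')

-- Strict fields stored unboxed inside the constructor, so a V3 is
-- three raw doubles rather than three pointers to heap objects.
data V3 = V3 {-# UNPACK #-} !Double
             {-# UNPACK #-} !Double
             {-# UNPACK #-} !Double

dot :: V3 -> V3 -> Double
dot (V3 a b c) (V3 x y z) = a * x + b * y + c * z

-- Sum of dot products using a strict left fold.
sumDots :: [(V3, V3)] -> Double
sumDots = foldl' (\acc (u, v) -> acc + dot u v) 0
```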
Performance
Really precise performance requires expertise
Libraries are helping reify “oral traditions” about optimization
Still a lack of clarity about performance techniques in the broader Haskell community though
6. Debugging
There are still bugs!
Testing
• QuickCheck!!!
Heap profiling
• “By type” profiling of the heap
GHC -fhpc
• Great for finding exceptions
• Understanding what is executing
+RTS -sstderr
• Explains what the GC, threads and memory are up to
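A QuickCheck-style property is just a Boolean-valued function over its inputs. The run-length codec below is a made-up example. To keep the sketch free of dependencies it does not import QuickCheck. With the QuickCheck package installed, `quickCheck prop_roundTrip` from `Test.QuickCheck` would exercise the property on random strings.

```haskell
-- Toy run-length encoder and decoder (hypothetical example code).
encode :: String -> [(Int, Char)]
encode []       = []
encode (c : cs) =
  let (same, rest) = span (== c) cs
  in  (length same + 1, c) : encode rest

decode :: [(Int, Char)] -> String
decode = concatMap (\(n, c) -> replicate n c)

-- The property: decoding an encoded string gives back the original.
prop_roundTrip :: String -> Bool
prop_roundTrip s = decode (encode s) == s
```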
7. Documentation
Generating supporting artifacts
Haddock is great for reference material
• Helps capture design in the source
• Code + types becomes self documenting
Design documents can be partially extracted via:
• The major data and type signatures
• graphmod
• cabalgraph
• HPC analysis
8. Libraries
Hackage Changed Everything
2200+ libraries created in 3 years. There's a library for everything, and often more than one...
Can sit back and let mtl / monadlib / haxml / hxt fight it out :)
Static linking → need BSD licensed code if we want to ship
Haskell Platform to answer QA questions
9. Shipping code
Cabal
I don't know how Haskell was possible before Cabal :)
Quickly adopted Cabal/cabal-install across projects
cabal-install:
• Simple, clean integration of internal and external components into packageable objects
10. Conventions
We try to ...
-Wall police
Consistent layout
No tabs
Import qualified Control.Exception
{-# LANGUAGE … #-}
Map exceptions into Either / Maybe
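Two of these conventions, the qualified import and mapping exceptions into `Either` / `Maybe`, might look like this in practice. `safeDiv` and `safeDivMaybe` are hypothetical helpers. `try` and `evaluate` are the standard functions from `Control.Exception`.

```haskell
import qualified Control.Exception as E

-- Catch the arithmetic exception at the boundary and return it as a value.
-- `evaluate` forces the division inside IO so that `try` can observe it.
safeDiv :: Int -> Int -> IO (Either E.ArithException Int)
safeDiv x y = E.try (E.evaluate (x `div` y))

-- Collapse further to Maybe when the caller does not care why it failed.
safeDivMaybe :: Int -> Int -> IO (Maybe Int)
safeDivMaybe x y = either (const Nothing) Just <$> safeDiv x y
```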
We try to ...
deriving Show
Line/column for errors if you must throw
No global mutable state
Put type sigs in “when you're done” with the design
Use GHCi for rapid experimentation
Cabal by default.
Libraries by default
11. Training
Easy to find Haskell programmers
With a big open source community, it's much easier to find Haskell programmers now
Many more applicants than jobs, often with significant experience from open source
We train on-site, and new resources like LYAH and RWH make this easier.
12. Things that we still need
More support for large scale programming
Enforcing conventions across the code
Data representation precision (emerging)
A serious refactoring tool (HaRe on Hackage!)
Vetted and audited libraries by experts (Haskell Platform)
Idioms for mapping design onto types/functions/classes/monads
Better capture your 100 module design!