Programming Languages #devcon2013

Programming Languages

Telefónica Digital DevCon 2013

Iván Montes - @drslump

Disclaimer

• The talk is focused on mainstream languages

• Much of the stuff we will talk about is already available

• I know your favourite language probably does something similar, or you know a hack to make it do it.

• Scripting languages do have a compiler, they just happen to include an evaluator too :-)

• C++ is purposely left out of any comparison (Guys you need to simplify the spec!)

• If you don’t understand some terminology, ask!

What is the best?

• Good news! You’re probably using it already

• Check list:

• You can focus on solving your business problem

• You are proficient with it

• You can deploy your product without fear

• You have tooling available (editors, testing, packaging…)

• You are not the only soul who knows how it works :-)

But…

Can it be improved?

• Given:

• Expected functionality is ever growing

• Solution complexity is ever growing

• When:

• We add layers and layers to keep it under control

• Those layers are usually libraries and frameworks

• Then:

• Your really nice language is not that important anymore

• You end up programming for the framework (does it pass the check list?)

The problem

• We can separate a language into:

• Specification, Compiler, Runtime and Libraries

• Of those we are only in control of the Libraries

• So we abuse them to solve our problems

• But we can end up creating others:

• Support

• Versioning

• Troubleshooting

What’s wrong with Libraries?

• They are meant for code reuse

• We (ab)use them as language customisation and extensibility tools

• The cycle of doom then begins:

• Our cool language toolchain breaks

• We spend time modifying editors, build tools, …

• The library gets updated or we want to support a new tool

• Cycle ends with lots of abandonware even if you don’t notice it:

• Check SourceForge, GitHub and similar sites

• The guy now maintaining your 2009 project has megabytes of legacy source code, with no expertise to hack it and nobody to offer support.

Language design

• Disclaimer: It’s a hard discipline, requiring lots of talent and resources.

• Basically the current situation in the mainstream market sucks.

• It’s slowly changing though (guys at Microsoft Research and Universities)

• Designers are realising (again) that we are not totally stupid!

• What we need is very old tech; we just need it refined.

Taking control

• Premise: Any non-tech-savvy person assumes that a developer is someone intelligent.

• Think highly of yourself!

• We want/need more control over our primary tool.

• Ask yourself this question:

What do you think has evolved more in the last 25 years: compilers or CAD software?

But I already can!

• No, you can’t.

• Some compilers support writing add-ons for them.

• Being able to write an extension in C and call it from the host language doesn’t solve the problem either.

• The language itself must define how these extension mechanisms work. It must be part of its specification.

• This way we get extensibility without sacrificing portability or risking vendor lock-in.

• It helps with versioning: you extend a version of the language, not a version of a given compiler.

Extensibility

• We should be able to extend pretty much anything in our language.

• Extending the syntax and semantics (specification) is doable but not very mature yet.

• Extending the runtime is expensive. You want something tested and battle tested and then tested again.

• However extending the compiler is ready for prime time!

• Preprocessors and Lisp-style macros are cool, don’t get me wrong.

• But we need something more refined, almost fool-proof.

A Compiler

• A compiler basically does three tasks:

• Parse (AST construction)

• Resolve (name resolution, type binding)

• Emit (machine code, bytecode).

• Keep in mind that a bit is a measure of information!

[Chart: memory usage (Kb) for Code, Parse, Resolve and Emit; axis from 0 to 1200]
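The three tasks can be watched one by one in a language that exposes its compiler. The following is only a sketch using CPython's standard `ast`, `symtable` and `compile` facilities; the `area` function and the variable names are made up for illustration.

```python
import ast
import symtable

SOURCE = "def area(w, h):\n    scale = 2\n    return w * h * scale\n"

# Parse: source text -> abstract syntax tree
tree = ast.parse(SOURCE)
function_node = tree.body[0]

# Resolve: the symbol table knows which names are local to the function
table = symtable.symtable(SOURCE, "<example>", "exec")
function_scope = table.get_children()[0]
local_names = sorted(
    symbol.get_name()
    for symbol in function_scope.get_symbols()
    if symbol.is_local()
)

# Emit: AST -> bytecode that the CPython VM can execute
code = compile(tree, "<example>", "exec")
namespace = {}
exec(code, namespace)

print(type(function_node).__name__)   # kind of node produced by the parser
print(local_names)                    # names resolved as locals
print(namespace["area"](3, 4))        # result of running the emitted code
```

Each stage knows more about the program than the previous one, which is the information the following slides want to keep around.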

Info!

• Compilers produce a lot of information because they need to understand our code.

• Unfortunately this pretty cool information is trashed once the compilation is done.

• Some languages support reflection, at least not everything is lost (re Java fan boys, type erasure is not cool).

• A lot of very clever people put several years of effort into creating a compiler.

• It seems like a huge waste of resources not to use it more.


Compiler Service

• Almost a reality (LLVM, Mono mcs, Roslyn, …).

• Note: You know that an IDE basically implements an incremental compiler nowadays, right? Wonder why they feel slow now?

• Use cases:

• Better editor support

• Optimised/Incremental builds

• Documentation generation

• Quality/Security audits
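As a rough illustration of the audit use case, a tool can ask the compiler for the syntax tree instead of grepping the text. This sketch uses Python's standard `ast` module; the rule itself (flagging direct `eval` calls) and the function name `find_eval_calls` are assumptions chosen for the example.

```python
import ast


def find_eval_calls(source, filename="<input>"):
    """Return the (line, column) of every direct call to eval()."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            findings.append((node.lineno, node.col_offset))
    return sorted(findings)


SAMPLE = "x = eval('1 + 1')\nprint(x)\ny = [eval(s) for s in items]\n"
print(find_eval_calls(SAMPLE))
```

Because the check works on the tree, a comment or a string that merely mentions `eval` is not reported.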

Compiler Service (II)

• But we want more!

• If only we could hook our logic into the compiler like we do with build systems…

• The following would be dirt cheap to have:

• Injection of cross-cutting concerns (logging, aspects, …)

• Statically checked/expanded DSLs

• Quality, Security and Business specific enforcements
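A hook of this kind could look roughly like the sketch below, which injects a logging call into every function while the code is being compiled. It relies on Python's standard `ast.NodeTransformer`; the `__trace__` hook name and the `compile_with_tracing` helper are invented for the example.

```python
import ast


class TraceCalls(ast.NodeTransformer):
    """Insert a call to a trace hook as the first statement of every function."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested functions too
        hook = ast.parse(f"__trace__({node.name!r})").body[0]
        node.body.insert(0, hook)
        return node


def compile_with_tracing(source, trace):
    """Compile `source` with tracing injected and return its namespace."""
    tree = TraceCalls().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"__trace__": trace}
    exec(compile(tree, "<traced>", "exec"), namespace)
    return namespace


calls = []
module = compile_with_tracing("def add(a, b):\n    return a + b\n", calls.append)
print(module["add"](1, 2), calls)
```

The source of `add` contains no logging at all; the concern lives in one place, next to the compiler.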

Compiler Service (III)

• All of a sudden, half of our libraries’ functionality is now managed by the compiler!

• Wait! Won’t we suffer the “cycle of doom” too?

• No! :-)

• The logic is encapsulated under the compiler public interface.

• All the tooling using the compiler will automatically support our use cases!

• That means for example:

• Editors will autocomplete and report errors for expanded DSLs

• Build systems will notify errors generated by our custom enforcements

• Besides, we removed some heavy weight from the runtime, which is pretty cool for resource-constrained mobile apps, for example.

Macros

• Extending the compiler is nice but could be made more user friendly.

• When you have a pattern coming again and again in your code, it will probably be better solved with a macro.

• Macros are a pretty old concept but they mostly refer to “text macros”. What we need are AST Macros.

• Instead of replacing chunks of text we operate on the syntax tree of the compiler. Much safer and more powerful!

Example

macro using(expr as Expression):
    yield [|
        try:
            $(using.Body)
        ensure:
            $(expr).close()
    |]

fp = open('path/to/file.ext')
using fp:
    fp.write('foobar')
# Note that the file resource is closed automatically

Quasi-Quotations

• Normal language syntax constructs that are somehow quoted (`[| .. |]` in the example).

• Their purpose is to make creating complex AST structures a breeze.

• The compiler generates a separate AST for them. It’s like a template.

• When we inject them somewhere the compiler will apply that template in that AST node.

Splicing

• Injection points in a quasi-quotation (`$(..)` in the example).

• Their purpose is to parametrize quasi-quotations to dynamically build arbitrary AST structures.

• A splice resolves to a syntax node; the compiler will insert that node in the template tree.

• They keep their lexical information once inserted.
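Python has no quasi-quotation syntax, but the idea can be approximated to see the moving parts. In this sketch the quoted template is an ordinary string parsed into its own AST, and the hypothetical holes `__body__` and `__expr__` play the role of the `$(..)` splices. It expands `with using(expr): ...` into try/finally, loosely mirroring the Boo macro; none of these names come from a real macro library.

```python
import ast

# The "quasi-quotation": normal syntax, parsed into a separate template AST.
TEMPLATE = """
try:
    __body__
finally:
    __expr__.close()
"""


class Splice(ast.NodeTransformer):
    """Fill the __body__ and __expr__ holes of the template."""

    def __init__(self, body, expr):
        self.body, self.expr = body, expr

    def visit_Expr(self, node):
        # The bare `__body__` statement is replaced by the user's statements.
        if isinstance(node.value, ast.Name) and node.value.id == "__body__":
            return self.body
        return self.generic_visit(node)

    def visit_Name(self, node):
        return self.expr if node.id == "__expr__" else node


def is_using_call(node):
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "using"
        and len(node.args) == 1
    )


class ExpandUsing(ast.NodeTransformer):
    """Rewrite `with using(expr): body` into the filled-in template."""

    def visit_With(self, node):
        self.generic_visit(node)
        call = node.items[0].context_expr
        if len(node.items) != 1 or not is_using_call(call):
            return node
        quoted = ast.parse(TEMPLATE).body[0]
        expanded = Splice(node.body, call.args[0]).visit(quoted)
        return ast.copy_location(expanded, node)


def expand(source):
    """Parse `source` and expand every `using` block at compile time."""
    return ast.fix_missing_locations(ExpandUsing().visit(ast.parse(source)))
```

Note that `using` is never defined at runtime: it disappears during expansion, and the spliced nodes keep the line numbers of the code the user wrote.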

Why are they better?

• Syntax is integrated and validated as normal code.

• Most of the lexical information is kept, thus reported errors after expansion contain valid information.

• Debuggable!

• Toolchain friendly. They are transparent to consuming tools.

• DSL friendly. Capture common patterns to solve your problems and offer a statically checked interface to fit your needs!

Can’t libraries do this?

• Yes, but they are not the right tool for the job.

• Lots of stuff can be done at compile time, no need to bloat the runtime with stuff that just makes developing easier.

• Unified error reporting. Find out incorrect usages at compile time instead of uncontrolled runtime exceptions.

• The `assert` example shows pretty well the difference:

#!/bin/env python
foo = 10
assert foo != 10
# AssertionError:

#!/bin/env booi
foo = 10
assert foo != 10
# AssertionError: foo != 10
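The Boo behaviour can be imitated in Python with a small compile-time rewrite, which is a reasonable way to see what the macro is doing. This is a sketch, not how Boo implements it: it uses `ast.unparse`, so it assumes Python 3.9 or newer, and the helper names are made up.

```python
import ast


class DescribeAsserts(ast.NodeTransformer):
    """Give every message-less assert a message holding its own source text."""

    def visit_Assert(self, node):
        if node.msg is None:
            node.msg = ast.Constant(value=ast.unparse(node.test))
        return node


def run_with_described_asserts(source):
    """Compile `source` with the rewrite applied, then execute it."""
    tree = DescribeAsserts().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<asserts>", "exec"), namespace)
    return namespace


try:
    run_with_described_asserts("foo = 10\nassert foo != 10\n")
except AssertionError as error:
    print("AssertionError:", error)
```

A library function could not do this, because by the time it is called the text of the condition is already gone.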

Syntax

A very quick overview

Parsers

• Disclaimer: Not saying it’s a good idea for a general language

• The technology allows it and current systems have enough memory to pull it off.

• Traditionally, parsing involved a lexer generating tokens and a parser (either hand-coded recursive descent or generated and table based).

• PEG style parsers (and other pattern matching techniques) operate the same on a stream of chars (source code in a file), a tree of XML nodes or any other object structure.

• They can be made to parse in linear time using memoization (packrat parsing), at the cost of extra memory.

• OMeta is an example of a PEG parser with a built-in extension model.

• Parsing rules (productions) can be inherited and modified at will using a pretty simple syntax.
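To make the memoization point concrete, here is a minimal packrat-style sketch. Each rule is a function of the input position, and `functools.lru_cache` remembers the result for every (rule, position) pair so it is never computed twice. The toy grammar and all names are invented for illustration; it has nothing to do with OMeta's actual syntax.

```python
from functools import lru_cache


def make_parser(text):
    """Build memoized PEG rules for sums such as "1+(2+3)".

    Grammar (illustrative only):
        expr <- term ('+' term)*
        term <- number / '(' expr ')'
    Each rule returns (value, next_position), or None when it does not match.
    """

    @lru_cache(maxsize=None)
    def expr(pos):
        result = term(pos)
        if result is None:
            return None
        total, pos = result
        while pos < len(text) and text[pos] == "+":
            result = term(pos + 1)
            if result is None:
                break  # backtrack: leave the '+' unconsumed
            value, pos = result
            total += value
        return total, pos

    @lru_cache(maxsize=None)
    def term(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None:
                value, after = inner
                if after < len(text) and text[after] == ")":
                    return value, after + 1
        return None

    return expr


def parse(text):
    """Parse the whole input or raise ValueError."""
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return result[0]


print(parse("1+(2+3)"))
```

Note that there is no separate lexer: the rules read characters directly, which is what lets the same technique run over other kinds of input.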

Making the language ours

• Use cases:

• Any sort of DSL would benefit from having its specific syntax

• Units: delay = 10h + 30m, weight = 20Kg

• Reduce verbosity: for i in 1..10

• Shortcomings:

• Editors won’t know how to colorise the new syntax

• It’s tempting to abuse it

• Makes your code less portable

• Tightly coupled to the standard grammar

User eXperience

Have you checked clang recently?

Language design

• Syntax of a language is the first and most important point of interaction.

• Language design is still mainly driven by the target audience’s expectations (oh those curly braces everywhere).

• The syntax must be very carefully drafted to avoid ambiguities, even if designers still care very much about the aesthetics.

• When asked if they performed user tests when designing C#, Hejlsberg answered “only for the integration in Visual Studio”. More generally, check how it integrates with the tools you’re going to use.

• Very terse syntaxes tend to be good for DSLs and small and targeted projects. Code is read and reasoned about much more often than written!

• Semantics are as important as the syntax, don’t overlook their implications.

• Stay away from languages that release new keywords and constructs every other minor version. A syntax either works or it doesn’t, but it can’t target every use case.

Reporting

• The compiler is our main tool, but generally speaking it is still awful at reporting errors.

• Error reporting is the main interface, besides the language syntax, between the compiler and the developer.

• IDEs partially solve this by implementing their own heuristics on the code.

• But it should be the compiler! I don’t want my IDE bloated with these things.

• clang has done a great job with this. Other compilers should follow its lead.

• Compilers ought to be extensible in this aspect too. If we can extend them with our own logic, we must have a clear way to give feedback to the user.

Parsing errors

• Language newcomers tend to have syntax errors until they get fluent.

• The common approach is to extend the grammar to include invalid productions with custom errors.

• Grammars tend to get out of hand and difficult to refactor.

• A pretty cool way to improve the errors is to use example snippets of incorrect syntax.

• The snippets are fed to the parser and the state of it when it errors is recorded.

• Recorded states are then used to build the final parser, so when it hits an error that matches one of them, a custom message can be displayed.

class Foo:
    def foo()  # Note: I forgot the colon
        pass

Before:
<stdin>(3,13): BCE0043: Unexpected token: <INDENT>.
<stdin>(3,13): BCE0044: expecting "DEDENT", found '<EOL>'.
<stdin>(3,13): BCE0044: expecting "EOF", found '<DEDENT>'.

After:
<stdin>(3,13): BCE0043: Methods must use a ':' for their body.
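The "errors by example" idea can be sketched on top of any parser that reports where it failed. The version below is a deliberately crude approximation using CPython's own `compile`: the recorded "state" is just the parser's message plus the first word of the failing line, whereas a real implementation would record the parser's internal state. All names here are hypothetical.

```python
def error_state(source):
    """Return a key describing where the parser gave up, or None if it parses."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        lines = source.splitlines()
        in_range = exc.lineno is not None and 1 <= exc.lineno <= len(lines)
        words = lines[exc.lineno - 1].split() if in_range else []
        return (exc.msg, words[0] if words else "")
    return None


# Example snippets of incorrect syntax, each paired with a friendlier message.
EXAMPLES = [
    ("class Foo:\n    def foo()\n        pass\n",
     "Methods must use a ':' for their body."),
]

# Feed each snippet to the parser once and record the state it failed in.
MESSAGES = {error_state(snippet): message for snippet, message in EXAMPLES}


def diagnose(source):
    """Return None for valid code, else the custom or the default message."""
    state = error_state(source)
    if state is None:
        return None
    return MESSAGES.get(state, state[0])


print(diagnose("class Bar:\n    def run(self)\n        return 1\n"))
```

The grammar itself is untouched; supporting a new friendly message means adding one more snippet to `EXAMPLES`.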

Type Safety

Wishing it could be extended

Type Safety

• Major families:

• Nominal (Java, C#, …)

• Structural (Go, TypeScript, …)

• Duck typing (Python, JavaScript, …)

• Some languages offer a mix of them (e.g. Scala, ObjC, Boo, …)

• Many allow for runtime duck typing.

• Pattern matching solves a lot of type safety derived issues (bye bye Visitor)

• The problem with extending them is that the semantics of the language change.

• People can easily reason and adapt to syntax changes but for semantic ones…

• In the current state of the art it is probably better to switch languages completely if the problem at hand requires a different model of type safety.
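Python happens to support all three families side by side, which makes the difference easy to see in a few lines. This is only an illustration with made-up classes (`Shape`, `HasArea`, `Square`, `Field`); it assumes Python 3.8 or newer for `typing.Protocol`.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class Shape(ABC):
    """Nominal: you are a Shape only if you declare that you inherit from it."""

    @abstractmethod
    def area(self) -> float: ...


@runtime_checkable
class HasArea(Protocol):
    """Structural: anything with an area() method conforms."""

    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Field:
    """Unrelated to Shape, but has the right structure."""

    def area(self):
        return 100.0


def duck_area(thing):
    """Duck typing: no check at all, just call it and hope."""
    return thing.area()


print(isinstance(Field(), Shape), isinstance(Field(), HasArea), duck_area(Field()))
```

`Field` fails the nominal check, passes the structural one, and works under duck typing until someone passes an object without `area`.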

API Versioning

• If your language uses nominal typing you actually have to embrace it. Carefully plan your Interfaces to get the job done.

• C# does it right. If you want to be confident about exposing an API you can’t make every method virtual (*cough* Java *cough*).

• public, protected and private are visibility modifiers, not extension point indicators.

• Structural typing really shines in this regard, but it must be allowed from the call site, not only the definition site (Go vs Scala). The library author doesn’t have a clue if his API works as designed or not :-)

• Dynamic languages are flawed in this regard. Rapid prototyping but zero confidence when upgrading.

Cool stuff

Old tech sold as a bright future

Live Coding

• Imagine you could tap into the runtime environment and modify the executing instructions in real time.

• Many projects nowadays deploy into a Virtual Machine, which technically simplifies it.

• Still, the VM expects byte code, we want to write in a higher level syntax.

• Extensible compilers would allow bringing Live Coding to languages that are not interpreted.

• We ask the compiler to regenerate the changes in our function; the result is a chunk of byte code to feed into the VM.

• No need to run an interpreter and very few limitations.

• Still, this is the kind of thing that would flourish with an extensible compiler design. No big vendor is going to release something like this.
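The mechanics can be tried out on CPython, which already lets a program call its own compiler. In this sketch the function `hot_swap` (a made-up name) recompiles new source for a function and feeds the resulting byte code into the live function object, so every existing reference picks up the change.

```python
def greet(name):
    return "Hello, " + name


def hot_swap(function, new_source):
    """Recompile `new_source` and replace the byte code of a live function.

    Assumes `new_source` defines a function with the same name and no
    closure variables; default arguments are left untouched.
    """
    namespace = {}
    exec(compile(new_source, "<live>", "exec"), namespace)
    function.__code__ = namespace[function.__name__].__code__


alias = greet  # an existing reference held somewhere in the running program
print(alias("Ana"))
hot_swap(greet, "def greet(name):\n    return 'Hola, ' + name\n")
print(alias("Ana"))
```

Nothing is restarted and nothing is re-imported; only the chunk of byte code behind `greet` changes.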

Type Providers

• Popularised recently by F#

• They are basically specialised macros, whose purpose is to generate new types in the program.

• Remember when you had to preprocess some XML schema files to generate Java classes? What if the compiler could do that for you automatically?

• Using JSON? Just feed an example message file to the type provider macro and it will generate a typed interface for it.

• This is available in many languages at runtime via Reflection. What’s cool is that now you have it at compile time. Type checking and auto completion are the obvious benefits.
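For contrast, this is roughly what the runtime flavour looks like: a type derived from a sample JSON message using Python's standard `dataclasses.make_dataclass`. The helper names are hypothetical, and because everything happens at runtime an editor cannot autocomplete `user.name`, which is exactly the gap a compile-time type provider closes.

```python
import json
from dataclasses import fields, make_dataclass


def type_from_sample(name, sample_json):
    """Derive a dataclass whose fields mirror a flat sample JSON object."""
    sample = json.loads(sample_json)
    return make_dataclass(name, [(key, type(value)) for key, value in sample.items()])


def load(cls, message_json):
    """Build an instance, rejecting messages whose shape differs from the sample."""
    instance = cls(**json.loads(message_json))  # TypeError on wrong or missing keys
    for field in fields(instance):
        if not isinstance(getattr(instance, field.name), field.type):
            raise TypeError(f"{field.name} should be {field.type.__name__}")
    return instance


User = type_from_sample("User", '{"name": "Ana", "age": 30}')
user = load(User, '{"name": "Bob", "age": 41}')
print(user)
```

A mistyped field is only caught when the message is loaded; with a type provider the same mistake would be a compile error.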

What now?

It’s your responsibility

• To get better tools for our job we have to earn them. Advocate these concepts so they gain traction in the profession.

• Language designers do listen to their users, even if for the mainstream languages they are slow to react.

• If you have to choose between an extensible language and one that’s not, weigh the former appropriately in the comparison.

• Don’t be afraid of moulding a language to your needs. Know your tool and help it help you!

• If deploying into a VM (JVM, CLR, JavaScript) there is little risk in mixing languages in a project when they boost our productivity.

• Forget the idea that only functional languages are extensible. Imperative languages can do it too!

Conclusions

• Languages should be designed for extensibility.

• Anyone who can create a library should also be able to extend the compiler/language. Not everything must be done at runtime!

• We need better compilers in every aspect.

• Statically typed languages are closing the gap with dynamic ones in ease of use and user experience. We have to review our current standpoints on this.

• If you have the chance to learn a new framework or a new language, choose the latter, it’s much more rewarding.

• Extensible compilers will foster innovation, no need to wait for Oracle or Microsoft to come up with new ideas every 3 years. People will be able to shape the future trying new things and introducing those that work as part of the standard languages.

• Good drivers don’t need to know about the internals of the car, but it certainly doesn’t hurt :-)

“The limits of my language are the limits of my world.”

–Ludwig Wittgenstein
