Compilation: With an emphasis on getting the job done quickly. Copyright © 2003-2015 – Curt Hill

Dec 13, 2015

Transcript
Page 1

Compilation

With an emphasis on getting the job done quickly

Copyright © 2003-2015 – Curt Hill

Page 2

Introduction
• One of the goals of these courses is to gain the ability to implement simple control languages
  – That is what this presentation is about
• There are whole courses devoted to compiler theory and to the construction of compilers and interpreters
  – Just not here

Page 3

Stages
• Compilers typically have several stages
• These may be run sequentially or concurrently
• They include:
  – The lexical analyzer, or scanner
  – The syntax analyzer, or parser
  – The code generator
  – The optimization routines
• In this class there is little concern for the last two

Page 4

Lexical analyzer or scanner
• The front end for a parser
• Takes a string of characters and produces a sequence of tagged tokens
• The input contains things like comments, white space, and line breaks
  – None of those make any difference to the parser
  – Neither do input buffering and numerous other details

Page 5

Why separate scanner and parser?
• This simplifies the parser
  – Which is inherently more complicated
• We can optimize the scanner in different ways than the parser
• Separation makes both more modular
• The parser is mostly portable
  – The scanner may or may not be, since it deals more seriously with files

Page 6

Scanner
• Lexical errors: there are just a few
  – Invalid format of a number or identifier name
  – Unmatched two-part comments or quoted strings
  – A character not in the alphabet
• The lexical analyzer might or might not actually do something with the symbol table
  – Depends on the format of the token stream

Page 7

The token stream
• A token is usually constructed from a record or class
• One item must contain the class of the token
  – This usually assigns a number or enumeration value to each reserved word and punctuation mark, and to various things like identifiers and constants
• Tokens may contain supplemental information as needed

Page 8

Supplemental information
• A reserved word is defined sufficiently by the assigned number or enumeration value
• A numeric constant needs to carry the actual value and possibly its type
• An identifier needs the canonical representation
  – In languages that are not case sensitive this is usually every character converted to all upper or all lower case
  – In case-sensitive languages this is the exact representation of the name

Page 9

Supplemental information
• Often the location of the token is also passed along so that an error message may be pinned to a usable source location
  – Line number and column position
• Not needed for parsing, but it helps the user determine how to fix the error
• The parser will merely ask for one token at a time, picking it off the token stream
  – The parser sees no lines
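The token record described on the last few slides can be sketched as a small class. This is an illustrative sketch, not code from the course: the names `TokenKind`, `Token`, and the particular fields are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Union

class TokenKind(Enum):
    # One enumeration value per class of token (assumed tiny token set)
    IDENT = auto()
    NUMBER = auto()
    BEGIN = auto()
    END = auto()
    SEMICOLON = auto()
    ASSIGN = auto()

@dataclass
class Token:
    kind: TokenKind                     # the class of token (always present)
    lexeme: str                         # canonical representation of the text
    value: Optional[Union[int, float]]  # numeric constants carry their value
    line: int                           # source location, for error messages
    col: int

# A numeric constant 5 found at line 4, column 10:
t = Token(TokenKind.NUMBER, "5", 5, line=4, col=10)
print(t.kind.name, t.value)  # NUMBER 5
```

Note that the parser only needs `kind`; `value`, `lexeme`, `line`, and `col` are the supplemental information riding along for later stages and error reporting.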

Page 10

Creating a lexical analyzer
• A lexical analyzer usually recognizes a Type 3 language
  – A regular language
• Thus you can describe this token language rather simply
  – Such as with regular expressions or a Finite State Automaton
• These are easy to code by hand
  – There are programs that do so as well
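Because the token language is regular, a hand-written scanner can be little more than a list of regular expressions. This is a minimal sketch; the token names and the tiny token set are invented for illustration.

```python
import re

# Each token class is described by a regular expression (a Type 3 language).
# Order matters: NUMBER is tried before IDENT at each position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("SEMI",   r";"),
    ("SKIP",   r"[ \t\n]+"),   # white space is dropped, never reaches the parser
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Turn a string of characters into a list of tagged tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x := 5;"))
# [('IDENT', 'x'), ('ASSIGN', ':='), ('NUMBER', '5'), ('SEMI', ';')]
```

Tools like lex generate exactly this kind of table of patterns, only compiled into a fast finite state automaton rather than tried pattern by pattern.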

Page 11

Relationship
• It is possible to have the scanner operate like the preprocessor
  – Read in a source code file and write out a file of tokens
• Usually it is just a function called by the parser to deliver the next token
• There is no reason why it could not be a co-routine as well
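In Python the co-routine arrangement falls out naturally from a generator: the parser pulls one token at a time with `next()` and never sees lines or files. The two-pattern token set here is an assumption for illustration, not the course grammar.

```python
import re

MASTER = re.compile(r"(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<SKIP>\s+)")

def token_stream(text):
    """Co-routine style scanner: produces one tagged token per request."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

# The parser side just asks for the next token, one at a time:
ts = token_stream("count 42")
print(next(ts))  # ('IDENT', 'count')
print(next(ts))  # ('NUMBER', '42')
```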

Page 12

Generated Scanners
• There are programs that generate scanners based on some type of formal language that describes the tokens
  – The most famous of which is lex on UNIX systems
• Many types of parser generators either come with a scanner that is easy to modify to your particular grammar, or generate one while processing the syntax

Page 13

Simple Example

Source Code:

  Program demo(input, output);
  { a comment }
  var
    x : integer;
  begin
    x := 5;
    writeln('This is x', x)
  End.

Token Stream:

  program  ident demo  (  ident input  ,  ident output  )  ;
  var  ident x  :  ident integer  ;
  begin  ident x  :=  numeric constant 5

Page 14

The Parser
• Determines if the source file conforms to the language syntax
  – Generates meaningful error messages
• Builds a structure suitable for later stages to operate on
  – Mostly the code generator
  – This structure is usually a parse tree

Page 15

Parsers
• There are several types of parsers:
• Top down or bottom up
• Recursive descent
• LL
• LR
• Generated or table parsers

Page 16

Top down
• Uses a leftmost derivation
• Builds the parse tree in a top down, left to right fashion
• Languages that may be parsed in this way are termed LL(N)
  – The first L specifies a Left to right scan of the source code
  – The second L specifies that the leftmost derivation is the one generated
  – N is often one; it represents the number of symbols that must be looked ahead at in order to decide which rule to use next

Page 17

Example
• Consider a handout for this
  – After the ident is either a comma or a right parenthesis
  – The parser looks ahead to that item to determine what to do next
  – At every fork in the syntax diagrams there is a look-ahead set of tokens
  – We determine which way to go by determining whether the next symbol is in one set or another
  – If we never need more than a single look-ahead then 1 is the constant
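The comma-or-right-parenthesis fork described above can be sketched as a one-token look-ahead test. The function name, the `(kind, lexeme)` token representation, and the token kind names are all assumptions for illustration.

```python
def parse_param_list(tokens, pos):
    """Parse  ident (',' ident)* ')'  by peeking one token ahead at each fork.

    tokens is a list of (kind, lexeme) pairs; pos indexes the next token.
    Returns (names, new_pos) once the closing parenthesis is consumed.
    """
    names = []
    while True:
        kind, lexeme = tokens[pos]
        if kind != "IDENT":
            raise SyntaxError("identifier expected")
        names.append(lexeme)
        pos += 1
        # The fork: after an ident the next token must be ',' or ')'
        kind, _ = tokens[pos]
        if kind == "COMMA":
            pos += 1                    # another parameter follows
        elif kind == "RPAREN":
            return names, pos + 1       # list is finished
        else:
            raise SyntaxError("expected ',' or ')'")

toks = [("IDENT", "input"), ("COMMA", ","), ("IDENT", "output"), ("RPAREN", ")")]
print(parse_param_list(toks, 0))  # (['input', 'output'], 4)
```

The two `if` branches are exactly the two look-ahead sets at this fork: {comma} and {right parenthesis}.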

Page 18

Top Down Again
• For most programming languages an LL(1) grammar exists
  – This requires just a single look-ahead token
• Let's look through the handout and see how this works

Page 19

Bottom up
• Looks at the leaves and works its way towards the root
• Bottom up parsers usually accept LR(n) languages
• Since they start at the bottom of the parse tree they use shift-reduce algorithms
• In this presentation we will skip how this actually works

Page 20

LL and LR
• In theory you could parse a programming language in many ways other than LL or LR
  – However, doing so makes the running time O(N³), which is not good for something run as often as a compiler
  – By forcing a Left to right scan of the source (the first L) we can get O(N) compiles, which makes everyone much happier

Page 21

Subsets
• LL(1) languages are a subset of LR(1)
• Hence for any LL(1) language there is an LR(1) grammar
• There may be an LR(1) language for which there is no LL(1) grammar
• There are two other classes, SLR and LALR, which are simplifications of LR

Page 22

Commonly
• The LL, SLR and LALR parsers have been the dominant ones, because the tables needed by an LR parser could grow exponentially in the worst case
• Hence most of the table driven parsers were LL or LALR
• However, since that time quite a bit of work has been done and there are now some decent LR table parsers

Page 23

Recursive descent parsers
• Since the grammar is recursive, we can make our program follow the grammar
• For each production/non-terminal, generate a function that processes that non-terminal
• It simply calls another function for each non-terminal on the RHS of the production
• We are less interested in these

Page 24

Recursive Descent
• We have seen
  – LR(n)
  – LL(n)
  – LALR(n)
• The n determines how many tokens have to be looked at to decide which production is involved
• Most programming languages have an LL(1) grammar
  – A recursive descent parser can look ahead just one token and then choose the right production
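A function-per-non-terminal parser can be shown on a toy grammar. The grammar below (expr → term ('+' term)*, term → NUMBER) and all names in the sketch are invented for illustration; they are not from the handout.

```python
import re

def scan(text):
    # Tiny scanner: numbers and '+', white space implicitly skipped
    return re.findall(r"\d+|\+", text)

class Parser:
    """One function per non-terminal; one token of look-ahead (LL(1))."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):          # expr -> term ('+' term)*
        value = self.term()
        while self.peek() == "+":   # the single look-ahead token decides
            self.pos += 1
            value += self.term()
        return value

    def term(self):          # term -> NUMBER
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError("number expected")
        self.pos += 1
        return int(tok)

print(Parser(scan("1 + 2 + 39")).expr())  # 42
```

The run-time stack of these mutually calling functions is what plays the role of the parse stack, which is revisited on the last slide.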

Page 25

Generated parsers
• These parsers are usually LALR(1) or LL(1), but a few are LR(1)
• There are a number of these available
  – YACC on UNIX
    • LALR(1)
• These read in some form of a grammar
• They generate a series of tables that is used by a parser
• The scanner is also generated and then plugged into the parser

Page 26

How do these work?
• The scanner reads in so many tokens
• The parser then looks for tokens that fit the pattern of a production
  – That is, it looks for what is on the RHS
• When it finds the pattern it does a reduction
  – A reduction is moving right to left across a production
  – Replace the RHS with the LHS
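A shift-reduce pass can be sketched for the toy grammar E → E + n | n. The grammar, the `$` end marker, and the driver below are invented for illustration; real table-driven parsers decide shift vs. reduce from generated tables rather than hand-written pattern checks.

```python
def shift_reduce(tokens):
    """Parse E -> E '+' 'n' | 'n' with an explicit stack, tracing each step."""
    stack, trace = [], []
    for tok in tokens + ["$"]:           # '$' marks end of input
        # Reduce while an RHS sits on top of the stack (longest RHS first)
        while True:
            if stack[-3:] == ["E", "+", "n"]:
                stack[-3:] = ["E"]       # replace RHS with LHS: E -> E + n
                trace.append("reduce E -> E + n")
            elif stack[-1:] == ["n"]:
                stack[-1:] = ["E"]       # replace RHS with LHS: E -> n
                trace.append("reduce E -> n")
            else:
                break
        if tok != "$":
            stack.append(tok)            # shift the next token
            trace.append(f"shift {tok}")
    return stack, trace

stack, trace = shift_reduce(["n", "+", "n"])
print(stack)  # ['E']  -- reduced all the way to the distinguished symbol
```

In a real parser each reduction would also invoke a semantic routine, which is what the next slide describes.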

Page 27

More
• When a reduction is done, a semantic routine is usually called
• This routine may do any of the following:
  – Check semantic things
    • Is a variable defined?
  – Update the symbol table
  – Generate code
  – Do the things not possible with BNF
• Eventually we should be able to reduce to the distinguished symbol; then we are done

Page 28

Finally
• Lexical analysis implements a finite state automaton
• Parsing implements a push down automaton
  – In recursive descent the stack is the run-time stack of function calls
  – In bottom up parsing the stack contains tokens
• As Software Engineers we are usually interested in generated parsers for the ease of construction