Source: Compiler Construction 2011, Lecture 2, lara.epfl.ch/~kuncak/cc2011/L2-CC2011.pdf

Compiler Construction 2011, Lecture 2

Staff:

• Viktor Kuncak – Lectures

• Etienne Kneuss and Philippe Suter – Labs

• Eva Darulova and Giuliano Losa – Exercises

• Regis Blanc – assistant

• Yvette Gallay – secretary

http://lara.epfl.ch

Drawing Hands

M.C. Escher, 1948

http://tiny.cc/compilers

Reminder

• Register on:

– IS academia

– Moodle – so you can get our emails

– Our wonderful repository (reachable from course page)

• So please form the groups

Compiler (scalac, gcc)

source code (e.g. Scala, Java, C), easy to write:

    i = 0
    while (i < 10) {
      a[i] = 7*i+3
      i = i + 1
    }

machine code (e.g. x86, ARM, JVM), efficient to execute:

    mov R1,#0
    mov R2,#40
    mov R3,#3
    jmp +12
    mov (a+R1),R3
    add R1, R1, #4
    add R3, R3, #7
    cmp R1, R2
    blt -16

[Figure: compiler construction pipeline. Stages: lexer, parser, type check, optimizer, code gen. Intermediate forms: characters, words, trees, data-flow graphs. Example tree: assign(i, 0); while(i < 10, assign(a[i], 7*i + 3)).]

Compiler (scalac, gcc)

source code:

    id3 = 0
    while (id3 < 10) {
      println("", id3);
      id3 = id3 + 1
    }

[Figure: the lexer turns characters (i, d, 3, =, 0, LF, w, ...) into words (tokens): id3, =, 0, while, (, id3, <, 10, ), ...; the parser then turns tokens into trees.]

The lexer is specified using regular expressions. It groups characters into tokens and classifies them into token classes.

Today: Lexical Analysis

Summary:

• a lexical analyzer maps a stream of characters into a stream of tokens

– while doing that, it typically needs only bounded memory

• we can specify tokens for a lexical analyzer using regular expressions

• it is not difficult to construct a lexical analyzer manually – we give an example

– for manually constructed analyzers, we often use the first character to decide on the token class; this relies on the notion first(L) = { a | aw in L }, the set of characters that can begin a word of L

• we follow the maximal munch rule: the lexical analyzer should eagerly accept the longest token that it can recognize from the current point

• it is possible to automate the construction of lexical analyzers; the starting point is conversion of regular expressions to automata

– tools that automate this construction are part of compiler-compilers, such as JavaCC described in the Tiger book

– automated construction of lexical analyzers from regular expressions is an example of compilation for a domain-specific language
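The maximal munch rule from the summary above can be illustrated with a small regex-driven tokenizer. This is only a sketch under stated assumptions: the object name `MaximalMunch`, the class names (`KEYWORD`, `ID`, `INT`, `SYM`, `WS`) and the particular symbol set are hypothetical choices for illustration, not taken from the lecture. At each position it asks every token class for its longest matching prefix and keeps the longest one; ties go to the class listed first, which is how `if` becomes a keyword while `ifx` stays an identifier.

```scala
import scala.util.matching.Regex

object MaximalMunch {
  // Token classes in priority order (hypothetical names, for illustration only).
  // Each regex is written so that the engine finds its own longest prefix
  // (for example "==" is listed before "=").
  val classes: List[(String, Regex)] = List(
    "KEYWORD" -> "if|else|while|println".r,
    "ID"      -> "[a-zA-Z][a-zA-Z0-9]*".r,
    "INT"     -> "[0-9]+".r,
    "SYM"     -> "==|<=|[=<+*/(){};,]".r,
    "WS"      -> "[ \\t\\n]+".r
  )

  // Split the input into (class, text) pairs, always taking the longest match.
  def tokenize(input: String): List[(String, String)] = {
    val out = List.newBuilder[(String, String)]
    var pos = 0
    while (pos < input.length) {
      val rest = input.substring(pos)
      // Prefix matched by each class, if any
      val candidates = for {
        (name, re) <- classes
        m <- re.findPrefixOf(rest)
      } yield (name, m)
      if (candidates.isEmpty) sys.error(s"unexpected character at position $pos")
      // maxBy keeps the first maximal element, so earlier classes win ties
      val (name, text) = candidates.maxBy(_._2.length)
      if (name != "WS") out += ((name, text)) // whitespace is skipped
      pos += text.length
    }
    out.result()
  }
}
```

For instance, `MaximalMunch.tokenize("ifx <= 10")` yields an identifier `ifx`, the symbol `<=` and the integer `10`, rather than splitting off `if` or `<`.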

While Language – Idea

• Small language used to illustrate key concepts

• Also used in your first lab – interpreter

– later labs will use a more complex language

– we continue to use while in lectures

• ‘while’ and ‘if’ are the control statements

– no procedures, no exceptions

• the only variables are of ‘int’ type

– no variable declarations, they are initially zero

– no objects, pointers, arrays

While Language – Example Programs

    x = 13;
    while (x > 1) {
      println("x=", x);
      if (x % 2 == 0) {
        x = x / 2;
      } else {
        x = 3 * x + 1;
      }
    }

Does the program terminate for every initial value of x? (Collatz conjecture - open)

Nested loop:

    while (i < 100) {
      j = i + 1;
      while (j < 100) {
        println(" ", i);
        println(",", j);
        j = j + 1;
      }
      i = i + 1;
    }

Tokens (Words) of the While Language

Ident ::= letter (letter | digit)*

integerConst ::= digit digit*

stringConst ::= " AnySymbolExceptQuote* "

keywords: if else while println

special symbols: ( ) && < == + - * / % ! - { } ; ,

letter ::= a | b | c | … | z | A | B | C | … | Z

digit ::= 0 | 1 | … | 8 | 9

regular expressions
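As a rough illustration, the token definitions above can be transcribed into Scala's `scala.util.matching.Regex`. The object name `WhileTokens` and the `classify` helper are assumptions made for this sketch; only the three regular expressions and the keyword list come from the slide.

```scala
import scala.util.matching.Regex

object WhileTokens {
  val Ident: Regex        = "[a-zA-Z][a-zA-Z0-9]*".r // letter (letter | digit)*
  val IntegerConst: Regex = "[0-9][0-9]*".r          // digit digit*
  val StringConst: Regex  = "\"[^\"]*\"".r           // " AnySymbolExceptQuote* "
  val keywords: Set[String] = Set("if", "else", "while", "println")

  // Classify one complete word (hypothetical helper).
  // Keywords are checked first because they also match Ident.
  def classify(word: String): String = word match {
    case w if keywords(w) => "keyword"
    case Ident()          => "Ident"
    case IntegerConst()   => "integerConst"
    case StringConst()    => "stringConst"
    case _                => "other"
  }
}
```

A pattern such as `Ident()` matches only when the whole word belongs to the language, so `3d` is neither an identifier nor an integer constant.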

Regular Expressions: Definition

• One way to denote (often infinite) languages

• A regular expression is an expression built from:

– the empty language, denoted ∅

– {ε}, denoted just ε

– {a} for a in Σ, denoted simply by a

– union, denoted | or sometimes +

– concatenation, written like multiplication or simply by juxtaposition

– Kleene star *

• Identifiers: letter (letter | digit)* (letter and digit are the shorthands from before)
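The definition above maps directly onto an algebraic data type with one constructor per case. The following is a minimal sketch, not the course's implementation: the names `RegexDemo`, `Empty`, `Eps`, `Sym`, `Union`, `Concat`, `Star`, `ends` and `accepts` are chosen here for illustration. The matcher is a naive one that computes, for a start position, all end positions a sub-expression can reach; it is meant to make the meaning of each constructor concrete, not to be efficient.

```scala
object RegexDemo {
  sealed trait Regex
  case object Empty extends Regex                     // the empty language
  case object Eps extends Regex                       // {ε}
  case class Sym(c: Char) extends Regex               // {a}
  case class Union(l: Regex, r: Regex) extends Regex  // l | r
  case class Concat(l: Regex, r: Regex) extends Regex // l r
  case class Star(r: Regex) extends Regex             // r*

  // All positions j such that r matches s.substring(i, j).
  def ends(r: Regex, s: String, i: Int): Set[Int] = r match {
    case Empty        => Set.empty
    case Eps          => Set(i)
    case Sym(c)       => if (i < s.length && s(i) == c) Set(i + 1) else Set.empty
    case Union(a, b)  => ends(a, s, i) ++ ends(b, s, i)
    case Concat(a, b) => ends(a, s, i).flatMap(j => ends(b, s, j))
    case Star(a)      =>
      // Repeat a zero or more times until no new end position appears
      var seen = Set(i)
      var frontier = Set(i)
      while (frontier.nonEmpty) {
        val fresh = frontier.flatMap(j => ends(a, s, j)) -- seen
        seen ++= fresh
        frontier = fresh
      }
      seen
  }

  // s is in the language of r iff r can consume all of s
  def accepts(r: Regex, s: String): Boolean = ends(r, s, 0).contains(s.length)

  private def oneOf(cs: Seq[Char]): Regex =
    cs.map(c => Sym(c): Regex).reduce[Regex](Union(_, _))

  val letter: Regex = oneOf(('a' to 'z') ++ ('A' to 'Z'))
  val digit: Regex  = oneOf('0' to '9')
  val ident: Regex  = Concat(letter, Star(Union(letter, digit)))
}
```

With these definitions, `RegexDemo.accepts(RegexDemo.ident, "id3")` holds, while `"3id"` and the empty string are rejected.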

History: Kleene (from Wikipedia)

Stephen Cole Kleene (January 5, 1909, Hartford, Connecticut, United States – January 25, 1994, Madison, Wisconsin) was an American mathematician who helped lay the foundations for theoretical computer science. One of many distinguished students of Alonzo Church, Kleene, along with Alan Turing, Emil Post, and others, is best known as a founder of the branch of mathematical logic known as recursion theory. Kleene's work grounds the study of which functions are computable. A number of mathematical concepts are named after him: Kleene hierarchy, Kleene algebra, the Kleene star (Kleene closure), Kleene's recursion theorem and the Kleene fixpoint theorem. He also invented regular expressions, and was a leading American advocate of mathematical intuitionism.

Manually Constructing Lexers

Lexer Input and Output

[Figure: the lexer maps a stream of Char-s (i, d, 3, =, 0, LF, w, ...) to a stream of Token-s (id3, =, 0, while, (, id3, <, 10, ), ...).]

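Stream of Char-s

    import java.io.{BufferedReader, FileReader}

    // Assumed definition; the slide uses EndOfInput without showing it.
    case class EndOfInput(msg: String) extends Exception(msg)

    class CharStream(fileName: String) {
      val file = new BufferedReader(new FileReader(fileName))
      var current: Char = ' '
      var eof: Boolean = false

      // Advance to the next character; sets eof once the input is exhausted.
      def next: Unit = {
        if (eof)
          throw EndOfInput("reading " + fileName)
        val c = file.read()
        eof = (c == -1)
        current = c.toChar
      }

      next // read the first character
    }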

Stream of Token-s

    sealed abstract class Token
    case class ID(content: String) extends Token   // "id3"
    case class IntConst(value: Int) extends Token  // 10
    case class AssignEQ() extends Token            // "="
    case class CompareEQ() extends Token           // "=="
    case class MUL() extends Token                 // "*"
    case class PLUS() extends Token                // "+"
    case class LEQ() extends Token                 // "<="
    case class OPAREN() extends Token              // "("
    case class CPAREN() extends Token              // ")"
    ...
    case class IF() extends Token                  // "if"
    case class WHILE() extends Token               // "while"
    case class EOF() extends Token                 // End Of File

    class Lexer(ch: CharStream) {
      var current: Token = null // set by the first call to next

      def next: Unit = {
        // lexer code here
      }
    }

Identifiers and Keywords

regular expression for identifiers: letter (letter|digit)*

    if (isLetter) {
      val b = new StringBuffer
      while (isLetter || isDigit) {
        b.append(ch.current)
        ch.next
      }
      // keywords is a constant Map from strings to keyword tokens
      keywords.get(b.toString) match {
        case None     => token = ID(b.toString) // not in the map: ordinary identifier
        case Some(kw) => token = kw
      }
    }

Keywords look like identifiers, but are simply indicated as keywords in the language definition.

keywords is a constant Map from strings to keyword tokens; if the string is not in the map, then it is an ordinary identifier.

Integer Constants

regular expression for integers: digit digit*

    if (isDigit) {
      var k = 0
      while (isDigit) {
        // accumulate the decimal value of the digits read so far
        k = 10 * k + ch.current.asDigit
        ch.next
      }
      token = IntConst(k)
    }

Decision Tree to Map Symbols to Tokens

    // DIV and LESS are assumed to be among the token classes elided ("...") earlier.
    ch.current match {
      case '(' => {current = OPAREN(); ch.next; return}
      case ')' => {current = CPAREN(); ch.next; return}
      case '+' => {current = PLUS(); ch.next; return}
      case '/' => {current = DIV(); ch.next; return}
      case '*' => {current = MUL(); ch.next; return}
      case '=' => {
        ch.next // look one character ahead to tell "==" from "="
        if (ch.current == '=') {ch.next; current = CompareEQ(); return}
        else {current = AssignEQ(); return}
      }
      case '<' => {
        ch.next // look one character ahead to tell "<=" from "<"
        if (ch.current == '=') {ch.next; current = LEQ(); return}
        else {current = LESS(); return}
      }
    }

Skipping Comments

    if (ch.current == '/') {
      ch.next
      if (ch.current == '/') {
        // line comment: skip to the end of the line (or end of input)
        while (!isEOL && !isEOF) {
          ch.next
        }
      }
    }

Nested comments? /* foo /* bar */ baz */
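One way to answer the question above: properly nested comments require counting how many `/*` are still open, which a regular expression cannot do, but a hand-written lexer can do with a depth counter. The sketch below works on a plain `String` rather than the `CharStream` from the slides; `NestedComments` and `skipNested` are hypothetical names used only for this illustration.

```scala
object NestedComments {
  // Assuming s has "/*" at index start, return the index just past the
  // matching "*/", or -1 if the comment is still open at the end of input.
  def skipNested(s: String, start: Int): Int = {
    var i = start
    var depth = 0
    while (i < s.length) {
      if (s.startsWith("/*", i)) {        // one more comment opened
        depth += 1
        i += 2
      } else if (s.startsWith("*/", i)) { // innermost open comment closed
        depth -= 1
        i += 2
        if (depth == 0) return i
      } else {
        i += 1
      }
    }
    -1
  }
}
```

On the slide's example, the inner `*/` only lowers the depth from 2 to 1, so `baz` is still treated as part of the comment.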

Further Important Topics

• Longest Match Rule

• Combining pieces together

– computing first symbols for regular expressions

• Example of tiny lexical analyzer

see wiki
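As a rough idea of how the pieces fit together (this is not the example from the course wiki), here is a self-contained tiny lexer over a `String`. It decides on the token class from the first character and then consumes the longest possible token. Assumptions made for the sketch: the object name `TinyLexer`, the use of `case object` for parameterless tokens (a small departure from the slides, which use `case class` with empty parameter lists), a keyword map containing only `if` and `while`, and a reduced symbol set.

```scala
object TinyLexer {
  sealed abstract class Token
  case class ID(content: String) extends Token
  case class IntConst(value: Int) extends Token
  case object AssignEQ extends Token
  case object CompareEQ extends Token
  case object LESS extends Token
  case object LEQ extends Token
  case object PLUS extends Token
  case object MUL extends Token
  case object OPAREN extends Token
  case object CPAREN extends Token
  case object IF extends Token
  case object WHILE extends Token

  // Constant map from strings to keyword tokens
  val keywords: Map[String, Token] = Map("if" -> IF, "while" -> WHILE)

  private def isLetter(c: Char) = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
  private def isDigit(c: Char)  = c >= '0' && c <= '9'

  def tokenize(s: String): List[Token] = {
    val out = List.newBuilder[Token]
    var i = 0
    // Current character, or a sentinel once the input is exhausted
    def cur: Char = if (i < s.length) s(i) else '\u0000'
    while (i < s.length) {
      if (cur.isWhitespace) i += 1
      else if (isLetter(cur)) {            // letter (letter | digit)*
        val start = i
        while (isLetter(cur) || isDigit(cur)) i += 1
        val word = s.substring(start, i)
        out += keywords.getOrElse(word, ID(word))
      } else if (isDigit(cur)) {           // digit digit*
        var k = 0
        while (isDigit(cur)) { k = 10 * k + (cur - '0'); i += 1 }
        out += IntConst(k)
      } else {
        val c = cur
        i += 1
        c match {
          case '(' => out += OPAREN
          case ')' => out += CPAREN
          case '+' => out += PLUS
          case '*' => out += MUL
          // one character of lookahead implements the longest match rule
          case '=' => if (cur == '=') { i += 1; out += CompareEQ } else out += AssignEQ
          case '<' => if (cur == '=') { i += 1; out += LEQ } else out += LESS
          case other => sys.error(s"unexpected character '$other'")
        }
      }
    }
    out.result()
  }
}
```

For example, `TinyLexer.tokenize("id3 = 0 while (id3 <= 10)")` produces `ID("id3")`, `AssignEQ`, `IntConst(0)`, `WHILE`, `OPAREN`, `ID("id3")`, `LEQ`, `IntConst(10)`, `CPAREN`.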

Computing first symbols

Computing nullable expressions
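The slide content for these two topics did not survive in the transcript, so what follows is the standard textbook recursion rather than a reconstruction of the lecture's own presentation. A regular expression is nullable if its language contains ε, and first(r) is the set of characters that can begin a word of r, matching first(L) = { a | aw in L } from the summary. The names `FirstNullable`, `nullable` and `first`, as well as the constructor names, are assumptions for this sketch. Note that `first` may over-approximate when the empty language occurs inside a concatenation.

```scala
object FirstNullable {
  sealed trait Regex
  case object Empty extends Regex                     // the empty language
  case object Eps extends Regex                       // {ε}
  case class Sym(c: Char) extends Regex               // {a}
  case class Union(l: Regex, r: Regex) extends Regex  // l | r
  case class Concat(l: Regex, r: Regex) extends Regex // l r
  case class Star(r: Regex) extends Regex             // r*

  // Does the language of r contain the empty word?
  def nullable(r: Regex): Boolean = r match {
    case Empty        => false
    case Eps          => true
    case Sym(_)       => false
    case Union(a, b)  => nullable(a) || nullable(b)
    case Concat(a, b) => nullable(a) && nullable(b)
    case Star(_)      => true
  }

  // Characters that can begin a word of r
  // (assumes Empty does not occur under Concat; otherwise an over-approximation).
  def first(r: Regex): Set[Char] = r match {
    case Empty | Eps  => Set.empty
    case Sym(c)       => Set(c)
    case Union(a, b)  => first(a) ++ first(b)
    // if a can be empty, a word of (a b) may start with a character of b
    case Concat(a, b) => if (nullable(a)) first(a) ++ first(b) else first(a)
    case Star(a)      => first(a)
  }
}
```

For a* b, the first set is {a, b} because a* is nullable; for a b it is just {a}.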

Automating Construction of Lexers

Example in JavaCC

    TOKEN: {
        <IDENTIFIER: <LETTER> (<LETTER> | <DIGIT> | "_")* >
      | <CONSTANT: <DIGIT> (<DIGIT>)* >
      | <LETTER: ["a"-"z"] | ["A"-"Z"]>
      | <DIGIT: ["0"-"9"]>
    }

    SKIP: {
        " "
      | "\n"
      | "\t"
    }

Finite Automaton

Kinds of finite automata:

• deterministic

• non-deterministic

– with epsilon transitions

– with regular expressions on edges

Interpretation of Non-Determinism

• For a given string, some paths in the automaton may lead to accepting states and others to rejecting states

• Does the automaton accept?

– yes, if there exists an accepting path
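The "there exists an accepting path" reading can be checked without enumerating paths, by tracking the set of all states the automaton could be in after each character. The sketch below is one common way to do this; the `NFA` representation (integer states, `None` as the label of an epsilon transition) and the example automaton are assumptions made for illustration, not definitions from the lecture.

```scala
object NfaDemo {
  // Hypothetical representation: states are Ints; None labels an epsilon transition.
  case class NFA(start: Int,
                 accepting: Set[Int],
                 delta: Map[(Int, Option[Char]), Set[Int]]) {

    // States reachable from any of the given states by one transition with this label
    private def step(states: Set[Int], label: Option[Char]): Set[Int] =
      states.flatMap(q => delta.getOrElse((q, label), Set.empty[Int]))

    // The given states plus everything reachable through epsilon transitions only
    private def closure(states: Set[Int]): Set[Int] = {
      var seen = states
      var frontier = states
      while (frontier.nonEmpty) {
        val fresh = step(frontier, None) -- seen
        seen ++= fresh
        frontier = fresh
      }
      seen
    }

    // Accept iff at least one path over the input ends in an accepting state
    def accepts(input: String): Boolean = {
      val finalStates = input.foldLeft(closure(Set(start))) { (states, c) =>
        closure(step(states, Some(c)))
      }
      finalStates.exists(accepting)
    }
  }

  // Example NFA for (a|b)*ab; state 0 exists only to show an epsilon transition
  val example: NFA = NFA(
    start = 0,
    accepting = Set(3),
    delta = Map[(Int, Option[Char]), Set[Int]](
      (0, None)      -> Set(1),
      (1, Some('a')) -> Set(1, 2),
      (1, Some('b')) -> Set(1),
      (2, Some('b')) -> Set(3)
    )
  )
}
```

On input `ab`, state 1 can either loop on `a` or move to state 2; only the second choice leads to the accepting state 3, and that single path is enough for `NfaDemo.example.accepts("ab")` to hold.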

Continued in next lecture
