Compiler Construction 2011, Lecture 2

  • Compiler Construction 2011, Lecture 2 Staff:

    Viktor Kuncak Lectures

    Etienne Kneuss and Philippe Suter {labs}

    Eva Darulova and Giuliano Losa Exercises

    Regis Blanc assistant

    Yvette Gallay secretary

    http://lara.epfl.ch

    Drawing Hands

    M.C. Escher, 1948

    http://tiny.cc/compilers

  • Reminder

    Register on:

    IS-Academia

    Moodle, so you can get our emails

    Our wonderful repository (reachable from the course page)

    So please form your groups

  • Compiler (scalac, gcc)

    A compiler translates source code (e.g. Scala, Java, C), which is easy to write:

    i = 0
    while (i < 10) {
      a[i] = 7*i + 3
      i = i + 1
    }

    into machine code (e.g. x86, ARM, JVM), which is efficient to execute:

    mov R1,#0
    mov R2,#40
    mov R3,#3
    jmp +12
    mov (a+R1),R3
    add R1, R1, #4
    add R3, R3, #7
    cmp R1, R2
    blt -16

    [Pipeline diagram] The compiler works in phases: the lexer turns characters into words ("i", "=", "0", "while", "(", "i", "<", "10", ")", ...); the parser turns words into trees (an assignment i = 0, then a while node whose condition is i < 10 and whose body assigns 7*i + 3 to a[i]); the type checker, optimizer (working on data-flow graphs), and code generator then produce the machine code.

  • Compiler (scalac, gcc)

    source code:

    id3 = 0
    while (id3 < 10) {
      println("", id3);
      id3 = id3 + 1
    }

    [Pipeline diagram] The lexer turns the character stream 'i', 'd', '3', ' ', '=', ' ', '0', LF, 'w', ... into the stream of words (tokens) "id3", "=", "0", "while", "(", "id3", "<", "10", ")", ...; the parser then turns the tokens into trees.

    The lexer is specified using regular expressions. It groups characters into tokens and classifies them into token classes.

  • Today: Lexical Analysis. Summary: a lexical analyzer maps a stream of characters into a stream of tokens

    while doing that, it typically needs only bounded memory

    we can specify the tokens of a lexical analyzer using regular expressions

    it is not difficult to construct a lexical analyzer manually; we give an example

    for manually constructed analyzers, we often use the first character to decide on the token class; this motivates the notion first(L) = { a | aw in L for some word w }

    we follow the maximal munch rule: the lexical analyzer should eagerly accept the longest token that it can recognize from the current point (see the example after this list)

    it is possible to automate the construction of lexical analyzers; the starting point is conversion of regular expressions to automata

    tools that automate this construction are part of compiler-compilers, such as JavaCC described in the Tiger book

    automated construction of lexical analyzers from regular expressions is an example of compilation for a domain-specific language
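
    A quick illustration of the maximal munch rule (using the Token classes defined later in this lecture): on the input characters ===, the lexer emits CompareEQ() followed by AssignEQ(), never three AssignEQ() tokens; on the input while1, it emits the single identifier ID("while1") rather than the keyword while followed by the constant 1.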

  • While Language Idea

    Small language used to illustrate key concepts

    Also used in your first lab (an interpreter)

    later labs will use a more complex language

    we continue to use while in lectures

    while and if are the control statements

    no procedures, no exceptions

    the only variables are of int type

    no variable declarations; variables are initially zero

    no objects, pointers, arrays

  • While Language Example Programs

    x = 13;
    while (x > 1) {
      println("x=", x);
      if (x % 2 == 0) { x = x / 2; }
      else { x = 3 * x + 1; }
    }

    Does the program terminate for every initial value of x?
    (Collatz conjecture: still open)

    Nested loop:

    while (i < 100) {
      j = i + 1;
      while (j < 100) {
        println("", i);
        println(",", j);
        j = j + 1;
      }
      i = i + 1;
    }

  • Tokens (Words) of the While Language

    Ident ::= letter (letter | digit)*

    integerConst ::= digit digit*

    stringConst ::= " AnySymbolExceptQuote* "

    keywords: if else while println

    special symbols: ( ) && < == + - * / % ! - { } ; ,

    letter ::= a | b | c | ... | z | A | B | C | ... | Z
    digit ::= 0 | 1 | ... | 8 | 9

    (these token definitions are regular expressions)

  • Regular Expressions: Definition

    One way to denote (often infinite) languages

    Regular expression is an expression built from:

    the empty language ∅

    {ε}, denoted just ε

    {a} for a in Σ (the alphabet), denoted simply by a

    union, denoted | or sometimes +

    concatenation, written like multiplication or by simple juxtaposition

    Kleene star *

    Identifiers: letter (letter | digit)*   (letter, digit are shorthands from before; a small Scala sketch follows)
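
    As an illustration (not part of the original slides), the identifier and integer-constant shapes above can be written as regular expression patterns in Scala and tested against sample strings; the object and value names here are invented for the example:

    object TokenPatterns {
      // letter (letter | digit)*  : identifiers
      val identRegex = "[a-zA-Z][a-zA-Z0-9]*"
      // digit digit*              : integer constants
      val intConstRegex = "[0-9][0-9]*"

      def main(args: Array[String]): Unit = {
        println("id3".matches(identRegex))    // true
        println("10".matches(intConstRegex))  // true
        println("3id".matches(identRegex))    // false: identifiers must start with a letter
      }
    }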

  • History: Kleene (from Wikipedia)

    Stephen Cole Kleene (January 5, 1909, Hartford, Connecticut, United States – January 25, 1994, Madison, Wisconsin) was an American mathematician who helped lay the foundations for theoretical computer science. One of many distinguished students of Alonzo Church, Kleene, along with Alan Turing, Emil Post, and others, is best known as a founder of the branch of mathematical logic known as recursion theory. Kleene's work grounds the study of which functions are computable. A number of mathematical concepts are named after him: Kleene hierarchy, Kleene algebra, the Kleene star (Kleene closure), Kleene's recursion theorem and the Kleene fixpoint theorem. He also invented regular expressions, and was a leading American advocate of mathematical intuitionism.


  • Manually Constructing Lexers


  • Lexer Input and Output

    Input: a stream of characters ('i', 'd', '3', ' ', '=', ' ', '0', LF, 'w', ...)

    Output: a stream of tokens ("id3", "=", "0", "while", "(", "id3", "<", "10", ")", ...)

    Stream of Char-s:

    import java.io.{BufferedReader, FileReader}

    // assumed definition of the EndOfInput exception used below
    case class EndOfInput(msg: String) extends Exception(msg)

    class CharStream(fileName: String) {
      val file = new BufferedReader(new FileReader(fileName))
      var current: Char = ' '    // most recently read character
      var eof: Boolean = false   // set once the end of the file is reached
      def next = {
        if (eof)
          throw EndOfInput("reading " + fileName)
        val c = file.read()      // returns -1 at end of input
        eof = (c == -1)
        current = c.asInstanceOf[Char]
      }
      next                       // read the first character
    }

    Stream of Token-s:

    sealed abstract class Token
    case class ID(content: String) extends Token   // id3
    case class IntConst(value: Int) extends Token  // 10
    case class AssignEQ() extends Token            // =
    case class CompareEQ() extends Token           // ==
    case class MUL() extends Token                 // *
    case class PLUS() extends Token                // +
    case class LEQ() extends Token                 // <
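
    For instance, with these classes the beginning of the running example id3 = 0 while (id3 < 10) ... would be emitted as the token stream below (WHILE(), OPAREN(), CPAREN() are assumed further token classes in the same style; they are not defined on this slide):

    ID("id3"), AssignEQ(), IntConst(0), WHILE(), OPAREN(), ID("id3"), LEQ(), IntConst(10), CPAREN(), ...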

  • Identifiers and Keywords

    if (isLetter) {                      // isLetter, isDigit test ch.current
      val b = new StringBuffer
      while (isLetter || isDigit) {
        b.append(ch.current)
        ch.next
      }
      keywords.get(b.toString) match {   // keywords: a constant Map from strings to keyword tokens
        case None     => token = ID(b.toString)
        case Some(kw) => token = kw
      }
    }

    Keywords look like identifiers, but are simply indicated as keywords in the language definition.

    keywords is a constant Map from strings to keyword tokens; if a word is not in the map, it is an ordinary identifier (a sketch of such a map follows below).

    regular expression for identifiers: letter (letter|digit)*
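
    A minimal sketch of such a keyword map (illustrative only; the keyword token classes IF(), ELSE(), WHILE(), PRINTLN() are assumed here, in the style of the Token classes above):

    case class IF() extends Token
    case class ELSE() extends Token
    case class WHILE() extends Token
    case class PRINTLN() extends Token

    val keywords: Map[String, Token] = Map(
      "if"      -> IF(),
      "else"    -> ELSE(),
      "while"   -> WHILE(),
      "println" -> PRINTLN()
    )

    // keywords.get("while") returns Some(WHILE()); keywords.get("while1") returns None,
    // so "while1" is tokenized as the ordinary identifier ID("while1")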

  • Integer Constants

    if (isDigit) {
      var k = 0
      while (isDigit) {
        k = 10 * k + (ch.current - '0')   // accumulate the decimal value of the constant
        ch.next
      }
      token = IntConst(k)
    }


    regular expression for integers:

    digit digit*

  • Decision Tree to Map Symbols to Tokens

    ch.current match {
      case '(' => { current = OPAREN; ch.next; return }
      case ')' => { current = CPAREN; ch.next; return }
      case '+' => { current = PLUS; ch.next; return }
      case '/' => { current = DIV; ch.next; return }
      case '*' => { current = MUL; ch.next; return }
      case '=' => {
        ch.next
        if (ch.current == '=') { ch.next; current = CompareEQ; return }
        else { current = AssignEQ; return }
      }
      case '

  • Skipping Comments

    if (ch.current == '/') {
      ch.next
      if (ch.current == '/') {
        // single-line comment: skip everything up to the end of the line (or end of file)
        while (!isEOL && !isEOF) {
          ch.next
        }
      }
    }

    Nested comments? /* foo /* bar */ baz */  (see the sketch below)
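
    The slide leaves nested comments as a question. A minimal sketch of one common answer (an illustration, not code from the lecture; it assumes the same ch stream and the isEOF test used above) keeps a nesting counter so that every /* must be closed by a matching */:

    // skip a possibly nested /* ... */ comment; assumes ch.current is at the first '*'
    def skipBlockComment(): Unit = {
      var depth = 1                  // we have already seen one opening /*
      ch.next                        // move past the '*'
      while (depth > 0 && !isEOF) {
        if (ch.current == '/') {
          ch.next
          if (ch.current == '*') { depth += 1; ch.next }   // another /* opens
        } else if (ch.current == '*') {
          ch.next
          if (ch.current == '/') { depth -= 1; ch.next }   // one */ closes
        } else {
          ch.next
        }
      }
    }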

  • Further Important Topics

    Longest Match Rule

    Combining pieces together

    computing first symbols for regular expressions

    Example of a tiny lexical analyzer

    see the wiki

  • Computing first symbols

  • Computing nullable expressions
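
    The two headings above have no further content in this document. As an illustration (not from the original slides), here is one way to compute nullable(r) and first(r) over a small regular-expression AST in Scala; the type and constructor names are invented for the sketch:

    object RegExFirst {
      // A small regular-expression AST
      sealed abstract class RegEx
      case object EmptyLang extends RegEx                    // the empty language
      case object Eps extends RegEx                          // the empty word ε
      case class Sym(a: Char) extends RegEx                  // a single symbol a
      case class Union(l: RegEx, r: RegEx) extends RegEx     // l | r
      case class Concat(l: RegEx, r: RegEx) extends RegEx    // l r
      case class Star(r: RegEx) extends RegEx                // r*

      // nullable(r): does the language of r contain the empty word?
      def nullable(r: RegEx): Boolean = r match {
        case EmptyLang     => false
        case Eps           => true
        case Sym(_)        => false
        case Union(l, rr)  => nullable(l) || nullable(rr)
        case Concat(l, rr) => nullable(l) && nullable(rr)
        case Star(_)       => true
      }

      // first(r) = { a | aw in L(r) for some word w }
      def first(r: RegEx): Set[Char] = r match {
        case EmptyLang     => Set()
        case Eps           => Set()
        case Sym(a)        => Set(a)
        case Union(l, rr)  => first(l) ++ first(rr)
        case Concat(l, rr) => if (nullable(l)) first(l) ++ first(rr) else first(l)
        case Star(rr)      => first(rr)
      }

      // Example: for identifiers letter (letter | digit)*, first is the set of letters,
      // which is how a manual lexer decides on the token class from the first character.
    }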

  • Automating Construction of Lexers

  • Example in JavaCC

    TOKEN: {
      < ... >
    | < ... >
    | < ... >
    | < ... >
    }

    SKIP: {
      " "
    | "\n"
    | "\t"
    }

  • Finite Automaton

    Kinds of finite automata:

    deterministic

    non-deterministic

    with epsilon transitions

    with regular expressions on edges
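
    As a small illustration of the deterministic kind (not from the original slides), a DFA recognizing identifiers letter (letter | digit)*, written directly as a transition function in Scala:

    object IdentifierDFA {
      // States: 0 = start, 1 = seen a leading letter (accepting), 2 = reject sink
      def step(state: Int, c: Char): Int = state match {
        case 0 => if (c.isLetter) 1 else 2
        case 1 => if (c.isLetter || c.isDigit) 1 else 2
        case _ => 2
      }

      def accepts(w: String): Boolean =
        w.foldLeft(0)(step) == 1     // run the DFA and check the final state

      def main(args: Array[String]): Unit = {
        println(accepts("id3"))   // true
        println(accepts("3id"))   // false
        println(accepts(""))      // false: an identifier needs at least one letter
      }
    }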

  • Interpretation of Non-Determinism

    For a given string, some paths in the automaton lead to accepting states, some to rejecting states

    Does the automaton accept?

    yes, if there exists an accepting path

    Continued in next lecture
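
    To make "accepts iff there exists an accepting path" concrete before the next lecture, here is a small sketch (not from the original slides) that simulates a non-deterministic automaton on sets of states, so that acceptance holds exactly when some path reaches an accepting state:

    object NFASimulation {
      // An NFA given by its transition relation: from a state and a character,
      // the set of possible next states (no epsilon transitions in this sketch).
      case class NFA(start: Set[Int],
                     accepting: Set[Int],
                     delta: (Int, Char) => Set[Int]) {
        // The word is accepted iff some path from a start state ends in an accepting state.
        def accepts(w: String): Boolean = {
          val reachable = w.foldLeft(start) { (states, c) =>
            states.flatMap(s => delta(s, c))
          }
          reachable.exists(accepting)
        }
      }

      def main(args: Array[String]): Unit = {
        // Example NFA over {a, b} accepting exactly the words that end in "ab"
        val nfa = NFA(
          start = Set(0),
          accepting = Set(2),
          delta = {
            case (0, 'a') => Set(0, 1)   // guess: this 'a' may start the final "ab"
            case (0, 'b') => Set(0)
            case (1, 'b') => Set(2)
            case _        => Set()
          }
        )
        println(nfa.accepts("aab"))   // true
        println(nfa.accepts("aba"))   // false
      }
    }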
