Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.

Dec 22, 2015

Transcript
Page 1: Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.

Compiler Construction

Parsing I

Ran Shaham and Ohad Shacham
School of Computer Science
Tel-Aviv University

Page 2: Administration

Forum
  https://forums.cs.tau.ac.il/viewforum.php?f=64

Project teams
  Send me an email if you can't find a team
  Send me your team if you found one and didn't send an email
  Check the Excel file on the website

First PA is at:
  http://www.cs.tau.ac.il/research/ohad.shacham/wcc08/pa/pa1/pa1.pdf

Page 3: Programming Assignment 1

Implement a scanner for IC

class Token
  At least: line, id, value
  Should extend java_cup.runtime.Symbol
  Numeric token ids in sym.java
    Will later be generated by JavaCup

class Compiler
  Testbed: calls the scanner to print the list of tokens

[StateList] <<EOF>> { return appropriate symbol }

Page 4: Programming Assignment 1

class LexicalError
  Caught by Compiler

Assume
  Class identifiers start with a capital letter
  Other identifiers start with a non-capital letter

Page 5: sym.java

    public class sym {
        public static final int EOF = 0;
        public static final int ID = 1;
        ...
    }

Defines symbol constant ids
Communicates between parser and scanner
Actual values don't matter
Unique value for each token
Will be generated by CUP in PA2

Page 6: Token class

    import java_cup.runtime.Symbol;

    public class Token extends Symbol {
        public int getId() {...}
        public Object getValue() {...}
        public int getLine() {...}
        ...
    }
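
One possible way to fill in the elided method bodies is sketched below. To keep the sketch compilable without the CUP jar, it uses a tiny stand-in `Symbol` class; that stand-in, the constructor argument order, and the `toString` format are assumptions made for illustration, not part of the assignment skeleton. In the real project, `Token` would extend `java_cup.runtime.Symbol` directly.

```java
class TokenSketch {
    // Stand-in for java_cup.runtime.Symbol (assumption: only the public
    // fields `sym` and `value` and one constructor are mirrored here).
    static class Symbol {
        public int sym;       // numeric token id, as defined in sym.java
        public Object value;  // attached value, e.g. the identifier text

        public Symbol(int id, Object value) {
            this.sym = id;
            this.value = value;
        }
    }

    // Token adds the source line to what Symbol already stores.
    static class Token extends Symbol {
        private final int line;

        public Token(int id, int line, Object value) {
            super(id, value);
            this.line = line;
        }

        public int getId() { return sym; }
        public Object getValue() { return value; }
        public int getLine() { return line; }

        @Override
        public String toString() {
            return "line " + line + ": id=" + sym + " value=" + value;
        }
    }

    public static void main(String[] args) {
        // Hypothetical token: id 1 (ID in the sym.java slide), line 3, text "x".
        System.out.println(new Token(1, 3, "x"));
    }
}
```

The Compiler testbed could print each token this way as the scanner returns it.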

Page 7: JFlex directives to use

%cup          (integrate with CUP)

%line         (count lines)

%type Token   (pass type Token)

%class Lexer  (generate scanner class)

Page 8: %cup

Implies:

%implements java_cup.runtime.Scanner
  The generated lexer class implements java_cup.runtime.Scanner

%function next_token
  Returns the next token

%type java_cup.runtime.Symbol
  Return type of the scanning function (the token class)

Page 9: Structure

[Diagram: IC.lex -> JFlex -> Lexer.java; Lexer.java, sym.java, Token.java, LexicalError.java and Compiler.java -> javac -> lexical analyzer; test.ic -> lexical analyzer -> tokens]

Page 10: Directions

Download Java
Download JFlex
Download JavaCup
Put JFlex and JavaCup in classpath
Eclipse
  Use ant build.xml
  Import jflex and javacup
Apache Ant

Page 11: Directions

Use the skeleton from the website
Read the assignment
Use the forum

Page 12: Tools

Ant
  Make environment
  A build.xml is included in the skeleton
  Download from:
    http://ant.apache.org
  Use:
    ant          (to compile)
    ant scanner  (to run JFlex)

Page 13: Tools

JFlex
  Lexical analyzer generator
  Download from:
    http://jflex.de/
  Manual:
    http://jflex.de/manual.pdf
  Add $MyJFlex/lib/JFlex.jar to your classpath
  Use:
    java JFlex.Main IC.lex
    ant scanner  (for ant users)

Page 14: Tools

Cup
  Parser generator
  Download from:
    http://www2.cs.tum.edu/projects/cup/
  Manual:
    http://www2.cs.tum.edu/projects/cup/manual.html
  Put java-cup-11a.jar and java-cup-11a-runtime.jar in your classpath
  Use:
    java -jar java-cup-11a.jar <your file.cup>
    ant libparser  (for ant users)

Page 15: IC compiler

[Diagram: IC program (ic) -> Compiler -> x86 executable (exe). Inside the compiler: Lexical Analysis -> Syntax Analysis (Parsing) -> AST, symbol table, etc. -> Intermediate Representation (IR) -> Code Generation]

Page 16: Parsing

Input:
  Sequence of tokens

Output:
  Abstract syntax tree

Decide whether the program satisfies the syntactic structure

Page 17: Parsing errors

Error detection
  Report the most relevant error message
  Correct line number
  Current vs. expected token

Error recovery
  Recover and continue to the next error
  Heuristics for good recovery, to avoid many spurious errors
    Search for a semicolon and ignore the statement
    Ignore the next n errors
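
The "search for a semicolon and ignore the statement" heuristic can be sketched as follows. This is only an illustration under stated assumptions: the toy language (a sequence of `print num ;` statements), the use of plain strings as tokens, and the names `PanicModeSketch` and `check` are all hypothetical and are not taken from the IC assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PanicModeSketch {
    // Hypothetical toy language: every statement is "print num ;".
    private static final String[] STATEMENT = {"print", "num", ";"};

    // Returns one message per faulty statement instead of stopping at the first error.
    static List<String> check(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            boolean ok = true;
            for (String expected : STATEMENT) {
                String current = pos < tokens.size() ? tokens.get(pos) : "<EOF>";
                if (!current.equals(expected)) {
                    // Error detection: report position, expected and current token.
                    errors.add("token " + pos + ": expected '" + expected
                            + "' but found '" + current + "'");
                    ok = false;
                    break;
                }
                pos++;
            }
            if (!ok) {
                // Error recovery: skip the rest of the statement up to the next ';' ...
                while (pos < tokens.size() && !tokens.get(pos).equals(";")) {
                    pos++;
                }
                pos++; // ... and step past the ';' itself (or past the end of input).
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "print", "num", ";",
                "print", "print", ";",   // faulty statement
                "num", ";",              // faulty statement
                "print", "num", ";");
        for (String message : check(tokens)) {
            System.out.println(message);
        }
    }
}
```

Each faulty statement yields a single message, and the valid statement after them is still checked, which is the point of recovering instead of aborting.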

Page 18: Parsing

Context Free Grammars (CFG)
  Capture program structure (hierarchy)
  Employ formal theory results
  Automatically create "efficient" parsers

Grammar:
  S → if E then S else S
  S → print E
  E → num

Page 19: From text to abstract syntax

Program text: 5 + (7 * x)
  -> Lexical Analyzer ->
Token stream: num + ( num * id )
  -> Parser -> valid / syntax error

Grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )

Parse tree:
  E
    E
      num(5)
    +
    E
      (
      E
        E
          num(7)
        *
        E
          id(x)
      )

Abstract syntax tree:
  +
    Num(5)
    *
      Num(7)
      id(x)

Page 20: From text to abstract syntax

Token stream: num + ( num * id )
  -> Parser -> valid / syntax error

Grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )

Parse tree:
  E
    E
      num
    +
    E
      (
      E
        E
          num
        *
        E
          id
      )

Abstract syntax tree:
  +
    num
    *
      num
      x

Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run
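
As an illustration of what the parser's output might look like as a data structure, the sketch below builds the abstract syntax tree for 5 + (7 * x) by hand. The class names (`Num`, `Id`, `BinOp`), the `eval` method, and the sample binding x = 2 are assumptions chosen for this example; the slides do not prescribe an AST design.

```java
import java.util.HashMap;
import java.util.Map;

class AstSketch {
    interface Expr {
        int eval(Map<String, Integer> env);
    }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
        @Override public String toString() { return "Num(" + value + ")"; }
    }

    static final class Id implements Expr {
        final String name;
        Id(String name) { this.name = name; }
        // Assumes the variable is bound in env; an unbound name fails with a NullPointerException.
        public int eval(Map<String, Integer> env) { return env.get(name); }
        @Override public String toString() { return "id(" + name + ")"; }
    }

    static final class BinOp implements Expr {
        final char op;          // '+' or '*'
        final Expr left, right;
        BinOp(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public int eval(Map<String, Integer> env) {
            int l = left.eval(env), r = right.eval(env);
            return op == '+' ? l + r : l * r;
        }
        @Override public String toString() { return op + "(" + left + ", " + right + ")"; }
    }

    // AST for 5 + (7 * x): the parentheses survive only as tree shape.
    static Expr example() {
        return new BinOp('+', new Num(5), new BinOp('*', new Num(7), new Id("x")));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 2);
        Expr ast = example();
        System.out.println(ast);           // prints +(Num(5), *(Num(7), id(x)))
        System.out.println(ast.eval(env)); // prints 19
    }
}
```

Unlike the parse tree, the AST has no nodes for the non-terminal E or for the parenthesis tokens.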

Page 21: Parsing terminology

Symbols:
  terminals (tokens): + * ( ) id num
  non-terminals: E

Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )

Derivation:
  E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree:
  E
    E
      1
    +
    E
      E
        2
      *
      E
        3

Convention: the non-terminal appearing in the first grammar rule is defined to be the initial non-terminal

Each step in a derivation applies one production (grammar rule)

Page 22: Ambiguity

Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )

Leftmost derivation:
  E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree:
  E
    E
      1
    +
    E
      E
        2
      *
      E
        3

Rightmost derivation:
  E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3

Parse tree:
  E
    E
      E
        1
      +
      E
        2
    *
    E
      3

Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations)
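
Why the two parse trees matter can be seen by evaluating them. The sketch below hard-codes both tree shapes for 1 + 2 * 3 and computes their values; the `Node` class and the method names are hypothetical helpers introduced only for this demonstration.

```java
class AmbiguitySketch {
    // Minimal expression tree: either a leaf number or an operator with two children.
    static final class Node {
        final char op;      // '+' or '*' for inner nodes, '\0' for leaves
        final int value;    // used only by leaves
        final Node left, right;

        Node(int value) {
            this.op = '\0';
            this.value = value;
            this.left = null;
            this.right = null;
        }

        Node(char op, Node left, Node right) {
            this.op = op;
            this.value = 0;
            this.left = left;
            this.right = right;
        }

        int eval() {
            if (left == null) {
                return value;
            }
            int l = left.eval(), r = right.eval();
            return op == '+' ? l + r : l * r;
        }
    }

    // First parse tree: '+' at the root, i.e. 1 + (2 * 3).
    static Node plusAtRoot() {
        return new Node('+', new Node(1), new Node('*', new Node(2), new Node(3)));
    }

    // Second parse tree: '*' at the root, i.e. (1 + 2) * 3.
    static Node timesAtRoot() {
        return new Node('*', new Node('+', new Node(1), new Node(2)), new Node(3));
    }

    public static void main(String[] args) {
        System.out.println(plusAtRoot().eval());  // prints 7
        System.out.println(timesAtRoot().eval()); // prints 9
    }
}
```

The same token sequence yields 7 under one tree and 9 under the other, so an ambiguous grammar leaves the meaning of the program undetermined.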

Page 23: Grammar rewriting

Ambiguous grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )

Unambiguous grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → num
  F → ( E )

Derivation:
  E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3

Parse tree:
  E
    E
      T
        F
          1
    +
    T
      T
        F
          2
      *
      F
        3

Note the difference between a language and a grammar:
  A grammar represents a language.
  A language can be represented by many grammars.

Page 24: Parsing methods - Top Down

LL(k)
  "L": left-to-right scan of input
  "L": leftmost derivation
  "k": predict based on k look-ahead tokens

Predict a production for a non-terminal and k tokens

Page 25: Parsing methods - Bottom Up

LR(0), SLR(1), LR(1), LALR(1)
  "L": left-to-right scan of input
  "R": rightmost derivation (constructed in reverse)

Decide on a production for a RHS and a look-ahead

Page 26: Top Down - parsing

Grammar:
  E → T + E
  E → i
  T → i

Input: 1 + 2 + 3

Derivation, starting from E:
  E
  T + E
  1 + E
  1 + T + E
  1 + 2 + E
  1 + 2 + 3

Parse tree:
  E
    T
      1
    +
    E
      T
        2
      +
      E
        3

Page 27: Top Down - parsing

Starts with the start symbol
Tries to transform it to the input
Also called predictive parsing
LL(1) example

Grammar:
  S → if E then S else S
  S → begin S L
  S → print E
  L → end
  L → ; S L
  E → num

Input: if 5 then print 8 else …

Token : rule                        Sentential form
                                    S
if : S → if E then S else S         if E then S else S
5 : E → num                         if 5 then S else S
print : S → print E                 if 5 then print E else S
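
A predictive parser for this grammar can be written by hand as one procedure per non-terminal, where the next token selects the rule. The following is a minimal sketch under stated assumptions: tokens are plain strings already produced by a scanner (every number appears as the token `num`), and the class and method names are invented for this example. It only recognizes input; it builds no tree.

```java
import java.util.Arrays;
import java.util.List;

class PredictiveParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // The single look-ahead token (LL(1)).
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException(
                    "expected " + expected + " but found " + peek() + " at token " + pos);
        }
        pos++;
    }

    // S → if E then S else S | begin S L | print E
    void S() {
        switch (peek()) {
            case "if":    eat("if"); E(); eat("then"); S(); eat("else"); S(); break;
            case "begin": eat("begin"); S(); L(); break;
            case "print": eat("print"); E(); break;
            default: throw new IllegalStateException("unexpected " + peek() + " at token " + pos);
        }
    }

    // L → end | ; S L
    void L() {
        switch (peek()) {
            case "end": eat("end"); break;
            case ";":   eat(";"); S(); L(); break;
            default: throw new IllegalStateException("unexpected " + peek() + " at token " + pos);
        }
    }

    // E → num
    void E() {
        eat("num");
    }

    // True if the whole input is derivable from S.
    static boolean accepts(String... input) {
        PredictiveParserSketch parser = new PredictiveParserSketch(Arrays.asList(input));
        try {
            parser.S();
            return parser.pos == input.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("if", "num", "then", "print", "num", "else", "print", "num"));
        System.out.println(accepts("if", "num", "print", "num"));
    }
}
```

No backtracking is needed because every rule of a given non-terminal starts with a different token.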

Page 28: Top Down - problems

Left recursion
  A → A a
  A → a

Non-termination
  A ⇒ Aa ⇒ Aaa ⇒ Aaaa ⇒ Aaaaa ⇒ …
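
The non-termination can be reproduced in a hand-written parser. In the sketch below, `naiveA` follows A → A a literally and calls itself before consuming any input; the depth cap is an assumption added only so the demonstration stops. `matchesA` uses one standard rewriting, A → a A', A' → a A' | ε (not shown on this slide), which consumes a token before recursing. All names are hypothetical.

```java
class LeftRecursionSketch {
    private static final int MAX_DEPTH = 1000; // demo-only cap, not part of any real parser

    // Literal translation of A → A a: the first thing the procedure does is call itself
    // at the same input position, so it never makes progress.
    static int naiveA(String input, int pos, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("no progress: still at position " + pos);
        }
        int afterA = naiveA(input, pos, depth + 1); // parse the leading A
        if (afterA < input.length() && input.charAt(afterA) == 'a') {
            return afterA + 1;                      // then the trailing a
        }
        throw new IllegalStateException("expected 'a' at position " + afterA);
    }

    // Rewritten grammar: A → a A', A' → a A' | ε. The right recursion of A' is a loop.
    static boolean matchesA(String input) {
        int pos = 0;
        if (pos >= input.length() || input.charAt(pos) != 'a') {
            return false;                           // A must start with a
        }
        pos++;
        while (pos < input.length() && input.charAt(pos) == 'a') {
            pos++;                                  // A' → a A'
        }
        return pos == input.length();               // A' → ε at end of input
    }

    public static void main(String[] args) {
        System.out.println(matchesA("aaa"));
        try {
            naiveA("aaa", 0, 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the cap, `naiveA` would recurse until the stack overflows, regardless of the input.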

Page 29: Top Down - problems

Two rules cannot start with the same token
  Can be solved by backtracking
  Reduce the number of backtracks

Grammar:
  E → T + E
  E → T

[Diagram: E can be expanded to either T or T + E; both alternatives start with T]

Page 30: Top Down - solution

Two ways
  Eliminate left recursion
  Perform left factoring

Page 31: Top Down - solution

Step I: left recursion removal

Grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → (E)

Left-recursive rules rewritten as right-recursive:
  E → T + E
  T → F * T

Page 32: Top Down - solution

Step II: left factoring

Grammar:
  E → T + E
  E → T
  T → F * T
  T → F
  F → id
  F → (E)

Left-factored grammar:
  E → T E'
  E' → + E
  E' → ε
  T → F T'
  T' → * T
  T' → ε
  F → id
  F → (E)
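
After both steps the grammar can be parsed top-down with one token of look-ahead. The recognizer below is a sketch of that idea, with one method per non-terminal of the left-factored grammar; the string tokens (`id`, `+`, `*`, `(`, `)`) and all class and method names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

class ExprRecognizerSketch {
    private final List<String> tokens;
    private int pos = 0;

    ExprRecognizerSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException(
                    "expected " + expected + " but found " + peek() + " at token " + pos);
        }
        pos++;
    }

    // E → T E'
    void E() { T(); EPrime(); }

    // E' → + E | ε   (ε is chosen whenever the next token is not '+')
    void EPrime() {
        if (peek().equals("+")) {
            eat("+");
            E();
        }
    }

    // T → F T'
    void T() { F(); TPrime(); }

    // T' → * T | ε
    void TPrime() {
        if (peek().equals("*")) {
            eat("*");
            T();
        }
    }

    // F → id | ( E )
    void F() {
        if (peek().equals("id")) {
            eat("id");
        } else if (peek().equals("(")) {
            eat("(");
            E();
            eat(")");
        } else {
            throw new IllegalStateException("unexpected " + peek() + " at token " + pos);
        }
    }

    // True if the whole input is derivable from E.
    static boolean accepts(String... input) {
        ExprRecognizerSketch parser = new ExprRecognizerSketch(Arrays.asList(input));
        try {
            parser.E();
            return parser.pos == input.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("id", "+", "id", "*", "id"));
        System.out.println(accepts("id", "+"));
    }
}
```

Each procedure decides by looking at the next token only, which is exactly what the common prefix T (or F) prevented before left factoring.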

Page 33: Top Down - left factoring

Non-terminal with two rules starting with the same prefix

Grammar:
  S → if E then S else S
  S → if E then S

Left-factored grammar:
  S → if E then S X
  X → ε
  X → else S

Page 34: Bottom Up - parsing

No problem with left recursion
Widely used in practice
LR(0), SLR(1), LR(1), LALR(1)
  We will focus only on the theory of LR(0)
  JavaCup implements LALR(1)

Starts with the input
Attempts to rewrite it to the start symbol

Page 35: Bottom Up - parsing

Grammar:
  E → E + (E)
  E → i

Input: 1 + (2) + (3)

Reductions, from the input up to E:
  1 + (2) + (3)
  E + (2) + (3)
  E + (E) + (3)
  E + (3)
  E + (E)
  E

Parse tree:
  E
    E
      E
        1
      +
      (
      E
        2
      )
    +
    (
    E
      3
    )
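
The sequence of reductions above can be mimicked with a stack. The sketch below is a naive shift-reduce loop written for this one grammar: it shifts a token and then reduces greedily whenever a right-hand side sits on top of the stack. It is not the table-driven LR(0) construction, and all names are hypothetical; numbers are assumed to arrive from the scanner as the token `i`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShiftReduceSketch {
    // Right-hand side of E → E + ( E )
    private static final List<String> LONG_RHS = Arrays.asList("E", "+", "(", "E", ")");

    // Replaces a right-hand side on top of the stack by E; returns false if none matches.
    private static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (n >= 5 && stack.subList(n - 5, n).equals(LONG_RHS)) {
            stack.subList(n - 5, n).clear(); // pop E + ( E )
            stack.add("E");                  // push E
            return true;
        }
        if (n >= 1 && stack.get(n - 1).equals("i")) {
            stack.set(n - 1, "E");           // E → i
            return true;
        }
        return false;
    }

    // Records every step in trace; accepts if the stack ends up holding just E.
    static boolean parse(List<String> trace, String... input) {
        List<String> stack = new ArrayList<>();
        for (String token : input) {
            stack.add(token);
            trace.add("shift " + token + ": " + stack);
            while (reduce(stack)) {
                trace.add("reduce: " + stack);
            }
        }
        return stack.equals(Arrays.asList("E"));
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        boolean ok = parse(trace, "i", "+", "(", "i", ")", "+", "(", "i", ")");
        for (String step : trace) {
            System.out.println(step);
        }
        System.out.println(ok ? "accepted" : "rejected");
    }
}
```

Greedy reduction happens to be safe for this small grammar; in general the parser needs LR tables to decide between shifting and reducing.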

Page 36: Bottom Up - problems

Ambiguity
  E = E + E
  E = i

  1 + 2 + 3 -> (1 + 2) + 3 ????
  1 + 2 + 3 -> 1 + (2 + 3) ????

Page 37: Summary

Do PA1
Use the forum

Next week
  Cup
  LR(0)