Compiler Construction
Parsing I
Ran Shaham and Ohad Shacham
School of Computer Science
Tel-Aviv University

Administration
- Forum: https://forums.cs.tau.ac.il/viewforum.php?f=64
- Project teams
  - Send me an email if you can’t find a team
  - Send me your team if you found one and didn’t send an email
  - Check the Excel file on the website
- First PA is at: http://www.cs.tau.ac.il/research/ohad.shacham/wcc08/pa/pa1/pa1.pdf

Programming Assignment 1
- Implement a scanner for IC
- class Token
  - At least: line, id, value
  - Should extend java_cup.runtime.Symbol
  - Numeric token ids in sym.java
    - Will later be generated by JavaCup
- class Compiler
  - Testbed: calls the scanner to print the list of tokens
- [StateList] <<EOF>> { return appropriate symbol }

Programming Assignment 1
- class LexicalError
  - Caught by Compiler
- Assume
  - Class identifiers start with a capital letter
  - Other identifiers start with a non-capital letter

sym.java
    public class sym {
        public static final int EOF = 0;
        public static final int ID = 1;
        ...
    }
- Defines symbol constant ids
- Used to communicate between parser and scanner
- Actual values don’t matter
  - Unique value for each token
- Will be generated by cup in PA2

Token class
    import java_cup.runtime.Symbol;

    public class Token extends Symbol {
        public int getId() {...}
        public Object getValue() {...}
        public int getLine() {...}
        ...
    }
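One possible way to fill in the Token class is sketched below. It is only an illustration under stated assumptions: the `Symbol` class here is a small stand-in for `java_cup.runtime.Symbol` so that the sketch compiles without the CUP jar (the real class also has public `sym` and `value` fields), and the constructor signature and `toString` format are hypothetical choices, not part of the assignment.

```java
// Stand-in for java_cup.runtime.Symbol (assumption: used only so the sketch
// compiles without the CUP jar; in the assignment, import the real class).
class Symbol {
    public int sym;       // numeric token id, as in sym.java
    public Object value;  // semantic value, e.g. the identifier text

    public Symbol(int id, Object value) {
        this.sym = id;
        this.value = value;
    }
}

// Hypothetical Token: adds the source line to the id and value kept by Symbol.
class Token extends Symbol {
    private final int line;

    public Token(int id, int line, Object value) {
        super(id, value);
        this.line = line;
    }

    public int getId() { return sym; }
    public Object getValue() { return value; }
    public int getLine() { return line; }

    // Illustrative print format for the testbed; the assignment may require another.
    @Override
    public String toString() {
        return line + ": id=" + sym + (value == null ? "" : " value=" + value);
    }
}
```

With these definitions, `new Token(1, 3, "x")` would describe a token with id 1 and value `x` found on line 3.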

JFlex directives to use
    %cup           (integrate with cup)
    %line          (count lines)
    %type Token    (pass type Token)
    %class Lexer   (generated scanner class)

%cup
- %implements java_cup.runtime.Scanner
  - The generated lexer class implements java_cup.runtime.Scanner
- %function next_token
  - Returns the next token
- %type java_cup.runtime.Symbol
  - Return token class

Structure
    IC.lex -> JFlex -> Lexer.java
    Lexer.java, sym.java, Token.java, LexicalError.java, Compiler.java -> javac -> lexical analyzer
    test.ic -> lexical analyzer -> tokens

Directions
- Download Java
- Download JFlex
- Download JavaCup
- Put JFlex and JavaCup in classpath
- Eclipse
  - Use ant build.xml
  - Import jflex and javacup
- Apache Ant

Directions
- Use the skeleton from the website
- Read the assignment
- Use the forum

Tools
- Ant
  - Make environment
  - A build.xml is included in the skeleton
  - Download from: http://ant.apache.org
  - Use:
      ant            (to compile)
      ant scanner    (to run JFlex)

Tools
- JFlex
  - Lexical analyzer generator
  - Download from: http://jflex.de/
  - Manual: http://jflex.de/manual.pdf
  - Add $MyJFlex/lib/JFlex.jar to your classpath
  - Use:
      java JFlex.Main IC.lex
      ant scanner    (for ant users)

Tools
- Cup
  - Parser generator
  - Download from: http://www2.cs.tum.edu/projects/cup/
  - Manual: http://www2.cs.tum.edu/projects/cup/manual.html
  - Put java-cup-11a.jar and java-cup-11a-runtime.jar in your classpath
  - Use:
      java -jar java-cup-11a.jar <your file.cup>
      ant libparser    (for ant users)

IC compiler
    IC program (ic)
      -> Lexical Analysis
      -> Syntax Analysis (Parsing)
      -> AST, Symbol Table, etc.
      -> Inter. Rep. (IR)
      -> Code Generation
      -> x86 executable (exe)

Parsing
- Input: sequence of tokens
- Output: abstract syntax tree
- Decide whether the program satisfies the syntactic structure

Parsing errors
- Error detection
  - Report the most relevant error message
  - Correct line number
  - Current vs. expected token
- Error recovery
  - Recover and continue to the next error
  - Heuristics for good recovery, to avoid many spurious errors
    - Search for a semicolon and ignore the statement
    - Ignore the next n errors
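The "search for a semicolon and ignore the statement" heuristic can be sketched roughly as follows. This is a toy illustration, not the recovery scheme of any particular parser generator: the class name `PanicModeDemo`, the token spellings, and the one-rule "grammar" (every statement is `id = num ;`) are assumptions made for the example, and errors are reported by token index rather than by line number to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy panic-mode recovery: every statement must be  id = num ;
class PanicModeDemo {
    static List<String> check(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        String[] expected = {"id", "=", "num", ";"};
        int pos = 0;
        while (pos < tokens.size()) {
            boolean ok = true;
            for (String want : expected) {
                String got = pos < tokens.size() ? tokens.get(pos) : "EOF";
                if (!got.equals(want)) {
                    // Detection: report the current vs. the expected token.
                    errors.add("token " + pos + ": expected " + want + " but found " + got);
                    ok = false;
                    break;
                }
                pos++;
            }
            if (!ok) {
                // Recovery: skip to the next semicolon, consume it, and resume
                // with the following statement.
                while (pos < tokens.size() && !tokens.get(pos).equals(";")) {
                    pos++;
                }
                pos++;
            }
        }
        return errors;
    }
}
```

Because the rest of a broken statement is skipped, each bad statement yields one message instead of a cascade of follow-up errors.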

Parsing
- Context Free Grammars (CFG)
  - Capture program structure (hierarchy)
  - Employ formal theory results
  - Automatically create "efficient" parsers

Grammar:
    S → if E then S else S
    S → print E
    E → num

From text to abstract syntax
    program text:  5 + (7 * x)
      -> Lexical Analyzer ->
    token stream:  num + ( num * id )
      -> Parser (valid / syntax error) ->
    parse tree and abstract syntax tree

Grammar:
    E → id
    E → num
    E → E + E
    E → E * E
    E → ( E )

Parse tree:
            E
          / | \
         E  +  E
         |   / | \
     num(5) (  E  )
             / | \
            E  *  E
            |     |
        num(7)  id(x)

Abstract syntax tree:
           +
          / \
     Num(5)  *
            / \
       Num(7)  id(x)

From text to abstract syntax
    token stream:  num + ( num * id )
      -> Parser (valid / syntax error) ->
    parse tree and abstract syntax tree

Grammar:
    E → id
    E → num
    E → E + E
    E → E * E
    E → ( E )

Parse tree:            E( E(num) + E( "(" E( E(num) * E(id) ) ")" ) )
Abstract syntax tree:  +( num, *( num, x ) )

Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run.
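To make the abstract syntax tree for 5 + (7 * x) concrete, here is a minimal sketch of how such a tree might be represented in Java. The type names `Expr`, `Num`, `Id`, and `BinOp` and the `show` method are hypothetical and chosen only for this illustration; the course's actual AST classes may look different.

```java
// Hypothetical AST node types for the expression grammar.
interface Expr {
    String show();  // prefix rendering of the subtree
}

class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
    public String show() { return "Num(" + value + ")"; }
}

class Id implements Expr {
    final String name;
    Id(String name) { this.name = name; }
    public String show() { return "id(" + name + ")"; }
}

class BinOp implements Expr {
    final char op;
    final Expr left, right;
    BinOp(char op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public String show() { return op + "(" + left.show() + ", " + right.show() + ")"; }
}

class AstDemo {
    // AST for 5 + (7 * x): the parentheses leave no node of their own,
    // they only determine the shape of the tree.
    static Expr example() {
        return new BinOp('+', new Num(5), new BinOp('*', new Num(7), new Id("x")));
    }

    public static void main(String[] args) {
        System.out.println(example().show());
    }
}
```

Unlike the parse tree, this structure has no nodes for the non-terminal E or for the parenthesis tokens.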

Parsing terminology
Symbols:
    terminals (tokens): + * ( ) id num
    non-terminals: E

Grammar rules:
    E → id
    E → num
    E → E + E
    E → E * E
    E → ( E )

Derivation:
    E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree:
        E
      / | \
     E  +  E
     |   / | \
     1  E  *  E
        |     |
        2     3

Convention: the non-terminal appearing in the first derivation rule is defined to be the initial non-terminal.
Each grammar rule is called a production; each step in a derivation applies one production.

Ambiguity
Grammar rules:
    E → id
    E → num
    E → E + E
    E → E * E
    E → ( E )

Leftmost derivation:
    E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree:
        E
      / | \
     E  +  E
     |   / | \
     1  E  *  E
        |     |
        2     3

Rightmost derivation:
    E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree:
           E
         / | \
        E  *  E
      / | \   |
     E  +  E  3
     |     |
     1     2

Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations).
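Why ambiguity matters can be seen by evaluating the two parse trees for 1 + 2 * 3. The sketch below builds both trees by hand; the `Node` helper and the method names are made up for this example.

```java
// Minimal expression tree used to compare the two parse trees of 1 + 2 * 3.
abstract class Node {
    abstract int eval();

    static Node num(int v) {
        return new Node() { int eval() { return v; } };
    }
    static Node add(Node l, Node r) {
        return new Node() { int eval() { return l.eval() + r.eval(); } };
    }
    static Node mul(Node l, Node r) {
        return new Node() { int eval() { return l.eval() * r.eval(); } };
    }
}

class AmbiguityDemo {
    // Tree of the derivation that starts with E => E + E, i.e. 1 + (2 * 3).
    static Node plusAtRoot() {
        return Node.add(Node.num(1), Node.mul(Node.num(2), Node.num(3)));
    }

    // Tree of the derivation that starts with E => E * E, i.e. (1 + 2) * 3.
    static Node timesAtRoot() {
        return Node.mul(Node.add(Node.num(1), Node.num(2)), Node.num(3));
    }

    public static void main(String[] args) {
        System.out.println(plusAtRoot().eval());   // 7
        System.out.println(timesAtRoot().eval());  // 9
    }
}
```

The same token sequence yields 7 with one tree and 9 with the other, so a compiler needs the grammar (or extra precedence rules) to select exactly one tree.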

Grammar rewriting
Ambiguous grammar:
    E → id
    E → num
    E → E + E
    E → E * E
    E → ( E )

Unambiguous grammar:
    E → E + T
    E → T
    T → T * F
    T → F
    F → id
    F → num
    F → ( E )

Derivation:
    E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3

Parse tree:
          E
        / | \
       E  +  T
       |   / | \
       T  T  *  F
       |  |     |
       F  F     3
       |  |
       1  2

Note the difference between a language and a grammar:
- A grammar represents a language.
- A language can be represented by many grammars.

Parsing methods – Top Down
- LL(k)
  - "L": left-to-right scan of input
  - "L": leftmost derivation
  - "k": predict based on k look-ahead tokens
- Predict a production for a non-terminal and k tokens

Parsing methods – Bottom Up
- LR(0), SLR(1), LR(1), LALR(1)
  - "L": left-to-right scan of input
  - "R": rightmost derivation
- Decide a production for a RHS and a lookahead

Top Down – parsing
Grammar:
    E → T + E
    E → i
    T → i

Input: 1 + 2 + 3

Derivation from the start symbol:
    E ⇒ T + E ⇒ 1 + E ⇒ 1 + T + E ⇒ 1 + 2 + E ⇒ 1 + 2 + 3

Parse tree:
        E
      / | \
     T  +  E
     |   / | \
     1  T  +  E
        |     |
        2     3

Top Down – parsing
- Starts with the start symbol
- Tries to transform it to the input
- Also called predictive parsing
- LL(1) example

Grammar:
    S → if E then S else S
    S → begin S L
    S → print E
    L → end
    L → ; S L
    E → num

Input: if 5 then print 8 else …

    Token    Rule                       Sentential form
                                        S
    if       S → if E then S else S     if E then S else S
    5        E → num                    if 5 then S else S
    print    S → print E                if 5 then print E else S
    …
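The LL(1) example can be turned into a recursive-descent (predictive) parser with one method per non-terminal, where a single look-ahead token selects the production. The following is a minimal sketch under a few assumptions: tokens are plain strings, every number is already the token `num`, and the class and method names (`PredictiveParser`, `accepts`) are invented for the illustration.

```java
import java.util.Arrays;
import java.util.List;

// Predictive parser for:
//   S -> if E then S else S | begin S L | print E
//   L -> end | ; S L
//   E -> num
class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    // The look-ahead token alone decides which S production to use.
    void parseS() {
        switch (peek()) {
            case "if":
                eat("if"); parseE(); eat("then"); parseS(); eat("else"); parseS();
                break;
            case "begin":
                eat("begin"); parseS(); parseL();
                break;
            case "print":
                eat("print"); parseE();
                break;
            default:
                throw new IllegalStateException("unexpected token " + peek());
        }
    }

    void parseL() {
        switch (peek()) {
            case "end":
                eat("end");
                break;
            case ";":
                eat(";"); parseS(); parseL();
                break;
            default:
                throw new IllegalStateException("unexpected token " + peek());
        }
    }

    void parseE() { eat("num"); }

    // True if the whole input can be derived from S.
    static boolean accepts(String... input) {
        PredictiveParser p = new PredictiveParser(Arrays.asList(input));
        try {
            p.parseS();
            return p.peek().equals("EOF");
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For instance, `PredictiveParser.accepts("if", "num", "then", "print", "num", "else", "print", "num")` follows the steps of the table: `if` selects the first S rule, `num` is consumed by E, and `print` selects S → print E.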

Top Down - problems
- Left recursion
    A → A a
    A → a
- Non-termination
    A ⇒ Aa ⇒ Aaa ⇒ Aaaa ⇒ …

Top Down - problems
- Two rules cannot start with the same token
  - Can be solved by backtracking
  - Reduce the number of backtracks

    E → T + E
    E → T

  Both alternatives for E begin with T:
    E ⇒ T
    E ⇒ T + E

Top Down – solution
- Two ways
  - Eliminate left recursion
  - Perform left factoring

Top Down – solution
Step I: left recursion removal

Original grammar:
    E → E + T
    E → T
    T → T * F
    T → F
    F → id
    F → (E)

The left-recursive rules are replaced:
    E → E + T    becomes    E → T + E
    T → T * F    becomes    T → F * T

Top Down – solution
Step II: left factoring

Before:
    E → T + E
    E → T
    T → F * T
    T → F
    F → id
    F → (E)

After:
    E  → T E’
    E’ → + E
    E’ → ε
    T  → F T’
    T’ → * T
    T’ → ε
    F  → id
    F  → (E)
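After both steps the grammar can be parsed top-down with one token of look-ahead and no backtracking. The sketch below is one way to write such a parser by hand; the class name `FactoredExprParser`, the string tokens, and the choice to return a fully parenthesized string are assumptions made for the illustration. The E’ and T’ non-terminals are folded into the `if` tests: taking the branch corresponds to E’ → + E or T’ → * T, skipping it corresponds to ε.

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent parser for the left-factored grammar:
//   E -> T E'    E' -> + E | epsilon
//   T -> F T'    T' -> * T | epsilon
//   F -> id | ( E )
class FactoredExprParser {
    private final List<String> tokens;
    private int pos = 0;

    FactoredExprParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    String parseE() {
        String left = parseT();
        if (peek().equals("+")) {          // E' -> + E
            eat("+");
            return "(" + left + " + " + parseE() + ")";
        }
        return left;                       // E' -> epsilon
    }

    String parseT() {
        String left = parseF();
        if (peek().equals("*")) {          // T' -> * T
            eat("*");
            return "(" + left + " * " + parseT() + ")";
        }
        return left;                       // T' -> epsilon
    }

    String parseF() {
        if (peek().equals("id")) {
            eat("id");
            return "id";
        }
        eat("(");
        String inner = parseE();
        eat(")");
        return inner;
    }

    // Parses the whole input and returns it with explicit grouping.
    static String parse(String... input) {
        FactoredExprParser p = new FactoredExprParser(Arrays.asList(input));
        String result = p.parseE();
        p.eat("EOF");
        return result;
    }
}
```

Parsing `id + id * id` gives `(id + (id * id))`, so `*` still binds tighter than `+`. Note that `id + id + id` comes out as `(id + (id + id))`: the right-recursive rules group to the right, unlike the left-recursive grammar they replaced.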

Top Down – left factoring
- Non-terminal with two rules starting with the same prefix

Grammar:
    S → if E then S else S
    S → if E then S

Left-factored grammar:
    S → if E then S X
    X → ε
    X → else S

Bottom Up – parsing
- No problem with left recursion
- Widely used in practice
- LR(0), SLR(1), LR(1), LALR(1)
  - We will focus only on the theory of LR(0)
  - JavaCup implements LALR(1)
- Starts with the input
- Attempts to rewrite it to the start symbol
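
Bottom Up – parsing
Grammar:
    E → E + (E)
    E → i

Input: 1 + (2) + (3)

Reductions, from the input to the start symbol:
    1 + (2) + (3)
    E + (2) + (3)      reduce 1 by E → i
    E + (E) + (3)      reduce 2 by E → i
    E + (3)            reduce E + (E) by E → E + (E)
    E + (E)            reduce 3 by E → i
    E                  reduce E + (E) by E → E + (E)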
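The reduction sequence for 1 + (2) + (3) can be reproduced with a tiny shift-reduce loop. This sketch is deliberately naive and is not an LR parser: it reduces greedily whenever a right-hand side appears on top of the stack, which happens to work for this particular grammar, whereas a real LR parser consults a table of states to decide between shifting and reducing. The class name `ShiftReduceDemo` is invented, and every number is assumed to arrive as the token `i`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Greedy shift-reduce recognizer for:  E -> E + ( E ) | i
class ShiftReduceDemo {
    private static final List<String> LONG_RHS = Arrays.asList("E", "+", "(", "E", ")");

    // Runs the parser, appending one line per step to `steps`, and returns the final stack.
    static List<String> run(List<String> input, List<String> steps) {
        List<String> stack = new ArrayList<>();
        for (String token : input) {
            stack.add(token);  // shift
            steps.add("shift  " + String.join(" ", stack));
            while (reduce(stack)) {
                steps.add("reduce " + String.join(" ", stack));
            }
        }
        return stack;
    }

    // Replaces a right-hand side on top of the stack with E, if one is there.
    private static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (n >= 1 && stack.get(n - 1).equals("i")) {                // E -> i
            stack.set(n - 1, "E");
            return true;
        }
        if (n >= 5 && stack.subList(n - 5, n).equals(LONG_RHS)) {    // E -> E + ( E )
            stack.subList(n - 5, n).clear();
            stack.add("E");
            return true;
        }
        return false;
    }

    // Accepts when the whole input has been rewritten to the start symbol.
    static boolean accepts(String... input) {
        List<String> stack = run(Arrays.asList(input), new ArrayList<>());
        return stack.equals(Arrays.asList("E"));
    }
}
```

Calling `ShiftReduceDemo.accepts("i", "+", "(", "i", ")", "+", "(", "i", ")")` shifts tokens one at a time and ends with only `E` on the stack, mirroring how the input is rewritten to the start symbol.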

Bottom Up - problems
- Ambiguity
    E → E + E
    E → i

    1 + 2 + 3 -> (1 + 2) + 3 ?
    1 + 2 + 3 -> 1 + (2 + 3) ?