Page 1

Topic #4: Syntactic Analysis (Parsing)

EE 456 – Compiling Techniques

Prof. Carl Sable

Fall 2003

Page 2

Lexical Analyzer and Parser

Page 3

Parser

• Accepts string of tokens from lexical analyzer (usually one token at a time)

• Verifies whether or not string can be generated by grammar

• Reports syntax errors (recovers if possible)

Page 4

Errors

• Lexical errors (e.g. misspelled word)

• Syntax errors (e.g. unbalanced parentheses, missing semicolon)

• Semantic errors (e.g. type errors)

• Logical errors (e.g. infinite recursion)

Page 5

Error Handling

• Report errors clearly and accurately

• Recover quickly if possible

• Poor error recovery may lead to an avalanche of errors

Page 6

Error Recovery

• Panic mode: discard tokens one at a time until a synchronizing token is found

• Phrase-level recovery: Perform local correction that allows parsing to continue

• Error Productions: Augment grammar to handle predicted, common errors

• Global correction: Use a complex algorithm to compute the least-cost sequence of changes leading to parseable code

Page 7

Context Free Grammars

• CFGs can represent recursive constructs that regular expressions cannot

• A CFG consists of:
  – Tokens (terminals, symbols)
  – Nonterminals (syntactic variables denoting sets of strings)
  – Productions (rules specifying how terminals and nonterminals can combine to form strings)
  – A start symbol (the set of strings it denotes is the language of the grammar)
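
To make these four components concrete, here is a small illustrative sketch (the Python dict layout is my own convention, not something from the slides) of the ambiguous expression grammar that appears on the next slide:

grammar = {
    "terminals":    {"+", "*", "(", ")", "-", "id"},
    "nonterminals": {"E"},
    "start":        "E",
    # productions: each nonterminal maps to a list of alternatives,
    # and each alternative is a list of grammar symbols
    "productions": {
        "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    },
}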

Page 8

Derivations (Part 1)

• One definition of language: the set of strings that have valid parse trees

• Another definition: the set of strings that can be derived from the start symbol

E → E + E | E * E | (E) | -E | id

E => -E   (read: E derives -E)

E => -E => -(E) => -(id)

Page 9

Derivations (Part 2)

• αAβ => αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols

• If α1 => α2 => … => αn, we say α1 derives αn

• => means derives in one step
• *=> means derives in zero or more steps
• +=> means derives in one or more steps

Page 10

Sentences and Languages

• Let L(G) be the language generated by the grammar G with start symbol S:
  – Strings in L(G) may contain only tokens of G
  – A string w is in L(G) if and only if S +=> w
  – Such a string w is a sentence of G

• Any language that can be generated by a CFG is said to be a context-free language

• If two grammars generate the same language, they are said to be equivalent

Page 11

Sentential Forms

• If S *=> α, where α may contain nonterminals, we say that α is a sentential form of G

• A sentence is a sentential form with no nonterminals

Page 12

Leftmost Derivations

• Only the leftmost nonterminal in any sentential form is replaced at each step

• A leftmost step can be written as wAγ lm=> wδγ
  – w consists of only terminals
  – γ is a string of grammar symbols

• If α derives β by a leftmost derivation, then we write α lm*=> β

• If S lm*=> α, then we say that α is a left-sentential form of the grammar

• Analogous terms exist for rightmost derivations

Page 13

Parse Trees

• A parse tree can be viewed as a graphical representation of a derivation

• Every parse tree has a unique leftmost derivation (not true of every sentence)

• An ambiguous grammar has:
  – more than one parse tree for at least one sentence
  – more than one leftmost derivation for at least one sentence

Page 14

Capability of Grammars

• Can describe most programming language constructs

• An exception: requiring that variables are declared before they are used
  – Therefore, the grammar accepts a superset of the actual language
  – A later phase (semantic analysis) does type checking

Page 15

Regular Expressions vs. CFGs

• Every construct that can be described by an RE can also be described by a CFG

• Why use REs at all?
  – Lexical rules are simpler to describe this way
  – REs are often easier to read
  – More efficient lexical analyzers can be constructed

Page 16

Verifying Grammars

• A proof that a grammar generates a language has two parts:
  – Must show that every string generated by the grammar is part of the language
  – Must show that every string that is part of the language can be generated by the grammar

• Rarely done for complete programming languages!

Page 17

Eliminating Ambiguity (1)

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

if E1 then if E2 then S1 else S2

Page 18

Eliminating Ambiguity (2)

Page 19

Eliminating Ambiguity (3)

stmt → matched
     | unmatched

matched → if expr then matched else matched
        | other

unmatched → if expr then stmt
          | if expr then matched else unmatched

Page 20

Left Recursion

• A grammar is left recursive if there is a nonterminal A such that there exists a derivation A +=> Aα for some string α

• Most top-down parsing methods cannot handle left-recursive grammars

Page 21

Eliminating Left Recursion (1)

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

becomes

A  → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε

Harder case:
S → Aa | b
A → Ac | Sd | ε

Page 22

Eliminating Left Recursion (2)

• First arrange the nonterminals in some order A1, A2, … An

• Apply the following algorithm:

for i = 1 to n {
  for j = 1 to i-1 {
    replace each production of the form Ai → Ajγ
    by the productions Ai → δ1γ | δ2γ | … | δkγ,
    where Aj → δ1 | δ2 | … | δk are the current Aj productions
  }
  eliminate the immediate left recursion among the Ai productions
}
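
As an unofficial Python sketch of this algorithm (the function name, the dict-of-productions representation, and the use of [] for an ε right side are my assumptions, not part of the slides):

# Note: the textbook algorithm technically assumes the grammar has no cycles
# or ε-productions; it still gives the expected result on the example below.
def eliminate_left_recursion(grammar, order):
    g = {nt: [list(p) for p in prods] for nt, prods in grammar.items()}
    for i, Ai in enumerate(order):
        # substitute earlier nonterminals Aj appearing at the front of Ai productions
        for Aj in order[:i]:
            new_prods = []
            for prod in g[Ai]:
                if prod and prod[0] == Aj:
                    new_prods += [aj_prod + prod[1:] for aj_prod in g[Aj]]
                else:
                    new_prods.append(prod)
            g[Ai] = new_prods
        # eliminate the immediate left recursion among the Ai productions
        recursive = [p[1:] for p in g[Ai] if p and p[0] == Ai]
        nonrec    = [p for p in g[Ai] if not p or p[0] != Ai]
        if recursive:
            Ai_new = Ai + "'"              # fresh primed nonterminal (assumed unused)
            g[Ai] = [p + [Ai_new] for p in nonrec]
            g[Ai_new] = [alpha + [Ai_new] for alpha in recursive] + [[]]  # [] is ε
    return g

# The "harder case" above: S -> Aa | b,  A -> Ac | Sd | ε
g = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
print(eliminate_left_recursion(g, ["S", "A"]))

Run on that example, the sketch leaves S unchanged and yields A → bdA' | A' and A' → cA' | adA' | ε.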

Page 23

Left Factoring

• Rewriting productions to delay decisions

• Helpful for predictive parsing

• Not guaranteed to remove ambiguity

A → αβ1 | αβ2

becomes

A  → αA'
A' → β1 | β2
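
A rough sketch of one left-factoring step in Python (the helper name and the primed-nonterminal naming are assumptions of mine):

from itertools import takewhile

def left_factor_once(prods, name):
    """prods: list of alternatives (lists of symbols) for `name`."""
    # find the longest prefix shared by at least two alternatives
    best = []
    for i, p in enumerate(prods):
        for q in prods[i + 1:]:
            prefix = [a for a, b in takewhile(lambda ab: ab[0] == ab[1], zip(p, q))]
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return prods, {}
    new_name = name + "'"
    factored = [p[len(best):] for p in prods if p[:len(best)] == best]
    rest     = [p for p in prods if p[:len(best)] != best]
    return rest + [best + [new_name]], {new_name: factored}

# Example: A -> a b1 | a b2   becomes   A -> a A',  A' -> b1 | b2
print(left_factor_once([["a", "b1"], ["a", "b2"]], "A"))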

Page 24

Limitations of CFGs

• Cannot verify repeated strings
  – Example: L1 = {wcw | w is in (a|b)*}
  – Abstracts checking that variables are declared

• Cannot verify repeated counts
  – Example: L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
  – Abstracts checking that the numbers of formal and actual parameters are equal

• Therefore, some checks are put off until semantic analysis

Page 25

Top Down Parsing

• Can be viewed two ways:
  – Attempt to find a leftmost derivation for the input string
  – Attempt to create a parse tree, starting at the root, creating nodes in preorder

• General form is recursive descent parsing
  – May require backtracking
  – Backtracking parsers are not used frequently because they are not needed

Page 26

Predictive Parsing

• A special case of recursive-descent parsing that does not require backtracking

• Must always know which production to use based on current input symbol

• Can often create an appropriate grammar by:
  – removing left recursion
  – left factoring the resulting grammar
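
For illustration, here is a minimal recursive-descent predictive parser sketch in Python for the left-factored expression grammar used later in these slides (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id); the class and method names are my own:

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):            # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+"); self.T(); self.Eprime()

    def T(self):                 # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):            # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*"); self.F(); self.Tprime()

    def F(self):                 # F -> ( E ) | id
        if self.peek() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

p = Parser(["id", "+", "id", "*", "id"])
p.E(); p.match("$")              # parses without raising, so the input is accepted
print("accepted")

Each procedure chooses its production using only the current input symbol, which is exactly the "no backtracking" requirement stated above.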

Page 27

Transition Diagrams

• For the parser:
  – One diagram for each nonterminal
  – Edge labels can be tokens or nonterminals

• A transition on a token means we should take that transition if the token is the next input symbol

• A transition on a nonterminal can be thought of as a call to a procedure for that nonterminal

• As opposed to lexical analyzers:
  – One (or more) diagrams for each token
  – Labels are symbols of the input alphabet

Page 28

Creating Transition Diagrams

• First eliminate left recursion from grammar

• Then left factor grammar

• For each nonterminal A:
  – Create an initial and final state
  – For every production A → X1X2…Xn, create a path from the initial to the final state with edges labeled X1, X2, …, Xn

Page 29

Using Transition Diagrams

• Predictive parsers:
  – Start at the start symbol of the grammar
  – From state s with an edge to state t labeled with token a, if the next input token is a:
    • State changes to t
    • Input cursor moves one position right
  – If the edge is labeled by nonterminal A:
    • State changes to the start state for A
    • Input cursor is not moved
    • If the final state of A is reached, then state changes to t
  – If the edge is labeled by ε, state changes to t

• Can be recursive or non-recursive using a stack

Page 30

Transition Diagram Example

Original grammar:
E → E + T | T
T → T * F | F
F → (E) | id

After eliminating left recursion:
E  → TE'
E' → +TE' | ε
T  → FT'
T' → *FT' | ε
F  → (E) | id

[Transition diagrams for E, E', T, T', and F]

Page 31

Simplifying Transition Diagrams

[Simplified transition diagrams for E' and E]

Page 32

Nonrecursive Predictive Parsing (1)

[Figure: model of a nonrecursive predictive parser, showing the input buffer and the stack]

Page 33

Nonrecursive Predictive Parsing (2)

• Program considers X, the symbol on top of the stack, and a, the next input symbol

• If X = a = $, parser halts successfully

• If X = a ≠ $, parser pops X off stack and advances to next input symbol

• If X is a nonterminal, the program consults M[X, a] (production or error entry)

Page 34

Nonrecursive Predictive Parsing (3)

• Initialize stack with start symbol of grammar

• Initialize input pointer to first symbol of input

• After consulting the parsing table:
  – If the entry is a production, the parser replaces the top entry of the stack with the right side of the production (leftmost symbol on top)
  – Otherwise, an error recovery routine is called
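
A sketch of this table-driven procedure in Python, assuming the parsing table M (shown on the next slide) is stored as a dict from (nonterminal, input symbol) to the right side of a production, with [] standing for ε:

M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    stack = ["$", start]                       # start symbol on top of $
    i = 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == a == "$":
            return True                        # success
        if X == a:                             # terminal on top matches input
            stack.pop(); i += 1
        elif X in NONTERMINALS and (X, a) in M:
            stack.pop()
            stack.extend(reversed(M[(X, a)]))  # leftmost symbol ends up on top
        else:
            raise SyntaxError(f"unexpected {a!r} while expanding {X!r}")

print(predictive_parse(["id", "+", "id", "*", "id", "$"]))  # True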

Page 35

Predictive Parsing Table

Nonterminal   id         +            *            (          )          $
E             E → TE'                              E → TE'
E'                       E' → +TE'                            E' → ε     E' → ε
T             T → FT'                              T → FT'
T'                       T' → ε       T' → *FT'               T' → ε     T' → ε
F             F → id                               F → (E)

Page 36

Using a Predictive Parsing Table

Stack        Input        Output
$E           id+id*id$
$E'T         id+id*id$    E → TE'
$E'T'F       id+id*id$    T → FT'
$E'T'id      id+id*id$    F → id
$E'T'        +id*id$
$E'          +id*id$      T' → ε
$E'T+        +id*id$      E' → +TE'
$E'T         id*id$
$E'T'F       id*id$       T → FT'
$E'T'id      id*id$       F → id
$E'T'        *id$
$E'T'F*      *id$         T' → *FT'
$E'T'F       id$
$E'T'id      id$          F → id
$E'T'        $
$E'          $            T' → ε
$            $            E' → ε

Page 37

FIRST

• FIRST(α) is the set of all terminals that begin any string derived from α

• Computing FIRST:
  – If X is a terminal, FIRST(X) = {X}
  – If X → ε is a production, add ε to FIRST(X)
  – If X is a nonterminal and X → Y1Y2…Yn is a production:
    • For all terminals a, add a to FIRST(X) if a is a member of any FIRST(Yi) and ε is a member of FIRST(Y1), FIRST(Y2), …, FIRST(Yi-1)
    • If ε is a member of FIRST(Y1), FIRST(Y2), …, FIRST(Yn), add ε to FIRST(X)
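
A Python sketch of this computation, assuming the grammar is stored as a dict from each nonterminal to its list of alternatives (each a list of symbols, with [] for an ε right side); the helper names are my own:

EPS = "ε"

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # iterate until no FIRST set grows
        changed = False
        for X, alternatives in grammar.items():
            for alt in alternatives:
                add = first_of_string(alt, first, grammar)
                if not add <= first[X]:
                    first[X] |= add
                    changed = True
    return first

def first_of_string(symbols, first, grammar):
    """FIRST of a string Y1 Y2 ... Yn of terminals and nonterminals."""
    result = set()
    for Y in symbols:
        f = first[Y] if Y in grammar else {Y}   # terminal: FIRST(Y) = {Y}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                      # every Yi can derive ε (or the string is empty)
    return result

grammar = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []],
           "T": [["F", "T'"]], "T'": [["*", "F", "T'"], []],
           "F": [["(", "E", ")"], ["id"]]}
print(compute_first(grammar))   # FIRST(E) = {'(', 'id'}, FIRST(E') = {'+', 'ε'}, ...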

Page 38

FOLLOW

• FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form

• More formally, a is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αAaβ

• $ is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αA

Page 39

Computing FOLLOW

• Place $ in FOLLOW(S)

• If there is a production A → αBβ, then everything in FIRST(β) (except for ε) is in FOLLOW(B)

• If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B)
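
Continuing the previous sketch, the FOLLOW rules can be coded as another fixed-point loop (illustration only; compute_first and first_of_string are the helpers defined above):

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # rule 1: $ is in FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for alt in alternatives:
                for i, B in enumerate(alt):
                    if B not in grammar:         # only nonterminals get FOLLOW sets
                        continue
                    beta = alt[i + 1:]
                    f_beta = first_of_string(beta, first, grammar)
                    add = f_beta - {EPS}         # rule 2: FIRST(β) except ε
                    if EPS in f_beta:            # rule 3: β is empty or nullable
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

first = compute_first(grammar)
print(compute_follow(grammar, "E", first))
# FOLLOW(E) = FOLLOW(E') = {')', '$'}, FOLLOW(T) = FOLLOW(T') = {'+', ')', '$'},
# FOLLOW(F) = {'+', '*', ')', '$'}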

Page 40

FIRST and FOLLOW Example

E  → TE'
E' → +TE' | ε
T  → FT'
T' → *FT' | ε
F  → (E) | id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Page 41

Creating a Predictive Parsing Table

• For each production A → α:
  – For each terminal a in FIRST(α), add A → α to M[A, a]
  – If ε is in FIRST(α), add A → α to M[A, b] for every terminal b in FOLLOW(A)
  – If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $]

• Mark each undefined entry of M as an error entry (use some recovery strategy)
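
Putting the last three slides together, a sketch of the table construction in Python (reusing the earlier compute_first / compute_follow helpers; the conflict check anticipates the multiply-defined entries discussed on the next slides):

def build_parsing_table(grammar, start):
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    M = {}
    def add(A, a, alt):
        if (A, a) in M and M[(A, a)] != alt:
            raise ValueError(f"grammar is not LL(1): conflict at M[{A}, {a}]")
        M[(A, a)] = alt
    for A, alternatives in grammar.items():
        for alt in alternatives:
            f_alpha = first_of_string(alt, first, grammar)
            for a in f_alpha - {EPS}:
                add(A, a, alt)                   # first rule
            if EPS in f_alpha:
                for b in follow[A]:              # second and third rules ($ is in FOLLOW)
                    add(A, b, alt)
    return M                                     # any missing entry is an error entry

table = build_parsing_table(grammar, "E")
print(table[("E'", ")")])   # []  (the production E' -> ε)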

Page 42

Multiply-Defined Entries Example

S  → iEtSS' | a
S' → eS | ε
E  → b

Nonterminal   a        b        i             t    e                  $
S             S → a             S → iEtSS'
S'                                                 S' → eS, S' → ε    S' → ε
E                      E → b

Page 43

LL(1) Grammars (1)

• Algorithm covered in class can be applied to any grammar to produce a parsing table

• If the parsing table has no multiply-defined entries, the grammar is said to be "LL(1)"
  – First "L": left-to-right scanning of input
  – Second "L": produces leftmost derivation
  – "1" refers to the number of lookahead symbols needed to make decisions

Page 44

LL(1) Grammars (2)

• No ambiguous or left-recursive grammar can be LL(1)

• Eliminating left recursion and left factoring does not always lead to an LL(1) grammar

• Some grammars cannot be transformed into an LL(1) grammar at all

• Although the example of a non-LL(1) grammar we covered has a fix, there are no universal rules to handle cases like this

Page 45

Shift-Reduce Parsing

• One simple form of bottom-up parsing is shift-reduce parsing

• Starts at the bottom (leaves, terminals) and works its way up to the top (root, start symbol)

• Each step is a "reduction":
  – A substring of the input matching the right side of a production is "reduced"
  – It is replaced with the nonterminal on the left side of the production

• If all substrings are chosen correctly, a rightmost derivation is traced in reverse

Page 46

Shift-Reduce Parsing Example

S → aABe
A → Abc | b
B → d

abbcde
aAbcde
aAde
aABe
S

S rm=> aABe rm=> aAde rm=> aAbcde rm=> abbcde

Page 47

Handles (1)

• Informally, a "handle" of a string:
  – Is a substring of the string
  – Matches the right side of a production
  – Reduction to the left side of the production is one step along the reverse of a rightmost derivation

• The leftmost substring matching the right side of a production is not necessarily a handle
  – Might not be able to reduce the resulting string to the start symbol
  – In the example from the previous slide, if we reduce aAbcde to aAAcde, we cannot reduce this to S

Page 48

Handles (2)

• Formally, a handle of a right-sentential form γ:
  – Is a production A → β and a position of γ where β may be found and replaced with A
  – Replacing β by A at that position produces the previous right-sentential form in a rightmost derivation of γ

• So if S rm*=> αAw rm=> αβw, then A → β in the position following α is a handle of αβw

• The string w to the right of the handle contains only terminals

• There can be more than one handle if the grammar is ambiguous (more than one rightmost derivation)

Page 49

Ambiguity and Handles Example

E → E + E
E → E * E
E → (E)
E → id

E rm=> E + E
  rm=> E + E * E
  rm=> E + E * id3
  rm=> E + id2 * id3
  rm=> id1 + id2 * id3

E rm=> E * E
  rm=> E * id3
  rm=> E + E * id3
  rm=> E + id2 * id3
  rm=> id1 + id2 * id3

Page 50

Handle Pruning

• Repeat the following process, starting from the string of tokens, until the start symbol is obtained:
  – Locate the handle in the current right-sentential form
  – Replace the handle with the left side of the appropriate production

• Two problems that need to be solved:
  – How to locate the handle
  – How to choose the appropriate production

Page 51

Shift-Reduce Parsing

• Data structures include a stack and an input buffer
  – The stack holds grammar symbols and starts off empty
  – The input buffer holds the string w to be parsed

• The parser shifts input symbols onto the stack until a handle β is on top of the stack
  – The handle is reduced to the left side of the appropriate production
  – If the stack contains only the start symbol and the input is empty, this indicates success

Page 52

Actions of a Shift-Reduce Parser

• Shift – the next input symbol is shifted onto the top of the stack

• Reduce – The parser reduces the handle at the top of the stack to a nonterminal (the left side of the appropriate production)

• Accept – The parser announces success

• Error – The parser discovers a syntax error and calls a recovery routine

Page 53

Shift Reduce Parsing Example

Stack          Input              Action
$              id1 + id2 * id3$   shift
$id1           + id2 * id3$       reduce by E → id
$E             + id2 * id3$       shift
$E +           id2 * id3$         shift
$E + id2       * id3$             reduce by E → id
$E + E         * id3$             shift
$E + E *       id3$               shift
$E + E * id3   $                  reduce by E → id
$E + E * E     $                  reduce by E → E * E
$E + E         $                  reduce by E → E + E
$E             $                  accept

Page 54

Viable Prefixes

• Two definitions of a viable prefix:
  – A prefix of a right-sentential form that can appear on the stack during shift-reduce parsing
  – A prefix of a right-sentential form that does not continue past the right end of the rightmost handle

• Can always add tokens to the end of a viable prefix to obtain a right-sentential form

Page 55

Conflicts in Shift-Reduce Parsing

• There are grammars for which shift-reduce parsing cannot be used

• Shift/reduce conflict: cannot decide whether to shift or reduce

• Reduce/reduce conflict: cannot decide which of multiple possible reductions to make

• Sometimes can add rule to adapt for use with ambiguous grammar

Page 56

Operator-Precedence Parsing

• A form of shift-reduce parsing that can apply to certain simple grammars
  – No production can have right side ε
  – No right side can have two adjacent nonterminals
  – Other essential requirements must be met

• Once the parser is built (often by hand), the grammar can be effectively ignored

Page 57

Precedence Relations

Relation Meaning

a <· b a "yields precedence to" b

a ·= b a "has the same precedence as" b

a ·> b a "takes precedence over" b

Page 58

Using Precedence Relations (1)

• Can be thought of as delimiting handles:
  – <· marks the left end of a handle
  – ·= appears in the interior of a handle
  – ·> marks the right end of a handle

• Consider a right-sentential form β0a1β1a2β2…anβn:
  – Each βi is either a single nonterminal or ε
  – Each ai is a single token
  – Suppose that exactly one precedence relation will hold for each ai, ai+1 pair

Page 59

Using Precedence Relations (2)

• Mark the beginning and end of the string with $
• Remove the nonterminals
• Insert the correct precedence relation between each pair of terminals

       id    +     *     $
id           ·>    ·>    ·>
+      <·    ·>    <·    ·>
*      <·    ·>    ·>    ·>
$      <·    <·    <·

id + id * id

$ <· id ·> + <· id ·> * <· id ·> $

Page 60

Using Precedence Relations (3)

• To find the current handle:
  – Scan the string from the left until the first ·> is encountered
  – Scan backwards (left) from there until a <· is encountered
  – Everything in between, including intervening or surrounding nonterminals, is the handle

• The nonterminals do not influence the parse!

Page 61

Implementing the Algorithm

set ip to point to the first symbol in w$
initialize stack to $
repeat forever
  if $ is on top of stack and ip points to $
    return success
  else
    let a be the topmost symbol on the stack
    let b be the symbol pointed to by ip
    if a <· b or a ·= b
      push b onto the stack
      advance ip to the next input symbol
    else if a ·> b
      repeat
        pop x
      until the top symbol on the stack <· x
    else
      error()
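
The same skeleton as a rough Python sketch: the relation table is stored as a dict keyed by (stack terminal, input terminal), here filled in with the id/+/*/$ relations from the earlier slide; since nonterminals do not influence the parse, the sketch never pushes them.

PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    """Return True if tokens (ending with '$') are accepted."""
    stack = ["$"]
    i = 0
    while True:
        a, b = stack[-1], tokens[i]
        if a == "$" and b == "$":
            return True                      # success
        rel = PREC.get((a, b))
        if rel in ("<", "="):                # shift
            stack.append(b)
            i += 1
        elif rel == ">":                     # reduce: pop back to the matching <·
            x = stack.pop()
            while PREC.get((stack[-1], x)) != "<":
                x = stack.pop()
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")

print(op_precedence_parse(["id", "+", "id", "*", "id", "$"]))  # True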

Page 62

Precedence and Associativity (1)

• For grammars describing arithmetic expressions:
  – Can construct the table of operator-precedence relations automatically
  – Heuristic based on the precedence and associativity of operators

• Selects proper handles, even if the grammar is ambiguous

Page 63

Precedence and Associativity (2)

• If operator θ1 has higher precedence than operator θ2, make θ1 ·> θ2 and θ2 <· θ1

• If θ1 and θ2 are of equal precedence:

– If they are left associative, make θ1 ·> θ2 and θ2 ·> θ1

– If they are right associative, make θ1 <· θ2 and θ2 <· θ1

Page 64

Precedence and Associativity (3)

• For all operators θ:
  – θ <· id
  – id ·> θ
  – θ <· (
  – ( <· θ
  – ) ·> θ
  – θ ·> )
  – θ ·> $
  – $ <· θ

• Also let:
  – ( ·= )
  – ( <· (
  – ( <· id
  – $ <· (
  – id ·> $
  – id ·> )
  – $ <· id
  – ) ·> $
  – ) ·> )

Page 65

Operator Grammar Example

• ^ is of highest precedence and is right-associative

• * and / are of next highest precedence and are left-associative

• + and – are of lowest precedence and are left-associative

E → E + E | E - E | E * E | E / E | E ^ E | (E) | -E | id

Page 66

Computed Precedence Relations

       +     -     *     /     ^     id    (     )     $
+      ·>    ·>    <·    <·    <·    <·    <·    ·>    ·>
-      ·>    ·>    <·    <·    <·    <·    <·    ·>    ·>
*      ·>    ·>    ·>    ·>    <·    <·    <·    ·>    ·>
/      ·>    ·>    ·>    ·>    <·    <·    <·    ·>    ·>
^      ·>    ·>    ·>    ·>    <·    <·    <·    ·>    ·>
id     ·>    ·>    ·>    ·>    ·>                ·>    ·>
(      <·    <·    <·    <·    <·    <·    <·    ·=
)      ·>    ·>    ·>    ·>    ·>                ·>    ·>
$      <·    <·    <·    <·    <·    <·    <·

Page 67

Handling Unary Operators

• If unary operator θ is not also a binary operator:
  – Incorporate θ into the table
  – For all θn:
    • Make θn <· θ no matter what
    • If θ has higher precedence than θn, make θ ·> θn
    • Otherwise, make θ <· θn

• Otherwise, the easiest thing to do is to create a second lexical symbol (token)
  – The lexical analyzer must distinguish one from the other
  – Can't use lookahead; must rely on the previous token

Page 68

Precedence Functions (1)

• Do not need to store entire table of precedence relations

• Select two precedence functions f and g:
  – f(a) < g(b) whenever a <· b
  – f(a) = g(b) whenever a ·= b
  – f(a) > g(b) whenever a ·> b

       +    -    *    /    ^    (    )    id   $
f      2    2    4    4    4    0    6    6    0
g      1    1    3    3    5    5    0    5    0

Page 69

Precedence Functions (2)

• Precedence relation between a and b is determined by comparing f(a) to g(b)

• Loss of error detection capability (errors caught later when no reduction for handle is found)

• It is not always possible to construct valid precedence functions

• When it is possible, functions can be computed automatically

Page 70

Precedence Functions Algorithm

• Create symbols fa and ga for all tokens a and $
• If a ·= b, then fa and gb must be in the same group
• Partition the symbols into as many groups as possible
• For all cases where a <· b, draw an edge from the group of gb to the group of fa
• For all cases where a ·> b, draw an edge from the group of fa to the group of gb
• If the graph has cycles, no precedence functions exist
• Otherwise:
  – f(a) is the length of the longest path beginning at the group of fa
  – g(a) is the length of the longest path beginning at the group of ga
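
A rough Python sketch of this construction (the union-find grouping and the recursive longest-path search are implementation choices of mine, not prescribed by the slides):

def precedence_functions(terminals, rel):
    # Union-find: merge f_a and g_b whenever a ·= b
    parent = {("f", t): ("f", t) for t in terminals}
    parent.update({("g", t): ("g", t) for t in terminals})
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for (a, b), r in rel.items():
        if r == "=":
            parent[find(("f", a))] = find(("g", b))

    # Edges between groups: g_b -> f_a for a <· b, and f_a -> g_b for a ·> b
    edges = {}
    for (a, b), r in rel.items():
        if r == "<":
            edges.setdefault(find(("g", b)), set()).add(find(("f", a)))
        elif r == ">":
            edges.setdefault(find(("f", a)), set()).add(find(("g", b)))

    # Longest path from each group; a cycle means no precedence functions exist
    def longest(node, seen=()):
        if node in seen:
            raise ValueError("cycle: no precedence functions exist")
        return 1 + max((longest(n, seen + (node,)) for n in edges.get(node, ())),
                       default=-1)

    f = {t: longest(find(("f", t))) for t in terminals}
    g = {t: longest(find(("g", t))) for t in terminals}
    return f, g

# The id/+/*/$ relation table used in the example on the next slide
rel = {("id","+"): ">", ("id","*"): ">", ("id","$"): ">",
       ("+","id"): "<", ("+","+"): ">", ("+","*"): "<", ("+","$"): ">",
       ("*","id"): "<", ("*","+"): ">", ("*","*"): ">", ("*","$"): ">",
       ("$","id"): "<", ("$","+"): "<", ("$","*"): "<"}
print(precedence_functions(["id", "+", "*", "$"], rel))

On that table the sketch reproduces the f and g values shown on the next slide.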

Page 71

Precedence Functions Example

       id    +     *     $
id           ·>    ·>    ·>
+      <·    ·>    <·    ·>
*      <·    ·>    ·>    ·>
$      <·    <·    <·

       +    *    id   $
f      2    4    4    0
g      1    3    5    0

Page 72

Detecting and Handling Errors

• Errors can occur at two points:
  – If no precedence relation holds between the terminal on top of stack and current input
  – If a handle has been found, but no production is found with this handle as right side

• Errors during reductions can be handled with diagnostic message

• Errors due to lack of precedence relation can be handled by recovery routines specified in table

Page 73

LR Parsers

• LR parsers use an efficient, bottom-up parsing technique useful for a large class of CFGs

• Too difficult to construct by hand, but automatic generators to create them exist (e.g. Yacc)

• LR(k) grammars
  – "L" refers to left-to-right scanning of input
  – "R" refers to rightmost derivation (produced in reverse order)
  – "k" refers to the number of lookahead symbols needed for decisions (if omitted, assumed to be 1)

Page 74

Benefits of LR Parsing

• Can be constructed to recognize virtually all programming language constructs for which a CFG can be written

• Most general non-backtracking shift-reduce parsing method known

• Can be implemented efficiently

• Handles a class of grammars that is a superset of those handled by predictive parsing

• Can detect syntactic errors as soon as possible with a left-to-right scan of input

Page 75

Model of LR Parser

[Figure: model of an LR parser, showing the stack and the input buffer]

Page 76

LR Parser (1)

• Driver program is the same for all LR Parsers

• Stack consists of states (si) and grammar symbols (Xi)
  – Each state summarizes information contained in stack below it
  – Grammar symbols do not actually need to be stored on stack in most implementations

• State symbol on top of stack and next input symbol used to determine shift/reduce decision

Page 77

LR Parser (2)

• Parsing table includes action function and goto function

• Action function
  – Based on state and next input symbol
  – Actions are shift, reduce, accept or error

• Goto function
  – Based on state and grammar symbol
  – Produces next state

Page 78

LR Parser (3)

• Configuration (s0X1s1…Xmsm,aiai+1…an$) indicates right-sentential form X1X2…Xmaiai+1…an

• If action[sm,ai] = shift s, enter configuration (s0X1s1…Xmsmais,ai+1…an$)

• If action[sm,ai] = reduce A → β, enter configuration (s0X1s1…Xm-rsm-rAs, aiai+1…an$), where s = goto[sm-r,A] and r is the length of β

• If action[sm,ai] = accept, signal success

• If action[sm,ai] = error, try error recovery

Page 79

LR Parsing Algorithm

set ip to point to the first symbol in w$
initialize stack to s0
repeat forever
  let s be the topmost state on the stack
  let a be the symbol pointed to by ip
  if action[s, a] = shift s'
    push a then s' onto the stack
    advance ip to the next input symbol
  else if action[s, a] = reduce A → β
    pop 2*|β| symbols off the stack
    let s' be the state now on top of the stack
    push A then goto[s', A] onto the stack
    output the production A → β
  else if action[s, a] = accept
    return success
  else
    error()
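
A compact sketch of this driver in Python, hard-coding the SLR(1) table for the expression grammar shown on the next slide; following the earlier remark that grammar symbols need not be stored, the stack here holds states only:

PRODS = [("E", ["E", "+", "T"]), ("E", ["T"]), ("T", ["T", "*", "F"]),
         ("T", ["F"]), ("F", ["(", "E", ")"]), ("F", ["id"])]   # productions (1)-(6)

ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): "acc",
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    stack, i = [0], 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act == "acc":
            return True
        if act is None:
            raise SyntaxError(f"unexpected {tokens[i]!r} in state {stack[-1]}")
        kind, n = act
        if kind == "s":                    # shift: push the new state, advance input
            stack.append(n)
            i += 1
        else:                              # reduce by production n
            head, body = PRODS[n - 1]
            del stack[len(stack) - len(body):]   # pop |β| states (no grammar symbols stored)
            print(f"reduce by {head} -> {' '.join(body)}")
            stack.append(GOTO[(stack[-1], head)])

print(lr_parse(["id", "*", "id", "+", "id", "$"]))  # prints the reductions, then True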

Page 80

LR Parsing Table Example

state   action                                          goto
        id      +       *       (       )       $       E    T    F
0       s5                      s4                      1    2    3
1               s6                              acc
2               r2      s7              r2      r2
3               r4      r4              r4      r4
4       s5                      s4                      8    2    3
5               r6      r6              r6      r6
6       s5                      s4                           9    3
7       s5                      s4                                10
8               s6                      s11
9               r1      s7              r1      r1
10              r3      r3              r3      r3
11              r5      r5              r5      r5

(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id

Page 81

LR Parsing Example

     Stack                   Input            Action
(1)  s0                      id * id + id $   shift
(2)  s0 id s5                * id + id $      reduce by F → id
(3)  s0 F s3                 * id + id $      reduce by T → F
(4)  s0 T s2                 * id + id $      shift
(5)  s0 T s2 * s7            id + id $        shift
(6)  s0 T s2 * s7 id s5      + id $           reduce by F → id
(7)  s0 T s2 * s7 F s10      + id $           reduce by T → T * F
(8)  s0 T s2                 + id $           reduce by E → T
(9)  s0 E s1                 + id $           shift
(10) s0 E s1 + s6            id $             shift
(11) s0 E s1 + s6 id s5      $                reduce by F → id
(12) s0 E s1 + s6 F s3       $                reduce by T → F
(13) s0 E s1 + s6 T s9       $                reduce by E → E + T
(14) s0 E s1                 $                accept

Page 82

Constructing LR Parsing Tables

• Three methods:
  – SLR (simple LR)
    • Not all that simple (but simpler than the other two)!
    • Weakest of the three methods, easiest to implement
  – Constructing canonical LR parsing tables
    • Most general of the methods
    • Constructed tables can be quite large
  – LALR parsing tables (lookahead LR)
    • Tables smaller than canonical LR
    • Most programming language constructs can be handled

• We will not cover any of these methods in class
  – Too much detail
  – Yacc will take care of it!