Bottom-Up Parsing
Attempts to traverse a parse tree bottom up (post-order traversal)
Reduces a sequence of tokens to the start symbol
At each reduction step, the RHS of a production is replaced with its LHS
Each reduction step is the reverse of one step in a rightmost derivation
Example: given the following grammar
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
A rightmost derivation for id + id * id is shown below:
  E ⇒rm E + T
    ⇒rm E + T * F
    ⇒rm E + T * id
    ⇒rm E + F * id
    ⇒rm E + id * id
    ⇒rm T + id * id
    ⇒rm F + id * id
    ⇒rm id + id * id
Read from bottom to top, these steps are the reductions a bottom-up parser performs
Stack Implementation of a Bottom-Up Parser
A bottom-up parser uses an explicit stack in its implementation
The main actions are shift and reduce
  A bottom-up parser is also known as a shift-reduce parser
Four operations are defined: shift, reduce, accept, and error
  Shift: parser shifts the next token onto the parser stack
  Reduce: parser reduces the RHS of a production to its LHS
    The handle always appears on top of the stack
  Accept: parser announces a successful completion of parsing
  Error: parser discovers that a syntax error has occurred
The parser operates by:
  Shifting tokens onto the stack
  When a handle β is on top of the stack, the parser reduces β to the LHS of its production
  Parsing continues until an error is detected or the input is reduced to the start symbol
Example on Bottom-Up Parsing
Consider the parsing of the input string id + id * id
Grammar:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id

Stack           Input              Action
$               id + id * id $     shift
$ id            + id * id $        reduce F → id
$ F             + id * id $        reduce T → F
$ T             + id * id $        reduce E → T
$ E             + id * id $        shift
$ E +           id * id $          shift
$ E + id        * id $             reduce F → id
$ E + F         * id $             reduce T → F
$ E + T         * id $             shift
$ E + T *       id $               shift
$ E + T * id    $                  reduce F → id
$ E + T * F     $                  reduce T → T * F
$ E + T         $                  reduce E → E + T
$ E             $                  accept

We use $ to mark the bottom of the stack as well as the end of input
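The trace above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the function name replay and the tuple encoding of actions are assumptions made for illustration. It applies the listed actions to an explicit stack and checks at every reduce that the handle really is on top of the stack.

```python
def replay(tokens, actions):
    """Apply shift/reduce actions in order; return True if the parse is accepted."""
    stack = ["$"]                 # $ marks the bottom of the stack
    remaining = tokens + ["$"]    # $ also marks the end of the input
    for action in actions:
        if action[0] == "shift":
            stack.append(remaining.pop(0))
        elif action[0] == "reduce":
            _, lhs, rhs = action
            # The handle must sit on top of the stack before it is replaced.
            assert stack[-len(rhs):] == rhs, f"handle {rhs} not on top of {stack}"
            del stack[-len(rhs):]
            stack.append(lhs)
        elif action[0] == "accept":
            return stack == ["$", "E"] and remaining == ["$"]
        print(" ".join(stack).ljust(16), " ".join(remaining))
    return False

# Action column of the trace, encoded as tuples (hypothetical encoding).
ACTIONS = [
    ("shift",), ("reduce", "F", ["id"]), ("reduce", "T", ["F"]), ("reduce", "E", ["T"]),
    ("shift",), ("shift",), ("reduce", "F", ["id"]), ("reduce", "T", ["F"]),
    ("shift",), ("shift",), ("reduce", "F", ["id"]),
    ("reduce", "T", ["T", "*", "F"]), ("reduce", "E", ["E", "+", "T"]), ("accept",),
]

accepted = replay("id + id * id".split(), ACTIONS)
print("accepted:", accepted)
```

The sketch only checks a given action sequence. Deciding which action to take at each step is the problem that LR parsing solves.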
LR Parsing
To have an operational shift-reduce parser, we must determine:
  Whether a handle appears on top of the stack
  The reducing production to be used
  The choice of actions to be made at each parsing step
LR parsing provides a solution to the above problems
  Is a general and efficient method of shift-reduce parsing
  Is used in a number of automatic parser generators
The LR(k) parsing technique was introduced by Knuth in 1965
  L is for Left-to-right scanning of input
  R corresponds to a Rightmost derivation done in reverse
  k is the number of lookahead symbols used to make parsing decisions
LR Parsing – cont'd
LR parsing is attractive for a number of reasons …
  Is the most general non-backtracking shift-reduce parsing method known
  Can recognize virtually all programming language constructs
  Can be implemented very efficiently
  The class of LR(k) grammars is a proper superset of the class of LL(k) grammars
  Can detect a syntax error as soon as an erroneous token is encountered
  An LR parser can be generated by a parser generator
Four LR parsing techniques will be considered
  LR(0) : LR parsing with no lookahead token to make parsing decisions
  SLR(1) : Simple LR, with one token of lookahead
  LR(1) : Canonical LR, with one token of lookahead
  LALR(1) : Lookahead LR, with one token of lookahead
LALR(1) is the technique preferred by many parser generators
Driver program
  Same driver is used for all LR parsers
Parsing stack
  Contains state information, where si is state i
  States are obtained from grammar analysis
Parsing table, which has two parts
  Action section: specifies the parser actions
  Goto section: specifies the successor states
The parser driver receives tokens from the scanner one at a time
  Parser uses the top state and the current token to look up the parsing table
  Different LR analysis techniques produce different tables
LR Parsing Table Example
Consider the following grammar G1 …
  1: E → E + T
  2: E → T
  3: T → ID
  4: T → ( E )
The following parsing table is obtained after grammar analysis

         Action                        Goto
State    +     ID    (     )     $     E     T
0              S1    S2                G4    G3
1        R3                R3    R3
2              S1    S2                G6    G3
3        R2                R2    R2
4        S5                      A
5              S1    S2                      G7
6        S5                S8
7        R1                R1    R1
8        R4                R4    R4

Entries are labeled with …
  Sn: Shift token and goto state n (call scanner for next token)
  Rn: Reduce using production n
  Gn: Goto state n (after reduce)
  A: Accept parse (terminate successfully)
  blank: Syntax error
LR Parser Driver
Let s be the parser stack top state and t be the current input token
If action[s,t] = shift n then
  Push state n on the stack
  Call scanner to obtain next token
If action[s,t] = reduce A → X1 X2 ... Xm then
  Pop the top m states off the stack
  Let s' be the state now on top of the stack
  Push goto[s', A] on the stack (using the goto section of the parsing table)
If action[s,t] = accept then return
If action[s,t] = error then call error handling routine
All LR parsers behave the same way
  The difference depends on how the parsing table is computed from a CFG
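The driver steps above can be sketched in Python as follows. This is an illustrative sketch rather than the slides' own code: the names ACTION, GOTO, PRODUCTIONS, and lr_parse are hypothetical. The table encodes grammar G1, and placing the reduce entries in the +, ), and $ columns is an assumption.

```python
# Productions of G1: number -> (LHS, length of RHS)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1), 4: ("T", 3)}

# Action section: state -> token -> (kind, argument). Missing entries mean error.
ACTION = {
    0: {"ID": ("shift", 1), "(": ("shift", 2)},
    1: {t: ("reduce", 3) for t in ("+", ")", "$")},
    2: {"ID": ("shift", 1), "(": ("shift", 2)},
    3: {t: ("reduce", 2) for t in ("+", ")", "$")},
    4: {"+": ("shift", 5), "$": ("accept", None)},
    5: {"ID": ("shift", 1), "(": ("shift", 2)},
    6: {"+": ("shift", 5), ")": ("shift", 8)},
    7: {t: ("reduce", 1) for t in ("+", ")", "$")},
    8: {t: ("reduce", 4) for t in ("+", ")", "$")},
}
# Goto section: state -> nonterminal -> successor state
GOTO = {0: {"E": 4, "T": 3}, 2: {"E": 6, "T": 3}, 5: {"T": 7}}

def lr_parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    stack = [0]                        # stack of states; state 0 is the initial state
    tokens = list(tokens) + ["$"]      # $ marks the end of input
    pos = 0
    reductions = []
    while True:
        s, t = stack[-1], tokens[pos]
        kind, arg = ACTION[s].get(t, ("error", None))
        if kind == "shift":
            stack.append(arg)          # push state n
            pos += 1                   # obtain the next token
        elif kind == "reduce":
            lhs, m = PRODUCTIONS[arg]
            del stack[-m:]             # pop the top m states
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {t!r} in state {s}")

print(lr_parse(["ID", "+", "(", "ID", ")"]))   # [3, 2, 3, 2, 4, 1]
```

The driver never inspects the grammar itself; only the table and the production lengths are needed, which is why the same driver serves every LR variant.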
LR(0) grammars can be parsed looking only at the stack
  Making shift/reduce decisions without any lookahead token
  Based on the idea of an item or a configuration
An LR(0) item consists of a production and a dot
  A → X1 ... Xi • Xi+1 ... Xn
  The dot symbol • may appear anywhere on the right-hand side
    Marks how much of a production has already been seen
    X1 ... Xi appear on top of the stack
    Xi+1 ... Xn are still expected to appear
An LR(0) state is a set of LR(0) items
  It is the set of all items that apply at a given point in the parse
Identifying the Initial State
Grammar G1 is augmented with the production S → E $, and the initial state starts with the item S → • E $
Since the dot appears before E, an E is expected
  There are two productions of E: E → E + T and E → T
  Either E + T or T is expected
  The items E → • E + T and E → • T are added to the initial state
Since T can be expected and there are two productions for T
  Either ID or ( E ) can be expected
  The items T → • ID and T → • ( E ) are added to the initial state
The initial state (0) is identified by the following set of items
  State 0:
    S → • E $
    E → • E + T
    E → • T
    T → • ID
    T → • ( E )
Shift Actions
In state 0, we can shift either an ID or a left parenthesis
  If we shift an ID, we shift the dot past the ID
  We obtain a new item T → ID • and a new state (state 1)
  If we shift a left parenthesis, we obtain T → ( • E )
  Since the dot appears before E, an E is expected
  We add the items E → • E + T and E → • T
  Since the dot appears before T, we add T → • ID and T → • ( E )
  The new set of items forms a new state (state 2)
In state 2, we can also shift an ID or a left parenthesis
[Figure: states 0 and 2, linked by a transition on "("]
  State 0: S → • E $ ; E → • E + T ; E → • T ; T → • ID ; T → • ( E )
  State 2: T → ( • E ) ; E → • E + T ; E → • T ; T → • ID ; T → • ( E )
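The construction of states 0, 1, and 2 can be reproduced with two small functions, usually called closure and goto. The Python below is a sketch under stated assumptions: items are encoded as (production index, dot position) pairs, and the names GRAMMAR, closure, goto, and show are chosen here for illustration.

```python
# Augmented grammar G1 as (LHS, RHS) pairs; production 0 is S -> E $.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add B -> . gamma for every nonterminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(state, symbol):
    """Move the dot past `symbol` in every item that allows it, then close the set."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def show(state):
    """Render the items of a state as readable strings."""
    lines = []
    for p, d in sorted(state):
        lhs, rhs = GRAMMAR[p]
        lines.append(f"{lhs} -> {' '.join(rhs[:d] + ('•',) + rhs[d:])}")
    return lines

state0 = closure({(0, 0)})
state1 = goto(state0, "ID")     # T -> ID •
state2 = goto(state0, "(")      # T -> ( • E ) plus the closure items
print(show(state0))
print(show(state1))
print(show(state2))
```

Shifting ID or a left parenthesis in state 2 leads back to existing states: goto(state2, "ID") equals state1 and goto(state2, "(") equals state2.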
Reduce and Goto Actions
In state 1, the dot appears at the end of item T → ID •
  This means that ID appears on top of stack and can be reduced to T
  When • appears at the end of an item, the parser can perform a reduce action
If ID is reduced to T, what is the next state of the parser?
  ID is popped from the stack; the previous state appears on top of stack
  T is pushed on the stack
  A new item E → T • and a new state (state 3) are obtained
  If top of stack is state 0 and we push a T, we go to state 3
  Similarly, if top of stack is state 2 and we push a T, we also go to state 3
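[Figure: states 0, 1, and 2; states 0 and 2 are linked by a transition on "("]
  State 0: S → • E $ ; E → • E + T ; E → • T ; T → • ID ; T → • ( E )
  State 1: T → ID •
  State 2: T → ( • E ) ; E → • E + T ; E → • T ; T → • ID ; T → • ( E )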
DFA of LR(0) States
We complete the state diagram to obtain the DFA of LR(0) states
In state 4, if next token is $, the parser accepts (successful parse)
[Figure: DFA of LR(0) states for grammar G1, with transitions labeled ID, T, E, and Accept; states 0, 1, and 2 contain the items listed earlier]
LR(0) Parsing Table
The LR(0) parsing table is obtained from the LR(0) state diagram
The rows of the parsing table correspond to the LR(0) states
The columns correspond to tokens and nonterminals
For each state transition i → j caused by a token x …
  Put Shift j at position [i, x] of the table
For each transition i → j caused by a nonterminal A …
  Put Goto j at position [i, A] of the table
For each state i containing an item A → α • of rule n …
  Put Reduce n at position [i, y] for every token y
For each transition i → Accept …
  Put Accept at position [i, $] of the table
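The four rules above translate almost directly into code. The sketch below is illustrative only: the transition list for G1 is written out by hand as an assumption (it matches the shift and goto entries of the G1 table), and the names TRANSITIONS, COMPLETE, and build_lr0_table are hypothetical.

```python
TOKENS = ["+", "ID", "(", ")", "$"]
NONTERMINALS = ["E", "T"]

# DFA transitions (state, symbol) -> state for grammar G1 (hand-written assumption).
TRANSITIONS = {
    (0, "ID"): 1, (0, "("): 2, (0, "T"): 3, (0, "E"): 4,
    (2, "ID"): 1, (2, "("): 2, (2, "T"): 3, (2, "E"): 6,
    (4, "+"): 5, (4, "$"): "Accept",
    (5, "ID"): 1, (5, "("): 2, (5, "T"): 7,
    (6, "+"): 5, (6, ")"): 8,
}
# States that contain a completed item A -> alpha •, mapped to the rule number.
COMPLETE = {1: 3, 3: 2, 7: 1, 8: 4}

def build_lr0_table(transitions, complete, tokens, nonterminals):
    """Fill the table from the DFA; also report cells that receive two entries."""
    table = {}
    conflicts = []

    def put(state, symbol, entry):
        if (state, symbol) in table and table[(state, symbol)] != entry:
            conflicts.append((state, symbol, table[(state, symbol)], entry))
        table[(state, symbol)] = entry

    for (i, x), j in transitions.items():
        if j == "Accept":
            put(i, "$", "A")            # transition to Accept
        elif x in nonterminals:
            put(i, x, f"G{j}")          # transition on a nonterminal
        else:
            put(i, x, f"S{j}")          # transition on a token
    for i, n in complete.items():
        for y in tokens:                # LR(0): reduce on every token
            put(i, y, f"R{n}")
    return table, conflicts

table, conflicts = build_lr0_table(TRANSITIONS, COMPLETE, TOKENS, NONTERMINALS)
for state in range(9):
    row = [table.get((state, c), "") for c in TOKENS + NONTERMINALS]
    print(state, " ".join(cell.ljust(3) for cell in row))
print("conflicts:", conflicts)
```

A grammar is LR(0) exactly when no cell receives two different entries, so an empty conflict list here is consistent with G1 being LR(0).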
LR(0) Parsing Table – cont'd
The LR(0) table of grammar G1 is shown below
  For a shift, the token to be shifted determines the next state
  For a reduce, the state on top of stack specifies the production to be used

         Action                        Goto
State    +     ID    (     )     $     E     T
0              S1    S2                G4    G3
1        R3    R3    R3    R3    R3
2              S1    S2                G6    G3
3        R2    R2    R2    R2    R2
4        S5                      A
5              S1    S2                      G7
6        S5                S8
7        R1    R1    R1    R1    R1
8        R4    R4    R4    R4    R4
SLR(1) Grammars
SLR(1) parsing increases the power of LR(0) significantly
  Lookahead token is used to make parsing decisions
  Reduce action is applied more selectively according to the FOLLOW set
A grammar is SLR(1) if two conditions are met in every state …
  If the state contains A → α • x γ and B → β • then token x ∉ FOLLOW(B)
  If the state contains two distinct items A → α • and B → β • then FOLLOW(A) ∩ FOLLOW(B) = ∅
Violation of the first condition results in a shift-reduce conflict
  If A → α • x γ and B → β • and x ∈ FOLLOW(B) then …
  Parser can shift x and reduce B → β
Violation of the second condition results in a reduce-reduce conflict
  If A → α • and B → β • and x ∈ FOLLOW(A) ∩ FOLLOW(B) then …
  Parser can reduce A → α and B → β
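The two conditions can be checked state by state once the FOLLOW sets are known. The function below is a hedged sketch; its name slr_conflicts and its argument layout are assumptions made for illustration. The example state holds the items E → T • and T → T • * F from the expression grammar at the start of these notes, where FOLLOW(E) = {+, ), $}.

```python
def slr_conflicts(shift_tokens, reduce_items, follow):
    """Check one state against the two SLR(1) conditions.

    shift_tokens: tokens x that appear right after a dot (A -> alpha • x gamma)
    reduce_items: list of (lhs, rhs) for completed items (B -> beta •)
    follow:       dict mapping each nonterminal to its FOLLOW set
    """
    conflicts = []
    # Condition 1: a shiftable token must not be in FOLLOW of a reducible LHS.
    for lhs, rhs in reduce_items:
        for x in sorted(set(shift_tokens) & follow[lhs]):
            conflicts.append(("shift-reduce", x, (lhs, rhs)))
    # Condition 2: FOLLOW sets of two reducible LHS symbols must be disjoint.
    for i, (a, alpha) in enumerate(reduce_items):
        for b, beta in reduce_items[i + 1:]:
            for x in sorted(follow[a] & follow[b]):
                conflicts.append(("reduce-reduce", x, (a, alpha), (b, beta)))
    return conflicts

# State with E -> T • and T -> T • * F: '*' is not in FOLLOW(E), so no conflict.
ok = slr_conflicts(["*"], [("E", ("T",))], {"E": {"+", ")", "$"}})
print(ok)

# Hypothetical FOLLOW set containing '*': the same state now has a conflict.
bad = slr_conflicts(["*"], [("E", ("T",))], {"E": {"*", "$"}})
print(bad)
```

An LR(0) construction would report a shift-reduce conflict in the first state, because it reduces on every token; the FOLLOW test is what removes it.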
Limits of the SLR(1) Parsing Method
Consider the following grammar G3 …
  0: S' → S $
  1: S → id
  2: S → V := E
  3: V → id
  4: E → V
  5: E → n
The initial state (state 0) consists of 4 items:
  S' → • S $ ; S → • id ; S → • V := E ; V → • id
When id is shifted in state 0, we obtain state 1 with 2 items: S → id • and V → id •
  FOLLOW(S) = {$} and FOLLOW(V) = {:= , $}
  Reduce-reduce conflict in state 1 when the lookahead token is $
Therefore, grammar G3 is not SLR(1)
  The reduce-reduce conflict is caused by the weakness of the SLR(1) method
  In state 1, V → id should be reduced only when the lookahead token is := (but not $)
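The FOLLOW sets quoted above can be recomputed with a small fixed-point iteration. This sketch assumes a grammar without empty right-hand sides, which holds for G3 but not in general; the function names first_sets and follow_sets are hypothetical.

```python
GRAMMAR = [
    ("S'", ["S", "$"]),
    ("S", ["id"]),
    ("S", ["V", ":=", "E"]),
    ("V", ["id"]),
    ("E", ["V"]),
    ("E", ["n"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST sets by iteration; only rhs[0] matters since no RHS is empty."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    """FOLLOW sets by iteration until nothing changes."""
    first = first_sets()
    follow = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]    # x ends the production
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow

follow = follow_sets()
conflict_tokens = follow["S"] & follow["V"]
print(follow["S"], follow["V"], conflict_tokens)
```

The intersection of FOLLOW(S) and FOLLOW(V) contains only $, which is the token on which state 1 has its reduce-reduce conflict.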
General LR(1) Parsing – Items and States
Even more powerful than SLR(1) is the LR(1) parsing method
LR(1) generalizes LR(0) by including a lookahead token in items
An LR(1) item consists of …
  Grammar production rule
  Right-hand position represented by the dot, and
  Lookahead token
  A → X1 ... Xi • Xi+1 ... Xn , l   where l is a lookahead token
The • represents how much of the right-hand side has been seen
  X1 ... Xi appear on top of the stack
  Xi+1 ... Xn are expected to appear
The lookahead token l is expected after X1 ... Xn appear on stack
An LR(1) state is a set of LR(1) items
LR(1) Parser Generation – Initial State
Consider again grammar G3 …
  0: S' → S $
  1: S → id
  2: S → V := E
  3: V → id
  4: E → V
  5: E → n
The initial state contains the LR(1) item: S' → • S , $
  S' → • S , $ means that S is expected and is to be followed by $
The closure of (S' → • S , $) produces the initial state items
  Since the dot appears before S, an S is expected
  There are two productions of S: S → id and S → V := E
  The LR(1) items (S → • id , $) and (S → • V := E , $) are obtained
  The lookahead token is $ (end-of-file token)
Since the • appears before V in (S → • V := E , $), a V is expected
  The LR(1) item (V → • id , :=) is obtained
  The lookahead token is := because it appears after V in (S → • V := E , $)
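The closure steps above can be sketched in Python. The item encoding (production index, dot position, lookahead) and the names first_of, closure, and goto are assumptions for illustration. The lookahead rule is simplified: it relies on G3 having no empty productions, so FIRST of the symbols after the dot is FIRST of the next symbol alone.

```python
# Grammar G3; production 0 is written without $, which is used as a lookahead.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("id",)),
    ("S", ("V", ":=", "E")),
    ("V", ("id",)),
    ("E", ("V",)),
    ("E", ("n",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of(symbol, seen=frozenset()):
    """FIRST of one symbol; valid here because G3 has no empty productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:                  # guard against left recursion
        return set()
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            result |= first_of(rhs[0], seen | {symbol})
    return result

def closure(items):
    """LR(1) closure: new items inherit lookaheads from what follows the nonterminal."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        # Token after the nonterminal if there is one, else the item's own lookahead.
        lookaheads = first_of(rest[0]) if rest else {look}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    if (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return frozenset(result)

def goto(state, symbol):
    """Move the dot past `symbol`, keep each lookahead, then close the set."""
    moved = {(p, d + 1, l) for p, d, l in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

state0 = closure({(0, 0, "$")})
state1 = goto(state0, "id")
print(sorted(state0))
print(sorted(state1))
```

Here state0 holds the four items derived above, and state1 holds (S → id • , $) and (V → id • , :=), each with a different lookahead.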
Shift Action
The initial state (state 0) consists of 4 items
In state 0, we can shift an id
  The token id can be shifted in two items
  When shifting id, we shift the dot past the id
  We obtain (S → id • , $) and (V → id • , :=)
  The two LR(1) items form a new state (state 1)
  The two items are reduce items
  No additional item can be added to state 1
[Figure: state 0]
  State 0: S' → • S , $ ; S → • id , $ ; S → • V := E , $ ; V → • id , :=
Reduce and Goto Actions
In state 1, • appears at the end of (S → id • , $) and (V → id • , :=)
  This means that id appears on top of stack and can be reduced
  Two productions can be reduced: S → id and V → id
The lookahead token eliminates the conflict of the reduce items
  If the lookahead token is $ then id is reduced to S
  If the lookahead token is := then id is reduced to V
When in state 0 after a reduce action …
  If S is pushed, we obtain item (S' → S • , $) and go to state 2
  If V is pushed, we obtain item (S → V • := E , $) and go to state 3
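The effect of the lookahead token in state 1 can be seen by filling in the reduce entries two ways: once from the LR(1) lookaheads and once from the FOLLOW sets as SLR(1) does. This is a small illustrative sketch; the names STATE1, FOLLOW, and reduce_actions are hypothetical.

```python
# Completed items of state 1 as (LHS, RHS, lookahead).
STATE1 = [("S", ("id",), "$"), ("V", ("id",), ":=")]
# FOLLOW sets of grammar G3, as given earlier.
FOLLOW = {"S": {"$"}, "V": {":=", "$"}}

def reduce_actions(items, lookaheads_for):
    """Map each lookahead token to the productions that may be reduced on it."""
    actions = {}
    for lhs, rhs, look in items:
        for token in lookaheads_for(lhs, look):
            actions.setdefault(token, []).append((lhs, rhs))
    return actions

# LR(1): reduce only on the lookahead stored in the item.
lr1 = reduce_actions(STATE1, lambda lhs, look: {look})
# SLR(1): reduce on every token in FOLLOW(LHS).
slr = reduce_actions(STATE1, lambda lhs, look: FOLLOW[lhs])

for name, actions in (("LR(1)", lr1), ("SLR(1)", slr)):
    for token in sorted(actions):
        kind = "conflict" if len(actions[token]) > 1 else "ok"
        print(name, token, actions[token], kind)
```

With LR(1) lookaheads each token selects exactly one production; with FOLLOW sets the token $ selects both S → id and V → id.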
[Figure: state 0 with its transition on V, and the Accept transition on $]
LR(1) State Diagram
The LR(1) state diagram of grammar G3 is shown below
Grammar G3, which is not SLR(1), is LR(1)
  The reduce-reduce conflict that existed in state 1 is now removed
  The lookahead token in LR(1) items eliminates the conflict
[Figure: LR(1) state diagram of grammar G3]
  State 0: S' → • S , $ ; S → • id , $ ; S → • V := E , $ ; V → • id , :=
  State 1: S → id • , $ ; V → id • , :=
  State 2: S' → S • , $
  State 3: S → V • := E , $
  State 4: S → V := • E , $ ; E → • V , $ ; E → • n , $ ; V → • id , $
  State 5: S → V := E • , $
  Transitions: 0 --id--> 1 ; 0 --S--> 2 ; 0 --V--> 3 ; 2 --$--> Accept ; 3 --:=--> 4 ; 4 --E--> 5
LALR(1) : Look-Ahead LR(1)
Preferred parsing technique in many parser generators
Close in power to LR(1), but with fewer states
Increased number of states in LR(1) is because
  Different lookahead tokens are associated with the same LR(0) items
Number of states in LALR(1) = number of states in LR(0)
LALR(1) is based on the observation that
  Some LR(1) states have the same LR(0) items
  Differ only in lookahead tokens
LALR(1) can be obtained from LR(1) by
  Merging LR(1) states that have the same LR(0) items
  Obtaining the union of the LR(1) lookahead tokens
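The merging step can be sketched as grouping LR(1) states by their LR(0) core and taking the union of the lookaheads. The item sets below are hypothetical and are not derived from grammar G3; they only illustrate two states that share a core. The function name merge_by_core is also an assumption.

```python
from collections import defaultdict

def merge_by_core(lr1_states):
    """Merge LR(1) states that share the same LR(0) items (their core)."""
    merged = defaultdict(lambda: defaultdict(set))
    for state in lr1_states:
        # The core drops the lookahead from every item.
        core = frozenset((prod, dot) for prod, dot, _ in state)
        for prod, dot, look in state:
            merged[core][(prod, dot)].add(look)   # union of lookahead tokens
    return [dict(items) for items in merged.values()]

# Hypothetical LR(1) states: items are (production, dot position, lookahead).
states = [
    {("A -> x", 1, "a"), ("A -> x", 1, "b")},
    {("A -> x", 1, "$")},          # same core as the first state
    {("B -> A A", 2, "$")},        # different core, stays separate
]

lalr = merge_by_core(states)
for state in lalr:
    print(state)
```

Three LR(1) states become two LALR(1) states, and the merged state carries the lookaheads a, b, and $. Merging can introduce reduce-reduce conflicts that the LR(1) states did not have, which is why LALR(1) is close to, but not equal to, LR(1) in power.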