1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4

Dec 22, 2015



Parsing Techniques

Top-down parsers (LL(1), recursive descent)

• Start at the root of the parse tree (the start symbol) and grow toward the leaves (similar to a derivation)

• Pick a production and try to match the input

• A bad “pick” may force the parser to backtrack

• Some grammars are backtrack-free (predictive parsing)

Bottom-up parsers (LR(1), operator precedence)

• Start at the leaves and grow toward the root

• We can think of the process as reducing the input string to the start symbol

• At each reduction step, a particular substring matching the right-hand side of a production is replaced by the symbol on the left-hand side of the production

• Bottom-up parsers handle a larger class of grammars than predictive top-down parsers (every LL(1) grammar is also LR(1))


[Figure: top-down vs. bottom-up parsing. Top-down parsing grows the parse tree from the start symbol S toward the fringe of the parse tree (a left-most derivation). Bottom-up parsing builds the upper fringe of the parse tree from the leaves toward S (a right-most derivation in reverse). Both do a left-to-right scan of the input string, guided by a lookahead.]


Handle-pruning, Bottom-up Parsers

The process of discovering a handle & reducing it to the appropriate left-hand side is called handle pruning

Handle pruning forms the basis for a bottom-up parsing method

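To construct a rightmost derivation

$$S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n = w$$

apply the following simple algorithm, which works backward from the sentence $w$ to $S$:

for $i \leftarrow n$ to $1$ by $-1$
    find the handle $\langle A_i \rightarrow \beta_i, k_i \rangle$ in $\gamma_i$
    replace $\beta_i$ with $A_i$ to generate $\gamma_{i-1}$

Here $k_i$ is the position of the right end of $\beta_i$ in $\gamma_i$.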


Example

The expression grammar

  1  S      → Expr
  2  Expr   → Expr + Term
  3         | Expr – Term
  4         | Term
  5  Term   → Term * Factor
  6         | Term / Factor
  7         | Factor
  8  Factor → num
  9         | id

Handles for the rightmost derivation of the input string x – 2 * y:

  Sentential form                 Handle (prod’n, pos’n)
  S                               none
  Expr                            1, 1
  Expr – Term                     3, 3
  Expr – Term * Factor            5, 5
  Expr – Term * <id,y>            9, 5
  Expr – Factor * <id,y>          7, 3
  Expr – <num,2> * <id,y>         8, 3
  Term – <num,2> * <id,y>         4, 1
  Factor – <num,2> * <id,y>       7, 1
  <id,x> – <num,2> * <id,y>       9, 1
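The handle-pruning loop can be checked mechanically. The Python sketch below replays the handles from the table, bottom row first, on the token categories of x – 2 * y. The names `GRAMMAR`, `HANDLES`, and `prune` are ours, chosen for illustration; `-` stands in for `–`, and token values such as `x` and `2` are dropped. Each step checks that the production's right-hand side really ends at the recorded position before replacing it with the left-hand side.

```python
# Productions of the expression grammar, numbered as on the slide: number -> (lhs, rhs).
GRAMMAR = {
    1: ("S", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["id"]),
}

# (production, 1-based position of the handle's right end), read bottom-up from the table.
HANDLES = [(9, 1), (7, 1), (4, 1), (8, 3), (7, 3), (9, 5), (5, 5), (3, 3), (1, 1)]


def prune(sentence, handles):
    """Handle pruning: replace each handle's rhs by its lhs, producing gamma_{i-1}."""
    form = list(sentence)
    forms = [list(form)]
    for prod, pos in handles:
        lhs, rhs = GRAMMAR[prod]
        start = pos - len(rhs)            # the handle occupies form[start:pos]
        if form[start:pos] != rhs:
            raise ValueError(f"{rhs} does not end at position {pos} in {form}")
        form[start:pos] = [lhs]
        forms.append(list(form))
    return forms


if __name__ == "__main__":
    for f in prune(["id", "-", "num", "*", "id"], HANDLES):
        print(" ".join(f))
```

Run as a script, it prints the sentential forms of the table from the last row back up to `S`, which is the rightmost derivation in reverse.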


Handle-pruning, Bottom-up Parsers

One implementation technique is the shift-reduce parser

push $
lookahead = get_next_token( )
repeat until (top of stack == start symbol and lookahead == $)
    if the top of the stack is a handle A → β
        then /* reduce β to A */
            pop |β| symbols off the stack
            push A onto the stack
    else if (lookahead ≠ $)
        then /* shift */
            push lookahead
            lookahead = get_next_token( )
    else /* need to shift, but the input is exhausted */
        report an error

How do errors show up?

• failure to find a handle

• hitting $ and needing to shift (final else clause)

Either generates an error
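The loop above leaves “the top of the stack is a handle” abstract. As a minimal sketch, assuming we hand-code that test for the expression grammar using one token of lookahead (the helper `find_handle` is hypothetical; an LR parser derives the same decision from its tables), the Python below follows the pseudocode and counts its moves. Tokens are reduced to their categories, with `-` for `–`.

```python
def find_handle(stack, lookahead):
    """Hypothetical, hand-coded handle test for the expression grammar.
    Returns (lhs, rhs) when a handle sits on top of the stack, or None to shift."""
    top, under = stack[-1], stack[-3:-1]           # under: the two symbols below the top
    if top in ("id", "num"):
        return "Factor", [top]                     # productions 8, 9
    if top == "Factor":
        if under in (["Term", "*"], ["Term", "/"]):
            return "Term", under + ["Factor"]      # productions 5, 6
        return "Term", ["Factor"]                  # production 7
    if top == "Term" and lookahead not in ("*", "/"):   # on * or /, shift instead
        if under in (["Expr", "+"], ["Expr", "-"]):
            return "Expr", under + ["Term"]        # productions 2, 3
        return "Expr", ["Term"]                    # production 4
    if stack == ["$", "Expr"] and lookahead == "$":
        return "S", ["Expr"]                       # production 1
    return None


def shift_reduce(tokens):
    """Shift-reduce loop from the slide; returns (number of shifts, number of reductions)."""
    stack = ["$"]
    rest = tokens + ["$"]
    lookahead, rest = rest[0], rest[1:]
    shifts = reduces = 0
    while not (stack[-1] == "S" and lookahead == "$"):
        handle = find_handle(stack, lookahead)
        if handle is not None:                     # reduce beta to A
            lhs, rhs = handle
            del stack[-len(rhs):]                  # pop |beta| symbols
            stack.append(lhs)                      # push A
            reduces += 1
        elif lookahead != "$":                     # shift
            stack.append(lookahead)
            lookahead, rest = rest[0], rest[1:]
            shifts += 1
        else:                                      # need to shift, but the input is gone
            raise SyntaxError(f"no handle and no input left; stack = {stack}")
    return shifts, reduces


print(shift_reduce(["id", "-", "num", "*", "id"]))
```

On x – 2 * y it returns `(5, 9)`, matching the “5 shifts + 9 reduces” given with the parse tree; the loop then stops, which corresponds to the accept. An input such as `id id` ends in the error branch because no handle appears once the input is exhausted.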


Example, Corresponding Parse Tree

S
  Expr
    Expr
      Term
        Fact.
          <id,x>
    –
    Term
      Term
        Fact.
          <num,2>
      *
      Fact.
        <id,y>

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle & reduce

5 shifts + 9 reduces + 1 accept


Shift-reduce Parsing

Shift-reduce parsers are easily built and easily understood

A shift-reduce parser has just four actions

• Shift: the next word is shifted onto the stack

• Reduce: the right end of the handle is at the top of the stack
    Locate the left end of the handle within the stack
    Pop the handle off the stack & push the appropriate lhs

• Accept: stop parsing & report success

• Error: call an error reporting/recovery routine

Accept & Error are simple

Shift is just a push and a call to the scanner

Reduce takes |rhs| pops & 1 push

If handle-finding requires state, put it in the stack

Handle finding is key

• the handle is on the stack

• there is a finite set of handles

⇒ use a DFA!


LR Parsers

• LR(k) parsers are table-driven, bottom-up, shift-reduce parsers that use a limited right context (k-token lookahead) for handle recognition

• LR(k): Left-to-right scan of the input, Rightmost derivation in reverse, with k-token lookahead

A grammar is LR(k) if, given a rightmost derivation

$$S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n = \text{sentence}$$

we can

1. isolate the handle of each right-sentential form $\gamma_i$, and

2. determine the production by which to reduce,

by scanning $\gamma_i$ from left to right, going at most k symbols beyond the right end of the handle of $\gamma_i$


LR Parsers

A table-driven LR parser looks like this:

[Figure: source code → Scanner → Table-driven Parser (with its Stack) → IR. A Parser Generator turns the grammar into the ACTION & GOTO Tables used by the table-driven parser.]


LR Shift-Reduce Parsers

push($);      // $ is the end-of-file symbol
push(s0);     // s0 is the start state of the DFA that recognizes handles
lookahead = get_next_token();
repeat forever
    s = top_of_stack();
    if ( ACTION[s,lookahead] == reduce A → β ) then
        pop 2*|β| symbols;
        s = top_of_stack();
        push(A);
        push(GOTO[s,A]);
    else if ( ACTION[s,lookahead] == shift si ) then
        push(lookahead);
        push(si);
        lookahead = get_next_token();
    else if ( ACTION[s,lookahead] == accept and lookahead == $ ) then
        return success;
    else error();

The skeleton parser

• uses ACTION & GOTO

• does |words| shifts

• does |derivation| reductions

• does 1 accept


LR Parsers (parse tables)

To make a parser for L(G), we need a set of tables

The grammar

  1  S → Z
  2  Z → Z z
  3    | z

The tables

  ACTION
  State   $          z
  0       -          shift 2
  1       accept     shift 3
  2       reduce 3   reduce 3
  3       reduce 2   reduce 2

  GOTO
  State   Z
  0       1
  1       -
  2       -
  3       -

(- marks an empty entry; an empty ACTION entry means error)


Example Parses

The string “z”

  Stack            Input    Action
  $ s0             z $      shift 2
  $ s0 z s2        $        reduce 3
  $ s0 Z s1        $        accept

The string “zz”

  Stack            Input    Action
  $ s0             z z $    shift 2
  $ s0 z s2        z $      reduce 3
  $ s0 Z s1        z $      shift 3
  $ s0 Z s1 z s3   $        reduce 2
  $ s0 Z s1        $        accept
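The skeleton parser can be run directly against these tables. In this Python sketch (ours, for illustration), an ACTION entry is a tuple such as `("shift", 2)`, `("reduce", 3)` or `("accept",)`, a missing entry means error, and states are stored as integers and printed as `s0`, `s1`, and so on. `lr_parse` returns the trace so it can be compared with the two parses above.

```python
# ACTION and GOTO for 1 S -> Z, 2 Z -> Z z, 3 Z -> z, copied from the tables above.
# An ACTION entry is ("shift", state), ("reduce", production) or ("accept",); absent = error.
ACTION = {
    (0, "z"): ("shift", 2),
    (1, "$"): ("accept",), (1, "z"): ("shift", 3),
    (2, "$"): ("reduce", 3), (2, "z"): ("reduce", 3),
    (3, "$"): ("reduce", 2), (3, "z"): ("reduce", 2),
}
GOTO = {(0, "Z"): 1}
PRODUCTIONS = {1: ("S", ["Z"]), 2: ("Z", ["Z", "z"]), 3: ("Z", ["z"])}


def lr_parse(tokens):
    """Skeleton LR parser. Returns the trace as (stack, input, action) rows."""
    stack = ["$", 0]                       # push($); push(s0)
    words = tokens + ["$"]
    pos = 0                                # words[pos] is the lookahead
    trace = []
    while True:
        s, lookahead = stack[-1], words[pos]
        act = ACTION.get((s, lookahead))
        if act is None:
            raise SyntaxError(f"no action in state s{s} on {lookahead!r}")
        shown = " ".join(f"s{x}" if isinstance(x, int) else x for x in stack)
        trace.append((shown, " ".join(words[pos:]), " ".join(map(str, act))))
        if act[0] == "reduce":
            lhs, rhs = PRODUCTIONS[act[1]]
            del stack[-2 * len(rhs):]      # pop 2*|beta| entries: a symbol and a state each
            s = stack[-1]
            stack += [lhs, GOTO[(s, lhs)]]
        elif act[0] == "shift":
            stack += [lookahead, act[1]]
            pos += 1                       # lookahead = get_next_token()
        else:
            return trace                   # accept (ACTION has accept only on $)


for row in lr_parse(["z", "z"]):
    print(*row, sep=" | ")
```

`lr_parse(["z", "z"])` yields the five rows of the “zz” trace, and `lr_parse([])` raises `SyntaxError` because ACTION[0, $] is empty. Keeping symbols and states on one stack is why a reduction pops 2·|β| entries.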


LR Parsers

How does this LR stuff work?

• Unambiguous grammar ⇒ unique rightmost derivation

• Keep the upper fringe on a stack
  – All active handles include TOS
  – Shift inputs until TOS is the right end of a handle

• The language of handles is regular
  – Build a handle-recognizing DFA
  – ACTION & GOTO tables encode the DFA

• To match subterms, recurse and leave the DFA’s state on the stack

• Final states of the DFA correspond to reduce actions
  – New state is GOTO[state at TOS after the pops, lhs]
  – For Z, this takes the DFA to S1

[Figure: control DFA for the simple example. S0 goes to S2 on z and to S1 on Z; S1 goes to S3 on z; S2 and S3 are final states that trigger reduce actions.]


Building LR Parsers

How do we generate the ACTION and GOTO tables?

• Use the grammar to build a model of the handle-recognizing DFA

• Use the DFA model to build the ACTION & GOTO tables

• If the construction succeeds, the grammar is LR

How do we build the handle-recognizing DFA?

• Encode the set of productions that can be used as handles in the DFA state: use LR(k) items

• Use two functions, goto(s, x) and closure(s)
  – goto() is analogous to move() in the NFA-to-DFA (subset) construction
  – closure() is analogous to ε-closure

• Build up the states and transition function of the DFA

• Use this information to fill in the ACTION and GOTO tables


LR(k) items

An LR(k) item is a pair [P, δ], where

P is a production A → β with a • at some position in the rhs

δ is a lookahead string of length ≤ k (terminal symbols or $)

Examples: [A → •B1B2B3, a], [A → B1•B2B3, a], [A → B1B2•B3, a], & [A → B1B2B3•, a]

The • in an item indicates the position of the top of the stack

• LR(0) items [A → β•γ] (no lookahead symbol)

• LR(1) items [A → β•γ, a] (one-token lookahead)

• LR(2) items [A → β•γ, a b] (two-token lookahead) ...


LR(k) items

The • in an item indicates the position of the top of the stack

[A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack

[A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point in the parse, and that the parser has already recognized β

[A → βγ•, a] means that the parser has seen βγ, and that a lookahead a is consistent with reducing to A (for LR(k) parsers, a is a string of terminal symbols of length k)

The table construction algorithm uses items to represent valid configurations of an LR(1) parser


LR(1) Items

The production A → B1B2B3, with lookahead a, generates 4 items

[A → •B1B2B3, a], [A → B1•B2B3, a], [A → B1B2•B3, a], & [A → B1B2B3•, a]

The set of LR(1) items for a grammar is finite
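Because an item is just a production, a dot position, and a lookahead, the finiteness claim can be checked by enumeration. This Python sketch (the `Item` type and `all_items` are illustrative names) lists every LR(1) item for a given set of productions and lookahead symbols.

```python
from itertools import product
from typing import NamedTuple


class Item(NamedTuple):
    lhs: str
    rhs: tuple        # right-hand side symbols
    dot: int          # 0 .. len(rhs); the • sits before rhs[dot]
    lookahead: str    # one terminal symbol or "$"

    def __str__(self):
        body = list(self.rhs)
        body.insert(self.dot, "•")
        return f"[{self.lhs} -> {' '.join(body)}, {self.lookahead}]"


def all_items(productions, lookaheads):
    """Every LR(1) item: each production x each dot position x each lookahead."""
    return [Item(lhs, tuple(rhs), dot, a)
            for (lhs, rhs), a in product(productions, lookaheads)
            for dot in range(len(rhs) + 1)]


for item in all_items([("A", ["B1", "B2", "B3"])], ["a"]):
    print(item)
```

The production A → B1B2B3 with lookahead a yields exactly the 4 items listed above. For the grammar S → Z, Z → Z z | z with lookaheads {$, z} there are (2 + 3 + 2) × 2 = 14 LR(1) items in total.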

What’s the point of all these lookahead symbols?

• Carry them along to choose the correct reduction

• Lookaheads are bookkeeping, unless the item has • at the right end
  – Has no direct use in [A → β•γ, a]
  – In [A → β•, a], a lookahead of a implies a reduction by A → β
  – For { [A → β•, a], [B → γ•δ, b] }:
      lookahead = a ⇒ reduce to A
      lookahead ∈ FIRST(δ) ⇒ shift

• Limited right context is enough to pick the actions


Back to Finding Handles

Parser in a state where the stack (the fringe) was

Expr – Term

With lookahead of *

How did it choose to shift * and keep extending Term, rather than reduce to Expr?

• Lookahead symbol is the key

• With lookahead of + or –, parser should reduce to Expr

• With lookahead of * or /, parser should shift

• Parser uses lookahead to decide

• All this context from the grammar is encoded in the handle-recognizing mechanism
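The decision rule from the LR(1) items slide can be applied to the Expr – Term state. The Python sketch below uses a hand-written item set for that state (an assumption, not computed by the code) and tests for a shift by looking at the terminal right after the •. That is how table construction usually encodes “lookahead ∈ FIRST(δ)” once a state has been closed, assuming no ε-productions. `STATE` and `decide` are illustrative names, and `-` stands in for `–`.

```python
# Items are (lhs, rhs, dot, lookahead); dot == len(rhs) means the • is at the right end.
# Hypothetical item set for the state after "Expr - Term" (written by hand, not computed).
STATE = [("Expr", ("Expr", "-", "Term"), 3, la) for la in ("$", "+", "-")] + [
    ("Term", ("Term", op, "Factor"), 1, la)
    for op in ("*", "/")
    for la in ("$", "+", "-", "*", "/")
]


def decide(items, lookahead):
    """Pick the parser action for one state and one lookahead token."""
    reduces = [(lhs, rhs) for lhs, rhs, dot, la in items
               if dot == len(rhs) and la == lookahead]      # [A -> beta •, a] with a == lookahead
    shift = any(dot < len(rhs) and rhs[dot] == lookahead
                for _, rhs, dot, _ in items)                # terminal right after the •
    if len(reduces) > 1 or (reduces and shift):
        raise ValueError(f"conflict on {lookahead!r}: grammar is not LR(1)")
    if reduces:
        return ("reduce",) + reduces[0]
    if shift:
        return ("shift",)
    return None                                             # empty entry: error


print(decide(STATE, "*"), decide(STATE, "$"))
```

With lookahead * the sketch shifts; with $, + or – it reduces by Expr → Expr – Term; on a token such as `id` it finds no action, which is an error entry. Finding both a shift and a reduce for the same lookahead would be a conflict, meaning the grammar is not LR(1).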


Back to x - 2 * y

1. Shift until TOS is the right end of a handle
2. Find the left end of the handle & reduce

[Figure: the shift-reduce trace for x - 2 * y, annotated “shift here” and “reduce here” at the corresponding steps.]


LR(1) Table Construction

High-level overview

1 Build the handle-recognizing DFA (aka Canonical Collection of sets of LR(1) items), C = { I0 , I1 , ... , In }

  a Introduce a new start symbol S’ which has only one production, S’ → S

  b The initial state, I0, should include
    • [S’ → •S, $], along with any equivalent items
    • Derive the equivalent items as closure( I0 )

  c Repeatedly compute, for each Ik and each grammar symbol x, goto(Ik, x)
    • If the set is not already in the collection, add it
    • Record all the transitions created by goto( )

  This eventually reaches a fixed point

2 Fill in the ACTION and GOTO tables using the DFA

The canonical collection completely encodes the transition diagram for the handle-finding DFA
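The whole construction fits in a short program. This Python sketch is ours and makes two simplifying assumptions: the grammar has no ε-productions (so FIRST can be found by following leftmost symbols), and it is the small grammar S → Z, Z → Z z | z used in this lecture, whose start symbol S already has a single production and therefore plays the role of S’. `build_tables` performs steps a to c to get the canonical collection and then step 2 to fill in ACTION and GOTO, reporting a conflict if an entry would be filled twice with different actions.

```python
from typing import NamedTuple

# Example grammar: 1 S -> Z, 2 Z -> Z z, 3 Z -> z. S has a single production, so it
# serves as S'. Assumed for brevity: no epsilon-productions.
PRODUCTIONS = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}
SYMBOLS = sorted({sym for _, rhs in PRODUCTIONS for sym in rhs})   # ["Z", "z"]


class Item(NamedTuple):
    prod: int      # index into PRODUCTIONS (slide number = prod + 1)
    dot: int       # the • sits before rhs[dot]
    la: str        # lookahead terminal or "$"


def after_dot(item):
    rhs = PRODUCTIONS[item.prod][1]
    return rhs[item.dot] if item.dot < len(rhs) else None


def first(symbol):
    """FIRST by following leftmost symbols (valid only without epsilon-productions)."""
    seen, todo, out = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym not in seen:
            seen.add(sym)
            if sym in NONTERMINALS:
                todo.extend(rhs[0] for lhs, rhs in PRODUCTIONS if lhs == sym)
            else:
                out.add(sym)
    return out


def closure(items):
    result, work = set(items), list(items)
    while work:
        item = work.pop()
        b_sym = after_dot(item)
        if b_sym in NONTERMINALS:
            rest = PRODUCTIONS[item.prod][1][item.dot + 1:]
            for b in first(rest[0] if rest else item.la):   # FIRST(delta a)
                for p, (lhs, _) in enumerate(PRODUCTIONS):
                    new = Item(p, 0, b)
                    if lhs == b_sym and new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)


def goto(state, x):
    return closure({i._replace(dot=i.dot + 1) for i in state if after_dot(i) == x})


def build_tables():
    states = [closure({Item(0, 0, "$")})]          # steps a-b: I0 = closure([S -> •Z, $])
    index, trans = {states[0]: 0}, {}
    i = 0
    while i < len(states):                         # step c: until no new set appears
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target:
                if target not in index:
                    index[target] = len(states)
                    states.append(target)
                trans[(i, x)] = index[target]
        i += 1
    action, goto_table = {}, {}                    # step 2
    for (i, x), j in trans.items():                # terminal: shift; nonterminal: GOTO
        if x in NONTERMINALS:
            goto_table[(i, x)] = j
        else:
            action[(i, x)] = ("shift", j)
    for i, state in enumerate(states):             # • at the right end: reduce (or accept)
        for item in state:
            if after_dot(item) is None:
                entry = ("accept",) if item.prod == 0 else ("reduce", item.prod + 1)
                if action.setdefault((i, item.la), entry) != entry:
                    raise ValueError(f"conflict in state {i} on {item.la!r}: not LR(1)")
    return states, action, goto_table


states, action, goto_table = build_tables()
print(len(states), "states")
print(action)
print(goto_table)
```

For this grammar it finds 4 states and reproduces the ACTION and GOTO tables given earlier for S → Z, Z → Z z | z. The state numbers happen to match the slides because goto is tried on Z before z.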


Computing Closures

closure(I) adds all the items implied by items already in I

• Any item [A → β•Bδ, a] implies [B → •τ, x] for each production with B on the lhs, and each x ∈ FIRST(δa)

• Since βBδ is valid, any way to derive βBδ is valid, too

The algorithm

Closure( I )
    while ( I is still changing )
        for each item [A → β•Bδ, a] ∈ I
            for each production B → τ ∈ P
                for each terminal b ∈ FIRST(δa)
                    if [B → •τ, b] ∉ I
                        then add [B → •τ, b] to I

Fixpoint computation
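Here is the closure algorithm as a Python sketch, for the example grammar S → Z, Z → Z z | z of the following slides. It assumes no ε-productions, so FIRST(δa) is simply FIRST of the first symbol of δa; `Item`, `first`, and `closure` are illustrative names. Like the pseudocode, it repeats full passes until one adds nothing.

```python
from typing import NamedTuple

# Example grammar: 1 S -> Z, 2 Z -> Z z, 3 Z -> z (assumed: no epsilon-productions).
PRODUCTIONS = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


class Item(NamedTuple):
    lhs: str
    rhs: tuple     # right-hand side symbols
    dot: int       # the • sits before rhs[dot]
    la: str        # one-token lookahead (a terminal or "$")


def first(symbol):
    """FIRST(symbol): terminals reachable as a leftmost symbol (valid without epsilon)."""
    seen, todo, out = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in NONTERMINALS:
            todo.extend(rhs[0] for lhs, rhs in PRODUCTIONS if lhs == sym)
        else:
            out.add(sym)
    return out


def closure(items):
    """For each [A -> beta • B delta, a] in I, add [B -> • tau, b] for b in FIRST(delta a)."""
    result = set(items)
    changed = True
    while changed:                              # fixpoint: stop after a pass adds nothing
        changed = False
        for item in list(result):
            if item.dot == len(item.rhs) or item.rhs[item.dot] not in NONTERMINALS:
                continue                        # the • is not in front of a nonterminal B
            b_sym, delta = item.rhs[item.dot], item.rhs[item.dot + 1:]
            lookaheads = first(delta[0] if delta else item.la)   # FIRST(delta a)
            for lhs, tau in PRODUCTIONS:
                if lhs != b_sym:
                    continue
                for b in lookaheads:
                    new = Item(lhs, tau, 0, b)
                    if new not in result:
                        result.add(new)
                        changed = True
    return frozenset(result)


s0 = closure({Item("S", ("Z",), 0, "$")})
for item in sorted(s0):
    print(item)
```

Applied to [S → •Z, $], it returns the five items of the initial state s0.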


Example Grammar

  1  S → Z
  2  Z → Z z
  3    | z

Initial step builds the item [S → •Z, $] and takes its closure( )

Closure( [S → •Z, $] )

  Item             From
  [S → •Z, $]      Original item
  [Z → •Z z, $]    1, δa is $
  [Z → •z, $]      1, δa is $
  [Z → •Z z, z]    2, δa is z $
  [Z → •z, z]      2, δa is z $

So, the initial state s0 is { [S → •Z, $], [Z → •Z z, $], [Z → •z, $], [Z → •Z z, z], [Z → •z, z] }


Computing Gotos

goto(I, x) computes the state that the parser would reach if it recognized an x while in state I

• goto( { [A → β•xδ, a] }, x ) produces [A → βx•δ, a]

• It also includes closure( [A → βx•δ, a] ) to fill out the state

The algorithm

Goto( I, x )
    new = Ø
    for each [A → β•xδ, a] ∈ I
        new = new ∪ { [A → βx•δ, a] }
    return closure(new)

• Not a fixpoint method

• Uses closure
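A Python sketch of Goto for the same grammar S → Z, Z → Z z | z, again assuming no ε-productions. It repeats a compact, worklist version of closure so that the block runs on its own; all names are illustrative.

```python
from typing import NamedTuple

# Example grammar: 1 S -> Z, 2 Z -> Z z, 3 Z -> z (assumed: no epsilon-productions).
PRODUCTIONS = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


class Item(NamedTuple):
    lhs: str
    rhs: tuple
    dot: int
    la: str


def first(symbol):
    """FIRST(symbol) by following leftmost symbols (valid only without epsilon)."""
    seen, todo, out = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym not in seen:
            seen.add(sym)
            if sym in NONTERMINALS:
                todo.extend(rhs[0] for lhs, rhs in PRODUCTIONS if lhs == sym)
            else:
                out.add(sym)
    return out


def closure(items):
    """Worklist form of Closure: same result as repeating passes until nothing changes."""
    result, work = set(items), list(items)
    while work:
        item = work.pop()
        if item.dot < len(item.rhs) and item.rhs[item.dot] in NONTERMINALS:
            rest = item.rhs[item.dot + 1:]
            for b in first(rest[0] if rest else item.la):
                for lhs, tau in PRODUCTIONS:
                    new = Item(lhs, tau, 0, b)
                    if lhs == item.rhs[item.dot] and new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)


def goto(state, x):
    """Goto(I, x): move the • past x in every item [A -> beta • x delta, a], then close."""
    new = {item._replace(dot=item.dot + 1) for item in state
           if item.dot < len(item.rhs) and item.rhs[item.dot] == x}
    return closure(new)


s0 = closure({Item("S", ("Z",), 0, "$")})
print(sorted(goto(s0, "z")))   # the two items of s2
```

From s0, `goto(s0, "z")` gives {[Z → z•, $], [Z → z•, z]}, and `goto(s0, "Z")` gives three items: [S → Z•, $], [Z → Z•z, $], and [Z → Z•z, z].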


Example Grammar

s0 is { [S → •Z, $], [Z → •Z z, $], [Z → •z, $], [Z → •Z z, z], [Z → •z, z] }

goto( s0, z )

• Loop produces

  Item            From
  [Z → z •, $]    Item 3 in s0
  [Z → z •, z]    Item 5 in s0

• Closure adds nothing since • is at the end of the rhs in each item

In the construction, this produces s2

  { [Z → z •, {$, z}] }

New, but obvious, notation for two distinct items

  [Z → z •, $] and [Z → z •, z]