Compiler Construction A Compulsory Module for Students in Computer Science Department Faculty of IT / Al – Al Bayt University Second Semester 2010/2011

Mar 28, 2015

Rhiannon Fill

Compiler Construction

A Compulsory Module for Students in

Computer Science Department

Faculty of IT / Al – Al Bayt University

Second Semester 2010/2011

Syntax Analyzer

Top-down parsing

Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first).

Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

Top-down Parsing (cont.)

At each step of a top-down parse, the key problem is determining the production to be applied for a nonterminal, say A. Once an A-production is chosen, the rest of the parsing process consists of "matching" the terminal symbols in the production body with the input string.

• Recursive-descent parsing is a general form of top-down parsing; it may require backtracking to find the correct A-production to be applied.
• Predictive parsing is a special case of recursive-descent parsing in which no backtracking is required.
• Nonrecursive predictive parsing is another form of top-down parsing; it maintains a stack explicitly and uses a parsing table.

Recursive-Descent Parsing

Algorithm

• A recursive-descent parsing program consists of a set of procedures, one for each nonterminal.
• Execution begins with the procedure for the start symbol.
• Recursive descent may require backtracking; that is, it may require repeated scans over the input (backtracking is not very efficient).

Recursive-Descent Parsing (cont.)

To allow backtracking, the above algorithm needs to be modified as follows:

• We cannot choose a unique A-production at line (1), so we must try each of several productions in some order.
• Failure at line (7) is not ultimate failure, but suggests only that we need to return to line (1) and try another A-production.
• An input error is found only if there are no more A-productions to try.
• In order to try another A-production, we need to be able to reset the input pointer to where it was when we first reached line (1). Thus, a local variable is needed to store this input pointer for future use.

Example: Consider the following grammar:

To construct a parse tree top-down for the input string w = cad:

We have a match for the first and second input symbols, so we advance the input pointer to d and compare d against the next leaf, labeled b. Since b does not match d, we must try the other production. In this case, we need to reset the input pointer to where it was before the failed attempt and start parsing again using the new production.

• A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop.
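
The backtracking walkthrough for w = cad can be sketched in Python. The grammar itself did not survive in this transcript, so the sketch assumes the usual textbook grammar S → c A d, A → a b | a, which is consistent with the leaf labeled b in the walkthrough. The names GRAMMAR, parse_nonterminal and accepts are illustrative only. Note that this simple version commits to the first alternative that succeeds for a nonterminal, which is enough for this grammar but is not full backtracking in general.

```python
# Assumed grammar (not shown in the transcript): S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse_nonterminal(head, text, pos):
    """Return the input position after matching `head`, or None on failure."""
    saved = pos                      # local copy of the input pointer
    for body in GRAMMAR[head]:       # try each production for `head` in order
        pos = saved                  # reset the input pointer before each try
        for symbol in body:
            if symbol in GRAMMAR:    # nonterminal: call its procedure
                pos = parse_nonterminal(symbol, text, pos)
            elif pos < len(text) and text[pos] == symbol:
                pos += 1             # terminal matches: advance the input
            else:
                pos = None           # mismatch: this production fails
            if pos is None:
                break
        if pos is not None:
            return pos               # this production matched
    return None                      # no production left to try


def accepts(text):
    """True if the whole input is derived from the start symbol S."""
    return parse_nonterminal("S", text, 0) == len(text)


print(accepts("cad"))   # A -> a b fails on d, then A -> a succeeds
```

Calling accepts("cad") first tries A → a b, fails when b is compared with d, restores the saved pointer, and then succeeds with A → a.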

Predictive Parsers

A CFG may be parsed by a recursive-descent parser that needs no backtracking if:
• left recursion is eliminated, and
• left-factoring transformations are applied.

A predictive parser can be built using recursive procedures by:
• creating transition diagrams, and
• matching terminals and making a procedure call for each non-terminal.

Consider the following grammar

After left recursion elimination and left factoring

[Figure: transition diagram for E', with edges labeled + and T]

Transition diagrams can be simplified, provided the sequence of grammar symbols along paths is preserved. The diagrams in the figure above are equivalent: if we trace paths from E to an accepting state and substitute for E', then, in both sets of diagrams, the grammar symbols along the paths make up strings of the form T + T + ... + T.

Transition Diagrams for Predictive Parsers

To construct the transition diagram from a grammar, first eliminate left recursion and then left factor the grammar. Then, for each non-terminal A:

1. Create an initial and final (return) state.
2. For each production A → X1 X2 ... Xn, create a path from the initial to the final state, with edges labeled X1, X2, ..., Xn. If A → ε, the path is an edge labeled ε.

Transition diagrams for predictive parsers differ from those for lexical analyzers. Parsers have one diagram for each non-terminal. The labels of edges can be tokens or non-terminals.

A transition on a token (terminal) means that we take that transition if that token is the next input symbol.

A transition on a non-terminal A is a call of the procedure for A.

Tail-recursion removal and substitution of procedure bodies can be used to optimize the procedure for a non-terminal.
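
As a minimal sketch of how a simplified transition diagram turns into code, the procedure below recognises strings of the form T + T + ... + T for E → T E', E' → + T E' | ε. To keep the example small, T is assumed to be the single token id; that production and the names match_T and parse_E are assumptions, not part of the slides. The tail-recursive call for E' is replaced by a loop, as described above.

```python
def match_T(tokens, pos):
    """T -> id (assumed): match one terminal and return the new position."""
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    raise SyntaxError(f"expected 'id' at position {pos}")


def parse_E(tokens):
    """E -> T E' with E' -> + T E' | epsilon; True if tokens are accepted."""
    pos = match_T(tokens, 0)              # edge labeled T: call procedure T
    # E' is tail-recursive, so its procedure is inlined here as a loop.
    while pos < len(tokens) and tokens[pos] == "+":
        pos = match_T(tokens, pos + 1)    # edge labeled +, then edge labeled T
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return True


print(parse_E(["id", "+", "id", "+", "id"]))
```

No backtracking is needed: at every step the next token alone decides whether to follow the + edge or to leave the loop.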

Non-recursive Predictive Parsing (table-driven predictive parsing)

[Figure: model of a table-driven predictive parser, with an input buffer (a + b $), a stack (X Y Z $), the predictive parsing program (driver), and the parsing table M]

• A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls.
• The parser mimics a leftmost derivation.
• If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒* wα by a leftmost derivation.

The table-driven parser has an input buffer, a stack containing a sequence of grammar symbols, a parsing table constructed by Algorithm 4.31, and an output stream. The input buffer contains the string to be parsed, followed by the endmarker $.

Nonrecursive Predictive Parsing

Method: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The algorithm below uses the predictive parsing table M to produce a predictive parse for the input.
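
The driver loop can be sketched as follows. The algorithm figure is not reproduced in this transcript, so this is one plausible rendering of the method, not the slide's exact listing. The table is written by hand for a small assumed grammar (E → T E', E' → + T E' | ε, T → id); TABLE, NONTERMINALS and predictive_parse are illustrative names.

```python
# Hand-written parsing table M for the assumed grammar:
#   E -> T E'     E' -> + T E' | epsilon     T -> id
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],                 # E' -> epsilon
    ("T", "id"): ["id"],
}
NONTERMINALS = {"E", "E'", "T"}


def predictive_parse(tokens, start="E"):
    """Return the productions used in a leftmost derivation of tokens."""
    buffer = list(tokens) + ["$"]    # input followed by the endmarker
    stack = ["$", start]             # top of the stack is the end of the list
    ip = 0
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], buffer[ip]
        if top not in NONTERMINALS:  # terminal on top: it must match the input
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            ip += 1
        elif (top, a) in TABLE:      # nonterminal: expand using M[top, a]
            body = TABLE[(top, a)]
            stack.pop()
            stack.extend(reversed(body))   # leftmost body symbol ends on top
            output.append((top, body))
        else:
            raise SyntaxError(f"no entry M[{top}, {a}]")
    if buffer[ip] != "$":
        raise SyntaxError("input remains after the stack is empty")
    return output


for head, body in predictive_parse(["id", "+", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The body is pushed in reverse so that its leftmost symbol is on top of the stack, which is what makes the output a leftmost derivation.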

Predictive Parsing table for the grammar below:

FIRST and FOLLOW

FIRST(α) = set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).

FOLLOW(A) = set of terminals a that can appear immediately to the right of A in some sentential form. In other words, there exists a derivation of the form S ⇒* αAaβ.

In addition, if A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A); recall that $ is a special "endmarker" symbol that is assumed not to be a symbol of any grammar.

Computing FIRST

To compute FIRST(X):

• If X is a terminal, then FIRST(X) = {X}.
• If X → ε is a production, then add ε to FIRST(X).
• If X → Y1 Y2 … Yk is a production, place a in FIRST(X) if a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1). Add ε to FIRST(X) if ε is in FIRST(Yi) for every i = 1, …, k.
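
The rules above can be applied repeatedly until no set changes. The following sketch does this for the expression grammar used in the FIRST and FOLLOW exercise in these slides; the dictionary layout and the name compute_first are assumptions made for illustration.

```python
EPS = "ε"

# Expression grammar from the exercise; an empty body stands for ε.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["-", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["/", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"], ["num"]],
}


def compute_first(grammar):
    """Iterate the FIRST rules until a fixed point is reached."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                all_nullable = True
                for symbol in body:
                    # FIRST of a terminal is the terminal itself.
                    symbol_first = first[symbol] if symbol in grammar else {symbol}
                    first[head] |= symbol_first - {EPS}
                    if EPS not in symbol_first:
                        all_nullable = False
                        break
                if all_nullable:          # every Yi can derive ε (or body is ε)
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


for nonterminal, terminals in compute_first(GRAMMAR).items():
    print(f"FIRST({nonterminal}) = {sorted(terminals)}")
```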

Computing FOLLOW

• Place $ in FOLLOW(S), where S is the start symbol and $ is the endmarker.
• If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
• If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

Example: computing FIRST

stmt → expr ;
expr → term expr1 | ε
expr1 → + term expr1 | ε
term → factor term1
term1 → * factor term1 | ε
factor → ( expr ) | number

Easy ones:
First(expr1) = {+, ε}
First(term1) = {*, ε}
First(factor) = {(, number}

Next step:
First(term) = First(factor)
First(expr) = First(term) ∪ {ε}
First(stmt) = First(expr ;)
Because of the production expr → ε, add ";" to First(stmt).

Exercise: computing FIRST and FOLLOW

Given the following grammar:

1. E → T E'
2. E' → + T E' | - T E' | ε
3. T → F T'
4. T' → * F T' | / F T' | ε
5. F → ( E ) | id | num

Compute FIRST and FOLLOW for all non-terminals.

FIRST(E) = {(, id, num}
FIRST(E') = {+, -, ε}
FIRST(T) = {(, id, num}
FIRST(T') = {*, /, ε}
FIRST(F) = {(, id, num}

FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, -, ), $}
FOLLOW(T') = {+, -, ), $}
FOLLOW(F) = {*, /, +, -, ), $}
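
The FOLLOW sets of this exercise can be checked mechanically. The sketch below takes the FIRST sets listed above as given data and applies the three FOLLOW rules until nothing changes; compute_follow and the data layout are illustrative assumptions.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["-", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["/", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"], ["num"]],
}

# FIRST sets copied from the exercise answer.
FIRST = {
    "E": {"(", "id", "num"},
    "E'": {"+", "-", EPS},
    "T": {"(", "id", "num"},
    "T'": {"*", "/", EPS},
    "F": {"(", "id", "num"},
}


def compute_follow(grammar, first, start):
    """Iterate the FOLLOW rules until a fixed point is reached."""
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add("$")                      # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:   # only nonterminals have FOLLOW
                        continue
                    before = len(follow[symbol])
                    rest_nullable = True
                    for nxt in body[i + 1:]:    # rule 2: FIRST(β) minus ε
                        nxt_first = first[nxt] if nxt in grammar else {nxt}
                        follow[symbol] |= nxt_first - {EPS}
                        if EPS not in nxt_first:
                            rest_nullable = False
                            break
                    if rest_nullable:           # rule 3: β is empty or derives ε
                        follow[symbol] |= follow[head]
                    if len(follow[symbol]) != before:
                        changed = True
    return follow


for nonterminal, terminals in compute_follow(GRAMMAR, FIRST, "E").items():
    print(f"FOLLOW({nonterminal}) = {sorted(terminals)}")
```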

Construction of a predictive parsing table

Notes:

For some grammars, however, M may have some entries that are multiply defined. For example, if G is left-recursive or ambiguous, then M will have at least one multiply defined entry.

Although left-recursion elimination and left factoring are easy to do, there are some grammars for which no amount of alteration will produce an LL(1) grammar.

The language in the following example, which abstracts the dangling-else problem, has no LL(1) grammar at all.

Cont.

The parsing table for this grammar appears below.

The entry for M[S', e] contains both S' → eS and S' → ε. The grammar is ambiguous, and the ambiguity is manifested by a choice in what production to use when an e (else) is seen. We can resolve this ambiguity by choosing S' → eS, which associates each else with the closest previous then.

LL(1) grammar: scanning the input from left to right (L), producing a leftmost derivation (L), using one input symbol of lookahead at each step to make parsing action decisions (1).

A grammar whose parsing table has no multiply defined entries is called LL(1).

The class of LL(1) grammars is rich enough to cover most programming constructs, although care is needed in writing a suitable grammar for the source language.

Properties of LL(1) Parsers

• A correct leftmost parse is guaranteed (after left-recursion elimination and left factoring have been performed).
• All grammars in the LL(1) class are unambiguous.
• All LL(1) parsers operate in linear time and, at most, linear space.

LL(1) grammar properties (cont.)

Exercise 1

Given A → α | β, which two of the following cases may cause the grammar to be NOT LL(1)?

1. a in FIRST(α) and b in FIRST(β)
2. a and b are both in FIRST(α)
3. a in both FIRST(α) and FIRST(β)
4. a in FIRST(α) and FOLLOW(A), ε in FIRST(β)

Answer: 3 and 4

Exercise 2

stmt → if expr then stmt else stmt | if expr then stmt

S → iEtS | iEtSeS | a
E → b

After left factoring, the new CFG is:

S → iEtSS' | a
S' → eS | ε
E → b

Why is this CFG not LL(1)?

Exercise 2

S → iEtSS' | a
S' → eS | ε

FIRST(S) = {i, a}
FIRST(S') = {e, ε}
FOLLOW(S) = {e, $}
FOLLOW(S') = {e, $}

Where is the conflict? S' → eS and S' → ε are both entered in M[S', e].
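
The conflict can be reproduced by building the table. The construction algorithm is not reproduced in this transcript, so the sketch uses the standard rule: for each production A → α, enter it in M[A, a] for every terminal a in FIRST(α), and, if ε is in FIRST(α), for every symbol in FOLLOW(A). The FIRST and FOLLOW sets for S and S' are the ones given above; the sets for E (FIRST(E) = {b}, FOLLOW(E) = {t}) are read off the grammar and, like the function names, are additions made for this example.

```python
EPS = "ε"

# Left-factored dangling-else grammar; an empty body stands for ε.
PRODUCTIONS = [
    ("S", ["i", "E", "t", "S", "S'"]),
    ("S", ["a"]),
    ("S'", ["e", "S"]),
    ("S'", []),
    ("E", ["b"]),
]
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for symbol in symbols:
        symbol_first = FIRST.get(symbol, {symbol})   # terminal: itself
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)                                  # every symbol derives ε
    return result


def build_table():
    """Map (nonterminal, terminal) to the list of production bodies entered."""
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of_string(body)
        lookaheads = body_first - {EPS}
        if EPS in body_first:
            lookaheads |= FOLLOW[head]
        for terminal in lookaheads:
            table.setdefault((head, terminal), []).append(body)
    return table


table = build_table()
conflicts = {entry: bodies for entry, bodies in table.items() if len(bodies) > 1}
print(conflicts)
```

A grammar is LL(1) exactly when this dictionary of conflicts is empty; here the only multiply defined entry is M[S', e].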

Exercise 3

Show that the following CFG is not LL(1):

stmt → label unlabeled_stmt
label → id : | ε
unlabeled_stmt → id := expr

id is in both First(label) and Follow(label). This means that both label → id : and label → ε will be inserted into parser table entry M[label, id].

Bottom-Up Parsing

A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top). It is convenient to describe parsing as the process of building parse trees.

Bottom-up parsing during a left-to-right scan of the input constructs a rightmost derivation in reverse.

We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol of the grammar. At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production.

A reduction is the reverse of a step in a derivation (recall that in a derivation, a nonterminal in a sentential form is replaced by the body of one of its productions). The goal of bottom-up parsing is therefore to construct a derivation in reverse.

Bottom-up parsing algorithms include shift-reduce parsers and parsers for LR grammars. LR parsers need too much work to be built by hand; tools called automatic parser generators make it easy to construct efficient LR parsers from suitable grammars.

The following derivation corresponds to the parse of id * id:

This derivation is in fact a rightmost derivation.

Handle

Informally, a "handle" is a substring that matches the body of a production, and whose reduction represents one step along the reverse of a rightmost derivation.

Shift-Reduce Parsing

Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed. As we shall see, the handle always appears at the top of the stack just before it is identified as the handle.

$ is used to mark the bottom of the stack and also the right end of the input.

While the primary operations are shift and reduce, there are actually four possible actions a shift-reduce parser can make:

1. Shift. Shift the next input symbol onto the top of the stack.
2. Reduce. The right end of the string to be reduced must be at the top of the stack. Locate the left end of the string within the stack and decide with what nonterminal to replace the string.
3. Accept. Announce successful completion of parsing.
4. Error. Discover a syntax error and call an error recovery routine.

The use of a stack in shift-reduce parsing is justified by an important fact: the handle will always eventually appear on top of the stack, never inside.

Problems of Shift-Reduce Parsing

Conflicts During Shift-Reduce Parsing

There are context-free grammars for which shift-reduce parsing cannot be used. Every shift-reduce parser for such a grammar can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict).

LR Parsing: Simple LR

The most prevalent type of bottom-up parser today is based on a concept called LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions.

The cases k = 0 and k = 1 are of practical interest, and we shall only consider LR parsers with k ≤ 1 here. When (k) is omitted, k is assumed to be 1.

LR parsing is attractive for a variety of reasons:

• LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. Non-LR context-free grammars exist, but these can generally be avoided for typical programming-language constructs.
• The LR-parsing method is the most general nonbacktracking shift-reduce parsing method known.
• An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
• The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive or LL methods. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production in a right-sentential form, with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.

The principal drawback of the LR method is that it is too much work to construct an LR parser by hand for a typical programming-language grammar.

A specialized tool, an LR parser generator, is needed. Fortunately, many such generators are available; one of the most commonly used is Yacc. Such a generator takes a context-free grammar and automatically produces a parser for that grammar.

If the grammar contains ambiguities or other constructs that are difficult to parse in a left-to-right scan of the input, then the parser generator locates these constructs and provides detailed diagnostic messages.

Items and the LR(0) Automaton

An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse.

An LR(0) item of a grammar G is a production of G with a dot at some position of the body. Example: the production A → XYZ yields the four items A → .XYZ, A → X.YZ, A → XY.Z, and A → XYZ.

One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for constructing a deterministic finite automaton that is used to make parsing decisions. Such an automaton is called an LR(0) automaton.

Augmented Grammar

If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and production S' → S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by S' → S.

Closure of Item Sets

The Function GOTO

Example: For the grammar:

then CLOSURE(I) contains the set of items I0 in Fig. 4.31. The figure represents the canonical collection of sets of LR(0) items.
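
CLOSURE and GOTO can be sketched in a few lines. The grammar for this example is not reproduced in the transcript, so the sketch assumes the augmented textbook expression grammar E' → E, E → E + T | T, T → T * F | F, F → ( E ) | id. An item is represented as a pair (production index, dot position); this encoding and the function names are illustrative choices.

```python
# Assumed augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = {symbol for _, body in GRAMMAR for symbol in body}


def closure(items):
    """Add B -> .γ for every nonterminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        production, dot = work.pop()
        body = GRAMMAR[production][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {
        (production, dot + 1)
        for production, dot in items
        if dot < len(GRAMMAR[production][1])
        and GRAMMAR[production][1][dot] == symbol
    }
    return closure(moved)


def canonical_collection():
    """All item sets reachable from CLOSURE({E' -> .E}) through GOTO."""
    start = closure({(0, 0)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:
                states.add(target)
                work.append(target)
    return states


I0 = closure({(0, 0)})
print(len(I0), len(canonical_collection()))
```

For this grammar, I0 contains one item for each of the seven productions, all with the dot at the left end, and the collection has twelve item sets.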

The sets of items of interest can be divided into two classes:

1. Kernel items: the initial item, S' → .S, and all items whose dots are not at the left end.

2. Non-kernel items: all items with their dots at the left end, except for S' → .S.

The LR-Parsing Algorithm

An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (ACTION and GOTO).

Structure of the LR Parsing Table

SLR-parsing algorithm. (simple LR)

Example: parse id * id + id using the following grammar:
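
A minimal sketch of the LR driver on this input follows. The grammar and table are not reproduced in the transcript, so the sketch assumes the textbook expression grammar (1) E → E + T, (2) E → T, (3) T → T * F, (4) T → F, (5) F → ( E ), (6) F → id and its standard SLR table. In ACTION, "sN" means shift and go to state N, "rN" means reduce by production N, and "acc" means accept; these encodings and the names below are illustrative.

```python
# Production number -> (head, length of body), for the assumed grammar.
PRODUCTIONS = {
    1: ("E", 3),   # E -> E + T
    2: ("E", 1),   # E -> T
    3: ("T", 3),   # T -> T * F
    4: ("T", 1),   # T -> F
    5: ("F", 3),   # F -> ( E )
    6: ("F", 1),   # F -> id
}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers in the order they are reduced."""
    buffer = list(tokens) + ["$"]
    stack = [0]                        # stack of states; state 0 at the bottom
    ip = 0
    reductions = []
    while True:
        state, a = stack[-1], buffer[ip]
        action = ACTION[state].get(a)
        if action is None:             # blank table entry: syntax error
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if action == "acc":
            return reductions
        number = int(action[1:])
        if action[0] == "s":           # shift: push the state, advance input
            stack.append(number)
            ip += 1
        else:                          # reduce: pop the body, push GOTO state
            head, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][head])
            reductions.append(number)


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Read backwards, the list of reductions is a rightmost derivation of the input, which is the sense in which an LR parser constructs a rightmost derivation in reverse.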

[Figure: moves of the LR parser on id * id + id, with stack contents such as 0 id 5 and 0 F 3]

See page 254 of the book

A canonical LR(1) parser solves such problems, since it provides one lookahead symbol.

Error Recovery

Selection of a synchronizing set:

• Place all symbols in FOLLOW(A) into the synchronizing set for non-terminal A.
• Add keywords that begin statements to the set.
• Add symbols in FIRST(A); the parser may then re-parse A rather than pop A.
• If a non-terminal can generate ε, then the production deriving ε can be used as a default.
• Pop and continue parsing.