Top Banner
SYNTAX ANALYSIS
45
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Syntax Analysis Part1

SYNTAX ANALYSIS

Page 2: Syntax Analysis Part1

OVERVIEW

The role of Syntax Analyzer

Context Free grammars

Derivations, parse tree and Ambiguity

Eliminating Ambiguity

Elimination of Left Recursion

Left factoring

Top-Down parsing

Page 3: Syntax Analysis Part1

THE ROLE OF THE PARSER

Main task:

reads string of tokens from lexical analyzer and verifies that

the string of token names can be generated by the grammar

of the source language.

Position of parser in compiler model:

Lexical

analyzer ParserRest of

front end

Symbol

Table

Source

program

token

Get next

token

parse

tree

intermediate

representation

Page 4: Syntax Analysis Part1

Grammars offer significant advantages to both language designers and compiler writers.

Grammars gives a precise syntactic specification of Programming Languages.

A properly designed grammar imparts a structure to a PL that is useful for translation of source programs into correct object code and for the detection of errors.

Languages evolve over a period of time, acquiring new constructs and performing additional tasks.

For certain classes of grammars, we can automatically construct an efficient parser that determines if a source program is well formed.

Page 5: Syntax Analysis Part1

Three general types of parsers for grammars

Universal parsing methods

Cocke-Younger-Kasami algorithm

Earley’s algorithm

Too inefficient to use in production compilers

Compilers either top-down or bottom-up

one symbol at a time

Most efficient top-down and bottom-up methods work only on subclasses of grammars.

LL and LR grammars are expressive enough to describe most syntactic constructs in PLs

Page 6: Syntax Analysis Part1

SYNTACTIC ERROR HANDLING

Common programming errors can occur at different levels: Lexical

Misspelling an identifier or keyword or operator

Syntactic Arithmetic expression with unbalanced parentheses

Semantic Operator applied to incompatible operand (type mismatch)

Logical An infinitely recursive call

Page 7: Syntax Analysis Part1

Much of error detection and recovery can be handled by parsing methods.

Detection of semantic and logical errors at compile time is a difficult task.

The error handler in a parser has simple-to-state goals:

Report the presence of errors clearly and accurately.

Recover from each error quickly enough to detect subsequent errors.

Add minimal overhead to the processing of correct programs.

LL and LR parsing methods, detect an error as soon as possible.

Page 8: Syntax Analysis Part1

ERROR RECOVERY STRATEGIES

Panic mode Discards input until a valid tokens if found.

Phase level local correction(replacement of characters) on the remaining input

Error productions augment the grammar with productions that generate erroneous

constructs.

Global correction algorithms for choosing a minimal sequence of changes to obtain a

global least cost correction.

Page 9: Syntax Analysis Part1

CONTEXT FREE GRAMMARS

PL constructs have an inherently recursive structure that can be defined by CFGs

A CFG consists of terminals, non-terminals, a start

symbol, and productions.

Terminals: id, (, ), *, /, +, -

Start symbol: expression.

Non-terminals: expression, term, factor

Page 10: Syntax Analysis Part1

DERIVATIONS

Derivation corresponds to the construction of parse

tree, starting from the start symbol replacing a non-

terminal using one of its production rules.

Any string of grammar symbols is a sentential form

if it is derived from the start symbols.

A sentence in a grammar is a sentential form with

no non-terminals.

Page 11: Syntax Analysis Part1

Left most derivation

the leftmost non-terminal is replaced in the sentential

form

Right most derivation:

the rightmost non-terminal is replaced in the sentential

form

Page 12: Syntax Analysis Part1

PARSE TREE

A parse tree is a graphical representation of a

derivations that rules out the order in which

productions are applied to replace non-terminals.

Yield of parse tree

Leaves of the parse tree read from left to right ,

constitute a sentential form.

Page 13: Syntax Analysis Part1
Page 14: Syntax Analysis Part1

AMBIGUITY

A grammar that produces more than one parse tree for some sentence is said to be ambiguous.

An ambiguous grammar is one that produces more than one leftmost or rightmost derivation for the same sentence.

Page 15: Syntax Analysis Part1

ELIMINATING AMBIGUITY

Consider the following grammar of expressions

consisting of plus, minus and digits

When an operand 5 has operators to left and right,

some conventions are needed for deciding which

operator applies to that operand.

Page 16: Syntax Analysis Part1

ASSOCIATIVITY OF OPERATORS

In PLs, the operators +, -, *, / are left-associative.

An operand with plus signs on both sides of it

belongs to the operator to its left.

Exponentiation and assignment are right-

associative.

Page 17: Syntax Analysis Part1
Page 18: Syntax Analysis Part1

PRECEDENCE OF OPERATORS

Rules defining relative precedence of operators are

needed when more than one kind of operator are

present.

*, / have higher precedence than +, -.

9+5*2 is interpreted as 9+(5*2) and not (9+5)*2.

Page 19: Syntax Analysis Part1

Consider the following “dangling-else” grammar

According to this grammar, the compound

statement

has the parse tree

Page 20: Syntax Analysis Part1

The grammar is ambiguous since the string

has two parse trees

Page 21: Syntax Analysis Part1

The dangling else grammar can be rewritten to an

unambiguous grammar by matching every then to

an else using the grammar shown below.

Page 22: Syntax Analysis Part1

ELIMINATION OF LEFT RECURSION

A grammar is left-recursive if it has a non-terminal

such that there is a derivation for some .

Immediate left-recursion is a production of the form

The left recursion production

can be replaced by the non-left-recursive

productions

without changing the strings derivable from A.

Page 23: Syntax Analysis Part1

In general, the immediate left-recursion for any

number of A-productions

can be eliminated as

Non-left recursive grammar for expression grammar

is

Page 24: Syntax Analysis Part1

ALGORITHM TO ELIMINATE LEFT-RECURSION

The resulting non-left-recursive grammar may have

ϵ-productions.

Page 25: Syntax Analysis Part1

Example: consider the following grammar

The algorithm yields the following non-recursive

algorithm

S Aa | bA Ac | Sd | ε

S Aa | bA bdA’ | A’

A’ cA’ | adA’ | ε

Page 26: Syntax Analysis Part1

LEFT FACTORING

Left factoring transformation is useful for producing

a grammar suitable for predictive parsing.

If are two A-productions , the left-

factored productions for A becomes

ALGORITHM:

Page 27: Syntax Analysis Part1

Example: Consider the grammar

Left factored grammar becomes

Page 28: Syntax Analysis Part1

NON-CONTEXT FREE LANGUAGE CONSTRUCTS

L1 = { wcw | w is in (a|b)* } is not context free

The non-context freedom of L1 , directly implies the non-context-freedom of programming languages like C and Java, which require declaration of identifiers before their use and which allow identifiers of arbitrary length.

L2 = { anbmcndm | n >= 1 and m >= 1 } is not context free

Here, L2 abstracts the problem of checking the number of formal parameters in the declaration of a function agrees the number of actual parameters in a use of the function.

Page 29: Syntax Analysis Part1

TOP-DOWN PARSING

Finding leftmost derivation for an input string.

The key problem is to determine the production to be

applied for a non-terminal.

Backtracking may be involved in finding the correct

production to a non-terminal.

For most of the PL constructs, back tracking can be

avoided by using suitable transformations to the

grammar.

Page 30: Syntax Analysis Part1
Page 31: Syntax Analysis Part1

RECURSIVE DESCENT PARSING

Leftmost derivation for an input string.

Construction of a parse tree from the root and creating the nodes of the parse tree in preorder.

Recursive-descent parsing may involve backtracking. Consider the grammar S cAd

A ab | a

derivation for string w=cad here, involves backtracking.

S S

c A d

a

c A d

a bc dA

S

Page 32: Syntax Analysis Part1

Recursive descent parsing consists of set of procedures,

one for each non-terminal.

Issues

May require backtracking i.e, may require repeated scan over

the input.

Left-recursive grammar can cause the parser to go into an

infinite loop.

Page 33: Syntax Analysis Part1

FIRST AND FOLLOW SETS

If α is any string of grammar symbols,

FIRST(α) is the set of terminals that begin the strings derived

from α.

For any non-terminal A

FOLLOW(A) is the set of terminals that can appear

immediately to the right of A in some sentential form.

Page 34: Syntax Analysis Part1

FIRST()

To compute FIRST(X) for all grammar symbols X, apply

the following rules until no more terminals or ϵ can be

added to FIRST(X).

Page 35: Syntax Analysis Part1

FOLLOW()

To compute FIRST(A) for all non-terminals A, apply the

following rules until nothing can be added to any

FOLLOW set.

Page 36: Syntax Analysis Part1

EXAMPLE:

In the expression grammar

The FIRST sets are

FIRST(F) = FIRST(T)=FIRST(E)={(, id}

FIRST(E’) ={+, ϵ}

FIRST(T’) = {*, ϵ}

The FOLLOW sets are

FOLLOW(E) = FOLLOW(E’) = {), $}

FOLLOW(T) = FOLLOW(T’) = {+, ), $}

FOLLOW(F) = {+, *, ), $}

Page 37: Syntax Analysis Part1

PREDICTIVE PARSING

The goal of predictive parsing is to construct top-down parser that never backtracks.

Obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking.

Given an input symbol and a non-terminal to be expanded, we must know which one of the alternatives of the productions is the unique alternative that derives a string beginning with a.

To do so, we do the following transformations to the grammar Eliminate left-recursion

Perform left-factoring

Page 38: Syntax Analysis Part1

LL(1) GRAMMARS

Class of LL(1) grammars Left to right scan, Leftmost derivation, 1 input symbol

lookahead

LL(1) grammars aid in automatic construction of predictive parser.

A grammar G is LL(1) if and only if whenever are two distinct productions in G, the following conditions hold: FIRST( ) and FIRST( ) are disjoint sets.

If ϵ is in FIRST( ), then FIRST( ) and FOLLOW(A) are disjoint sets and like wise if ϵ is in FIRST( ) .

Page 39: Syntax Analysis Part1

CONSTRUCTION OF A PREDICTIVE PARSING TABLE

Page 40: Syntax Analysis Part1

EXAMPLE: PREDICTIVE PARSING TABLE FOR

EXPRESSION GRAMMAR

Page 41: Syntax Analysis Part1

For every LL(1) grammar, each parsing-table entry

uniquely identifies a production or signals an error.

For some grammars, the table may have some

entries multiple defined.

If grammar is either left-recursive or ambiguous at least

one of the entry is multiply defined.

Although left-recursion elimination and left-factoring

are easy to do, there are some grammars for which

no amount of alteration will produce an LL(1)

grammar.

Page 42: Syntax Analysis Part1

EXAMPLE: PREDICTIVE PARSING TABLE FOR

DANGLING-ELSE PROBLEM

Predictive parsing table for the grammar

is

Page 43: Syntax Analysis Part1

TABLE-DRIVEN PREDICTIVE PARSING

Page 44: Syntax Analysis Part1

PREDICTIVE PARSING ALGORITHM

Page 45: Syntax Analysis Part1

EXAMPLE: