
MTE.1

CSE4100

Midterm Exam Advice and Hints

Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT
[email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818

Dr. Robert LaBarre
United Technologies Research Center
411 Silver Lane
E. Hartford, CT 06018
[email protected]

MTE.2

Core Material

Chapter 1: Introduction to Compilers
  Basic Compiler Ideas and Concepts
  The “Big Picture”

Chapter 2: A Simple One-Pass Compiler
  A Look at All Phases of the Compilation Process
  From Lexical Analysis Thru Code Generation

FOCUS: Chapter 3: Lexical Analysis
  Specifying/Recognizing Tokens
  Patterns (Regular Expressions) and Lexemes
  Regular Expressions and DFA/NFA
  Algorithms for
    Regular Expression to NFA
    NFA to DFA

MTE.3

Core Material

FOCUS: Chapter 4: Syntax Analysis
  Context Free Grammar: Definitions and Concepts
    Derivations, Specification, Languages
  Writing Grammars
    Ambiguity, Left Recursion, Left Factoring, Removing Epsilon Moves
    Algorithm for Left Recursion Removal
  Top-Down Parsing
    Recursive Descent and Predictive Parsing
    First and Follow Calculation
    Constructing LL(1) Parsing Table
    Ambiguity and Error Handling

Lex and Yacc will not be Tested!

MTE.4

Hints for Taking Exam

Read the Questions Carefully!
Ask Questions if you are Confused!
Answer Questions in Any Order
  Organized to fit on minimum number of pages
  Answer the questions that are “easiest” for you first!
Assess Points per Time Unit
  75 minutes = 75 points
  30 minutes = 30 points; 20 minutes = 20 points
Don't Be Afraid to Not Answer a Question
  60% Correct for 100 Points = 60 Points
  90% Correct for 80 Points = 72 Points
Partial Credit is the Norm

MTE.5

Possible Questions

Open Notes and Open Book
5 to 6 Total Multi-Part Questions
Possibilities…
  Constructive and Algorithm Questions
  Writing and Using Grammar
  Understanding Significance and Relevance of Concepts
Know your Algorithms and Constructs (Regular Expressions, NFA, DFA, CFG)
Show All Work to Receive Partial (Any) Credit
Do Not Jump to Final Answer
Avoid Run-on Explanations

MTE.6

Chapter 3 Excerpted Material
Introducing Basic Terminology

Token      Sample Lexemes          Informal Description of Pattern
const      const                   const
if         if                      if
relation   <, <=, =, <>, >, >=     < or <= or = or <> or >= or >
id         pi, count, D2           letter followed by letters and digits
num        3.1416, 0, 6.02E23      any numeric constant
literal    “core dumped”           any characters between “ and ” except “

The token classifies the pattern.
Actual values (lexemes) are critical. The info is:
  1. Stored in symbol table
  2. Returned to parser

MTE.7

Language Concepts

A language, L, is simply any set of strings over a fixed alphabet.

Alphabet                    Languages
{0,1}                       {0,10,100,1000,10000,…}
                            {0,1,00,11,000,111,…}
{a,b,c}                     {abc,aabbcc,aaabbbccc,…}
{A,…,Z}                     {TEE,FORE,BALL,…}
                            {FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…,9,         { All legal PASCAL progs }
 +,-,…,<,>,…}               { All grammatically correct English sentences }

Special Languages:
  ∅ - EMPTY LANGUAGE
  {ε} - contains the ε (empty) string only

MTE.8

Formal Language Operations

OPERATION                                DEFINITION
union of L and M, written L ∪ M          L ∪ M = {s | s is in L or s is in M}
concatenation of L and M, written LM     LM = {st | s is in L and t is in M}
Kleene closure of L, written L*          L* = ∪ (i ≥ 0) L^i
                                         L* denotes “zero or more concatenations of” L
positive closure of L, written L+        L+ = ∪ (i ≥ 1) L^i
                                         L+ denotes “one or more concatenations of” L

MTE.9

Formal Language Operations: Examples

L = {A, B, C, D }    D = {1, 2, 3}

L ∪ D = {A, B, C, D, 1, 2, 3 }
LD = {A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3 }
L^2 = { AA, AB, AC, AD, BA, BB, BC, BD, CA, … DD}
L^4 = L^2 L^2 = ??
L* = { All possible strings over L, plus ε }
L+ = L* - {ε}
L (L ∪ D) = ??
L (L ∪ D)* = ??
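The operations above can be tried out on finite languages. This is a minimal Python sketch that treats a language as a set of strings; the helper names `concat` and `power` are illustrative assumptions, not part of the slides, and since L* is infinite only finite powers are computed.

```python
from itertools import product

def concat(L, M):
    """LM = {st | s in L and t in M}."""
    return {s + t for s, t in product(L, M)}

def power(L, n):
    """L^n: n-fold concatenation of L; L^0 = {""}, the set holding only the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

union = L | D              # L ∪ D
LD = concat(L, D)          # A1, A2, ..., D3
L2 = power(L, 2)           # AA, AB, ..., DD
L4 = concat(L2, L2)        # L^4 = L^2 L^2
L_LD = concat(L, L | D)    # L (L ∪ D)
```

Printing `sorted(L_LD)` or `len(L4)` is one way to check the "??" questions on the slide by hand.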

MTE.10

Language & Regular Expressions

A Regular Expression is a Set of Rules / Techniques for Constructing Sequences of Symbols (Strings) From an Alphabet.

Let Σ Be an Alphabet, r a Regular Expression. Then L(r) is the Language That is Characterized by the Rules of r.

MTE.11

Rules for Specifying Regular Expressions:

1. ε is a regular expression denoting { ε }
2. If a is in Σ, a is a regular expression that denotes {a}
3. Let r and s be regular expressions with languages L(r) and L(s). Then
   (a) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
   (b) (r)(s) is a regular expression denoting L(r) L(s)
   (c) (r)* is a regular expression denoting (L(r))*
   (d) (r) is a regular expression denoting L(r)

Precedence: * is highest, then concatenation, then | . All are Left-Associative.

MTE.12

EXAMPLES of Regular Expressions

L = {A, B, C, D }    D = {1, 2, 3}

A | B | C | D = L
(A | B | C | D ) (A | B | C | D ) = L^2
(A | B | C | D )* = L*
(A | B | C | D ) ((A | B | C | D ) | ( 1 | 2 | 3 )) = L (L ∪ D)

MTE.13

Algebraic Properties of Regular Expressions

AXIOM                         DESCRIPTION
r | s = s | r                 | is commutative
r | (s | t) = (r | s) | t     | is associative
(r s) t = r (s t)             concatenation is associative
r ( s | t ) = r s | r t       concatenation distributes over |
( s | t ) r = s r | t r
εr = r                        ε is the identity element for concatenation
rε = r
r* = ( r | ε )*               relation between * and ε
r** = r*                      * is idempotent

MTE.14

Automata & Language Theory

Terminology
  FSA: A recognizer that takes an input string and determines whether it’s a valid string of the language.
  Non-Deterministic FSA (NFA): Has several alternative actions for the same input symbol
  Deterministic FSA (DFA): Has at most 1 action for any given input symbol
Bottom Line
  expressive power(NFA) == expressive power(DFA)
  Conversion can be automated

MTE.15

Finite Automata & Language Theory

Finite Automata : A recognizer that takes an input string & determines whether it’s a valid sentence of the language

Non-Deterministic : Has more than one alternative action for the same input symbol. Can’t be simulated directly by a simple deterministic algorithm !

Deterministic : Has at most one action for a given input symbol.

Both types are used to recognize the languages described by regular expressions.

MTE.16

NFAs & DFAs

Non-Deterministic Finite Automata (NFAs) easily represent regular expressions, but are somewhat less precise.

Deterministic Finite Automata (DFAs) require more complexity to represent regular expressions, but offer more precision.

We’ll discuss both plus conversion algorithms, i.e., NFA → DFA and DFA → NFA

MTE.17

Non-Deterministic Finite Automata

An NFA is a mathematical model that consists of :

• S, a set of states
• Σ, the symbols of the input alphabet
• move, a transition function.
  • move(state, symbol) → set of states
  • move : S × (Σ ∪ {ε}) → 2^S
• A state, s0 ∈ S, the start state
• F ⊆ S, a set of final or accepting states.

MTE.18

Example NFA

S = { 0, 1, 2, 3 }
s0 = 0
F = { 3 }
Σ = { a, b }

Transition diagram: start --> 0; 0 --a--> 0; 0 --b--> 0; 0 --a--> 1; 1 --b--> 2; 2 --b--> 3

What Language is defined ?
What is the Transition Table ?

state    input a     input b
0        { 0, 1 }    { 0 }
1        --          { 2 }
2        --          { 3 }

ε (null) moves possible: i --ε--> j
Switch state but do not use any input symbol
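The transition table above can be exercised directly. This is a minimal Python sketch that simulates the example NFA by tracking the set of states it could be in; the names `MOVE` and `nfa_accepts` are illustrative assumptions, and ε-moves are left out because this particular NFA has none.

```python
# Transition table of the example NFA: (state, symbol) -> set of next states.
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINAL = 0, {3}

def nfa_accepts(text):
    """Track every state the NFA could be in after each input symbol."""
    current = {START}
    for symbol in text:
        nxt = set()
        for state in current:
            # A missing table entry ("--") means no move on this symbol.
            nxt |= MOVE.get((state, symbol), set())
        current = nxt
    return bool(current & FINAL)
```

Trying a few strings, for example `nfa_accepts("aabb")` versus `nfa_accepts("ab")`, is a quick way to approach the "What Language is defined ?" question.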

MTE.19

Epsilon-Transitions

Given the regular expression: (a (b*c)) | (a (b | c+)?)
Find a transition diagram NFA that recognizes it.
Solution ?

MTE.20

Deterministic Finite Automata

A DFA is an NFA with the following restrictions:

• ε moves are not allowed
• For every state s ∈ S, there is one and only one path from s for every input symbol a ∈ Σ.

Since transition tables don’t have any alternative options, DFAs are easily simulated via an algorithm.

s ← s0
c ← nextchar;
while c ≠ eof do
    s ← move(s,c);
    c ← nextchar;
end;
if s is in F then return “yes”
else return “no”
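The simulation loop above translates almost line for line into Python. This is a sketch only: the transition table is an assumed DFA for (a | b)*abb, chosen for illustration, and `dfa_accepts` is a hypothetical name.

```python
# Assumed DFA for (a | b)*abb: (state, symbol) -> the single next state.
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
S0, F = 0, {3}

def dfa_accepts(text):
    s = S0                    # s <- s0
    for c in text:            # while c != eof
        s = MOVE[s, c]        # s <- move(s, c); exactly one choice per symbol
    return s in F             # "yes" if s is in F, else "no"
```

A symbol outside the alphabet raises `KeyError` here; a fuller version would route such input to an explicit error state.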

MTE.21

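Example - DFA

Recall the original NFA:
  start --> 0; 0 --a--> 0; 0 --b--> 0; 0 --a--> 1; 1 --b--> 2; 2 --b--> 3

Equivalent DFA:
  start --> 0; 0 --b--> 0; 0 --a--> 1; 1 --a--> 1; 1 --b--> 2; 2 --a--> 1; 2 --b--> 3; 3 --a--> 1; 3 --b--> 0

What Language is Accepted?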

MTE.22

Regular Expression to NFA Construction

We now focus on transforming a Reg. Expr. to an NFA

This construction allows us to take:

• Regular Expressions (which describe tokens)

• To an NFA (to characterize language)

• To a DFA (which can be computerized)

The construction process is componentwise

Builds NFA from components of the regular expression in a special order with particular techniques.

NOTE: Construction is syntax-directed translation, i.e., syntax of regular expression is determining factor for NFA construction and structure.

MTE.23

Motivation: Construct NFA For:

ε :
a :
b :
ab :
ε | ab :
a* :
( ε | ab )* :

[Figure: component NFAs, e.g. start --> i --ε--> f for ε, start --> 0 --a--> 1 for a, start --> A --b--> B for b]

MTE.24

Construction Algorithm : R.E. → NFA

Construction Process :

1st : Identify subexpressions of the regular expression
      ε
      Σ symbols
      r | s
      rs
      r*

2nd : Characterize “pieces” of NFA for each subexpression

MTE.25

Piecing Together NFAs

1. For ε in the regular expression, construct NFA
   start --> i --ε--> f        accepts L(ε)

2. For a ∈ Σ in the regular expression, construct NFA
   start --> i --a--> f        accepts L(a)

MTE.26

Piecing Together NFAs – continued (1)

3.(a) If s, t are regular expressions, N(s), N(t) their NFAs, s|t has NFA:

   start --> i; i --ε--> N(s) --ε--> f; i --ε--> N(t) --ε--> f        accepts L(s) ∪ L(t)

where i and f are new start / final states, and ε-moves are introduced from i to the old start states of N(s) and N(t) as well as from all of their final states to f.

MTE.27

3.(b) If s, t are regular expressions, N(s), N(t) their NFAs, st (concatenation) has NFA:

   start --> i [ N(s) ] [ N(t) ] f        accepts L(s) L(t)

Alternative (overlap):

   start --> i [ N(s) overlapped with N(t) ] f

where i is the start state of N(s) (or new under the alternative) and f is the final state of N(t) (or new). Overlap maps final states of N(s) to start state of N(t).

MTE.28

3.(c) If s is a regular expression, N(s) its NFA, s* (Kleene star) has NFA:

   start --> i [ N(s) ] f

where : i is new start state and f is new final state
  ε-move i to f (to accept null string)
  ε-moves i to old start, old final(s) to f
  ε-move old final to old start (WHY?)

MTE.29

Properties of Construction

Let r be a regular expression, with NFA N(r), then

1. N(r) has at most 2*(#symbols + #operators) of r states
2. N(r) has exactly one start and one accepting state
3. Each state of N(r) has at most one outgoing edge on a symbol a ∈ Σ and at most two outgoing ε’s
4. BE CAREFUL to assign unique names to all states !
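The construction rules 1 through 3(c) can be sketched in Python as follows. Several things here are assumptions made for illustration: the regular expression is supplied as nested tuples rather than parsed from text, the class name `Thompson` and helper `accepts` are hypothetical, and concatenation links the two pieces with an ε-move instead of overlapping states.

```python
from collections import defaultdict
from itertools import count

EPS = None  # edge label used for ε-moves in this sketch

class Thompson:
    def __init__(self):
        self._ids = count()              # hands out unique state names (property 4)
        self.edges = defaultdict(set)    # (state, label) -> set of next states

    def _pair(self):
        return next(self._ids), next(self._ids)

    def build(self, r):
        """Return (start, final) of N(r) for a tuple-encoded expression r."""
        op = r[0]
        if op in ("eps", "sym"):                    # rules 1 and 2
            i, f = self._pair()
            self.edges[i, EPS if op == "eps" else r[1]].add(f)
            return i, f
        if op == "cat":                             # rule 3(b), ε-link variant
            si, sf = self.build(r[1])
            ti, tf = self.build(r[2])
            self.edges[sf, EPS].add(ti)
            return si, tf
        if op == "alt":                             # rule 3(a)
            si, sf = self.build(r[1])
            ti, tf = self.build(r[2])
            i, f = self._pair()
            self.edges[i, EPS] |= {si, ti}
            self.edges[sf, EPS].add(f)
            self.edges[tf, EPS].add(f)
            return i, f
        if op == "star":                            # rule 3(c)
            si, sf = self.build(r[1])
            i, f = self._pair()
            self.edges[i, EPS] |= {si, f}           # i to old start, i to f
            self.edges[sf, EPS] |= {si, f}          # old final to old start and to f
            return i, f
        raise ValueError(f"unknown operator: {op!r}")

def accepts(edges, start, final, text):
    """Simulate the NFA, following ε-moves before and after each symbol."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for n in edges.get((stack.pop(), EPS), ()):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen
    current = closure({start})
    for c in text:
        moved = set()
        for q in current:
            moved |= edges.get((q, c), set())
        current = closure(moved)
    return final in current

def sym(a):
    return ("sym", a)

# (a | b)*abb as nested tuples
RE = ("cat", ("cat", ("cat", ("star", ("alt", sym("a"), sym("b"))), sym("a")), sym("b")), sym("b"))
nfa = Thompson()
start, final = nfa.build(RE)
```

Building each piece bottom-up mirrors the syntax-directed nature of the construction: the shape of the tuple tree decides which rule fires and in what order.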

MTE.30

Detailed Example

See example 3.16 in textbook for (a | b)*abb
2nd Example - (ab*c) | (a(b|c*))

Parse Tree for this regular expression:

r13 : r5 | r12
  r5 : r3 r4
    r3 : a
    r4 : r1 r2
      r1 : r0*
        r0 : b
      r2 : c
  r12 : r11 r10
    r11 : a
    r10 : ( r9 )
      r9 : r7 | r8
        r7 : b
        r8 : r6*
          r6 : c

What is the NFA? Let’s construct it !

MTE.31

Detailed Example – Construction (1)

r3 : a
r0 : b
r2 : c
r1 : r0*      (b*)
r4 : r1 r2    (b*c)
r5 : r3 r4    (ab*c)

[Figure: NFA fragments for r0 through r5]

MTE.32

Detailed Example – Construction (2)

r11 : a
r7 : b
r6 : c
r8 : r6*         (c*)
r9 : r7 | r8     (b | c*)
r10 : ( r9 )
r12 : r11 r10    (a(b | c*))

[Figure: NFA fragments for r6 through r12]

MTE.33

Detailed Example – Final Step

r13 : r5 | r12    ((ab*c) | (a(b | c*)))

[Figure: complete NFA with states 1 through 17 and edges labeled a, b, c, and ε]

MTE.34

Conversion : NFA → DFA Algorithm

• Algorithm Constructs a Transition Table for DFA from NFA
• Each state in DFA corresponds to a SET of states of the NFA
• Why does this occur ?
  • ε moves
  • non-determinism
  Both require us to characterize multiple situations that occur for accepting the same string.
  (Recall : Same input can have multiple paths in NFA)
• Key Issue : Reconciling AMBIGUITY !

MTE.35

Converting NFA to DFA – 1st Look

[Figure: NFA with states 0 through 8 and edges labeled a, b, c, and ε]

From State 0, Where can we move without consuming any input ?

This forms a new state: 0,1,2,6,8
What transitions are defined for this new state ?

MTE.36

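The Resulting DFA

[Figure: DFA whose states are the NFA state sets {0, 1, 2, 6, 8}, {3}, {1, 2, 4, 5, 6, 8}, and {1, 2, 5, 6, 7, 8}, with edges labeled a, b, c; the same DFA is shown a second time with its states relabeled A, B, C, D]

Which States are FINAL States ?

How do we handle alphabet symbols not defined for A, B, C, D ?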

MTE.37

Algorithm Concepts

NFA N = ( S, Σ, s0, F, MOVE )

ε-Closure(s) : s ∈ S
   : set of states in S that are reachable from s via ε-moves of N that originate from s.

ε-Closure of T : T ⊆ S
   : NFA states reachable from all t ∈ T on ε-moves only.

move(T,a) : T ⊆ S, a ∈ Σ
   : Set of states to which there is a transition on input a from some t ∈ T

No input is consumed by the two ε-Closure operations.

These 3 operations are utilized by algorithms / techniques to facilitate the conversion process.

MTE.38

Illustrating Conversion – An Example

Start with NFA: (a | b)*abb

[Figure: NFA with states 0 through 10, start state 0, and edges labeled a, b, and ε]

First we calculate: ε-closure(0) (i.e., state 0)

ε-closure(0) = {0, 1, 2, 4, 7} (all states reachable from 0 on ε-moves)
Let A = {0, 1, 2, 4, 7} be a state of new DFA, D.
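The conversion can be sketched in Python using ε-closure and move. The edge list below is an assumption: the slide's figure did not survive extraction, so the sketch uses the usual textbook NFA for (a | b)*abb with states 0 through 10, which agrees with the ε-closure(0) stated above. The function names are illustrative.

```python
EPS = ""  # label for ε-moves in this sketch

# Assumed NFA for (a | b)*abb: (state, label) -> set of next states.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
START, FINAL, ALPHABET = 0, 10, "ab"

def eps_closure(T):
    """All NFA states reachable from any t in T on ε-moves only."""
    stack, closure = list(T), set(T)
    while stack:
        for u in NFA.get((stack.pop(), EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)

def move(T, a):
    """States reachable on input a from some t in T."""
    return {u for t in T for u in NFA.get((t, a), ())}

def subset_construction():
    start = eps_closure({START})
    dtran, seen, unmarked = {}, {start}, [start]
    while unmarked:
        T = unmarked.pop()
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if not U:
                continue               # no transition: implicit error state
            dtran[T, a] = U
            if U not in seen:
                seen.add(U)
                unmarked.append(U)
    accepting = {T for T in seen if FINAL in T}
    return start, seen, dtran, accepting

A, dfa_states, dtran, accepting = subset_construction()
```

Each DFA state is a `frozenset` of NFA states, so it can serve directly as a dictionary key in the DFA transition table.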

MTE.39

Chapter 4 Excerpted Material
Context Free Grammars

Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:

T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generatable by the grammar & in the language
S: Start symbol, S ∈ NT, which defines all strings of the language
PR: Production rules to indicate how T and NT are combined to generate valid strings of the language.
    PR: NT → (T | NT)*

Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model !

MTE.40

Context Free Grammars : A First Look

assign_stmt → id := expr ;
expr → term operator term
term → id
term → real
term → integer
operator → +
operator → -

What do “BLUE” symbols represent?
What do “BLACK” symbols represent?

Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-terminal into a collection of terminals / tokens.

Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.

MTE.41

How is Grammar Used ?

Given the rules on the previous slide, suppose

id := real + int;

is input. Is it syntactically correct? How do we know?

expr is represented as: expr → term operator term
Is this accurate / complete?

expr → expr operator term
expr → term

How does this affect the derivations which are possible?

MTE.42

Grammar Concepts

A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.

EXAMPLE: E ⇒ -E (the ⇒ means “derives” in one step) using the production rule: E → -E

EXAMPLE: E ⇒ E A E ⇒ E * E ⇒ E * ( E )

DEFINITION: ⇒   derives in one step
            ⇒+  derives in one or more steps
            ⇒*  derives in zero or more steps

EXAMPLES: α A β ⇒ α γ β if A → γ is a production rule
          α1 ⇒ α2 ⇒ … ⇒ αn implies α1 ⇒* αn ; α ⇒* α for all α
          If α ⇒* β and β ⇒ γ then α ⇒* γ

MTE.43

Leftmost and Rightmost Derivations

Leftmost: Replace the leftmost non-terminal symbol

E ⇒lm E A E ⇒lm id A E ⇒lm id * E ⇒lm id * id

Rightmost: Replace the rightmost non-terminal symbol

E ⇒rm E A E ⇒rm E A id ⇒rm E * id ⇒rm id * id

Important Notes: A → δ
  If α A β ⇒lm α δ β , what’s true about α ?
  If α A β ⇒rm α δ β , what’s true about β ?

Derivations: Actions to parse input can be represented pictorially in a parse tree.

MTE.44

Examples of LM / RM Derivations

E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑

A leftmost derivation of : id + id * id

A rightmost derivation of : id + id * id

MTE.45

Derivations & Parse Tree

E ⇒ E A E
  ⇒ E * E
  ⇒ id * E
  ⇒ id * id

[Figure: the parse tree grown step by step alongside this derivation, ending with root E and children E (id), A (*), E (id)]

MTE.46

Parse Trees and Derivations

Consider the expression grammar:

E → E+E | E*E | (E) | -E | id

Leftmost derivations of id + id * id

E ⇒ E + E ⇒ id + E ⇒ id + E * E

[Figure: parse trees after each derivation step]

MTE.47

Removing Ambiguity

Take Original Grammar:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)

Revise to remove ambiguity:

stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

How does this grammar work ?

MTE.48

Resolving Difficulties : Left Recursion

A left recursive grammar has rules that support the derivation : A ⇒+ Aα, for some α.

Top-Down parsing can’t reconcile this type of grammar, since it could consistently make a choice which wouldn’t allow termination.

A ⇒ Aα ⇒ Aαα ⇒ Aααα … etc.    (using A → Aα | β)

Take left recursive grammar:

A → Aα | β

To the following:

A → βA’
A’ → αA’ | ε

MTE.49

Why is Left Recursion a Problem ?

Consider:
E → E + T | T
T → T * F | F
F → ( E ) | id

Derive : id + id + id

E ⇒ E + T ⇒ …

How can left recursion be removed ?

E → E + T | T    What does this generate?

E ⇒ E + T ⇒ T + T
E ⇒ E + T ⇒ E + T + T ⇒ T + T + T

How does this build strings ?
What does each string have to start with ?

MTE.50

Resolving Difficulties : Left Recursion (2)

Informal Discussion:

Take all productions for A and order as:

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

Where no βi begins with A.

Now apply concepts of previous slide:

A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε

For our example:
E → E + T | T
T → T * F | F
F → ( E ) | id

becomes:
E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id
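The transformation above, for immediate left recursion only, can be sketched in Python. The representation is an assumption for illustration: each right-hand side is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe. The function name is hypothetical.

```python
def remove_immediate_left_recursion(A, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    productions: list of right-hand sides, each a list of symbols ([] is ε).
    Returns a dict mapping each non-terminal to its new right-hand sides.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == A]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != A]
    if not alphas:
        return {A: productions}          # nothing to do
    A2 = A + "'"
    return {
        A: [beta + [A2] for beta in betas],               # A  -> b_i A'
        A2: [alpha + [A2] for alpha in alphas] + [[]],    # A' -> a_j A' | ε
    }

new_E = remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
new_T = remove_immediate_left_recursion("T", [["T", "*", "F"], ["F"]])
```

Here `new_E` corresponds to E → TE’ and E’ → + TE’ | ε from the slide.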

MTE.51

Resolving Difficulties : Left Recursion (3)

Problem: If left recursion is two-or-more levels deep, this isn’t enough

S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda

Algorithm:
Input: Grammar G with no cycles or ε-productions
Output: An equivalent grammar with no left recursion

1. Arrange the non-terminals in some order A1, A2, …, An
2. for i := 1 to n do begin
       for j := 1 to i – 1 do begin
           replace each production of the form Ai → Ajγ
           by the productions Ai → δ1γ | δ2γ | … | δkγ
           where Aj → δ1 | δ2 | … | δk are all current Aj productions;
       end
       eliminate the immediate left recursion among Ai productions
   end

MTE.52

Using the Algorithm

Apply the algorithm to:
A1 → A2a | b
A2 → A2c | A1d | ε

i = 1
   For A1 there is no left recursion

i = 2
   for j = 1 to 1 do
      Take productions: A2 → A1γ and replace with
         A2 → δ1γ | δ2γ | … | δkγ
      where A1 → δ1 | δ2 | … | δk are A1 productions

      in our case A2 → A1d becomes A2 → A2ad | bd

What’s left:
A1 → A2a | b
A2 → A2c | A2ad | bd | ε

Are we done ?

MTE.53

Using the Algorithm (2)

No ! We must still remove A2 left recursion !

A1 → A2a | b
A2 → A2c | A2ad | bd | ε

Recall:

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε

Apply to above case. What do you get ?

MTE.54

Removing Difficulties : ε-Moves

Very Simple: A → ε and B → uAv implies add B → uv to the grammar G.

Why does this work ?

Examples:

E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

A1 → A2a | b
A2 → bd A2’ | A2’
A2’ → c A2’ | ad A2’ | ε

MTE.55

Removing Difficulties : Cycles

How would cycles be removed ?

S → SS | ( S ) | ε

Has: S ⇒ SS ⇒ S    (using S → ε)

MTE.56

Removing Difficulties : Left Factoring

Problem : Uncertain which of 2 rules to choose:

stmt → if expr then stmt else stmt
     | if expr then stmt

When do you know which one is valid ?
What’s the general form of stmt ?

A → αβ1 | αβ2      α : if expr then stmt
                   β1 : else stmt      β2 : ε

Transform to:

A → αA’
A’ → β1 | β2

EXAMPLE:

stmt → if expr then stmt rest
rest → else stmt | ε
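A single left-factoring step can be sketched in Python as below. This is a simplified illustration under stated assumptions: it only factors a prefix shared by all alternatives of one non-terminal, right-hand sides are lists of symbols with the empty list standing for ε, and the new non-terminal is named by appending an apostrophe (the slide calls it rest). The function names are hypothetical.

```python
def common_prefix(a, b):
    """Longest list of symbols that both a and b start with."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(A, alternatives):
    """Rewrite A -> α β1 | α β2 | ... as A -> α A', A' -> β1 | β2 | ..."""
    prefix = alternatives[0]
    for alt in alternatives[1:]:
        prefix = common_prefix(prefix, alt)
    if not prefix:
        return {A: alternatives}         # nothing shared, leave unchanged
    A2 = A + "'"
    return {
        A: [prefix + [A2]],
        A2: [alt[len(prefix):] for alt in alternatives],
    }

factored = left_factor("stmt", [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
])
```

With this input the shared prefix is `if expr then stmt`, so the choice between the two original rules is deferred until the parser can see whether `else` comes next.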

MTE.57

Resolving Grammar Problems : Another Look

• Forcing Matching Within Grammar
  - if-then-else: else must go with most recent if

stmt → if expr then stmt | if expr then stmt else stmt | other

Leads to:

stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

MTE.58

Resolving Grammar Problems : Another Look (2)

• Left Factoring - Know correct choice in advance !

Rules given in Assignment:

where_queries → where are_does declarations of symbol
              | where are_does “search_option” appear

or

where_queries → where rest_where
rest_where → are declarations of symbol
           | does “search_option” appear

MTE.59

Resolving Grammar Problems : Another Look (3)

• Left Recursion - Avoid infinite loop when choosing rules

A → Aα | β    becomes    A → βA’
                         A’ → αA’ | ε
EXAMPLE:

E → E + T | T
T → T * F | F
F → ( E ) | id

becomes:

E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

If we use the original production rules on input: id + id (with leftmost):

E ⇒ E + T ⇒ E + T + T ⇒ E + T + T + T ⇒* T + T + T + … + T    (termination not assured)

If we use the transformed production rules on input: id + id (with leftmost):

E ⇒ TE’ ⇒* T + TE’ ⇒* id + TE’ ⇒* id + id E’ ⇒ id + id

MTE.60

Overall Top-Down Algorithm

Data structures
  A stack: Holds the symbols to treat
  A lookahead window to choose a prediction

Algorithm
  Startup
    Initialize stack to start symbol (a non-terminal)
    Initialize lookahead window at start of token stream
  Recursive Process
    Find out if front of window and top of stack match
    If match
      – Consume the symbols
    No match
      – Pop / Select / Push

MTE.61

Lookahead Size and Languages

With k tokens of lookahead
  Some languages can be parsed (this way)
  Some languages cannot be parsed
What about k+1 tokens ?
  More languages can be parsed....
  So the set of languages recognized with k is a subset of the set of languages recognized with k+1!
  We have a hierarchy!
Still...
  In practice LL(1) should be enough.

MTE.62

Top-Down Parsing

Identify a leftmost derivation for an input string
Why ?
  By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion consistent with scanning the input.

  A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c - accept!)

Topics:
  Recursive-descent parsing concepts
  Predictive parsing
    Recursive / Brute force technique
    Non-recursive / table driven
  Error recovery
  Implementation

MTE.63

Non-Recursive / Table Driven

[Figure: the Predictive Parsing Program reads the Input (a + b $, i.e. string + terminator), uses a Stack (X Y Z $, holding NT + T symbols of the CFG, with $ as the empty stack symbol) and the Parsing Table M[A,a] (what actions the parser should take based on stack / input), and produces Output]

General parser behavior:    X : top of stack    a : current input

1. When X = a = $ : halt, accept, success
2. When X = a ≠ $ : POP X off stack, advance input, go to 1.
3. When X is a non-terminal, examine M[X,a]
     if it is an error, call recovery routine
     if M[X,a] = {X → UVW}, POP X, PUSH W,V,U
     DO NOT expend any input

MTE.64

Algorithm for Non-Recursive Parsing

Set ip to point to the first symbol of w$;      /* ip is the input pointer */
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else    /* X is a non-terminal */
        if M[X,a] = X → Y1Y2…Yk then begin
            pop X from stack;
            push Yk, Yk-1, … , Y1 onto stack, with Y1 on top
            output the production X → Y1Y2…Yk     /* may also execute other code based on the production used */
        end
        else error()
until X = $     /* stack is empty */
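The algorithm above can be sketched in Python for the expression grammar used throughout these slides. The parsing table is written out by hand here, and the names `TABLE`, `NONTERMINALS`, and `parse` are illustrative assumptions; tokens are given as a list of strings, with `$` appended by the function.

```python
# M[X, a] for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id.
# An empty list is an ε right-hand side; a missing entry is an error.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]          # top of stack is the end of the list
    ip = 0                        # input pointer
    output = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X == "$" and a == "$":
            return output         # halt, accept
        if X not in NONTERMINALS: # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1, with Y1 on top
            output.append((X, rhs))

productions = parse(["id", "+", "id", "*", "id"])
```

The returned list of productions spells out a leftmost derivation of the input, which is the "output" column of a hand trace.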

MTE.65

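Example

Our well-worn example !
E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

Table M

Non-terminal | id      | +         | *         | (       | )      | $
E            | E→TE’   |           |           | E→TE’   |        |
E’           |         | E’→+TE’   |           |         | E’→ε   | E’→ε
T            | T→FT’   |           |           | T→FT’   |        |
T’           |         | T’→ε      | T’→*FT’   |         | T’→ε   | T’→ε
F            | F→id    |           |           | F→(E)   |        |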

MTE.66

Trace of Example

STACK        INPUT            OUTPUT
$E           id + id * id$
$E’T         id + id * id$    E → TE’
$E’T’F       id + id * id$    T → FT’
$E’T’id      id + id * id$    F → id
$E’T’        + id * id$       (expend input)
$E’          + id * id$       T’ → ε
$E’T+        + id * id$       E’ → +TE’
$E’T         id * id$         (expend input)
$E’T’F       id * id$         T → FT’
$E’T’id      id * id$         F → id
$E’T’        * id$            (expend input)
$E’T’F*      * id$            T’ → *FT’
$E’T’F       id$              (expend input)
$E’T’id      id$              F → id
$E’T’        $                (expend input)
$E’          $                T’ → ε
$            $                E’ → ε

MTE.67

Leftmost Derivation for the Example

The leftmost derivation for the example is as follows:

E ⇒ TE’ ⇒ FT’E’ ⇒ id T’E’ ⇒ id E’ ⇒ id + TE’ ⇒ id + FT’E’
  ⇒ id + id T’E’ ⇒ id + id * FT’E’ ⇒ id + id * id T’E’
  ⇒ id + id * id E’ ⇒ id + id * id

MTE.68

What’s the Missing Puzzle Piece ?

Constructing the Parsing Table M !

1st : Calculate First & Follow for Grammar
2nd: Apply Construction Algorithm for Parsing Table

Conceptual Perspective:

First: Let α be a string of grammar symbols. First(α) are the first terminals that can appear in α in any possible derivation. NOTE: If α ⇒* ε, then ε is in First(α).

Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β). NOTE: If S ⇒* αA, then $ is in Follow(A).

MTE.69

Computing First(X) : All Grammar Symbols

1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal, and X → Y1Y2…Yk is a production rule
     Place First(Y1) in First(X)
     if Y1 ⇒* ε, Place First(Y2) in First(X)
     if Y2 ⇒* ε, Place First(Y3) in First(X)
     …
     if Yk-1 ⇒* ε, Place First(Yk) in First(X)
   NOTE: As soon as some Yi does not derive ε, Stop.
   May repeat 1, 2, and 3, above for each Yj

MTE.70

Computing First(X) : All Grammar Symbols - continued

Informally, suppose we want to compute

First(X1 X2 … Xn ) = First(X1) “+”
                     First(X2) if ε is in First(X1) “+”
                     First(X3) if ε is in First(X2) “+”
                     …
                     First(Xn) if ε is in First(Xn-1)

Note 1: Only add ε to First(X1 X2 … Xn) if ε is in First(Xi) for all i
Note 2: For First(X1), if X1 → Z1 Z2 … Zm , then we need to compute First(Z1 Z2 … Zm) !
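The First rules can be sketched in Python as a fixed-point computation: keep applying them until no set grows. The grammar encoding is an assumption for illustration (a dict from non-terminal to a list of right-hand sides, with the empty list standing for ε, and any symbol that is not a key treated as a terminal); the function names are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """First(X1 X2 ... Xn), given the First sets computed so far."""
    result = set()
    for X in symbols:
        fx = first[X] if X in first else {X}   # terminal: First(X) = {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result                      # Xi cannot vanish: stop
    result.add(EPS)                            # every Xi can derive ε
    return result

def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                             # repeat until nothing is added
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
```

Iterating to a fixed point sidesteps the question of which non-terminal to process first, at the cost of a few extra passes over the grammar.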

MTE.71

Example

Computing First for:
E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

First(E) = First(TE’)
         = First(T) “+” First(E’)      Not First(E’) since T does not derive ε
         = First(T)
First(T) = First(F) “+” First(T’)      Not First(T’) since F does not derive ε
         = First(F)
First(F) = First((E)) “+” First(id) = “(“ and “id”

Overall:
First(E) = First(T) = First(F) = { ( , id }
First(E’) = { + , ε }
First(T’) = { * , ε }

MTE.72

Example 2

Given the production rules:

S → i E t SS’ | a
S’ → eS | ε
E → b

Verify that

First(S) = { i, a }
First(S’) = { e, ε }
First(E) = { b }

MTE.73

Computing Follow(A) : All Non-Terminals

1. Place $ in Follow(S), where S is the start symbol and $ signals end of input
2. If there is a production A → αBβ, then everything in First(β) is in Follow(B) except for ε.
3. If A → αB is a production, or A → αBβ and β ⇒* ε (First(β) contains ε), then everything in Follow(A) is in Follow(B)
   (Whatever followed A must follow B, since nothing follows B from the production rule)

We’ll calculate Follow for two grammars.
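The three Follow rules can be sketched in Python, again as a fixed-point computation. To keep the sketch short, the First sets of the expression grammar are written in by hand rather than computed; the grammar encoding (lists of symbols, empty list for ε) and all function names are assumptions for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First sets for this grammar, entered by hand.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}

def first_of_string(symbols):
    """First(β) for a string of symbols; First of the empty string is {ε}."""
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})            # terminal: First(X) = {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def compute_follow(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                              # rule 1
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue                        # only non-terminals
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}            # rule 2
                    if EPS in beta_first:
                        new |= follow[A]                # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
```

Inspecting `FOLLOW["T'"]` and `FOLLOW["F"]` gives a way to check hand calculations for the remaining non-terminals.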

MTE.74

Example

Compute Follow for:
E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

• Follow(E) - contains $ since E is the start symbol. Also, since F → (E) then First(“)”) is in Follow(E). Thus Follow(E) = { ) , $ }
• Follow(E’) : E → TE’ implies Follow(E) is in Follow(E’), and Follow(E’) = { ) , $ }
• Follow(T) : E → TE’ implies put in First(E’) except ε. Since E’ ⇒* ε, put in Follow(E). Since E’ → +TE’, put in First(E’) except ε, and since E’ ⇒* ε, put in Follow(E’). Thus Follow(T) = { +, ), $ }.
• Follow(T’)
• Follow(F)
  You do these !

MTE.75

Computing Follow : 2nd Example

Recall:
S → i E t SS’ | a
S’ → eS | ε
E → b

First(S) = { i, a }
First(S’) = { e, ε }
First(E) = { b }

Follow(S) – Contains $, since S is start symbol
   Since S → i E t SS’ , put in First(S’) – not ε
   Since S’ ⇒* ε, Put in Follow(S)
   Since S’ → eS, put in Follow(S’)
   So…. Follow(S) = { e, $ }

Follow(S’) = Follow(S) HOW?

Follow(E) = { t }

MTE.76

Motivation Behind First & Follow

First: Is used to indicate the relationship between non-terminals (in the stack) and input symbols (in input stream)

   Example: If A → α, and a is in First(α), then when a = input, replace A with α.
   (a is one of the first symbols of α, so when A is on the stack and a is input, POP A and PUSH α.)

Follow: Is used when First has a conflict, to resolve choices. When α → ε or α ⇒* ε, then what follows A dictates the next choice to be made.

   Example: If A → α, and b is in Follow(A), then when α ⇒* ε, and if b is an input character, then we expand A with α, which will eventually expand to ε, of which b follows!
   (Above, α ⇒* ε. Here First(α) contains ε.)

MTE.77

Constructing Parsing Table

Algorithm:

1. Repeat Steps 2 & 3 for each rule A → α
2. Terminal a in First(α)? Add A → α to M[A, a]
3.1 ε in First(α)? Add A → α to M[A, b] for all terminals b in Follow(A).
3.2 ε in First(α) and $ in Follow(A)? Add A → α to M[A, $]
4. All undefined entries are errors.
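The table construction steps can be sketched in Python for the expression grammar. The First and Follow sets are entered by hand, the grammar encoding (lists of symbols, empty list for ε) is an assumption, and `build_table` is a hypothetical name. Because $ is stored in the Follow sets, steps 3.1 and 3.2 collapse into one line.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First and Follow sets for this grammar, entered by hand.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"*", "+", ")", "$"},
}

def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})            # terminal: First(X) = {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def build_table(grammar):
    table = {}                            # missing entries are errors (step 4)
    for A, alternatives in grammar.items():
        for rhs in alternatives:          # step 1: each rule A -> α
            f = first_of_string(rhs)
            lookaheads = f - {EPS}        # step 2
            if EPS in f:
                lookaheads |= FOLLOW[A]   # steps 3.1 and 3.2
            for a in lookaheads:
                if (A, a) in table:
                    raise ValueError(f"conflict at M[{A}, {a}]: grammar is not LL(1)")
                table[A, a] = rhs
    return table

M = build_table(GRAMMAR)
```

A grammar such as the if-then-else example would trigger the conflict check, since two rules compete for the same table cell.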

MTE.78

Constructing Parsing Table - Example

E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → ( E ) | id

First(E,F,T) = { (, id }
First(E’) = { +, ε }
First(T’) = { *, ε }

Follow(E,E’) = { ), $ }
Follow(F) = { *, +, ), $ }
Follow(T,T’) = { +, ), $ }

Expression Example:

E → TE’ : First(TE’) = First(T) = { (, id }
   M[E, ( ] : E → TE’      (by rule 2)
   M[E, id ] : E → TE’     (by rule 2)

E’ → +TE’ : First(+TE’) = { + }
   M[E’, +] : E’ → +TE’    (by rule 2)

(by rule 3)
E’ → ε : ε in First(ε)                  T’ → ε : ε in First(ε)
   M[E’, )] : E’ → ε (3.1)                 M[T’, +] : T’ → ε (3.1)
   M[E’, $] : E’ → ε (3.2)                 M[T’, )] : T’ → ε (3.1)
   (Due to Follow(E’))                     M[T’, $] : T’ → ε (3.2)

MTE.79

Constructing Parsing Table – Example 2

S → i E t SS’ | a
S’ → eS | ε
E → b

First(S) = { i, a }        Follow(S) = { e, $ }
First(S’) = { e, ε }       Follow(S’) = { e, $ }
First(E) = { b }           Follow(E) = { t }

S → i E t SS’ : First(i E t SS’) = { i }
S → a : First(a) = { a }
E → b : First(b) = { b }
S’ → eS : First(eS) = { e }
S’ → ε : First(ε) = { ε }, Follow(S’) = { e, $ }

Non-terminal | a       | b       | e                  | i            | t | $
S            | S → a   |         |                    | S → iEtSS’   |   |
S’           |         |         | S’ → ε, S’ → eS    |              |   | S’ → ε
E            |         | E → b   |                    |              |   |

MTE.80

Example

Step 1: Compute First and Follow

S → E $
E → T E’
E’ → + T E’
   → ε
T → F T’
T’ → * F T’
   → ε
F → ( E )
  → Id

MTE.81

Example

Step 2: Build the parser table
Step 3: Input: Id + Id * Id $

S → E $
E → T E’
E’ → + T E’
   → ε
T → F T’
T’ → * F T’
   → ε
F → ( E )
  → Id

Parser Table

NT  | Id        | +           | *           | (           | )        | $
S   | S → E $   |             |             | S → E $     |          |
E   | E → TE’   |             |             | E → TE’     |          |
E’  |           | E’ → +TE’   |             |             | E’ → ε   | E’ → ε
T   | T → FT’   |             |             | T → FT’     |          |
T’  |           | T’ → ε      | T’ → *FT’   |             | T’ → ε   | T’ → ε
F   | F → Id    |             |             | F → ( E )   |          |

MTE.82

CSE4100

Motivation Behind First & Follow

First:

Is used to indicate the relationship between non-terminals (in the stack) and input symbols (in input stream)

Example: If A → α, and a is in First(α), then when a = input, replace A with α.

(a is one of the first symbols of α, so when A is on the stack and a is input, POP A and PUSH α.)

Follow:

Is used when First has a conflict, to resolve choices. When α → ε or α ⇒* ε, then what follows A dictates the next choice to be made.

Example: If A → α, and b is in Follow(A), then when α ⇒* ε and b is an input character, we expand A with α, which will eventually expand to ε, of which b follows!

(Above, α ⇒* ε. Here First(α) contains ε.)

MTE.88

CSE4100

LL(1) Grammars

L : Scan input from Left to Right

L : Construct a Leftmost Derivation

1 : Use “1” input symbol as lookahead in conjunction with stack to decide on the parsing action

LL(1) grammars have no multiply-defined entries in the parsing table.

Properties of LL(1) grammars:

• Grammar can’t be ambiguous or left recursive
• Grammar is LL(1) when, for every pair of alternatives A → α | β:
  1. α & β do not derive strings starting with the same terminal a
  2. At most one of α and β can derive ε.
  3. If β can derive ε, then α does not derive any string starting with a terminal in Follow(A).

Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar
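The pairwise conditions can be checked directly once the First set of every alternative and the Follow set of every non-terminal are known. The sketch below does that, including the Follow-set condition for an alternative that can derive ε (if β can derive ε, no string derived from α may start with a terminal in Follow(A)). The function name `is_ll1` and the input layout are assumptions of this example. The two test grammars are the expression grammar and the i E t S S’ grammar from the earlier slides.

```python
from itertools import combinations

EPS = "ε"

def is_ll1(alternatives, follow):
    """alternatives maps A to a list with First(body) for each body of A."""
    for head, first_sets in alternatives.items():
        for fa, fb in combinations(first_sets, 2):
            if (fa - {EPS}) & (fb - {EPS}):
                return False      # both can start with the same terminal
            if EPS in fa and EPS in fb:
                return False      # both can derive ε
            if EPS in fb and fa & follow[head]:
                return False      # α starts with something in Follow(A)
            if EPS in fa and fb & follow[head]:
                return False      # same check with the roles swapped
    return True

# Only non-terminals with more than one alternative need to be listed.
EXPRESSION = {"E'": [{"+"}, {EPS}], "T'": [{"*"}, {EPS}], "F": [{"("}, {"id"}]}
EXPRESSION_FOLLOW = {
    "E'": {")", "$"},
    "T'": {"+", ")", "$"},
    "F": {"*", "+", ")", "$"},
}

IF_ELSE = {"S": [{"i"}, {"a"}], "S'": [{"e"}, {EPS}]}
IF_ELSE_FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}}

EXPRESSION_OK = is_ll1(EXPRESSION, EXPRESSION_FOLLOW)
IF_ELSE_OK = is_ll1(IF_ELSE, IF_ELSE_FOLLOW)
```

The expression grammar passes. The second grammar fails on S’ because e is both in First(eS) and in Follow(S’), which is the same clash that shows up as a multiply-defined table entry.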

MTE.89

CSE4100

Error Recovery

[Figure: model of a predictive parser. Input buffer (a + b $), stack (X, Y, Z, $), predictive parsing program, parsing table M[A,a], and output.]

When Do Errors Occur? Recall Predictive Parser Function:

1. If X is a terminal and it doesn’t match input.

2. If M[ X, Input ] is empty – No allowable actions

Consider two recovery techniques:

A. Panic Mode

B. Phase-level Recovery

MTE.90

CSE4100

Panic Mode Recovery

Augment parsing table with actions that attempt to realign / synchronize token stream with the expected input.

Suppose: A on top of stack doesn’t mesh with current input symbol

1. Use Follow(A) to remove input tokens – sync (discard tokens until one in Follow(A) appears)

2. Use First(A) to determine when to restart parsing

3. Incorporate higher level language concepts (begin/end, while, repeat/until) into sync actions so that we don’t skip tokens unnecessarily.

Other actions:

4. When A → ε, use it to manipulate stack to postpone error detection

5. Use non-matching terminal on stack as token that is inserted into input.

MTE.91

CSE4100

Revised Parsing Table / Example

INPUT SYMBOL

Non-terminal   id          +             *             (           )         $
E              E → TE’                                 E → TE’     synch     synch
E’                         E’ → +TE’                               E’ → ε    E’ → ε
T              T → FT’     synch                       T → FT’     synch     synch
T’                         T’ → ε        T’ → *FT’                 T’ → ε    T’ → ε
F              F → id      synch         synch         F → (E)     synch     synch

synch: From Follow sets. Pop stack entry – T or NT

Blank entry: Skip input symbol

MTE.92

CSE4100

Skip & Synch

Meaning
  Skip: Discard input symbol
  Synch: Pop top of stack

Messages
  Constructed based on lookahead and non-terminal

Example
  NT = F, Lookahead = +
  Expecting a FACTOR. Got + for a Term. So a factor is missing.

MTE.93

CSE4100

Revised Parsing Table / Example(2)

STACK       INPUT           Remark
$E          ) id * + id$    error, skip )
$E          id * + id$      id is in First(E)
$E’T        id * + id$
$E’T’F      id * + id$
$E’T’id     id * + id$
$E’T’       * + id$
$E’T’F*     * + id$
$E’T’F      + id$           error, M[F,+] = synch
$E’T’       + id$           F has been popped
$E’         + id$
$E’T+       + id$
$E’T        id$
$E’T’F      id$
$E’T’id     id$
$E’T’       $
$E’         $
$           $
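The trace above can be reproduced with a small table-driven parser that treats a blank entry as "skip the input symbol" and a synch entry as "pop the stack". The code is a sketch and its names (`SYNCH`, `TABLE`, `parse_with_recovery`) are hypothetical. One assumption goes beyond the slides: when a synch entry would pop the start symbol while it is the only non-terminal left, the sketch skips the token instead, which is one way to obtain the first row of the trace (M[E, )] is synch, yet the remark there is "skip").

```python
SYNCH = "synch"

# Revised table: bodies are tuples, () is ε, SYNCH marks Follow-set entries.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse_with_recovery(tokens):
    """Return the error remarks; tokens must end with '$'."""
    stack = ["$", "E"]
    pos = 0
    remarks = []
    while stack[-1] != "$":
        top, lookahead = stack[-1], tokens[pos]
        if top not in NONTERMINALS:
            stack.pop()
            if top == lookahead:
                pos += 1                               # terminal matched
            else:
                remarks.append(f"error, missing {top}")  # pop unmatched terminal
            continue
        entry = TABLE.get((top, lookahead))
        only_start_left = len(stack) == 2              # assumption, see lead-in
        if entry is None or (entry == SYNCH and only_start_left
                             and lookahead != "$"):
            remarks.append(f"error, skip {lookahead}")
            pos += 1
        elif entry == SYNCH:
            remarks.append(f"error, M[{top},{lookahead}] = synch")
            stack.pop()
        else:
            stack.pop()
            stack.extend(reversed(entry))
    while tokens[pos] != "$":                          # leftover input
        remarks.append(f"error, skip {tokens[pos]}")
        pos += 1
    return remarks

REMARKS = parse_with_recovery(") id * + id $".split())
```

On the input ) id * + id $ the parser reports the same two errors as the trace and then finishes normally. An error-free input such as id + id $ yields an empty list.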

MTE.94

CSE4100

Phase-Level Recovery

Fill in blank entries of parsing table with error handling routines

These routines
  modify stack and / or input stream
  issue error message

Problems:
  Modifying stack has to be done with care, so as to not create possibility of derivations that aren’t in language
  Infinite loops must be avoided

Can be used in conjunction with panic mode to have more complete error handling
