Lexical Analysis - in.tum.de

May 02, 2022

Transcript

Page 1: Lexical Analysis - in.tum.de

Chapter 5:

Scanner design

51 / 82

Lexical Analysis

Page 2: Lexical Analysis - in.tum.de

Scanner design

Input (simplified): a set of rules:

e1 { action1 }
e2 { action2 }
. . .
ek { actionk }

Output: a program,

... reading a maximal prefix w from the input that satisfies e1 | . . . | ek;

... determining the minimal i such that w ∈ [[ei]];

... executing actioni for w.

52 / 82
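These two rules — take the longest matching prefix, and among equally long matches the rule with the smallest index — can be sketched in Python (the rule set and token names below are illustrative, not part of the slides):

```python
import re

# Illustrative rule set: (regular expression, action); the position in
# the list is the rule's index, lower index wins on equal-length matches.
RULES = [
    (re.compile(r"if"),             lambda w: ("KEYWORD", w)),
    (re.compile(r"[a-z][a-z0-9]*"), lambda w: ("NAME", w)),
    (re.compile(r"[0-9]+"),         lambda w: ("INT", w)),
    (re.compile(r"[ \t\n]+"),       lambda w: None),  # skip whitespace
]

def scan(text):
    tokens, pos = [], 0
    while pos < len(text):
        # Find the maximal prefix w matched by e1 | ... | ek ...
        best_len, best_action = 0, None
        for regex, action in RULES:
            m = regex.match(text, pos)
            # ... and among equally long matches the minimal rule index
            # (earlier rules were tried first, so '>' keeps the first).
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_len == 0:
            raise ValueError(f"scan error at position {pos}")
        result = best_action(text[pos:pos + best_len])
        if result is not None:
            tokens.append(result)
        pos += best_len
    return tokens
```

For instance, scan("if if0 42") yields [("KEYWORD", "if"), ("NAME", "if0"), ("INT", "42")]: the keyword rule wins on "if" only by its lower index, while maximal munch makes "if0" a NAME.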


Page 4: Lexical Analysis - in.tum.de

Implementation:

Idea:

Create the DFA P(Ae) = (Q, Σ, δ, q0, F) for the expression e = (e1 | . . . | ek);
Define the sets:

F1 = {q ∈ F | q ∩ last[e1] ≠ ∅}
F2 = {q ∈ (F \ F1) | q ∩ last[e2] ≠ ∅}
. . .
Fk = {q ∈ (F \ (F1 ∪ . . . ∪ Fk−1)) | q ∩ last[ek] ≠ ∅}

For input w we find: δ∗(q0, w) ∈ Fi iff the scanner must execute actioni for w

53 / 82
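A minimal sketch of computing this partition, assuming each final state q is mapped to the set of rule indices i with q ∩ last[ei] ≠ ∅ (the encoding is hypothetical):

```python
def partition_final_states(final_states, rules_of):
    """final_states: iterable of DFA states; rules_of(q): set of rule
    indices i such that q ∩ last[e_i] is nonempty.
    Returns a dict mapping each final state q to the minimal such i,
    i.e. q ∈ F_i  iff  partition[q] == i."""
    partition = {}
    for q in final_states:
        hits = rules_of(q)
        if hits:
            # F_i excludes states already claimed by F_1 .. F_{i-1},
            # which is exactly "take the minimal matching rule index".
            partition[q] = min(hits)
    return partition
```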

Page 5: Lexical Analysis - in.tum.de

Implementation:

Idea (cont’d):
The scanner manages two pointers 〈A, B〉 and the related states 〈qA, qB〉...
Pointer A points to the last position in the input after which a state qA ∈ F was reached;
Pointer B tracks the current position.

[Figure: input buffer  s t d o u t . w r i t e l n ( " H a l l o " ) ;  with pointer A after the last accepted prefix and pointer B at the current position]

54 / 82
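The 〈A, B〉 / 〈qA, qB〉 bookkeeping, including the reset B := A on reaching the error state, can be sketched as a Python loop over a DFA transition table (the table encoding and the example automaton are assumptions for illustration):

```python
def tokenize(delta, q0, finals, text):
    """delta: dict (state, char) -> state; missing entries mean the
    error state ∅. finals: set of accepting states.
    Repeatedly emits the longest prefix after which a final state
    was seen (pointer A), then restarts from q0 (B := A)."""
    tokens, B = [], 0
    while B < len(text):
        q, A, pos = q0, None, B          # qB := q0, qA := ⊥
        while pos < len(text):
            q = delta.get((q, text[pos]))
            if q is None:                # qB = ∅: stop, fall back to A
                break
            pos += 1
            if q in finals:              # remember last accepting position
                A = pos
        if A is None:
            raise ValueError(f"no token at position {B}")
        tokens.append(text[B:A])
        B = A                            # consume input up to A, reset
    return tokens

# Hypothetical example DFA: runs of letters (state 1) or digits (state 2).
DELTA = {}
for c in "ab":
    DELTA[(0, c)] = 1
    DELTA[(1, c)] = 1
for d in "01":
    DELTA[(0, d)] = 2
    DELTA[(2, d)] = 2
```

With this table, tokenize(DELTA, 0, {1, 2}, "ab01a") splits the input into ["ab", "01", "a"].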


Page 7: Lexical Analysis - in.tum.de

Implementation:

Idea (cont’d):
If the current state is qB = ∅, we consume the input up to position A and reset:

B := A; A := ⊥;
qB := q0; qA := ⊥

[Figure: input buffer  w r i t e l n ( " H a l l o " ) ;  with pointers A and B after the reset]

55 / 82


Page 10: Lexical Analysis - in.tum.de

Extension: States

Now and then, it is handy to differentiate between particular scanner states.
In different states, we want to recognize different token classes with different precedences.
Depending on the consumed input, the scanner state can be changed.

Example: Comments

Within a comment, identifiers, constants, comments, ... are ignored

56 / 82

Page 11: Lexical Analysis - in.tum.de

Input (generalized): a set of rules:

〈state〉 {
    e1 { action1 yybegin(state1); }
    e2 { action2 yybegin(state2); }
    . . .
    ek { actionk yybegin(statek); }
}

The statement yybegin(statei); resets the current state to statei.
The start state (e.g. in flex or JFlex) is called YYINITIAL.

... for example:

〈YYINITIAL〉  "/*"  { yybegin(COMMENT); }
〈COMMENT〉 {
    "*/"    { yybegin(YYINITIAL); }
    . | \n  { }
}

57 / 82
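A minimal sketch of the yybegin mechanism for the comment example above — a hand-written Python loop, not generated scanner code:

```python
YYINITIAL, COMMENT = "YYINITIAL", "COMMENT"

def strip_comments(text):
    """Returns the input with /* ... */ comments dropped, switching
    between two scanner states the way yybegin(...) would."""
    state, out, i = YYINITIAL, [], 0
    while i < len(text):
        if state == YYINITIAL:
            if text.startswith("/*", i):
                state = COMMENT          # yybegin(COMMENT);
                i += 2
            else:
                out.append(text[i])
                i += 1
        else:  # state == COMMENT
            if text.startswith("*/", i):
                state = YYINITIAL        # yybegin(YYINITIAL);
                i += 2
            else:
                i += 1                   # . | \n  { }  -- ignore
    return "".join(out)
```

For example, strip_comments("a /* x */ b") returns "a  b": everything inside the comment, including the delimiters, is consumed without producing output.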

Page 12: Lexical Analysis - in.tum.de

Remarks:

“.” matches all characters different from “\n”.
For every state, we generate a separate scanner.
The method yybegin(STATE); switches between these scanners.
Comments might be implemented directly as an (admittedly overly complex) token class.
Scanner states are especially handy for implementing preprocessors that expand special fragments in regular programs.

58 / 82

Page 13: Lexical Analysis - in.tum.de

Topic:

Syntactic Analysis

59 / 82

Page 14: Lexical Analysis - in.tum.de

Syntactic Analysis

[Diagram: Token stream → Parser → Syntax tree]

Syntactic analysis tries to integrate tokens into larger program units.

Such units may possibly be:

→ Expressions;

→ Statements;

→ Conditional branches;

→ loops; ...

60 / 82


Page 16: Lexical Analysis - in.tum.de

Discussion:

In general, parsers are not developed by hand, but generated from a specification:

[Diagram: Specification → Generator → Parser]

Specification of the hierarchical structure: context-free grammars
Generated implementation: pushdown automata + X

61 / 82


Page 18: Lexical Analysis - in.tum.de

Chapter 1:

Basics of Context-free Grammars

62 / 82

Syntactic Analysis

Page 19: Lexical Analysis - in.tum.de

Basics: Context-free Grammars

Programs of programming languages can have arbitrary numbers of tokens, but only finitely many token classes.
This is why we choose the set of token classes to be the finite alphabet of terminals T.
The nested structure of program components can be described elegantly via context-free grammars...

Definition: Context-Free Grammar
A context-free grammar (CFG) is a 4-tuple G = (N, T, P, S) with:

N the set of nonterminals,

T the set of terminals,

P the set of productions or rules, and
S ∈ N the start symbol

63 / 82


[Photos: Noam Chomsky, John Backus]

Page 21: Lexical Analysis - in.tum.de

Conventions

The rules of context-free grammars take the following form:

A → α  with A ∈ N, α ∈ (N ∪ T)∗

... for example:
S → a S b
S → ε

Specified language: {aⁿbⁿ | n ≥ 0}

Conventions:
In examples, we generally specify nonterminals and terminals implicitly:

nonterminals are: A, B, C, ..., 〈exp〉, 〈stmt〉, ...;
terminals are: a, b, c, ..., int, name, ...;

64 / 82
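The example grammar can be exercised directly; this sketch (with a hypothetical rule encoding) always rewrites the leftmost nonterminal and produces aⁿbⁿ:

```python
# Rules of the example grammar  S -> a S b  |  ε , as tuples of symbols.
GRAMMAR = {"S": [("a", "S", "b"), ()]}

def derive(n):
    """Apply S -> aSb n times, then S -> ε; yields the word a^n b^n."""
    form = ["S"]
    for _ in range(n):
        i = form.index("S")                  # leftmost nonterminal
        form[i:i + 1] = list(GRAMMAR["S"][0])  # rewrite with rule 0
    i = form.index("S")
    form[i:i + 1] = list(GRAMMAR["S"][1])    # rule 1: S -> ε
    return "".join(form)
```

derive(3) runs the derivation S → aSb → aaSbb → aaaSbbb → aaabbb.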


Page 24: Lexical Analysis - in.tum.de

... a practical example:

S       → 〈stmt〉
〈stmt〉  → 〈if〉 | 〈while〉 | 〈rexp〉;
〈if〉    → if ( 〈rexp〉 ) 〈stmt〉 else 〈stmt〉
〈while〉 → while ( 〈rexp〉 ) 〈stmt〉
〈rexp〉  → int | 〈lexp〉 | 〈lexp〉 = 〈rexp〉 | ...
〈lexp〉  → name | ...

More conventions:
For every nonterminal, we collect the right-hand sides of its rules and list them together.
The j-th rule for A can be identified via the pair (A, j) (with j ≥ 0).

65 / 82


Page 26: Lexical Analysis - in.tum.de

Pair of grammars:

E → E+E (0) | E∗E (1) | ( E ) (2) | name (3) | int (4)

E → E+T (0) | T (1)
T → T∗F (0) | F (1)
F → ( E ) (0) | name (1) | int (2)

Both grammars describe the same language.

66 / 82


Page 28: Lexical Analysis - in.tum.de

Derivation

Grammars are term rewriting systems. The rules offer feasible rewriting steps. A sequence of such rewriting steps α0 → . . . → αm is called a derivation.

... for example:

E → E + T
  → T + T
  → T ∗ F + T
  → T ∗ int + T
  → F ∗ int + T
  → name ∗ int + T
  → name ∗ int + F
  → name ∗ int + int

Definition
The derivation relation → is a relation on words over N ∪ T, with

α → α′  iff  α = α1 A α2 ∧ α′ = α1 β α2  for an A → β ∈ P

The reflexive and transitive closure of → is denoted as: →∗

67 / 82
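A single derivation step replaces one occurrence of a nonterminal A by a right-hand side β; the slide's derivation can be replayed mechanically (sentential forms as token lists — an assumed encoding):

```python
EXPR_RULES = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["name"], ["int"]],
}

def step(form, pos, lhs, rule_index):
    """One derivation step: rewrite the nonterminal lhs at position pos
    of the sentential form with its rule_index-th right-hand side."""
    assert form[pos] == lhs
    return form[:pos] + EXPR_RULES[lhs][rule_index] + form[pos + 1:]

# Replay of the slides' derivation E ->* name * int + int:
form = ["E"]
for pos, lhs, i in [(0, "E", 0), (0, "E", 1), (0, "T", 0), (2, "F", 2),
                    (0, "T", 1), (0, "F", 1), (4, "T", 1), (4, "F", 2)]:
    form = step(form, pos, lhs, i)
```

After the loop, form equals ["name", "*", "int", "+", "int"], i.e. E →∗ name ∗ int + int.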


Page 39: Lexical Analysis - in.tum.de

Derivation

Remarks:
The relation → depends on the grammar.
In each step of a derivation, we may choose:

∗ a spot, determining where we will rewrite;

∗ a rule, determining how we will rewrite.

The language specified by G is:

L(G) = {w ∈ T∗ | S →∗ w}

Attention:
The order in which disjoint fragments are rewritten is not relevant.

68 / 82


Page 41: Lexical Analysis - in.tum.de

Derivation Tree

Derivations of a symbol are represented as derivation trees:

... for example (the index on each arrow names the rule applied):

E →0 E + T
  →1 T + T
  →0 T ∗ F + T
  →2 T ∗ int + T
  →1 F ∗ int + T
  →1 name ∗ int + T
  →1 name ∗ int + F
  →2 name ∗ int + int

A derivation tree for A ∈ N:
inner nodes: rule applications
root: rule application for A
leaves: terminals or ε
The successors of (B, i) correspond to the right-hand side of the rule

[Derivation tree for name ∗ int + int:
(E, 0)
├── (E, 1) ── (T, 0) ─┬─ (T, 1) ── (F, 1) ── name
│                     ├─ ∗
│                     └─ (F, 2) ── int
├── +
└── (T, 1) ── (F, 2) ── int]

69 / 82

Page 42: Lexical Analysis - in.tum.de

Special Derivations

Attention:
In contrast to arbitrary derivations, we find special ones, always rewriting the leftmost (or rather rightmost) occurrence of a nonterminal.

These are called leftmost (or rather rightmost) derivations and are denoted with the index L (or R, respectively).
Leftmost (or rightmost) derivations correspond to a left-to-right (or right-to-left) preorder DFS traversal of the derivation tree.
Reverse rightmost derivations correspond to a left-to-right postorder DFS traversal of the derivation tree.

70 / 82

Page 43: Lexical Analysis - in.tum.de

Special Derivations

... for example, for the derivation tree of name ∗ int + int:

Leftmost derivation: (E, 0) (E, 1) (T, 0) (T, 1) (F, 1) (F, 2) (T, 1) (F, 2)

Rightmost derivation: (E, 0) (T, 1) (F, 2) (E, 1) (T, 0) (F, 2) (T, 1) (F, 1)

Reverse rightmost derivation: (F, 1) (T, 1) (F, 2) (T, 0) (E, 1) (F, 2) (T, 1) (E, 0)

71 / 82


Page 47: Lexical Analysis - in.tum.de

Unique Grammars

The concatenation of the leaves of a derivation tree t is often called yield(t).

... for example, the derivation tree for name ∗ int + int above gives rise to the concatenation: name ∗ int + int.

72 / 82
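yield(t) is just the left-to-right concatenation of the leaves; with derivation trees encoded as nested (rule, children) tuples and terminals as strings (an assumed encoding):

```python
def tree_yield(t):
    """Concatenate the leaves of a derivation tree left to right."""
    if isinstance(t, str):        # leaf: a terminal (ε modeled as "")
        return t
    _rule, children = t
    return " ".join(filter(None, (tree_yield(c) for c in children)))

# The slides' example tree for  name * int + int :
tree = (("E", 0), [
    (("E", 1), [(("T", 0), [
        (("T", 1), [(("F", 1), ["name"])]),
        "*",
        (("F", 2), ["int"])])]),
    "+",
    (("T", 1), [(("F", 2), ["int"])]),
])
```

tree_yield(tree) reproduces the word "name * int + int".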

Page 48: Lexical Analysis - in.tum.de

Unique grammars

Definition:
Grammar G is called unique (i.e. unambiguous), if for every w ∈ T∗ there is at most one derivation tree t of S with yield(t) = w.

... in our example:

E → E+E (0) | E∗E (1) | ( E ) (2) | name (3) | int (4)

E → E+T (0) | T (1)
T → T∗F (0) | F (1)
F → ( E ) (0) | name (1) | int (2)

The first one is ambiguous, the second one is unique.

73 / 82
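The ambiguity of the first grammar can be observed by counting derivation trees. A brute-force sketch for E → E+E | E∗E | ( E ) | name | int (exponential in general, fine for tiny inputs):

```python
from functools import lru_cache

def count_trees(tokens):
    """Number of derivation trees of E whose yield is `tokens`."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                      # trees deriving toks[i:j]
        n = 0
        if j - i == 1 and toks[i] in ("name", "int"):
            n += 1                        # E -> name | int
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += count(i + 1, j - 1)      # E -> ( E )
        for k in range(i + 1, j - 1):     # E -> E + E  |  E * E
            if toks[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))
```

count_trees(["name", "+", "name", "+", "name"]) is 2 — the left- and right-associated trees — whereas a unique grammar admits at most one tree per word.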

Page 49: Lexical Analysis - in.tum.de

Conclusion:

A derivation tree represents a possible hierarchical structure of a word.
For programming languages, only grammars with a unique structure are of interest.
Derivation trees are in one-to-one correspondence with leftmost derivations as well as (reverse) rightmost derivations.

Leftmost derivations correspond to a top-down reconstruction of the syntax tree.
Reverse rightmost derivations correspond to a bottom-up reconstruction of the syntax tree.

74 / 82


Page 51: Lexical Analysis - in.tum.de

Chapter 2:

Basics of Pushdown Automata

75 / 82

Syntactic Analysis

Page 52: Lexical Analysis - in.tum.de

Basics of Pushdown Automata

Languages specified by context-free grammars are accepted by pushdown automata:

The pushdown is used e.g. to verify correct nesting of braces.

76 / 82

Page 53: Lexical Analysis - in.tum.de

Example:

States: 0, 1, 2
Start state: 0
Final states: 0, 2

Transitions:
0    a    1 1
1    a    1 1
1 1  b    2
1 2  b    2

Conventions:
We do not differentiate between pushdown symbols and states.
The rightmost / upper pushdown symbol represents the state.
Every transition consumes / modifies the upper part of the pushdown.

77 / 82


Page 55: Lexical Analysis - in.tum.de

Definition: Pushdown Automaton
A pushdown automaton (PDA) is a tuple M = (Q, T, δ, q0, F) with:

Q a finite set of states;
T an input alphabet;
q0 ∈ Q the start state;
F ⊆ Q the set of final states; and
δ ⊆ Q⁺ × (T ∪ {ε}) × Q∗ a finite set of transitions.

We define computations of pushdown automata with the help of transitions; a particular computation state (the current configuration) is a pair:

(γ, w) ∈ Q∗ × T∗

consisting of the pushdown content and the remaining input.

78 / 82

[Photos: Friedrich Bauer, Klaus Samelson]


Page 57: Lexical Analysis - in.tum.de

... for example:

States: 0, 1, 2
Start state: 0
Final states: 0, 2

Transitions:
0    a    1 1
1    a    1 1
1 1  b    2
1 2  b    2

(0, a a a b b b)  ⊢  (1 1, a a b b b)
                  ⊢  (1 1 1, a b b b)
                  ⊢  (1 1 1 1, b b b)
                  ⊢  (1 1 2, b b)
                  ⊢  (1 2, b)
                  ⊢  (2, ε)

79 / 82


Page 65: Lexical Analysis - in.tum.de

A computation step is characterized by the relation ⊢ ⊆ (Q∗ × T∗)² with

(α γ, x w) ⊢ (α γ′, w)  for  (γ, x, γ′) ∈ δ

Remarks:

The relation ⊢ depends on the pushdown automaton M.
The reflexive and transitive closure of ⊢ is denoted by ⊢∗.

Then, the language accepted by M is

L(M) = {w ∈ T∗ | ∃ f ∈ F : (q0, w) ⊢∗ (f, ε)}

We accept with a final state together with empty input.

80 / 82
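The step relation ⊢ and the acceptance condition can be simulated directly. A sketch for the aⁿbⁿ automaton of the example, searching over configurations (γ, w) breadth-first (stacks encoded as strings — an illustrative choice):

```python
from collections import deque

def accepts(delta, q0, finals, word):
    """delta: list of (gamma, x, gamma2): replace stack suffix gamma by
    gamma2 while reading x (a terminal, or "" for epsilon).
    Configurations are pairs (stack, remaining input) as on the slides."""
    seen, todo = set(), deque([(q0, word)])
    while todo:
        stack, rest = todo.popleft()
        # Bound the stack to keep the search finite for this sketch.
        if (stack, rest) in seen or len(stack) > len(word) + 2:
            continue
        seen.add((stack, rest))
        if rest == "" and len(stack) == 1 and stack in finals:
            return True                   # (q0, w) |-* (f, eps)
        for gamma, x, gamma2 in delta:
            if stack.endswith(gamma) and rest.startswith(x):
                todo.append((stack[:len(stack) - len(gamma)] + gamma2,
                             rest[len(x):]))
    return False

# The a^n b^n automaton of the example:
DELTA = [("0", "a", "11"), ("1", "a", "11"),
         ("11", "b", "2"), ("12", "b", "2")]
```

accepts(DELTA, "0", {"0", "2"}, "aaabbb") replays exactly the computation shown above and succeeds; unbalanced words are rejected.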


Page 68: Lexical Analysis - in.tum.de

Definition: Deterministic Pushdown Automaton
The pushdown automaton M is deterministic, if every configuration has at most one successor configuration.

This is exactly the case if for distinct transitions (γ1, x, γ2), (γ′1, x′, γ′2) ∈ δ we can assume:
If γ1 is a suffix of γ′1, then x ≠ x′ ∧ x ≠ ε ≠ x′ holds.

... for example:

0    a    1 1
1    a    1 1
1 1  b    2
1 2  b    2

... this obviously holds.

81 / 82
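The determinism condition can be checked mechanically over all pairs of distinct transitions (same illustrative encoding as before: stack suffix γ, input symbol x with "" for ε, replacement γ′):

```python
def is_deterministic(delta):
    """Checks: for distinct transitions, if one stack pattern gamma1 is
    a suffix of the other, the inputs must differ and neither be ε."""
    for a, (g1, x1, _) in enumerate(delta):
        for b, (g2, x2, _) in enumerate(delta):
            if a != b and g2.endswith(g1):
                if x1 == x2 or x1 == "" or x2 == "":
                    return False
    return True
```

For the example transitions the check succeeds (the only suffix pair is "1" / "11", read on distinct symbols a and b); adding a second a-transition for stack suffix "1" would break it.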


Page 70: Lexical Analysis - in.tum.de

Pushdown Automata

Theorem:
For each context-free grammar G = (N, T, P, S) a pushdown automaton M with L(G) = L(M) can be built.

The theorem is so important for us that we take a look at two constructions for automata, motivated by the two special derivations:

MLG to build leftmost derivations

MRG to build reverse rightmost derivations

82 / 82

[Photos: M. Schützenberger, A. Öttinger]