CS415 Compilers Lexical Analysis and These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University
52


May 11, 2018


trinhliem

CS415 Compilers

Lexical Analysis and

These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University


cs415, spring 14, Lecture 7

First Programming Project

Instruction Scheduling Project has been posted
Due dates: code due February 25, report due March 4
- start early
- problem is not totally defined; you may run into situations where you will need time to “redesign” your solution
- the report is 40% of the grade, code is 60%
- need to follow calling conventions since we will use an automatic testing framework
- we mainly provide help for C/C++ implementations
- not a group project!


Constructing a Scanner – Review

→  The scanner is the first stage in the front end
→  Specifications can be expressed using regular expressions
→  Build tables and code from a DFA

[Figure: source code → Scanner → parts of speech & words; specifications → Scanner Generator → tables and code, used by the Scanner]


Review: Goal

•  We show how to construct a finite state automaton to recognize any RE
•  Overview:
  →  Direct construction of a nondeterministic finite automaton (NFA) to recognize a given RE
     §  Requires ε-transitions to combine regular subexpressions
  →  Construct a deterministic finite automaton (DFA) to simulate the NFA
     §  Use a set-of-states construction
  →  Minimize the number of states
     §  Hopcroft state minimization algorithm
  →  Generate the scanner code
     §  Additional specifications needed for details


Non-deterministic Finite Automata

Each RE corresponds to a deterministic finite automaton (DFA)
•  May be hard to directly construct the right DFA

What about an RE such as ( a | b )* abb ?

[Figure: automaton with transitions S0 -ε→ S1; S1 -(a | b)→ S1; S1 -a→ S2; S2 -b→ S3; S3 -b→ S4; S4 is the final state]

This is a little different
•  S0 has a transition on ε
•  S1 has two transitions on a

This is a non-deterministic finite automaton (NFA)


Non-deterministic Finite Automata

•  An NFA accepts a string x iff ∃ a path through the transition graph from s0 to a final state such that the edge labels spell x
•  Transitions on ε consume no input
•  To “run” the NFA, start in s0 and guess the right transition at each step
  →  Always guess correctly
  →  If some sequence of correct guesses accepts x then accept

Why study NFAs?
•  They are the key to automating the RE → DFA construction
•  We can paste together NFAs with ε-transitions

[Figure: two NFAs joined by an ε-transition become a single NFA]
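
A program cannot literally "guess correctly", so one common way to run an NFA is to track every state it could be in at once. The sketch below does that for the ( a | b )* abb automaton; the dictionary encoding and the names `NFA`, `eps_closure`, and `nfa_accepts` are assumptions made for illustration, not part of the slides.

```python
# Hypothetical encoding of the NFA for ( a | b )* abb (states S0..S4).
EPS = ""  # label used here for ε-transitions

NFA = {
    ("S0", EPS): {"S1"},
    ("S1", "a"): {"S1", "S2"},  # two transitions on a: the nondeterminism
    ("S1", "b"): {"S1"},
    ("S2", "b"): {"S3"},
    ("S3", "b"): {"S4"},
}
START, FINAL = "S0", {"S4"}


def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def nfa_accepts(text):
    """Follow all possible paths at once instead of guessing a single one."""
    current = eps_closure({START})
    for ch in text:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & FINAL)
```

The string is accepted if at least one of the tracked states is final after the last character, which matches the "some sequence of correct guesses" rule above.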


Relationship between NFAs and DFAs

DFA is a special case of an NFA
•  DFA has no ε transitions
•  DFA’s transition function is single-valued
•  Same rules will work

DFA can be simulated with an NFA
→  Obviously

NFA can be simulated with a DFA (less obvious)
•  Simulate sets of possible states
•  Possible exponential blowup in the state space
•  Still, one state transition per character in the input stream


Automating Scanner Construction

To convert a specification into code:
1  Write down the RE for the input language
2  Build a big NFA
3  Build the DFA that simulates the NFA
4  Systematically shrink the DFA
5  Turn it into code

Scanner generators
•  Lex and Flex work along these lines
•  Algorithms are well-known and well-understood
•  Key issue is interface to parser (define all parts of speech)
•  You could build one in a weekend!


Review: Automating Scanner Construction

RE → NFA (Thompson’s construction)
•  Build an NFA for each term
•  Combine them with ε-moves

NFA → DFA (subset construction)
•  Build the simulation

DFA → Minimal DFA
•  Hopcroft’s algorithm

DFA → RE (Not part of the scanner construction)
•  All pairs, all paths problem
•  Take the union of all paths from s0 to an accepting state

[Figure: The Cycle of Constructions: RE → NFA → DFA → minimal DFA, with DFA → RE closing the cycle]


RE → NFA using Thompson’s Construction

Key idea
•  NFA pattern for each symbol and each operator
•  Each NFA has a single start and accept state
•  Join them with ε moves in precedence order

NFA for a:      S0 -a→ S1
NFA for ab:     S0 -a→ S1 -ε→ S3 -b→ S4
NFA for a | b:  S0 -ε→ S1 -a→ S2 -ε→ S5  and  S0 -ε→ S3 -b→ S4 -ε→ S5
NFA for a*:     S0 -ε→ S1 -a→ S3 -ε→ S4,  plus S0 -ε→ S4 (skip a) and S3 -ε→ S1 (repeat a)

Ken Thompson, CACM, 1968
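
The four patterns translate almost directly into code. The following is a minimal sketch under our own assumptions: the `Fragment` class, the helper names, and the edge-list representation are illustrative choices, and the state numbers it generates will not match the labels used on the slides.

```python
import itertools

EPS = None                 # label used here for ε-moves
_ids = itertools.count()   # source of fresh state numbers


class Fragment:
    """An NFA fragment: one start state, one accept state, a list of edges."""

    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(ch):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, ch, f)])


def concat(x, y):
    # join x's accept state to y's start state with one ε-move
    return Fragment(x.start, y.accept,
                    x.edges + y.edges + [(x.accept, EPS, y.start)])


def alt(x, y):
    s, f = next(_ids), next(_ids)
    glue = [(s, EPS, x.start), (s, EPS, y.start),
            (x.accept, EPS, f), (y.accept, EPS, f)]
    return Fragment(s, f, x.edges + y.edges + glue)


def star(x):
    s, f = next(_ids), next(_ids)
    glue = [(s, EPS, x.start), (x.accept, EPS, f),
            (s, EPS, f),                  # skip x entirely
            (x.accept, EPS, x.start)]     # repeat x
    return Fragment(s, f, x.edges + glue)


# a ( b | c )*, built innermost-first, i.e. in precedence order
nfa = concat(symbol("a"), star(alt(symbol("b"), symbol("c"))))
```

For a ( b | c )* this yields ten states and nine ε-moves, the same size as the automaton the construction produces by hand.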


Example of Thompson’s Construction

Let’s try a ( b | c )*

1. a, b, & c:   S0 -a→ S1;  S0 -b→ S1;  S0 -c→ S1

2. b | c:       S0 -ε→ S1 -b→ S2 -ε→ S5  and  S0 -ε→ S3 -c→ S4 -ε→ S5

3. ( b | c )*:  S0 -ε→ S1;  S1 -ε→ S2 -b→ S3 -ε→ S6;  S1 -ε→ S4 -c→ S5 -ε→ S6;  S6 -ε→ S7;  S0 -ε→ S7;  S6 -ε→ S1


Example of Thompson’s Construction (cont’d)

4. a ( b | c )*:  S0 -a→ S1 -ε→ S2;  S2 -ε→ S3;  S3 -ε→ S4 -b→ S5 -ε→ S8;  S3 -ε→ S6 -c→ S7 -ε→ S8;  S8 -ε→ S9;  S2 -ε→ S9;  S8 -ε→ S3

Of course, a human would design something simpler ...

[Figure: S0 -a→ S1, with S1 looping on b | c]

But, we can automate production of the more complex one ...




NFA → DFA with Subset Construction

Need to build a simulation of the NFA

Two key functions
•  move(si, a) is the set of states reachable from si by a
•  ε-closure(si) is the set of states reachable from si by zero or more ε-transitions (so it includes si itself)

The algorithm:
•  Start state derived from s0 of the NFA
•  Take its ε-closure S0 = ε-closure(s0)
•  Take the image of S0, move(S0, a) for each a ∈ Σ, and take its ε-closure, add it to the state set S
•  For each state S, compute move(S, a) for each a ∈ Σ, and take its ε-closure
•  Iterate until no more states are added

Sounds more complex than it is…
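
The two key functions are short when written out. This sketch uses the a* NFA from Thompson's construction as a small test case; the `DELTA` dictionary layout and the function names are assumptions made for the example.

```python
EPS = "ε"

# Assumed encoding of the a* NFA: S0 -ε→ S1 -a→ S3 -ε→ S4, S0 -ε→ S4, S3 -ε→ S1
DELTA = {
    ("S0", EPS): {"S1", "S4"},
    ("S1", "a"): {"S3"},
    ("S3", EPS): {"S4", "S1"},
}


def move(states, symbol):
    """States reachable from any state in `states` on one `symbol` edge."""
    result = set()
    for s in states:
        result |= DELTA.get((s, symbol), set())
    return result


def eps_closure(states):
    """`states` plus everything reachable through ε-transitions alone."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


start = eps_closure({"S0"})             # {"S0", "S1", "S4"}
after_a = eps_closure(move(start, "a"))  # {"S1", "S3", "S4"}
```

Here `start` and `after_a` are the two sets of NFA states that would become DFA states for a*; both contain the accept state S4.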


NFA → DFA with Subset Construction

The algorithm:

s0 ← ε-closure(q0)
add s0 to S
while ( S is still changing )
    for each si ∈ S
        for each a ∈ Σ
            s? ← ε-closure(move(si, a))
            if ( s? ∉ S ) then
                add s? to S as sj
                T[si, a] ← sj
            else
                T[si, a] ← s?

Let’s think about why this works

The algorithm halts:
1. S contains no duplicates (test before adding)
2. 2^Q is finite
3. while loop adds to S, but does not remove from S (monotone)
⇒ the loop halts

S contains all the reachable NFA states
It tries each symbol in each si.
It builds every possible NFA configuration.
⇒ S and T form the DFA
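
A runnable sketch of the algorithm, applied to the ten-state NFA for a ( b | c )*. Two things here are our assumptions rather than the slide's: the `NFA` dictionary encoding (states named q0..q9), and the choice to walk a growing list instead of re-scanning S until it stops changing, which reaches the same fixed point. Empty target sets are skipped rather than recorded as a state.

```python
EPS = "ε"
SIGMA = ["a", "b", "c"]

# Assumed edge table for the Thompson NFA of a ( b | c )*
NFA = {
    ("q0", "a"): {"q1"},
    ("q1", EPS): {"q2"},
    ("q2", EPS): {"q3", "q9"},
    ("q3", EPS): {"q4", "q6"},
    ("q4", "b"): {"q5"},
    ("q6", "c"): {"q7"},
    ("q5", EPS): {"q8"},
    ("q7", EPS): {"q8"},
    ("q8", EPS): {"q3", "q9"},
}
NFA_START, NFA_FINAL = "q0", "q9"


def move(states, symbol):
    return set().union(*(NFA.get((s, symbol), set()) for s in states))


def eps_closure(states):
    closure, work = set(states), list(states)
    while work:
        for t in NFA.get((work.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def subset_construction():
    dfa_states = [frozenset(eps_closure({NFA_START}))]  # index i is state s_i
    table = {}                                          # table[(i, a)] = j
    i = 0
    while i < len(dfa_states):          # the list grows as states are found
        for a in SIGMA:
            target = frozenset(eps_closure(move(dfa_states[i], a)))
            if not target:
                continue                # no transition on a from this state
            if target not in dfa_states:
                dfa_states.append(target)
            table[(i, a)] = dfa_states.index(target)
        i += 1
    finals = {i for i, s in enumerate(dfa_states) if NFA_FINAL in s}
    return dfa_states, table, finals


dfa_states, table, finals = subset_construction()
```

The loop terminates for the reasons listed above: a state is only appended when it is not already present, and there are finitely many subsets of the NFA's states.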


NFA → DFA with Subset Construction

Example of a fixed-point computation
•  Monotone construction of some finite set
•  Halts when it stops adding to the set
•  Proofs of halting & correctness are similar
•  These computations arise in many contexts

Other fixed-point computations
•  Canonical construction of sets of LR(1) items
  →  Quite similar to the subset construction
•  Classic data-flow analysis
  →  Solving sets of simultaneous set equations
•  DFA minimization algorithm (coming up!)

We will see many more fixed-point computations


NFA → DFA with Subset Construction

Applying the subset construction to the NFA for a ( b | c )* :

[Figure: NFA with transitions q0 -a→ q1; q1 -ε→ q2; q2 -ε→ q3; q2 -ε→ q9; q3 -ε→ q4; q3 -ε→ q6; q4 -b→ q5; q6 -c→ q7; q5 -ε→ q8; q7 -ε→ q8; q8 -ε→ q3; q8 -ε→ q9]

                                      ε-closure(move(s,*))
DFA state   NFA states                a                         b                         c
s0          q0                        q1, q2, q3, q4, q6, q9    none                      none
s1          q1, q2, q3, q4, q6, q9    none                      q5, q8, q9, q3, q4, q6    q7, q8, q9, q3, q4, q6
s2          q5, q8, q9, q3, q4, q6    none                      s2                        s3
s3          q7, q8, q9, q3, q4, q6    none                      s2                        s3

Final states: s1, s2, s3


NFA → DFA with Subset Construction

The DFA for a ( b | c )*

•  Ends up smaller than the NFA
•  All transitions are deterministic

δ     a     b     c
s0    s1    -     -
s1    -     s2    s3
s2    -     s2    s3
s3    -     s2    s3

[Figure: DFA with s0 -a→ s1; s1 -b→ s2; s1 -c→ s3; s2 -b→ s2; s2 -c→ s3; s3 -b→ s2; s3 -c→ s3]
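
Because every transition is deterministic, the δ table can drive a recognizer directly. A minimal sketch, assuming the table is stored as a Python dictionary and that s1, s2, and s3 are the final states; `dfa_accepts` is a name chosen for this example.

```python
# δ for the DFA of a ( b | c )*; a missing entry plays the role of "-"
DELTA = {
    ("s0", "a"): "s1",
    ("s1", "b"): "s2", ("s1", "c"): "s3",
    ("s2", "b"): "s2", ("s2", "c"): "s3",
    ("s3", "b"): "s2", ("s3", "c"): "s3",
}
FINAL = {"s1", "s2", "s3"}


def dfa_accepts(text):
    state = "s0"
    for ch in text:                    # exactly one table lookup per character
        state = DELTA.get((state, ch))
        if state is None:              # no transition: reject immediately
            return False
    return state in FINAL
```

Unlike the NFA simulation, there is no set of states to maintain: the work per input character is a single lookup.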




DFA Minimization

The Big Picture
•  Discover sets of equivalent states
•  Represent each such set with just one state

Two states are equivalent if and only if:
•  ∀ a ∈ Σ, transitions on a lead to equivalent states (DFA)
•  a-transitions to distinct sets ⇒ states must be in distinct sets

A partition P of S
•  Each state s ∈ S is in exactly one set pi ∈ P
•  The algorithm iteratively partitions the DFA’s states


DFA Minimization

Details of the algorithm
•  Group states into maximal size sets, optimistically
•  Iteratively subdivide those sets, as needed
•  States that remain grouped together are equivalent

Initial partition, P0, has two sets: F & Q − F   (D = (Q, Σ, δ, q0, F))

Splitting a set (“partitioning a set by a”)
•  Assume qa & qb ∈ s, and δ(qa, a) = qx & δ(qb, a) = qy
•  If qx & qy are not in the same set, then s must be split
  →  Likewise, if qa has a transition on a and qb does not ⇒ a splits s


DFA Minimization

The algorithm

P ← { F, Q − F }
while ( P is still changing )
    T ← { }
    for each set S ∈ P
        T ← T ∪ split(S)
    P ← T

split(S):
    for each a ∈ Σ
        if a splits S into S1, S2, …
            then return { S1, S2, … }
    return { S }

Why does this work?
•  Partition P ⊆ 2^Q
•  Start off with 2 subsets of Q: F and Q − F
•  While loop takes Pi → Pi+1 by splitting 1 or more sets
•  Pi+1 is at least one step closer to the partition with |Q| sets
•  Maximum of |Q| splits

Note that
•  Partitions are never combined

This is a fixed-point algorithm!
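
The partition-refinement loop can be sketched as follows. This is the simple iterative scheme from the slide, not Hopcroft's worklist-based version; the function signatures and the dictionary form of δ are assumptions for the example. A missing transition is treated as leading to its own "no state" bucket, so a state with a transition on a is separated from one without.

```python
def split(group, partition, sigma, delta):
    """Split `group` by the first symbol that sends its states to different sets."""
    def block_of(state):
        for i, block in enumerate(partition):
            if state in block:
                return i
        return None                       # no transition on this symbol

    for a in sigma:
        buckets = {}
        for q in group:
            buckets.setdefault(block_of(delta.get((q, a))), set()).add(q)
        if len(buckets) > 1:              # a splits the group
            return list(buckets.values())
    return [group]


def minimize(states, sigma, delta, finals):
    partition = [p for p in (set(finals), set(states) - set(finals)) if p]
    changed = True
    while changed:                        # fixed point: stop when nothing splits
        changed = False
        refined = []
        for group in partition:
            pieces = split(group, partition, sigma, delta)
            changed = changed or len(pieces) > 1
            refined.extend(pieces)
        partition = refined
    return partition


# The four-state DFA for a ( b | c )* produced by the subset construction
delta = {("s0", "a"): "s1",
         ("s1", "b"): "s2", ("s1", "c"): "s3",
         ("s2", "b"): "s2", ("s2", "c"): "s3",
         ("s3", "b"): "s2", ("s3", "c"): "s3"}
blocks = minimize(["s0", "s1", "s2", "s3"], "abc", delta, {"s1", "s2", "s3"})
```

For this DFA no set is ever split, so `blocks` is just the initial partition: s1, s2, and s3 collapse into one state and s0 stays alone.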


Back to our DFA Minimization example

[Figure: DFA from the subset construction with s0 -a→ s1; s1 -b→ s2; s1 -c→ s3; s2 -b→ s2; s2 -c→ s3; s3 -b→ s2; s3 -c→ s3]

Then, apply the minimization algorithm

                                      Split on
      Current Partition               a       b       c
P0    { s1, s2, s3 } { s0 }           none    none    none

({ s1, s2, s3 } are the final states)

To produce the minimal DFA

[Figure: s0 -a→ s1, with s1 looping on b | c]

We observed that a human would design a simpler automaton than Thompson’s construction & the subset construction did.

Minimizing that DFA produces the one that a human would design!


Abbreviated Register Specification

Start with a regular expression
r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9


Abbreviated Register Specification

Thompson’s construction produces

[Figure: NFA with start state s0 and final state sf; one branch per alternative (r 0, r 1, r 2, …, r 8, r 9), each branch connected to s0 and sf through ε-transitions]


Abbreviated Register Specification

The subset construction builds

[Figure: DFA with s0 -r→ an intermediate state, which has transitions on 0, 1, 2, …, 8, 9 to separate final states sf0, sf1, sf2, …, sf8, sf9]

This is a DFA, but it has a lot of states …


Abbreviated Register Specification

The DFA minimization algorithm builds

[Figure: s0 -r→ an intermediate state -(0,1,2,3,4,5,6,7,8,9)→ sf]

This looks like what a skilled compiler writer would do!
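
As an illustration of what a hand-written recognizer for the three-state automaton might look like, here is a direct-coded sketch. The function name `is_register` is hypothetical, and a generated scanner would more likely use tables than straight-line tests.

```python
DIGITS = "0123456789"


def is_register(word):
    """Accept exactly r0 .. r9: one transition on r, then one on a digit."""
    if len(word) != 2:
        return False
    first, second = word
    return first == "r" and second in DIGITS
```

The two tests correspond to the two transitions of the minimal DFA; the ten separate final states of the unminimized DFA have no counterpart here.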


Homework and Next class

Syntax Analysis
Read EaC: 3.1 – 3.3