Lecture # 3 Chapter #3: Lexical Analysis

Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.

Dec 31, 2015


Alban McCoy
Transcript

Lecture # 3

Chapter #3: Lexical Analysis


Role of Lexical Analyzer

• It is the first phase of a compiler

• Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis

• Reasons to make it a separate phase are:
– Simplifies the design of the compiler
– Provides efficient implementation
– Improves portability


Interaction of the Lexical Analyzer with the Parser (Fig 3.1)

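[Figure 3.1: The lexical analyzer reads the source program. The parser sends "get next token" requests to the lexical analyzer, which answers with a (token, tokenval) pair. Both the lexical analyzer and the parser use the symbol table and report errors.]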


Tokens, Patterns, and Lexemes

• A token is a classification of lexical units
– For example: id and num

• Lexemes are the specific character strings that make up a token
– For example: abc and 123

• Patterns are rules describing the set of lexemes belonging to a token
– For example: “letter followed by letters and digits” and “non-empty sequence of digits”


Difference between Token, Lexeme and Pattern

Token      Lexemes                Pattern
if         if                     if
relation   <, <=, =, <>, >, >=    < or <= or = or <> or > or >=
id         y, x                   Letter followed by letters and digits
num        31, 28                 Any numeric constant
operator   +, *, -, /             Any arithmetic operator: + or * or - or /


Attributes of Tokens

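[Figure: the lexical analyzer turns the input below into a stream of pairs for the parser. In each pair the first component is the token and the second is tokenval, the token attribute.]

Input: y := 31 + 28*x

Output: <id, “y”> <assign, :=> <num, 31> <operator, +> <num, 28> <operator, *> <id, “x”>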
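The token/attribute pairs on this slide can be reproduced with a small scanner. The following is only a sketch: the pattern table `TOKEN_SPEC`, the helper name `tokenize`, and the choice to store numbers as integers are assumptions made for illustration, not part of the lecture.

```python
import re

# Hypothetical pattern table; the token names mirror the slide
# (id, num, assign, operator). "skip" is an assumed helper for whitespace.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("operator", r"[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, tokenval) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = match.lastgroup          # name of the alternative that matched
        lexeme = match.group()
        if kind == "num":
            tokens.append((kind, int(lexeme)))
        elif kind != "skip":            # whitespace produces no token
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


print(tokenize("y := 31 + 28*x"))
# [('id', 'y'), ('assign', ':='), ('num', 31), ('operator', '+'),
#  ('num', 28), ('operator', '*'), ('id', 'x')]
```

A real lexical analyzer would hand these pairs to the parser one at a time, on each "get next token" request, rather than building the whole list first.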


Section # 3.3: Specification of Tokens

• Alphabet: a finite, nonempty set of symbols, denoted Σ

• Strings: a finite sequence of symbols from an alphabet, e.g. 0011001

• Empty String: the string with zero occurrences of symbols from the alphabet. The empty string is denoted by ε


Continue…

• Length of String: the number of positions for symbols in the string. |w| denotes the length of string w. Example: |0110| = 4; |ε| = 0

• Powers of an Alphabet: Σ^k = the set of strings of length k with symbols from Σ
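As an illustration of string length and powers of an alphabet, the sketch below enumerates the strings of length k over a small alphabet. The binary alphabet `sigma` and the helper name `power` are assumptions chosen for this example.

```python
from itertools import product


def power(alphabet, k):
    """Return the set of all strings of length k over the alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=k)}


sigma = {"0", "1"}                 # assumed example alphabet
print(sorted(power(sigma, 2)))     # ['00', '01', '10', '11']
print(power(sigma, 0))             # {''}: only the empty string has length 0
print(len("0110"), len(""))        # 4 0
```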


Continue..

• The set of all strings over Σ is denoted Σ*


Continue..

• Language: a specific set of strings over some fixed alphabet Σ

Examples:
– The set of legal English words
– The set of strings consisting of n 0's followed by n 1's
– LP = the set of binary numbers whose value is prime


Concatenation and Exponentiation

• The concatenation of two strings x and y is denoted by xy

• The exponentiation of a string s is defined by

s^0 = ε
s^i = s^(i-1) s for i > 0

Note that sε = εs = s


Language Operations

• Union: L ∪ M = {s | s ∈ L or s ∈ M}

• Concatenation: LM = {xy | x ∈ L and y ∈ M}

• Exponentiation: L^0 = {ε}; L^i = L^(i-1) L

• Kleene closure: L* = ∪ i=0,…,∞ L^i

• Positive closure: L+ = ∪ i=1,…,∞ L^i
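The language operations can be tried out on small finite languages represented as Python sets of strings. This is a sketch under one stated assumption: the closures are infinite in general, so `kleene` and `positive` take a hypothetical bound `max_i` and return only the union of the powers up to that bound.

```python
def union(L, M):
    """Strings that are in L or in M."""
    return L | M


def concat(L, M):
    """All strings xy with x in L and y in M."""
    return {x + y for x in L for y in M}


def power(L, i):
    """L^0 = {empty string}; L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene(L, max_i):
    """Finite approximation of L*: union of L^0 .. L^max_i."""
    result = set()
    for i in range(max_i + 1):
        result |= power(L, i)
    return result


def positive(L, max_i):
    """Finite approximation of L+: union of L^1 .. L^max_i."""
    result = set()
    for i in range(1, max_i + 1):
        result |= power(L, i)
    return result


print(sorted(concat({"a", "b"}, {"0", "1"})))   # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene({"ab"}, 2)))                # ['', 'ab', 'abab']
print(sorted(positive({"ab"}, 2)))              # ['ab', 'abab']
```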


Regular Expressions

• Basis symbols:
– ε is a regular expression denoting language {ε}
– a ∈ Σ is a regular expression denoting {a}

• If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
– r|s is a regular expression denoting L(r) ∪ M(s)
– rs is a regular expression denoting L(r)M(s)
– r* is a regular expression denoting L(r)*
– (r) is a regular expression denoting L(r)

• A language defined by a regular expression is called a regular set or a regular language


Regular Definitions

• Regular definitions introduce a naming convention:

d1 → r1
d2 → r2
…
dn → rn

where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}


• Example:

letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*

• The following shorthands are often used:

r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z

• Examples:

digit → [0-9]
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
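The regular definitions for id and num translate almost directly into Python's `re` syntax, with each named definition built from the earlier ones. The constant names and the helpers `is_id` and `is_num` below are assumptions for illustration; note that the literal dot has to be escaped in `re`.

```python
import re

# Each definition may refer only to names defined before it.
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"
ID = rf"{LETTER}({LETTER}|{DIGIT})*"
NUM = rf"{DIGIT}+(\.{DIGIT}+)?(E[+-]?{DIGIT}+)?"


def is_id(lexeme):
    """True if the whole lexeme matches the id definition."""
    return re.fullmatch(ID, lexeme) is not None


def is_num(lexeme):
    """True if the whole lexeme matches the num definition."""
    return re.fullmatch(NUM, lexeme) is not None


print(is_id("x1"), is_id("1x"))            # True False
print(is_num("31"), is_num("6.02E+23"))    # True True
print(is_num("1."), is_num("E5"))          # False False
```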


Regular Definitions and Grammars

Grammar

stmt → if expr then stmt
     | if expr then stmt else stmt
expr → term relop term
     | term
term → id
     | num

Regular definitions

if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?


Coding Regular Definitions in Transition Diagrams

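[Figure: transition diagram for relop with states 0 to 8 and start state 0. Edges are labeled <, =, > and other; the accepting states return (relop, LE), (relop, NE), (relop, LT), (relop, EQ), (relop, GE) and (relop, GT). The two states entered on "other" are marked *.]

[Figure: transition diagram for id with states 9 to 11: start state 9, an edge labeled letter to state 10, a loop labeled "letter or digit" on state 10, and an edge labeled other to state 11, which is marked * and returns (gettoken(), install_id()).]

relop → < | <= | <> | > | >= | =

id → letter ( letter | digit )*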
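A transition diagram such as the one for relop can be coded by hand as nested decisions on the next input character. In the sketch below, the function name `relop`, its return convention, and the state numbers in the comments are assumptions. The numbering follows the usual textbook layout of this diagram, in which the starred states give back the lookahead character.

```python
def relop(text, pos=0):
    """Scan a relational operator starting at text[pos].

    Returns ((token, attribute), next_pos), or None if no relop starts here.
    """
    def peek(i):
        return text[i] if i < len(text) else ""   # "" stands for end of input

    c = peek(pos)
    if c == "<":                                  # state 0 -> 1
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("relop", "LE"), pos + 2       # state 2
        if nxt == ">":
            return ("relop", "NE"), pos + 2       # state 3
        return ("relop", "LT"), pos + 1           # state 4*: lookahead not consumed
    if c == "=":
        return ("relop", "EQ"), pos + 1           # state 5
    if c == ">":                                  # state 0 -> 6
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2       # state 7
        return ("relop", "GT"), pos + 1           # state 8*: lookahead not consumed
    return None                                   # no edge out of the start state


print(relop("<= 5"))   # (('relop', 'LE'), 2)
print(relop("<5"))     # (('relop', 'LT'), 1)
print(relop("x"))      # None
```

Returning `pos + 1` in the starred cases plays the role of retracting the input pointer: the character that triggered the "other" edge is left for the next token.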


Section 3.6 : Finite Automata

• Finite Automata are used as a model for:
– Software for designing digital circuits
– Lexical analyzer of a compiler
– Searching for keywords in a file or on the web
– Software for verifying finite state systems, such as communication protocols


Design of a Lexical Analyzer Generator

• Translate regular expressions to NFA

• Translate NFA to an efficient DFA

[Figure: regular expressions → NFA → DFA. The NFA can be simulated to recognize tokens, or the DFA can be simulated to recognize tokens; one step in the figure is marked "Optional".]


Nondeterministic Finite Automata

• An NFA is a 5-tuple (S, Σ, δ, s0, F) where

S is a finite set of states
Σ is a finite set of symbols, the alphabet
δ is a mapping from S × Σ to a set of states
s0 ∈ S is the start state
F ⊆ S is the set of accepting (or final) states


Transition Graph

• An NFA can be diagrammatically represented by a labeled directed graph called a transition graph

[Figure: transition graph with states 0 to 3, start state 0, and edges labeled a and b.]

S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}


Transition Table

• The mapping δ of an NFA can be represented in a transition table

State   Input a   Input b
0       {0, 1}    {0}
1                 {2}
2                 {3}

δ(0,a) = {0,1}
δ(0,b) = {0}
δ(1,b) = {2}
δ(2,b) = {3}


The Language Defined by an NFA

• An NFA accepts an input string x if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph

• A state transition from one state to another on the path is called a move

• The language defined by an NFA is the set of input strings it accepts, such as (a|b)*abb for the example NFA
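Acceptance by an NFA can be checked by tracking the set of all states reachable after each input symbol; the string is accepted if that set ends up containing an accepting state. The sketch below uses the transition table of the example NFA (states 0 to 3, accepting state 3). The dictionary layout and the name `nfa_accepts` are assumptions, and the NFA has no ε-transitions, so no closure step is needed here.

```python
# Transition table of the example NFA: (state, symbol) -> set of next states.
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(x):
    """True if some path labeled x leads from START to an accepting state."""
    current = {START}
    for symbol in x:
        # Follow every possible move from every state we might be in.
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    return bool(current & ACCEPTING)


print(nfa_accepts("abb"))    # True
print(nfa_accepts("aabb"))   # True
print(nfa_accepts("abba"))   # False
```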


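From Regular Expression to ε-NFA (Thompson’s Construction)

[Figure: the construction cases, each drawn as an NFA with a start state i and an accepting state f: ε; a; r1|r2, built from N(r1) and N(r2); r1r2, built from N(r1) and N(r2); r*, built from N(r).]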
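The construction cases can be sketched as methods that each return a (start, accept) pair of state numbers. Everything named here (`Thompson`, `EPS`, the method names) is an assumption for illustration. One deliberate simplification: `concat` links the two sub-automata with an ε-edge instead of merging the accepting state of N(r1) with the start state of N(r2), which accepts the same language but uses one more state.

```python
from collections import defaultdict

EPS = None  # label used for ε-edges in this sketch


class Thompson:
    def __init__(self):
        self.edges = defaultdict(set)   # (state, label) -> set of states
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count - 1

    def add(self, src, label, dst):
        self.edges[(src, label)].add(dst)

    def symbol(self, a):
        """NFA for a single symbol (or for ε when a is EPS)."""
        i, f = self.new_state(), self.new_state()
        self.add(i, a, f)
        return i, f

    def union(self, n1, n2):
        """NFA for r1|r2: new start and accept states around both parts."""
        i, f = self.new_state(), self.new_state()
        for start, accept in (n1, n2):
            self.add(i, EPS, start)
            self.add(accept, EPS, f)
        return i, f

    def concat(self, n1, n2):
        """NFA for r1r2 (ε-link instead of merging states)."""
        self.add(n1[1], EPS, n2[0])
        return n1[0], n2[1]

    def star(self, n):
        """NFA for r*: allow skipping N(r) and repeating it."""
        i, f = self.new_state(), self.new_state()
        self.add(i, EPS, n[0])
        self.add(i, EPS, f)
        self.add(n[1], EPS, n[0])
        self.add(n[1], EPS, f)
        return i, f

    def closure(self, states):
        """All states reachable from the given states by ε-edges alone."""
        stack, seen = list(states), set(states)
        while stack:
            for t in self.edges.get((stack.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def accepts(self, nfa, x):
        current = self.closure({nfa[0]})
        for a in x:
            moved = {t for s in current for t in self.edges.get((s, a), ())}
            current = self.closure(moved)
        return nfa[1] in current


t = Thompson()
a_or_b = t.union(t.symbol("a"), t.symbol("b"))
nfa = t.concat(t.concat(t.concat(t.star(a_or_b), t.symbol("a")),
                        t.symbol("b")), t.symbol("b"))   # (a|b)*abb
print(t.accepts(nfa, "babb"), t.accepts(nfa, "ab"))       # True False
```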


Combining the NFAs of a Set of Regular Expressions

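a { action1 }
abb { action2 }
a*b+ { action3 }

[Figure: one NFA per pattern, each with its own start state: states 1 and 2 for a, states 3 to 6 for abb, and states 7 and 8 for a*b+. They are combined into a single NFA by adding a new start state 0 with ε-edges to states 1, 3 and 7.]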


Deterministic Finite Automata

• A deterministic finite automaton is a special case of an NFA
– No state has an ε-transition
– For each state s and input symbol a there is at most one edge labeled a leaving s

• Each entry in the transition table is a single state
– At most one path exists to accept a string
– Simulation algorithm is simple


Example DFA

[Figure: transition graph of a DFA with states 0 to 3 and start state 0; edges are labeled a and b.]

A DFA that accepts (a|b)*abb
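Simulating a DFA needs only a single current state and one table lookup per input symbol. The transition table below is an assumption: it is the usual four-state DFA for (a|b)*abb, since the individual edges of the slide's figure are not recoverable here. The names `DTRAN` and `dfa_accepts` are likewise illustrative.

```python
# Assumed transition table: state -> {symbol: next state}.
DTRAN = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START = 0
ACCEPTING = {3}


def dfa_accepts(x):
    """Run the DFA on x; reject as soon as a symbol has no transition."""
    state = START
    for symbol in x:
        if symbol not in DTRAN[state]:
            return False
        state = DTRAN[state][symbol]
    return state in ACCEPTING


print(dfa_accepts("abb"))     # True
print(dfa_accepts("abab"))    # False
print(dfa_accepts("ababb"))   # True
```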


Conversion of an NFA into a DFA

• The subset construction algorithm converts an NFA into a DFA using:

ε-closure(s) = {s} ∪ {t | s →ε … →ε t}
ε-closure(T) = ∪ s∈T ε-closure(s)
move(T,a) = {t | s →a t and s ∈ T}

• The algorithm produces:

Dstates is the set of states of the new DFA consisting of sets of states of the NFA
Dtran is the transition table of the new DFA


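ε-closure and move Examples

[Figure: the combined NFA for a, abb and a*b+ with start state 0 and states 1 to 8.]

ε-closure({0}) = {0,1,3,7}
move({0,1,3,7},a) = {2,4,7}
ε-closure({2,4,7}) = {2,4,7}
move({2,4,7},a) = {7}
ε-closure({7}) = {7}
move({7},b) = {8}
ε-closure({8}) = {8}
move({8},a) = ∅

[Figure: the resulting sequence of state sets on input a a b a: {0,1,3,7} → {2,4,7} on a → {7} on a → {8} on b → none on a.]

Also used to simulate NFAs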
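The two helper operations can be written as short set computations. In this sketch the edge table is an assumption reconstructed from the three patterns a, abb and a*b+ and from the values computed on the slide; the names `EDGES`, `EPS`, `eps_closure` and `move` are illustrative.

```python
EPS = ""   # label for ε-edges in this sketch

# Assumed edges of the combined NFA: (state, label) -> set of next states.
EDGES = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4},
    (4, "b"): {5},
    (5, "b"): {6},
    (7, "a"): {7},
    (7, "b"): {8},
    (8, "b"): {8},
}


def eps_closure(T):
    """States reachable from any state in T using ε-edges only."""
    stack, closure = list(T), set(T)
    while stack:
        s = stack.pop()
        for t in EDGES.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(T, a):
    """States reachable from some state in T by one edge labeled a."""
    return {t for s in T for t in EDGES.get((s, a), set())}


# Replaying the input a a b a from the slide:
T = eps_closure({0})
print(sorted(T))                      # [0, 1, 3, 7]
for symbol in "aaba":
    T = eps_closure(move(T, symbol))
    print(symbol, sorted(T))          # a [2, 4, 7] / a [7] / b [8] / a []
```

The loop at the end is the NFA simulation mentioned on the slide: the input is rejected as soon as the current set becomes empty.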


Subset Construction Example 1

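[Figure: an NFA with states 0 to 10 and start state 0, and the DFA with states A to E (start state A) that the subset construction produces from it; the DFA edges are labeled a and b.]

Dstates
A = {0,1,2,4,7}
B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
D = {1,2,4,5,6,7,9}
E = {1,2,4,5,6,7,10}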


Subset Construction Example 2

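[Figure: the combined NFA for a, abb and a*b+ (states 0 to 8, start state 0), whose accepting states are labeled with the actions a1, a2 and a3, and the DFA with states A to F (start state A) produced by the subset construction; the DFA's accepting states carry the labels a1, a2 and a3.]

Dstates
A = {0,1,3,7}
B = {2,4,7}
C = {8}
D = {7}
E = {5,8}
F = {6,8}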
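The subset construction for this second example can be sketched as a worklist algorithm. As before, the edge table is an assumption reconstructed from the patterns a, abb and a*b+; the function names are illustrative, and empty state sets are simply left out of the DFA instead of being kept as a dead state.

```python
from collections import deque

EPS = ""   # label for ε-edges in this sketch
ALPHABET = ["a", "b"]

# Assumed edges of the combined NFA: (state, label) -> set of next states.
EDGES = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2}, (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}


def eps_closure(T):
    stack, closure = list(T), set(T)
    while stack:
        for t in EDGES.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(T, a):
    return {t for s in T for t in EDGES.get((s, a), set())}


def subset_construction(start):
    """Return (Dstates, Dtran); each DFA state is a frozenset of NFA states."""
    first = frozenset(eps_closure({start}))
    dstates = [first]          # in order of discovery
    dtran = {}                 # (DFA state, symbol) -> DFA state
    queue = deque([first])
    while queue:
        T = queue.popleft()
        for a in ALPHABET:
            U = frozenset(eps_closure(move(T, a)))
            if not U:
                continue       # no transition on a from T
            if U not in dstates:
                dstates.append(U)
                queue.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


dstates, dtran = subset_construction(0)
names = {state: chr(ord("A") + i) for i, state in enumerate(dstates)}
for state in dstates:
    print(names[state], "=", sorted(state))
for (state, a), target in dtran.items():
    print(names[state], "on", a, "goes to", names[target])
```

With the symbols tried in the order a, b, the states are discovered in the same order as the Dstates list above, so the letters A to F line up with the slide.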