Lexical Analysis, by Neng-Fa Zhou. Why separate lexical and syntax analyses? Simpler design, efficiency, portability.

Jan 05, 2016



by Neng-Fa Zhou

Lexical Analysis

Why separate lexical and syntax analyses?
– simpler design
– efficiency
– portability


Tokens, Patterns, Lexemes

– Tokens
  • Terminal symbols in the grammar
– Patterns
  • Description of a class of tokens
– Lexemes
  • Words in the source program
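
The three terms can be illustrated with a minimal Python sketch. The token names NUM, ID, and OP and their patterns are assumptions chosen for illustration, not taken from the slides. Each pattern describes one class of tokens, and each piece of source text it matches is a lexeme.

```python
import re

# Hypothetical token classes; each pattern describes one class of tokens.
PATTERNS = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+\-*/=]"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS))

def tokenize(source):
    """Return (token, lexeme) pairs; whitespace is skipped."""
    pairs = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"no pattern matches at position {pos}")
        # lastgroup is the name of the pattern that matched: the token.
        # group() is the matched text itself: the lexeme.
        pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs

print(tokenize("count = count + 42"))
```

Here the lexeme "count" appears twice in the source but maps to the same token, ID, both times.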


Languages

– Fixed and finite alphabet (vocabulary)
– Finite length sentences
– Possibly infinite number of sentences

Examples
– Natural numbers {1,2,3,...,10,11,...}
– Strings over {a,b}: a^n b a^n

Terms on parts of a string
– prefix, suffix, substring, proper ....


Operations on Languages


Examples

L = {A,B,...,Z,a,b,...,z}
D = {0,1,...,9}

L ∪ D : the set of letters and digits
LD : a letter followed by a digit
L^4 : four-letter strings
L* : all strings of letters, including ε
L(L ∪ D)* : strings of letters and digits beginning with a letter
D+ : strings of one or more digits
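
The operations used in these examples can be sketched in Python with languages modeled as sets of strings. The function names are assumptions made for illustration. Because L* is infinite, `closure_up_to` only builds a finite approximation up to a chosen power.

```python
from itertools import product
import string

L = set(string.ascii_letters)   # {A,...,Z,a,...,z}
D = set(string.digits)          # {0,...,9}

def concat(X, Y):
    """XY: every string of X followed by every string of Y."""
    return {x + y for x, y in product(X, Y)}

def power(X, n):
    """X^n: n-fold concatenation; X^0 contains only the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, X)
    return result

def closure_up_to(X, max_power):
    """Finite approximation of X*: the union of X^0 .. X^max_power."""
    out = set()
    for n in range(max_power + 1):
        out |= power(X, n)
    return out

print(len(L | D))                         # size of L ∪ D
print(len(concat(L, D)))                  # size of LD
print(sorted(power({"a", "b"}, 2)))       # {a,b}^2
print(sorted(closure_up_to({"a", "b"}, 2)))
```

The small alphabet {a,b} is used for the power and closure calls because L^4 alone already has 52^4 members.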


Regular Expression (RE)

ε is a RE
a symbol in Σ is a RE
Let r and s be REs.
– (r) | (s) : or
– (r)(s) : concatenation
– (r)* : zero or more instances
– (r)+ : one or more instances
– (r)? : zero or one instance


Precedence of Operators

high   r* r+ r?
       rs
low    r|s

all left associative

Examples
Σ = {a,b}
1. a|b
2. (a|b)(a|b)
3. a*
4. (a|b)*
5. a|a*b
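
The precedence rules can be checked with Python's `re` module, which uses the same ordering for `*`, concatenation, and `|`. This is a small sketch; the helper `matches` and the sample strings are assumptions made for illustration.

```python
import re

def matches(pattern, text):
    """True when the whole text belongs to the language of the pattern."""
    return re.fullmatch(pattern, text) is not None

SAMPLES = ["", "a", "b", "ab", "aab", "aa", "abab"]

# a|a*b is read as a|((a*)b): * first, then concatenation, then |.
implicit = [matches(r"a|a*b", t) for t in SAMPLES]
explicit = [matches(r"a|((a*)b)", t) for t in SAMPLES]
print(implicit == explicit)

# A different grouping gives a different language: (a|a*)b rejects "a".
print(matches(r"a|a*b", "a"), matches(r"(a|a*)b", "a"))

# Parentheses are needed to apply * to an alternation.
print(matches(r"a|b*", "abab"), matches(r"(a|b)*", "abab"))
```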


Algebraic Properties of RE


Regular Definitions

d1 → r1
d2 → r2
....
dn → rn

ri is a RE over Σ ∪ {d1,d2,...,di-1}

not recursive
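
A regular definition can be turned into a plain regular expression by substituting earlier names into later bodies. The following Python sketch does this. The names digit, letter, id, and num and the `{name}` reference syntax are assumptions chosen for illustration. A reference to a name that has not been defined earlier, including a recursive one, is rejected.

```python
import re

# Hypothetical definitions d_i -> r_i; {name} refers to an earlier d_j.
DEFINITIONS = [
    ("digit", r"[0-9]"),
    ("letter", r"[A-Za-z]"),
    ("id", r"{letter}({letter}|{digit})*"),
    ("num", r"{digit}+(\.{digit}+)?"),
]

def expand(definitions):
    """Substitute earlier names into later bodies, in definition order."""
    expanded = {}
    for name, body in definitions:
        def substitute(m, name=name):
            ref = m.group(1)
            if ref not in expanded:
                raise ValueError(f"{name} refers to {ref}, which is not defined earlier")
            # Wrap in a non-capturing group so operators apply to the whole body.
            return "(?:" + expanded[ref] + ")"
        expanded[name] = re.sub(r"\{([A-Za-z_]+)\}", substitute, body)
    return expanded

defs = expand(DEFINITIONS)
print(defs["id"])
print(bool(re.fullmatch(defs["id"], "x25")), bool(re.fullmatch(defs["num"], "3.14")))
```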


Example-1


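%{
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void) {
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}

/* Return 1 at end of input: there is no further file to scan. */
int yywrap(void) { return 1; }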


Example-2

D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   {printf("valid %s\n",yytext);}
.                                       {printf("unrecognized %s\n",yytext);}
%%
int main(int argc, char *argv[]){
    ++argv, --argc;
    if (argc>0) yyin = fopen(argv[0],"r"); else yyin = stdin;
    yylex();
    return 0;
}

/* Return 1 at end of input: there is no further file to scan. */
int yywrap(void) { return 1; }


java.util.regex


import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}


String Pattern Matching in Perl


print "Input a string :";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}


Finite Automata

Nondeterministic finite automaton (NFA)

NFA = (S,T,s0,F)

– S: a set of states
– T: a transition mapping
– s0: the start state
– F: final states or accepting states
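
The four components map directly onto plain data. The sketch below is an assumed two-state NFA for (a|b)*a, chosen for illustration, and has no ε-transitions. The helper `nfa_accepts` is a hypothetical name. It tracks every state the NFA could be in after each symbol.

```python
# NFA = (S, T, s0, F) for (a|b)*a; an illustrative machine, not from the slides.
S = {0, 1}
T = {(0, "a"): {0, 1}, (0, "b"): {0}}   # mapping: (state, symbol) -> set of states
s0 = 0
F = {1}

def nfa_accepts(T, s0, F, text):
    """Follow all possible transitions at once; accept if any path ends in F."""
    current = {s0}
    for c in text:
        current = {t for s in current for t in T.get((s, c), set())}
    return bool(current & F)

for word in ["a", "ba", "ab", ""]:
    print(repr(word), nfa_accepts(T, s0, F, word))
```

On input a in state 0 the mapping returns two states, which is what makes T a mapping to sets and not a function to a single state.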


Example


Deterministic Finite Automata (DFA)

– T: a transition function
– There is only one arc going out from each node on each symbol.


Simulating a DFA

s = s0;
c = nextchar;
while (c != eof) {
    s = move(s,c);
    c = nextchar;
}
if (s is in F)
    return "yes";
else
    return "no";
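
The loop above translates almost line for line into runnable Python. In this sketch the transition table `MOVE` describes an assumed two-state DFA for (a|b)*a, and treating a missing transition as rejection is also an assumption, since the pseudocode presumes that move is total.

```python
def simulate_dfa(move, s0, F, text):
    """Run the DFA over text and report "yes" or "no"."""
    s = s0
    for c in text:
        if (s, c) not in move:
            return "no"          # assumption: undefined transition means reject
        s = move[(s, c)]
    return "yes" if s in F else "no"

# Hypothetical DFA for (a|b)*a: state 1 means "the last symbol read was a".
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 0,
}

for word in ["a", "abba", "ab", ""]:
    print(repr(word), simulate_dfa(MOVE, 0, {1}, word))
```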


From RE to NFA

– a in Σ

– s|t


From RE to NFA (cont.)

– st

– s*


Example

(a|b)*a
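
The four cases (a symbol, s|t, st, s*) can be sketched as a small Python builder and applied to (a|b)*a. This is an illustrative sketch, not the slides' own construction: the class and method names are assumptions, and concatenation links the two fragments with an ε-edge where the textbook version merges the two states. Each method returns a (start, accept) pair.

```python
from itertools import count

EPS = None  # label used for epsilon transitions

class Builder:
    def __init__(self):
        self.edges = {}            # state -> list of (label, target)
        self._ids = count()

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def symbol(self, a):           # a in the alphabet
        start, accept = self.new_state(), self.new_state()
        self.edges[start].append((a, accept))
        return start, accept

    def union(self, s, t):         # s|t
        start, accept = self.new_state(), self.new_state()
        self.edges[start] += [(EPS, s[0]), (EPS, t[0])]
        self.edges[s[1]].append((EPS, accept))
        self.edges[t[1]].append((EPS, accept))
        return start, accept

    def concat(self, s, t):        # st (epsilon link instead of merging states)
        self.edges[s[1]].append((EPS, t[0]))
        return s[0], t[1]

    def star(self, s):             # s*
        start, accept = self.new_state(), self.new_state()
        self.edges[start] += [(EPS, s[0]), (EPS, accept)]
        self.edges[s[1]] += [(EPS, s[0]), (EPS, accept)]
        return start, accept

def eps_closure(edges, states):
    """All states reachable from the given states by epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in edges[s]:
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(edges, start, accept, text):
    current = eps_closure(edges, {start})
    for c in text:
        moved = {t for s in current for label, t in edges[s] if label == c}
        current = eps_closure(edges, moved)
    return accept in current

b = Builder()
start, accept = b.concat(b.star(b.union(b.symbol("a"), b.symbol("b"))), b.symbol("a"))
print(len(b.edges), accepts(b.edges, start, accept, "abba"))
```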


Building Lexical Analyzer

RE → NFA → DFA
– RE → NFA: Algorithm 3.23 (Thompson's construction)
– NFA → DFA: Algorithm 3.32 (Subset construction)
– Emulator


Conversion of an NFA into a DFA

Intuition
– move(s,a) is a function in a DFA
– move(s,a) is a mapping in an NFA

A state reachable from s0 in the DFA on an input string corresponds to the set of states in the NFA that are reachable on the same string.


Computation of ε-Closure

ε-Closure(T): Set of NFA states reachable from some NFA state s in T by ε-transitions alone.


From an NFA to a DFA (The subset construction)
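
A minimal Python sketch of the subset construction follows, including the ε-closure it depends on. The NFA encoding (a dict from (state, label) to a set of states, with `None` as the ε label) and the small example NFA for (a|b)*a are assumptions made for illustration. Each DFA state is a frozenset of NFA states.

```python
EPS = None  # label used for epsilon transitions

def eps_closure(nfa, states):
    """NFA states reachable from `states` by epsilon transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def move(nfa, states, a):
    """NFA states reachable from `states` by one transition on symbol a."""
    return {t for s in states for t in nfa.get((s, a), ())}

def subset_construction(nfa, s0, finals, alphabet):
    start = frozenset(eps_closure(nfa, {s0}))
    dstates, unmarked, dtran = {start}, [start], {}
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            U = frozenset(eps_closure(nfa, move(nfa, T, a)))
            if not U:
                continue           # no transition on a from this DFA state
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    # A DFA state is accepting if it contains at least one accepting NFA state.
    dfinals = {T for T in dstates if T & finals}
    return start, dtran, dfinals

# Hypothetical NFA for (a|b)*a: 0 -eps-> 1, 1 loops on a and b, 1 -a-> 2.
NFA = {(0, EPS): {1}, (1, "a"): {1, 2}, (1, "b"): {1}}
start, dtran, dfinals = subset_construction(NFA, 0, {2}, "ab")
for (T, a), U in sorted(dtran.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(T), a, "->", sorted(U))
```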


Example

NFA

DFA


Algorithm 3.39

Π = {F, S-F};
do begin
    for each group G in Π do begin
        partition G into subgroups such that two states s and t of G are in the same subgroup iff for all input symbols a, s and t have transitions on a to states in the same group;
        replace G in Πnew by the set of all subgroups formed;
    end
    if (Πnew = Π) return;
    Π = Πnew;
end;


Example

State   a    b
AC      B    AC
B       B    D
D       B    E
E       B    AC
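
The partition refinement can be sketched in Python as follows. The five-state input DFA (states A to E, accepting state E) is an assumption, chosen because merging A and C gives a four-state machine with states AC, B, D, and E. The sketch also assumes that the transition table is total.

```python
def minimize(states, alphabet, move, finals):
    """Return the final partition of a complete DFA as a set of frozensets."""
    partition = {frozenset(finals), frozenset(states - finals)}
    partition.discard(frozenset())
    while True:
        group_of = {s: g for g in partition for s in g}
        new_partition = set()
        for g in partition:
            # States stay together iff they go to the same groups on every symbol.
            buckets = {}
            for s in g:
                key = tuple(group_of[move[(s, a)]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition |= {frozenset(b) for b in buckets.values()}
        if new_partition == partition:
            return partition
        partition = new_partition

# Assumed input DFA; it accepts strings over {a,b} that end in abb.
STATES = set("ABCDE")
FINALS = {"E"}
MOVE = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}

groups = minimize(STATES, "ab", MOVE, FINALS)
print(sorted("".join(sorted(g)) for g in groups))
```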


Construct a DFA Directly from a Regular Expression


Implementation Issues

Input buffering
– Read in characters one by one
  • Unable to look ahead
  • Inefficient
– Read in a whole string and store it in memory
  • Requires a big buffer
– Buffer pairs


Buffer Pairs


Use Sentinels
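
One way to picture buffer pairs with sentinels is the Python sketch below. It is an illustration under stated assumptions: the class name, the half size `n`, and the choice of "\0" as a sentinel that never occurs in the source text are all hypothetical, and the lexeme-begin pointer is left out. The point is that the common case costs one comparison per character, and the end of a half is only examined when a sentinel is seen.

```python
import io

EOF = "\0"   # sentinel character; assumed never to occur in the source text

class BufferPair:
    """Two halves of n characters, each followed by one sentinel slot."""

    def __init__(self, stream, n=4096):
        self.stream = stream
        self.n = n
        self.buf = [EOF] * (2 * n + 2)   # slots n and 2n+1 hold the sentinels
        self._load(0)
        self.forward = 0

    def _load(self, start):
        """Fill one half; pad with sentinels if the input runs out."""
        chunk = self.stream.read(self.n)
        self.buf[start:start + self.n] = list(chunk) + [EOF] * (self.n - len(chunk))

    def next_char(self):
        """Return the next character, or None at the real end of input."""
        c = self.buf[self.forward]
        if c != EOF:                        # common case: a single test
            self.forward += 1
            return c
        if self.forward == self.n:          # sentinel ending the first half
            self._load(self.n + 1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:  # sentinel ending the second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                         # sentinel inside a half: end of input

def read_all(text, n):
    bp = BufferPair(io.StringIO(text), n)
    out = []
    while (c := bp.next_char()) is not None:
        out.append(c)
    return "".join(out)

print(read_all("abcdefghij", 4))
```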


Lexical Analyzer


Lex

A tool for automatically generating lexical analyzers


Lex Specifications

declarations
%%
translation rules
%%
auxiliary procedures

translation rules:
p1 {action1}
p2 {action2}
...
pn {actionn}


Lex Regular Expressions


yylex()

yylex() {
    switch (pattern_match()) {
        case 1: {action1}
        case 2: {action2}
        ...
        case n: {actionn}
    }
}
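
The pattern_match step can be sketched in Python using the rule Lex applies when several patterns match: prefer the longest match, and on a tie prefer the rule listed first. The rule set and the names `make_lexer` and `RULES` are assumptions made for illustration. Note also that Python's `re` tries alternatives left to right inside a single pattern, so this is only an approximation of a DFA-based matcher.

```python
import re

def make_lexer(rules):
    """rules: list of (pattern, action); action maps a lexeme to a result or None."""
    compiled = [(re.compile(p), action) for p, action in rules]

    def yylex(text):
        pos, out = 0, []
        while pos < len(text):
            best_len, best_action = 0, None
            for regex, action in compiled:
                m = regex.match(text, pos)
                # Strict > keeps the earliest rule when two matches tie in length.
                if m and m.end() - pos > best_len:
                    best_len, best_action = m.end() - pos, action
            if best_action is None:
                out.append(("UNRECOGNIZED", text[pos]))
                pos += 1
                continue
            result = best_action(text[pos:pos + best_len])
            if result is not None:
                out.append(result)
            pos += best_len
        return out

    return yylex

# Hypothetical rules, listed in priority order.
RULES = [
    (r"if|then|begin|end", lambda s: ("KEYWORD", s)),
    (r"[a-z][a-z0-9]*", lambda s: ("ID", s)),
    (r"[0-9]+", lambda s: ("INT", s)),
    (r"[ \t\n]+", lambda s: None),          # skip white space
]

yylex = make_lexer(RULES)
print(yylex("if ifx 42"))
```

The input "if" matches both the keyword rule and the identifier rule with the same length, so the earlier keyword rule wins, while "ifx" is an identifier because that match is longer.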


Example

DIGIT   [0-9]
ID      [a-z][a-z0-9]*
%%
{DIGIT}+                {printf("An integer:%s(%d)\n",yytext,atoi(yytext));}
{DIGIT}+"."{DIGIT}*     {printf("A float: %s (%g)\n",yytext,atof(yytext));}
if|then|begin|end|procedure|function    {printf("A keyword: %s\n",yytext);}
{ID}                    {printf("An identifier %s\n",yytext);}
"+"|"-"|"*"|"/"         {printf("An operator %s\n",yytext);}
"{"[^}\n]*"}"           {/* eat up one-line comments */}
[ \t\n]+                {/* eat up white space */}
.                       {printf("Unrecognized character: %s\n", yytext);}
%%
int main(int argc, char *argv[]){
    ++argv, --argc;
    if (argc>0) yyin = fopen(argv[0],"r"); else yyin = stdin;
    yylex();
}