Compilation 0368-3133
Lecture 2: Lexical Analysis
Noam Rinetzky
Posted on 15-Feb-2019
Lexical Analysis
Modern Compiler Design: Chapter 2.1
Conceptual Structure of a Compiler
[Figure labels: Source text (txt); Compiler, made of a Frontend (Lexical Analysis, Syntax Analysis / Parsing, Semantic Analysis), an Intermediate Representation (IR), and a Backend (Code Generation); Semantic Representation; Executable code (exe).]
[Figure repeated from the previous slide, with the added annotations "words" and "sentences".]
What does Lexical Analysis do?
• Language: fully parenthesized expressions
Expr → Num | LP Expr Op Expr RP
Num → Dig | Dig Num
Dig → ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
LP → ‘(’
RP → ‘)’
Op → ‘+’ | ‘*’
Example input: ( ( 23 + 7 ) * 19 )
What does Lexical Analysis do? (continued)
• The Expr rule describes a context-free language; the Num, Dig, LP, RP and Op rules describe regular languages
• The input is split into tokens, each with a Kind and a Value
( ( 23 + 7 ) * 19 )
LP LP Num Op Num RP Op Num RP
What does Lexical Analysis do?
• Partitions the input into a stream of tokens
  – Numbers
  – Identifiers
  – Keywords
  – Punctuation
• A token is a “word” in the source language, “meaningful” to the syntactical analysis
• Usually represented as (kind, value) pairs
  – (Num, 23)
  – (Op, ‘*’)
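The partitioning described above can be sketched as a small hand-written tokenizer for the fully parenthesized expression language. This is only an illustration: the function name `tokenize` and the choice to store the value of a Num token as an integer are assumptions, not part of the lecture.

```python
def tokenize(text):
    """Split a fully parenthesized expression into (kind, value) pairs."""
    digits = "0123456789"
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                      # whitespace separates tokens, it is not a token
        elif ch == "(":
            tokens.append(("LP", ch))
            i += 1
        elif ch == ")":
            tokens.append(("RP", ch))
            i += 1
        elif ch in "+*":
            tokens.append(("Op", ch))
            i += 1
        elif ch in digits:
            start = i
            while i < len(text) and text[i] in digits:
                i += 1                  # Num -> Dig | Dig Num: take all adjacent digits
            tokens.append(("Num", int(text[start:i])))
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


kinds = [kind for kind, _ in tokenize("( ( 23 + 7 ) * 19 )")]
print(kinds)
```

The kinds printed for the example input follow the slide's sequence LP LP Num Op Num RP Op Num RP.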
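From scanning to parsing
[Figure: the program text ((23 + 7) * x) enters the Lexical Analyzer, which emits the token stream LP LP Num OP Num RP OP Id RP. The Parser (grammar: Expr → ... | Id, Id → ‘a’ | ... | ‘z’) either builds an Abstract Syntax Tree with nodes Op(*), Op(+), Num(23), Num(7), Id(?) when the input is valid, or reports a syntax error.]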
Why Lexical Analysis?
• Well, not strictly necessary, but …
  – Regular languages ⊆ Context-Free languages
• Simplifies the syntax analysis (parsing)
  – And language definition
• Modularity
• Reusability
• Efficiency
Lecture goals
• Understand role & place of lexical analysis
• Lexical analysis theory
• Using program generating tools
Lecture Outline
✓ Role & place of lexical analysis
• What is a token?
• Regular languages
• Lexical analysis
• Error handling
• Automatic creation of lexical analyzers
What is a token? (Intuitively)
• A “word” in the source language
  – Anything that should appear in the input to syntax analysis
    • Identifiers
    • Values
    • Language keywords
• Usually represented as a pair of (kind, value)
Example Tokens
Type      Examples
ID        foo, n_14, last
NUM       73, 00, 517, 082
REAL      66.1, .5, 5.5e-10
IF        if
COMMA     ,
NOTEQ     !=
LPAREN    (
RPAREN    )
Example Non Tokens
Type                      Examples
comment                   /* ignored */
preprocessor directive    #include <foo.h>
                          #define NUMS 5.6
macro                     NUMS
whitespace                \t, \n, \b, ‘ ’
Some basic terminology
• Lexeme (aka symbol) - a series of letters separated from the rest of the program according to a convention (space, semicolon, comma, etc.)
• Pattern - a rule specifying a set of strings. Example: “an identifier is a string that starts with a letter and continues with letters and digits”
  – (Usually) a regular expression
• Token - a pair of (pattern, attributes)
Example
void match0(char *s) /* find a zero */
{
  if (!strncmp(s, "0.0", 3))
    return 0.0 ;
}
VOID ID(match0) LPAREN CHAR DEREF ID(s) RPAREN
LBRACE
IF LPAREN NOT ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN
RETURN REAL(0.0) SEMI
RBRACE
EOF
Example Non Tokens (continued)
• Lexemes that are recognized but get consumed rather than transmitted to the parser
  – if
  – i/*comment*/f
Lecture Outline
✓ Role & place of lexical analysis
✓ What is a token?
• Regular languages
• Lexical analysis
• Error handling
• Automatic creation of lexical analyzers
How can we define tokens?
• Keywords – easy!
  – if, then, else, for, while, …
• Identifiers?
• Numerical Values?
• Strings?
• Characterize unbounded sets of values using a bounded description?
Regular languages
• Formal languages
  – Σ = finite set of letters
  – Word = sequence of letters
  – Language = set of words
• Regular languages are defined equivalently by
  – Regular expressions
  – Finite-state automata
Common format for reg-exps
Basic Patterns           Matching
x                        The character x
.                        Any character, usually except a newline
[xyz]                    Any of the characters x, y, z
^x                       Any character except x
Repetition Operators
R?                       An R or nothing (= optionally an R)
R*                       Zero or more occurrences of R
R+                       One or more occurrences of R
Composition Operators
R1R2                     An R1 followed by R2
R1|R2                    Either an R1 or R2
Grouping
(R)                      R itself
Examples
• ab*|cd? =
• (a|b)* =
• (0|1|2|3|4|5|6|7|8|9)* =
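One way to explore what the three expressions above denote is to test individual words for membership. The sketch below uses Python's `re.fullmatch`, whose syntax agrees with the slide's notation for these operators; the helper name `matches` and the sample words are illustrative assumptions.

```python
import re


def matches(pattern, word):
    """True if the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None


# ab*|cd? : an 'a' followed by any number of 'b', or a 'c' optionally followed by 'd'
print(matches(r"ab*|cd?", "abbb"), matches(r"ab*|cd?", "cd"), matches(r"ab*|cd?", "abcd"))

# (a|b)* : every word over the letters a and b, including the empty word
print(matches(r"(a|b)*", ""), matches(r"(a|b)*", "abba"), matches(r"(a|b)*", "abc"))

# (0|1|...|9)* : every sequence of digits, including the empty word
DIGITS = r"(0|1|2|3|4|5|6|7|8|9)*"
print(matches(DIGITS, "2018"), matches(DIGITS, ""), matches(DIGITS, "20x8"))
```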
Escape characters
• What is the expression for one or more + symbols?
  – (+)+ won’t work
  – (\+)+ will
• A backslash \ before an operator turns it into a standard character
  – \*, \?, \+, a\(b\+\*, (a\(b\+\*)+, …
• Double quotes surrounding text turn all of it into standard characters
  – "a(b+*", "a(b+*"+
Shorthands
• Use names for expressions
  – letter = a | b | … | z | A | B | … | Z
  – letter_ = letter | _
  – digit = 0 | 1 | 2 | … | 9
  – id = letter_ (letter_ | digit)*
• Use hyphen to denote a range
  – letter = a-z | A-Z
  – digit = 0-9
Examples
• if = if
• then = then
• relop = < | > | <= | >= | = | <>
• digit = 0-9
• digits = digit+
Example
• A number is
number = ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+
         ( ε | \. ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+
           ( ε | E ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+ )
         )
• Using shorthands (and allowing an optional sign in the exponent) it can be written as
number = digits ( ε | \.digits ( ε | E ( ε | \+ | - ) digits ) )
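The shorthand definition of number translates almost symbol for symbol into a Python regular expression: each "ε | X" alternative becomes an optional group "(X)?". The names `NUMBER` and `is_number` below are assumptions made for this sketch.

```python
import re

# digits ( ε | \.digits ( ε | E ( ε | + | - ) digits ) )
NUMBER = r"[0-9]+(\.[0-9]+(E[+-]?[0-9]+)?)?"


def is_number(word):
    """True if the whole word matches the 'number' pattern."""
    return re.fullmatch(NUMBER, word) is not None


for word in ["42", "3.14", "6.02E23", "6.02E-23", "3.", ".5", "1E5"]:
    print(word, is_number(word))
```

With this definition an exponent may only follow a fractional part, so a word such as 1E5 is not a number; that follows directly from how the parentheses nest in the slide's expression.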
Exercise 1 - Question
• Language of rational numbers in decimal representation (no leading or trailing zeros)
  – 0
  – 123.757
  – .933333
  – Not 007
  – Not 0.30
Exercise 1 - Answer
• Language of rational numbers in decimal representation (no leading or trailing zeros)
  – Digit = 1 | 2 | … | 9
    Digit0 = 0 | Digit
    Num = Digit Digit0*
    Frac = Digit0* Digit
    Pos = Num | \.Frac | 0\.Frac | Num\.Frac
    PosOrNeg = (ε | -) Pos
    R = 0 | PosOrNeg
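The answer can be checked mechanically by building the same named expressions as Python strings and testing the words from the question. This is a sketch; the variable names mirror the slide and `is_rational` is an assumed helper.

```python
import re

DIGIT = "[1-9]"                      # Digit  = 1|2|...|9
DIGIT0 = "[0-9]"                     # Digit0 = 0|Digit
NUM = f"{DIGIT}{DIGIT0}*"            # Num    = Digit Digit0*
FRAC = f"{DIGIT0}*{DIGIT}"           # Frac   = Digit0* Digit
POS = rf"(?:{NUM}|\.{FRAC}|0\.{FRAC}|{NUM}\.{FRAC})"
R = rf"0|-?{POS}"                    # R = 0 | (ε|-) Pos


def is_rational(word):
    """True if the whole word is in the language R."""
    return re.fullmatch(R, word) is not None


for word in ["0", "123.757", ".933333", "007", "0.30"]:
    print(word, is_rational(word))
```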
Exercise 2 - Question
• Equal number of opening and closing parentheses: [ⁿ]ⁿ = [], [[]], [[[]]], …
Exercise 2 - Answer
• Equal number of opening and closing parentheses: [ⁿ]ⁿ = [], [[]], [[[]]], …
• Not regular
• Context-free
• Grammar: S ::= [] | [S]
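The grammar S ::= [] | [S] maps directly onto a recursive recognizer, which illustrates the kind of unbounded nesting that a finite automaton cannot track. A minimal sketch, with the function name `is_nested` chosen for illustration:

```python
def is_nested(word):
    """Recognize S ::= [] | [S]: n opening brackets followed by n closing ones, n >= 1."""
    if word == "[]":                              # S ::= []
        return True
    return (len(word) > 2                         # S ::= [S]
            and word[0] == "["
            and word[-1] == "]"
            and is_nested(word[1:-1]))


for word in ["[]", "[[[]]]", "[[]", "[][]", ""]:
    print(repr(word), is_nested(word))
```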
Challenge: Ambiguity
• If = if
• Id = Letter (Letter | Digit)*
• “if” is a valid identifier… what should it be?
• “iffy” is also a valid identifier
• Solution
  – Longest matching token
  – Break ties using order of definitions…
    • Keywords should appear before identifiers
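The two resolution rules above (longest match first, then order of definitions) can be sketched with ordinary regular expressions. The rule list `RULES` and the function `classify` are assumptions for this illustration; a generated scanner achieves the same effect with a single automaton.

```python
import re

# Order matters: the keyword rule is listed before the identifier rule.
RULES = [
    ("If", re.compile(r"if")),
    ("Id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]


def classify(text, pos=0):
    """Return (kind, lexeme) for the longest match starting at pos."""
    best = None
    for kind, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so ties are won by the rule that was defined first.
        if m and (best is None or m.end() > best[1]):
            best = (kind, m.end())
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best[0], text[pos:best[1]]


print(classify("if"))
print(classify("iffy"))
```

Here "if" ties between the two rules and is classified by the earlier one, while "iffy" is classified by the identifier rule because that match is longer.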
Creating a lexical analyzer
• Given a list of token definitions (pattern name, regex), write a program such that
  – Input: String to be analyzed
  – Output: List of tokens
• How do we build an analyzer?
Building a Scanner – Take I
• Input: String
• Output: Sequence of tokens
Token nextToken() {
  int c;                        /* int rather than char, so EOF can be represented */
loop:
  c = getchar();
  switch (c) {
  case ' ': goto loop;          /* skip blanks */
  case ';': return SemiColumn;
  case '+':
    c = getchar();
    switch (c) {
    case '+': return PlusPlus;
    case '=': return PlusEqual;
    default: ungetc(c, stdin); return Plus;   /* push back the lookahead character */
    }
  case '<': …
  case 'w': …
  }
}
There must be a better way!
A better way
• Automatically generate a scanner
• Define tokens using regular expressions
• Use finite-state automata for detection
Reg-exp vs. automata
• Regular expressions are declarative
  – Good for humans
  – Not “executable”
• Automata are operative
  – Define an algorithm for deciding whether a given word is in a regular language
  – Not a natural notation for humans
Overview
• Define tokens using regular expressions
• Construct a nondeterministic finite-state automaton (NFA) from the regular expression
• Determinize the NFA into a deterministic finite-state automaton (DFA)
• The DFA can be directly used to identify tokens
Automata theory: a bird’s-eye view
Deterministic Automata (DFA)
• M = (Σ, Q, δ, q0, F)
  – Σ - alphabet
  – Q – finite set of states
  – q0 ∈ Q – initial state
  – F ⊆ Q – final states
  – δ: Q × Σ → Q - transition function
• For a word w, M reaches some state x
  – M accepts w if x ∈ F
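The definition above can be turned into a few lines of code: δ becomes a dictionary and acceptance is a loop over the letters of the word. The automaton used here is a hypothetical three-transition DFA that accepts only the word cba, not the one drawn on the slides; treating a missing dictionary entry as rejection is also an assumption of this sketch.

```python
def dfa_accepts(delta, start, finals, word):
    """Run the DFA on word; delta maps (state, letter) to the next state."""
    state = start
    for letter in word:                       # words are read left-to-right
        if (state, letter) not in delta:
            return False                      # missing transition: the automaton is stuck
        state = delta[(state, letter)]
    return state in finals                    # accept iff the reached state is final


# Hypothetical DFA over {a, b, c} accepting exactly the word "cba".
DELTA = {("q0", "c"): "q1", ("q1", "b"): "q2", ("q2", "a"): "q3"}
FINALS = {"q3"}

print(dfa_accepts(DELTA, "q0", FINALS, "cba"))
print(dfa_accepts(DELTA, "q0", FINALS, "cbb"))
```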
DFA in pictures
• An automaton is defined by states and transitions
[Figure: a DFA drawn as a graph, with callouts for the start state, a transition, and the accepting state; edges are labeled with letters from {a, b, c}.]
Accepting Words
• Words are read left-to-right
• Missing transition = non-acceptance
  – “Stuck state”
[Figure: step-by-step run of a DFA with transitions labeled a, b, c on the input word cba, ending in an accepting state.]
Rejecting Words
• Words are read left-to-right
• Missing transition means non-acceptance
[Figure: run of the same DFA on the input word cbb, which gets stuck on a missing transition.]
Non-deterministic Automata (NFA)
• M = (Σ, Q, δ, q0, F)
  – Σ - alphabet
  – Q – finite set of states
  – q0 ∈ Q – initial state
  – F ⊆ Q – final states
  – δ: Q × (Σ ∪ {ε}) → 2^Q - transition function
• Compare with DFA: δ: Q × Σ → Q
• For a word w, M can reach a number of states X
  – M accepts w if X ∩ F ≠ {}
• Possible: X = {}
• Possible: ε-transitions
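Acceptance by an NFA with ε-transitions can be simulated by tracking the set X of reachable states. The sketch below represents δ as a dictionary from (state, letter) to a set of states, with the empty string standing for ε; the example automaton is hypothetical and was chosen only to exercise both non-determinism and an ε-move.

```python
EPS = ""  # stands for ε in transition keys (an assumption of this sketch)


def eps_closure(delta, states):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(delta, start, finals, word):
    """Accept iff the set of reachable states X intersects the final states."""
    current = eps_closure(delta, {start})
    for letter in word:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, letter), ()))
        current = eps_closure(delta, moved)    # may become empty: X = {}
    return bool(current & finals)


# Hypothetical NFA for c b* a: state 1 can move to 2 on b or on ε.
DELTA = {(0, "c"): {1}, (1, "b"): {1, 2}, (1, EPS): {2}, (2, "a"): {3}}
FINALS = {3}

print(nfa_accepts(DELTA, 0, FINALS, "cba"), nfa_accepts(DELTA, 0, FINALS, "cbb"))
```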
NFA
• Allow multiple transitions from a given state labeled by the same letter
[Figure: an NFA with several transitions labeled a, b and c leaving the same states.]
Accepting words
• Maintain a set of states
• Accept the word if an accepting state is reached
[Figure: step-by-step run of the NFA on the input word cba.]
NFA+ε automata
• ε-transitions can “fire” without reading the input
[Figure: an automaton with transitions labeled a, b, c and one ε-transition.]
NFA+ε run example
• The ε-transition can non-deterministically take place
• Word accepted
[Figure: step-by-step run of the NFA+ε automaton on the input word cba.]
From regular expressions to NFA
• Step 1: assign expression names and obtain pure regular expressions R1 … Rm
• Step 2: construct an NFA Mi for each regular expression Ri
• Step 3: combine all Mi into a single NFA
• Ambiguity resolution: prefer the longest accepting word
From reg. exp. to automata
• Theorem: there is an algorithm to build an NFA+ε automaton for any regular expression
• Proof: by induction on the structure of the regular expression
Basic constructs
[Figure: automata for the base cases R = ε, R = ∅, and R = a (a single transition labeled a).]
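Composition
[Figure: R = R1 | R2 is built from M1 and M2 side by side, connected by ε-transitions; R = R1R2 is built from M1 followed by M2, connected by ε-transitions.]
Repetition
[Figure: R = R1* is built from M1 with additional ε-transitions.]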
Naïve approach
• Try each automaton separately
• Given a word w:
  – Try M1(w)
  – Try M2(w)
  – …
  – Try Mn(w)
• Requires resetting after every attempt
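Actually, we combine automata
[Figure: combined NFA for the patterns a, abb, a*b+, abab. A new start state 0 has ε-transitions to states 1, 3, 7 and 9.
  a:     1 –a→ 2
  abb:   3 –a→ 4 –b→ 5 –b→ 6
  a*b+:  7 –a→ 7, 7 –b→ 8, 8 –b→ 8
  abab:  9 –a→ 10 –b→ 11 –a→ 12 –b→ 13
Accepting states: 2 (a), 6 (abb), 8 (a*b+), 13 (abab).]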
Corresponding DFA
[Figure: DFA obtained from the combined NFA for a, abb, a*b+, abab. States are sets of NFA states:
  {0,1,3,7,9} –a→ {2,4,7,10}, –b→ {8}
  {2,4,7,10} –a→ {7}, –b→ {5,8,11}
  {7} –a→ {7}, –b→ {8}
  {8} –b→ {8}
  {5,8,11} –a→ {12}, –b→ {6,8}
  {6,8} –b→ {8}
  {12} –b→ {13}
Accepting labels: {2,4,7,10}: a; {8}: a*b+; {5,8,11}: a*b+; {6,8}: abb, a*b+; {13}: abab.]
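The DFA states in the figure can be recomputed with the subset construction. The sketch below encodes the combined NFA for a, abb, a*b+, abab as a dictionary (the empty string stands for ε) and explores the reachable sets of NFA states; the function names are assumptions made for this illustration.

```python
EPS = ""  # stands for ε
NFA = {
    (0, EPS): {1, 3, 7, 9},
    (1, "a"): {2},                                                     # a
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},                       # abb
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},                       # a*b+
    (9, "a"): {10}, (10, "b"): {11}, (11, "a"): {12}, (12, "b"): {13}, # abab
}


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for r in NFA.get((stack.pop(), EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def determinize(start, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset(eps_closure({start}))
    dfa, seen, worklist = {}, {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        for letter in alphabet:
            moved = set()
            for q in current:
                moved |= NFA.get((q, letter), set())
            target = frozenset(eps_closure(moved))
            if not target:
                continue                      # no transition: the DFA is stuck here
            dfa[(current, letter)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return start_set, dfa, seen


start_set, dfa, dfa_states = determinize(0, "ab")
print(sorted(sorted(s) for s in dfa_states))
```

The construction is exponential in the worst case, but for this NFA only eight subsets are reachable.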
Scanning with DFA
• Run until stuck
  – Remember last accepting state
• Go back to accepting state
• Return token
Ambiguity resolution
• Longest word
• Tie-breaker based on order of rules when words have the same length
Examples
[Figure: the combined NFA and the corresponding DFA for a, abb, a*b+, abab, repeated.]
• abaa: gets stuck after aba in state 12, backs up to state {5,8,11}; the pattern is a*b+, the token is ab
  Tokens: <a*b+, ab> <a, a> <a, a>
Examples
[Figure: the combined NFA and the corresponding DFA for a, abb, a*b+, abab, repeated.]
• abba: stops after the second b in {6,8}; the token is abb because it comes first in the spec
  Tokens: <abb, abb> <a, a>
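Both examples can be reproduced with a small maximal-munch scanner driven by the DFA. In the sketch below the transition table is written out by hand, DFA states are named by concatenating their NFA state numbers as on the slide, and the accepting label of {6,8} is already resolved to abb by rule order; the names `DFA`, `ACCEPT` and `scan` are assumptions.

```python
DFA = {
    ("01379", "a"): "24710", ("01379", "b"): "8",
    ("24710", "a"): "7",     ("24710", "b"): "5811",
    ("7", "a"): "7",         ("7", "b"): "8",
    ("8", "b"): "8",
    ("5811", "a"): "12",     ("5811", "b"): "68",
    ("68", "b"): "8",
    ("12", "b"): "13",
}
# Pattern reported by each accepting state (ties already broken by rule order).
ACCEPT = {"24710": "a", "8": "a*b+", "5811": "a*b+", "68": "abb", "13": "abab"}


def scan(text):
    """Split text into (pattern, lexeme) pairs using the longest-match rule."""
    tokens, pos = [], 0
    while pos < len(text):
        state, i, last = "01379", pos, None
        while i < len(text) and (state, text[i]) in DFA:   # run until stuck
            state = DFA[(state, text[i])]
            i += 1
            if state in ACCEPT:
                last = (ACCEPT[state], i)                  # remember last accepting state
        if last is None:
            raise ValueError(f"no token matches at position {pos}")
        pattern, end = last
        tokens.append((pattern, text[pos:end]))            # go back and return the token
        pos = end
    return tokens


print(scan("abaa"))
print(scan("abba"))
```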
Summary of Construction
• Describe tokens as regular expressions
  – Decide attributes (values) to save for each token
• Regular expressions are turned into a DFA
  – Also, records which attributes (values) to keep
• The lexical analyzer simulates the run of an automaton with the given transition table on any input string
A Few Remarks
• Turning an NFA into a DFA is expensive
  – Exponential in the worst case
  – But in practice, works fine
• The construction is done once per language
  – At compiler construction time
  – Not at compilation time
Implementation
Implementation by Example
if                                   { return IF; }
[a-z][a-z0-9]*                       { return ID; }
[0-9]+                               { return NUM; }
[0-9]"."[0-9]+|[0-9]*"."[0-9]+       { return REAL; }
(\-\-[a-z]*\n)|(" "|\n|\t)           { ; }
.                                    { error(); }
[Figure: the combined automaton for the rules above, with example lexemes for each rule (if; xy, i, zs98; 3, 32, 032; 0.55, 33.1; --comm\n; \n, \t, " ") and states labeled with their actions: IF, ID, NUM, REAL, error, w.s. (white space).]
int edges[][256] = { /* …, 0, 1, 2, 3, ..., -, e, f, g, h, i, j, ... */
/* state 0 */  {0, …, 0, 0, …, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0},
/* state 1 */  {13, …, 7, 7, 7, 7, …, 9, 4, 4, 4, 4, 2, 4, …, 13, 13},
/* state 2 */  {0, …, 4, 4, 4, 4, ..., 0, 4, 3, 4, 4, 4, 4, …, 0, 0},
/* state 3 */  {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0},
/* state 4 */  {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0},
/* state 5 */  {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0},
/* state 6 */  {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
/* state 7 */
/* state … */  ...
/* state 13 */ {0, …, 0, 0, 0, 0, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0}
};
Pseudo Code for Scanner
char* input = … ;

Token nextToken() {
  lastFinal = 0;
  currentState = 1;
  inputPositionAtLastFinal = input;
  currentPosition = input;
  while (not(isDead(currentState))) {
    nextState = edges[currentState][*currentPosition];
    if (isFinal(nextState)) {
      lastFinal = nextState;
      inputPositionAtLastFinal = currentPosition;
    }
    currentState = nextState;
    advance currentPosition;
  }
  input = inputPositionAtLastFinal + 1;
  return action[lastFinal];
}
Example
Input: “if  --not-a-com” (2 blanks between “if” and “--”)
[Figure: the automaton from the previous slides is shown next to each trace. The cursor markers inside the input column were lost in extraction, so each row shows the whole remaining input.]

First call:
last final   current state   input
0            1               if  --not-a-com
2            2               if  --not-a-com
3            3               if  --not-a-com
3            0               if  --not-a-com     return IF

Second call:
last final   current state   input
0            1               --not-a-com
12           12              --not-a-com
12           12              --not-a-com
12           0               --not-a-com         found white space

Third call:
last final   current state   input
0            1               --not-a-com
9            9               --not-a-com
9            10              --not-a-com
10           10              --not-a-com
10           10              --not-a-com
10           0               --not-a-com         error

Fourth call:
last final   current state   input
0            1               -not-a-com
9            9               -not-a-com
9            0               -not-a-com
9            0               -not-a-com
9            0               -not-a-com          error
Concluding remarks
• Efficient scanner
• Minimization
• Error handling
• Automatic creation of lexical analyzers
Efficient Scanners
• Efficient state representation
• Input buffering
• Using switch and gotos instead of tables
Minimization
• Create a non-deterministic automaton (NDFA) from every regular expression
• Merge all the automata using epsilon moves (like the | construction)
• Construct a deterministic finite automaton (DFA)
  – State priority
• Minimize the automaton
  – Separate accepting states by token kinds
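The last step can be sketched as partition refinement in which the initial partition groups states by token kind, so that accepting states for different tokens are never merged. This is one standard way to minimize a DFA (Moore-style refinement), not necessarily the algorithm used in the course; the example automaton, with the abstract letters "l" (letter) and "d" (digit), is hypothetical.

```python
def renumber(labels):
    """Replace arbitrary block labels by small integers."""
    ids = {}
    return {q: ids.setdefault(label, len(ids)) for q, label in labels.items()}


def minimize(states, alphabet, delta, kind):
    """Return the blocks of equivalent states; kind maps accepting states to tokens."""
    # Initial partition: one block per token kind, one for non-accepting states.
    block = renumber({q: kind.get(q, "non-accepting") for q in states})
    while True:
        # Two states stay together only if they agree on their own block and on
        # the block reached for every letter (-1 marks a missing transition).
        refined = renumber({
            q: (block[q],
                tuple(block[delta[(q, a)]] if (q, a) in delta else -1
                      for a in alphabet))
            for q in states
        })
        if len(set(refined.values())) == len(set(block.values())):
            break                              # no block was split: partition is stable
        block = refined
    groups = {}
    for q in states:
        groups.setdefault(block[q], set()).add(q)
    return sorted(sorted(g) for g in groups.values())


STATES = [0, 1, 2, 3, 4]
DELTA = {(0, "l"): 1, (0, "d"): 3,
         (1, "l"): 2, (1, "d"): 2, (2, "l"): 1, (2, "d"): 2,
         (3, "d"): 4, (4, "d"): 3}

print(minimize(STATES, "ld", DELTA, {1: "ID", 2: "ID", 3: "NUM", 4: "NUM"}))
print(minimize(STATES, "ld", DELTA, {1: "IF", 2: "ID", 3: "NUM", 4: "NUM"}))
```

In the second call states 1 and 2 have identical outgoing behavior but different token kinds, so they are kept apart.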
Example
if               { return IF; }
[a-z][a-z0-9]*   { return ID; }
[0-9]+           { return NUM; }
[Figures: automata for the rules above at successive stages of the construction, with accepting states labeled IF, ID, NUM and error. Source: Modern Compiler Implementation in ML, Andrew Appel, (c) 1998, Figures 2.7, 2.8.]
Error Handling
• Many errors cannot be identified at this stage
• Example: “fi(a==f(x))”. Should “fi” be “if”? Or is it a routine name?
  – We will discover this later in the analysis
  – At this point, we just create an identifier token
• Sometimes the lexeme does not match any pattern
  – Easiest: eliminate letters until the beginning of a legitimate lexeme
  – Alternatives: eliminate/add/replace one letter, swap two adjacent letters, etc.
• Goal: allow the compilation to continue
• Problem: errors that spread all over
Automatically generated scanners
• Use of Program-Generating Tools
  – Specification → Part of compiler
  – Compiler-Compiler
[Figure: regular expressions → JFlex → scanner; input program → scanner → stream of tokens.]
Use of Program-Generating Tools
• Input: regular expressions and actions
  • Action = Java code
• Output: a scanner program that
  • Produces a stream of tokens
  • Invokes actions when a pattern is matched
Line Counting Example
• Create a program that counts the number of lines in a given input text file
Creating a Scanner using Flex
%{
int num_lines = 0;   /* C declarations go between %{ and %} (or must be indented) */
%}
%%
\n      ++num_lines;
.       ;
%%
int main(void) {
  yylex();
  printf("# of lines = %d\n", num_lines);
  return 0;
}
[Figure: the two-state automaton behind this scanner: from the initial state, \n leads to the "newline" state and ^\n (any other character) leads to the "other" state.]
JFlex Spec File
User code: copied directly to the Java file
%%
JFlex directives: macros, state names
  e.g. DIGIT=[0-9], LETTER=[a-zA-Z], state YYINITIAL
%%
Lexical analysis rules:
  – Optional state, regular expression, action
  – How to break input into tokens
  – Action when a token is matched
  e.g. {LETTER}({LETTER}|{DIGIT})*
[Callout on the slide: “Possible source of javac errors down the road”.]
Creating a Scanner using JFlex
import java_cup.runtime.*;
%%
%cup
%{
  private int lineCounter = 0;
%}
%eofval{
  System.out.println("line number=" + lineCounter);
  return new Symbol(sym.EOF);
%eofval}
NEWLINE=\n
%%
{NEWLINE}      { lineCounter++; }
[^{NEWLINE}]   { }
Catching errors
• What if the input doesn’t match any token definition?
• Trick: add a “catch-all” rule that matches any character and reports an error
  – Add it after all other rules
A JFlex specification of C Scanner
import java_cup.runtime.*;
%%
%cup
%{
  private int lineCounter = 0;
%}
Letter= [a-zA-Z_]
Digit= [0-9]
%%
"\t"      { }
"\n"      { lineCounter++; }
";"       { return new Symbol(sym.SemiColumn); }
"++"      { return new Symbol(sym.PlusPlus); }
"+="      { return new Symbol(sym.PlusEq); }
"+"       { return new Symbol(sym.Plus); }
"while"   { return new Symbol(sym.While); }
{Letter}({Letter}|{Digit})*
          { return new Symbol(sym.Id, yytext() ); }
"<="      { return new Symbol(sym.LessOrEqual); }
"<"       { return new Symbol(sym.LessThan); }
Missing
• Creating a lexical analysis by hand
• Table compression
• Symbol Tables
• Nested Comments
• Handling Macros
Lexical Analysis: What
• Input: program text (file)
• Output: sequence of tokens
Lexical Analysis: How
• Define tokens using regular expressions
• Construct a nondeterministic finite-state automaton (NFA) from the regular expression
• Determinize the NFA into a deterministic finite-state automaton (DFA)
• The DFA can be directly used to identify tokens
Lexical Analysis: Why
• Read input file
• Identify language keywords and standard identifiers
• Handle include files and macros
• Count line numbers
• Remove whitespaces
• Report illegal symbols
• [Produce symbol table]
The Real Anatomy of a Compiler
[Figure: Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions (SI) → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe). The Lexical Analysis and Syntax Analysis stages are highlighted.]