CHAPTER 5 Compiler
5.1 Basic Compiler Concepts
Source program
-> Lexical analysis -> Token
-> Syntax analysis -> Parse tree
-> Intermediate code generation -> Intermediate code
-> Code optimization -> Intermediate code
-> Code generation -> Machine code
Table management and error handling support the phases above.
Figure: Functions performed by a compiler
1. Lexical Analysis (Lexical Analyzer or Scanner)
Read the source program one character at a time, carving the source program into a sequence of atomic units called tokens.
Token = (token type, token value)
PROGRAM MAIN;
VARIABLE INTEGER: U, V, M;
U = 5;
V = 7;
CALL S1(U, V, M);
ENP;
SUBROUTINE S1(INTEGER: X, Y, M);
M = X + Y + 2.7;
ENS;
A program written in the FRANCIS language
PROGRAM MAIN ;
    (2,21) (5,3) (1,1)
VARIABLE INTEGER : U , V , M ;
    (2,25) (2,14) (1,12) (5,1) (1,11) (5,5) (1,11) (5,6) (1,1)
U = 5 ;
    (5,1) (1,4) (3,1) (1,1)
V = 7 ;
    (5,5) (1,4) (3,2) (1,1)
CALL S1 ( U , V , M ) ;
    (2,3) (5,10) (1,2) (5,1) (1,11) (5,5) (1,11) (5,6) (1,3) (1,1)
ENP ;
    (2,6) (1,1)
SUBROUTINE S1 ( INTEGER : X , Y , M ) ;
    (2,23) (5,10) (1,2) (2,14) (1,12) (5,8) (1,11) (5,4) (1,11) (5,9) (1,3) (1,1)
M = X + Y + 2.7 ;
    (5,9) (1,4) (5,8) (1,5) (5,4) (1,5) (4,1) (1,1)
ENS ;
    (2,7) (1,1)
The FRANCIS program converted into token form
2. Syntax Analysis (Syntax Analyzer or Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
<id-list> ::= id | <id-list>,id
<assign> ::= id:=<exp>
<exp> ::= <term> | <exp>+<term> | <exp>-<term>
<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>
<factor> ::= id | int | (<exp>)
<read> ::= READ(<id-list>)
<write> ::= WRITE(<id-list>)
Partial grammar of the PASCAL language
Parse Tree
Parse tree for the statement READ(VALUE):
<read>
  READ
  (
  <id-list>
    id {VALUE}
  )
Parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN:
<assign>
  id {VARIANCE}
  :=
  <exp>
    <exp>
      <term>
        <term>
          <factor>
            id {SUMSQ}
        DIV
        <factor>
          int {100}
    -
    <term>
      <term>
        <factor>
          id {MEAN}
      *
      <factor>
        id {MEAN}
Syntax Error
The statement A + / B has no parse tree: A and B can each be reduced to <factor>, but no rule of the grammar allows one operator to follow another directly, so the tree cannot be completed.
3. Intermediate Code Generation
Three Address Code
(operator, operand1, operand2, Result)
A = B + C           (+, B, C, A)
SUM := A / B * C can be decomposed into:
T1 = A / B          (/, A, B, T1)
T2 = T1 * C         (*, T1, C, T2)
SUM = T2            (=, T2,  , SUM)
Parse tree for the statement SUM := A / B * C:
<assign>
  id {SUM}
  :=
  <exp>
    <term>
      <term>
        <term>
          <factor>
            id {A}
        DIV
        <factor>
          id {B}
      *
      <factor>
        id {C}
4. Code Optimization
Improve the intermediate code (or machine code) so that the final object program runs faster and/or takes less space.
Before optimization:
FOR I := 1 TO 10 DO
begin
  A := 10;
  B[I+1] := C[I+1] + A;
end
After optimization:
A := 10;
FOR I := 1 TO 10 DO
begin
  J := I + 1;
  B[J] := C[J] + A;
end
5. Code Generation
* Allocate memory locations
* Select machine code for each intermediate code
* Register allocation: utilize registers as efficiently as possible
From (+, B, C, A) we can obtain:
MOV AX,B
ADD AX,C
MOV A,AX
SUM := A / B * C
(/, A, B, T1)       MOV AX,A
                    DIV B
                    MOV T1,AX
(*, T1, C, T2)      MOV AX,T1
                    MUL C
                    MOV T2,AX
(=, T2,  , SUM)     MOV AX,T2
                    MOV SUM,AX
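The mapping from quadruples to instructions shown above can be sketched in a few lines of Python. This is only an illustration of the slide's simplified instruction patterns; the names `OPCODES` and `gen_code` are hypothetical. On a real 8086, a 16-bit DIV also involves the DX register, which the slides leave out.

```python
# Instruction template for the second operand of each operator,
# following the simplified patterns used in the slides.
OPCODES = {"+": "ADD AX,{}", "-": "SUB AX,{}", "*": "MUL {}", "/": "DIV {}"}

def gen_code(quads):
    """Translate (operator, operand1, operand2, result) quadruples
    into a list of assembly-like instruction strings."""
    lines = []
    for op, a, b, result in quads:
        lines.append(f"MOV AX,{a}")              # load the first operand
        if op != "=":                            # "=" is a plain copy
            lines.append(OPCODES[op].format(b))  # apply the operator
        lines.append(f"MOV {result},AX")         # store the result
    return lines

quads = [("/", "A", "B", "T1"), ("*", "T1", "C", "T2"), ("=", "T2", "", "SUM")]
for line in gen_code(quads):
    print(line)
```

Each quadruple is translated in isolation, which is why the output reloads a temporary right after storing it.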
Code optimization is then applied once more to the generated code for SUM := A / B * C.
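The slide does not say which instructions this second optimization removes. One plausible reading is a peephole pass that deletes a store to a temporary followed immediately by a reload of the same temporary. The sketch below assumes that reading, and assumes the caller supplies the set of compiler-generated temporaries; the name `peephole` is hypothetical.

```python
def peephole(lines, temps):
    """Remove 'MOV X,AX' / 'MOV AX,X' pairs.

    The reload is always redundant (AX already holds X). The store is
    dropped too when X is a temporary that is never mentioned again.
    """
    out = []
    i = 0
    while i < len(lines):
        cur = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if cur.startswith("MOV ") and cur.endswith(",AX"):
            name = cur[4:-3]                      # the X in 'MOV X,AX'
            if nxt == f"MOV AX,{name}":
                used_later = any(
                    name in later.replace(",", " ").split()
                    for later in lines[i + 2:]
                )
                if not (name in temps and not used_later):
                    out.append(cur)               # the store is still needed
                i += 2                            # skip the reload either way
                continue
        out.append(cur)
        i += 1
    return out

code = ["MOV AX,A", "DIV B", "MOV T1,AX", "MOV AX,T1",
        "MUL C", "MOV T2,AX", "MOV AX,T2", "MOV SUM,AX"]
print(peephole(code, temps={"T1", "T2"}))
```

With T1 and T2 declared as temporaries, the eight instructions shrink to four.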
6. Table Management and Error Handling
Token, symbol table, reserved word table, delimiter table, constant table, etc.
* If each of the five major functions is performed in its own pass, the compiler makes five passes.
* Several functions can also be combined into a single pass.
* At least two passes are required.
5.2 Grammar
1. Grammar
Backus Naur Form (BNF)
A grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
<id-list> ::= id | <id-list>,id
<assign> ::= id:=<exp>
<exp> ::= <term> | <exp>+<term> | <exp>-<term>
<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>
<factor> ::= id | int | (<exp>)
<read> ::= READ(<id-list>)
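<write> ::= WRITE(<id-list>)
Partial grammar of the PASCAL language
Names enclosed in angle brackets, such as <exp>, are non-terminal symbols; the other symbols, such as id, READ and +, are terminal symbols.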
2. Parse Tree (Syntax Tree)
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree, as in the parse tree for READ(VALUE) in Section 5.1.
3. Precedence and associativity
Precedence: *, / > +, -
Left associativity: a + b + c is grouped as ((a + b) + c)
Right associativity: a + b + c would be grouped as (a + (b + c))
4. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
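Ambiguous Grammar
<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>
The statement id + id - id has two parse trees under this grammar.
Tree 1, grouping (id + id) - id:
<start>
  <term>
    <term>
      <term> id
      +
      <term> id
    -
    <term> id
Tree 2, grouping id + (id - id):
<start>
  <term>
    <term> id
    +
    <term>
      <term> id
      -
      <term> id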
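To make the ambiguity concrete, the following sketch counts how many distinct parse trees the grammar `<term> ::= id | <term>+<term> | <term>-<term>` admits for a token sequence. The function name `count_parse_trees` and the list-of-strings token format are assumptions made for this illustration.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from <term>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <term>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "-"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id - id".split()))
```

Any count greater than 1 means the statement has more than one parse tree, which is the definition of ambiguity given above.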
5.3 Lexical Analysis
A program contains the following kinds of tokens:
a. Identifier
b. Delimiter
c. Reserved Word
d. Constant: integer, float, string
1. Identifier
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | ...
<digit> ::= 0 | 1 | 2 | ...
An identifier is a multiple-character token.
2. Token and Tables
Table 1 (Delimiter Table)
1   ;
2   (
3   )
4   =
5   +
6   -
7   *
8   /
9
10  ‘
11  ,
12  :
Table 2 (Reserved Word Table)
 1. AND         2. BOOLEAN     3. CALL         4. DIMENSION    5. ELSE
 6. ENP         7. ENS         8. EQ           9. GE          10. GT
11. GTO        12. IF         13. INPUT       14. INTEGER     15. LABEL
16. LE         17. LT         18. NE          19. OR          20. OUTPUT
21. PROGRAM    22. REAL       23. SUBROUTINE  24. THEN        25. VARIABLE
1 5
2 7
Table 3 (Integer Table)
1 2.7
Table 4 (Real Number Table)
Table 5 (Identifier Table)
Entry  Identifier  Subroutine  Type  Pointer
1      U           3
2
3      MAIN
4      Y           10
5      V           3
6      M           3
7
8      X           10
9      M           10
10     S1
Token Specifier: (Token Type, Token Value)
The token type is the number of the table and the token value is the entry within that table. For example, (2,21) is entry 21 of Table 2, the reserved word PROGRAM, and (1,1) is entry 1 of Table 1, the delimiter ;.
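A minimal scanner that emits (token type, token value) pairs might look like the sketch below. It uses the legible entries of Table 1 and all of Table 2. One deliberate difference from the slides: integers, reals and identifiers are numbered in order of first appearance, whereas Table 5 in the slides appears to be hashed, so identifier entry numbers will not match. The names `scan`, `DELIMITERS`, `RESERVED` and `TOKEN_RE` are hypothetical.

```python
import re

# Table 1 entries that are legible in the slides (9 and 10 are left out).
DELIMITERS = {";": 1, "(": 2, ")": 3, "=": 4, "+": 5, "-": 6,
              "*": 7, "/": 8, ",": 11, ":": 12}
# Table 2: reserved words, numbered from 1 in the order listed.
RESERVED = ("AND BOOLEAN CALL DIMENSION ELSE ENP ENS EQ GE GT GTO IF INPUT "
            "INTEGER LABEL LE LT NE OR OUTPUT PROGRAM REAL SUBROUTINE THEN "
            "VARIABLE").split()
# Real number, integer, word, or any single non-blank character.
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z][A-Za-z0-9]*|\S)")

def scan(source):
    """Return (tokens, tables); tables 3, 4, 5 hold integers, reals, identifiers."""
    tables = {3: [], 4: [], 5: []}

    def entry(kind, text):
        # Assumption: sequential numbering, not the hashing used in Table 5.
        if text not in tables[kind]:
            tables[kind].append(text)
        return tables[kind].index(text) + 1

    tokens = []
    for text in TOKEN_RE.findall(source):
        if text in DELIMITERS:
            tokens.append((1, DELIMITERS[text]))
        elif text.upper() in RESERVED:
            tokens.append((2, RESERVED.index(text.upper()) + 1))
        elif text.isdigit():
            tokens.append((3, entry(3, text)))
        elif text[0].isdigit():
            tokens.append((4, entry(4, text)))
        elif text[0].isalpha():
            tokens.append((5, entry(5, text)))
        else:
            raise ValueError(f"unknown character {text!r}")
    return tokens, tables

tokens, tables = scan("VARIABLE INTEGER: U, V, M;")
print(tokens)
```

The delimiter and reserved-word specifiers agree with the slides; only the identifier entry numbers differ, for the reason stated above.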
5.4 Syntax Analysis
1. Building the Parse Tree
a. Top down method
Begin with the rule of the grammar that defines the root of the tree, and attempt to construct the tree so that the terminal nodes match the statement being analyzed.
b. Bottom up method
Begin with the terminal nodes of the tree, and attempt to combine these into successively higher-level nodes until the root is reached.
* Top down method: for the statement id + id - id, the tree is built from <start> down through <term> nodes toward the tokens.
* Bottom up method: for the same statement, each id is first reduced to <term>, and the <term> nodes are then combined upward until the root is reached.
2. Operator Precedence Parser (Bottom up parser)
Precedence Matrix
Column order: READ  ;  :=  +  -  (  )  id
Row     Relations, left to right (empty cells omitted)
READ    =
;       <  >  <
:=      >  <  <  <  <
+       >  >  >  <  >  <
-       >  >  >  <  >  <
(       <  <  <  =  <
)       >  >  >
id      >  >  >  >
Parsing READ(id) with the precedence matrix:
Stack                        Input
<                            READ ( id ) ;
< READ                       ( id )
< READ = (                   id )
< READ = ( < id              )
< READ = ( < id >            )
< READ = ( = id-list         )
< READ = ( = id-list ) >
read
The result corresponds to the parse tree for READ(VALUE).
Parsing id + id - id with the grammar
<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>
Stack                  Input
<                      id + id - id
< id                   + id - id
< id >                 + id - id
< term                 + id - id
< term + < id >        - id
< term + term >        - id
< term - < id
< term - < id >
< term - term >
term
A stack is generally used to save tokens that have been scanned but not yet parsed.
3. Recursive Descent Parser (Top down method)
a. Leftmost derivation
It must be possible to decide which alternative to use by examining the next input token.
For <stmt>, the next token (id, READ or WRITE) selects <assign>, <read> or <write>.
<stmt> ::= <assign> | <read> | <write>
<id-list> ::= id | <id-list>,id
<assign> ::= id:=<exp>
<exp> ::= <term> | <exp>+<term> | <exp>-<term>
<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>
<read> ::= READ(<id-list>)
<write> ::= WRITE(<id-list>)
Partial grammar of the PASCAL language
b. Left recursion
A top-down parser cannot be used with a grammar that contains left recursion, because the parser is unable to decide between the alternatives by examining the next token. In <id-list> ::= id | <id-list>,id both alternatives, id and <id-list>, can begin with id.
Grammar modified for a recursive descent parser ({ } denotes zero or more repetitions):
<id-list> ::= id {, id}
<assign> ::= id:=<exp>
<exp> ::= <term> { +<term> | -<term> }
<term> ::= <factor> { *<factor> | DIV<factor> }
<factor> ::= id | int | (<exp>)
<read> ::= READ(<id-list>)
<write> ::= WRITE(<id-list>)
Partial grammar of the PASCAL language
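A recursive descent parser for the modified grammar can be sketched with one method per non-terminal, where each `{ ... }` repetition becomes a `while` loop. The class name `Parser`, the end marker `"$"`, the tuple-based tree and the pre-tokenized input (a list of strings such as `"id"`, `":="`, `"DIV"`) are all assumptions made for this illustration.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" is a hypothetical end marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def stmt(self):                 # <stmt> ::= <assign> | <read> | <write>
        tok = self.peek()           # the next token selects the alternative
        if tok == "id":
            node = self.assign()
        elif tok in ("READ", "WRITE"):
            node = self.read_or_write(tok)
        else:
            raise SyntaxError(f"unexpected {tok!r}")
        self.expect("$")
        return node

    def assign(self):               # <assign> ::= id := <exp>
        self.expect("id")
        self.expect(":=")
        return (":=", "id", self.exp())

    def read_or_write(self, word):  # READ(<id-list>) or WRITE(<id-list>)
        self.expect(word)
        self.expect("(")
        count = self.id_list()
        self.expect(")")
        return (word, count)

    def id_list(self):              # <id-list> ::= id {, id}
        self.expect("id")
        count = 1
        while self.peek() == ",":
            self.expect(",")
            self.expect("id")
            count += 1
        return count

    def exp(self):                  # <exp> ::= <term> { +<term> | -<term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):                 # <term> ::= <factor> { *<factor> | DIV<factor> }
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):               # <factor> ::= id | int | (<exp>)
        tok = self.peek()
        if tok in ("id", "int"):
            self.pos += 1
            return tok
        self.expect("(")
        node = self.exp()
        self.expect(")")
        return node

print(Parser("id := id DIV int - id * id".split()).stmt())
```

Because the loops fold operands from left to right, the resulting tree is left-associative even though the grammar no longer contains left recursion.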
5.5 Code Generation
When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding routine is executed. These routines are called semantic routines or code generation routines.
1. Operator precedence parser: the routine is called when a sub-string is reduced to a nonterminal.
2. Recursive descent parser: the routine is called when a procedure returns to its caller, indicating success.
<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>
Code generation routines for this grammar, as applied to id + id - id:
<term> ::= <term>1 + <term>2
    MOV AX, <term>1
    ADD AX, <term>2
    MOV <term>, AX
<term> ::= <term>1 - <term>2
    MOV AX, <term>1
    SUB AX, <term>2
    MOV <term>, AX
<term> ::= id
    add id to <term>
Generating assembly instructions or machine code directly is too fine-grained, so the program is first translated into an intermediate form.
5.6 Intermediate Form
Three Address Code (Quadruple Form)
(operator, operand1, operand2, Result)
<term> ::= <term>1 + <term>2
(+, <term>1, <term>2, <term>)
<term> ::= <term>1 - <term>2
(-, <term>1, <term>2, <term>)
<term> ::= id
add id to <term>
variance := sumsq DIV 100 - mean * mean is translated into:
(DIV, sumsq, #100, i1)
(*, mean, mean, i2)
(-, i1, i2, i3)
(:=, i3, , variance)
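The quadruples above can be produced by a post-order walk over the expression tree. The sketch below assumes the tree is given as nested tuples `(operator, left, right)`, with Python integers standing for integer constants (rendered as `#100`). The name `gen_quads` and the tree encoding are hypothetical.

```python
from itertools import count

def gen_quads(assign):
    """Turn (':=', target, expression_tree) into a list of quadruples."""
    quads = []
    temp_numbers = count(1)

    def operand(node):
        if isinstance(node, tuple):          # interior node: emit a quadruple
            op, left, right = node
            a = operand(left)                # operands are generated first
            b = operand(right)
            temp = f"i{next(temp_numbers)}"  # new intermediate result
            quads.append((op, a, b, temp))
            return temp
        if isinstance(node, int):            # integer constant
            return f"#{node}"
        return node                          # identifier

    _, target, expr = assign
    quads.append((":=", operand(expr), "", target))
    return quads

tree = (":=", "variance", ("-", ("DIV", "sumsq", 100), ("*", "mean", "mean")))
for quad in gen_quads(tree):
    print(quad)
```

The temporaries i1, i2 and i3 come out in the same order as in the slide because the left operand is always generated before the right one.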
5.7 Machine Independent Compiler Features
1. Storage Allocation
* Static Allocation: allocate at compile time
* Dynamic Allocation: allocate at run time
    Auto: function calls (STACK)
    Controlled: malloc( ), free( ) (HEAP)
2. Activation Record
Each function call creates an activation record that contains storage for all the variables used by the function, the return address, etc.
Each activation record on the stack holds: Variables, Return Address, Next, Previous.
Figure: activation records on the stack.
(a) MAIN is running: the stack holds one record for MAIN, whose return address leads back to the OS.
(b) MAIN calls SUB: a record for SUB is pushed on top of MAIN's record; its return address is in MAIN.
(c) SUB calls SUB recursively: a second SUB record is pushed; its return address is in SUB.
3. Prologue and Epilogue
The compiler must generate additional code to manage the activation records themselves.
a. Prologue
The code to create a new activation record
b. Epilogue
The code to delete the current activation record
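The effect of the prologue and epilogue can be simulated with a linked stack of records. This is a sketch, not generated compiler code: the class names `ActivationRecord` and `RuntimeStack` are hypothetical, return addresses are represented by plain strings, and only the Previous link is modeled (the Next link from the slides is omitted).

```python
class ActivationRecord:
    def __init__(self, name, return_address, previous):
        self.name = name
        self.variables = {}              # storage for the function's variables
        self.return_address = return_address
        self.previous = previous         # link to the caller's record

class RuntimeStack:
    def __init__(self):
        self.top = None

    def prologue(self, name, return_address):
        """Create a new activation record on entry to a function."""
        self.top = ActivationRecord(name, return_address, self.top)
        return self.top

    def epilogue(self):
        """Delete the current record on exit; return where to continue."""
        record = self.top
        self.top = record.previous
        return record.return_address

    def names(self):
        """Names of the active functions, most recent call first."""
        result, record = [], self.top
        while record is not None:
            result.append(record.name)
            record = record.previous
        return result

stack = RuntimeStack()
stack.prologue("MAIN", "OS")     # MAIN is started by the OS
stack.prologue("SUB", "MAIN")    # MAIN calls SUB
stack.prologue("SUB", "SUB")     # SUB calls itself recursively
print(stack.names())
```

Each recursive call gets its own record, so the two SUB activations keep separate variables and separate return addresses.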
4. Structure Variables
Array, Record, String, Set, ...
B: array[0..3, 0..1] of integer
B[0][0] B[0][1]
B[1][0] B[1][1]
B[2][0] B[2][1]
B[3][0] B[3][1]
Row-major order:
B[0][0] B[0][1] B[1][0] B[1][1] B[2][0] B[2][1] B[3][0] B[3][1]
Column-major order:
B[0][0] B[1][0] B[2][0] B[3][0] B[0][1] B[1][1] B[2][1] B[3][1]
Type B[a..b][c..d]
Address of B[s][t]
Row Major:
[(s - a) * (d - c + 1) + (t - c)] * sizeof(Type) + Base address
Column Major:
[(t - c) * (b - a + 1) + (s - a)] * sizeof(Type) + Base address
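The two address formulas can be checked against the array B: array[0..3, 0..1]. In the sketch below the function names are hypothetical, and an element size of 1 with base address 0 is assumed so that the result is simply the element's position in storage order.

```python
def row_major(s, t, a, b, c, d, size, base=0):
    # b is unused in row-major order; it is kept so both functions
    # take the same bounds (a..b for rows, c..d for columns).
    return ((s - a) * (d - c + 1) + (t - c)) * size + base

def column_major(s, t, a, b, c, d, size, base=0):
    # d is unused in column-major order.
    return ((t - c) * (b - a + 1) + (s - a)) * size + base

# B: array[0..3, 0..1], so a=0, b=3, c=0, d=1.
print(row_major(2, 1, 0, 3, 0, 1, size=1))     # position of B[2][1]
print(column_major(2, 1, 0, 3, 0, 1, size=1))  # position of B[2][1]
```

B[2][1] is the sixth element (offset 5) in row-major order and the seventh (offset 6) in column-major order, which agrees with the two storage listings for B.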
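5. Code Optimization
Before optimization:
For I := 1 to 10 do
Begin
  x[I, 2*J-1] := T[I, 2*J];
  Table[I] := 2**I;
END
After optimization:
T1 := 2*J;
T2 := T1 - 1;
K := 1;
For I := 1 to 10 do
Begin
  x[I, T2] := T[I, T1];
  K := K * 2;
  Table[I] := K;
END
a. Common sub-expressions: 2*J is computed once, as T1
b. Loop invariants: T1 and T2 are computed outside the loop
c. Reduction in strength: 2**I is replaced by the repeated multiplication K := K * 2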
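A quick way to convince oneself that the three transformations preserve the program's meaning is to run both versions and compare the results. The Python transcription below is a sketch: arrays are modeled as dictionaries keyed by index tuples, and the function names are hypothetical.

```python
def unoptimized(T, J):
    x, table = {}, {}
    for i in range(1, 11):
        x[i, 2 * J - 1] = T[i, 2 * J]   # 2*J is recomputed on every iteration
        table[i] = 2 ** i               # exponentiation on every iteration
    return x, table

def optimized(T, J):
    x, table = {}, {}
    t1 = 2 * J                          # common sub-expression, hoisted
    t2 = t1 - 1                         # loop invariant, hoisted
    k = 1
    for i in range(1, 11):
        x[i, t2] = T[i, t1]
        k = k * 2                       # strength reduction: multiply, not power
        table[i] = k
    return x, table

J = 3
T = {(i, 2 * J): i * 10 for i in range(1, 11)}   # sample data for the test
print(unoptimized(T, J) == optimized(T, J))
```

The optimized loop body contains no multiplication by J and no exponentiation, yet both functions return identical arrays.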
5.8 Compiler Design Option
1. Interpreter
An interpreter processes a source program written
in a high level language, just as a compiler does.
The main difference is that interpreters execute a
version of the source directly.
An interpreter can be viewed as a set of functions whose execution is driven by the internal form of the program.
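As an illustration of "a set of functions driven by the internal form", the sketch below interprets quadruples directly instead of translating them to machine code. It assumes the quadruple form from Section 5.6 as the internal form, uses a dictionary as memory, and maps DIV to Python's floor division, which matches Pascal's DIV only for non-negative operands. The names `OPERATIONS` and `interpret` are hypothetical.

```python
import operator

# One function per operator; the quadruple's operator field selects it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "DIV": operator.floordiv,   # assumption: operands are non-negative
}

def interpret(quads, memory):
    """Execute quadruples in order, updating and returning `memory`."""
    def value(operand):
        if operand.startswith("#"):      # "#100" is the constant 100
            return int(operand[1:])
        return memory[operand]

    for op, a, b, result in quads:
        if op == ":=":
            memory[result] = value(a)
        else:
            memory[result] = OPERATIONS[op](value(a), value(b))
    return memory

quads = [
    ("DIV", "sumsq", "#100", "i1"),
    ("*", "mean", "mean", "i2"),
    ("-", "i1", "i2", "i3"),
    (":=", "i3", "", "variance"),
]
memory = interpret(quads, {"sumsq": 5000, "mean": 5})
print(memory["variance"])
```

Supporting a new operator only requires adding one entry to the dispatch table, which is the sense in which the internal form drives the functions.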
2. P Code Compiler
* P-code is byte code, a machine-independent language.
* It is cross-platform and can be executed on different kinds of computers.
Source Program -> Java Interpreter -> Byte Code
Byte Code -> Java Run Module -> Run
3. Compiler-Compiler
A software tool that can be used to help in the task of compiler construction.
Uses finite state automata.
Unix examples:
YACC: parser generator
LEX: scanner generator
4. Cross Compiler
On the workstation: Source Program -> Cross Compiler -> 80XX Machine Code
On the personal computer: 80XX Machine Code -> Run