LAB # 4: SYNTAX ANALYSIS (PARSING) COMPILER ENGINEERING
Jun 27, 2015
Department of Computer Science - Compiler Engineering Lab
LEXICAL ANALYSIS SUMMARY
1. Start a new token
2. Read the 1st character to start recognizing its type, according to the algorithm specified in slide 3
3. Pass the token (lexeme type) and its value attribute to the parser
4. Repeat steps 1-3
5. Repeat until the end
10-14/3/12
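Token recognition flowchart:
Start New TOKEN → Read 1st Character, then branch on its type:
• Is it a digit? → TOKEN = NUM
• Is it a letter? → Read Following Characters
  • If any is a digit or _ → TOKEN = ID
  • If all are letters → Is it a keyword? If so, TOKEN = KEYWORD; otherwise TOKEN = ID
• Is it a RELOP (>, <, !, =)? → Is the 2nd char (=)? → TOKEN = RELOP
• Is it an AROP (+, -, /, *, =)? → TOKEN = AROP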
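The flowchart can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the keyword set, the function names `next_token` and `tokenize`, and the choice to treat a lone `=` as an AROP (assignment) and a lone `!` as illegal are not taken from the slides.

```python
# Assumed keyword set, for illustration only.
KEYWORDS = {"if", "else", "while", "return", "int"}
RELOP_START = set("><!=")
AROP_CHARS = set("+-/*=")


def next_token(text, pos):
    """Return (token_type, lexeme, new_pos) for the token starting at pos."""
    ch = text[pos]
    if ch.isdigit():
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return "NUM", text[pos:end], end
    if ch.isalpha():
        end = pos
        while end < len(text) and (text[end].isalnum() or text[end] == "_"):
            end += 1
        lexeme = text[pos:end]
        # Only an all-letter lexeme can be a keyword; anything else is an ID.
        if lexeme.isalpha() and lexeme in KEYWORDS:
            return "KEYWORD", lexeme, end
        return "ID", lexeme, end
    if ch in RELOP_START:
        # "Is 2nd char (=)?": >=, <=, != and == are two-character RELOPs.
        if pos + 1 < len(text) and text[pos + 1] == "=":
            return "RELOP", text[pos:pos + 2], pos + 2
        if ch in "<>":
            return "RELOP", ch, pos + 1
        # Assumption: a lone '=' falls through to AROP, a lone '!' is illegal.
    if ch in AROP_CHARS:
        return "AROP", ch, pos + 1
    raise ValueError(f"illegal character {ch!r} at position {pos}")


def tokenize(text):
    """Repeat token recognition until the end of the input, skipping spaces."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        kind, lexeme, pos = next_token(text, pos)
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("while x1 <= 60"))
# [('KEYWORD', 'while'), ('ID', 'x1'), ('RELOP', '<='), ('NUM', '60')]
```

Each `(type, lexeme)` pair corresponds to the token and value attribute that step 3 of the summary passes to the parser.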
SYNTAX ANALYSIS (PARSING)
• Is the process of analyzing a text, made of a sequence of tokens, to determine its grammatical structure with respect to a given (more or less) formal grammar
• Builds an Abstract Syntax Tree (AST)
• Is part of an interpreter or a compiler
• Creates some form of Internal Representation (IR)
• Programming languages tend to be specified by context-free grammars, because efficient and fast parsers can be written for them
PHASE 2 : SYNTAX ANALYSIS
• Sometimes also called Syntax Checking
• Ensures that:
  • the code is grammatically valid (without worrying about the meaning)
  • and will sequence into an executable program
• The syntactic analyzer applies rules to the code, for example:
  • checking that each opening brace has a corresponding closing brace,
  • that each declaration has a type,
  • and that the type exists, etc.
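The first rule above (every opening brace has a matching closing brace) can be checked with a stack. The sketch below is a minimal illustration, assuming the tokens arrive as a list of strings; the function name `delimiters_balanced` is hypothetical, and parentheses are included only to show how the idea generalizes.

```python
# Maps each closing delimiter to the opening delimiter it must match.
PAIRS = {"}": "{", ")": "("}


def delimiters_balanced(tokens):
    """Return True if every opening delimiter is closed in the right order."""
    stack = []
    for tok in tokens:
        if tok in PAIRS.values():
            stack.append(tok)          # remember the opening delimiter
        elif tok in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[tok]:
                return False
    return not stack                   # leftovers mean an unclosed opener


print(delimiters_balanced(["int", "main", "(", ")", "{", "return", "0", ";", "}"]))  # True
print(delimiters_balanced(["int", "main", "(", ")", "{", "return", "0", ";"]))       # False
```

A full parser gets this check for free from its grammar rules; the stack version only isolates the idea.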
CONTEXT-FREE GRAMMAR
• Defines the components that form an expression and the order in which they must appear
• A context-free grammar is a set of rules specifying how syntactic elements in some language can be formed from simpler ones
• The grammar specifies allowable ways to combine tokens (called terminals) into higher-level syntactic elements (called non-terminals)
CONTEXT-FREE GRAMMAR
• Ex.:
  • Any ID is an expression (preferred to say TOKEN)
  • Any Number is an expression (preferred to say TOKEN)
  • If Expr1 and Expr2 are expressions, then:
    • Expr1 + Expr2 is an expression
    • Expr1 * Expr2 is an expression
  • If Id1 is an identifier and Expr2 is an expression, then:
    • Id1 = Expr2 is a statement
  • If Expr1 is an expression and Statement2 is a statement, then:
    • While (Expr1) do Statement2
    • If (Expr1) then Statement2
    are statements
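The rules above can be turned into a small recursive-descent parser. The sketch below is one possible reading, not the lab's reference solution: the rules as stated are ambiguous, so it assumes that `*` binds tighter than `+` and that both are left-associative. The class name `Parser`, the helper `parse`, and the tuple-based tree are illustrative choices, and tokens are assumed to be pre-split strings.

```python
KEYWORDS = {"while", "do", "if", "then"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r}, found {self.peek()!r}")
        self.pos += 1

    def statement(self):
        tok = self.peek()
        if tok in ("while", "if"):
            # While (Expr) do Statement  |  If (Expr) then Statement
            self.pos += 1
            self.expect("(")
            cond = self.expression()
            self.expect(")")
            self.expect("do" if tok == "while" else "then")
            return (tok, cond, self.statement())
        # Id = Expr
        target = self.identifier()
        self.expect("=")
        return ("=", target, self.expression())

    def expression(self):
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return ("NUM", tok)
        return self.identifier()

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in KEYWORDS:
            raise SyntaxError(f"expected an identifier, found {tok!r}")
        self.pos += 1
        return ("ID", tok)


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("position = initial + rate * 60".split()))
# ('=', ('ID', 'position'), ('+', ('ID', 'initial'), ('*', ('ID', 'rate'), ('NUM', '60'))))
```

Tokens (ID, NUM) end up as leaves and operators as interior nodes, which is the terminal/non-terminal split the grammar describes.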
GRAMMAR & AST
TOKEN (terminals) = Leaf
Expressions, Statements (Non-Terminals) = Nodes
Stream of Characters → Lexical Analysis → Stream of TOKENs → Syntax Analysis → Abstract Syntax Tree (AST)
PHASE 2 : SYNTAX ANALYSIS
[Figure: a stream of Tokens passed to the Syntax Analyzer (Parser)]
SYMBOL TABLE
• A Symbol Table (ST) is a data structure containing a record for each identifier, with fields for the attributes of the ID
• Tokens formed are recorded in the ST
• Purpose:
  • To analyze expressions and statements, which requires a hierarchical or nesting structure
  • The data structure lets us find, retrieve, and store a record for an ID quickly
  • For example: the Semantic Analysis and Code Generation phases retrieve an ID's type for type checking and implementation purposes
SYMBOL TABLE MANAGEMENT
• The Symbol Table may contain any of the following information:
  • For an ID:
    • The storage allocated for the ID
    • Its type
    • Its scope (where it is valid in the program)
  • For a function, also:
    • Number of arguments
    • Types of arguments
    • Passing method (by reference or by value)
    • Return type
• Identifiers are added if they don't already exist
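A symbol table with these fields could look roughly like the sketch below. The names `SymbolRecord`, `SymbolTable`, `insert` and `lookup` are hypothetical, and a single flat dictionary is assumed (no nested scopes), which is simpler than what a real compiler needs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SymbolRecord:
    name: str
    type: Optional[str] = None          # may be filled in by a later phase
    scope: Optional[str] = None         # where the ID is valid
    storage: Optional[int] = None       # assigned during code generation
    # Extra attributes used only for functions:
    arg_types: List[str] = field(default_factory=list)  # count is len(arg_types)
    passing: Optional[str] = None       # "by reference" or "by value"
    return_type: Optional[str] = None


class SymbolTable:
    def __init__(self):
        self._records = {}              # a dict gives fast find/retrieve/store

    def insert(self, name, **attrs):
        """Add the identifier only if it does not already exist."""
        if name not in self._records:
            self._records[name] = SymbolRecord(name, **attrs)
        return self._records[name]

    def lookup(self, name):
        """Return the record for name, or None if it was never inserted."""
        return self._records.get(name)


table = SymbolTable()
table.insert("rate", type="float", scope="main")
table.insert("rate", type="int")        # already present: record is unchanged
print(table.lookup("rate").type)        # float
```

Using a hash-based dictionary is one way to meet the "find, retrieve, store quickly" requirement; other structures would work too.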
SYMBOL TABLE MANAGEMENT
• Not all attributes can always be determined by the lexical analyzer, because of its linear nature
  • E.g. dim a, x as integer
  • In this example, when the analyzer sees the IDs it has not yet reached the type keyword
• So the following phases complete the IDs' attributes, and use them as well
  • For example: the storage location attribute is assigned by the code generation phase
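The `dim a, x as integer` case can be sketched as two steps: record each ID with an unknown type when it is first seen, then fill the type in once `as <type>` has been read. The function names and the dictionary layout below are assumptions made for illustration.

```python
def record_identifier(symbols, name):
    """Lexer-time action: add the ID if missing; its type is not known yet."""
    symbols.setdefault(name, {"type": None, "storage": None})


def process_declaration(tokens, symbols):
    """Handle `dim <id> {, <id>} as <type>` given as a list of token strings."""
    if len(tokens) < 4 or tokens[0] != "dim":
        raise SyntaxError("expected: dim <id> {, <id>} as <type>")
    pending = []
    pos = 1
    while pos < len(tokens) and tokens[pos] != "as":
        if tokens[pos] != ",":
            record_identifier(symbols, tokens[pos])
            pending.append(tokens[pos])   # type still unknown at this point
        pos += 1
    if pos + 1 >= len(tokens):
        raise SyntaxError("missing 'as <type>'")
    for name in pending:
        symbols[name]["type"] = tokens[pos + 1]   # back-fill the type
    return symbols


symbols = process_declaration(["dim", "a", ",", "x", "as", "integer"], {})
print(symbols["a"], symbols["x"])
# {'type': 'integer', 'storage': None} {'type': 'integer', 'storage': None}
```

The `storage` field stays `None` here because, as noted above, it is assigned later by the code generation phase.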
ERROR DETECTION & REPORTING
• For the compilation process to proceed correctly, each phase must:
  • Detect any error
  • Deal with the detected error(s)
• Error detection:
  • Most errors are detected in Syntax + Semantic Analysis
  • In Lexical Analysis: characters aren't legal for token formation
  • In Syntax Analysis: the token stream violates structure rules
  • In Semantic Analysis: correct structure but invalid meaning (e.g. ID = Array Name + Function Name)
COMPILER PHASES
LEXICAL ANALYZER & SYMBOL TABLE
Symbol table entries produced by the Lexical Analyzer:
TokenID   Token Type   Token Value   Location
Id1       ID           position
Expr1     AROP         ASS
Id2       ID           Initial
Expr2     AROP         SUM
Id3       ID           Rate
Expr3     AROP         MUL
N1        NUM          60
SYNTAX ANALYZER & SYMBOL TABLE
TokenID   Token Type   Token Value   Location
Id1       ID           position
Expr1     AROP         ASS
Id2       ID           Initial
Expr2     AROP         SUM
Id3       ID           Rate
Expr3     AROP         MUL
N1        NUM          60
A LEAF is a record with two or more fields: one to identify the TOKEN and the others to identify its info attributes
SYNTAX ANALYZER & SYMBOL TABLE
Operator   Left Child (Pointer)   Right Child (Pointer)
Expr1      Id1                    Expr2
Expr2      Id2                    Expr3
Expr3      Id3                    N1
An interior NODE is a record with a field for the operator and two fields of pointers to the left and right children
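The leaf and node records described on these slides can be sketched as two small classes. The class and field names below are illustrative, and Python object references stand in for the pointers; the tree built at the end mirrors the table rows Expr1, Expr2 and Expr3.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    token_type: str   # identifies the TOKEN, e.g. "ID" or "NUM"
    value: str        # info attribute, e.g. the lexeme


@dataclass
class Node:
    operator: str     # e.g. "ASS", "SUM", "MUL"
    left: "Tree"      # reference to the left child
    right: "Tree"     # reference to the right child


Tree = Union[Leaf, Node]


def to_text(tree):
    """Render a tree as a fully parenthesized string."""
    if isinstance(tree, Leaf):
        return tree.value
    return f"({to_text(tree.left)} {tree.operator} {to_text(tree.right)})"


# Expr3 = Id3 MUL N1, Expr2 = Id2 SUM Expr3, Expr1 = Id1 ASS Expr2
expr3 = Node("MUL", Leaf("ID", "rate"), Leaf("NUM", "60"))
expr2 = Node("SUM", Leaf("ID", "initial"), expr3)
expr1 = Node("ASS", Leaf("ID", "position"), expr2)

print(to_text(expr1))
# (position ASS (initial SUM (rate MUL 60)))
```

Checking `isinstance(tree, Leaf)` is one simple way to differentiate between a Node and a Leaf, as the lab assignment asks.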
TASK 1: THINK AS A COMPILER!
• Analyze the following program syntactically:
int main() {
    std::cout << "hello world" << std::endl;
    return 0;
}
LEXICAL ANALYZER OUTPUT
• 1 = string "int"
• 2 = string "main"
• 3 = opening parenthesis
• 4 = closing parenthesis
• 5 = opening brace
• 6 = string "std"
• 7 = namespace operator
• 8 = string "cout"
• 9 = << operator
• 10 = string ""hello world""
• 11 = string "endl"
• 12 = semicolon
• 13 = string "return"
• 14 = number 0
• 15 = closing brace
TASK 2: A STATEMENT AST
• Create an abstract syntax tree for the following code for the Euclidean algorithm:
while b ≠ 0
    if a > b
        a := a − b
    else
        b := b − a
return a
TASK 2: A STATEMENT AST
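One possible AST for the Euclidean algorithm code can be written as nested tuples, as sketched below. The node labels (`sequence`, `assign`, `var`, `const`) and the small interpreter are assumptions made for illustration, not the tree drawn on the original slide; the interpreter is included only to confirm that the tree encodes the algorithm.

```python
import operator

# A statement sequence: the while loop, then the return.
euclid_ast = (
    "sequence",
    ("while",
        ("!=", ("var", "b"), ("const", 0)),                      # condition
        ("if",
            (">", ("var", "a"), ("var", "b")),                   # condition
            ("assign", "a", ("-", ("var", "a"), ("var", "b"))),  # if-body
            ("assign", "b", ("-", ("var", "b"), ("var", "a"))))),  # else-body
    ("return", ("var", "a")),
)

BINARY = {"!=": operator.ne, ">": operator.gt, "-": operator.sub}


def evaluate(node, env):
    """Compute the value of an expression node."""
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "const":
        return node[1]
    return BINARY[kind](evaluate(node[1], env), evaluate(node[2], env))


def execute(node, env):
    """Run a statement node; return a value once 'return' is reached, else None."""
    kind = node[0]
    if kind == "sequence":
        for child in node[1:]:
            result = execute(child, env)
            if result is not None:
                return result
        return None
    if kind == "while":
        while evaluate(node[1], env):
            result = execute(node[2], env)
            if result is not None:
                return result
        return None
    if kind == "if":
        branch = node[2] if evaluate(node[1], env) else node[3]
        return execute(branch, env)
    if kind == "assign":
        env[node[1]] = evaluate(node[2], env)
        return None
    if kind == "return":
        return evaluate(node[1], env)
    raise ValueError(f"unknown statement kind {kind!r}")


print(execute(euclid_ast, {"a": 48, "b": 18}))  # 6
```

Variables and constants are the leaves; comparisons, subtraction, assignments, the branch, the loop and the sequence are interior nodes.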
LAB ASSIGNMENT
Write the Syntax Analyzer components and ensure the following:
• Create a Symbol Table (for all types, including IDs, functions, etc.)
• Fill the Symbol Table with tokens extracted from the Lexical Analysis phase
• Differentiate between Node and Leaf
• Apply grammar rules (tokens, expressions, statements)
QUESTIONS?
Thank you for listening