LAB # 4: SYNTAX ANALYSIS (PARSING) COMPILER ENGINEERING
Page 1: 4 compiler lab - Syntax Ana

LAB # 4: SYNTAX ANALYSIS (PARSING)

COMPILER ENGINEERING

Page 2: 4 compiler lab - Syntax Ana

Department of Computer Science - Compiler Engineering Lab

LEXICAL ANALYSIS SUMMARY

1. Start a new token
2. Read the 1st character to start recognizing its type, according to the algorithm specified in Slide 3
3. Pass the token (lexeme type) and its value attribute to the Parser
4. Repeat steps 1-3
5. Repeat until the end

10-14/3/12

Page 3: 4 compiler lab - Syntax Ana

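Flowchart: recognizing a token

• Start new TOKEN, then read the 1st character
• If it is a digit: read the following characters; TOKEN = NUM
• If it is a letter: read the following characters
  • If any is a digit or _: TOKEN = ID
  • If all are letters: is it a keyword? If so, TOKEN = KEYWORD; otherwise TOKEN = ID
• Is it a RELOP (>, <, !, =)? Check whether the 2nd character is (=); TOKEN = RELOP
• Is it an AROP (+, -, /, *, =)? TOKEN = AROP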
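The flowchart above can be sketched in Python roughly as follows. This is only an illustrative sketch: the function name `tokenize`, the keyword set, and the choice to treat a lone `=` as an AROP (assignment) and a lone `!` as an error are assumptions, not part of the slides.

```python
# Assumed keyword set; the slides do not list the language's keywords.
KEYWORDS = {"if", "then", "else", "while", "do", "return"}

def tokenize(source):
    """Split source text into (token_type, lexeme) pairs."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():              # whitespace only separates tokens
            i += 1
            continue
        start = i                     # start a new token
        if ch.isdigit():              # digit first: read the following digits
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUM", source[start:i]))
        elif ch.isalpha():            # letter first: read letters, digits, _
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            lexeme = source[start:i]
            # Only an all-letter lexeme can be in KEYWORDS; anything else is an ID.
            kind = "KEYWORD" if lexeme in KEYWORDS else "ID"
            tokens.append((kind, lexeme))
        elif ch in "><!=":            # possible RELOP: look at the 2nd character
            if i + 1 < len(source) and source[i + 1] == "=":
                i += 2
                tokens.append(("RELOP", source[start:i]))
            elif ch in "><":
                i += 1
                tokens.append(("RELOP", ch))
            elif ch == "=":           # assumption: lone '=' is assignment
                i += 1
                tokens.append(("AROP", ch))
            else:
                raise SyntaxError(f"illegal character {ch!r} at position {i}")
        elif ch in "+-*/":
            i += 1
            tokens.append(("AROP", ch))
        else:
            raise SyntaxError(f"illegal character {ch!r} at position {i}")
    return tokens
```

For example, `tokenize("while x1 >= 60")` yields a KEYWORD, an ID, a RELOP and a NUM, in that order.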

Page 4: 4 compiler lab - Syntax Ana

SYNTAX ANALYSIS (PARSING)

• Is the process of analyzing a text, made of a sequence of tokens, to determine its grammatical structure with respect to a given (more or less) formal grammar
• Builds an Abstract Syntax Tree (AST)
• Is part of an Interpreter or a Compiler
• Creates some form of Internal Representation (IR)
• Programming languages tend to be specified with context-free grammars, because efficient + fast parsers can be written for them

Page 5: 4 compiler lab - Syntax Ana

PHASE 2 : SYNTAX ANALYSIS

• Sometimes also called Syntax Checking
• Ensures that:
  • the code is grammatically valid (without worrying about the meaning)
  • and will sequence into an executable program
• The syntactic analyzer applies rules to the code; for example:
  • checking that each opening brace has a corresponding closing brace,
  • that each declaration has a type,
  • and that the type exists, etc.
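The brace rule above is commonly checked with a stack. A minimal sketch, assuming the input is a sequence of single-character tokens (a real parser would work on the token stream, so braces inside string literals are not miscounted); the name `braces_balanced` is hypothetical:

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}

def braces_balanced(tokens):
    """Return True if every opening bracket has a matching closing bracket."""
    stack = []
    for tok in tokens:
        if tok in PAIRS.values():      # opening bracket: remember it
            stack.append(tok)
        elif tok in PAIRS:             # closing bracket: must match the latest opener
            if not stack or stack.pop() != PAIRS[tok]:
                return False
    return not stack                   # leftover openers were never closed
```

Because a Python string is a sequence of characters, `braces_balanced("int main(){ return 0; }")` can be called directly for a quick check.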

Page 6: 4 compiler lab - Syntax Ana

CONTEXT-FREE GRAMMAR

• Defines the components that form an expression + defines the order in which they must appear
• A context-free grammar is a set of rules specifying how syntactic elements in some language can be formed from simpler ones
• The grammar specifies allowable ways to combine tokens (called terminals) into higher-level syntactic elements (called non-terminals)

Page 7: 4 compiler lab - Syntax Ana

CONTEXT-FREE GRAMMAR

• Ex.:
  • Any ID is an expression (preferably called a TOKEN)
  • Any Number is an expression (preferably called a TOKEN)
  • If Expr1 and Expr2 are expressions, then:
    • Expr1 + Expr2 is an expression
    • Expr1 * Expr2 is an expression
  • If Id1 is an ID and Expr2 is an expression, then:
    • Id1 = Expr2 is a statement
  • If Expr1 is an expression and Statement2 is a statement, then:
    • While (Expr1) do Statement2
    • If (Expr1) then Statement2
    are statements
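These rules map naturally onto a recursive-descent parser, with one function per non-terminal. The sketch below is an illustration under stated assumptions: tokens are plain strings already split on whitespace, `*` binds tighter than `+` (the slides do not specify precedence), and all function names are hypothetical. It returns the tree as nested tuples.

```python
KEYWORDS = {"while", "do", "if", "then"}

def is_id(tok):
    return tok.isidentifier() and tok not in KEYWORDS

def expect(tokens, pos, wanted):
    """Check that tokens[pos] is `wanted` and return the next position."""
    if pos >= len(tokens) or tokens[pos] != wanted:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected {wanted!r}, found {found!r}")
    return pos + 1

def parse_factor(tokens, pos):
    # Any ID or Number is an expression.
    if pos < len(tokens) and (is_id(tokens[pos]) or tokens[pos].isdigit()):
        return tokens[pos], pos + 1
    raise SyntaxError("expected an ID or a Number")

def parse_term(tokens, pos):
    # Expr1 * Expr2 is an expression.
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        left = ("*", left, right)
    return left, pos

def parse_expr(tokens, pos):
    # Expr1 + Expr2 is an expression.
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        left = ("+", left, right)
    return left, pos

def parse_statement(tokens, pos=0):
    """Return (tree, next_pos) for one statement starting at tokens[pos]."""
    head = tokens[pos] if pos < len(tokens) else None
    if head in ("while", "if"):
        # While (Expr1) do Statement2 / If (Expr1) then Statement2
        follow = "do" if head == "while" else "then"
        pos = expect(tokens, pos + 1, "(")
        cond, pos = parse_expr(tokens, pos)
        pos = expect(tokens, pos, ")")
        pos = expect(tokens, pos, follow)
        body, pos = parse_statement(tokens, pos)
        return (head, cond, body), pos
    if head is not None and is_id(head):
        # Id1 = Expr2 is a statement.
        pos = expect(tokens, pos + 1, "=")
        value, pos = parse_expr(tokens, pos)
        return ("=", head, value), pos
    raise SyntaxError(f"unexpected token {head!r}")
```

Calling `parse_statement("position = initial + rate * 60".split())` returns the tree together with the index of the first unconsumed token; input that breaks a rule raises `SyntaxError`.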

Page 8: 4 compiler lab - Syntax Ana

GRAMMAR & AST

• TOKENs (terminals) = Leaves
• Expressions, Statements (non-terminals) = Nodes

Stream of Characters → Lexical Analysis → Stream of TOKENs
Stream of TOKENs → Syntax Analysis → Abstract Syntax Tree (AST)

Page 9: 4 compiler lab - Syntax Ana

PHASE 2 : SYNTAX ANALYSIS

Page 10: 4 compiler lab - Syntax Ana

PHASE 2 : SYNTAX ANALYSIS

[Figure: Tokens passed to the Syntax Analyzer (Parser)]

Page 11: 4 compiler lab - Syntax Ana

SYMBOL TABLE

• A Symbol Table (ST) is a data structure containing a record for each identifier, with fields for the attributes of the ID
• Tokens formed are recorded in the ST
• Purpose:
  • To analyze expressions/statements, for which a hierarchical or nesting structure is required
  • The data structure allows us to find, retrieve, and store a record for an ID quickly
  • For example: the Semantic Analysis and Code Generation phases retrieve the ID's type for type checking and implementation purposes

Page 12: 4 compiler lab - Syntax Ana

SYMBOL TABLE MANAGEMENT

• The Symbol Table may contain any of the following information:
  • For an ID:
    • The storage allocated for the ID
    • Its type
    • Its scope (where it is valid in the program)
  • For a function, also:
    • Number of arguments
    • Types of arguments
    • Passing method (by reference or by value)
    • Return type
• Identifiers are added if they don't already exist
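One way to picture these records is as small data classes stored in a dictionary. The sketch below is an assumption-laden illustration, not the lab's required design: the class names `Symbol`, `FunctionSymbol` and `SymbolTable`, the field names, and the use of a (scope, name) key are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Symbol:
    name: str
    type: Optional[str] = None       # may be unknown when the ID is first seen
    scope: str = "global"            # where the ID is valid in the program
    storage: Optional[int] = None    # storage allocated for the ID, filled later

@dataclass
class FunctionSymbol(Symbol):
    arg_types: list = field(default_factory=list)
    passing: str = "by value"        # or "by reference"
    return_type: Optional[str] = None

    @property
    def arg_count(self):
        return len(self.arg_types)

class SymbolTable:
    def __init__(self):
        self._records = {}           # (scope, name) -> Symbol, for quick lookup

    def insert(self, symbol):
        """Add the symbol only if it does not already exist; return the stored record."""
        return self._records.setdefault((symbol.scope, symbol.name), symbol)

    def lookup(self, name, scope="global"):
        """Return the record for `name` in `scope`, or None if absent."""
        return self._records.get((scope, name))
```

Inserting the same identifier twice keeps the first record, which mirrors the rule that identifiers are added only if they don't already exist.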

Page 13: 4 compiler lab - Syntax Ana

SYMBOL TABLE MANAGEMENT

• Not all attributes can be determined by the lexical analyzer, because of its linear nature
  • E.g. dim a, x as integer
  • In this example, when the analyzer sees the IDs it has not yet reached the type keyword
• So the following phases complete the IDs' attributes, and use them as well
  • For example: the storage location attribute is assigned by the code generator phase
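The `dim a, x as integer` case can be sketched as two steps: record each ID with an unknown type as it is seen, then fill in the type once the type keyword is reached. This is a hypothetical illustration; the function name `declare`, the dictionary layout, and the pre-split token list are assumptions.

```python
def declare(tokens, table):
    """Handle `dim <id> {, <id>} as <type>`; `table` maps each ID to its attributes."""
    if len(tokens) < 4 or tokens[0] != "dim" or tokens[-2] != "as":
        raise SyntaxError("expected: dim <id>, ... as <type>")
    pending = []
    for tok in tokens[1:-2]:
        if tok == ",":
            continue
        # Step 1: the ID is seen before its type is known.
        table.setdefault(tok, {"type": None, "location": None})
        pending.append(tok)
    # Step 2: the type keyword has been reached, so complete the attribute.
    for name in pending:
        table[name]["type"] = tokens[-1]
    return pending
```

After `declare(["dim", "a", ",", "x", "as", "integer"], table)`, both `a` and `x` carry the type `integer`, while `location` stays empty until a later phase such as code generation assigns it.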

Page 14: 4 compiler lab - Syntax Ana

ERROR DETECTION & REPORTING

• For the compilation process to proceed correctly, each phase must:
  • Detect any error
  • Deal with the detected error(s)
• Error detection:
  • Most errors are detected in Syntax + Semantic Analysis
  • In Lexical Analysis: characters that aren't legal for token formation
  • In Syntax Analysis: violations of structure rules
  • In Semantic Analysis: correct structure but invalid meaning (e.g. ID = Array Name + Function Name)

Page 15: 4 compiler lab - Syntax Ana

COMPILER PHASES

Page 16: 4 compiler lab - Syntax Ana

LEXICAL ANALYZER & SYMBOL TABLE

Symbol table as filled by the Lexical Analyzer:

TokenID   TokenType   TokenValue   Location
Id1       ID          position
Expr1     AROP        ASS
Id2       ID          Initial
Expr2     AROP        SUM
Id3       ID          Rate
Expr3     AROP        MUL
N1        NUM         60

Page 17: 4 compiler lab - Syntax Ana

SYNTAX ANALYZER & SYMBOL TABLE

Page 18: 4 compiler lab - Syntax Ana

SYNTAX ANALYZER & SYMBOL TABLE

TokenID   TokenType   TokenValue   Location
Id1       ID          position
Expr1     AROP        ASS
Id2       ID          Initial
Expr2     AROP        SUM
Id3       ID          Rate
Expr3     AROP        MUL
N1        NUM         60

A LEAF is a record with two or more fields: one to identify the TOKEN and others to identify info attributes

Page 19: 4 compiler lab - Syntax Ana

SYNTAX ANALYZER & SYMBOL TABLE

Operator   Left Child (Pointer)   Right Child (Pointer)
Expr1      id1                    Expr2
Expr2      id2                    Expr3
Expr3      id3                    N1

An interior NODE is a record with a field for the operator and two fields of pointers to the left and right children
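The leaf and node records described in these tables could be modelled as below. This is a sketch only: the class and field names are assumptions, and the tree is built by hand from the table entries (ASS, SUM, MUL) rather than by a parser.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    token_type: str      # identifies the TOKEN, e.g. "ID" or "NUM"
    value: str           # info attribute, e.g. the name or the number

@dataclass
class Node:
    operator: str        # e.g. "ASS", "SUM", "MUL"
    left: "Tree"         # pointer to the left child
    right: "Tree"        # pointer to the right child

Tree = Union[Leaf, Node]

# Leaves, taken from the symbol table rows.
id1 = Leaf("ID", "position")
id2 = Leaf("ID", "Initial")
id3 = Leaf("ID", "Rate")
n1 = Leaf("NUM", "60")

# Interior nodes, built bottom-up from the operator table.
expr3 = Node("MUL", id3, n1)
expr2 = Node("SUM", id2, expr3)
expr1 = Node("ASS", id1, expr2)

def show(tree):
    """Render the tree as a fully parenthesized string."""
    if isinstance(tree, Leaf):
        return tree.value
    return f"({show(tree.left)} {tree.operator} {show(tree.right)})"
```

Distinguishing the two record types with `isinstance`, as `show` does, is one simple way to tell a node from a leaf while walking the tree.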

Page 20: 4 compiler lab - Syntax Ana

TASK 1: THINK AS A COMPILER!

• Analyze the following program syntactically:

    int main() {
        std::cout << "hello world" << std::endl;
        return 0;
    }

Page 21: 4 compiler lab - Syntax Ana

LEXICAL ANALYZER OUTPUT

• 1 = string "int"
• 2 = string "main"
• 3 = opening parenthesis
• 4 = closing parenthesis
• 5 = opening brace
• 6 = string "std"
• 7 = namespace operator
• 8 = string "cout"
• 9 = << operator
• 10 = string ""hello world""
• 11 = string "endl"
• 12 = semicolon
• 13 = string "return"
• 14 = number 0
• 15 = closing brace
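A list like this can be reproduced with a small regular-expression lexer built on Python's standard `re` module. The token-kind names and the `lex` function are hypothetical, and the patterns cover only what this one program needs.

```python
import re

# (kind, pattern) pairs; order matters because the first alternative that matches wins.
TOKEN_SPEC = [
    ("STRING", r'"[^"\n]*"'),
    ("NUMBER", r"\d+"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("NAMESPACE_OP", r"::"),
    ("SHIFT_LEFT", r"<<"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))

def lex(source):
    """Return (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"illegal character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

PROGRAM = """int main() {
    std::cout << "hello world" << std::endl;
    return 0;
}"""
```

`lex(PROGRAM)` produces 19 tokens in total; collecting the distinct lexemes gives the 15 entries listed on this slide, since `std`, `::`, `<<` and `;` each occur twice.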

Page 22: 4 compiler lab - Syntax Ana

TASK 2: A STATEMENT AST

• Create an abstract syntax tree for the following code for the Euclidean algorithm:

    while b ≠ 0
        if a > b
            a := a − b
        else
            b := b − a
    return a
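One possible way to write such a tree down is as nested tuples, with the operator or statement kind first and the children after it. The encoding and the tiny tree-walking interpreter below are assumptions made for illustration; other AST shapes are equally valid answers to the task.

```python
# Each tuple is an interior node: (kind, child, child, ...).
# Bare strings are variable leaves, bare ints are number leaves.
EUCLID_AST = (
    "block",
    ("while", ("!=", "b", 0),
        ("if", (">", "a", "b"),
            ("assign", "a", ("-", "a", "b")),
            ("assign", "b", ("-", "b", "a")))),
    ("return", "a"),
)

def evaluate(expr, env):
    """Evaluate an expression node against the variable environment."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    l, r = evaluate(left, env), evaluate(right, env)
    return {"!=": l != r, ">": l > r, "-": l - r}[op]

def execute(stmt, env):
    """Run a statement node; return a value once `return` is reached, else None."""
    kind = stmt[0]
    if kind == "block":
        for child in stmt[1:]:
            result = execute(child, env)
            if result is not None:
                return result
    elif kind == "while":
        while evaluate(stmt[1], env):
            result = execute(stmt[2], env)
            if result is not None:
                return result
    elif kind == "if":
        branch = stmt[2] if evaluate(stmt[1], env) else stmt[3]
        return execute(branch, env)
    elif kind == "assign":
        env[stmt[1]] = evaluate(stmt[2], env)
    elif kind == "return":
        return evaluate(stmt[1], env)
    return None
```

Walking the tree with `execute(EUCLID_AST, {"a": 48, "b": 18})` runs the algorithm on two positive integers and is a quick way to confirm the tree has the intended structure.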

Page 23: 4 compiler lab - Syntax Ana

TASK 2: A STATEMENT AST

Page 24: 4 compiler lab - Syntax Ana

LAB ASSIGNMENT

Write the Syntax Analyzer components and ensure that they fulfill the following:
• Create a Symbol Table (for all types, including IDs, Functions, etc.)
• Fill the Symbol Table with tokens extracted from the Lexical Analysis phase
• Differentiate between Node and Leaf
• Apply grammar rules (tokens, expressions, statements)

Page 25: 4 compiler lab - Syntax Ana

QUESTIONS?

Thank you for listening
