6th Sem Cs CD Ct1 11 Solution

Silicon Institute of Technology

Class Test – I (6th Sem. B. Tech - CS), 2011

Sub : Compiler Design

Time – 60 mins. Max. Marks – 10 Date : 17.02.11

(Answer any Four including Q.1)

Q.1. Short type (any Five) [0.5 x 5 = 2.5]

a) Differentiate between Compiler and Interpreter.

b) Define regular expression.

c) Draw a parse tree for the following statement:

IF (5 .EQ. MAX) GOTO 100

d) What do you mean by ‘Common Sub-expression Elimination’ in code optimization?

e) What is a cross compiler and its advantage?

Q.2. Consider the following while statement: [2.5]

While A > B && A <= 2 * B – 5 do

A = A + B

Identify the tokens. Generate the parse tree, intermediate code and optimized code.

Q.3. Discuss the different phases of a compiler with suitable examples. [2.5]

Q.4. Using Thompson's construction rules, construct an ε-NFA for each of the following regular expressions. [2.5]

a) (a | b)*   b) ab(a | b)*   c) (a* | b*)*a   d) (a | b)*a(a | b)+

e) (a | b)+abb

Q5. Give an equivalent DFA for the regular expression (a|b)*abb. [2.5]

Good Luck

Solution to Sub – Compiler Design, Class Test – I, 2011

1) a) Both a compiler and an interpreter are translators for high-level languages, but they differ in the following ways.

Compiler

i. A compiler translates the whole source program at once and reports a list of all the errors it finds.

ii. After all errors have been removed, the compiler produces the object code for the given source program.

iii. Compiled programs execute faster.

iv. A compiler generates object code for the source program.

v. The generated object code is stored on a permanent storage device.

vi. The object code is used to execute the program without any need for the source code.

Interpreter

i. An interpreter translates the source program line by line and halts when it finds an error.

ii. After that error has been removed, the interpreter continues with the translation of the next line of the source program.

iii. Interpreted programs execute more slowly.

iv. An interpreter generates executable code for each line of the source program as it translates it.

v. The generated executable code is kept only in temporary storage, i.e. primary memory.

vi. The source code is needed every time the program is executed, because the executable code is lost after execution.
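The difference can be sketched in Python. This is a minimal illustration under stated assumptions, not how any real compiler or interpreter is built: the toy expression language (nested tuples), the stack-machine instructions PUSH, LOAD, ADD and MUL, and the function names interpret, compile_expr and run are all hypothetical. The interpreter walks the source every time it runs, while the compiler translates the whole expression once into object code that can later be run without the source.

```python
# Toy expression language (hypothetical): an int literal, a variable name (str),
# or a tuple (op, left, right) with op in {"+", "*"}.

def interpret(expr, env):
    """Interpreter: walk the source tree and execute it directly, on every run."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b


def compile_expr(expr, code=None):
    """Compiler: translate the whole tree once into stack-machine 'object code'."""
    if code is None:
        code = []
    if isinstance(expr, int):
        code.append(("PUSH", expr))
    elif isinstance(expr, str):
        code.append(("LOAD", expr))
    else:
        op, left, right = expr
        compile_expr(left, code)
        compile_expr(right, code)
        code.append(("ADD",) if op == "+" else ("MUL",))
    return code


def run(code, env):
    """Execute previously generated object code; the source tree is not needed."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


source = ("+", "x", ("*", "y", 60))
object_code = compile_expr(source)           # translated once
print(interpret(source, {"x": 1, "y": 2}))   # 121
print(run(object_code, {"x": 1, "y": 2}))    # 121
```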

b) Regular Expression: Regular expressions over an alphabet Σ can be defined recursively as follows:

1. Any terminal symbol (i.e. an element of Σ), ε (the empty string) and ∅ (the empty set) are regular expressions.

2. The union of two regular expressions R1 and R2, written as R1 + R2 (also written R1 | R2, as in Q.4), is also a regular expression.

3. The concatenation of two regular expressions R1 and R2, written as R1 R2, is also a regular expression.

4. The iteration (or closure) of a regular expression R, written as R*, is also a regular expression.

5. If R is a regular expression, then (R) is also a regular expression.

6. Applying rules 1-5 recursively, once or several times, yields a regular expression.

c) The statement is IF (5 .EQ. MAX) GOTO 100. The parse tree for the statement is given below.

statement
    if-statement
        IF
        (
        conditional
            relation
                expression: const (5)
                relational-op: .EQ.
                expression: id (MAX)
        )
        non-if-statement
            goto-statement
                GOTO
                label (100)

d) Consider the following two assignment statements:

A := B + C + D

E := B + C + F

Since + is left-associative, both right-hand sides begin by computing B + C. We can therefore have an equivalent set of assignment statements as follows:

T1 := B + C

A := T1 + D

E := T1 + F

The second set of statements performs three additions instead of four. This has been achieved by taking advantage of the common subexpression B + C: it is computed once into T1, and its repeated computation has been eliminated from the assignment statements. This process of removing common subexpressions is known as common subexpression elimination.
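Local common subexpression elimination over one basic block can be sketched as follows. The quadruple format (dest, op, arg1, arg2), the "copy" operator and the convention that compiler temporaries start with a lowercase t are assumptions for illustration; the sketch also assumes each temporary is assigned only once, as in generated three-address code.

```python
def is_temp(name):
    # Hypothetical naming convention: compiler temporaries start with a lowercase "t".
    return name.startswith("t")


def eliminate_common_subexpressions(block):
    """Local CSE over one basic block of (dest, op, arg1, arg2) instructions."""
    available = {}  # (op, arg1, arg2) -> name that currently holds that value
    alias = {}      # dropped temporary -> earlier temporary holding the same value
    out = []
    for dest, op, a1, a2 in block:
        a1, a2 = alias.get(a1, a1), alias.get(a2, a2)
        key = (op, a1, a2)
        holder = available.get(key)
        if holder is not None and is_temp(dest) and is_temp(holder):
            alias[dest] = holder  # reuse the earlier result; emit nothing
            continue
        if holder is not None:
            out.append((dest, "copy", holder, None))  # value already computed
        else:
            out.append((dest, op, a1, a2))
        # dest now holds a new value: forget facts that depended on its old value
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if dest not in (a1, a2):
            available[key] = dest
    return out


# A := B + C + D and E := B + C + F, each split at the left-associative +
block = [
    ("t1", "+", "B", "C"), ("A", "+", "t1", "D"),
    ("t2", "+", "B", "C"), ("E", "+", "t2", "F"),
]
optimized = eliminate_common_subexpressions(block)
# [("t1", "+", "B", "C"), ("A", "+", "t1", "D"), ("E", "+", "t1", "F")]
```

The result corresponds to T1 := B + C, A := T1 + D, E := T1 + F.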

e) A compiler may run on one machine and produce object code for another machine. Such a compiler is called a cross-compiler. Write $L_{S}^{T}$ for a compiler for language L, written in language S, that produces object code for machine T. Suppose we have written a compiler $L_{L}^{B}$ (a compiler for language L, written in L itself, that produces object code for machine B), and we already have a compiler $L_{A}^{A}$ for L that runs on, and produces code for, our current machine, here called A. To bootstrap a compiler for B:

1. Compile $L_{L}^{B}$ with $L_{A}^{A}$ to obtain $L_{A}^{B}$, which runs on A but produces code for B.

2. Compile $L_{L}^{B}$ again, this time with $L_{A}^{B}$, to obtain $L_{B}^{B}$, which runs on B and produces code for B.

In this process, the compiler $L_{A}^{B}$ is a cross-compiler. The cross-compiler helps us bootstrap a compiler for another machine with little effort: we do not have to write the compiler for the new machine from scratch.
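The two bootstrapping steps can be mimicked with a small Python model in which a compiler is just a triple (source language, implementation language, target machine). The Compiler and compile_with names are hypothetical; the only rule encoded is that running a compiler written in S through a compiler from S to M yields the same compiler, now expressed in M's code.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str      # language it translates
    written_in: str  # language (or machine code) it is implemented in
    target: str      # machine it generates code for


def compile_with(program: Compiler, translator: Compiler) -> Compiler:
    """Translate the text of compiler `program` using compiler `translator`."""
    if program.written_in != translator.source:
        raise ValueError("translator cannot read the language the program is written in")
    # Same source and target languages, but now expressed in the translator's output code.
    return Compiler(program.source, translator.target, program.target)


L_L_B = Compiler("L", "L", "B")  # the new compiler, written in L, generating code for B
L_A_A = Compiler("L", "A", "A")  # the existing compiler on machine A

cross = compile_with(L_L_B, L_A_A)   # step 1: runs on A, generates B code
native = compile_with(L_L_B, cross)  # step 2: runs on B, generates B code
print(cross)   # Compiler(source='L', written_in='A', target='B')
print(native)  # Compiler(source='L', written_in='B', target='B')
```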

2) The given while statement is

while A > B && A ≤ 2 * B - 5 do A := A + B;

The tokens present in the given statement are:

while [id, n1] > [id, n2] && [id, n1] ≤ [const, n3] * [id, n2] - [const, n4] do [id, n1] := [id, n1] + [id, n2] ;

Here n1, n2, n3 and n4 stand for pointers to the symbol table entries for A, B, 2 and 5, respectively.
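A minimal Python scanner for this statement, producing the same token list, is sketched below. The token classes, the regular expressions and the ASCII spellings && and <= are assumptions; the symbol table is a plain list whose i-th entry (counting from 1) corresponds to the pointer ni.

```python
import re

# Token classes for the statement in Q.2 (an assumed, minimal token set).
TOKEN_SPEC = [
    ("ws", r"\s+"),
    ("keyword", r"\b(?:while|do)\b"),
    ("id", r"[A-Za-z_]\w*"),
    ("const", r"\d+"),
    ("relop", r"<=|>=|<|>"),
    ("logand", r"&&"),
    ("assign", r":="),
    ("arith", r"[-+*]"),
    ("semi", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (tokens, symbol_table); ids and constants become (kind, 'n<i>') pairs."""
    symbol_table = []  # entry i (0-based) is referred to as n(i+1)
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme, pos = m.lastgroup, m.group(), m.end()
        if kind == "ws":
            continue  # blanks are discarded
        if kind in ("id", "const"):
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append((kind, f"n{symbol_table.index(lexeme) + 1}"))
        else:
            tokens.append(lexeme)  # keywords and operators stand for themselves
    return tokens, symbol_table


tokens, symbol_table = tokenize("while A > B && A <= 2 * B - 5 do A := A + B;")
print(symbol_table)  # ['A', 'B', '2', '5']
```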

The parse tree for the given statement is given below.

statement
    while-statement
        while
        condition
            condition
                relation
                    exp: id(A)
                    relop: >
                    exp: id(B)
            &&
            condition
                relation
                    exp: id(A)
                    relop: ≤
                    exp
                        exp
                            exp: const(2)
                            *
                            exp: id(B)
                        -
                        exp: const(5)
        do
        statement
            assignment
                location: id(A)
                :=
                exp
                    exp: id(A)
                    +
                    exp: id(B)

The intermediate code for the given statement is as follows:

L1: if A > B goto L2
    goto L3
L2: T1 := 2 * B
    T2 := T1 - 5
    if A ≤ T2 goto L4
    goto L3
L4: A := A + B
    goto L1
L3:

To improve the code we can apply local transformations. The intermediate code contains two instances of a jump over a jump. The first is

if A > B goto L2
goto L3
L2:

This sequence of code can be replaced by the single statement

if A ≤ B goto L3

Similarly, the sequence

if A ≤ T2 goto L4
goto L3
L4:

can be replaced by

if A > T2 goto L3

Applying both replacements (the labels L2 and L4 are then no longer used, and the exit label L3 is renamed L2), the optimized code is as follows:

L1: if A ≤ B goto L2
    T1 := 2 * B
    T2 := T1 - 5
    if A > T2 goto L2
    A := A + B
    goto L1
L2:
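One way to check that the jump-over-jump replacements preserve meaning is to execute both three-address programs on the same inputs. The tuple encoding of instructions and the run helper below are hypothetical, and <= stands for ≤. The sketch compares the final value of A and counts the executed instructions (labels excluded).

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       ">": operator.gt, "<=": operator.le}

ORIGINAL = [
    ("label", "L1"), ("if", "A", ">", "B", "L2"), ("goto", "L3"),
    ("label", "L2"), ("assign", "T1", "*", 2, "B"), ("assign", "T2", "-", "T1", 5),
    ("if", "A", "<=", "T2", "L4"), ("goto", "L3"),
    ("label", "L4"), ("assign", "A", "+", "A", "B"), ("goto", "L1"),
    ("label", "L3"),
]
OPTIMIZED = [
    ("label", "L1"), ("if", "A", "<=", "B", "L2"),
    ("assign", "T1", "*", 2, "B"), ("assign", "T2", "-", "T1", 5),
    ("if", "A", ">", "T2", "L2"), ("assign", "A", "+", "A", "B"), ("goto", "L1"),
    ("label", "L2"),
]


def run(program, env, max_steps=10_000):
    """Execute a program; return (final A, number of non-label instructions executed)."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    env = dict(env)

    def value(x):
        return env[x] if isinstance(x, str) else x

    pc = steps = 0
    while pc < len(program):
        ins, pc = program[pc], pc + 1
        if ins[0] == "label":
            continue
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        if ins[0] == "assign":
            _, dest, op, a, b = ins
            env[dest] = OPS[op](value(a), value(b))
        elif ins[0] == "if":
            _, a, relop, b, target = ins
            if OPS[relop](value(a), value(b)):
                pc = labels[target]
        else:  # goto
            pc = labels[ins[1]]
    return env["A"], steps


for a in range(-10, 30):
    for b in range(-10, 30):
        (a1, s1), (a2, s2) = run(ORIGINAL, {"A": a, "B": b}), run(OPTIMIZED, {"A": a, "B": b})
        assert a1 == a2 and s2 == s1 - 1  # same result, one instruction fewer
```

Each loop iteration costs six instructions in both versions; the saving of one instruction comes from the exit path, where the removed unconditional jump used to be executed.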

3) The compilation process can be divided into two parts: analysis and synthesis.

The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program. If the analysis part detects that the source program is either syntactically ill-formed or semantically unsound, it must provide informative messages so that the user can take corrective action. The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.

The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table. The analysis part is often called the front end of the compiler; the synthesis part is the back end.

The compilation process operates as a sequence of phases, each of which transforms one representation of the source program into another. A typical decomposition of a compiler into phases is shown in the following figure. The symbol table, which stores information about the entire source program, is used by all phases of the compiler.

Figure: Phases of a compiler (every phase uses the Symbol Table)
character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target-machine code → Machine-Dependent Code Optimizer → target-machine code

Some compilers have a machine-independent optimization phase between the front end and the back end. The purpose of this phase is to transform the intermediate representation so that the back end can produce a better target program than it would have produced from an unoptimized intermediate representation.

Lexical Analysis:

The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form

⟨token-name, attribute-value⟩

that it passes on to the subsequent phase, syntax analysis. In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token.

For example, suppose a source program contains the assignment statement

position = initial + rate * 60

The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens passed on to the syntax analyzer:

position is a lexeme that would be mapped into the token ⟨id, 1⟩, where id is an abstract symbol standing for identifier and 1 points to the symbol table entry for position. The symbol table entry for an identifier holds information about the identifier, such as its name and type.

The assignment symbol = is a lexeme that is mapped into the token ⟨=⟩. Since this token needs no attribute-value, the second component is omitted.

initial is a lexeme that is mapped into the token ⟨id, 2⟩, where 2 points to the symbol table entry for initial.

+ is a lexeme that is mapped into the token ⟨+⟩.

rate is a lexeme that is mapped into the token ⟨id, 3⟩, where 3 points to the symbol table entry for rate.

* is a lexeme that is mapped into the token ⟨*⟩.

60 is a lexeme that is mapped into the token ⟨60⟩.

Blanks separating the lexemes would be discarded by the lexical analyzer.

After lexical analysis, the assignment statement above becomes the following sequence of tokens:

⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩

In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.
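The grouping of characters into lexemes and tokens for this example can be sketched in Python. The regular expression, the tuple encoding of tokens (("id", 1) for ⟨id, 1⟩, ("60",) for ⟨60⟩) and the dictionary used as a symbol table are assumptions made for illustration.

```python
import re

# id: letter followed by letters/digits; num: integer literal; op: one of = + *
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))")


def scan(source, symbol_table):
    """Return the token list; enter each new identifier into symbol_table."""
    tokens, pos, source = [], 0, source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)  # leading blanks are skipped by \s*
        if m is None:
            raise SyntaxError(f"no token matches at position {pos}")
        pos = m.end()
        if m.lastgroup == "id":
            # symbol-table indices start at 1, as in <id, 1>
            index = symbol_table.setdefault(m.group("id"), len(symbol_table) + 1)
            tokens.append(("id", index))
        else:
            tokens.append((m.group(m.lastgroup),))  # e.g. ("=",) or ("60",)
    return tokens


symbol_table = {}
tokens = scan("position = initial + rate * 60", symbol_table)
print(tokens)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
```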

Syntax Analysis:

The second phase of the compiler is syntax analysis or parsing. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation. The syntax tree for the token stream is the output of the syntax analyzer. The tree shows the order in which the operations in the assignment

position = initial + rate * 60

are to be performed. The tree has an interior node labeled * with ⟨id, 3⟩ as its left child and the integer 60 as its right child. The node ⟨id, 3⟩ represents the identifier rate. The node labeled * makes it explicit that we must first multiply the value of rate by 60. The node labeled + indicates that we must add the result of this multiplication to the value of initial. The root of the tree, labeled =, indicates that we must store the result of this addition into the location for the identifier position.

Figure: Translation of the assignment statement position = initial + rate * 60 (symbol table entries: 1 position, 2 initial, 3 rate)
Lexical Analyzer: ⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩
Syntax Analyzer: = (⟨id, 1⟩, + (⟨id, 2⟩, * (⟨id, 3⟩, 60)))
Semantic Analyzer: = (⟨id, 1⟩, + (⟨id, 2⟩, * (⟨id, 3⟩, inttofloat (60))))
Intermediate Code Generator: t1 = inttofloat(60); t2 = id3 * t1; t3 = id2 + t2; id1 = t3
Code Optimizer: t1 = id3 * 60.0; id1 = id2 + t1
Code Generator: LDF R2, id3; MULF R2, R2, #60.0; LDF R1, id2; ADDF R1, R1, R2; STF id1, R1
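A recursive-descent parser is one common way to build this syntax tree. The sketch below assumes a tiny grammar in which * binds tighter than + (stated in its docstring) and takes the token list produced for the example as input; the tuple tree format is hypothetical.

```python
def parse_assignment(tokens):
    """Recursive-descent parser for the assumed grammar
         assign -> id = expr
         expr   -> term { + term }
         term   -> factor { * factor }
         factor -> id | number
    Returns a syntax tree of nested tuples (operator, left child, right child)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = advance()
        if tok[0] == "id":
            return tok
        if tok[0].isdigit():
            return ("num", int(tok[0]))
        raise SyntaxError(f"unexpected token {tok}")

    def term():
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target[0] != "id" or advance() != ("=",):
        raise SyntaxError("expected 'id =' at the start of the assignment")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree


tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("60",)]
tree = parse_assignment(tokens)
print(tree)
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('num', 60))))
```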

Semantic Analysis:

The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.

An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert, or coerce, the integer into a floating-point number. In the example above, assuming position, initial and rate are declared as floating-point numbers, the semantic analyzer inserts an inttofloat node that converts the integer 60 into a floating-point number.
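Type checking with coercion can be sketched as a bottom-up walk over the syntax tree. The sketch assumes, as in the example, that position, initial and rate are declared as floating-point numbers; the SYMBOL_TYPES table and the tuple tree format are hypothetical. An integer operand of a mixed int/float operator is wrapped in an inttofloat node.

```python
# Assumed declarations: symbol-table entries 1, 2, 3 (position, initial, rate) are float.
SYMBOL_TYPES = {1: "float", 2: "float", 3: "float"}


def coerce(node):
    return ("inttofloat", node)


def check(node):
    """Type-check a syntax tree; return (annotated tree, type)."""
    if node[0] == "id":
        return node, SYMBOL_TYPES[node[1]]
    if node[0] == "num":
        return node, "int" if isinstance(node[1], int) else "float"
    op, left, right = node
    left, lt = check(left)
    right, rt = check(right)
    if op == "=":
        if lt == rt:
            return (op, left, right), lt
        if lt == "float":  # int value assigned to a float variable
            return (op, left, coerce(right)), lt
        raise TypeError("cannot assign a float value to an int variable")
    if lt != rt:  # mixed int/float arithmetic: widen the int operand
        if lt == "int":
            left = coerce(left)
        else:
            right = coerce(right)
    return (op, left, right), "float" if "float" in (lt, rt) else "int"


tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("num", 60))))
annotated, result_type = check(tree)
print(annotated)
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('inttofloat', ('num', 60)))))
```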

Intermediate Code Generation:

In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis.

Another intermediate form is three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction. Each operand can act like a register. The output of the intermediate code generator for the assignment statement given above is the following three-address code sequence:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

There are several points worth noting about three-address instructions. First, each three-address assignment instruction has at most one operator on the right side; thus, these instructions fix the order in which operations are to be done. Second, the compiler must generate a temporary name to hold the value computed by a three-address instruction. Third, some “three-address instructions”, like the first and last in the sequence above, have fewer than three operands.
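The three-address sequence above can be produced by a post-order walk of the annotated syntax tree, allocating a new temporary only after both operands have been translated. This is a minimal sketch; the tuple tree format and the gen_three_address name are assumptions.

```python
import itertools


def gen_three_address(tree):
    """Emit three-address code for an annotated assignment tree (post-order walk)."""
    code = []
    temp_numbers = itertools.count(1)

    def emit(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "num":
            return str(node[1])
        if kind == "inttofloat":
            arg = emit(node[1])
            temp = f"t{next(temp_numbers)}"
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        left_name, right_name = emit(left), emit(right)  # operands first...
        temp = f"t{next(temp_numbers)}"                  # ...then a fresh temporary
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree  # the root is the "=" node
    value_name = emit(value)
    code.append(f"{emit(target)} = {value_name}")
    return code


annotated = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("num", 60)))))
code = gen_three_address(annotated)
print("\n".join(code))
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```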

Code Optimization:

The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power.

A simple intermediate-code generation algorithm followed by code optimization is a reasonable way to generate good target code. The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can transform the code into the shorter sequence

t1 = id3 * 60.0
id1 = id2 + t1
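The two improvements just described can be sketched as small passes over the three-address code: fold inttofloat applied to a literal into a floating-point constant, then let the instruction that computes a single-use temporary write directly into the variable it is copied to. The quadruple encoding (with op "copy" for a plain assignment) and the final renumbering of temporaries are assumptions made so the output matches the listing.

```python
from collections import Counter


def optimize(code):
    """Fold inttofloat of a literal, remove a single-use copy, renumber temporaries.

    Instructions are (dest, op, arg1, arg2); op "copy" means dest = arg1.
    """
    # 1. inttofloat(60) can be done at compile time: use the constant 60.0 instead.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and a.isdigit():
            consts[dest] = str(float(a))
            continue
        folded.append((dest, op, a, b))

    # 2. "x = t" where t is used only here: compute the value directly into x.
    uses = Counter(arg for _, _, a, b in folded for arg in (a, b) if arg is not None)
    shortened = []
    for dest, op, a, b in folded:
        if op == "copy" and uses[a] == 1 and shortened and shortened[-1][0] == a:
            _, prev_op, pa, pb = shortened.pop()
            shortened.append((dest, prev_op, pa, pb))
        else:
            shortened.append((dest, op, a, b))

    # 3. Renumber the remaining temporaries t1, t2, ... in order of definition.
    rename, result = {}, []
    for dest, op, a, b in shortened:
        a, b = rename.get(a, a), rename.get(b, b)
        if dest.startswith("t"):
            rename[dest] = f"t{len(rename) + 1}"
            dest = rename[dest]
        result.append((dest, op, a, b))
    return result


code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
optimized = optimize(code)
print(optimized)  # [('t1', '*', 'id3', '60.0'), ('id1', '+', 'id2', 't1')]
```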

Compilers that perform many code optimizations are called “optimizing compilers”, and they spend a significant amount of time in this phase. There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much. Optimizations are of two types: machine-independent and machine-dependent.

Code Generation:

The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program. Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task. A crucial aspect of code generation is the judicious assignment of registers to hold variables.

For example, using registers R1 and R2, the intermediate code generated by the code optimizer might be translated into the following machine code:

LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1

The first operand of each instruction specifies a destination. The F in each instruction tells us that it deals with floating-point numbers. The code above loads the contents of address id3 into register R2, then multiplies it by the floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2. Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement.
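A naive register-assigning code generator that reproduces the listing above is sketched below. It is a minimal illustration, not a real back end: it assumes each temporary is used exactly once, that two registers suffice (there is no spilling), and that free registers are handed out from the end of the list, which is why R2 is used first. The instruction spellings follow the listing.

```python
OPCODES = {"+": "ADDF", "-": "SUBF", "*": "MULF"}


def is_literal(x):
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


def generate(code, registers=("R1", "R2")):
    """Naive code generator for (dest, op, arg1, arg2) floating-point instructions."""
    free, reg_of, asm = list(registers), {}, []
    for dest, op, a, b in code:
        if a in reg_of:                      # first operand already in a register
            r = reg_of.pop(a)
        else:
            r = free.pop()
            asm.append(f"LDF {r}, {a}")
        scratch = None
        if is_literal(b):
            src = f"#{b}"                    # immediate constant
        elif b in reg_of:
            src = scratch = reg_of.pop(b)
        else:
            src = scratch = free.pop()
            asm.append(f"LDF {src}, {b}")
        asm.append(f"{OPCODES[op]} {r}, {r}, {src}")
        if scratch is not None:
            free.append(scratch)             # second operand's register is free again
        if dest.startswith("t"):
            reg_of[dest] = r                 # keep temporaries in registers
        else:
            asm.append(f"STF {dest}, {r}")   # store program variables to memory
            free.append(r)
    return asm


asm = generate([("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")])
print("\n".join(asm))
# LDF R2, id3
# MULF R2, R2, #60.0
# LDF R1, id2
# ADDF R1, R1, R2
# STF id1, R1
```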

These are the different phases of a compiler, and all of them interact with the symbol table: the analysis part stores information in the table, which is later used by the synthesis part.

Symbol-table Management:

An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name. These attributes may provide information about the storage allocated for a name, its type, its scope (where in the program its value may be used), and, in the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned.

The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
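A symbol table with nested scopes is commonly organized as a stack of hash tables, one per scope. The Python sketch below assumes that design; SymbolRecord, SymbolTable and the attribute names are hypothetical. Lookup searches from the innermost scope outward, so an inner declaration hides an outer one.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRecord:
    name: str
    type_name: str
    scope_level: int                                # 0 = outermost scope
    attributes: dict = field(default_factory=dict)  # storage, parameters, ...


class SymbolTable:
    """A stack of hash tables, one per open scope (one common design)."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, type_name, **attributes):
        current = self.scopes[-1]
        if name in current:
            raise KeyError(f"{name!r} is already declared in this scope")
        current[name] = SymbolRecord(name, type_name, len(self.scopes) - 1, attributes)
        return current[name]

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost scope first
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("rate", "float", offset=0)
table.insert("area", "procedure", params=[("float", "by value")], returns="float")
table.enter_scope()
table.insert("rate", "int", offset=0)  # hides the outer rate
inner = table.lookup("rate")
table.exit_scope()
outer = table.lookup("rate")
```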

4) a) The regular expression is (a | b)*. The ε-NFA for the given regular expression is as below:

[Figure: ε-NFA for (a | b)*, with start state q0 and final state q7. Its transitions are δ(q0, ε) = {q1, q7}, δ(q1, ε) = {q2, q4}, δ(q2, a) = {q3}, δ(q4, b) = {q5}, δ(q3, ε) = δ(q5, ε) = {q6}, δ(q6, ε) = {q1, q7}.]

b) The regular expression is ab(a | b)*.

[Figure: ε-NFA for ab(a | b)*]

c) The regular expression is (a* | b*)*a.

[Figure: ε-NFA for (a* | b*)*a]

d) The regular expression is (a | b)*a(a | b)+.

[Figure: ε-NFA for (a | b)*a(a | b)+]

Or, this can be done as follows:

[Figure: alternative ε-NFA for (a | b)*a(a | b)+]

e) The regular expression is (a | b)+abb.

[Figure: ε-NFA for (a | b)+abb]

Or, this can be done as follows:

[Figure: alternative ε-NFA for (a | b)+abb]
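Thompson's construction can be sketched in Python and used to check the five expressions of Q.4 by simulating the resulting ε-NFAs. This is an illustrative sketch under stated assumptions: the syntax-tree constructors are hypothetical, concatenation joins the two fragments with an ε-edge instead of merging states, r+ is built as a star fragment without the bypass edge, and the state numbers it produces do not match the figures.

```python
import itertools


# Regular-expression syntax trees (hypothetical helper constructors).
def sym(c):
    return ("sym", c)


def alt(r, s):
    return ("alt", r, s)


def star(r):
    return ("star", r)


def plus(r):
    return ("plus", r)


def cat(*rs):
    tree = rs[0]
    for r in rs[1:]:
        tree = ("cat", tree, r)
    return tree


class NFA:
    def __init__(self):
        self.edges = []                   # (from_state, symbol or None for ε, to_state)
        self._numbers = itertools.count()

    def new_state(self):
        return next(self._numbers)


def thompson(nfa, r):
    """Add the fragment for r to nfa; return its (start, accept) states."""
    kind = r[0]
    if kind == "sym":
        s, f = nfa.new_state(), nfa.new_state()
        nfa.edges.append((s, r[1], f))
        return s, f
    if kind == "cat":
        s1, f1 = thompson(nfa, r[1])
        s2, f2 = thompson(nfa, r[2])
        nfa.edges.append((f1, None, s2))  # textbook merges f1 and s2; an ε-edge is equivalent
        return s1, f2
    if kind not in ("alt", "star", "plus"):
        raise ValueError(f"unknown node {kind!r}")
    s, f = nfa.new_state(), nfa.new_state()
    if kind == "alt":
        for sub in r[1:]:
            si, fi = thompson(nfa, sub)
            nfa.edges += [(s, None, si), (fi, None, f)]
        return s, f
    si, fi = thompson(nfa, r[1])          # star or plus
    nfa.edges += [(s, None, si), (fi, None, si), (fi, None, f)]
    if kind == "star":
        nfa.edges.append((s, None, f))    # bypass edge: zero repetitions allowed
    return s, f


def eps_closure(edges, states):
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, c, r in edges:
            if p == q and c is None and r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(regex, word):
    nfa = NFA()
    start, accept = thompson(nfa, regex)
    current = eps_closure(nfa.edges, {start})
    for ch in word:
        moved = {r for p, c, r in nfa.edges if p in current and c == ch}
        current = eps_closure(nfa.edges, moved)
    return accept in current


A, B = sym("a"), sym("b")
Q4 = {
    "(a | b)*": star(alt(A, B)),
    "ab(a | b)*": cat(A, B, star(alt(A, B))),
    "(a* | b*)*a": cat(star(alt(star(A), star(B))), A),
    "(a | b)*a(a | b)+": cat(star(alt(A, B)), A, plus(alt(A, B))),
    "(a | b)+abb": cat(plus(alt(A, B)), A, B, B),
}
```

For instance, accepts(Q4["(a | b)+abb"], "aabb") holds, while "abb" alone is rejected because (a | b)+ must consume at least one symbol.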

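5) The given regular expression is (a | b)*abb. The ε-NFA for the given regular expression is given below:

[Figure: ε-NFA for (a | b)*abb, with start state q0 and final state q10. Its transitions are δ(q0, ε) = {q1, q7}, δ(q1, ε) = {q2, q4}, δ(q2, a) = {q3}, δ(q4, b) = {q5}, δ(q3, ε) = δ(q5, ε) = {q6}, δ(q6, ε) = {q1, q7}, δ(q7, a) = {q8}, δ(q8, b) = {q9}, δ(q9, b) = {q10}.]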

Here ε-closure({q0}) = {q0, q1, q2, q4, q7} = A. Every state in A is reachable from q0 without consuming any input, so A is the start state of the DFA.

Applying ‘a’ to the states in A, we get {q3, q8}. To get all the states reachable from these without input, we compute ε-closure({q3, q8}) = {q1, q2, q3, q4, q6, q7, q8} = B.

Applying ‘b’ to A, we get {q5}, and ε-closure({q5}) = {q1, q2, q4, q5, q6, q7} = C.

Similarly, applying ‘a’ to B gives {q3, q8}, whose ε-closure is B. Applying ‘b’ to B gives {q5, q9}, and ε-closure({q5, q9}) = {q1, q2, q4, q5, q6, q7, q9} = D.

Applying ‘a’ to C gives {q3, q8}, whose ε-closure is B. Applying ‘b’ to C gives {q5}, whose ε-closure is C.

Applying ‘a’ to D gives {q3, q8}, whose ε-closure is B. Applying ‘b’ to D gives {q5, q10}, and ε-closure({q5, q10}) = {q1, q2, q4, q5, q6, q7, q10} = E.

Applying ‘a’ to E gives {q3, q8}, whose ε-closure is B. Applying ‘b’ to E gives {q5}, whose ε-closure is C.

Since E contains the final state q10 of the ε-NFA, E is a final state of the DFA. The transition table for the DFA is shown below (* marks the final state).

State   a   b
A       B   C
B       B   D
C       B   C
D       B   E
E*      B   C
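The subset construction above can be reproduced mechanically. The sketch encodes the ε-NFA transitions listed for Q.5 (None stands for ε) and names DFA states A, B, C, ... in the order they are discovered, processing input a before b; these encoding choices are assumptions.

```python
from collections import deque

# ε-NFA for (a | b)*abb as listed in the answer; None stands for ε.
EDGES = [(0, None, 1), (0, None, 7), (1, None, 2), (1, None, 4), (2, "a", 3),
         (4, "b", 5), (3, None, 6), (5, None, 6), (6, None, 1), (6, None, 7),
         (7, "a", 8), (8, "b", 9), (9, "b", 10)]
START, FINAL = 0, 10


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, c, r in EDGES:
            if p == q and c is None and r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def move(states, symbol):
    return {r for p, c, r in EDGES if p in states and c == symbol}


def subset_construction(alphabet="ab"):
    start = eps_closure({START})
    names = {start: "A"}  # DFA states are named A, B, C, ... in discovery order
    table, queue = {}, deque([start])
    while queue:
        current = queue.popleft()
        row = table[names[current]] = {}
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            if target not in names:
                names[target] = chr(ord("A") + len(names))
                queue.append(target)
            row[symbol] = names[target]
    finals = {name for states, name in names.items() if FINAL in states}
    return names, table, finals


names, table, finals = subset_construction()
print(table)
# {'A': {'a': 'B', 'b': 'C'}, 'B': {'a': 'B', 'b': 'D'}, 'C': {'a': 'B', 'b': 'C'},
#  'D': {'a': 'B', 'b': 'E'}, 'E': {'a': 'B', 'b': 'C'}}
```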

Now let us minimize the above DFA to get an equivalent minimized DFA.

The 0-equivalence classes separate the final states from the non-final states: $Q_1^0 = \{E\}$ and $Q_2^0 = \{A, B, C, D\}$. Hence the set of 0-equivalence classes is $\pi_0 = \{Q_1^0, Q_2^0\}$.

For the 1-equivalence classes, D is separated from A, B and C because on input b it goes to E, which lies in a different class of $\pi_0$. So $Q_1^1 = \{E\}$, $Q_2^1 = \{A, B, C\}$, $Q_3^1 = \{D\}$, and the set of 1-equivalence classes is $\pi_1 = \{Q_1^1, Q_2^1, Q_3^1\}$.

For the 2-equivalence classes, B is separated from A and C because on input b it goes to D, which lies in a different class of $\pi_1$. So $Q_1^2 = \{E\}$, $Q_2^2 = \{A, C\}$, $Q_3^2 = \{B\}$, $Q_4^2 = \{D\}$, and the set of 2-equivalence classes is $\pi_2 = \{Q_1^2, Q_2^2, Q_3^2, Q_4^2\}$.

For the 3-equivalence classes, A and C still agree (both go to B on a and stay in the class {A, C} on b), so $Q_1^3 = \{E\}$, $Q_2^3 = \{A, C\}$, $Q_3^3 = \{B\}$, $Q_4^3 = \{D\}$, and $\pi_3 = \{Q_1^3, Q_2^3, Q_3^3, Q_4^3\}$.

We can see that $\pi_2 = \pi_3$. Hence $\pi_2$ is the set of equivalence classes, and states A and C can be merged.

So the transition table for the minimized DFA is given below (* marks the final state).

State   a   b
A       B   A
B       B   D
D       B   E
E*      B   A
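The partition-refinement steps can be automated: start from {final, non-final} and split any block whose states go, on some input, to different blocks of the previous partition, until nothing changes. The sketch below takes the DFA table above as input, names each merged state after its alphabetically smallest member (so {A, C} becomes A), and adds a hypothetical run_dfa helper to try the result on a few strings.

```python
DFA = {"A": {"a": "B", "b": "C"}, "B": {"a": "B", "b": "D"},
       "C": {"a": "B", "b": "C"}, "D": {"a": "B", "b": "E"},
       "E": {"a": "B", "b": "C"}}
FINALS = {"E"}


def minimize(dfa, finals, alphabet="ab"):
    """Refine pi_0 = {finals, non-finals} until pi_k equals pi_(k+1)."""
    partition = [block for block in (set(finals), set(dfa) - set(finals)) if block]
    while True:
        class_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = {}
            for q in sorted(block):
                # states stay together only if every input leads to the same class
                signature = tuple(class_of[dfa[q][c]] for c in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # no block was split: the partition is stable
            break
        partition = refined
    rep = {q: min(block) for block in partition for q in block}
    table = {rep[q]: {c: rep[dfa[q][c]] for c in alphabet} for q in dfa}
    return partition, table, {rep[q] for q in finals}


def run_dfa(table, finals, word, start="A"):
    state = start
    for ch in word:
        state = table[state][ch]
    return state in finals


partition, min_table, min_finals = minimize(DFA, FINALS)
print(min_table)
# {'A': {'a': 'B', 'b': 'A'}, 'B': {'a': 'B', 'b': 'D'}, 'D': {'a': 'B', 'b': 'E'},
#  'E': {'a': 'B', 'b': 'A'}}
```

The minimized DFA accepts exactly the strings over {a, b} that end in abb.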

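The transition diagram for the minimized DFA is shown below.

[Figure: transition diagram of the minimized DFA with states A (start), B, D and E (final); its edges are those of the table above.]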
