Silicon Institute of Technology
Class Test – I (6th Sem. B. Tech- CS), 2011
Sub : Compiler Design
Time – 60 mins. Max. Marks – 10 Date : 17.02.11
(Answer any Four including Q.1)
Q.1. Short type (any Five) [0.5 x 5 = 2.5]
a) Differentiate between Compiler and Interpreter.
b) Define regular expression.
c) Draw a parse tree for the following statement:
IF (5 .EQ. MAX) GOTO 100
d) What do you mean by ‘Common Sub-expression Elimination’ in code
optimization?
e) What is a cross compiler and its advantage?
Q.2. Consider the following while statement: [2.5]
While A > B && A <= 2 * B – 5 do
A = A + B
Identify the Tokens. Generate the Parse tree, intermediate code and optimized
code.
Q.3. Discuss the different phases of a compiler with suitable examples. [2.5]
Q.4. Using Thompson's Construction Rule, construct an ε-NFA for each of the following regular
expressions. [2.5]
a) (a | b)* b) ab(a | b)* c) (a* | b*)*a d) (a | b)*a(a | b)+
e) (a | b)+abb
Q.5. Give an equivalent DFA for the regular expression (a | b)*abb. [2.5]
Good Luck
Solution to Class Test – I, 2011
Sub – Compiler Design
1) a) Both compilers and interpreters are translators for high-level languages, but they
differ as follows.
Compiler
i. A compiler translates the whole source code at once and produces a list of errors.
ii. After all errors are removed, the compiler produces the object code for the given source code.
iii. A compiler is faster.
iv. A compiler generates object code from the source code.
v. The generated object code is stored on a permanent storage device.
vi. The object code is used to execute the program without any need for the source code.
Interpreter
i. An interpreter translates the source code line by line and halts when an error is found.
ii. After the error is removed, the interpreter moves on to translate the next line of the source code.
iii. An interpreter is slower.
iv. An interpreter generates executable code from the source code.
v. The generated executable code is stored in a temporary storage device, i.e. primary memory.
vi. The source code is always needed to execute the program, as the executable code is lost after execution.
b) Regular Expression: Regular expressions over Σ can be defined recursively as follows:
1. Any terminal symbol (i.e. an element of Σ), ε and ∅ are regular expressions.
2. The union of two regular expressions R1 and R2, written as R1 + R2 (also written R1 | R2), is also a regular expression.
3. The concatenation of two regular expressions R1 and R2, written as R1 R2, is also a regular expression.
4. The iteration (or closure) of a regular expression R, written as R*, is also a regular expression.
5. If R is a regular expression, then (R) is also a regular expression.
6. A recursive application of rules 1-5 once or several times results in a regular expression.
c) The statement is IF (5 .EQ. MAX) GOTO 100. The parse tree for the statement is
given below.
statement
  if-statement
    if
    (
    conditional
      relation
        expression
          const (5)
        relational-op
          eq
        expression
          id (MAX)
    )
    non-if-statement
      goto-statement
        goto
        label (100)
d) Consider the following two assignment statements:
A := B + C + D
E := B + C + F
We can have an equivalent set of assignment statements as follows:
T1 := B + C
A := T1 + D
E := T1 + F
Here the number of addition operations has been reduced from four to three in the
second set of statements. This has been achieved by taking advantage of the common
subexpression B + C, which is now computed once and reused instead of being
recomputed in each assignment statement. This process of removing the repeated
computation of a common subexpression is known as common subexpression elimination.
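The transformation above can be sketched in a few lines of Python. This is a minimal local pass, not a production algorithm: the tuple format `(target, left, op, right)`, the function name, and the convention that temporaries are named `T<n>` and assigned only once are all assumptions made for this example.

```python
import re

def is_temp(name):
    """Assumption for this sketch: compiler temporaries are named T1, T2, ..."""
    return re.fullmatch(r"T\d+", name) is not None

def eliminate_common_subexpressions(code):
    available = {}  # (left, op, right) -> variable that currently holds the value
    alias = {}      # eliminated temporary -> variable that replaces it
    out = []
    for target, left, op, right in code:
        # Use the surviving name for any temporary that was eliminated earlier.
        left, right = alias.get(left, left), alias.get(right, right)
        key = (left, op, right)
        if key in available and is_temp(target):
            alias[target] = available[key]   # reuse the earlier result
            continue
        out.append((target, left, op, right))
        # The target is overwritten: forget expressions that read it or lived in it.
        available = {k: v for k, v in available.items()
                     if target not in (k[0], k[2]) and v != target}
        if target not in (left, right):
            available[key] = target
    return out

# The two assignments from the text, already split into three-address form.
original = [
    ("T1", "B", "+", "C"), ("A", "T1", "+", "D"),
    ("T2", "B", "+", "C"), ("E", "T2", "+", "F"),
]
optimized = eliminate_common_subexpressions(original)
for target, left, op, right in optimized:
    print(f"{target} := {left} {op} {right}")
```

The second computation of B + C is dropped and E is computed from T1, which matches the three statements given in the answer.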
e) A compiler may run on one machine and produce object code for another
machine. Such a compiler is often called a cross-compiler. Suppose we have
written a compiler for language L in language L itself, which produces object
code in language B (the machine language of machine B), and we also have an
existing compiler for L that runs on machine A and produces code for A. To
bootstrap a compiler for L that runs on machine B, we proceed in two steps:
1. Compile the new compiler through the existing compiler on machine A. The result runs on A but produces object code for B.
2. Compile the new compiler again, this time through the result of step 1. The result runs on B and produces object code for B.
In the above process, the compiler obtained in step 1 is a cross compiler. The cross compiler
helps us to bootstrap a compiler for another machine with less effort. We do not
have to undergo the difficulty of writing the compiler from scratch.
2) The given while statement is
while A > B & A ≤ 2*B – 5 do
A := A + B;
The tokens present in the given string are:
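As a hedged illustration of token identification, the sketch below splits the statement as it is spelled in Q.2 (with `&&`, `<=`, `=` and an ASCII minus sign) into tokens using a small regular-expression lexer. The token class names (`KEYWORD`, `ID`, `NUM`, and so on) are assumptions chosen for this example, not a classification taken from the answer.

```python
import re

# Order matters: keywords before identifiers, "<=" before a lone "=".
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:while|do)\b"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUM",     r"\d+"),
    ("RELOP",   r"<=|>=|<|>"),
    ("LOGOP",   r"&&"),
    ("ASSIGN",  r"="),
    ("ARITHOP", r"[+\-*/]"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)

def tokenize(text):
    """Return a list of (token_class, lexeme) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("While A > B && A <= 2 * B - 5 do A = A + B")
for token_class, lexeme in tokens:
    print(f"{token_class:8} {lexeme}")
```

Each identifier occurrence is reported separately here; a real lexer would normally also enter A and B into the symbol table once and return a pointer to that entry.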
node labeled + indicates that we must add the result of this multiplication to the value of
initial. The root of the tree, labeled =, indicates that we must store the result of this
addition into the location for the identifier position.
Semantic Analysis:
The semantic analyzer uses the syntax tree and the information in the symbol table
to check the source program for semantic consistency with the language definition. It also
gathers type information and saves it in either the syntax tree or the symbol table, for
subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler
checks that each operator has matching operands. The language specification may permit
some type conversions called coercions. For example, a binary arithmetic operator may
be applied to either a pair of integers or to a pair of floating-point numbers. If the operator
is applied to a floating-point number and an integer, the compiler may convert or coerce
the integer into a floating-point number.
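A minimal sketch of type checking with coercion is given below. The node classes (`Num`, `Var`, `BinOp`, `IntToFloat`), the two-type system, and the `symbols` dictionary standing in for the symbol table are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: object      # an int or float literal

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class IntToFloat:      # explicit conversion node inserted by the checker
    operand: object

def check(node, symbols):
    """Return (type, tree) where the tree has coercions made explicit."""
    if isinstance(node, Num):
        return ("float" if isinstance(node.value, float) else "int"), node
    if isinstance(node, Var):
        return symbols[node.name], node          # type recorded in the symbol table
    left_type, left = check(node.left, symbols)
    right_type, right = check(node.right, symbols)
    if left_type == right_type:
        return left_type, BinOp(node.op, left, right)
    if {left_type, right_type} == {"int", "float"}:
        # Mixed operands: coerce the integer side to floating point.
        if left_type == "int":
            left = IntToFloat(left)
        else:
            right = IntToFloat(right)
        return "float", BinOp(node.op, left, right)
    raise TypeError(f"operator {node.op!r} applied to {left_type} and {right_type}")

# id3 is assumed to be declared as a floating-point variable.
result_type, tree = check(BinOp("*", Var("id3"), Num(60)), {"id3": "float"})
print(result_type, tree)
```

The integer literal 60 ends up wrapped in an `IntToFloat` node, which is the same conversion that appears as `inttofloat(60)` in the intermediate code.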
Intermediate Code Generation:
In the process of translating a source program into target code, a compiler may
construct one or more intermediate representations, which can have a variety of forms.
Syntax trees are a form of intermediate representation; they are commonly used during
syntax and semantic analysis.
Another intermediate form is three-address code, which consists of a
sequence of assembly-like instructions with three operands per instruction. Each operand
can act like a register. The output of the intermediate code generator for the assignment
statement given above consists of the three-address code sequence as follows:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Three points are worth noting about three-address instructions. First, each three-address
assignment instruction has at most one operator on the right side. Thus, these instructions
fix the order in which operations are to be done. Second, the compiler must generate a
temporary name to hold the value computed by a three-address instruction. Third, some
“three-address instructions”, like the first and last in the sequence above, have fewer than
three operands.
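The way temporaries are generated can be sketched as a post-order walk over a syntax tree. The nested-tuple tree format and the function name `generate` are assumptions made for this example; the tree encodes the same computation as the four-instruction sequence above.

```python
import itertools

def generate(assignment):
    """Translate ('=', target, expr) into a list of three-address instructions."""
    counter = itertools.count(1)
    code = []

    def new_temp():
        return f"t{next(counter)}"

    def emit(node):
        """Emit code for a subtree and return the name that holds its value."""
        if not isinstance(node, tuple):
            return str(node)                      # leaf: identifier or constant
        if node[0] == "inttofloat":
            argument = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({argument})")
            return temp
        op, left, right = node
        left_name = emit(left)                    # operands first (post-order)
        right_name = emit(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = assignment
    code.append(f"{target} = {emit(expr)}")
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", 60))))
three_address = generate(tree)
print("\n".join(three_address))
```

Because each interior node receives its own temporary, every generated instruction has at most one operator on its right side.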
Code Optimization:
The machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result. Here better means faster, but other
objectives are also desired such as shorter code, or target code that consumes less power.
A simple intermediate code generation algorithm followed by code optimization is
a reasonable way to generate good target code. The optimizer can deduce that the
conversion of 60 from integer to floating point can be done once and for all at compile
time, so the inttofloat operation can be eliminated by replacing the integer 60 by the
floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so
the optimizer can transform the code into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1
Compilers that perform many code optimizations are called “optimizing compilers”, and
they spend a significant amount of time on this phase. There are also simple optimizations
that significantly improve the running time of the target program without slowing down
compilation too much. Optimizations are of two types: machine-independent and
machine-dependent.
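The two improvements described in this section, folding the conversion of 60 at compile time and removing the copy through t3, can be sketched as follows. The tuple format `(dest, op, arg1, arg2)`, the `copy` pseudo-operator, and the assumption that temporaries are named `t<n>` and used only once are choices made for this example.

```python
import re

def is_temp(name):
    """Assumption for this sketch: temporaries are named t1, t2, ..."""
    return isinstance(name, str) and re.fullmatch(r"t\d+", name) is not None

def optimize(code):
    # Pass 1: fold inttofloat of an integer constant and propagate the result.
    constants, folded = {}, []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            constants[dest] = float(a)            # done once, at compile time
            continue
        folded.append((dest, op, a, b))
    # Pass 2: "x = t" right after "t = ..." becomes "x = ..." (t used only once).
    merged = []
    for dest, op, a, b in folded:
        if op == "copy" and merged and is_temp(a) and merged[-1][0] == a:
            _, prev_op, prev_a, prev_b = merged.pop()
            merged.append((dest, prev_op, prev_a, prev_b))
        else:
            merged.append((dest, op, a, b))
    # Pass 3: renumber the surviving temporaries from t1.
    rename, final = {}, []
    for dest, op, a, b in merged:
        a, b = rename.get(a, a), rename.get(b, b)
        if is_temp(dest):
            rename[dest] = f"t{len(rename) + 1}"
            dest = rename[dest]
        final.append((dest, op, a, b))
    return final

intermediate = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
optimized = optimize(intermediate)
for dest, op, a, b in optimized:
    print(f"{dest} = {a} {op} {b}")
```

On the four-instruction sequence from the previous section this yields the same two-instruction sequence shown in the text.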
Code Generation:
The code generator takes as input an intermediate representation of the source
program and maps it into the target language. If the target language is machine code,
registers or memory locations are selected for each of the variables used by the program.
Then, the intermediate instructions are translated into sequences of machine instructions
that perform the same task. A crucial aspect of code generation is the judicious
assignment of registers to hold variables.
For example, using registers R1 and R2, the intermediate code produced by the code
optimizer might get translated into the machine code as follows:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
The first operand of each instruction specifies a destination. The F in each
instruction tells us that it deals with floating-point numbers. The code given above loads
the contents of address id3 into register R2, then multiplies it with floating-point constant
60.0. The # signifies that 60.0 is to be treated as an immediate constant. The third
instruction moves id2 into register R1 and the fourth adds to it the value previously
computed in register R2. Finally, the value in register R1 is stored into the address of id1,
so the code correctly implements the assignment statement.
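One way to convince ourselves that the five instructions implement the assignment is to execute them on a toy machine. The interpreter below is only a sketch: the instruction semantics are read off the description above, and the function name `run` and the sample values for id2 and id3 are assumptions made for this example.

```python
PROGRAM = [
    "LDF R2, id3",
    "MULF R2, R2, #60.0",
    "LDF R1, id2",
    "ADDF R1, R1, R2",
    "STF id1, R1",
]

def run(program, memory):
    """Execute the instructions; memory maps names such as 'id2' to values."""
    registers = {}

    def value(operand):
        # '#' marks an immediate constant; anything else is a register name.
        if operand.startswith("#"):
            return float(operand[1:])
        return registers[operand]

    for line in program:
        opcode, rest = line.split(None, 1)
        args = [arg.strip() for arg in rest.split(",")]
        if opcode == "LDF":                       # load memory into a register
            registers[args[0]] = float(memory[args[1]])
        elif opcode == "MULF":
            registers[args[0]] = value(args[1]) * value(args[2])
        elif opcode == "ADDF":
            registers[args[0]] = value(args[1]) + value(args[2])
        elif opcode == "STF":                     # store a register into memory
            memory[args[0]] = registers[args[1]]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory

memory = run(PROGRAM, {"id2": 10.0, "id3": 2.5})
print(memory["id1"])   # id2 + id3 * 60.0
```

With id2 = 10.0 and id3 = 2.5 the stored value is 10.0 + 2.5 * 60.0, as the two-instruction intermediate code requires.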
These are the phases of a compiler, and all of them interact with the symbol
table. The analysis part stores information in the table, which is later used by the
synthesis part.
Symbol-table Management:
An essential function of a compiler is to record the variable names used in the
source program and collect information about various attributes of each name. These
attributes may provide information about the storage allocated for a name, its type, its
scope (where in the program its value may be used), and in the case of procedure names,
such things as the number and types of its arguments, the method of passing each
argument (for example, by value or by reference), and the type returned.
The symbol table is a data structure containing a record for each variable name,
with fields for the attributes of the name. The data structure should be designed to allow
the compiler to find the record for each name quickly and to store or retrieve data from
that record quickly.
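A minimal sketch of such a data structure is a stack of hash tables, one per scope. The class name, method names and attribute keys below are assumptions made for this example; hash tables are used because they give fast insertion and lookup on average.

```python
class SymbolTable:
    """One dictionary per open scope; the innermost scope is last in the list."""

    def __init__(self):
        self.scopes = [{}]                     # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = attributes               # the record for this name

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"{name!r} is not declared")

table = SymbolTable()
table.declare("rate", type="float", storage="static")
table.declare("scale", type="procedure", params=["float"], passing="by value",
              returns="float")
table.enter_scope()
table.declare("rate", type="int", storage="stack")   # shadows the outer rate
inner_type = table.lookup("rate")["type"]
table.exit_scope()
outer_type = table.lookup("rate")["type"]
print(inner_type, outer_type)
```

The nested declaration hides the outer one only while its scope is open, which is the scope information the paragraph above refers to.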
4) a) The regular expression is (a | b)*.
The ε-NFA for the given regular expression is as below:
[Figure: ε-NFA for (a | b)*, states q0 to q7]
b) The regular expression is ab(a | b)*.
The ε-NFA for the given regular expression is as below:
[Figure: ε-NFA for ab(a | b)*, states q0 to q9]
c) The regular expression is (a* | b*)*a.
The ε-NFA for the given regular expression is as below:
[Figure: ε-NFA for (a* | b*)*a, states q0 to q12]
d) The regular expression is (a | b)*a(a | b)+.
The ε-NFA for the given regular expression is as below:
[Figure: ε-NFA for (a | b)*a(a | b)+, states q0 to q20]
or, this can be done as follows:
[Figure: alternative ε-NFA for (a | b)*a(a | b)+, states q0 to q13]
e) The regular expression is (a | b)+abb.
The ε-NFA for the given regular expression is as below:
[Figure: ε-NFA for (a | b)+abb, states q0 to q15]
or, this can be done as follows:
[Figure: alternative ε-NFA for (a | b)+abb]
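The constructions in Q.4 can be checked mechanically with a small sketch of Thompson's construction. The tuple-based syntax tree (`sym`, `cat`, `alt`, `star`, `plus`), the class and function names, and two simplifications are assumptions made for this example: concatenation links the two fragments with an ε-move instead of merging states, and r+ is expanded as r r*, so state numbers and counts need not match the figures.

```python
import itertools

class NFA:
    """ε-NFA stored as plain dictionaries; states are integers."""
    def __init__(self):
        self.eps = {}      # state -> set of states reachable by one ε-move
        self.delta = {}    # (state, symbol) -> set of next states
        self._ids = itertools.count()

    def new_state(self):
        state = next(self._ids)
        self.eps[state] = set()
        return state

def build(nfa, node):
    """Return (start, accept) of the fragment built for one syntax-tree node."""
    kind = node[0]
    if kind == "sym":                       # a single input symbol
        start, accept = nfa.new_state(), nfa.new_state()
        nfa.delta.setdefault((start, node[1]), set()).add(accept)
        return start, accept
    if kind == "cat":                       # r s
        s1, f1 = build(nfa, node[1])
        s2, f2 = build(nfa, node[2])
        nfa.eps[f1].add(s2)                 # ε-link instead of merging f1 and s2
        return s1, f2
    if kind == "plus":                      # r+ is treated as r r*
        return build(nfa, ("cat", node[1], ("star", node[1])))
    start, accept = nfa.new_state(), nfa.new_state()
    if kind == "alt":                       # r | s
        for branch in node[1:]:
            s, f = build(nfa, branch)
            nfa.eps[start].add(s)
            nfa.eps[f].add(accept)
        return start, accept
    if kind == "star":                      # r*
        s, f = build(nfa, node[1])
        nfa.eps[start] |= {s, accept}       # enter the loop or skip it
        nfa.eps[f] |= {s, accept}           # repeat the loop or leave it
        return start, accept
    raise ValueError(f"unknown node kind: {kind!r}")

def closure(nfa, states):
    """ε-closure of a set of states."""
    stack, seen = list(states), set(states)
    while stack:
        for target in nfa.eps[stack.pop()]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def accepts(node, text):
    """Build the ε-NFA for node and simulate it on text."""
    nfa = NFA()
    start, accept = build(nfa, node)
    current = closure(nfa, {start})
    for symbol in text:
        moved = set()
        for state in current:
            moved |= nfa.delta.get((state, symbol), set())
        current = closure(nfa, moved)
    return accept in current

A_OR_B = ("alt", ("sym", "a"), ("sym", "b"))
ABB = ("cat", ("cat", ("sym", "a"), ("sym", "b")), ("sym", "b"))
Q4_E = ("cat", ("plus", A_OR_B), ABB)       # (a | b)+abb
print(accepts(Q4_E, "aabb"), accepts(Q4_E, "abb"))
```

For (a | b)* this sketch creates eight states, the same number as in the figure for part a); the other expressions use a few more states than the figures because of the ε-links.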
5) The given regular expression is (a | b)*abb. The ε-NFA for the given regular expression is given below:
[Figure: ε-NFA for (a | b)*abb with states q0 to q10; q10 is the final state]
Here ε-closure({q0}) = {q0, q1, q2, q4, q7} = A. All the states in A are equivalent.
Applying ‘a’ as the input to the states in A, we get {q3, q8}. To get all the states equivalent to {q3, q8}, we compute its ε-closure:
ε-closure({q3, q8}) = {q1, q2, q3, q4, q6, q7, q8} = B.
Applying ‘b’ as the input to the states in A, we get {q5}. To get all the states equivalent to {q5}, we compute its ε-closure:
ε-closure({q5}) = {q1, q2, q4, q5, q6, q7} = C.
Similarly, applying ‘a’ to B, we get {q3, q8}, and ε-closure({q3, q8}) = B.
Applying ‘b’ to B, we get {q5, q9}, and ε-closure({q5, q9}) = {q1, q2, q4, q5, q6, q7, q9} = D.
Applying ‘a’ to C, we get {q3, q8}, and ε-closure({q3, q8}) = B.
Applying ‘b’ to C, we get {q5}, and ε-closure({q5}) = C.
Applying ‘a’ to D, we get {q3, q8}, and ε-closure({q3, q8}) = B.
Applying ‘b’ to D, we get {q5, q10}, and ε-closure({q5, q10}) = {q1, q2, q4, q5, q6, q7, q10} = E.
Applying ‘a’ to E, we get {q3, q8}, and ε-closure({q3, q8}) = B.
Applying ‘b’ to E, we get {q5}, and ε-closure({q5}) = C.
Now the transition table for the DFA is shown below (E* marks the final state, since E contains q10).
State   Input a   Input b
A       B         C
B       B         D
C       B         C
D       B         E
E*      B         C
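The same subset construction can be reproduced programmatically. In the sketch below the ε-moves and symbol moves of the ε-NFA are not copied from the figure; they are reconstructed from the ε-closures and moves worked out above, so they should be read as an assumption. State qN is written as the integer N, and the function names are chosen for this example.

```python
import string
from collections import deque

# Reconstructed from the closures above (an assumption, not the original figure).
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVE = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
ACCEPT = 10

def eps_closure(states):
    """All states reachable from the given states using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        for target in EPS.get(stack.pop(), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def subset_construction(start=0, alphabet="ab"):
    start_set = eps_closure({start})
    names = {start_set: "A"}          # DFA states are named in discovery order
    table = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= MOVE.get((state, symbol), set())
            target = eps_closure(moved)
            if target not in names:
                names[target] = string.ascii_uppercase[len(names)]
                queue.append(target)
            table[(names[current], symbol)] = names[target]
    finals = {name for subset, name in names.items() if ACCEPT in subset}
    return names, table, finals

names, table, finals = subset_construction()
for state in sorted(set(names.values())):
    mark = "*" if state in finals else " "
    print(f"{state}{mark}  a -> {table[(state, 'a')]}  b -> {table[(state, 'b')]}")
```

Exploring the subsets breadth-first with the alphabet in the order a, b assigns the names A to E in the same order as the hand computation.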
Now let us minimize the above DFA to get an equivalent minimized DFA.
The 0-equivalent classes are Q₁⁰ = {E} and Q₂⁰ = {A, B, C, D}, that is, the set of all final states and the set of all non-final states. Hence the set of 0-equivalent classes is π₀ = {Q₁⁰, Q₂⁰}.
For the 1-equivalent classes, D is separated from A, B and C because on input b it moves to the final state E, while the others move to non-final states. So we get Q₁¹ = {E}, Q₂¹ = {A, B, C}, Q₃¹ = {D}. Hence the set of 1-equivalent classes is π₁ = {Q₁¹, Q₂¹, Q₃¹}.
For the 2-equivalent classes, B is separated from A and C because on input b it moves to D, which now lies in a different class. So we get Q₁² = {E}, Q₂² = {A, C}, Q₃² = {B}, Q₄² = {D}. Hence the set of 2-equivalent classes is π₂ = {Q₁², Q₂², Q₃², Q₄²}.
For the 3-equivalent classes we get Q₁³ = {E}, Q₂³ = {A, C}, Q₃³ = {B}, Q₄³ = {D}. Hence the set of 3-equivalent classes is π₃ = {Q₁³, Q₂³, Q₃³, Q₄³}.
We can see that π₂ = π₃. Hence π₂ is the set of equivalence classes.
So the transition table for the minimized DFA, in which state A stands for the merged class {A, C}, is given below.
State   Input a   Input b
A       B         A
B       B         D
D       B         E
E*      B         A
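The successive sets of equivalence classes can also be computed by partition refinement. The sketch below takes the five-state DFA table from the subset construction as input; the dictionary layout and the function name `refine` are assumptions made for this example, and it assumes the DFA has at least one final and one non-final state.

```python
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
FINALS = {"E"}

def refine(dfa, finals, alphabet="ab"):
    """Return the list of partitions for k = 0, 1, 2, ... until none changes."""
    partition = [frozenset(finals), frozenset(set(dfa) - set(finals))]
    history = [partition]
    while True:
        def block_of(state):
            # Index of the class of the current partition that contains state.
            return next(i for i, block in enumerate(partition) if state in block)

        groups = {}
        for state in dfa:
            # States stay together only if they share a class now and their
            # successors on every input symbol share classes as well.
            signature = (block_of(state),) + tuple(
                block_of(dfa[state][symbol]) for symbol in alphabet)
            groups.setdefault(signature, set()).add(state)
        refined = [frozenset(group) for group in groups.values()]
        if len(refined) == len(partition):
            return history                      # no class was split: fixed point
        partition = refined
        history.append(partition)

history = refine(DFA, FINALS)
for k, classes in enumerate(history):
    print(k, sorted(sorted(block) for block in classes))
```

The loop stops as soon as a refinement step splits no class, which corresponds to the observation that the 2-equivalent and 3-equivalent classes coincide.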
The transition diagram for the DFA is shown below.