COMPILER CONSTRUCTION, WEEK 4: INTRODUCTION TO COMPILER & INTERPRETER


An Overview

The main purpose of a compiler or an interpreter is to translate a program written in a high-level programming language such as Pascal into a form that a computer can understand and execute.

In the context of this translation, the high-level language is called the source language.

A compiler translates a program written in the source language into a low-level object language, which can be the machine language of a particular computer.

The program that we write in the source language is called the source program, and we edit it in one or more source files.

The compiler translates each source file into an object file. If the object files contain assembly language, we must next run an assembler (another type of program translator) to convert them into machine language.

We then run a utility program called a linker to combine the object files (along with any needed runtime library routines) into the object program.

Once created, an object program is a separate program in its own right.

We can load it into the computer’s memory and then execute it.


For example, if we are programming in C, we edit the source files and save them using names ending in .c (or .cpp for C++).

A C compiler, such as one from Borland or Microsoft, then translates each source file into a machine-language object file, which it saves using a name ending in .obj.

The linker combines the separate object files into the final object program, which is saved using a name ending in .exe or .com (what is the difference?).

Then we can load and run the object program.

The following figure summarizes the compiler translation process:

On the other hand, an interpreter does not produce an object program.

It may translate the source program into an internal intermediate code that it can execute more efficiently, or it may simply execute the source program’s statements directly.

The net result is that an interpreter translates a program into the actions specified by the program.

Interpreters are often used for languages such as BASIC and LISP.

The following figure summarizes the interpreter translation process:

Note: a compiler may also first translate the source program into intermediate code, and then translate the intermediate code into object language.

Compiler Vs. Interpreters:

What an interpreter does with a source program is very similar to what we would do with the program if we had to figure out what it does without using a computer.

For example, suppose we are handling a C program. First we look it over to check for syntax errors. We then locate the start of the main program, and from there we execute the statements one at a time by hand.

We might use a pencil and scratch pad to keep track of the values of the variables (manual debugging).

If we encounter the statement i=j+k; in the program, we would look up the current values of j and k on our scratch pad, add the values, and write down the sum as the new value for i.

An interpreter essentially does what we just did. It is itself a program that runs on the computer.

A BASIC interpreter reads in a BASIC source program, looks it over for syntax errors, and executes the source statements one at a time.

Using some of its own variables as a scratch pad, the interpreter keeps track of the values of the source program’s variables.
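The scratch-pad idea can be sketched in a few lines of Python. This is only a minimal illustration, not a real BASIC or C interpreter: the function name interpret, the restriction to statements of the form "name = operand + operand;" and the use of a dictionary as the scratch pad are all assumptions made for this example.

```python
def interpret(source, variables):
    """Execute statements of the form 'x = a + b;' one at a time.

    `variables` is a dict that plays the role of the scratch pad
    (a hypothetical design chosen for this sketch). Operands may be
    variable names or integer literals.
    """
    for statement in source.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # skip the empty piece after the last ';'
        target, expression = (part.strip() for part in statement.split("="))
        left, right = (part.strip() for part in expression.split("+"))
        # Look up current values on the scratch pad, or read a literal.
        values = [variables[op] if op in variables else int(op)
                  for op in (left, right)]
        variables[target] = values[0] + values[1]  # write down the new value
    return variables


pad = interpret("i = j + k; m = i + 1;", {"j": 2, "k": 3})
print(pad)
```

The interpreter never produces a translated program; it simply acts on each statement and updates its own data structure.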

On the other hand, a compiler is also a program that runs on the computer.

A BASIC compiler reads in a BASIC source program and checks it for syntax errors.

But then, instead of executing the source program, it translates the source program into the object program.

Because the compiler generates a machine-language object program, its output is far more cryptic than the source program.
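By contrast, a compiler for the same kind of statement produces output instead of acting on it. The sketch below is a toy under stated assumptions: compile_statement is a hypothetical name, only statements of the form "x = a + b;" are handled, and the LDA/ADD/STO mnemonics are borrowed from the assembly listing later in this section rather than from any real machine.

```python
def compile_statement(statement):
    """Translate 'x = a + b;' into accumulator-style pseudo-assembly.

    Nothing is executed here; the function only returns the translated
    instructions as a list of strings.
    """
    target, expression = (part.strip()
                          for part in statement.rstrip(";").split("="))
    left, right = (part.strip() for part in expression.split("+"))
    return [f"LDA {left}", f"ADD {right}", f"STO {target}"]


object_code = compile_statement("i = j + k;")
print("\n".join(object_code))
```

Note that the values of j and k are never needed at translation time; they matter only when the generated instructions are eventually run.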

There is an ongoing debate about which is better, a compiler or an interpreter.

To execute a source program with an interpreter, we simply feed the source program into the interpreter, and it takes over to check and execute the program.

Advantages & Disadvantages of Compilers and Interpreters:

A compiler, however, checks the source program and then produces an object program.

After running the compiler, we may need to run the linker, and then we have to load the object program into memory in order to execute it.

So, an interpreter definitely has advantages over a compiler when it comes to the effort required to execute a source program.

An interpreter can be more versatile than a compiler.

Remember that interpreters are themselves programs, and like any other programs, they can be made to run on different computers.


One can write a Pascal or C interpreter that runs on both an IBM PC and an Apple Macintosh, so that it will execute Pascal or C source programs on either computer.

A compiler, however, generates object programs for a particular computer.

Therefore, even if we took a Pascal or C compiler originally written for the PC and made it run on the Mac, it would still generate object programs for the PC, not for the Mac.

To make the compiler generate object programs for the Mac, we would have to rewrite substantial portions of the compiler.


What happens if the source program contains a “logical error” that doesn’t show up until runtime, such as an attempt to divide by a variable whose value is zero?

Since an interpreter is in control when it is executing the source program, it can stop and tell us the line number of the offending statement and the name of the variable.

It can even prompt us for some corrective action (like changing the value of the variable) before resuming execution.
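A small sketch shows why an interpreter can report such errors in source terms. Everything here is hypothetical: the function run, the restriction to lines of the form "x = a / b" and the wording of the message are assumptions for illustration only.

```python
def run(lines, variables):
    """Execute lines of the form 'x = a / b' in order.

    Returns None if every line runs, or a message naming the source
    line and the offending variable if a division by zero is found.
    """
    for line_number, line in enumerate(lines, start=1):
        target, expression = (part.strip() for part in line.split("="))
        numerator, divisor = (part.strip() for part in expression.split("/"))
        if variables[divisor] == 0:
            # The interpreter still holds the source text, so it can name
            # both the line and the variable before stopping.
            return (f"line {line_number}: division by zero, "
                    f"variable '{divisor}' is 0")
        variables[target] = variables[numerator] / variables[divisor]
    return None


print(run(["r = a / b", "s = a / c"], {"a": 10, "b": 2, "c": 0}))
```

Because the check happens while the interpreter is in control, it could just as easily ask for a new value of the variable and continue.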

The object program generated by a compiler, on the other hand, usually runs by itself.

Information from the source program, such as line numbers and variable names, might not be present in the object program.


When a runtime error occurs, the program may simply abort and perhaps print a message containing the address of the bad instruction.

Then it’s up to us to figure out which source statement that address corresponds to, and which variable was zero.

When it comes to debugging, an interpreter is generally the way to go.

However, many modern program development environments now give compilers debugging capabilities that are almost as good as those of interpreters (a hybrid approach, e.g. Visual Basic).

We compile the program and run it under the control of the environment.


If a runtime error occurs, we are given the information and control we need to correct the error.

Then we can resume the execution of the program, or compile and run it again.

Such compilers usually generate extra information or instructions in the object program to keep the environment informed of the current state of the program’s execution.

This often causes the object program to be less efficient than it otherwise could be.

Most people turn off the debugging features when they are about to generate the final “production” version of their program.

Suppose we have successfully debugged our program, and now our most important concern is how fast it executes.


Remember that an interpreter executes the statements of the source program pretty much the way we would by hand.

Each time it executes a statement, it looks it over to figure out what operations the statement says to do.

With a compiler, the computer executes a machine-language program, generated either directly by the compiler or indirectly with an assembler.

Since a computer executes a machine language program at top speed, such a program can run 10 to 100 times faster than the interpreted source program.
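The cost of looking a statement over on every execution can be illustrated with a rough sketch. It does not reproduce the figure quoted above, and the "translated" version is still Python rather than machine language; the names interpret_each_time and translate_once are hypothetical. The only point is that the analysis work is done once instead of on every execution.

```python
import timeit


def interpret_each_time(statement, variables):
    """Re-analyze the statement text on every execution."""
    target, expression = (part.strip() for part in statement.split("="))
    left, right = (part.strip() for part in expression.split("+"))
    variables[target] = variables[left] + variables[right]


def translate_once(statement):
    """Analyze the statement once and return a reusable function."""
    target, expression = (part.strip() for part in statement.split("="))
    left, right = (part.strip() for part in expression.split("+"))

    def execute(variables):
        # No text processing is left; only the operation itself remains.
        variables[target] = variables[left] + variables[right]

    return execute


pad = {"j": 2, "k": 3}
translated = translate_once("i = j + k")
slow = timeit.timeit(lambda: interpret_each_time("i = j + k", pad), number=100_000)
fast = timeit.timeit(lambda: translated(pad), number=100_000)
print(f"re-analyzed each time: {slow:.3f}s, translated once: {fast:.3f}s")
```

The measured times depend on the machine, so no particular ratio should be read into the output.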


A compiler is definitely the winner when it comes to speed.

This is especially true of an optimizing compiler that knows how to generate especially efficient code.

So we see that compilers and interpreters have advantages and disadvantages.

Which one is preferable depends on what aspect of program development and execution we consider.

A compromise may be to have both a compiler and an interpreter for the same source language.

Then we have the best of both worlds, easy development and fast execution.

Model of a Compiler:

A compiler can be described in a modular fashion.

The task of constructing a compiler for a particular source language is complex.

The complexity and nature of the compilation process depend, to a large extent, on the source language.

Compiler complexity can often be reduced if a programming language designer takes various design factors into consideration.

Since we are dealing with high-level source languages such as Pascal and C, however, a basic model of a compiler can be formulated.

Such a model is given in the following figure.

Although this model may vary for the compilation of different high-level languages, it is nevertheless representative of the compilation process.


A compiler must perform two major tasks: the analysis of a source program and the synthesis of its corresponding object program.

The analysis task deals with the decomposition of the source program into its basic parts.

Using these parts, the synthesis task builds their equivalent object program modules.

The performance of these tasks is realized more easily by building and maintaining several tables.


A source program is a string of symbols, each of which is generally a letter, a digit, or a special symbol such as +, -, ( or ).

A source program contains elementary language constructs such as variable names, labels, constants, keywords, and operators.

It is therefore desirable for the compiler to identify these various types as classes.

These language constructs are given in the definition of the language.

The source program is input to a lexical analyzer or scanner, whose purpose is to separate the incoming text into pieces or tokens such as constants, variable names, keywords (do, if, for, etc.), and operators (+, -, etc.).


In essence, the lexical analyzer performs low-level syntax analysis.

For efficiency reasons, each class of tokens is given a unique internal representation number.

For example, a variable name may be given a representation number of 1, a constant a value of 2, a label the number 3, the addition operator (+) a value of 4 etc.

For example, the C-like statement (written here without the parentheses that C requires around the condition)

TEST: if a > b

x=y ;

would be translated by the lexical analyzer into the following sequence of tokens:


Token   Representation number
TEST    3
:       26
if      20
a       1
>       15
b       1
x       1
=       10
y       1
;       27


Note that in scanning the source statement and generating the representation number of each token we have ignored spaces (or blanks) in the statement.
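A scanner that reproduces this token sequence can be sketched as follows. The representation numbers are the ones used in the example above; everything else (the names scan, FIXED and TOKEN_PATTERN, and the rule that an identifier directly followed by a colon is a label) is an assumption made for this sketch.

```python
import re

# Representation numbers taken from the example above.
VARIABLE, CONSTANT, LABEL = 1, 2, 3
FIXED = {"+": 4, "=": 10, ">": 15, "if": 20, ":": 26, ";": 27}

# Leading blanks are skipped; the group captures the lexeme itself.
TOKEN_PATTERN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[+=>:;])")


def scan(text):
    """Return a list of (lexeme, representation number) pairs."""
    text = text.rstrip()
    lexemes = []
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at or after position {position}")
        lexemes.append(match.group(1))
        position = match.end()

    tokens = []
    for index, lexeme in enumerate(lexemes):
        following = lexemes[index + 1] if index + 1 < len(lexemes) else None
        if lexeme in FIXED:
            number = FIXED[lexeme]      # keyword, operator or punctuation
        elif lexeme.isdigit():
            number = CONSTANT
        elif following == ":":
            number = LABEL              # identifier directly followed by ':'
        else:
            number = VARIABLE
        tokens.append((lexeme, number))
    return tokens


for lexeme, number in scan("TEST: if a > b x=y ;"):
    print(lexeme, number)
```

Blanks never reach the token list because the pattern consumes them before each lexeme.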

The lexical analyzer must, in general, process blanks and comments.

Certain programming languages allow the continuation of statements over multiple lines.

Lexical analyzers must then handle the input processing of such multiple-line statements.

Also, some scanners place constants, labels, and variable names in appropriate tables.


A table entry for a variable, for example, may contain its name, type (e.g. int or float), object program address, value, and the line in which it is declared.

The lexical analyzer supplies tokens to the syntax analyzer.

These tokens may take the form of a pair of items.

The first item gives the address or location of the token in some symbol table.

The second item is the representation number of the token.

Such an approach offers a distinct advantage to the syntax analyzer; namely, all tokens are represented by fixed-length information: an address (or pointer) and an integer.
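The pair representation can be sketched with a very small symbol table. The class SymbolTable, its method intern and the use of a list index as the "address" are all hypothetical choices for this illustration; a real compiler would store more in each entry.

```python
class SymbolTable:
    """Store each distinct name once and hand out its index as an address."""

    def __init__(self):
        self.entries = []    # one dict per name; more fields could be added
        self.index_of = {}   # name -> position in self.entries

    def intern(self, name):
        """Return the table address of `name`, adding it if necessary."""
        if name not in self.index_of:
            self.index_of[name] = len(self.entries)
            self.entries.append({"name": name})
        return self.index_of[name]


VARIABLE = 1  # representation number for variable names, as in the example above


def to_pairs(names, table):
    """Convert variable names into fixed-length (address, number) pairs."""
    return [(table.intern(name), VARIABLE) for name in names]


table = SymbolTable()
pairs = to_pairs(["a", "b", "x", "y", "a"], table)
print(pairs)
```

However long a variable name is, the syntax analyzer only ever sees two integers for it, and a repeated name maps to the same address.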


The syntax analyzer is much more complex than the lexical analyzer.

Its function is to take the source program (in the form of tokens) from the lexical analyzer and determine the manner in which it is to be decomposed into its constituent parts.

In syntax analysis we are concerned with grouping tokens into larger syntactic classes such as expression, statement, and procedure.

The syntax analyzer (or parser) outputs a syntax tree (or its equivalent) whose leaves are the tokens and whose non-leaf nodes each represent a syntactic class.
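As a rough sketch of this grouping, the recursive-descent parser below builds a tree for expressions made of names, +, * and parentheses. It is a simplification under several assumptions: the grammar is a hypothetical three-rule one, trees are nested tuples of the form (operator, left, right), the tokenizer is a one-line regular expression that silently ignores characters it does not recognize, and operators are placed at the interior nodes rather than at the leaves to keep the tree compact.

```python
import re


def parse(text):
    """Parse an expression with +, * and parentheses into a nested-tuple tree."""
    tokens = re.findall(r"[A-Za-z_]\w*|[+*()]", text)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        position += 1

    def expression():            # expression -> term { '+' term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():                  # term -> factor { '*' factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                # factor -> name | '(' expression ')'
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if token == "(":
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token             # a name is a leaf of the tree

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("(A + B) * (C + D)"))
```

Each function corresponds to one syntactic class, so the shape of the call sequence mirrors the shape of the resulting tree.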


The syntax tree produced by the syntax analyzer is used by the semantic analyzer.

The function of the semantic analyzer is to determine the meaning (or semantics) of the source program.

The semantic analyzer's actions may involve the generation of an intermediate form of source code.

For the expression (A + B) * (C + D), the intermediate source code might be the following set of quadruples:

(+, A, B, T1)

(+, C, D, T2)

(*, T1, T2, T3)

Here (+, A, B, T1) is interpreted to mean “add A and B and place the result in temporary T1”, and so on.
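One way such quadruples could be produced is by walking an expression tree from the leaves upward. The sketch below assumes the tree is given as nested tuples of the form (operator, left, right); that representation, the function name to_quadruples and the temporary-naming scheme are assumptions for illustration.

```python
def to_quadruples(tree):
    """Walk a nested-tuple tree and emit (operator, arg1, arg2, result) tuples."""
    quadruples = []

    def visit(node):
        if isinstance(node, str):
            return node                          # a leaf is already an operand name
        operator, left, right = node
        left_name = visit(left)                  # operands are evaluated first
        right_name = visit(right)
        temporary = f"T{len(quadruples) + 1}"    # next unused temporary
        quadruples.append((operator, left_name, right_name, temporary))
        return temporary

    visit(tree)
    return quadruples


tree = ("*", ("+", "A", "B"), ("+", "C", "D"))   # (A + B) * (C + D)
for quadruple in to_quadruples(tree):
    print(quadruple)
```

For this tree the walk yields the same three quadruples listed above, in the same order.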


An infix expression may also be converted to an intermediate form called Polish notation (left as an assignment).

The output of the semantic analyzer is passed on to the code generator.

At this point the intermediate form of the source language program is usually translated to either assembly language or machine language.


For the expression above, the generated assembly might look like:

LDA A

ADD B

STO T1

LDA C

ADD D

STO T2

LDA T1

MUL T2

STO T3
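The step from quadruples to this listing can be sketched as a direct, one-quadruple-at-a-time translation. The sketch assumes a single-accumulator machine in which LDA loads a value, ADD and MUL combine the accumulator with an operand, and STO stores the accumulator; the slides do not name the target machine, so these meanings and the names MNEMONIC and generate are assumptions.

```python
MNEMONIC = {"+": "ADD", "*": "MUL"}  # only the two operators used above


def generate(quadruples):
    """Emit three pseudo-assembly instructions for each quadruple."""
    lines = []
    for operator, left, right, result in quadruples:
        lines.append(f"LDA {left}")                     # load the first operand
        lines.append(f"{MNEMONIC[operator]} {right}")   # combine with the second
        lines.append(f"STO {result}")                   # save into the temporary
    return lines


quadruples = [("+", "A", "B", "T1"), ("+", "C", "D", "T2"), ("*", "T1", "T2", "T3")]
print("\n".join(generate(quadruples)))
```

Translating each quadruple in isolation is simple but makes no attempt to avoid unnecessary loads and stores.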


The output of the code generator is passed on to a code optimizer.

This process is present in more sophisticated compilers.

Its purpose is to produce a more efficient object program.
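As one illustrative example of what an optimizer might do, the sketch below removes a load that directly follows a store to the same location. This is a simple peephole technique chosen for illustration, not a method described in the slides. It assumes straight-line code with no jump targets on a single-accumulator machine where STO leaves the accumulator unchanged; the function name remove_redundant_loads is hypothetical.

```python
def remove_redundant_loads(lines):
    """Drop 'LDA X' when it directly follows 'STO X'.

    Assumes STO leaves the accumulator unchanged, so the value is still
    in the accumulator and need not be loaded again.
    """
    optimized = []
    for line in lines:
        if optimized and line.startswith("LDA "):
            previous = optimized[-1]
            if previous.startswith("STO ") and previous[4:] == line[4:]:
                continue  # accumulator already holds this value
        optimized.append(line)
    return optimized


# Naive code for A + B + C: the sum is stored in T1 and immediately reloaded.
code = ["LDA A", "ADD B", "STO T1", "LDA T1", "ADD C", "STO T2"]
print(remove_redundant_loads(code))
```

The optimized program computes the same result with one instruction fewer, which is the kind of saving an optimizer accumulates over a whole program.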

