Principles of Programming Languages

    Unit-1

    Introduction:

    A programming language is an artificial language designed to communicate instructions to a

    machine, particularly a computer. Programming languages can be used to create programs that

    control the behavior of a machine and/or to express algorithms precisely.

    The earliest programming languages predate the invention of the computer, and were used to

    direct the behavior of machines such as Jacquard looms and player pianos. Thousands of

    different programming languages have been created, mainly in the computer field, with many

    more being created every year. Most programming languages describe computation in an

    imperative style, i.e., as a sequence of commands, although some languages, such as those that

    support functional programming or logic programming, use alternative forms of description.

    The description of a programming language is usually split into the two components of syntax

    (form) and semantics (meaning). Some languages are defined by a specification document (for

    example, the C programming language is specified by an ISO Standard), while other languages,

    such as Perl 5 and earlier, have a dominant implementation that is used as a reference.

    1. What is a name in programming languages? A name is a mnemonic character string used to represent something else.

    Names are a central feature of all programming languages. In the earliest programs, numbers

    were used for all purposes, including machine addresses. Replacing numbers by symbolic names

    was one of the first major improvements to program notation.

    e.g. variables, constants, executable code (methods, procedures, subroutines, functions, even

    whole programs), data types, classes, etc.

    In general, names are of two different types:

    1. Special symbols: +, -, *

    2. Identifiers: sequences of alphanumeric characters (in most cases beginning with a letter), plus

    in many cases a few special characters such as '_' or '$'.

    2. What is binding? A binding is an association between two things, such as a name and the thing it names

    E.g. the binding of a class name to a class, or of a variable's name to a variable.

    Static and Dynamic binding:

    A binding is static if it first occurs before run time and remains unchanged throughout program

    execution.

    A binding is dynamic if it first occurs during execution or can change during execution of the

    program.

    3. What is binding time? Binding time is the time when an association is established.

    In programming, a name may have several attributes, and they may be bound at different times.


    Example1:

    int n;

    n = 6;

    The first line binds the type int to n and the second line binds the value 6 to n. The first binding

    occurs when the program is compiled. The second binding occurs when the program is executed.

    Example 2:

    #include <stdio.h>

    void f()
    {
        int n = 7;
        printf("%d", n);
    }

    void main()
    {
        int k;
        scanf("%d", &k);
        if (k > 0)
            f();
    }

    In FORTRAN, addresses are bound to variable names at compile time. The result is that, in the

    compiled code, variables are addressed directly, without any indexing or other address

    calculations. (In reality, the process is somewhat more complicated. The compiler assigns an

    address relative to a compilation unit. When the program is linked, the address of the unit within

    the program is added to this address. When the program is loaded, the address of the program is

    added to the address. The important point is that, by the time execution begins, the absolute address of the variable is known.)

    FORTRAN is efficient, because absolute addressing is used. It is inflexible, because all

    addresses are assigned at load time. This leads to wasted space, because all local variables

    occupy space whether or not they are being used, and also prevents the use of direct or indirect

    recursion.

    Early and Late Binding:

    Early binding favours efficiency; late binding favours flexibility.
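
    As a rough illustration (not part of the original notes), C++ exposes both choices: a call to a
    non-virtual member function is bound early, at compile time, while a call to a virtual function
    through a base-class pointer is bound late, at run time. The class names below are made up.

    #include <cstdio>

    struct Shape {
        void describe() { std::printf("Shape::describe\n"); }   // non-virtual: early (static) binding
        virtual void draw() { std::printf("Shape::draw\n"); }   // virtual: late (dynamic) binding
        virtual ~Shape() = default;
    };

    struct Circle : Shape {
        void describe() { std::printf("Circle::describe\n"); }  // hides, does not override
        void draw() override { std::printf("Circle::draw\n"); }
    };

    int main() {
        Circle c;
        Shape *p = &c;
        p->describe();   // bound at compile time from p's static type: prints Shape::describe
        p->draw();       // bound at run time from the object's actual type: prints Circle::draw
    }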

    4. Explain about different times at which decisions may be bound? Or

    Explain different types of binding times?

    In the context of programming languages, there are quite a few alternatives for binding time.

    1. Language design time 2. Language implementation time 3. Program writing time 4. Compile time 5. Link time


    6. Load time 7. Run time

    1. Language design time:

    In most languages, the control flow constructs, the set of fundamental types, the available

    constructors for creating complex types, and many other aspects of language semantics

    are chosen when the language is designed.

    Or

    During the design of a programming language, the designer decides what symbols should be
    used to represent operations.

    e.g. binding of operator symbols to operations: * to multiplication, + to addition.

    2. Language Implementation Time: Most language manuals leave a variety of issues to the discretion of the language

    implementor. Typical examples include the precision of the fundamental types, the

    coupling of I/O to the operating system's notion of files, the organization and maximum

    sizes of stack and heap and the handling of run-time exceptions such as arithmetic

    overflow.

    Or

    During language implementation, the implementor decides what range of values a data type
    should have, e.g. binding a data type such as int in C to its range of possible values.

    Example: the C language does not specify the range of values for the type int. Implementations

    on early microcomputers typically used 16 bits for ints, yielding a range of values from -32768

    to +32767. On early large computers, and on all computers today, C implementations typically

    use 32 bits for int.

    3. Program writing time: Programmers, of course, choose algorithms, data structures and names.

    Or

    While writing programs, programmers bind certain names to procedures, classes, etc.

    Example: many names are bound to specific meanings when a person writes a program

    4. Compile time: The time when a single compilation unit is compiled, while compiling the type of a

    variable can be identified

    Example: int c; [at compile time int, c forms an association]


    5. Link time: The time when all the compilation units comprising a single program are linked as the

    final step in building the program

    [The separate modules of a single program will be bound only at link time]

    6. Load time: Load time refers to the point at which the operating system loads the program into

    memory so that it can run

    7. Run time: Run time is actually a very broad term that covers the entire span from the beginning to

    the end of execution. If we give the value of a variable during runtime, it is known as

    runtime binding

    Ex. printf("Enter the value of X"); scanf("%f", &X);

    5. What is Scope?

    The textual region of the program in which a binding is active is its scope.

    The scope of a name binding is the portion of the text of the source program in which that

    binding is in effect - i.e. the name can be used to refer to the corresponding object. Using a name

    outside the scope of a particular binding implies one of two things: either it is undefined, or it

    refers to a different binding. In C++, for example, the scope of a local variable starts at the

    declaration of the variable and ends at the end of the block in which the declaration appears.

    Scope is a static property of a name that is determined by the semantics of the PL and the text of

    the program.

    Example:

    class Foo

    {

    private int n;

    void foo() {

    // 1

    }

    void bar() {

    int m,n;

    ...

    // 2

    }

    ...
    }


    A reference to m at point 1 is undefined.

    A reference to n at point 1 refers to the instance variable of Foo; at point 2, n refers to the
    local variable declared in bar, which hides the instance variable.

    6. Describe the difference between static and dynamic scoping(scope rules)

    The scope rules (static, dynamic) of a language determine how references to names are

    associated with variables

    Static Scoping:

    In a language with static scoping, the bindings between names and objects can be

    determined at compile time by examining the text of the program, without consideration of the

    flow of control at run time.

    Scope rules are somewhat more complex in FORTRAN, though not much more. FORTRAN

    distinguishes between global and local variables. The scope of a local variable is limited to the

    subroutine in which it appears; it is not visible elsewhere. Variable declarations are optional. If a

    variable is not declared, it is assumed to be local to the current subroutine and to be of type

    integer if its name begins with the letters I-N, or real otherwise.

    Global variables in FORTRAN may be partitioned into common blocks, which are then imported

    by subroutines. Common blocks are designed for separate compilation: they allow a subroutine

    to import only the sets of variables it needs. Unfortunately, FORTRAN requires each subroutine

    to declare the names and types of the variables in each of the common blocks it uses, and there is
    no standard mechanism to ensure that the declarations in different subroutines are the same.

    Nested scopes- Many programming languages allow scopes to be nested inside each other.

    Example: Java actually allows classes to be defined inside classes or even inside methods, which

    permits multiple scopes to be nested.

    class Outer

    {

    int v1; // 1

    void methodO()

    {

    float v2; // 2

    class Middle

    {

    char v3; // 3

    void methodM()


    {

    boolean v4; // 4

    class Inner

    {

    double v5; // 5

    void methodI()

    {

    String v6; // 6

    }

    }

    }

    }

    }

    }

    The scope of the binding for v1 is the whole program.
    The scope of the binding for v2 is methodO and all of classes Middle and Inner, including their methods.
    The scope of the binding for v3 is all of classes Middle and Inner, including their methods.
    The scope of the binding for v4 is methodM and all of class Inner, including its method.
    The scope of the binding for v5 is all of class Inner, including its method.
    The scope of the binding for v6 is just methodI.

    Some programming languages - including Pascal and its descendants (e.g. Ada) allow

    procedures to be nested inside procedures. (C and its descendants do not allow this)

    Declaration order

    - A field or method declared in a Java class can be used anywhere in the class, even before its

    declaration.

    - A local variable declared in a method cannot be used before the point of its declaration

    Example: class Demo

    {

    public void method()

    {

    // Point 1

    int y;

    }

    private int x;

    }


    The instance variable x can be used at Point 1, but not y

    Example: Java

    class Demo

    {

    public void method1()

    {

    ...

    method2();

    ...

    }

    public void method2()

    {

    ...

    method1();

    ...

    }

    }

    Example: C/C++:

    void method2(); // Incomplete declaration

    void method1()

    {

    ...

    method2();

    ...

    }

    void method2() // Definition completes the above declaration

    {

    ...

    method1();

    ...

    }


    Dynamic Scoping

    In a language with dynamic scoping, the bindings between names and objects

    depend on the flow of control at run time, and in particular on the order in which subroutines are

    called.

    Dynamic scope rules are generally quite simple: the current binding for a given name is the one

    encountered most recently during execution, and not yet destroyed by returning from its scope.

    Languages with dynamic scoping include APL, Snobol, Perl etc. Because the flow of control

    cannot in general be predicted in advance, the binding between names and objects in a language

    with dynamic scoping cannot in general be determined by a compiler. As a result, many semantic

    rules in a language with dynamic scoping become a matter of dynamic semantics rather than

    static semantics.

    Ex.

    procedure Big is
        X : integer;
        procedure sub1 is
            X : integer;
        begin
            ...
        end;
        procedure sub2 is
        begin
            ... X ...
        end;
    begin
        ...
    end;

    In dynamic scoping, the X used inside sub2 may refer either to the X in Big or to the X in sub1,
    depending on the calling sequence of the procedures.
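
    Dynamic scoping cannot be written directly in C++, but as a rough sketch (not from the notes,
    and only a simulation) the "most recently established binding" rule can be mimicked with an
    explicit stack of bindings for the name X; main plays the role of Big:

    #include <cstdio>
    #include <vector>

    std::vector<int> X_bindings;   // top of the stack = most recent, still-live binding of X

    void sub2() {
        std::printf("sub2 sees X = %d\n", X_bindings.back());   // dynamic lookup of X
    }

    void sub1() {
        X_bindings.push_back(20);   // sub1 declares its own X
        sub2();                     // called from sub1: sees sub1's X (20)
        X_bindings.pop_back();      // sub1 returns, its binding disappears
    }

    int main() {                    // plays the role of Big
        X_bindings.push_back(10);   // Big's X
        sub2();                     // called from Big: sees Big's X (10)
        sub1();
        X_bindings.pop_back();
    }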

    7. What is Local Scope? In block-structured languages, names declared in a block are local to the block. In Algol 60 and

    C++, local scopes can be as small as the programmer needs. Any statement context can be

    instantiated by a block that contains local variable declarations. In other languages, such as

    Pascal, local declarations can be used only in certain places. Although a few people do not like

    the fine control offered by Algol 60 and C++, it seems best to provide the programmer with as

    much freedom as possible and to keep scopes as small as possible.

    Local scope in C++

    {

    // x not accessible.

    t x;

    // x accessible

    . . . .

    { // x still accessible in inner block

    . . . .

    }// x still accessible

    }// x not accessible


    8. What is Global Scope? The name is visible throughout the program. Global scope is useful for pervasive entities, such as
    library functions and fundamental constants (π = 3.1415926), but is best avoided for application
    variables.

    FORTRAN does not have global variables, although programmers simulate them by overusing
    COMMON declarations. Subroutine names in FORTRAN are global.

    Names declared at the beginning of the outermost block of Algol 60 and Pascal programs have

    global scope. (There may be holes in these global scopes if the program contains local declarations of the same names.)

    9. Explain about Object Lifetime? The word "lifetime" is used in two slightly different ways

    1. To refer to the lifetime of an object

    2. The lifetime of the binding of a name to an object.

    a. An object can exist before the binding of a particular name to it

    Example (Java):

    void something(Object o) {

    // 2

    }

    ....

    Object p = new Object();

    // 1

    ....

    something(p);

    ....

    // 3

    (The object named by p exists at point 1, but the name o is not

    bound to it until point 2)

    b. An object can exist after the binding of a particular name to it has ceased

    Example:

    void something(Object o) {

    // 2

    }

    ....

    Object p = new Object();


    // 1

    ....

    something(p);

    ....

    // 3

    (The object named by p continues to exist at point 3, even though the binding of the name

    o to it ended when method something() completed)

    c. A name can be bound to an object that does not yet exist

    Example (Java)

    Object o;

    // 1

    ....

    o = new Object(); // 2

    (At point 1, the name o is bound to an object that does not come into existence until point

    2)

    d. A name can be bound to an object that has actually ceased to exist

    Example (C++ - not possible in Java)

    Object *o = new Object();

    ...

    delete o; // 1

    ...

    // 2

    At 2, the name o is bound to an object that has ceased to exist.

    (The technical name for this is a dangling reference).

    e. It is also possible for an object to exist without having any name at all bound to it.

    Example (Java or C++)

    Object o = new Object();

    // 1

    ...

    o = new Object();

    // 2

    At point 2, the object that o was bound to at point 1 still exists, but o no longer is bound

    to it. In the absence of any other name bindings to it between points 1 and 2, this object

    now has no name referring to it. (The technical name for this is garbage).


    10. Storage Allocation Mechanisms?

    Static (permanent) Allocation:- The object exists during the entire time the program is

    running.

    a. Global variables

    - The precise way of declaring global variables differs from language to language

    - In some languages (e.g. BASIC) any variable is global

    - In FORTRAN any variable declared in the main program or in

    a COMMON block

    - In Java, any class field explicitly declared static

    - In C/C++, any variable declared outside of a class, or (C++)

    declared static inside a class

    b. Static local variables - only available in some languages

    C/C++ local variables explicitly declared static

    int foo() {

    static int i;

    ...

    A static local variable retains its value from one call of a routine to another (see the sketch after this list)

    c. Many constants (but not constants that are actually read-only variables)

    Advantages: efficiency (direct addressing), history-sensitive subprogram support (static
    variables retain values between calls of subroutines).

    Disadvantage: lack of flexibility (does not support recursion).
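
    A minimal C++ sketch of the static-local idea (not from the notes; the function next_id is made
    up for illustration): count is statically allocated and keeps its value across calls, while tmp is
    stack-allocated and re-created on every call.

    #include <cstdio>

    int next_id() {
        static int count = 0;   // static allocation: one instance for the whole run
        int tmp = 1;            // stack allocation: a fresh instance per call, always 1 here
        count = count + tmp;
        return count;
    }

    int main() {
        std::printf("%d ", next_id());   // 1
        std::printf("%d ", next_id());   // 2
        std::printf("%d\n", next_id());  // 3
    }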

    Stack Based Allocation:- Storage bindings are created for variables when their declaration

    statements are elaborated

    Typically, the local variables and parameters of a method have stack lifetime. This name comes

    from the normal way of implementing routines, regardless of language.

    Since routines obey a LIFO call return discipline, they can be managed by using a stack

    composed of stack frames - one for each currently active routine.

    Example:

    void d() { /* 1 */ }


    void c() { ... d() ... }

    void b() { ... c() ... }

    void a() { ... b() ... }

    int main() { ... a() ... }

    Stack at point 1:

    -----------------
    | Frame for d    |
    | Frame for c    |
    | Frame for b    |
    | Frame for a    |
    | Frame for main |
    -----------------


    Heap-Based Allocation:- A heap is a region of storage in which blocks can be allocated and
    deallocated at arbitrary times.

    The shaded blocks are in use, the clear blocks are free. Cross hatched space at the ends of in use

    blocks represents internal fragmentation. The discontiguous free blocks indicate external

    fragmentation.

    Internal fragmentation occurs when a storage management algorithm allocates a block that is

    larger than required to hold a given object, the extra space is then unused. External fragmentation

    occurs when the blocks that have been assigned to active objects are scattered through the heap

    in such a way that the remaining, unused space is composed of multiple blocks: there may be

    quite a lot of free space, but no one piece of it may be large enough to satisfy some future

    request.

    As the program runs, Heap space grows as objects are created. However, to prevent growth

    without limit, there must be some mechanism for recycling the storage used by objects that are

    no longer alive.

    A language implementation typically uses one of three approaches to "recycling" space used by

    objects that are no longer alive:

    Explicit - the program is responsible for releasing space needed by objects that are no

    longer needed using some construct such as delete (C++)

    Reference counting: each heap object maintains a count of the number of external

    pointers/references to it. When this count drops to zero, the space utilized by the object

    can be reallocated

    Garbage collection: the run-time system automatically reclaims the memory of objects that are
    no longer in use by the program. C and C++ do not do garbage collection implicitly, but Java
    has an implicit garbage collector.

    Advantage: provides for dynamic storage management.

    Disadvantage: less efficient than static allocation, and less reliable.
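
    The first two recycling strategies can be sketched in C++ (an illustration, not from the notes;
    std::shared_ptr stands in for reference counting, and true garbage collection has no direct
    C++ equivalent):

    #include <cstdio>
    #include <memory>

    struct Node { int value; };

    int main() {
        // Explicit deallocation: the program itself must release the space.
        Node *n = new Node{1};
        delete n;                    // forgetting this leaks; doing it twice is an error

        // Reference counting: the object is freed when the owner count drops to zero.
        std::shared_ptr<Node> p(new Node{2});
        std::shared_ptr<Node> q = p;                   // count is now 2
        std::printf("owners: %ld\n", p.use_count());
        q.reset();                                     // count drops to 1
        p.reset();                                     // count drops to 0: Node is freed here
    }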

    11. What are internal and external fragmentations? A heap is a region of storage in which subblocks can be allocated and deallocated at arbitrary

    times. Heaps are required for the dynamically allocated pieces of linked data structures, and for

    objects like fully general character strings, lists, and sets, whose size may change as a result of an

    assignment statement or other update operation.

    [Figure: a heap, with shaded (in-use) blocks, clear (free) blocks, and a pending allocation request]



    The shaded blocks are in use, the clear blocks are free. Cross hatched space at the ends of in use

    blocks represents internal fragmentation. The discontiguous free blocks indicate external

    fragmentation.

    Internal fragmentation occurs when a storage management algorithm allocates a block that is

    larger than required to hold a given object, the extra space is then unused. External fragmentation

    occurs when the blocks that have been assigned to active objects are scattered through the heap

    in such a way that the remaining, unused space is composed of multiple blocks: there may be

    quite a lot of free space, but no one piece of it may be large enough to satisfy some future

    request.

    12. What is garbage collection? Garbage collection (GC) is the process of automatically reclaiming memory that is no longer in
    use by the program. Proper garbage collection is needed for the system to keep working in an orderly way. C and C++ do not do
    garbage collection implicitly, but Java has an implicit garbage collector.

    The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by

    objects that are no longer in use by the program. Garbage collection was invented by John

    McCarthy around 1959 to solve problems in Lisp.

    Garbage collection does not traditionally manage limited resources other than memory that

    typical programs use, such as network sockets, database handles, user interaction windows, and

    file and device descriptors. Methods used to manage such resources, particularly destructors,

    may suffice as well to manage memory, leaving no need for GC. Some GC systems allow such

    other resources to be associated with a region of memory that, when collected, causes the other

    resource to be reclaimed; this is called finalization. Finalization may introduce complications

    limiting its usability, such as intolerable latency between disuse and reclaim of especially limited

    resources, or a lack of control over which thread performs the work of reclaiming.

    13. What is Aliasing? Two or more names that refer to the same object at the same point in the program are said to be

    aliases.

    1. Aliases can be created by assignment of pointers/references

    Example: Java: Robot karel = ...

    Robot foo = karel;

    foo and karel are aliases for the same Robot object

    2. Aliases can be created by passing reference parameters

    Example: C++

    void something(int a [], int & b)

    {

    // 1


    ...

    }

    int x [100];

    int y;

    something(x, y);

    During the call something(x, y), at point 1, x and a are aliases for the same array, and y and b
    are aliases for the same integer.

    14. What is Overloading? A name is said to be overloaded if, in some scope, it has two or more meanings, with the actual

    meaning being determined by how it is used.

    Example: C++

    void something(char x)

    ...

    void something(double x)

    ...

    // 1

    At point 1, something can refer to one or the other of the two methods, depending

    on its parameter.

    15. What is Polymorphism? Polymorphism is the concept that supports the capability of an object of a class to behave

    differently in response to a message or action.

    1. Compile-time (static) polymorphism: the meaning of a name is determined at compile-time

    from the declared types of what it uses

    2. Run-time (dynamic) polymorphism: the meaning of a name is determined when the program is

    running from the actual types of what it uses.

    Example: Java has both types in different contexts:

    a. When a name is overloaded in a single class, static polymorphism is used to determine which

    declaration is meant.

    b. When a name is overridden in a subclass, dynamic polymorphism is used to determine which

    version to use.
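
    A C++ analogue of the two cases (a sketch, not from the notes; the names are made up):
    overload resolution for something() happens at compile time, while the call through the
    base-class pointer is resolved at run time.

    #include <cstdio>

    void something(char x)   { std::printf("char version: %c\n", x); }
    void something(double x) { std::printf("double version: %f\n", x); }

    struct Animal {
        virtual void speak() { std::printf("...\n"); }
        virtual ~Animal() = default;
    };
    struct Dog : Animal {
        void speak() override { std::printf("woof\n"); }   // overrides Animal::speak
    };

    int main() {
        something('a');          // static polymorphism: char version chosen at compile time
        something(3.14);         // static polymorphism: double version chosen at compile time
        Animal *a = new Dog();
        a->speak();              // dynamic polymorphism: Dog::speak chosen at run time
        delete a;
    }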


    16. What is control flow? The order in which operations are executed in a program

    e.g. in a C++-like language,

    a = 1;

    b = a + 1;

    if a > 100 then b = a - 1; else b = a + 1;

    a - b + c

    17. Name eight major categories of control flow mechanisms? a. Sequencing: - Statements are to be executed in a certain specified order- usually the order

    in which they appear in the program text.

    b. Selection: - Depending on some run time condition, a choice is to be made among two or more statements or expressions. The most common selection constructs are if and case

    statements. Selection is also sometimes referred to as alternation.

    c. Iteration: - A given fragment of code is to be executed repeatedly, either a certain number of times, or until a certain run- time condition is true. Iteration constructs include for/do,

    while, and repeat loops.

    d. Procedural abstraction: - A potentially complex collection of control constructs is encapsulated in a way that allows it to be treated as a single unit, usually subject to

    parameterization.

    e. Recursion: - An expression is defined in terms of itself, either directly or indirectly; the computational model requires a stack on which to save information about partially

    evaluated instances of the expression. Recursion is usually defined by means of self-

    referential subroutines.

    f. Concurrency:- Two or more program fragments are to be executed/evaluated at the same time, either in parallel on separate processors, or interleaved on a single processor in a

    way that achieves the same effect.

    g. Exception handling and speculation:- A program fragment is executed optimistically, on the assumption that some expected condition will be true. If that condition turns out to be

    false, execution branches to a handler that executes in place of the remainder of the

    protected fragment or in place of the entire protected fragment. For speculation, the

    language implementation must be able to undo, or roll back any visible effects of the

    protected code.

    h. Nondeterminacy: - The ordering or choice among statements or expressions is deliberately left unspecified, implying that any alternative will lead to correct results.

    Some languages require the choice to be random, or fair, in some formal sense of the

    word.

    18. What distinguishes operators from other sort of functions? An expression generally consists of either a simple object or an operator or function applied to a

    collection of operands or arguments, each of which in turn is an expression. It is conventional to

    use the term operator for built in functions that use special, simple syntax, and to use the term

    operand for an argument of an operator. In most imperative languages, function call consists of a

    function name followed by a parenthesized, comma-separated list of arguments, as in


    my_func (A, B, C)

    Operators are typically simpler, taking only one or two arguments, and dispensing with the

    parentheses and commas:

    a + b

    -c

    In general, a language may specify that function calls employ prefix, infix, or postfix notation.

    These terms indicate, respectively, whether the function name appears before, among, or after its

    several arguments:

    Prefix: op a b

    Infix: a op b

    Postfix: a b op

    19. Explain the difference between prefix, infix, and postfix notation. What is Cambridge polish notation?

    Most imperative languages use infix notation for binary operators and prefix notation for unary

    operators and other functions. Lisp uses prefix notation for all functions.

    Cambridge Polish notation places the function name inside the parentheses:

    ( * ( + 1 3) 2)

    (append a b c my_list)

    20. What is an L- value? An r-value? Consider the following assignments in C:

    d= a;

    a= b+c;

    In the first statement, the right- hand side of the assignment refers to the value of a, which we

    wish to place into d. In the second statement, the left hand side refers to the location of a, where

    we want to put the sum of b and c. Both interpretations-value and location-are possible because a

    variable in C is a named container for a value. We sometimes say that languages like C use a

    value model of variables. Because of their use on the left hand side of assignment statements,

    expressions that denote locations are referred to as L-values. Expressions that denote values are

    referred to as r- values. Under a value model of variables, a given expression can be either an L-

    value or an r-value, depending on the context in which it appears.

    Of course, not all expressions can be L-values, because not all values have a location, and not all

    names are variables. In most languages it makes no sense to say 2+3 =a, or even a= 2+3, if a is

    the name of a constant. By the same token, not all L-values are simple names; both L-values and

    r-values can be complicated expressions. In C one may write

    (f(a)+3)->b[c] = 2;

    In this expression f (a) returns a pointer to some element of an array of pointers to structures. The

    assignment places the value 2 into the c-th element of field b of the structure pointed at by the

    third array element after the one to which f's return value points. In C++ it is even possible for a function to return a reference to a structure, rather than a pointer

    to it , allowing one to write

    g(a).b[c] = 2;
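
    A small C++ sketch of that idea (not from the notes; the type S and the function g are invented
    for illustration): because g returns a reference, the call g(a) denotes a location and can be
    used as an L-value, and the very same expression can also be used as an r-value.

    #include <cstdio>

    struct S { int b[10]; };

    S &g(int) {            // returns a reference, so g(a) is an L-value
        static S s;
        return s;
    }

    int main() {
        int a = 0, c = 3;
        g(a).b[c] = 2;                       // L-value use: left-hand side of an assignment
        std::printf("%d\n", g(a).b[c]);      // r-value use: the stored value, 2
    }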


    21. Define orthogonality in the context of programming language design? One of the principal design goals of Algol 68 was to make the various features of the languages

    as orthogonal as possible. Orthogonality means that features can be used in any combination, the

    combinations all make sense, and the meaning of a given feature is consistent, regardless of the

    other features with which it is combined.

    Algol 68 was one of the first languages to make orthogonality a principal design goal, and in fact

    few languages since have given the goal such weight. Among other things, Algol 68 is said to be

    expression oriented: it has no separate notion of statement. Arbitrary expressions can appear in

    contexts that would call for a statement in a language like Pascal, and constructs that are

    considered to be statements in other languages can appear within expressions. The following, for

    example is valid in Algol 68:

    begin

    a:=if b < c then d else e;

    a:= begin f(b); g(c) end;

    g(d);

    2+3

    end

    22. Expression Evaluation

    An expression consists of

    o A simple object, e.g. number or variable

    o An operator applied to a collection of operands or arguments which are

    expressions

    Common syntactic forms for operators:

    o Function call notation, e.g. somefunc(A, B, C), where A, B, and C are expressions

    o Infix notation for binary operators, e.g. A + B

    o Prefix notation for unary operators, e.g. -A

    o Postfix notation for unary operators, e.g. i++

    o Cambridge Polish notation, e.g. (* (+ 1 3) 2) in Lisp =(1+3)*2=8

    Expression Evaluation Ordering: Precedence and Associativity:-

    Precedence rules specify that certain operators, in the absence of parentheses, group more

    tightly than other operators. In most languages multiplication and division group more tightly

    than addition and subtraction, so 2+3*4 is 14 and not 20.

    In Java all binary operators except assignments are left associative, while assignment is right
    associative:

    3 - 3 + 5 groups as (3 - 3) + 5

    x = y = f() groups as x = (y = f())

    (assignments evaluate to the value being assigned)

    In C++ arithmetic operators (+, -, *, ...) have higher precedence than relational operators (<, >,
    <=, >=, ==, !=).


    The use of infix, prefix, and postfix notation leads to ambiguity as to what is an operand

    of what, e.g. a+b*c**d**e/f in Fortran

    The choice among alternative evaluation orders depends on

    o Operator precedence: higher operator precedence means that a (collection of)

    operator(s) group more tightly in an expression than operators of lower

    precedence

    o Operator associativity: determines evaluation order of operators of the same

    precedence

    left associative: operators are evaluated left-to-right (most common)

    right associative: operators are evaluated right-to-left (Fortran power

    operator **, C assignment operator = and unary minus)

    non-associative: requires parenthesis when composed (Ada power

    operator **)
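
    A small C++ illustration of precedence and associativity (not from the notes; f is a made-up
    function):

    #include <cstdio>

    int f() { return 7; }

    int main() {
        std::printf("%d\n", 2 + 3 * 4);   // 14: * groups more tightly than +
        std::printf("%d\n", 3 - 3 + 5);   // 5: left associative, (3 - 3) + 5
        int x, y;
        x = y = f();                      // right associative: x = (y = f()), both become 7
        std::printf("%d %d\n", x, y);
    }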

    Evaluation Order in Expressions:

    Precedence and associativity define rules for structuring expressions

    But do not define operand evaluation order

    o Expression a-f(b)-c*d is structured as (a-f(b))-(c*d) by compiler, but either (a-

    f(b)) or (c*d) can be evaluated first at run-time

    Knowing the operand evaluation order is important

    o Side effects: e.g. if f(b) above modifies d (i.e. f(b) has a side effect) the expression

    value will depend on the operand evaluation order

    o Code improvement: compilers rearrange expressions to maximize efficiency

    Improve memory loads:

    a := B[i];        (load a from memory)
    c := 2*a + 3*d;   (compute 3*d first, while waiting for a to arrive in the processor)

    Common subexpression elimination:

    a:=b+c;

    d:=c+e+b; rearranged as d:=b+c+e, it can be rewritten into d:=a+e

    Expression Reordering Problems

    Rearranging expressions may lead to arithmetic overflow or different floating point

    results

    o Assume b, d, and c are very large positive integers, then if b-c+d is rearranged

    into (b+d)-c arithmetic overflow occurs

    o Floating point value of b-c+d may differ from b+d-c

    o Most programming languages will not rearrange expressions when parentheses are

    used, e.g. write (b-c)+d to avoid problems

    Java: expression evaluation is always left to right, but integer overflow is not detected (values
    silently wrap around)

    Pascal: expression evaluation is unspecified and overflows are always detected

    C and C++: expression evaluation is unspecified and overflow detection is

    implementation dependent


    Short-Circuit Evaluation

    Short-circuit evaluation of Boolean expressions means that computations are skipped

    when logical result of a Boolean operator can be determined from the evaluation of one

    operand

    C, C++, and Java use conditional and/or operators: && and ||

    o If a in a&&b evaluates to false, b is not evaluated

    o If a in a||b evaluates to true, b is not evaluated

    o Avoids the Pascal problem

    o Useful to increase program efficiency, e.g.

    if (unlikely_condition && expensive_condition()) ...

    Pascal does not use short-circuit evaluation

    o The program fragment below has the problem that element a[11] can be accessed

    resulting in a dynamic semantic error:

    o var a : array [1..10] of integer;
    ...
    i := 1;
    while (i <= 10) and (a[i] <> 0) do
        i := i + 1;
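
    For contrast, a C-style version of the same search (a sketch, not from the notes): because &&
    short-circuits, a[i] is never evaluated once i has reached the bound, so the out-of-bounds
    access that bites Pascal cannot happen.

    #include <cstdio>

    int main() {
        int a[10] = {3, 1, 4, 1, 5};   // remaining elements are 0
        int i = 0;
        while (i < 10 && a[i] != 0)    // a[i] evaluated only when i < 10 is true
            i = i + 1;
        std::printf("first zero at index %d\n", i);
    }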


    Assignment operators (e.g. a += b in C/C++) combine the update with the assignment:

    o Compiler produces better code, because the address of the variable is only
    calculated once

    Multiway assignments in Clu, ML, and Perl

    o a,b := c,d assigns c to a and d to b simultaneously, e.g. a,b := b,a swaps a with b

    a,b := 1 assigns 1 to both a and b

    23. What is short circuit Boolean evaluation? Why is it useful?

    Short-circuit evaluation of Boolean expressions means that computations are skipped when

    logical result of a Boolean operator can be determined from the evaluation of one operand

    C, C++, and Java use conditional and/or operators: && and ||

    a. If a in a&&b evaluates to false, b is not evaluated b. If a in a||b evaluates to true, b is not evaluated c. Avoids the Pascal problem d. Useful to increase program efficiency, e.g.

    if (unlikely_condition && expensive_condition()) ...

    Pascal does not use short-circuit evaluation

    e. The program fragment below has the problem that element a[11] can be accessed resulting in a dynamic semantic error:

    f. var a : array [1..10] of integer;
    ...
    i := 1;
    while (i <= 10) and (a[i] <> 0) do
        i := i + 1;


    24. Explain the structured control-flow constructs (sequencing, selection, iteration, recursion)?

    Iteration: for and while loop statements
    Subroutine calls and recursion

    Sequencing: One statement appearing after another

    - A list of statements in a program text is executed in top-down order - A compound statement is a delimited list of statements

    o A compound statement is a block when it includes variable declarations

    o C, C++, and Java use { and } to delimit a block

    o Pascal and Modula use begin ... end

    o Ada uses declare ... begin ... end

    - C, C++, and Java: expressions can be used where statements can appear

    In pure functional languages, sequencing is impossible (and not desired!)

    Selection: Selects which statements to execute next

    Forms of if-then-else selection statements:

    o C and C++ EBNF syntax:

    if (<condition>) <statement> [else <statement>]

    Condition is integer-valued expression. When it evaluates to 0, the else-clause

    statement is executed otherwise the then-clause statement is executed. If more

    than one statement is used in a clause, grouping with { and } is required

    o Java syntax is like C/C++, but condition is Boolean type

    o Ada syntax allows use of multiple elsif's to define nested conditions:

    if <condition> then
    <statements>
    elsif <condition> then
    <statements>
    ...
    else
    <statements>
    end if


    Case/switch statements are different from if-then-else statements in that an expression

    can be tested against multiple constants to select statement(s) in one of the arms of the

    case statement:

    o C, C++, and Java syntax:

    switch (<expression>)
    { case <constant>: <statements> break;
    case <constant>: <statements> break;
    ...
    default: <statements>
    }

    o break is necessary to transfer control at the end of an arm to the end of the switch

    statement

    The use of a switch statement is much more efficient compared to nested if-then-else statements
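
    A minimal C++ switch (a sketch, not from the notes); each break transfers control past the end
    of the switch, and a switch over small constants is typically compiled into a jump table, which
    is where the efficiency comes from.

    #include <cstdio>

    int main() {
        int grade = 2;
        switch (grade) {                          // expression tested against several constants
            case 1:  std::printf("poor\n");  break;
            case 2:  std::printf("good\n");  break;
            case 3:  std::printf("great\n"); break;
            default: std::printf("unknown\n");
        }
    }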

    Iteration

    Iteration means the act of repeating a process usually with the aim of approaching a desired goal

    or target or result. Each repetition of the process is also called an "iteration," and the results of

    one iteration are used as the starting point for the next iteration.

    A conditional that keeps executing as long as the condition is true

    e.g: while, for, loop, repeat-until, ...

    Iteration and recursion are the two mechanisms that allow a computer to perform similar

    operations repeatedly. Without at least one of these mechanisms, the running time of a program

    would be a linear function of the size of the program text. In a very real sense, it is iteration and

    recursion that make computers useful.

    Enumeration-Controlled Loops

    Enumeration controlled iteration originated with the do loop of Fortran I. Similar mechanisms have been

    adopted in some form by almost every subsequent language, but syntax and semantics vary widely.

    Fortran-IV:

    DO 20 i = 1, 10, 2

    ...

    20 CONTINUE

    which is defined to be equivalent to

    i = 1

    20 ...

    i = i + 2

    IF (i.LE.10) GOTO 20

    Algol-60 combines logical conditions:


    o for <id> := <enumerator list> do <statement>

    where the EBNF syntax of <enumerator list> is

    <enumerator list> -> <enumerator> [, <enumerator>]*

    <enumerator> -> <expr>
    | <expr> step <expr> until <expr>
    | <expr> while <condition>

    Difficult to understand and too many forms that behave the same:

    for i := 1, 3, 5, 7, 9 do ...

    for i := 1 step 2 until 10 do ...

    for i := 1, i+2 while i < 10 do ...

    Pascal has simple design:

    o for <id> := <expr> to <expr> do <statement>

    for <id> := <expr> downto <expr> do <statement>

    o Can iterate over any discrete type, e.g. integers, chars, elements of a set

    o Index variable cannot be assigned and its terminal value is undefined

    Ada for loop is much like Pascal's:

    o for <id> in <low> .. <high> loop
    <statements>
    end loop

    for <id> in reverse <low> .. <high> loop
    <statements>
    end loop

    o Index variable has a local scope in loop body, cannot be assigned, and is not

    accessible outside of the loop

    C, C++, and Java do not have enumeration-controlled loops although the logically-

    controlled for statement can be used to create an enumeration-controlled loop:

    o for (i = 1; i <= n; i++) { ... }


    Problems with Enumeration-Controlled Loops:

    C/C++:

    o This C program never terminates:

    #include <stdio.h>

    main()

    { int i;

    for (i = 0; i


    Logically-Controlled Post test Loops:

    Logically-controlled post test loops test an exit condition after each loop iteration

    Not available in Fortran-77 (!)

    Pascal:

    o repeat <statement> [; <statement>]* until <condition>

    where the condition is a Boolean expression and the loop will terminate when the

    condition is true

    C, C++:

    o do <statement> while (<expression>)

    where the loop will terminate when the expression evaluates to 0 and multiple

    statements need to be enclosed in { and }

    Java is like C++, but condition is a Boolean expression
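
    A minimal C++ post-test loop (a sketch, not from the notes): the body runs at least once, and
    the exit test comes after it.

    #include <cstdio>

    int main() {
        int i = 0;
        do {
            std::printf("%d ", i);
            i++;
        } while (i < 3);        // terminates when the expression evaluates to 0 (false)
        std::printf("\n");
    }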

    Logically-Controlled Mid test Loops:

    Logically-controlled mid test loops test exit conditions within the loop

    Ada:

    o loop
    <statements>
    exit when <condition>;
    <statements>
    exit when <condition>;
    ...
    end loop

    o Also allows exit of outer loops using labels:

    outer: loop

    ... for i in 1..n loop

    ... exit outer when cond;

    ...

    end loop;

    end loop outer;

    C, C++:

    o Use break statement to exit loops

    o Use continue to jump to beginning of loop to start next iteration

    Java is like C++, but combines Ada's loop label idea to allow jumps to outer loops
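
    A minimal C++ illustration of break and continue (a sketch, not from the notes):

    #include <cstdio>

    int main() {
        for (int i = 0; i < 5; i++) {
            if (i == 2) continue;    // skip the rest of this iteration, start the next one
            if (i == 4) break;       // leave the loop entirely
            std::printf("%d ", i);   // prints 0 1 3
        }
        std::printf("\n");
    }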


    Recursion: When a function may directly or indirectly call itself

    Can be used instead of loops; functional languages frequently have no loops but only recursion

    Iteration and recursion are equally powerful: iteration can be expressed by recursion and

    vice versa

    Recursion can be less efficient, but most compilers for functional languages will optimize

    recursion and are often able to replace it with iterations

    Recursion can be more elegant to use to solve a problem that is recursively defined

    Tail Recursive Functions

    Tail recursive functions are functions in which no computations follow a recursive call in

    the function

    A recursive call could in principle reuse the subroutine's frame on the run-time stack and

    avoid deallocation of old frame and allocation of new frame

    This observation is the key idea to tail-recursion optimization in which a compiler

    replaces recursive calls by jumps to the beginning of the function
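
    The recursive gcd being referred to does not appear in this copy of the notes; it would look
    roughly like this (an assumed reconstruction), with each recursive call being the last thing the
    function does:

    int gcd(int a, int b)
    { if (a==b) return a;
      else if (a>b) return gcd(a-b, b);   // tail call: no work remains after it returns
      else return gcd(a, b-a);            // tail call
    }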

    For the gcd example, a good compiler will optimize the function into:

    int gcd(int a, int b)

    { start:

    if (a==b) return a;

    else if (a>b) { a = a-b; goto start; }

    else { b = b-a; goto start; }

    }

    which is just as efficient as the iterative implementation of gcd:

    int gcd(int a, int b)

    { while (a!=b)

    if (a>b) a = a-b;

    else b = b-a;

    return a;

    }

    Continuation-Passing-Style:

    Even functions that are not tail-recursive can be optimized by compilers for functional

    languages by using continuation-passing style:

    o With each recursive call an argument is included in the call that is a reference

    (continuation function) to the remaining work


    The remaining work will be done by the recursively called function, not after the call, so the

    function appears to be tail-recursive

    Other Recursive Function Optimizations

    Another function optimization that can be applied by hand is to remove the work after the

    recursive call and include it in some other form as an argument to the recursive call

    For example:

    typedef int (*int_func)(int);

    int summation(int_func f, int low, int high)

    { if (low==high) return f(low);

    else return f(low)+summation(f, low+1, high);

    }

    can be rewritten into the tail-recursive form:

    int summation(int_func f, int low, int high, int subtotal)
    { if (low==high) return subtotal+f(low);

    else return summation(f, low+1, high, subtotal+f(low));

    }

    This example in Scheme:

    (define summation (lambda (f low high)

    (if (= low high) ;condition

    (f low) ;then part

    (+ (f low) (summation f (+ low 1) high))))) ;else-part

    rewritten:

    (define summation (lambda (f low high subtotal)

    (if (= low high)

    (+ subtotal (f low))

    (summation f (+ low 1) high (+ subtotal (f low))))))

    Nondeterminacy: Our final category of control flow is nondeterminacy. A nondeterministic construct is one in

    which the choice between alternatives is deliberately unspecified. Some languages, notably

    Algol 68 and various concurrent languages, provide more extensive nondeterministic

    mechanisms, which cover statements as well.

    25. Data Types

    Most programming languages require the programmer to declare the data type of every data

    object, and most database systems require the user to specify the type of each data field. The

    available data types vary from one programming language to another, and from one database

    application to another, but the following usually exist in one form or another:


    integer : In more common parlance, whole number; a number that has no fractional

    part.

    floating-point : A number with a decimal point. For example, 3 is an integer, but 3.5

    is a floating-point number.

    character (text ): Readable text

    26. What purpose do types serve in a programming language? a. Types provide implicit context for many operations, so that the programmer does

    not have to specify that context explicitly. In C, for instance, the expression a+b

    will use integer addition if a and b are of integer type, and floating point

    addition if a and b are of double type.

    b. Types limit the set of operations that may be performed in a semantically valid

    program. They prevent the programmer from adding a character and a record, for

    example, or from taking the arctangent of a set, or passing a file as a parameter to

    a subroutine that expects an integer.

    27. Discuss about Type Systems

    A type system consists of (1) a mechanism to define types and associate them with certain
    language constructs, and (2) a set of rules for type equivalence, type compatibility, and type
    inference. The constructs that must have types are precisely those that have values or that can
    refer to objects that have values. These constructs include named constants, variables, record
    fields, parameters, and sometimes subroutines, as well as literal constants. Type equivalence rules
    determine when the types of two values are the same. Type compatibility rules determine when a
    value of a given type can be used in a given context. Type inference rules define the type of an
    expression based on the types of its constituent parts or the surrounding context.

    Type checking:-

    Type checking is the process of ensuring that a program obeys the language's type compatibility

    rules. A violation of the rules is known as a type clash. A language is said to be strongly typed if

    it prohibits, in a way that the language implementation can enforce, the application of any

    operation to any object that is not intended to support that operation. A language is said to be

    statically typed if it is strongly typed and type checking can be performed at compile time.

    Ex: Ada is strongly typed and for the most part statically typed. A Pascal implementation can

    also do most of its type checking at compile time, though the language is not quite strongly

    typed: untagged variant records are its only loophole.

    Dynamic type checking is a form of late binding, and tends to be found in languages that delay

    other issues until run time as well. Lisp and Smalltalk are dynamically typed. Most scripting

    languages are also dynamically typed; some (Python, Ruby) are strongly typed.


    Classification of Types:-

    The terminology for types varies some from one language to another. Most languages provide

    built in types similar to those supported in hardware by most processors: integers, characters,

    Boolean, and real (floating point) numbers.

    Booleans are typically implemented as single byte quantities with 1 representing true and 0

    representing false.

    Characters have traditionally been implemented as one byte quantities as well, typically using the

    ASCII encoding. More recent languages use a two byte representation designed to accommodate

    the Unicode character set.

    Numeric Types:-

    A few languages (C, Fortran) distinguish between different lengths of integers and real numbers;

    most do not, and leave the choice of precision to the implementation. Unfortunately, differences

    in precision across language implementations lead to a lack of portability: programs that run

    correctly on one system may produce run-time errors or erroneous results on another.

    A few languages, including C, C++, C# and Modula-2, provide both signed and unsigned integers.

    A few languages (Fortran, C99, Common Lisp) provide a built-in complex type, usually

    implemented as a pair of floating point numbers that represent the real and imaginary Cartesian

    coordinates; other languages support these as a standard library class. Most scripting languages

    support integers of arbitrary precision. Ada supports fixed point types, which are represented

    internally by integers. Integers, Booleans, characters are examples of discrete types.

    Enumeration Types:-

    Enumerations were introduced by Wirth in the design of Pascal. They facilitate the creation of

    readable programs, and allow the compiler to catch certain kinds of programming errors. An

    enumeration type consists of a set of named elements. In Pascal one can write:

    Type weekday = (sun, mon, tue, wed, thu, fri, sat);

    The values of an enumeration type are ordered, so comparisons are generally valid (mon < tue).


    In Ada one would write

    Type test_score is new integer range 0..100;

    Subtype workday is weekday range mon..fri;

    The range portion of the definition in Ada is called a type constraint. test_score is a derived
    type, incompatible with integers. workday, by contrast, is a subtype, so its values can be more or
    less freely intermixed with values of type weekday.

    Composite Types:-

    Nonscalar types are usually called composite, or constructed types. They are generally created by

    applying a type constructor to one or more simpler types. Common composite types include

    records, variant records, arrays, sets, pointers, lists, and files.

    Records - Introduced by Cobol, and supported by most languages since the 1960s. A record consists of a collection of fields, each of which belongs to a

    simpler type.

    Variant records - Differ from normal records in that only one of a variant record's fields is valid at any given time.

    Arrays - The most commonly used composite types. An array can be thought of as a function that maps members of an index type to members of a component

    type.

    Sets-Introduced by Pascal. A set type is the mathematical powerset of its base type, which must often be discrete.

    Pointers - A pointer value is a reference to an object of the pointer's base type. Pointers are often but not always implemented as addresses. They are most often

    used to implement recursive data types

    Lists-Contain a sequence of elements, but there is no notion of mapping or indexing. Rather, a list is defined recursively as either an empty list or a pair

    consisting of a head element and a reference to a sublist. To find a given element

    of a list ,a program must examine all previous elements, recursively or iteratively,

    starting at the head. Because of their recursive definition, lists are fundamental to

    programming in most functional languages.

    Files-Are intended to represent data on mass storage devices, outside the memory in which other program objects reside.

    28. Discuss about Type checking

    In most statically typed languages, every definition of an object must specify the object's type.

    Type equivalence:-

    In a language in which the user can define new types, there are two principal ways of defining

    type equivalence. Structural equivalence is based on the content of type definitions. Name

    equivalence is based on the lexical occurrence of type definitions. Structural equivalence is used

in Algol 68, Modula-3, and C. Name equivalence is the more popular approach in recent languages. It is used in Java, C#, standard Pascal, and Ada.


    The exact definition of structural equivalence varies from one language to another.

    Structural equivalence in Pascal:

    Type R2=record

    a,b : integer

    end;

    should probably be considered the same as

    type R3 = record

    a : integer;

    b : integer

    end;

    But what about

    Type R4 = record

    b : integer;

    a : integer

    end;

    should the reversal of the order of the fields change the type? Most languages say yes.

In a similar vein, consider the following arrays, again in a Pascal-like notation:

type str = array [1..10] of char;
type str = array [0..9] of char;

    Here the length of the array is the same in both cases, but the index values are different. Should

    these be considered equivalent? Most languages say no, but some (Fortran, Ada) consider them

    compatible.

    Type compatibility:-

    Most languages do not require equivalence of types in every context. Instead, they merely say

that a value's type must be compatible with that of the context in which it appears. In an assignment statement, the type of the right-hand side must be compatible with that of the left-hand side. The types of the operands of + must both be compatible with some common type that supports addition. In a subroutine call, the types of any arguments passed into the subroutine

    must be compatible with the types of the corresponding formal parameters, and the types of any

    formal parameters passed back to the caller must be compatible with the types of the

    corresponding arguments.

    Coercion:-

    Whenever a language allows a value of one type to be used in a context that expects another, the

    language implementation must perform an automatic, implicit conversion to the expected type.

    This conversion is called a type coercion.
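A brief C illustration of coercion (the variable names are illustrative): when an int appears in a context that expects a double, the compiler inserts the conversion automatically.

#include <stdio.h>

int main(void) {
    int i = 3;
    double d = i;           /* coercion: the int value is implicitly converted to double */
    double sum = i + 2.5;   /* i is coerced to double before the addition */
    printf("%f %f\n", d, sum);
    return 0;
}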


    29. Discuss about Records(Structures) and Variants(Unions)

    Record types allow related data of heterogeneous types to be stored and manipulated together.

Some languages (Algol 68, C, C++, Common Lisp) use the term structure instead of record. Fortran 90 simply calls its records "types." Structures in C++ are defined as a special form of class.

    Syntax and Operations:-

In C a simple record might be defined as follows:

    struct element {

    char name[2];

    int atomic_number;

    double atomic_weight;

    _Bool metallic;

    };

    In Pascal the corresponding declarations would be

    Type two_chars = packed array [1..2] of char;

    Type element = record

    name : two_chars;

    atomic_number : integer;

    atomic_weight : real;

    metallic : Boolean

    end;

    Memory layout and its impact:

    The fields of a record are usually stored in adjacent locations in memory. In its symbol table, the

    compiler keeps track of the offset of each field within each record type. When it needs to access

    a field, the compiler typically generates a load or store instruction with displacement addressing.

    For a local object, the base register is the frame pointer; the displacement is the sum of the

record's offset from the register and the field's offset within the record.
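As an illustration, the standard C offsetof macro exposes the field displacements that the compiler records in its symbol table. This sketch reuses the element record defined above; the exact offsets, padding, and total size are implementation-dependent.

#include <stdio.h>
#include <stddef.h>          /* offsetof */

struct element {
    char   name[2];
    int    atomic_number;
    double atomic_weight;
    _Bool  metallic;
};

int main(void) {
    /* each field's displacement from the start of the record */
    printf("name          at offset %zu\n", offsetof(struct element, name));
    printf("atomic_number at offset %zu\n", offsetof(struct element, atomic_number));
    printf("atomic_weight at offset %zu\n", offsetof(struct element, atomic_weight));
    printf("metallic      at offset %zu\n", offsetof(struct element, metallic));
    printf("total size: %zu bytes (may include holes for alignment)\n",
           sizeof(struct element));
    return 0;
}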

    Variant Records (Unions):

Programming languages of the 1960s and 1970s were designed in an era of severe memory constraints. Many allowed the programmer to specify that certain variables should be allocated on top of one another, sharing the same bytes in memory. C's union syntax was heavily influenced by Algol 68.

union {

    int i;

    double d;

    _Bool b;

    };


    In practice, unions have been used for two main purposes. The first arises in systems programs,

    where unions allow the same set of bytes to be interpreted in different ways at different times.

    The canonical example occurs in memory management, where storage may sometimes be treated

    as unallocated space, sometimes as bookkeeping information, and sometimes as user allocated

    data of arbitrary type.

    The second common purpose for unions is to represent alternative sets of fields within a record.

    A record representing an employee, for example, might have several common fields (name,

address, phone, department) and various other fields depending on whether the employee is paid on a salaried, hourly, or consulting basis.

30. Briefly describe two purposes for unions/variant records

Programming languages of the 1960s and 1970s were designed in an era of severe memory constraints. Many allowed the programmer to specify that certain variables should be allocated on top of one another, sharing the same bytes in memory. C's union syntax was heavily influenced by Algol 68.

    union {

    int i;

    double d;

    _Bool b;

    };

    In practice, unions have been used for two main purposes. The first arises in systems programs,

    where unions allow the same set of bytes to be interpreted in different ways at different times.

    The canonical example occurs in memory management, where storage may sometimes be treated

    as unallocated space, sometimes as bookkeeping information, and sometimes as user allocated

    data of arbitrary type.

    The second common purpose for unions is to represent alternative sets of fields within a record.

    A record representing an employee, for example, might have several common fields (name,

address, phone, department) and various other fields depending on whether the employee is paid on a salaried, hourly, or consulting basis.

31. What is an Array?

Arrays are the most common and important composite data types. They have been a fundamental part of almost every high-level language, beginning with Fortran I. Unlike records, which group

    related fields of disparate types, arrays are usually homogeneous. Semantically, they can be

    thought of as a mapping from an index type to a component or element type.

    Syntax and operations:

    Most languages refer to an element of an array by appending a subscript delimited by

parentheses or square brackets to the name of the array. In Fortran and Ada, one says A(3); in Pascal and C, one says A[3]. Since parentheses are generally used to delimit the arguments to a

    subroutine call, square bracket subscript notation has the advantage of distinguishing between

    the two. The difference in notation makes a program easier to compile and arguably easier to

    read.

    Declarations:

    In some languages one declares an array by appending subscript notation to the syntax that

    would be used to declare a scalar. In C:


char upper[26];

    In Fortran:

character, dimension (1:26) :: upper
character (26) upper    ! shorthand notation

    In C the lower bound of an index range is always zero; the indices of an n-element array are

0 .. n-1. In Fortran the lower bound of the index range is one by default.

32. What are Array Slices?

A slice or section is a rectangular portion of an array. Fortran 90 and Single Assignment C

    provide extensive facilities for slicing, as do many scripting languages including Perl, Python,

    Ruby and R. A slice is simply a contiguous range of elements in a one- dimensional array.

Fortran 90 has a very rich set of array operations: built-in operations that take entire arrays as arguments. Because Fortran uses structural type equivalence, the operands of an array operator need only have the same element type and shape. In particular, slices of the same shape can be

    intermixed in array operations, even if the arrays from which they were sliced have very different

    shapes. Any of the built in arithmetic operators will take arrays as operands; the result is an

array, of the same shape as the operands, whose elements are the result of applying the operator

    to corresponding elements.

    Array slices (sections) in Fortran 90.

    [ a : b : c in a subscript indicates positions a, a + c, . . . through b. If a or b is omitted, the

    corresponding bound is assumed. If c is omitted, 1 is assumed. If c is negative, then we select

    positions in reverse order. The slashes in the second subscript of the lower-right example delimit

    an explicit list of positions.]
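C has no built-in slice notation, but the a : b : c triplet semantics described above can be sketched by hand. The function and variable names below are illustrative, and the example uses C's zero-based indexing rather than Fortran's default of one.

#include <stdio.h>

/* print A[a], A[a+c], ... until the index passes b (c may be negative) */
static void print_slice(const double *A, int a, int b, int c) {
    if (c > 0)
        for (int i = a; i <= b; i += c) printf("%g ", A[i]);
    else if (c < 0)
        for (int i = a; i >= b; i += c) printf("%g ", A[i]);
    printf("\n");
}

int main(void) {
    double A[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    print_slice(A, 1, 8, 2);    /* positions 1, 3, 5, 7                  */
    print_slice(A, 9, 0, -3);   /* positions 9, 6, 3, 0 (reverse order)  */
    return 0;
}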


33. How are arrays allocated?

Storage management is more complex for arrays whose shape is not known until elaboration

    time, or whose shape may change during execution. For these the compiler must arrange not only

    to allocate space, but also to make shape information available at run time. Some dynamically

    typed languages allow run time binding of both the number and bounds of dimensions. Compiled

    languages may allow the bounds to be dynamic, but typically require the number of dimensions

    to be static. A local array whose shape is known at elaboration time may still be allocated in the

    stack. An array whose size may change during execution must generally be allocated in the heap.

    Global lifetime, static shape: allocate space for the array in static global memory

Local lifetime, static shape: space can be allocated in the subroutine's stack frame at run time

    Local lifetime, shape bound at elaboration time: an extra level of indirection is required to place the space for the array in the stack frame of its subroutine (Ada, C)

    Arbitrary lifetime, shape bound at elaboration time: at elaboration time either space is allocated or a preexistent reference from another array is assigned (Java, C#)

    Arbitrary lifetime, dynamic shape: must generally be allocated from the heap. A pointer to the array still resides in the fixed-size portion of the stack frame (if local lifetime).

Allocation in Ada and C99 of local arrays whose shape is bound at elaboration time:

-- Ada:
procedure foo (size : integer) is
    M : array (1..size, 1..size) of real;
begin
    ...
end foo;

    //C99:

    void foo (int size)

    {

    double M[size][size];

    }


    [The compiler arranges for a pointer to M to reside at a static offset from the frame pointer. M

    cannot be placed among the other local variables because it would prevent those higher in the

    frame from having static offsets.]

34. Discuss about Memory Layout of Arrays

Arrays in most language implementations are stored in contiguous locations in memory. In a one

    dimensional array the second element of the array is stored immediately after the first; the third

    is stored immediately after the second, and so forth. For arrays of records, it is common for each

    subsequent element to be aligned at an address appropriate for any type; small holes between

    consecutive records may result.

    For multidimensional arrays, there are two layouts: row-major order and column-major order

    In row-major order, consecutive locations in memory hold elements that differ by one in the final subscript (except at the ends of rows).

    In column-major order, consecutive locations hold elements that differ by one in the initial subscript


    [ In row major order, the elements of a row are contiguous in memory; in column-major order,

    the elements of a column are contiguous. The second cache line of each array is shaded, on the

    assumption that each element is an eight-byte floating point number, that cache lines are 32 bytes

    long and that the array begins at a cache line boundary. If the array is indexed from A[0,0] to

    A[9,9],then in the row major case elements A[0,4] through A[0,7] share a cache line; in the

    column-major case elements A[4,0] through A[7,0] share a cache line.]
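The choice of layout determines how a subscript pair maps to an address. The following C sketch shows the two address calculations for a conceptual n1-row by n2-column array, using the 10 x 10 dimensions of the example above; the function names are illustrative.

#include <stdio.h>

/* offset, in elements, of A[i][j] in an n1-row by n2-column array */
static long row_major(long i, long j, long n1, long n2)    { (void)n1; return i * n2 + j; }
static long column_major(long i, long j, long n1, long n2) { (void)n2; return j * n1 + i; }

int main(void) {
    long n1 = 10, n2 = 10;
    /* consecutive elements of a row are adjacent in row-major order ...            */
    printf("row-major    A[0][4] -> %ld, A[0][5] -> %ld\n",
           row_major(0, 4, n1, n2), row_major(0, 5, n1, n2));
    /* ... whereas in column-major order consecutive elements of a column are adjacent */
    printf("column-major A[4][0] -> %ld, A[5][0] -> %ld\n",
           column_major(4, 0, n1, n2), column_major(5, 0, n1, n2));
    return 0;
}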

    Row-Pointer Layout:

    Allow the rows of an array to lie anywhere in memory, and create an auxiliary array of pointers to the rows.

    Technically speaking, only the contiguous layout is a true multidimensional array

    This row-pointer memory layout requires more space in most cases but has three potential advantages.

    It sometimes allows individual elements of the array to be accessed more quickly, especially on CISC machines with slow multiplication instructions

    It allows the rows to have different lengths, without devoting space to holes at the ends of the rows; the lack of holes may sometimes offset the increased space for

    pointers

    It allows a program to construct an array from preexisting rows (possibly scattered throughout memory) without copying

    C, C++, and C# provide both contiguous and row-pointer organizations for multidimensional arrays

    Java uses the row-pointer layout for all arrays

    Row-Pointer Layout in C:

char days[][10] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};


days[2][3] == 's';   /* the 's' in "Tuesday" */

    [ It is a true two dimensional array. The slashed boxes are NUL bytes; the shaded areas are

    holes.]

char *days[] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

days[2][3] == 's';   /* the 's' in "Tuesday" */

    [ It is a ragged array of pointers to arrays of characters.]


35. What is a dope vector? What purposes does it serve?

A dope vector will contain the lower bound of each dimension and the size of each dimension

    other than the last. If the language implementation performs dynamic semantic checks for out of

    bounds subscripts in array references, then the dope vector may contain upper bounds as well.

    Given lower bounds and sizes, the upper bound information is redundant, but it is usually

    included anyway, to avoid computing it repeatedly at run time.

    The contents of the dope vector are initialized at elaboration time, or whenever the number or

    bounds of dimensions change. In a language like Fortran 90, whose notion of shape includes

    dimension sizes but not lower bounds, an assignment statement may need to copy not only the

    data of an array, but dope vector contents as well.

    In a language that provides both a value model of variables and arrays of dynamic shape, we

    must consider the possibility that a record will contain a field whose size is not statically known.

    In this case the compiler may use dope vectors not only for dynamic shape arrays, but also for

    dynamic shape records. The dope vector for a record typically indicates the offset of each field

    from the beginning of the record.
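A dope vector can be pictured as a small descriptor stored alongside the data. The following C sketch is purely illustrative: the struct layout and names are assumptions, not any particular compiler's run-time format.

#include <stdio.h>
#include <stdlib.h>

/* descriptor for a dynamically shaped two-dimensional array of doubles */
typedef struct {
    long lower1, size1;     /* lower bound and size of dimension 1 */
    long lower2, size2;     /* lower bound and size of dimension 2 */
    double *data;           /* the elements themselves, in row-major order */
} dope_vector;

/* bounds-checked access using the descriptor's contents */
static double fetch(const dope_vector *dv, long i, long j) {
    if (i < dv->lower1 || i >= dv->lower1 + dv->size1 ||
        j < dv->lower2 || j >= dv->lower2 + dv->size2) {
        fprintf(stderr, "subscript out of bounds\n");
        exit(EXIT_FAILURE);
    }
    return dv->data[(i - dv->lower1) * dv->size2 + (j - dv->lower2)];
}

int main(void) {
    /* initialized at "elaboration time": a 3 x 4 array indexed from (1,1) */
    dope_vector dv = {1, 3, 1, 4, NULL};
    dv.data = calloc((size_t)(dv.size1 * dv.size2), sizeof(double));
    printf("%g\n", fetch(&dv, 2, 3));
    free(dv.data);
    return 0;
}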

36. Discuss about String representations in programming languages

In many languages, a string is simply an array of characters. In other languages, strings

    have special status, with operations that are not available for arrays of other sorts.

    It is easier to provide special features for strings than for arrays in general because strings are one-dimensional

    Manipulation of variable-length strings is fundamental to a huge number of computer applications

    Particularly powerful string facilities are found in various scripting languages such as Perl, Python and Ruby.

    C, Pascal, and Ada require that the length of a string-valued variable be bound no later than elaboration time, allowing contiguous space allocation in the current stack frame

    Lisp, Icon, ML, Java, C# allow the length of a string-valued variable to change over its lifetime, requiring that space be allocated by a block or chain of blocks in the heap

    37. Sets

    A set is an unordered collection of an arbitrary number of distinct values of a common type

    Introduced by Pascal, and are found in many more recent languages as well. Pascal supports sets of any discrete type, and provides union, intersection, and difference

    operations:

    var A,B,C :set of char;

    D,E : set of weekday;

A := B + C;

    A := B * C;


    A := B - C;

There are many ways to implement sets, including arrays, hash tables, and various forms of trees. The most common implementation employs a bit vector whose length (in bits) is the number of distinct values of the base type.

Operations on bit-vector sets can make use of fast logical instructions on most machines. Union is bit-wise or; intersection is bit-wise and; difference is bit-wise not, followed by bit-wise and.
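A minimal C sketch of a bit-vector set over a small discrete base type (the names are illustrative): union, intersection, and difference each become a single logical operation on a machine word.

#include <stdio.h>

/* base type: the weekday enumeration; each set fits in one machine word */
enum weekday { sun, mon, tue, wed, thu, fri, sat };
typedef unsigned int weekday_set;               /* one bit per weekday */

#define SINGLETON(d)   (1u << (d))

int main(void) {
    weekday_set B = SINGLETON(mon) | SINGLETON(tue) | SINGLETON(wed);
    weekday_set C = SINGLETON(wed) | SINGLETON(fri);

    weekday_set u = B | C;      /* union:        bit-wise or             */
    weekday_set i = B & C;      /* intersection: bit-wise and            */
    weekday_set d = B & ~C;     /* difference:   bit-wise not, then and  */

    printf("union=%#x intersection=%#x difference=%#x\n", u, i, d);
    return 0;
}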

    38. Discuss the tradeoffs between Pointers and the Recursive Types that arise naturally in a language with a reference model of variables.

    A recursive type is one whose objects may contain one or more references to other objects of the

    type. Most recursive types are records, since they need to contain something in addition to the

    reference, implying the existence of heterogeneous fields. Recursive types are used to build a

    wide variety of linked data structures, including lists and trees.

In some languages (Pascal, Ada 83, Modula-3) pointers are restricted to point only to objects in

    the heap. The only way to create a new pointer value is to call a built-in function that allocates a

new object in the heap and returns a pointer to it. In other languages (PL/I, Algol 68, C, C++, Ada 95) one can create a pointer to a nonheap object by using an "address of" operator.

    Syntax and Operations:-

    Operations on pointers include allocation and deallocation of objects in the heap, dereferencing

    of pointers to access the objects to which they point, and assignment of one pointer into another.

    The behavior of these operations depends heavily on whether the language is functional or

    imperative and on whether it employs a reference or value model for variables.

    Functional languages generally employ a reference model for names. Objects in a functional

    language tend to be allocated automatically as needed, with a structure determined by the

    language implementation. Variables in an imperative language may use either a value or a

reference model, or some combination of the two. In C, Pascal, or Ada, which employ a value model, the assignment A := B puts the value of B into A. If we want B to refer to an object and we want A := B to make A refer to the object to which B refers, then A and B must be pointers.

    Reference Model:

In Lisp, which uses a reference model of variables but is not statically typed, the tree could be specified textually as (#\R (#\X () ()) (#\Y (#\Z () ()) (#\W () ()))).


[Implementation of a tree in Lisp. A diagonal slash through a box indicates a null pointer. The C and A tags serve to distinguish the two kinds of memory blocks: cons cells and blocks containing atoms.]

    In Pascal tree data types would be declared as follows:

    type chr_tree_ptr = ^chr_tree;

    chr_tree = record

    left,right : chr_tree_ptr;

    val : char

    end;

    In Ada:

    type chr_tree;

    type chr_tree_ptr is access chr_tree;

    type chr_tree is record

    left,right : chr_tree_ptr;

    val : character;

    end record;

    In C:

    struct chr_tree

    {

    struct chr_tree * left, *right;

    char val;

    };
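Using the C declaration above, the tree from the Lisp example could be built with explicit heap allocation. This is a small sketch; the make_node helper is illustrative and error checking is omitted.

#include <stdio.h>
#include <stdlib.h>

struct chr_tree {
    struct chr_tree *left, *right;
    char val;
};

/* allocate a node in the heap and return a pointer to it */
static struct chr_tree *make_node(char v, struct chr_tree *l, struct chr_tree *r) {
    struct chr_tree *t = malloc(sizeof *t);
    t->val = v;
    t->left = l;
    t->right = r;
    return t;
}

int main(void) {
    /* the tree R(X, Y(Z, W)) from the Lisp example */
    struct chr_tree *root =
        make_node('R', make_node('X', NULL, NULL),
                       make_node('Y', make_node('Z', NULL, NULL),
                                      make_node('W', NULL, NULL)));
    printf("root holds %c\n", root->val);   /* dereference through the pointer */
    return 0;
}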

39. What are Dangling References? How are they created?

When a heap-allocated object is no longer live, a long-running program needs to reclaim the object's space. Stack objects are reclaimed automatically as part of the subroutine calling sequence. There are two alternatives for reclaiming heap objects: explicit reclamation by the programmer, and automatic garbage collection. Languages like Pascal, C, and C++ require the programmer to reclaim an object explicitly.

    In Pascal:

    dispose(my_ptr);

    In C:

    free (my_ptr);

    In C++:

    delete my_ptr;

    A dangling reference is a live pointer that no longer points to a valid object.

    Dangling reference to a stack variable in C++:

    int i=3;

    int *p = &i;


void foo()
{
    int n = 5;
    p = &n;
}
...
cout << *p << endl;

After foo returns, p still holds the address of n, whose stack frame no longer exists; the final output statement therefore reads garbage through a dangling reference.
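The same hazard arises with explicit heap deallocation. A hedged C sketch (not from the text) using free, as mentioned above:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);   /* heap object */
    *p = 3;
    int *q = p;                   /* second pointer to the same object */
    free(p);                      /* object reclaimed; p and q now dangle */
    /* printf("%d\n", *q);           undefined behavior: dangling reference */
    return 0;
}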


    [ Reference counts and circular lists]

41. Summarize the differences among mark-and-sweep, stop-and-copy, pointer reversal, and generational garbage collection.

1). Mark-and-Sweep: - The classic mechanism to identify useless blocks (a minimal sketch of its steps appears after this list). It proceeds in three main steps, executed by the garbage collector when the amount of free space remaining in the heap falls below some minimum threshold.

    a). The collector walks through the heap, tentatively marking every block as useless.

    b). Beginning with all pointers outside the heap, the collector recursively explores all linked

    data structures in the program, marking each newly discovered block as useful.

    c). The collector again walks through the heap, moving every block that is still marked useless

    to the free list.

    2). Pointer Reversal :- When the collector explores the path to a given block, it reverses the

    pointers it follows, so that each points back to the previous block instead of forward to the next.

    As it explores, the collector keeps track of the current block and the block from where it came.

    3). Stop and Copy: - In a language with variable size heap blocks, the garbage collector can

reduce external fragmentation by performing storage compaction. Many garbage collectors employ a technique known as stop-and-copy that achieves compaction. Specifically, they divide the heap into two regions of equal size. All allocation happens in the first half. When this half is full, the collector begins its exploration of reachable data structures. Each reachable block is copied into the second half of the heap; the old copy is overwritten with a useful flag and a pointer to the new location. Any other pointer that refers to the same block is set to point to the new location. When

    the collector finishes its exploration, all useful objects have been moved into the second half of

    the heap, and nothing in the first half is needed anymore. The collector can therefore swap its

    notion of first and second halves, and the program can continue.

    4). Generational collection: - The heap is divided into multiple regions. When space runs low the

    collector first examines the youngest region, which it assumes is likely to have the highest


    proportion of garbage. Only if it is unable to reclaim sufficient space in this region does the

    collector examine the next older region. To avoid leaking storage in long running systems, the

    collector must be prepared, if necessary, to examine the entire heap. In most cases, however, the

    overhead of collection will be proportional to the size of the youngest region only.

    Any object that survives some small number of collections in its current region is promoted to

    the next older region, in a manner reminiscent of stop and copy. Promotion requires, of course,

    that pointers from old objects to new objects be updated to reflect the new locations. While such

old-space-to-new-space pointers tend to be rare, a generational collector must be able to find them all quickly. At each pointer assignment, the compiler generates code to check whether the new value is an old-to-new pointer; if so, it adds the pointer to a hidden list accessible to the

    collector.

    5). Conservative Collection:- When space runs low, the collector tentatively marks all blocks in

    the heap as useless. It then scans all word aligned quantities in the stack and in global storage. If

    any of these words appears to contain the address of something in the heap, the collector marks

    the block that contains that address as useful. Recursively, the collector then scans all word-

    aligned quantities in the block, and marks as useful any other blocks whose addresses are found

    therein. Finally the collector reclaims any blocks that are still marked useless.
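As promised above, here is a toy mark-and-sweep sketch in C. The Block structure, fixed-size heap array, root list, and gc routine are all illustrative simplifications, not a real collector; they exist only to show the three steps (a), (b), and (c).

#include <stdio.h>
#include <stddef.h>

#define HEAP_BLOCKS 8

typedef struct Block {
    int marked;                     /* tentatively useless, then marked useful        */
    int in_use;                     /* allocated at all (vs. already on the free list) */
    struct Block *left, *right;     /* outgoing pointers to other heap blocks          */
} Block;

static Block heap[HEAP_BLOCKS];

/* step (b): recursively mark every block reachable from a root */
static void mark(Block *b) {
    if (b == NULL || b->marked) return;
    b->marked = 1;
    mark(b->left);
    mark(b->right);
}

/* the three steps of mark-and-sweep */
static void gc(Block **roots, size_t nroots) {
    size_t i;
    for (i = 0; i < HEAP_BLOCKS; i++)   /* (a) tentatively mark every block useless    */
        heap[i].marked = 0;
    for (i = 0; i < nroots; i++)        /* (b) explore from pointers outside the heap  */
        mark(roots[i]);
    for (i = 0; i < HEAP_BLOCKS; i++)   /* (c) reclaim blocks still marked useless     */
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;         /* in a real collector: move to the free list  */
            printf("reclaimed block %zu\n", i);
        }
}

int main(void) {
    /* build a tiny object graph: root -> 0 -> 1; block 2 is unreachable garbage */
    heap[0].in_use = 1; heap[0].left = &heap[1];
    heap[1].in_use = 1;
    heap[2].in_use = 1;                 /* allocated but no longer referenced */
    Block *roots[] = { &heap[0] };
    gc(roots, 1);                       /* prints: reclaimed block 2 */
    return 0;
}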

42. Why are Lists so heavily used in functional programming languages?

A list is defined recursively as either the empty list or a pair consisting of an object and another list. Lists are ideally suited to programming in functional and logic languages, which do most of their work via recursion and higher-order functions. In Lisp, in fact, a program is itself a list, and can extend itself at run time by constructing a list and executing it.

    Lists in ML and Lisp:

    Lists in ML are homogeneous: every element of the list must have the same type. Lisp

lists, by contrast, are heterogeneous: any object may be placed in a list, so long as it is

    never used in an inconsistent fashion. An ML list is usually a chain of blocks, each of

    which contains an element and a pointer to the next block. A Lisp list is a chain of cons

    cells, each of which contains two pointers, one to the element and one to the next cons

    cell.

An ML list is enclosed in square brackets, with elements separated by commas: [a, b, c, d]

    A Lisp list is enclosed in parentheses, with elements separated by white space: (a b c d).

    In both cases, the notation represents a proper list- one whose innermost pair consists of

    the final element and the empty list. In Lisp it is also possible to construct an improper

    list, whose final pair contains two elements.

    The most fundamental operations on lists are those that construct them from their

    components or extract their components from them.

    In Lisp:

(cons 'a '(b))       => (a b)
(car '(a b))         => a
(car nil)            => ??
(cdr '(a b c))       => (b c)
(cdr '(a))           => nil


(cdr nil)            => ??
(append '(a b) '(c d)) => (a b c d)

Here we have used => to mean "evaluates to." The car and cdr of the empty list (nil) are defined to be nil in Common Lisp.

    In ML the equivalent operations are written as follows:

    a :: [b] => [a, b]

    hd [a, b] => a

    hd [ ] => run-time exception

tl [a, b, c] => [b, c]
tl [a] => nil
tl [ ] => run-time exception

    [a, b] @ [c, d] => [a, b, c, d]
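The chain-of-cons-cells representation described above can be pictured in C. This sketch models cells holding character atoms only; the names cons, head, and tail are illustrative, and NULL plays the role of the empty list.

#include <stdio.h>
#include <stdlib.h>

/* a cons cell: a head element and a reference to the rest of the list */
typedef struct cell {
    char head;
    struct cell *tail;              /* NULL represents the empty list (nil) */
} cell;

static cell *cons(char h, cell *t) {    /* build a new list from head and tail */
    cell *c = malloc(sizeof *c);
    c->head = h;
    c->tail = t;
    return c;
}

int main(void) {
    /* the proper list (a b c d): the innermost pair ends in the empty list */
    cell *lst = cons('a', cons('b', cons('c', cons('d', NULL))));
    printf("%c\n", lst->head);          /* like (car lst)       => a */
    printf("%c\n", lst->tail->head);    /* like (car (cdr lst)) => b */
    return 0;
}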

43. Discuss about Files and Input/Output

Input/output facilities allow a program to communicate with the outside world. Interactive I/O

    generally implies communication with human users or physical devices, which work in parallel

    with the running program, and whose input to the program may depend on earlier output from

    the program. Files generally refer to off-line storage implemented by the operating system. Files

    may be further categorized into those that are temporary and those that are persistent. Temporary

    files exist for the duration of a single program run; their purpose is to store information that is

    too large to fit in the memory available to the program. Persistent files allow a program to read

    data that existed before the program began running, and to write data that will continue to exist

    after the program has ended. Some languages provide built in file data types and special syntactic

    constructs for I/O. Others relegate I/O entirely to library packages, which export a file type and a

    variety of input and output subroutines. The principal advantage of language integration is the

    ability to employ non-subroutine call syntax, and to perform operations that may not otherwise

    be available to library routines.


    Unit-2

44. Discuss about Static and Dynamic links

In a language with nested subroutines and static scoping, objects that lie in surrounding

    subroutines, and that are thus neither local nor global, can be found by maintaining a static chain.

    Each stack frame contains a reference to the frame of the lexically surrounding subroutine. This

    reference is called the static link. By analogy, the saved value of the frame pointer, which will be

    restored on subroutine return, is called the dynamic link. The static and dynamic links may or

    may not be the same, depending on whether the current routine was called by its lexically

    surrounding routine, or by some other routine nested in that surrounding routine.

    Whether or not a subroutine is called directly by the lexically surrounding routine, we can be

    sure that the surrounding routine is active; there is no other way that the current routine could

    have been visible, allowing it to be called.

If subroutine D is called directly from B, then clearly B's frame will already be on the stack. Because D is nested inside B, it is only when control enters B that D comes into view. It can therefore be called by C, or by any other routine that is nested inside C or D, but only because these are also within B.
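Conceptually, the compiler reaches a variable declared k lexical levels out by following k static links. The C sketch below illustrates that run-time bookkeeping; the frame structure and field names are an illustration, not real generated code.

#include <stdio.h>

/* simplified activation record: just the static link and one local variable */
typedef struct frame {
    struct frame *static_link;   /* frame of the lexically surrounding routine */
    int local;
} frame;

/* address of a variable declared k lexical levels out from the current frame */
static int *nonlocal(frame *current, int k) {
    frame *fp = current;
    while (k-- > 0)
        fp = fp->static_link;    /* follow the static chain */
    return &fp->local;
}

int main(void) {
    frame outer = { NULL, 42 };      /* e.g., B's frame                 */
    frame inner = { &outer, 7 };     /* e.g., D's frame, nested in B    */
    printf("%d\n", *nonlocal(&inner, 1));   /* prints 42 */
    return 0;
}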


    45. Calling Sequences

    Maintenance of the subroutine call stack is the responsibility of the calling sequence. Sometimes

    the term calling sequence is used to refer to the combined operations of the caller, the prologue,

    and the epilogue.

    Tasks that must be accomplished on the way into a subroutine include passing parameters,

    saving the return address, changing the program counter, changing the stack pointer to allocate

    space, saving registers that contain important values and that may be overwritten by the callee,

    changing the frame pointer to refer to the new frame, and executing initialization code for any

    objects in the new frame that require it. Tasks that must be accomplished on the way out include

    passing return parameters or function values, executing finalization code for any local objects