Principles of Programming Languages

    Unit-1

    Introduction:

    A programming language is an artificial language designed to communicate instructions to a

    machine, particularly a computer. Programming languages can be used to create programs that

    control the behavior of a machine and/or to express algorithms precisely.

    The earliest programming languages predate the invention of the computer, and were used to

    direct the behavior of machines such as Jacquard looms and player pianos. Thousands of

    different programming languages have been created, mainly in the computer field, with many

    more being created every year. Most programming languages describe computation in an

    imperative style, i.e., as a sequence of commands, although some languages, such as those that

    support functional programming or logic programming, use alternative forms of description.

    The description of a programming language is usually split into the two components of syntax

    (form) and semantics (meaning). Some languages are defined by a specification document (for

    example, the C programming language is specified by an ISO Standard), while other languages,

    such as Perl 5 and earlier, have a dominant implementation that is used as a reference.

    1. What is a name in programming languages? A name is a mnemonic character string used to represent something else.

    Names are a central feature of all programming languages. In the earliest programs, numbers

    were used for all purposes, including machine addresses. Replacing numbers by symbolic names

    was one of the first major improvements to program notation.

    e.g. variables, constants, executable code (methods, procedures, subroutines, functions, even

    whole programs), data types, classes, etc.

    In general, names are of two different types:

    1. Special symbols: +, -, *

    2. Identifiers: sequences of alphanumeric characters (in most cases beginning with a letter), plus

    in many cases a few special characters such as '_' or '$'.

    2. What is binding? A binding is an association between two things, such as a name and the thing it names

    E.g. the binding of a class name to a class, or of a variable's name to a variable.

    Static and Dynamic binding:

    A binding is static if it first occurs before run time and remains unchanged throughout program

    execution.

    A binding is dynamic if it first occurs during execution or can change during execution of the

    program.

    3. What is binding time? Binding time is the time when an association is established.

    In programming, a name may have several attributes, and they may be bound at different times.


    Example1:

    int n;

    n = 6;

    The first line binds the type int to n and the second line binds the value 6 to n. The first binding

    occurs when the program is compiled. The second binding occurs when the program is executed.

    Example 2:

    #include <stdio.h>

    void f()
    {
        int n = 7;
        printf("%d", n);
    }

    void main()
    {
        int k;
        scanf("%d", &k);
        if (k > 0)
            f();
    }

    In FORTRAN, addresses are bound to variable names at compile time. The result is that, in the

    compiled code, variables are addressed directly, without any indexing or other address

    calculations. (In reality, the process is somewhat more complicated. The compiler assigns an

    address relative to a compilation unit. When the program is linked, the address of the unit within

    the program is added to this address. When the program is loaded, the address of the program is

    added to the address. The important point is that, by the time execution begins, the absolute address of the variable is known.)

    FORTRAN is efficient, because absolute addressing is used. It is inflexible, because all

    addresses are assigned at load time. This leads to wasted space, because all local variables

    occupy space whether or not they are being used, and also prevents the use of direct or indirect

    recursion.

    Early and Late Binding:

    Early binding favours efficiency; late binding favours flexibility.
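
    As a rough illustration (not part of the original notes), C++ exposes both choices: a call to a
    non-virtual member function is bound early, at compile time, while a call to a virtual function
    through a base-class pointer is bound late, at run time. The class names below are made up.

    #include <cstdio>

    struct Shape {
        void describe() { std::printf("Shape::describe\n"); }   // non-virtual: early (static) binding
        virtual void draw() { std::printf("Shape::draw\n"); }   // virtual: late (dynamic) binding
        virtual ~Shape() = default;
    };

    struct Circle : Shape {
        void describe() { std::printf("Circle::describe\n"); }  // hides, does not override
        void draw() override { std::printf("Circle::draw\n"); }
    };

    int main() {
        Circle c;
        Shape *p = &c;
        p->describe();   // bound at compile time from p's static type: prints Shape::describe
        p->draw();       // bound at run time from the object's actual type: prints Circle::draw
    }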

    4. Explain about different times at which decisions may be bound? Or

    Explain different types of binding times?

    In the context of programming languages, there are quite a few alternatives for binding time.

    1. Language design time 2. Language implementation time 3. Program writing time 4. Compile time 5. Link time


    6. Load time 7. Run time

    1. Language design time:

    In most languages, the control flow constructs, the set of fundamental types, the available

    constructors for creating complex types, and many other aspects of language semantics

    are chosen when the language is designed.

    Or

    During the design of a programming language, the designer decides what symbols should be
    used to represent operations.

    e.g. binding of operator symbols to operations: * to multiplication, + to addition.

    2. Language Implementation Time: Most language manuals leave a variety of issues to the discretion of the language

    implementor. Typical examples include the precision of the fundamental types, the

    coupling of I/O to the operating system's notion of files, the organization and maximum

    sizes of stack and heap and the handling of run-time exceptions such as arithmetic

    overflow.

    Or

    During language implementation, the implementor decides what range of values a data type
    should have, e.g. binding a data type such as int in C to its range of possible values.

    Example: the C language does not specify the range of values for the type int. Implementations

    on early microcomputers typically used 16 bits for ints, yielding a range of values from -32768

    to +32767. On early large computers, and on all computers today, C implementations typically

    use 32 bits for int.

    3. Program writing time: Programmers, of course, choose algorithms, data structures and names.

    Or

    While writing programs, programmers bind certain names to procedures, classes, etc.

    Example: many names are bound to specific meanings when a person writes a program

    4. Compile time: The time when a single compilation unit is compiled, while compiling the type of a

    variable can be identified

    Example: int c; [at compile time int, c forms an association]


    5. Link time: The time when all the compilation units comprising a single program are linked as the

    final step in building the program

    [The separate modules of a single program will be bound only at link time]

    6. Load time: Load time refers to the point at which the operating system loads the program into

    memory so that it can run

    7. Run time: Run time is actually a very broad term that covers the entire span from the beginning to

    the end of execution. If we give the value of a variable during runtime, it is known as

    runtime binding

    Ex. printf("Enter the value of X"); scanf("%f", &X);

    5. What is Scope?

    The textual region of the program in which a binding is active is its scope.

    The scope of a name binding is the portion of the text of the source program in which that

    binding is in effect - i.e. the name can be used to refer to the corresponding object. Using a name

    outside the scope of a particular binding implies one of two things: either it is undefined, or it

    refers to a different binding. In C++, for example, the scope of a local variable starts at the

    declaration of the variable and ends at the end of the block in which the declaration appears.

    Scope is a static property of a name that is determined by the semantics of the PL and the text of

    the program.

    Example:

    class Foo

    {

    private int n;

    void foo() {

    // 1

    }

    void bar() {

    int m,n;

    ...

    // 2

    }

    ...
    }


    A reference to m at point 1 is undefined.

    A reference to n at point 1 refers to the instance variable of Foo; at point 2, n refers to the
    local variable declared in bar, which hides the instance variable.

    6. Describe the difference between static and dynamic scoping(scope rules)

    The scope rules (static, dynamic) of a language determine how references to names are

    associated with variables

    Static Scoping:

    In a language with static scoping, the bindings between names and objects can be

    determined at compile time by examining the text of the program, without consideration of the

    flow of control at run time.

    Scope rules are somewhat more complex in FORTRAN, though not much more. FORTRAN

    distinguishes between global and local variables. The scope of a local variable is limited to the

    subroutine in which it appears; it is not visible elsewhere. Variable declarations are optional. If a

    variable is not declared, it is assumed to be local to the current subroutine and to be of type

    integer if its name begins with the letters I-N, or real otherwise.

    Global variables in FORTRAN may be partitioned into common blocks, which are then imported

    by subroutines. Common blocks are designed for separate compilation: they allow a subroutine

    to import only the sets of variables it needs. Unfortunately, FORTRAN requires each subroutine

    to declare the names and types of the variables in each of the common blocks it uses, and there is
    no standard mechanism to ensure that the declarations in different subroutines are the same.

    Nested scopes- Many programming languages allow scopes to be nested inside each other.

    Example: Java actually allows classes to be defined inside classes or even inside methods, which

    permits multiple scopes to be nested.

    class Outer

    {

    int v1; // 1

    void methodO()

    {

    float v2; // 2

    class Middle

    {

    char v3; // 3

    void methodM()


    {

    boolean v4; // 4

    class Inner

    {

    double v5; // 5

    void methodI()

    {

    String v6; // 6

    }

    }

    }

    }

    }

    }

    The scope of the binding for v1 is the whole program.
    The scope of the binding for v2 is methodO and all of classes Middle and Inner, including their methods.
    The scope of the binding for v3 is all of classes Middle and Inner, including their methods.
    The scope of the binding for v4 is methodM and all of class Inner, including its method.
    The scope of the binding for v5 is all of class Inner, including its method.
    The scope of the binding for v6 is just methodI.

    Some programming languages - including Pascal and its descendants (e.g. Ada) allow

    procedures to be nested inside procedures. (C and its descendants do not allow this)

    Declaration order

    - A field or method declared in a Java class can be used anywhere in the class, even before its

    declaration.

    - A local variable declared in a method cannot be used before the point of its declaration

    Example: class Demo

    {

    public void method()

    {

    // Point 1

    int y;

    }

    private int x;

    }


    The instance variable x can be used at Point 1, but not y

    Example: Java

    class Demo

    {

    public void method1()

    {

    ...

    method2();

    ...

    }

    public void method2()

    {

    ...

    method1();

    ...

    }

    }

    Example: C/C++:

    void method2(); // Incomplete declaration

    void method1()

    {

    ...

    method2();

    ...

    }

    void method2() // Definition completes the above declaration

    {

    ...

    method1();

    ...

    }


    Dynamic Scoping

    In a language with dynamic scoping, the bindings between names and objects

    depend on the flow of control at run time, and in particular on the order in which subroutines are

    called.

    Dynamic scope rules are generally quite simple: the current binding for a given name is the one

    encountered most recently during execution, and not yet destroyed by returning from its scope.

    Languages with dynamic scoping include APL, Snobol, Perl etc. Because the flow of control

    cannot in general be predicted in advance, the binding between names and objects in a language

    with dynamic scoping cannot in general be determined by a compiler. As a result, many semantic

    rules in a language with dynamic scoping become a matter of dynamic semantics rather than

    static semantics.

    Ex.

    procedure Big is
        X : integer;
        procedure sub1 is
            X : integer;
        begin
            ...
        end;
        procedure sub2 is
        begin
            ... X ...
        end;
    begin
        ...
    end;

    In dynamic scoping, the X used inside sub2 may refer either to the X in Big or to the X in sub1,
    depending on the calling sequence of the procedures.
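
    Dynamic scoping cannot be written directly in C++, but as a rough sketch (not from the notes,
    and only a simulation) the "most recently established binding" rule can be mimicked with an
    explicit stack of bindings for the name X; main plays the role of Big:

    #include <cstdio>
    #include <vector>

    std::vector<int> X_bindings;   // top of the stack = most recent, still-live binding of X

    void sub2() {
        std::printf("sub2 sees X = %d\n", X_bindings.back());   // dynamic lookup of X
    }

    void sub1() {
        X_bindings.push_back(20);   // sub1 declares its own X
        sub2();                     // called from sub1: sees sub1's X (20)
        X_bindings.pop_back();      // sub1 returns, its binding disappears
    }

    int main() {                    // plays the role of Big
        X_bindings.push_back(10);   // Big's X
        sub2();                     // called from Big: sees Big's X (10)
        sub1();
        X_bindings.pop_back();
    }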

    7. What is Local Scope? In block-structured languages, names declared in a block are local to the block. In Algol 60 and

    C++, local scopes can be as small as the programmer needs. Any statement context can be

    instantiated by a block that contains local variable declarations. In other languages, such as

    Pascal, local declarations can be used only in certain places. Although a few people do not like

    the fine control offered by Algol 60 and C++, it seems best to provide the programmer with as

    much freedom as possible and to keep scopes as small as possible.

    Local scope in C++

    {

    // x not accessible.

    t x;

    // x accessible

    . . . .

    { // x still accessible in inner block

    . . . .

    }// x still accessible

    }// x not accessible


    8. What is Global Scope? The name is visible throughout the program. Global scope is useful for pervasive entities, such as
    library functions and fundamental constants (π = 3.1415926), but is best avoided for application
    variables.

    FORTRAN does not have global variables, although programmers simulate them by overusing
    COMMON declarations. Subroutine names in FORTRAN are global.

    Names declared at the beginning of the outermost block of Algol 60 and Pascal programs have

    global scope. (There may be holes in these global scopes if the program contains local declarations of the same names.)

    9. Explain about Object Lifetime? The word "lifetime" is used in two slightly different ways

    1. To refer to the lifetime of an object

    2. The lifetime of the binding of a name to an object.

    a. An object can exist before the binding of a particular name to it

    Example (Java):

    void something(Object o) {

    // 2

    }

    ....

    Object p = new Object();

    // 1

    ....

    something(p);

    ....

    // 3

    (The object named by p exists at point 1, but the name o is not

    bound to it until point 2)

    b. An object can exist after the binding of a particular name to it has ceased

    Example:

    void something(Object o) {

    // 2

    }

    ....

    Object p = new Object();


    // 1

    ....

    something(p);

    ....

    // 3

    (The object named by p continues to exist at point 3, even though the binding of the name

    o to it ended when method something() completed)

    c. A name can be bound to an object that does not yet exist

    Example (Java)

    Object o;

    // 1

    ....

    o = new Object(); // 2

    (At point 1, the name o is bound to an object that does not come into existence until point

    2)

    d. A name can be bound to an object that has actually ceased to exist

    Example (C++ - not possible in Java)

    Object *o = new Object();

    ...

    delete o; // 1

    ...

    // 2

    At 2, the name o is bound to an object that has ceased to exist.

    (The technical name for this is a dangling reference).

    e. It is also possible for an object to exist without having any name at all bound to it.

    Example (Java or C++)

    Object o = new Object();

    // 1

    ...

    o = new Object();

    // 2

    At point 2, the object that o was bound to at point 1 still exists, but o no longer is bound

    to it. In the absence of any other name bindings to it between points 1 and 2, this object

    now has no name referring to it. (The technical name for this is garbage).


    10. Storage Allocation Mechanisms?

    Static (permanent) Allocation:- The object exists during the entire time the program is

    running.

    a. Global variables

    - The precise way of declaring global variables differs from language to language

    - In some languages (e.g. BASIC) any variable is global

    - In FORTRAN any variable declared in the main program or in

    a COMMON block

    - In Java, any class field explicitly declared static

    - In C/C++, any variable declared outside of a class, or (C++)

    declared static inside a class

    b. Static local variables - only available in some languages

    C/C++ local variables explicitly declared static

    int foo() {

    static int i;

    ...

    A static local variable retains its value from one call of a routine to another (see the sketch after this list)

    c. Many constants (but not constants that are actually read-only variables)

    Advantages: efficiency (direct addressing), history-sensitive subprogram support (static
    variables retain values between calls of subroutines).

    Disadvantage: lack of flexibility (does not support recursion).
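
    A minimal C++ sketch of the static-local idea (not from the notes; the function next_id is made
    up for illustration): count is statically allocated and keeps its value across calls, while tmp is
    stack-allocated and re-created on every call.

    #include <cstdio>

    int next_id() {
        static int count = 0;   // static allocation: one instance for the whole run
        int tmp = 1;            // stack allocation: a fresh instance per call, always 1 here
        count = count + tmp;
        return count;
    }

    int main() {
        std::printf("%d ", next_id());   // 1
        std::printf("%d ", next_id());   // 2
        std::printf("%d\n", next_id());  // 3
    }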

    Stack Based Allocation:- Storage bindings are created for variables when their declaration

    statements are elaborated

    Typically, the local variables and parameters of a method have stack lifetime. This name comes

    from the normal way of implementing routines, regardless of language.

    Since routines obey a LIFO call return discipline, they can be managed by using a stack

    composed of stack frames - one for each currently active routine.

    Example:

    void d() { /* 1 */ }


    void c() { ... d() ... }

    void b() { ... c() ... }

    void a() { ... b() ... }

    int main() { ... a() ... }

    Stack at point 1:

    -----------------
    | Frame for d    |
    | Frame for c    |
    | Frame for b    |
    | Frame for a    |
    | Frame for main |
    -----------------


    Heap-Based Allocation:- A heap is a region of storage in which blocks can be allocated and
    deallocated at arbitrary times.

    The shaded blocks are in use, the clear blocks are free. Cross hatched space at the ends of in use

    blocks represents internal fragmentation. The discontiguous free blocks indicate external

    fragmentation.

    Internal fragmentation occurs when a storage management algorithm allocates a block that is

    larger than required to hold a given object, the extra space is then unused. External fragmentation

    occurs when the blocks that have been assigned to active objects are scattered through the heap

    in such a way that the remaining, unused space is composed of multiple blocks: there may be

    quite a lot of free space, but no one piece of it may be large enough to satisfy some future

    request.

    As the program runs, Heap space grows as objects are created. However, to prevent growth

    without limit, there must be some mechanism for recycling the storage used by objects that are

    no longer alive.

    A language implementation typically uses one of three approaches to "recycling" space used by

    objects that are no longer alive:

    Explicit - the program is responsible for releasing space needed by objects that are no

    longer needed using some construct such as delete (C++)

    Reference counting: each heap object maintains a count of the number of external

    pointers/references to it. When this count drops to zero, the space utilized by the object

    can be reallocated

    Garbage collection: the run-time system automatically reclaims the memory of objects that are
    no longer in use by the program. C and C++ do not do garbage collection implicitly, but Java
    has an implicit garbage collector.

    Advantage: provides for dynamic storage management.

    Disadvantage: less efficient than static allocation, and less reliable.
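
    The first two recycling strategies can be sketched in C++ (an illustration, not from the notes;
    std::shared_ptr stands in for reference counting, and true garbage collection has no direct
    C++ equivalent):

    #include <cstdio>
    #include <memory>

    struct Node { int value; };

    int main() {
        // Explicit deallocation: the program itself must release the space.
        Node *n = new Node{1};
        delete n;                    // forgetting this leaks; doing it twice is an error

        // Reference counting: the object is freed when the owner count drops to zero.
        std::shared_ptr<Node> p(new Node{2});
        std::shared_ptr<Node> q = p;                   // count is now 2
        std::printf("owners: %ld\n", p.use_count());
        q.reset();                                     // count drops to 1
        p.reset();                                     // count drops to 0: Node is freed here
    }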

    11. What are internal and external fragmentations? A heap is a region of storage in which subblocks can be allocated and deallocated at arbitrary

    times. Heaps are required for the dynamically allocated pieces of linked data structures, and for

    objects like fully general character strings, lists, and sets, whose size may change as a result of an

    assignment statement or other update operation.

    [Figure: a heap, with shaded (in-use) blocks, clear (free) blocks, and a pending allocation request]



    The shaded blocks are in use, the clear blocks are free. Cross hatched space at the ends of in use

    blocks represents internal fragmentation. The discontiguous free blocks indicate external

    fragmentation.

    Internal fragmentation occurs when a storage management algorithm allocates a block that is

    larger than required to hold a given object, the extra space is then unused. External fragmentation

    occurs when the blocks that have been assigned to active objects are scattered through the heap

    in such a way that the remaining, unused space is composed of multiple blocks: there may be

    quite a lot of free space, but no one piece of it may be large enough to satisfy some future

    request.

    12. What is garbage collection? Garbage collection (GC) is the process of automatically reclaiming memory that is no longer in
    use by the program. Proper garbage collection is needed for the system to keep working in an orderly way. C and C++ do not do
    garbage collection implicitly, but Java has an implicit garbage collector.

    The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by

    objects that are no longer in use by the program. Garbage collection was invented by John

    McCarthy around 1959 to solve problems in Lisp.

    Garbage collection does not traditionally manage limited resources other than memory that

    typical programs use, such as network sockets, database handles, user interaction windows, and

    file and device descriptors. Methods used to manage such resources, particularly destructors,

    may suffice as well to manage memory, leaving no need for GC. Some GC systems allow such

    other resources to be associated with a region of memory that, when collected, causes the other

    resource to be reclaimed; this is called finalization. Finalization may introduce complications

    limiting its usability, such as intolerable latency between disuse and reclaim of especially limited

    resources, or a lack of control over which thread performs the work of reclaiming.

    13. What is Aliasing? Two or more names that refer to the same object at the same point in the program are said to be

    aliases.

    1. Aliases can be created by assignment of pointers/references

    Example: Java: Robot karel = ...

    Robot foo = karel;

    foo and karel are aliases for the same Robot object

    2. Aliases can be created by passing reference parameters

    Example: C++

    void something(int a [], int & b)

    {

    // 1


    ...

    }

    int x [100];

    int y;

    something(x, y);

    During the call something(x, y), at point 1, x and a are aliases for the same array, and y and b
    are aliases for the same integer.

    14. What is Overloading? A name is said to be overloaded if, in some scope, it has two or more meanings, with the actual

    meaning being determined by how it is used.

    Example: C++

    void something(char x)

    ...

    void something(double x)

    ...

    // 1

    At point 1, something can refer to one or the other of the two methods, depending

    on its parameter.

    15. What is Polymorphism? Polymorphism is the concept that supports the capability of an object of a class to behave

    differently in response to a message or action.

    1. Compile-time (static) polymorphism: the meaning of a name is determined at compile-time

    from the declared types of what it uses

    2. Run-time (dynamic) polymorphism: the meaning of a name is determined when the program is

    running from the actual types of what it uses.

    Example: Java has both types in different contexts:

    a. When a name is overloaded in a single class, static polymorphism is used to determine which

    declaration is meant.

    b. When a name is overridden in a subclass, dynamic polymorphism is used to determine which

    version to use.
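
    A C++ analogue of the two cases (a sketch, not from the notes; the names are made up):
    overload resolution for something() happens at compile time, while the call through the
    base-class pointer is resolved at run time.

    #include <cstdio>

    void something(char x)   { std::printf("char version: %c\n", x); }
    void something(double x) { std::printf("double version: %f\n", x); }

    struct Animal {
        virtual void speak() { std::printf("...\n"); }
        virtual ~Animal() = default;
    };
    struct Dog : Animal {
        void speak() override { std::printf("woof\n"); }   // overrides Animal::speak
    };

    int main() {
        something('a');          // static polymorphism: char version chosen at compile time
        something(3.14);         // static polymorphism: double version chosen at compile time
        Animal *a = new Dog();
        a->speak();              // dynamic polymorphism: Dog::speak chosen at run time
        delete a;
    }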


    16. What is control flow? The order in which operations are executed in a program

    e.g. in a C++-like language,

    a = 1;

    b = a + 1;

    if a > 100 then b = a - 1; else b = a + 1;

    a - b + c

    17. Name eight major categories of control flow mechanisms? a. Sequencing: - Statements are to be executed in a certain specified order- usually the order

    in which they appear in the program text.

    b. Selection: - Depending on some run time condition, a choice is to be made among two or more statements or expressions. The most common selection constructs are if and case

    statements. Selection is also sometimes referred to as alternation.

    c. Iteration: - A given fragment of code is to be executed repeatedly, either a certain number of times, or until a certain run- time condition is true. Iteration constructs include for/do,

    while, and repeat loops.

    d. Procedural abstraction: - A potentially complex collection of control constructs is encapsulated in a way that allows it to be treated as a single unit, usually subject to

    parameterization.

    e. Recursion: - An expression is defined in terms of itself, either directly or indirectly; the computational model requires a stack on which to save information about partially

    evaluated instances of the expression. Recursion is usually defined by means of self-

    referential subroutines.

    f. Concurrency:- Two or more program fragments are to be executed/evaluated at the same time, either in parallel on separate processors, or interleaved on a single processor in a

    way that achieves the same effect.

    g. Exception handling and speculation:- A program fragment is executed optimistically, on the assumption that some expected condition will be true. If that condition turns out to be

    false, execution branches to a handler that executes in place of the remainder of the

    protected fragment or in place of the entire protected fragment. For speculation, the

    language implementation must be able to undo, or roll back any visible effects of the

    protected code.

    h. Nondeterminacy: - The ordering or choice among statements or expressions is deliberately left unspecified, implying that any alternative will lead to correct results.

    Some languages require the choice to be random, or fair, in some formal sense of the

    word.

    18. What distinguishes operators from other sort of functions? An expression generally consists of either a simple object or an operator or function applied to a

    collection of operands or arguments, each of which in turn is an expression. It is conventional to

    use the term operator for built in functions that use special, simple syntax, and to use the term

    operand for an argument of an operator. In most imperative languages, function call consists of a

    function name followed by a parenthesized, comma-separated list of arguments, as in


    my_func (A, B, C)

    Operators are typically simpler, taking only one or two arguments, and dispensing with the

    parentheses and commas:

    a + b

    -c

    In general, a language may specify that function calls employ prefix, infix, or postfix notation.

    These terms indicate, respectively, whether the function name appears before, among, or after its

    several arguments:

    Prefix: op a b

    Infix: a op b

    Postfix: a b op

    19. Explain the difference between prefix, infix, and postfix notation. What is Cambridge polish notation?

    Most imperative languages use infix notation for binary operators and prefix notation for unary

    operators and other functions. Lisp uses prefix notation for all functions.

    Cambridge Polish notation places the function name inside the parentheses:

    ( * ( + 1 3) 2)

    (append a b c my_list)

    20. What is an L- value? An r-value? Consider the following assignments in C:

    d= a;

    a= b+c;

    In the first statement, the right- hand side of the assignment refers to the value of a, which we

    wish to place into d. In the second statement, the left hand side refers to the location of a, where

    we want to put the sum of b and c. Both interpretations-value and location-are possible because a

    variable in C is a named container for a value. We sometimes say that languages like C use a

    value model of variables. Because of their use on the left hand side of assignment statements,

    expressions that denote locations are referred to as L-values. Expressions that denote values are

    referred to as r- values. Under a value model of variables, a given expression can be either an L-

    value or an r-value, depending on the context in which it appears.

    Of course, not all expressions can be L-values, because not all values have a location, and not all

    names are variables. In most languages it makes no sense to say 2+3 =a, or even a= 2+3, if a is

    the name of a constant. By the same token, not all L-values are simple names; both L-values and

    r-values can be complicated expressions. In C one may write

    (f(a)+3)->b[c] = 2;

    In this expression f (a) returns a pointer to some element of an array of pointers to structures. The

    assignment places the value 2 into the c-th element of field b of the structure pointed at by the

    third array element after the one to which f's return value points. In C++ it is even possible for a function to return a reference to a structure, rather than a pointer

    to it , allowing one to write

    g(a).b[c] = 2;
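
    A small C++ sketch of that idea (not from the notes; the type S and the function g are invented
    for illustration): because g returns a reference, the call g(a) denotes a location and can be
    used as an L-value, and the very same expression can also be used as an r-value.

    #include <cstdio>

    struct S { int b[10]; };

    S &g(int) {            // returns a reference, so g(a) is an L-value
        static S s;
        return s;
    }

    int main() {
        int a = 0, c = 3;
        g(a).b[c] = 2;                       // L-value use: left-hand side of an assignment
        std::printf("%d\n", g(a).b[c]);      // r-value use: the stored value, 2
    }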


    21. Define orthogonality in the context of programming language design? One of the principal design goals of Algol 68 was to make the various features of the languages

    as orthogonal as possible. Orthogonality means that features can be used in any combination, the

    combinations all make sense, and the meaning of a given feature is consistent, regardless of the

    other features with which it is combined.

    Algol 68 was one of the first languages to make orthogonality a principal design goal, and in fact

    few languages since have given the goal such weight. Among other things, Algol 68 is said to be

    expression oriented: it has no separate notion of statement. Arbitrary expressions can appear in

    contexts that would call for a statement in a language like Pascal, and constructs that are

    considered to be statements in other languages can appear within expressions. The following, for

    example is valid in Algol 68:

    begin

    a:=if b < c then d else e;

    a:= begin f(b); g(c) end;

    g(d);

    2+3

    end

    22. Expression Evaluation

    An expression consists of

    o A simple object, e.g. number or variable

    o An operator applied to a collection of operands or arguments which are

    expressions

    Common syntactic forms for operators:

    o Function call notation, e.g. somefunc(A, B, C), where A, B, and C are expressions

    o Infix notation for binary operators, e.g. A + B

    o Prefix notation for unary operators, e.g. -A

    o Postfix notation for unary operators, e.g. i++

    o Cambridge Polish notation, e.g. (* (+ 1 3) 2) in Lisp =(1+3)*2=8

    Expression Evaluation Ordering: Precedence and Associativity:-

    Precedence rules specify that certain operators, in the absence of parentheses, group more

    tightly than other operators. In most languages multiplication and division group more tightly

    than addition and subtraction, so 2+3*4 is 14 and not 20.

    In Java all binary operators except assignments are left associative, while assignment is right
    associative:

    3 - 3 + 5 groups as (3 - 3) + 5

    x = y = f() groups as x = (y = f())

    (assignments evaluate to the value being assigned)

    In C++ arithmetic operators (+, -, *, ...) have higher precedence than relational operators (<, >,
    <=, >=, ==, !=).


    The use of infix, prefix, and postfix notation leads to ambiguity as to what is an operand

    of what, e.g. a+b*c**d**e/f in Fortran

    The choice among alternative evaluation orders depends on

    o Operator precedence: higher operator precedence means that a (collection of)

    operator(s) group more tightly in an expression than operators of lower

    precedence

    o Operator associativity: determines evaluation order of operators of the same

    precedence

    left associative: operators are evaluated left-to-right (most common)

    right associative: operators are evaluated right-to-left (Fortran power

    operator **, C assignment operator = and unary minus)

    non-associative: requires parenthesis when composed (Ada power

    operator **)
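
    A small C++ illustration of precedence and associativity (not from the notes; f is a made-up
    function):

    #include <cstdio>

    int f() { return 7; }

    int main() {
        std::printf("%d\n", 2 + 3 * 4);   // 14: * groups more tightly than +
        std::printf("%d\n", 3 - 3 + 5);   // 5: left associative, (3 - 3) + 5
        int x, y;
        x = y = f();                      // right associative: x = (y = f()), both become 7
        std::printf("%d %d\n", x, y);
    }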

    Evaluation Order in Expressions:

    Precedence and associativity define rules for structuring expressions

    But do not define operand evaluation order

    o Expression a-f(b)-c*d is structured as (a-f(b))-(c*d) by compiler, but either (a-

    f(b)) or (c*d) can be evaluated first at run-time

    Knowing the operand evaluation order is important

    o Side effects: e.g. if f(b) above modifies d (i.e. f(b) has a side effect) the expression

    value will depend on the operand evaluation order

    o Code improvement: compilers rearrange expressions to maximize efficiency

    Improve memory loads:

    a := B[i];        (load a from memory)
    c := 2*a + 3*d;   (compute 3*d first, while waiting for a to arrive in the processor)

    Common subexpression elimination:

    a:=b+c;

    d:=c+e+b; rearranged as d:=b+c+e, it can be rewritten into d:=a+e

    Expression Reordering Problems

    Rearranging expressions may lead to arithmetic overflow or different floating point

    results

    o Assume b, d, and c are very large positive integers, then if b-c+d is rearranged

    into (b+d)-c arithmetic overflow occurs

    o Floating point value of b-c+d may differ from b+d-c

    o Most programming languages will not rearrange expressions when parentheses are

    used, e.g. write (b-c)+d to avoid problems

    Java: expression evaluation is always left to right, but integer overflow is not detected (values
    silently wrap around)

    Pascal: expression evaluation is unspecified and overflows are always detected

    C and C++: expression evaluation is unspecified and overflow detection is

    implementation dependent


    Short-Circuit Evaluation

    Short-circuit evaluation of Boolean expressions means that computations are skipped

    when logical result of a Boolean operator can be determined from the evaluation of one

    operand

    C, C++, and Java use conditional and/or operators: && and ||

    o If a in a&&b evaluates to false, b is not evaluated

    o If a in a||b evaluates to true, b is not evaluated

    o Avoids the Pascal problem

    o Useful to increase program efficiency, e.g.

    if (unlikely_condition && expensive_condition()) ...

    Pascal does not use short-circuit evaluation

    o The program fragment below has the problem that element a[11] can be accessed

    resulting in a dynamic semantic error:

    o var a : array [1..10] of integer;
    ...
    i := 1;
    while (i <= 10) and (a[i] <> 0) do
        i := i + 1;
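
    For contrast, a C-style version of the same search (a sketch, not from the notes): because &&
    short-circuits, a[i] is never evaluated once i has reached the bound, so the out-of-bounds
    access that bites Pascal cannot happen.

    #include <cstdio>

    int main() {
        int a[10] = {3, 1, 4, 1, 5};   // remaining elements are 0
        int i = 0;
        while (i < 10 && a[i] != 0)    // a[i] evaluated only when i < 10 is true
            i = i + 1;
        std::printf("first zero at index %d\n", i);
    }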


    Assignment operators (e.g. a += b in C/C++) combine the update with the assignment:

    o Compiler produces better code, because the address of the variable is only
    calculated once

    Multiway assignments in Clu, ML, and Perl

    o a,b := c,d assigns c to a and d to b simultaneously, e.g. a,b := b,a swaps a with b

    a,b := 1 assigns 1 to both a and b

    23. What is short circuit Boolean evaluation? Why is it useful?

    Short-circuit evaluation of Boolean expressions means that computations are skipped when

    logical result of a Boolean operator can be determined from the evaluation of one operand

    C, C++, and Java use conditional and/or operators: && and ||

    a. If a in a&&b evaluates to false, b is not evaluated b. If a in a||b evaluates to true, b is not evaluated c. Avoids the Pascal problem d. Useful to increase program efficiency, e.g.

    if (unlikely_condition && expensive_condition()) ...

    Pascal does not use short-circuit evaluation

    e. The program fragment below has the problem that element a[11] can be accessed resulting in a dynamic semantic error:

    f. var a : array [1..10] of integer;
    ...
    i := 1;
    while (i <= 10) and (a[i] <> 0) do
        i := i + 1;


    24. Explain the structured control-flow constructs (sequencing, selection, iteration, recursion)?

    Iteration: for and while loop statements
    Subroutine calls and recursion

    Sequencing: One statement appearing after another

    - A list of statements in a program text is executed in top-down order - A compound statement is a delimited list of statements

    o A compound statement is a block when it includes variable declarations

    o C, C++, and Java use { and } to delimit a block

    o Pascal and Modula use begin ... end

    o Ada uses declare ... begin ... end

    - C, C++, and Java: expressions can be used where statements can appear

    In pure functional languages, sequencing is impossible (and not desired!)

    Selection: Selects which statements to execute next

    Forms of if-then-else selection statements:

    o C and C++ EBNF syntax:

    if (<condition>) <statement> [else <statement>]

    Condition is integer-valued expression. When it evaluates to 0, the else-clause

    statement is executed otherwise the then-clause statement is executed. If more

    than one statement is used in a clause, grouping with { and } is required

    o Java syntax is like C/C++, but condition is Boolean type

    o Ada syntax allows use of multiple elsif's to define nested conditions:

    if <condition> then
    <statements>
    elsif <condition> then
    <statements>
    ...
    else
    <statements>
    end if


    Case/switch statements are different from if-then-else statements in that an expression

    can be tested against multiple constants to select statement(s) in one of the arms of the

    case statement:

    o C, C++, and Java syntax:

    switch (<expression>)
    { case <constant>: <statements> break;
    case <constant>: <statements> break;
    ...
    default: <statements>
    }

    o break is necessary to transfer control at the end of an arm to the end of the switch

    statement

    The use of a switch statement is much more efficient compared to nested if-then-else statements
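
    A minimal C++ switch (a sketch, not from the notes); each break transfers control past the end
    of the switch, and a switch over small constants is typically compiled into a jump table, which
    is where the efficiency comes from.

    #include <cstdio>

    int main() {
        int grade = 2;
        switch (grade) {                          // expression tested against several constants
            case 1:  std::printf("poor\n");  break;
            case 2:  std::printf("good\n");  break;
            case 3:  std::printf("great\n"); break;
            default: std::printf("unknown\n");
        }
    }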

    Iteration

    Iteration means the act of repeating a process usually with the aim of approaching a desired goal

    or target or result. Each repetition of the process is also called an "iteration," and the results of

    one iteration are used as the starting point for the next iteration.

    A conditional that keeps executing as long as the condition is true

    e.g: while, for, loop, repeat-until, ...

    Iteration and recursion are the two mechanisms that allow a computer to perform similar

    operations repeatedly. Without at least one of these mechanisms, the running time of a program

    would be a linear function of the size of the program text. In a very real sense, it is iteration and

    recursion that make computers useful.

    Enumeration-Controlled Loops

    Enumeration controlled iteration originated with the do loop of Fortran I. Similar mechanisms have been

    adopted in some form by almost every subsequent language, but syntax and semantics vary widely.

    Fortran-IV:

    DO 20 i = 1, 10, 2

    ...

    20 CONTINUE

    which is defined to be equivalent to

    i = 1

    20 ...

    i = i + 2

    IF (i.LE.10) GOTO 20

    Algol-60 combines logical conditions:


    o for <id> := <enumerator list> do <statement>

    where the EBNF syntax of <enumerator list> is

    <enumerator list> -> <enumerator> [, <enumerator>]*

    <enumerator> -> <expr>
    | <expr> step <expr> until <expr>
    | <expr> while <condition>

    Difficult to understand and too many forms that behave the same:

    for i := 1, 3, 5, 7, 9 do ...

    for i := 1 step 2 until 10 do ...

    for i := 1, i+2 while i < 10 do ...

    Pascal has simple design:

    o for <id> := <expr> to <expr> do <statement>

    for <id> := <expr> downto <expr> do <statement>

    o Can iterate over any discrete type, e.g. integers, chars, elements of a set

    o Index variable cannot be assigned and its terminal value is undefined

    Ada for loop is much like Pascal's:

    o for <id> in <low> .. <high> loop
    <statements>
    end loop

    for <id> in reverse <low> .. <high> loop
    <statements>
    end loop

    o Index variable has a local scope in loop body, cannot be assigned, and is not

    accessible outside of the loop

    C, C++, and Java do not have enumeration-controlled loops although the logically-

    controlled for statement can be used to create an enumeration-controlled loop:

    o for (i = 1; i <= n; i++) { ... }


    Problems with Enumeration-Controlled Loops:

    C/C++:

    o This C program never terminates:

    #include <stdio.h>

    main()

    { int i;

    for (i = 0; i


    Logically-Controlled Post test Loops:

    Logically-controlled post test loops test an exit condition after each loop iteration

    Not available in Fortran-77 (!)

    Pascal:

    o repeat <statement> [; <statement>]* until <condition>

    where the condition is a Boolean expression and the loop will terminate when the

    condition is true

    C, C++:

    o do <statement> while (<expression>)

    where the loop will terminate when the expression evaluates to 0 and multiple

    statements need to be enclosed in { and }

    Java is like C++, but condition is a Boolean expression
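
    A minimal C++ post-test loop (a sketch, not from the notes): the body runs at least once, and
    the exit test comes after it.

    #include <cstdio>

    int main() {
        int i = 0;
        do {
            std::printf("%d ", i);
            i++;
        } while (i < 3);        // terminates when the expression evaluates to 0 (false)
        std::printf("\n");
    }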

    Logically-Controlled Mid test Loops:

    Logically-controlled mid test loops test exit conditions within the loop

    Ada:

    o loop
    <statements>
    exit when <condition>;
    <statements>
    exit when <condition>;
    ...
    end loop

    o Also allows exit of outer loops using labels:

    outer: loop

    ... for i in 1..n loop

    ... exit outer when cond;

    ...

    end loop;

    end loop outer;

    C, C++:

    o Use break statement to exit loops

    o Use continue to jump to beginning of loop to start next iteration

    Java is like C++, but combines Ada's loop label idea to allow jumps to outer loops
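
    A minimal C++ illustration of break and continue (a sketch, not from the notes):

    #include <cstdio>

    int main() {
        for (int i = 0; i < 5; i++) {
            if (i == 2) continue;    // skip the rest of this iteration, start the next one
            if (i == 4) break;       // leave the loop entirely
            std::printf("%d ", i);   // prints 0 1 3
        }
        std::printf("\n");
    }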


    Recursion: When a function may directly or indirectly call itself

    Can be used instead of loops; functional languages frequently have no loops but only recursion

    Iteration and recursion are equally powerful: iteration can be expressed by recursion and

    vice versa

    Recursion can be less efficient, but most compilers for functional languages will optimize

    recursion and are often able to replace it with iterations

    Recursion can be more elegant to use to solve a problem that is recursively defined

    Tail Recursive Functions

    Tail recursive functions are functions in which no computations follow a recursive call in

    the function

    A recursive call could in principle reuse the subroutine's frame on the run-time stack and

    avoid deallocation of old frame and allocation of new frame

    This observation is the key idea to tail-recursion optimization in which a compiler

    replaces recursive calls by jumps to the beginning of the function
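
    The recursive gcd being referred to does not appear in this copy of the notes; it would look
    roughly like this (an assumed reconstruction), with each recursive call being the last thing the
    function does:

    int gcd(int a, int b)
    { if (a==b) return a;
      else if (a>b) return gcd(a-b, b);   // tail call: no work remains after it returns
      else return gcd(a, b-a);            // tail call
    }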

    For the gcd example, a good compiler will optimize the function into:

    int gcd(int a, int b)

    { start:

    if (a==b) return a;

    else if (a>b) { a = a-b; goto start; }

    else { b = b-a; goto start; }

    }

    which is just as efficient as the iterative implementation of gcd:

    int gcd(int a, int b)

    { while (a!=b)

    if (a>b) a = a-b;

    else b = b-a;

    return a;

    }

    Continuation-Passing-Style:

    Even functions that are not tail-recursive can be optimized by compilers for functional

    languages by using continuation-passing style:

    o With each recursive call an argument is included in the call that is a reference

    (continuation function) to the remaining work


    The remaining work will be done by the recursively called function, not after the call, so the

    function appears to be tail-recursive

    Other Recursive Function Optimizations

    Another function optimization that can be applied by hand is to remove the work after the

    recursive call and include it in some other form as an argument to the recursive call

    For example:

    typedef int (*int_func)(int);

    int summation(int_func f, int low, int high)

    { if (low==high) return f(low);

    else return f(low)+summation(f, low+1, high);

    }

    can be rewritten into the tail-recursive form:

    int summation(int_func f, int low, int high, int subtotal)
    { if (low==high) return subtotal+f(low);

    else return summation(f, low+1, high, subtotal+f(low));

    }

    This example in Scheme:

    (define summation (lambda (f low high)

    (if (= low high) ;condition

    (f low) ;then part

    (+ (f low) (summation f (+ low 1) high))))) ;else-part

    rewritten:

    (define summation (lambda (f low high subtotal)

    (if (= low high)

    (+ subtotal (f low))

    (summation f (+ low 1) high (+ subtotal (f low))))))

    Nondeterminacy: Our final category of control flow is nondeterminacy. A nondeterministic construct is one in

    which the choice between alternatives is deliberately unspecified. Some languages, notably

    Algol 68 and various concurrent languages, provide more extensive nondeterministic

    mechanisms, which cover statements as well.

    25. Data Types

    Most programming languages require the programmer to declare the data type of every data

    object, and most database systems require the user to specify the type of each data field. The

    available data types vary from one programming language to another, and from one database

    application to another, but the following usually exist in one form or another:


    integer : In more common parlance, whole number; a number that has no fractional

    part.

    floating-point : A number with a decimal point. For example, 3 is an integer, but 3.5

    is a floating-point number.

    character (text ): Readable text

    26. What purpose do types serve in a programming language? a. Types provide implicit context for many operations, so that the programmer does

    not have to specify that context explicitly. In C, for instance, the expression a+b

    will use integer addition if a and b are of integer type, and floating point

    addition if a and b are of double type.

    b. Types limit the set of operations that may be performed in a semantically valid

    program. They prevent the programmer from adding a character and a record, for

    example, or from taking the arctangent of a set, or passing a file as a parameter to

    a subroutine that expects an integer.

    27. Discuss about Type Systems

    A type system consists of (1) a mechanism to define types and associate them with certain
    language constructs, and (2) a set of rules for type equivalence, type compatibility, and type
    inference. The constructs that must have types are precisely those that have values or that can
    refer to objects that have values. These constructs include named constants, variables, record
    fields, parameters, and sometimes subroutines, as well as literal constants. Type equivalence rules
    determine when the types of two values are the same. Type compatibility rules determine when a
    value of a given type can be used in a given context. Type inference rules define the type of an
    expression based on the types of its constituent parts or the surrounding context.

    Type checking:-

    Type checking is the process of ensuring that a program obeys the language's type compatibility

    rules. A violation of the rules is known as a type clash. A language is said to be strongly typed if

    it prohibits, in a way that the language implementation can enforce, the application of any

    operation to any object that is not intended to support that operation. A language is said to be

    statically typed if it is strongly typed and type checking can be performed at compile time.

    Ex: Ada is strongly typed and for the most part statically typed. A Pascal implementation can

    also do most of its type checking at compile time, though the language is not quite strongly

    typed: untagged variant records are its only loophole.

    Dynamic type checking is a form of late binding, and tends to be found in languages that delay

    other issues until run time as well. Lisp and Smalltalk are dynamically typed. Most scripting

    languages are also dynamically typed; some (Python, Ruby) are strongly typed.


    Classification of Types:-

    The terminology for types varies some from one language to another. Most languages provide

    built in types similar to those supported in hardware by most processors: integers, characters,

    Boolean, and real (floating point) numbers.

    Booleans are typically implemented as single byte quantities with 1 representing true and 0

    representing false.

    Characters have traditionally been implemented as one byte quantities as well, typically using the

    ASCII encoding. More recent languages use a two byte representation designed to accommodate

    the Unicode character set.

    Numeric Types:-

    A few languages (C, Fortran) distinguish between different lengths of integers and real numbers;

    most do not, and leave the choice of precision to the implementation. Unfortunately, differences

    in precision across language implementations lead to a lack of portability: programs that run

    correctly on one system may produce run-time errors or erroneous results on another.

    A few languages, including C, C++, C# and Modula-2, provide both signed and unsigned integers.

    A few languages (Fortran, C99, Common Lisp) provide a built-in complex type, usually

    implemented as a pair of floating point numbers that represent the real and imaginary Cartesian

    coordinates; other languages support these as a standard library class. Most scripting languages

    support integers of arbitrary precision. Ada supports fixed point types, which are represented

    internally by integers. Integers, Booleans, characters are examples of discrete types.

    Enumeration Types:-

    Enumerations were introduced by Wirth in the design of Pascal. They facilitate the creation of

    readable programs, and allow the compiler to catch certain kinds of programming errors. An

    enumeration type consists of a set of named elements. In Pascal one can write:

    Type weekday = (sun, mon, tue, wed, thu, fri, sat);

    The values of an enumeration type are ordered, so comparisons are generally valid (mon < tue).


    In Ada one would write

    Type test_score is new integer range 0..100;

    Subtype workday is weekday range mon..fri;

    The range portion of the definition in Ada is called a type constraint. test_score is a derived
    type, incompatible with integers. workday, by contrast, is a subtype, so its values can be more or
    less freely intermixed with values of type weekday.

    Composite Types:-

    Nonscalar types are usually called composite, or constructed types. They are generally created by

    applying a type constructor to one or more simpler types. Common composite types include

    records, variant records, arrays, sets, pointers, lists, and files.

    Records - Introduced by Cobol, and supported by most languages since the 1960s. A record consists of a collection of fields, each of which belongs to a

    simpler type.

    Variant records - Differ from normal records in that only one of a variant record's fields is valid at any given time.

    Arrays - The most commonly used composite types. An array can be thought of as a function that maps members of an index type to members of a component

    type.

    Sets-Introduced by Pascal. A set type is the mathematical powerset of its base type, which must often be discrete.

    Pointers - A pointer value is a reference to an object of the pointer's base type. Pointers are often but not always implemented as addresses. They are most often

    used to implement recursive data types

    Lists-Contain a sequence of elements, but there is no notion of mapping or indexing. Rather, a list is defined recursively as either an empty list or a pair

    consisting of a head element and a reference to a sublist. To find a given element

    of a list ,a program must examine all previous elements, recursively or iteratively,

    starting at the head. Because of their recursive definition, lists are fundamental to

    programming in most functional languages.

    Files-Are intended to represent data on mass storage devices, outside the memory in which other program objects reside.

    28. Discuss about Type checking

    In most statically typed languages, every definition of an object must specify the object's type.

    Type equivalence:-

    In a language in which the user can define new types, there are two principal ways of defining

    type equivalence. Structural equivalence is based on the content of type definitions. Name

    equivalence is based on the lexical occurrence of type definitions. Structural equivalence is used

in Algol 68, Modula-3, and C. Name equivalence is the more popular approach in recent languages. It is used in Java, C#, standard Pascal, and Ada.


    The exact definition of structural equivalence varies from one language to another.

    Structural equivalence in Pascal:

    Type R2=record

    a,b : integer

    end;

    should probably be considered the same as

    type R3 = record

    a : integer;

    b : integer

    end;

    But what about

    Type R4 = record

    b : integer;

    a : integer

    end;

    should the reversal of the order of the fields change the type? Most languages say yes.

In a similar vein, consider the following arrays, again in a Pascal-like notation:

type str = array [1..10] of char;
type str = array [0..9] of char;

    Here the length of the array is the same in both cases, but the index values are different. Should

    these be considered equivalent? Most languages say no, but some (Fortran, Ada) consider them

    compatible.

    Type compatibility:-

    Most languages do not require equivalence of types in every context. Instead, they merely say

that a value's type must be compatible with that of the context in which it appears. In an assignment statement, the type of the right-hand side must be compatible with that of the left-hand side. The types of the operands of + must both be compatible with some common type that supports addition. In a subroutine call, the types of any arguments passed into the subroutine

    must be compatible with the types of the corresponding formal parameters, and the types of any

    formal parameters passed back to the caller must be compatible with the types of the

    corresponding arguments.

    Coercion:-

    Whenever a language allows a value of one type to be used in a context that expects another, the

    language implementation must perform an automatic, implicit conversion to the expected type.

    This conversion is called a type coercion.
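A brief C illustration of coercion (the variable names are illustrative): when an int appears in a context that expects a double, the compiler inserts the conversion automatically.

#include <stdio.h>

int main(void) {
    int i = 3;
    double d = i;           /* coercion: the int value is implicitly converted to double */
    double sum = i + 2.5;   /* i is coerced to double before the addition */
    printf("%f %f\n", d, sum);
    return 0;
}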


    29. Discuss about Records(Structures) and Variants(Unions)

    Record types allow related data of heterogeneous types to be stored and manipulated together.

Some languages (Algol 68, C, C++, Common Lisp) use the term structure instead of record. Fortran 90 simply calls its records "types." Structures in C++ are defined as a special form of class.

    Syntax and Operations:-

In C a simple record might be defined as follows:

    struct element {

    char name[2];

    int atomic_number;

    double atomic_weight;

    _Bool metallic;

    };

    In Pascal the corresponding declarations would be

    Type two_chars = packed array [1..2] of char;

    Type element = record

    name : two_chars;

    atomic_number : integer;

    atomic_weight : real;

    metallic : Boolean

    end;

    Memory layout and its impact:

    The fields of a record are usually stored in adjacent locations in memory. In its symbol table, the

    compiler keeps track of the offset of each field within each record type. When it needs to access

    a field, the compiler typically generates a load or store instruction with displacement addressing.

    For a local object, the base register is the frame pointer; the displacement is the sum of the

record's offset from the register and the field's offset within the record.
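As an illustration, the standard C offsetof macro exposes the field displacements that the compiler records in its symbol table. This sketch reuses the element record defined above; the exact offsets, padding, and total size are implementation-dependent.

#include <stdio.h>
#include <stddef.h>          /* offsetof */

struct element {
    char   name[2];
    int    atomic_number;
    double atomic_weight;
    _Bool  metallic;
};

int main(void) {
    /* each field's displacement from the start of the record */
    printf("name          at offset %zu\n", offsetof(struct element, name));
    printf("atomic_number at offset %zu\n", offsetof(struct element, atomic_number));
    printf("atomic_weight at offset %zu\n", offsetof(struct element, atomic_weight));
    printf("metallic      at offset %zu\n", offsetof(struct element, metallic));
    printf("total size: %zu bytes (may include holes for alignment)\n",
           sizeof(struct element));
    return 0;
}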

    Variant Records (Unions):

Programming languages of the 1960s and 1970s were designed in an era of severe memory constraints. Many allowed the programmer to specify that certain variables should be allocated on top of one another, sharing the same bytes in memory. C's union syntax was heavily influenced by Algol 68.

union {

    int i;

    double d;

    _Bool b;

    };


    In practice, unions have been used for two main purposes. The first arises in systems programs,

    where unions allow the same set of bytes to be interpreted in different ways at different times.

    The canonical example occurs in memory management, where storage may sometimes be treated

    as unallocated space, sometimes as bookkeeping information, and sometimes as user allocated

    data of arbitrary type.

    The second common purpose for unions is to represent alternative sets of fields within a record.

    A record representing an employee, for example, might have several common fields (name,

address, phone, department) and various other fields depending on whether the employee is paid on a salaried, hourly, or consulting basis.

30. Briefly describe two purposes for unions/variant records

Programming languages of the 1960s and 1970s were designed in an era of severe memory constraints. Many allowed the programmer to specify that certain variables should be allocated on top of one another, sharing the same bytes in memory. C's union syntax was heavily influenced by Algol 68.

    union {

    int i;

    double d;

    _Bool b;

    };

    In practice, unions have been used for two main purposes. The first arises in systems programs,

    where unions allow the same set of bytes to be interpreted in different ways at different times.

    The canonical example occurs in memory management, where storage may sometimes be treated

    as unallocated space, sometimes as bookkeeping information, and sometimes as user allocated

    data of arbitrary type.

    The second common purpose for unions is to represent alternative sets of fields within a record.

    A record representing an employee, for example, might have several common fields (name,

address, phone, department) and various other fields depending on whether the employee is paid on a salaried, hourly, or consulting basis.

31. What is an Array?

Arrays are the most common and important composite data types. They have been a fundamental part of almost every high-level language, beginning with Fortran I. Unlike records, which group

    related fields of disparate types, arrays are usually homogeneous. Semantically, they can be

    thought of as a mapping from an index type to a component or element type.

    Syntax and operations:

    Most languages refer to an element of an array by appending a subscript delimited by

parentheses or square brackets to the name of the array. In Fortran and Ada, one says A(3); in Pascal and C, one says A[3]. Since parentheses are generally used to delimit the arguments to a

    subroutine call, square bracket subscript notation has the advantage of distinguishing between

    the two. The difference in notation makes a program easier to compile and arguably easier to

    read.

    Declarations:

    In some languages one declares an array by appending subscript notation to the syntax that

    would be used to declare a scalar. In C:


char upper[26];

    In Fortran:

character, dimension (1:26) :: upper
character (26) upper    ! shorthand notation

    In C the lower bound of an index range is always zero; the indices of an n-element array are

0 .. n-1. In Fortran the lower bound of the index range is one by default.

32. What are Array Slices?

A slice or section is a rectangular portion of an array. Fortran 90 and Single Assignment C

    provide extensive facilities for slicing, as do many scripting languages including Perl, Python,

    Ruby and R. A slice is simply a contiguous range of elements in a one- dimensional array.

Fortran 90 has a very rich set of array operations: built-in operations that take entire arrays as arguments. Because Fortran uses structural type equivalence, the operands of an array operator need only have the same element type and shape. In particular, slices of the same shape can be

    intermixed in array operations, even if the arrays from which they were sliced have very different

    shapes. Any of the built in arithmetic operators will take arrays as operands; the result is an

array, of the same shape as the operands, whose elements are the result of applying the operator

    to corresponding elements.

    Array slices (sections) in Fortran 90.

    [ a : b : c in a subscript indicates positions a, a + c, . . . through b. If a or b is omitted, the

    corresponding bound is assumed. If c is omitted, 1 is assumed. If c is negative, then we select

    positions in reverse order. The slashes in the second subscript of the lower-right example delimit

    an explicit list of positions.]
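C has no built-in slice notation, but the a : b : c triplet semantics described above can be sketched by hand. The function and variable names below are illustrative, and the example uses C's zero-based indexing rather than Fortran's default of one.

#include <stdio.h>

/* print A[a], A[a+c], ... until the index passes b (c may be negative) */
static void print_slice(const double *A, int a, int b, int c) {
    if (c > 0)
        for (int i = a; i <= b; i += c) printf("%g ", A[i]);
    else if (c < 0)
        for (int i = a; i >= b; i += c) printf("%g ", A[i]);
    printf("\n");
}

int main(void) {
    double A[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    print_slice(A, 1, 8, 2);    /* positions 1, 3, 5, 7                  */
    print_slice(A, 9, 0, -3);   /* positions 9, 6, 3, 0 (reverse order)  */
    return 0;
}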


33. How are arrays allocated?

Storage management is more complex for arrays whose shape is not known until elaboration

    time, or whose shape may change during execution. For these the compiler must arrange not only

    to allocate space, but also to make shape information available at run time. Some dynamically

    typed languages allow run time binding of both the number and bounds of dimensions. Compiled

    languages may allow the bounds to be dynamic, but typically require the number of dimensions

    to be static. A local array whose shape is known at elaboration time may still be allocated in the

    stack. An array whose size may change during execution must generally be allocated in the heap.

    Global lifetime, static shape: allocate space for the array in static global memory

Local lifetime, static shape: space can be allocated in the subroutine's stack frame at run time

    Local lifetime, shape bound at elaboration time: an extra level of indirection is required to place the space for the array in the stack frame of its subroutine (Ada, C)

    Arbitrary lifetime, shape bound at elaboration time: at elaboration time either space is allocated or a preexistent reference from another array is assigned (Java, C#)

    Arbitrary lifetime, dynamic shape: must generally be allocated from the heap. A pointer to the array still resides in the fixed-size portion of the stack frame (if local lifetime).

Allocation in Ada and C99 of local arrays whose shape is bound at elaboration time:

-- Ada:
procedure foo (size : integer) is
    M : array (1..size, 1..size) of real;
begin
    ...
end foo;

    //C99:

    void foo (int size)

    {

    double M[size][size];

    }


    [The compiler arranges for a pointer to M to reside at a static offset from the frame pointer. M

    cannot be placed among the other local variables because it would prevent those higher in the

    frame from having static offsets.]

34. Discuss about Memory Layout of Arrays

Arrays in most language implementations are stored in contiguous locations in memory. In a one

    dimensional array the second element of the array is stored immediately after the first; the third

    is stored immediately after the second, and so forth. For arrays of records, it is common for each

    subsequent element to be aligned at an address appropriate for any type; small holes between

    consecutive records may result.

    For multidimensional arrays, there are two layouts: row-major order and column-major order

    In row-major order, consecutive locations in memory hold elements that differ by one in the final subscript (except at the ends of rows).

    In column-major order, consecutive locations hold elements that differ by one in the initial subscript


    [ In row major order, the elements of a row are contiguous in memory; in column-major order,

    the elements of a column are contiguous. The second cache line of each array is shaded, on the

    assumption that each element is an eight-byte floating point number, that cache lines are 32 bytes

    long and that the array begins at a cache line boundary. If the array is indexed from A[0,0] to

    A[9,9],then in the row major case elements A[0,4] through A[0,7] share a cache line; in the

    column-major case elements A[4,0] through A[7,0] share a cache line.]
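The choice of layout determines how a subscript pair maps to an address. The following C sketch shows the two address calculations for a conceptual n1-row by n2-column array, using the 10 x 10 dimensions of the example above; the function names are illustrative.

#include <stdio.h>

/* offset, in elements, of A[i][j] in an n1-row by n2-column array */
static long row_major(long i, long j, long n1, long n2)    { (void)n1; return i * n2 + j; }
static long column_major(long i, long j, long n1, long n2) { (void)n2; return j * n1 + i; }

int main(void) {
    long n1 = 10, n2 = 10;
    /* consecutive elements of a row are adjacent in row-major order ...            */
    printf("row-major    A[0][4] -> %ld, A[0][5] -> %ld\n",
           row_major(0, 4, n1, n2), row_major(0, 5, n1, n2));
    /* ... whereas in column-major order consecutive elements of a column are adjacent */
    printf("column-major A[4][0] -> %ld, A[5][0] -> %ld\n",
           column_major(4, 0, n1, n2), column_major(5, 0, n1, n2));
    return 0;
}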

    Row-Pointer Layout:

    Allow the rows of an array to lie anywhere in memory, and create an auxiliary array of pointers to the rows.

    Technically speaking, only the contiguous layout is a true multidimensional array

    This row-pointer memory layout requires more space in most cases but has three potential advantages.

    It sometimes allows individual elements of the array to be accessed more quickly, especially on CISC machines with slow multiplication instructions

    It allows the rows to have different lengths, without devoting space to holes at the ends of the rows; the lack of holes may sometimes offset the increased space for

    pointers

    It allows a program to construct an array from preexisting rows (possibly scattered throughout memory) without copying

    C, C++, and C# provide both contiguous and row-pointer organizations for multidimensional arrays

    Java uses the row-pointer layout for all arrays

    Row-Pointer Layout in C:

char days[][10] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};


days[2][3] == 's';   /* the 's' in "Tuesday" */

    [ It is a true two dimensional array. The slashed boxes are NUL bytes; the shaded areas are

    holes.]

char *days[] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

days[2][3] == 's';   /* the 's' in "Tuesday" */

    [ It is a ragged array of pointers to arrays of characters.]


35. What is a dope vector? What purposes does it serve?

A dope vector will contain the lower bound of each dimension and the size of each dimension

    other than the last. If the language implementation performs dynamic semantic checks for out of

    bounds subscripts in array references, then the dope vector may contain upper bounds as well.

    Given lower bounds and sizes, the upper bound information is redundant, but it is usually

    included anyway, to avoid computing it repeatedly at run time.

    The contents of the dope vector are initialized at elaboration time, or whenever the number or

    bounds of dimensions change. In a language like Fortran 90, whose notion of shape includes

    dimension sizes but not lower bounds, an assignment statement may need to copy not only the

    data of an array, but dope vector contents as well.

    In a language that provides both a value model of variables and arrays of dynamic shape, we

    must consider the possibility that a record will contain a field whose size is not statically known.

    In this case the compiler may use dope vectors not only for dynamic shape arrays, but also for

    dynamic shape records. The dope vector for a record typically indicates the offset of each field

    from the beginning of the record.
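A dope vector can be pictured as a small descriptor stored alongside the data. The following C sketch is purely illustrative: the struct layout and names are assumptions, not any particular compiler's run-time format.

#include <stdio.h>
#include <stdlib.h>

/* descriptor for a dynamically shaped two-dimensional array of doubles */
typedef struct {
    long lower1, size1;     /* lower bound and size of dimension 1 */
    long lower2, size2;     /* lower bound and size of dimension 2 */
    double *data;           /* the elements themselves, in row-major order */
} dope_vector;

/* bounds-checked access using the descriptor's contents */
static double fetch(const dope_vector *dv, long i, long j) {
    if (i < dv->lower1 || i >= dv->lower1 + dv->size1 ||
        j < dv->lower2 || j >= dv->lower2 + dv->size2) {
        fprintf(stderr, "subscript out of bounds\n");
        exit(EXIT_FAILURE);
    }
    return dv->data[(i - dv->lower1) * dv->size2 + (j - dv->lower2)];
}

int main(void) {
    /* initialized at "elaboration time": a 3 x 4 array indexed from (1,1) */
    dope_vector dv = {1, 3, 1, 4, NULL};
    dv.data = calloc((size_t)(dv.size1 * dv.size2), sizeof(double));
    printf("%g\n", fetch(&dv, 2, 3));
    free(dv.data);
    return 0;
}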

36. Discuss about String representations in programming languages

In many languages, a string is simply an array of characters. In other languages, strings

    have special status, with operations that are not available for arrays of other sorts.

    It is easier to provide special features for strings than for arrays in general because strings are one-dimensional

    Manipulation of variable-length strings is fundamental to a huge number of computer applications

    Particularly powerful string facilities are found in various scripting languages such as Perl, Python and Ruby.

    C, Pascal, and Ada require that the length of a string-valued variable be bound no later than elaboration time, allowing contiguous space allocation in the current stack frame

    Lisp, Icon, ML, Java, C# allow the length of a string-valued variable to change over its lifetime, requiring that space be allocated by a block or chain of blocks in the heap

    37. Sets

    A set is an unordered collection of an arbitrary number of distinct values of a common type

    Introduced by Pascal, and are found in many more recent languages as well. Pascal supports sets of any discrete type, and provides union, intersection, and difference

    operations:

    var A,B,C :set of char;

    D,E : set of weekday;

A := B + C;

    A := B * C;


    A := B - C;

There are many ways to implement sets, including arrays, hash tables, and various forms of trees. The most common implementation employs a bit vector whose length (in bits) is the number of distinct values of the base type.

Operations on bit-vector sets can make use of fast logical instructions on most machines. Union is bit-wise or; intersection is bit-wise and; difference is bit-wise not, followed by bit-wise and.
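A minimal C sketch of a bit-vector set over a small discrete base type (the names are illustrative): union, intersection, and difference each become a single logical operation on a machine word.

#include <stdio.h>

/* base type: the weekday enumeration; each set fits in one machine word */
enum weekday { sun, mon, tue, wed, thu, fri, sat };
typedef unsigned int weekday_set;               /* one bit per weekday */

#define SINGLETON(d)   (1u << (d))

int main(void) {
    weekday_set B = SINGLETON(mon) | SINGLETON(tue) | SINGLETON(wed);
    weekday_set C = SINGLETON(wed) | SINGLETON(fri);

    weekday_set u = B | C;      /* union:        bit-wise or             */
    weekday_set i = B & C;      /* intersection: bit-wise and            */
    weekday_set d = B & ~C;     /* difference:   bit-wise not, then and  */

    printf("union=%#x intersection=%#x difference=%#x\n", u, i, d);
    return 0;
}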

    38. Discuss the tradeoffs between Pointers and the Recursive Types that arise naturally in a language with a reference model of variables.

    A recursive type is one whose objects may contain one or more references to other objects of the

    type. Most recursive types are records, since they need to contain something in addition to the

    reference, implying the existence of heterogeneous fields. Recursive types are used to build a

    wide variety of linked data structures, including lists and trees.

In some languages (Pascal, Ada 83, Modula-3) pointers are restricted to point only to objects in

    the heap. The only way to create a new pointer value is to call a built-in function that allocates a

new object in the heap and returns a pointer to it. In other languages (PL/I, Algol 68, C, C++, Ada 95) one can create a pointer to a nonheap object by using an "address of" operator.

    Syntax and Operations:-

    Operations on pointers include allocation and deallocation of objects in the heap, dereferencing

    of pointers to access the objects to which they point, and assignment of one pointer into another.

    The behavior of these operations depends heavily on whether the language is functional or

    imperative and on whether it employs a reference or value model for variables.

    Functional languages generally employ a reference model for names. Objects in a functional

    language tend to be allocated automatically as needed, with a structure determined by the

    language implementation. Variables in an imperative language may use either a value or a

reference model, or some combination of the two. In C, Pascal, or Ada, which employ a value model, the assignment A := B puts the value of B into A. If we want B to refer to an object and we want A := B to make A refer to the object to which B refers, then A and B must be pointers.

    Reference Model:

In Lisp, which uses a reference model of variables but is not statically typed, the tree could be specified textually as (#\R (#\X () ()) (#\Y (#\Z () ()) (#\W () ()))).


[Implementation of a tree in Lisp. A diagonal slash through a box indicates a null pointer. The C and A tags serve to distinguish the two kinds of memory blocks: cons cells and blocks containing atoms.]

    In Pascal tree data types would be declared as follows:

    type chr_tree_ptr = ^chr_tree;

    chr_tree = record

    left,right : chr_tree_ptr;

    val : char

    end;

    In Ada:

    type chr_tree;

    type chr_tree_ptr is access chr_tree;

    type chr_tree is record

    left,right : chr_tree_ptr;

    val : character;

    end record;

    In C:

    struct chr_tree

    {

    struct chr_tree * left, *right;

    char val;

    };
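Using the C declaration above, the tree from the Lisp example could be built with explicit heap allocation. This is a small sketch; the make_node helper is illustrative and error checking is omitted.

#include <stdio.h>
#include <stdlib.h>

struct chr_tree {
    struct chr_tree *left, *right;
    char val;
};

/* allocate a node in the heap and return a pointer to it */
static struct chr_tree *make_node(char v, struct chr_tree *l, struct chr_tree *r) {
    struct chr_tree *t = malloc(sizeof *t);
    t->val = v;
    t->left = l;
    t->right = r;
    return t;
}

int main(void) {
    /* the tree R(X, Y(Z, W)) from the Lisp example */
    struct chr_tree *root =
        make_node('R', make_node('X', NULL, NULL),
                       make_node('Y', make_node('Z', NULL, NULL),
                                      make_node('W', NULL, NULL)));
    printf("root holds %c\n", root->val);   /* dereference through the pointer */
    return 0;
}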

39. What are Dangling References? How are they created?

When a heap-allocated object is no longer live, a long-running program needs to reclaim the object's space. Stack objects are reclaimed automatically as part of the subroutine calling sequence. There are two alternatives for reclaiming heap objects: explicit reclamation by the programmer, and automatic garbage collection. Languages like Pascal, C, and C++ require the programmer to reclaim an object explicitly.

    In Pascal:

    dispose(my_ptr);

    In C:

    free (my_ptr);

    In C++:

    delete my_ptr;

    A dangling reference is a live pointer that no longer points to a valid object.

    Dangling reference to a stack variable in C++:

    int i=3;

    int *p = &i;


void foo()
{
    int n = 5;
    p = &n;
}
...
cout << *p << endl;

After foo returns, p still holds the address of n, whose stack frame no longer exists; the final output statement therefore reads garbage through a dangling reference.
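The same hazard arises with explicit heap deallocation. A hedged C sketch (not from the text) using free, as mentioned above:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);   /* heap object */
    *p = 3;
    int *q = p;                   /* second pointer to the same object */
    free(p);                      /* object reclaimed; p and q now dangle */
    /* printf("%d\n", *q);           undefined behavior: dangling reference */
    return 0;
}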


    [ Reference counts and circular lists]

41. Summarize the differences among mark-and-sweep, stop-and-copy, pointer reversal, and generational garbage collection.

1). Mark-and-Sweep: - The classic mechanism to identify useless blocks (a minimal sketch of its steps appears after this list). It proceeds in three main steps, executed by the garbage collector when the amount of free space remaining in the heap falls below some minimum threshold.

    a). The collector walks through the heap, tentatively marking every block as useless.

    b). Beginning with all pointers outside the heap, the collector recursively explores all linked

    data structures in the program, marking each newly discovered block as useful.

    c). The collector again walks through the heap, moving every block that is still marked useless

    to the free list.

    2). Pointer Reversal :- When the collector explores the path to a given block, it reverses the

    pointers it follows, so that each points back to the previous block instead of forward to the next.

    As it explores, the collector keeps track of the current block and the block from where it came.

    3). Stop and Copy: - In a language with variable size heap blocks, the garbage collector can

reduce external fragmentation by performing storage compaction. Many garbage collectors employ a technique known as stop-and-copy that achieves compaction. Specifically, they divide the heap into two regions of equal size. All allocation happens in the first half. When this half is full, the collector begins its exploration of reachable data structures. Each reachable block is copied into the second half of the heap; the old copy is overwritten with a useful flag and a pointer to the new location. Any other pointer that refers to the same block is set to point to the new location. When

    the collector finishes its exploration, all useful objects have been moved into the second half of

    the heap, and nothing in the first half is needed anymore. The collector can therefore swap its

    notion of first and second halves, and the program can continue.

    4). Generational collection: - The heap is divided into multiple regions. When space runs low the

    collector first examines the youngest region, which it assumes is likely to have the highest


    proportion of garbage. Only if it is unable to reclaim sufficient space in this region does the

    collector examine the next older region. To avoid leaking storage in long running systems, the

    collector must be prepared, if necessary, to examine the entire heap. In most cases, however, the

    overhead of collection will be proportional to the size of the youngest region only.

    Any object that survives some small number of collections in its current region is promoted to

    the next older region, in a manner reminiscent of stop and copy. Promotion requires, of course,

    that pointers from old objects to new objects be updated to reflect the new locations. While such

old-space-to-new-space pointers tend to be rare, a generational collector must be able to find them all quickly. At each pointer assignment, the compiler generates code to check whether the new value is an old-to-new pointer; if so, it adds the pointer to a hidden list accessible to the

    collector.

    5). Conservative Collection:- When space runs low, the collector tentatively marks all blocks in

    the heap as useless. It then scans all word aligned quantities in the stack and in global storage. If

    any of these words appears to contain the address of something in the heap, the collector marks

    the block that contains that address as useful. Recursively, the collector then scans all word-

    aligned quantities in the block, and marks as useful any other blocks whose addresses are found

    therein. Finally the collector reclaims any blocks that are still marked useless.
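As promised above, here is a toy mark-and-sweep sketch in C. The Block structure, fixed-size heap array, root list, and gc routine are all illustrative simplifications, not a real collector; they exist only to show the three steps (a), (b), and (c).

#include <stdio.h>
#include <stddef.h>

#define HEAP_BLOCKS 8

typedef struct Block {
    int marked;                     /* tentatively useless, then marked useful        */
    int in_use;                     /* allocated at all (vs. already on the free list) */
    struct Block *left, *right;     /* outgoing pointers to other heap blocks          */
} Block;

static Block heap[HEAP_BLOCKS];

/* step (b): recursively mark every block reachable from a root */
static void mark(Block *b) {
    if (b == NULL || b->marked) return;
    b->marked = 1;
    mark(b->left);
    mark(b->right);
}

/* the three steps of mark-and-sweep */
static void gc(Block **roots, size_t nroots) {
    size_t i;
    for (i = 0; i < HEAP_BLOCKS; i++)   /* (a) tentatively mark every block useless    */
        heap[i].marked = 0;
    for (i = 0; i < nroots; i++)        /* (b) explore from pointers outside the heap  */
        mark(roots[i]);
    for (i = 0; i < HEAP_BLOCKS; i++)   /* (c) reclaim blocks still marked useless     */
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;         /* in a real collector: move to the free list  */
            printf("reclaimed block %zu\n", i);
        }
}

int main(void) {
    /* build a tiny object graph: root -> 0 -> 1; block 2 is unreachable garbage */
    heap[0].in_use = 1; heap[0].left = &heap[1];
    heap[1].in_use = 1;
    heap[2].in_use = 1;                 /* allocated but no longer referenced */
    Block *roots[] = { &heap[0] };
    gc(roots, 1);                       /* prints: reclaimed block 2 */
    return 0;
}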

42. Why are Lists so heavily used in functional programming languages?

A list is defined recursively as either the empty list or a pair consisting of an object and another list. Lists are ideally suited to programming in functional and logic languages, which do most of their work via recursion and higher-order functions. In Lisp, in fact, a program is itself a list, and can extend itself at run time by constructing a list and executing it.

    Lists in ML and Lisp:

    Lists in ML are homogeneous: every element of the list must have the same type. Lisp

lists, by contrast, are heterogeneous: any object may be placed in a list, so long as it is

    never used in an inconsistent fashion. An ML list is usually a chain of blocks, each of

    which contains an element and a pointer to the next block. A Lisp list is a chain of cons

    cells, each of which contains two pointers, one to the element and one to the next cons

    cell.

An ML list is enclosed in square brackets, with elements separated by commas: [a, b, c, d]

    A Lisp list is enclosed in parentheses, with elements separated by white space: (a b c d).

    In both cases, the notation represents a proper list- one whose innermost pair consists of

    the final element and the empty list. In Lisp it is also possible to construct an improper

    list, whose final pair contains two elements.

    The most fundamental operations on lists are those that construct them from their

    components or extract their components from them.

    In Lisp:

(cons 'a '(b))       => (a b)
(car '(a b))         => a
(car nil)            => ??
(cdr '(a b c))       => (b c)
(cdr '(a))           => nil


(cdr nil)            => ??
(append '(a b) '(c d)) => (a b c d)

Here we have used => to mean "evaluates to." The car and cdr of the empty list (nil) are defined to be nil in Common Lisp.

    In ML the equivalent operations are written as follows:

    a :: [b] => [a, b]

    hd [a, b] => a

    hd [ ] => run-time exception

tl [a, b, c] => [b, c]
tl [a] => nil
tl [ ] => run-time exception

    [a, b] @ [c, d] => [a, b, c, d]
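The chain-of-cons-cells representation described above can be pictured in C. This sketch models cells holding character atoms only; the names cons, head, and tail are illustrative, and NULL plays the role of the empty list.

#include <stdio.h>
#include <stdlib.h>

/* a cons cell: a head element and a reference to the rest of the list */
typedef struct cell {
    char head;
    struct cell *tail;              /* NULL represents the empty list (nil) */
} cell;

static cell *cons(char h, cell *t) {    /* build a new list from head and tail */
    cell *c = malloc(sizeof *c);
    c->head = h;
    c->tail = t;
    return c;
}

int main(void) {
    /* the proper list (a b c d): the innermost pair ends in the empty list */
    cell *lst = cons('a', cons('b', cons('c', cons('d', NULL))));
    printf("%c\n", lst->head);          /* like (car lst)       => a */
    printf("%c\n", lst->tail->head);    /* like (car (cdr lst)) => b */
    return 0;
}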

43. Discuss about Files and Input/Output

Input/output facilities allow a program to communicate with the outside world. Interactive I/O

    generally implies communication with human users or physical devices, which work in parallel

    with the running program, and whose input to the program may depend on earlier output from

    the program. Files generally refer to off-line storage implemented by the operating system. Files

    may be further categorized into those that are temporary and those that are persistent. Temporary

    files exist for the duration of a single program run; their purpose is to store information that is

    too large to fit in the memory available to the program. Persistent files allow a program to read

    data that existed before the program began running, and to write data that will continue to exist

    after the program has ended. Some languages provide built in file data types and special syntactic

    constructs for I/O. Others relegate I/O entirely to library packages, which export a file type and a

    variety of input and output subroutines. The principal advantage of language integration is the

    ability to employ non-subroutine call syntax, and to perform operations that may not otherwise

    be available to library routines.


    Unit-2

44. Discuss about Static and Dynamic links

In a language with nested subroutines and static scoping, objects that lie in surrounding

    subroutines, and that are thus neither local nor global, can be found by maintaining a static chain.

    Each stack frame contains a reference to the frame of the lexically surrounding subroutine. This

    reference is called the static link. By analogy, the saved value of the frame pointer, which will be

    restored on subroutine return, is called the dynamic link. The static and dynamic links may or

    may not be the same, depending on whether the current routine was called by its lexically

    surrounding routine, or by some other routine nested in that surrounding routine.

    Whether or not a subroutine is called directly by the lexically surrounding routine, we can be

    sure that the surrounding routine is active; there is no other way that the current routine could

    have been visible, allowing it to be called.

If subroutine D is called directly from B, then clearly B's frame will already be on the stack. Because D is nested inside B, it is only when control enters B that D comes into view. It can therefore be called by C, or by any other routine that is nested inside C or D, but only because these are also within B.
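Conceptually, the compiler reaches a variable declared k lexical levels out by following k static links. The C sketch below illustrates that run-time bookkeeping; the frame structure and field names are an illustration, not real generated code.

#include <stdio.h>

/* simplified activation record: just the static link and one local variable */
typedef struct frame {
    struct frame *static_link;   /* frame of the lexically surrounding routine */
    int local;
} frame;

/* address of a variable declared k lexical levels out from the current frame */
static int *nonlocal(frame *current, int k) {
    frame *fp = current;
    while (k-- > 0)
        fp = fp->static_link;    /* follow the static chain */
    return &fp->local;
}

int main(void) {
    frame outer = { NULL, 42 };      /* e.g., B's frame                 */
    frame inner = { &outer, 7 };     /* e.g., D's frame, nested in B    */
    printf("%d\n", *nonlocal(&inner, 1));   /* prints 42 */
    return 0;
}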


    45. Calling Sequences

    Maintenance of the subroutine call stack is the responsibility of the calling sequence. Sometimes

    the term calling sequence is used to refer to the combined operations of the caller, the prologue,

    and the epilogue.

    Tasks that must be accomplished on the way into a subroutine include passing parameters,

    saving the return address, changing the program counter, changing the stack pointer to allocate

    space, saving registers that contain important values and that may be overwritten by the callee,

    changing the frame pointer to refer to the new frame, and executing initialization code for any

    objects in the new frame that require it. Tasks that must be accomplished on the way out include

    passing return parameters or function values, executing finalization code for any local objects