
MC0073 SEM3 SMU 2011

Nov 27, 2014

Nitin Sivach


MC0073 – System Programming, Book ID: B0811

Set-1

1. Describe the following with respect to Language Specification:

A) Programming Language Grammars B) Classification of

Grammars

C) Binding and Binding Times

Ans:1

Programming Language Grammars:

The lexical and syntactic features of a programming language are specified by its grammar. This section discusses key concepts and notions from formal language grammars. A language L can be considered to be a collection of valid sentences. Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language. A formal language grammar is a set of rules which precisely specify the sentences of L. It is clear that natural languages are not formal languages due to their rich vocabulary. However, PLs are formal languages.

Terminal symbols, alphabet and strings:

The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g. Σ = {a, b, …, z, 0, 1, …, 9}

Here the symbols {, ‘,’ and } are part of the notation. We call them metasymbols to differentiate them from terminal symbols. Throughout this discussion we assume that metasymbols are distinct from the terminal symbols. If this is not the case, i.e. if a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in quotes to differentiate it from the metasymbol. For example, in defining the set of punctuation symbols of English, ‘,’ (in quotes) denotes the terminal symbol ‘comma’.

A string is a finite sequence of symbols. We will represent strings by Greek symbols α, β, γ, etc. Thus α = axy is a string over Σ. The length of a string is the number of symbols in it. Note that the absence of any symbol is also a string, the null string ε. The concatenation operation combines two strings into a single string; it is used to build larger strings from existing strings. Thus, given two strings α and β, concatenation of α with β yields a string formed by putting the sequence of symbols forming α before the sequence of symbols forming β. For example, if α = ab and β = axy, then the concatenation of α and β, represented as α.β or simply αβ, gives the string abaxy. The null string can also participate in a concatenation: α.ε = ε.α = α.


Nonterminal symbols

A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed between <…>, e.g. A or < Noun >. During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, < Noun > represents a noun.

Productions

A production, also called a rewriting rule, is a rule of the grammar. A production has the form

<Nonterminal symbol> ::= String of Ts and NTs

and defines the fact that the NT on the LHS of the production can be rewritten as the string of Ts and NTs appearing on the RHS. When an NT can be written as one of many different strings, the symbol ‘|’ (standing for ‘or’) is used to separate the strings on the RHS, e.g.

< Article > ::= a | an | the

The string on the RHS of a production can be a concatenation of component strings, e.g. the production < Noun Phrase > ::= < Article >< Noun >

expresses the fact that the noun phrase consists of an article followed by a noun.

Each grammar G defines a language L(G). G contains an NT called the distinguished symbol or the start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G. A valid string α of L(G) is obtained by using the following procedure:

1. Let α= ‘S’.

2. While α is not a string of terminal symbols

(a) Select an NT appearing in α, say X.

(b) Replace X by a string appearing on the RHS of a production of X.

Example 1.3

Grammar (1.1) defines a language consisting of noun phrases in English

< Noun Phrase > :: = < Article > < Noun >

< Article > ::= a | an | the

<Noun> ::= boy | apple

< Noun Phrase > is the distinguished symbol of the grammar; ‘the boy’ and ‘an apple’ are some valid strings in the language.
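The derivation procedure above can be sketched directly against the grammar of Example 1.3; the table encoding and helper names below are illustrative, not from the text:

```python
import random

# The noun-phrase grammar of Example 1.3 as a production table:
# each nonterminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "<Noun Phrase>": [["<Article>", "<Noun>"]],
    "<Article>": [["a"], ["an"], ["the"]],
    "<Noun>": [["boy"], ["apple"]],
}

def derive(start="<Noun Phrase>", seed=0):
    """Generate a valid string by repeatedly rewriting nonterminals."""
    rng = random.Random(seed)
    alpha = [start]                                    # step 1: alpha = 'S'
    while any(sym in GRAMMAR for sym in alpha):        # step 2: NTs remain
        i = next(i for i, s in enumerate(alpha) if s in GRAMMAR)  # (a) pick an NT
        alpha[i:i + 1] = rng.choice(GRAMMAR[alpha[i]])            # (b) rewrite it
    return " ".join(alpha)

print(derive())  # one of: a/an/the followed by boy/apple
```

Every string the procedure can produce is a valid string of the language, since each rewriting step applies one production of the grammar.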


Definition (Grammar)

A grammar G of a language L(G) is a quadruple (Σ, SNT, S, P), where

Σ is the alphabet of L(G), i.e. the set of Ts,

SNT is the set of NTs,

S is the distinguished symbol, and

P is the set of productions.

Classification of Grammars:

Grammars are classified on the basis of the nature of productions used in them (Chomsky, 1963). Each grammar class has its own characteristics and limitations.

Type – 0 Grammars

These grammars, known as phrase structure grammars, contain productions of the form

α ::= β

where both α and β can be strings of Ts and NTs. Such productions permit arbitrary substitution of strings during derivation or reduction, hence they are not relevant to specification of programming languages.

Type – 1 grammars

These grammars are known as context sensitive grammars because their productions specify that derivation or reduction of strings can take place only in specific contexts. A Type-1 production has the form

α A β ::= α π β

Thus, a string π in a sentential form can be replaced by ‘A’ (or vice versa) only when it is enclosed by the strings α and β. These grammars are also not particularly relevant for PL specification since recognition of PL constructs is not context sensitive in nature.

Type – 2 grammars

These grammars impose no context requirements on derivations or reductions. A typical Type-2 production is of the form

A ::= π


which can be applied independent of its context. These grammars are therefore known as context free grammars (CFG). CFGs are ideally suited for programming language specification.

Type – 3 grammars

Type-3 grammars are characterized by productions of the form

A ::= tB | t or A ::= Bt | t

Note that these productions also satisfy the requirements of Type-2 grammars. The specific form of the RHS alternatives—namely a single T or a string containing a single T and a single NT—gives some practical advantages in scanning.

Type-3 grammars are also known as linear grammars or regular grammars. These are further categorized into left-linear and right-linear grammars depending on whether the NT in the RHS alternative appears at the extreme left or extreme right.
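Because Type-3 grammars generate exactly the regular languages, a Type-3 specification can be checked with a regular expression. A small sketch, using an assumed right-linear grammar for non-empty binary strings:

```python
import re

# The right-linear (Type-3) grammar
#   <Bin> ::= 0 <Bin> | 1 <Bin> | 0 | 1
# generates the non-empty binary strings, i.e. the regular language [01]+.
BIN = re.compile(r"[01]+")

def in_language(s):
    """Recognize the language of the right-linear grammar above."""
    return BIN.fullmatch(s) is not None

print(in_language("1011"))  # True
print(in_language("10a1"))  # False
```

This correspondence between Type-3 productions and regular expressions is what gives these grammars their practical advantage in scanning.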

Binding and Binding Times:

Definition: Binding: A binding is the association of an attribute of a program entity with a value.

Binding time is the time at which a binding is performed. Thus the type attribute of a variable var is bound to a type when its declaration is processed. The size attribute of that type is bound to a value sometime prior to this binding. We are interested in the following binding times:

1. Language definition time of L

2. Language implementation time of L

3. Compilation time of P

4. Execution init time of proc

5. Execution time of proc.

Here L is a programming language, P is a program written in L, and proc is a procedure in P. Note that language implementation time is the time when a language translator is designed. The preceding list of binding times is not exhaustive; other binding times can be defined, viz. binding at the linking time of P. The language definition of L specifies binding times for the attributes of various entities of programs written in L.
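As an illustrative analogy (not from the text), Python makes two different binding times visible in one function: a default argument value is bound when the def statement is processed, while an argument passed explicitly is bound at call (execution) time:

```python
size = 10

def make_buffer(n=size):       # the default for n is bound when 'def' is processed
    return [0] * n

size = 99                      # later rebinding of 'size'...

print(len(make_buffer()))      # 10: the default kept the earlier binding
print(len(make_buffer(size)))  # 99: this argument is bound at call time
```

The same attribute (the buffer length) ends up with different values purely because the binding was performed at different times.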


2. What is RISC and how it is different from the CISC?

Ans:2

CISC:

A Complex Instruction Set Computer (CISC) supplies a large number of complex instructions at the assembly language level. Assembly language is a low-level computer programming language in which each statement corresponds to a single machine instruction. CISC instructions facilitate the extensive manipulation of low-level computational elements and events such as memory, binary arithmetic, and addressing. The goal of the CISC architectural philosophy is to make microprocessors easy and flexible to program and to provide for more efficient memory use.

The CISC philosophy was unquestioned during the 1960s when the early computing machines such as the popular Digital Equipment Corporation PDP 11 family of minicomputers were being programmed in assembly language and memory was slow and expensive.

CISC machines merely used the then-available technologies to optimize computer performance. Their advantages included the following: (1) A new processor design could incorporate the instruction set of its predecessor as a subset of an ever-growing language–no need to reinvent the wheel, code-wise, with each design cycle. (2) Fewer instructions were needed to implement a particular computing task, which led to lower memory use for program storage and fewer time-consuming instruction fetches from memory. (3) Simpler compilers sufficed, as complex CISC instructions could be written that closely resembled the instructions of high-level languages. In effect, CISC made a computer’s assembly language more like a high-level language to begin with, leaving the compiler less to do.

Some disadvantages of the CISC design philosophy are as follows: (1) The first advantage listed above could be viewed as a disadvantage: the incorporation of older instruction sets into new generations of processors tended to force growing complexity. (2) Many specialized CISC instructions were not used frequently enough to justify their existence. The existence of each instruction needed to be justified because each one requires the storage of more microcode in the central processing unit (the final and lowest layer of code translation), which must be built in at some cost. (3) Because each CISC command must be translated by the processor into tens or even hundreds of lines of microcode, it tends to run slower than an equivalent series of simpler commands that do not require so much translation. All translation requires time. (4) Because a CISC machine builds complexity into the processor, where all its various commands must be translated into microcode for actual execution, the design of CISC hardware is more difficult and the CISC design cycle correspondingly long; this means delay in getting to market with a new chip.

The terms CISC and RISC (Reduced Instruction Set Computer) were coined at this time to reflect the widening split in computer-architectural philosophy.

RISC:


The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU design philosophy that favors a simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are AVR, PIC, ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM’s PowerPC.

· RISC characteristics

- Small number of machine instructions : less than 150

- Small number of addressing modes : less than 4

- Small number of instruction formats : less than 4

- Instructions of the same length : 32 bits (or 64 bits)

- Single cycle execution

- Load / Store architecture

- Large number of GPRs (General Purpose Registers): more than 32

- Hardwired control

- Support for HLL (High Level Language).

RISC and x86

However, despite many successes, RISC has made few inroads into the desktop PC and commodity server markets, where Intel’s x86 platform remains the dominant processor architecture (Intel is facing increased competition from AMD, but even AMD’s processors implement the x86 platform, or a 64-bit superset known as x86-64). There are three main reasons for this. One, the very large base of proprietary PC applications is written for x86, whereas no RISC platform has a similar installed base, and this meant PC users were locked into the x86. The second is that, although RISC was indeed able to scale up in performance quite quickly and cheaply, Intel took advantage of its large market by spending vast amounts of money on processor development. Intel could spend many times as much as any RISC manufacturer on improving low level design and manufacturing. The same could not be said about smaller firms like Cyrix and NexGen, but they realized that they could apply pipelined design philosophies and practices to the x86 architecture – either directly as in the 6×86 and MII series, or indirectly (via extra decoding stages) as in the Nx586 and AMD K5. Later, more powerful processors such as the Intel P6 and AMD K6 had similar RISC-like units that executed a stream of micro-operations generated from decoding stages that split most x86 instructions into several pieces. Today, these principles have been further refined and are used by modern x86 processors such as the Intel Core 2 and AMD K8. The first available chip deploying such techniques was the NexGen Nx586, released in 1994 (while the AMD K5 was severely delayed and released in 1995). As of 2007, the x86 designs (whether Intel’s or AMD’s) are as fast as (if not faster than) the fastest true RISC single-chip solutions available.

RISC vs CISC:

CISC                                            RISC
Emphasis on hardware                            Emphasis on software
Includes multi-clock complex instructions       Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE"            Register-to-register: "LOAD" and "STORE"
  incorporated in instructions                    are independent instructions
Small code sizes, high cycles per second        Large code sizes, low cycles per second
Transistors used for storing                    Spends more transistors
  complex instructions                            on memory registers

Fig. 5.0: CISC vs RISC comparison

3. Explain the following with respect to the design specifications of an Assembler:

A) Data Structures B) pass1 & pass2 Assembler flow chart

Ans:3

Data Structure

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures


1. Input source program

2. A Location Counter (LC), used to keep track of each instruction’s location.

3. A table, the Machine-operation Table (MOT), that indicates, for each instruction, the symbolic mnemonic and its length (two, four, or six bytes)

4. A table, the Pseudo-Operation Table (POT) that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.

5. A table, the Symbol Table (ST) that is used to store each label and its corresponding value.

6. A table, the literal table (LT) that is used to store each literal encountered and its corresponding assignment location.

7. A copy of the input to be used by pass 2.

Pass 2 Data Structures

1. Copy of source program input to pass 1.

2. Location Counter (LC)

3. A table, the Machine-operation Table (MOT), that indicates, for each instruction, the symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and instruction format.

4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.

5. A table, the Symbol Table (ST), prepared by pass1, containing each label and corresponding value.

6. A Table, the base table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.

7. A work space INST that is used to hold each instruction as its various parts are being assembled together.

8. A work space, PRINT LINE, used to produce a printed listing.

9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.

10. An output deck of assembled instructions in the format needed by the loader.
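A rough sketch of how some of these tables might be represented as Python dictionaries; the entries and action names are illustrative assumptions, not the book's actual table contents:

```python
# Machine-Op Table: mnemonic -> (binary opcode, length in bytes, format).
# Entries are illustrative; a real MOT is filled from the instruction set.
MOT = {
    "A":  (0x5A, 4, "RX"),
    "AR": (0x1A, 2, "RR"),
    "L":  (0x58, 4, "RX"),
}
# Pseudo-Op Table: pseudo-op name -> action tag for pass 1 (tags assumed).
POT = {"START": "set_lc", "USING": "note_base", "DS": "reserve", "END": "finish"}
symbol_table = {}   # Symbol Table: label -> value (address), built by pass 1

def define_label(label, lc):
    """Pass 1 enters each label with the current Location Counter value."""
    symbol_table[label] = lc

define_label("LOOP", 0x20)
print(symbol_table)   # {'LOOP': 32}
```

Since the MOT and POT are fixed tables, only the symbol table (and the literal table) is mutated during assembly.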


Fig. 1.3: Data structures of the assembler

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures. Pass 2 requires a machine-operation table (MOT) containing the name, length, binary code and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT. The machine-operation table (MOT) and pseudo-operation table (POT) are examples of fixed tables: their contents are not filled in or altered during the assembly process.

The following figure depicts the format of the machine-op table (MOT)

—————————————– 6 bytes per entry —————————————–

Mnemonic opcode   Binary opcode   Instruction length   Instruction format   Not used here
(4 bytes,         (1 byte,        (2 bits, binary)     (3 bits, binary)     (3 bits)
characters)       hexadecimal)

“Abbb”            5A              10                   001
“Ahbb”            4A              10                   001
“ALbb”            5E              10                   001
“ALRB”            1E              01                   000
…                 …               …                    …

‘b’ represents “blank”

Fig. 1.4: Format of the machine-op table (MOT)


3.3.2 The flowchart for Pass – 1:

The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses with which the symbol names used in a program are associated. It is possible to determine some addresses directly, e.g. the address of the first instruction in the program; however, others must be inferred.

To implement memory allocation a data structure called the location counter (LC) is introduced. The location counter is always made to contain the address of the next memory word in the target program. It is initialized to the program's start address. Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of the LC in a new entry of the symbol table. It then finds the number of memory words required by the assembly statement and updates the LC contents. This ensures that the LC points to the next memory word in the target program even when machine instructions have different lengths and DS/DC statements reserve different amounts of memory. To update the contents of the LC, the analysis phase needs to know the lengths of different instructions. This information simply depends on the assembly language, hence the mnemonics table can be extended to include this information in a new field called length. We refer to the processing involved in maintaining the location counter as LC processing.
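LC processing as just described can be sketched as follows; the statement tuple format, the length table, and the start address are simplified assumptions for illustration:

```python
# Mnemonics table extended with a 'length' field (lengths here are assumed:
# one memory word per machine instruction or DC, operand-many words for DS).
LENGTHS = {"MOVER": 1, "ADD": 1, "DC": 1}

def pass1(statements, start=100):
    """Build the symbol table while maintaining the Location Counter."""
    lc, symtab = start, {}
    for label, opcode, operand in statements:
        if label:
            symtab[label] = lc          # enter label with current LC contents
        if opcode == "DS":              # DS reserves 'operand' memory words
            lc += int(operand)
        else:
            lc += LENGTHS[opcode]       # length comes from the mnemonics table
    return symtab

program = [
    (None,   "MOVER", "AREG"),
    ("A",    "DS",    "3"),
    ("NEXT", "ADD",   "AREG"),
]
print(pass1(program))   # {'A': 101, 'NEXT': 104}
```

Note how the DS statement advances the LC by three words, so the label NEXT correctly lands at 104 even though only one instruction separates it from A.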


Flowchart for Pass – 2

Fig. 1.6: Pass 2 flowchart

4. Define the following,

A) Parsing

B) Scanning

C) Token

Ans:4

Parsing:

Parsing transforms input text or string into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Lexical analysis creates tokens from a sequence of input characters and it is these tokens that are processed by a parser to build a data structure such as parse tree or abstract syntax trees.

Conceptually, the parser accepts a sequence of tokens and produces a parse tree. In practice this might not occur.

1. The source program might have errors. Shamefully, we will do very little error handling.


2. Real compilers produce (abstract) syntax trees not parse trees (concrete syntax trees). We don’t do this for the pedagogical reasons given previously.

There are three classes for grammar-based parsers.

1. Universal

2. Top-down

3. Bottom-up

The universal parsers are not used in practice as they are inefficient; we will not discuss them.

As expected, top-down parsers start from the root of the tree and proceed downward, whereas bottom-up parsers start from the leaves and proceed upward. The commonly used top-down and bottom-up parsers are not universal; that is, there are (context-free) grammars that cannot be used with them.

The LL and LR parsers are important in practice. Hand written parsers are often LL. Specifically, the predictive parsers we looked at in chapter two are for LL grammars. The LR grammars form a larger class. Parsers for this class are usually constructed with the aid of automatic tools.
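A hand-written LL (predictive) parser can be illustrated with the noun-phrase grammar from the first answer; this sketch returns a parse tree as nested tuples, a representation chosen here for brevity:

```python
# A hand-written predictive (LL) parser for the grammar
#   <Noun Phrase> ::= <Article> <Noun>
#   <Article>     ::= a | an | the
#   <Noun>        ::= boy | apple
def parse_noun_phrase(tokens):
    """Return a parse tree as nested tuples, or raise SyntaxError."""
    toks = list(tokens)

    def expect(category, words):
        # Predictive: one token of lookahead decides whether to proceed.
        if not toks or toks[0] not in words:
            raise SyntaxError(f"expected {category}, got {toks[:1]}")
        return (category, toks.pop(0))

    tree = ("NounPhrase",
            expect("Article", {"a", "an", "the"}),
            expect("Noun", {"boy", "apple"}))
    if toks:
        raise SyntaxError(f"trailing input: {toks}")
    return tree

print(parse_noun_phrase(["the", "boy"]))
# ('NounPhrase', ('Article', 'the'), ('Noun', 'boy'))
```

Each nonterminal corresponds to one parsing routine (here folded into `expect` calls), which is exactly the structure of a hand-written LL parser.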

Scanning and token:

There are three phases of analysis, with the output of one phase the input of the next. Each of these phases changes the representation of the program being compiled. The phases are called lexical analysis or scanning, which transforms the program from a string of characters to a string of tokens; syntax analysis or parsing, which transforms the program into some kind of syntax tree; and semantic analysis, which decorates the tree with semantic information.

The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer.

For example, any one of the following C statements

x3 = y + 3;
x3 = y + 3 ;
x3 = y+ 3 ;

but not

x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;.


The hierarchical decomposition of the above statement is given in figure 1.0.

Fig. 1.0

A token is a <token-name, attribute-value> pair.

For example

1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to gather information about the identifiers and to pass this information to subsequent phases.

2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.

3. The lexeme y is mapped to the token <id,2>

4. The lexeme + is mapped to the token <+>.

5. The number 3 is mapped to <number, something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.

6. The lexeme ; is mapped to the token <;>.

Note, non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. That does not mean the blanks are unnecessary. Consider

        int x;
        intx;


The blank between int and x is clearly necessary, but it does not become part of any token. Blanks inside strings are an exception; they are part of the token (or, more likely, of the table entry pointed to by the second component of the token).

Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).

Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree.
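The scanning phase described above can be sketched with a small table-driven tokenizer; the token names and the limited set of token kinds are assumptions tailored to the example statement:

```python
import re

# A minimal scanner, assuming only the token kinds in the example.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),          # non-significant blanks: dropped
]
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(text):
    """Group characters into lexemes and map them to <token-name, value> pairs."""
    tokens = []
    for m in PATTERN.finditer(text):
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x3 = y + 3;"))
# [('id', 'x3'), ('op', '='), ('id', 'y'), ('op', '+'), ('number', '3'), ('op', ';')]
```

All three spacings of the statement shown earlier produce the same token stream, which is exactly why those blanks are non-significant.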

5. Describe the process of Bootstrapping in the context of Linkers

Ans:5

Bootstrapping:

In computing, bootstrapping refers to a process where a simple system activates another, more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading:

The discussions of loading up to this point have all presumed that there’s already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as the bootstrap ROM, as in "pulling one’s self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system’s address space. The bootstrap ROM occupies the top 64K of the address space and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk (or, if that fails, the first block of the first hard disk) into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)


Why not just load the operating system directly? Because you can’t fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it’s tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.

None of this matters much to the application level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping:

Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language, and so on, until one can have a graphical IDE and an extremely high-level programming language.

Compiler Bootstrapping:

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and more recently the Mono C# compiler.


6. Describe the procedure for design of a Linker.

Ans:6

Design of a linker

Relocation and linking requirements in segmented addressing

The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of the segmented addressing structure reduces the relocation requirements of program.

A Linker for MS-DOS

Example 7.7: Consider the program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing. Hence all memory addressing is performed by using suitable displacements from their contents. The translation time address of A is 0196. In statement 16, a reference to A is assembled as a displacement of 0196 from the contents of the CS register. This avoids the use of an absolute address, hence the instruction is not address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS). The effective operand address would be calculated as <CS> + 0196, which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution time address of DATA_HERE, the reference to B would be automatically relocated to the correct address.

Though use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation. Consider statement 14:

MOV AX, DATA_HERE


This instruction loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link time address of DATA_HERE; hence it assembles the MOV instruction in the immediate operand format and puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker would put the appropriate address in the operand field. Inter-segment calls and jumps are handled in a similar way.

Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format. For example, consider the following program:

FAR_LAB EQU THIS FAR ; FAR_LAB is a FAR label

JMP FAR_LAB ; A FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes which are to hold the segment base address. A statement like

ADDR_A DW OFFSET A

(which is an ‘address constant’) does not need any relocation since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address.

For linking, however, both the segment base address and the offset of an external symbol must be computed by the linker. Hence there is no reduction in the linking requirements.

Set-2

1. Discuss the various Addressing Modes for CISC.

Ans:1

Addressing Modes of CISC :

The Motorola 68000 addressing modes support the following operand combinations:

· Register to Register,

· Register to Memory,

· Memory to Register, and

· Memory to Memory

The 68000 supports a wide variety of addressing modes:

· Immediate mode – the operand immediately follows the instruction

· Absolute address – the address (in either the "short" 16-bit form or "long" 32-bit form) of the operand immediately follows the instruction

· Program Counter relative with displacement – A displacement value is added to the program counter to calculate the operand’s address. The displacement can be positive or negative.

· Program Counter relative with index and displacement – The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the program counter are added together to get the final address.

· Register direct – The operand is contained in an address or data register.

· Address register indirect – An address register contains the address of the operand.

· Address register indirect with predecrement or postincrement – An address register contains the address of the operand in memory. With the predecrement option set, a predetermined value is subtracted from the register before the (new) address is used. With the postincrement option set, a predetermined value is added to the register after the operation completes.

· Address register indirect with displacement — A displacement value is added to the register’s contents to calculate the operand’s address. The displacement can be positive or negative.

· Address register relative with index and displacement — The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the specified address register are added together to get the final address.

2. Write about Deterministic and Non-Deterministic Finite Automata with suitable

numerical examples.

Ans:2

Deterministic finite automata (DFA) :

A deterministic finite automaton (DFA) is a 5-tuple: (S, Σ, T, s, A)

· an alphabet (Σ)

· a set of states (S)

· a transition function (T : S × Σ → S).

· a start state (s ∈ S)

· a set of accept states (A ⊆ S)

The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition function T to determine the next state using the current state and the symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the DFA recognizes.

Non-Deterministic Finite Automaton (NFA):

A Non-Deterministic Finite Automaton (NFA) is a 5-tuple: (S, Σ, T, s, A)

· an alphabet (Σ)

· a set of states (S)

· a transition function (T : S × (Σ ∪ {ε}) → P(S)).

· a start state (s ∈ S)

· a set of accept states (A ⊆ S)

Here P(S) is the power set of S and ε is the empty string. The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition relation T to determine the next state(s) using the current state and the symbol just read or the empty string. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the NFA recognizes.

3. Write a short note on:

A) C Preprocessor for GCC version 2

B) Conditional Assembly

Ans:3

The C Preprocessor for GCC version 2

The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs.

The C preprocessor provides four separate facilities that you can use as you see fit:

· Inclusion of header files. These are files of declarations that can be substituted into your program.

· Macro expansion. You can define macros, which are abbreviations for arbitrary fragments of C code, and then the C preprocessor will replace the macros with their definitions throughout the program.

· Conditional compilation. Using special preprocessing directives, you can include or exclude parts of the program according to various conditions.

· Line control. If you use a program to combine or rearrange source files into an intermediate file which is then compiled, you can use line control to inform the compiler of where each source line originally came from.

ANSI Standard C requires the rejection of many harmless constructs commonly used by today’s C programs. Such incompatibility would be inconvenient for users, so the GNU C preprocessor is configured to accept these constructs by default. Strictly speaking, to get ANSI Standard C, you must use the options `-trigraphs’, `-undef’ and `-pedantic’, but in practice the consequences of having strict ANSI Standard C make it undesirable to do this.

Conditional Assembly:

Conditional assembly means that some sections of the program may be optional: either included or not in the final program, depending upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program, one that prints debugging information during test executions for the developer, and another version for production operation that displays only results of interest for the average user. For example, a program fragment might assemble the instructions to print the AX register only if Debug is true. Note that true is any non-zero value.

Here is a conditional statement in C programming; the following statement tests the expression `BUFSIZE == 1020', where `BUFSIZE' must be a macro.

#if BUFSIZE == 1020

printf ("Large buffers!\n");

#endif /* BUFSIZE is large */

4. Write about different Phases of Compilation.

Ans:4

Phases of Compiler

A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This process is so complex that it is not reasonable, either from a logical point of view or from an implementation point of view, to consider the compilation process as occurring in one single step. For this reason, it is customary to partition the compilation process into a series of subprocesses called phases, as shown in Fig. 1.2. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.

The first phase, called the lexical analyzer, or scanner, separates characters of the source language into groups that logically belong together; these groups are called tokens. The usual tokens are keywords such as DO or IF, identifiers such as X or NUM, operator symbols such as <= or +, and punctuation symbols such as parentheses or commas. The output of the lexical analyzer is a stream of tokens, which is passed to the next phase, the syntax analyzer, or parser. The tokens in this stream can be represented by codes which we may regard as integers. Thus, DO might be represented by 1, + by 2, and "identifier" by 3. In the case of a token like "identifier", a second quantity, telling which of the identifiers used by the program is represented by this instance of the token, is passed along with the integer code.

The syntax analyzer groups tokens together into syntactic structures. For example, the three tokens representing A + B might be grouped into a syntactic structure called an expression. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are the tokens. The interior nodes of the tree represent strings of tokens that logically belong together.

The intermediate code generator uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles of intermediate code are possible. One common style uses instructions with one operator and a small number of operands. These instructions can be viewed as simple macros like the macro ADD2. The primary difference between intermediate code and assembly code is that the intermediate code need not specify the registers to be used for each operation.

Code Optimization is an optional phase designed to improve the intermediate code so that the ultimate object program runs faster and / or takes less space. Its output is another intermediate code program that does the same job as the original, but perhaps in a way that saves time and / or space.

The final phase, code generation, produces the object code by deciding on the memory locations for data, selecting code to access each datum, and selecting the registers in which each computation is to be done. Designing a code generator that produces truly efficient object programs is one of the most difficult parts of compiler design, both practically and theoretically.

The Table-Management, or bookkeeping, portion of the compiler keeps track of the names used by the program and records essential information about each, such as its type (integer, real, etc). The data structure used to record this information is called a Symbol table.

The Error Handler is invoked when a flaw in the source program is detected. It must warn the programmer by issuing a diagnostic, and adjust the information being passed from phase to phase so that each phase can proceed. It is desirable that compilation be completed on flawed programs, at least through the syntax-analysis phase, so that as many errors as possible can be detected in one compilation. Both the table management and error handling routines interact with all phases of the compiler.

5. What is a MACRO? Discuss its expansion in detail with a suitable example.

Ans:5

Macro definition and Expansion

Definition : macro

A macro name is an abbreviation, which stands for some related lines of code. Macros are useful for the following purposes:

· To simplify and reduce the amount of repetitive coding

· To reduce errors caused by repetitive coding

· To make an assembly program more readable.

A macro consists of a name, a set of formal parameters, and a body of code. A use of the macro name with a set of actual parameters is replaced by the code generated from its body. This is called macro expansion.

Macros allow a programmer to define pseudo operations, typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions; the macro thus has the effect of automating the writing of the program.

Macros can be defined and used in many programming languages, such as C and C++. For example, macros are commonly used in C to define small snippets of code. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance.

For instance,

#define max(a, b) a>b?a:b

Defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing

z = max(x, y);

becomes z = x>y?x:y;

While this use of macros is very important for C, for instance to define type-safe generic data-types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls.

C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments), but they have some limitations as a programming construct. Macros which mimic functions, for instance, can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.

In programming languages such as C or assembly language, a macro is a name that defines a set of commands that are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent every time the program is prepared for execution, whereas function instructions are copied into a program only once.

Macro Expansion.

A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by sequence of assembly statements.

Figure 1.1 Macro expansion on a source program.

Example

In the above program a macro call, INITZ, is shown in the middle of the figure. Every macro begins with the MACRO keyword and ends with ENDM (end macro). Whenever a macro is called, the entire macro body is substituted into the program at the point of the call; the result of this expansion is shown on the rightmost side of the figure.

Macro calling in high-level programming languages (C programming):

#define max(a,b) a>b?a:b

main() {

int x , y;

x=4; y=6;

z = max(x, y); }

The above program is written using C programming statements. It defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing, z = max(x, y); becomes z = x>y?x:y;

After macro expansion, the whole code would appear like this.

#define max(a,b) a>b?a:b

main()
{
    int x, y;
    x = 4; y = 6;
    z = x>y?x:y;
}

6. Describe the following with respect to Software Tools for Program Development:

A) Compilers B) Editors

C) Debuggers D) Interpreters

Ans:6

Compiler :

A compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language). The original sequence is usually called the source code and the output called object code. Commonly the output has a form suitable for processing by other programs (e.g., a linker), but it may be a human-readable text file.

Compiler Backend:

While there are applications where only the compiler frontend is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the frontend to the backend, which produces a functionally equivalent program in the output language. This is done in multiple steps:

1. Optimization – the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms.

2. Code Generation – the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions.

Typical compilers output so-called objects, which basically contain machine code augmented by information about the name and location of entry points and external calls (to functions not contained in the object). A set of object files, which need not have all come from a single compiler, may then be linked together to create the final executable which can be run directly by a user.

Compiler Frontend:

The compiler frontend consists of multiple phases in itself, each informed by formal language theory:

1. Scanning – breaking the source code text into small pieces, tokens - sometimes called ‘terminals’ - each representing a single piece of the language, for instance a keyword, identifier or symbol names. The token language is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.

2. Parsing – Identifying syntactic structures - so called ‘non-terminals’ - constructed from one or more tokens and non-terminals, representing complicated language elements, for instance assignments, conditions and loops. This is typically done with a parser for a context-free grammar, often an LL parser or LR parser from a parser generator. (Most programming languages are only almost context-free, so there’s often some extra logic hacked in.)

3. Intermediate Language Generation – an equivalent to the original program is created in a special purpose intermediate language

10.1.2 Editor :

An Editor is a software tool for editing something, i.e., introducing changes into some text, graphics, or a program. Examples include:

· HTML editor

· text editor

· source code editor

· graphics editor

· game level editor

· game character editor

· word processor, more complex text-producing tools

10.1.3 Debuggers:

A Debugger is a computer program that is used to test and debug other programs. The code to be examined might alternatively be running on an instruction set simulator (ISS), a technique that allows great power in its ability to halt when specific conditions are encountered but which will typically be much slower than executing the code directly on the appropriate processor.

When the program crashes, the debugger shows the position in the original code if it is a source-level debugger or symbolic debugger, commonly seen in integrated development environments. If it is a low-level debugger or a machine-language debugger it shows the line in the disassembly. (A "crash" happens when the program cannot continue because of a programming bug. For example, perhaps the program tried to use an instruction not available on the current version of the CPU or attempted access to unavailable or protected memory.)

Typically, debuggers also offer more sophisticated functions such as running a program step by step (single-stepping), stopping (breaking) (pausing the program to examine the current state) at some kind of event by means of breakpoint, and tracking the values of some variables. Some debuggers have the ability to modify the state of the program while it is running, rather than merely to observe it.

The importance of a good debugger cannot be overstated. Indeed, the existence and quality of such a tool for a given language and platform can often be the deciding factor in its use, even if another language/platform is better-suited to the task. However, it is also important to note that software can (and often does) behave differently running under a debugger than normally, due to the inevitable changes the presence of a debugger will make to a software program’s internal timing. As a result, even with a good debugging tool, it is often very difficult to track down runtime problems in complex multi-threaded or distributed systems.

List of Debuggers:

· CodeView

· DAEDALUS

· DBG - A PHP Debugger and Profiler

· Xdebug - PHP Debugger

· dbx

· DDD, Data Display Debugger

· Ddbg - Win32 Debugger for the D Programming Language

· DEBUG DOS Command

· Dynamic debugging technique (DDT), and its octal counterpart Octal Debugging Technique

· Eclipse

· gDEBugger is a commercial OpenGL debugger and OpenGL ES debugger. Real time GPU debugger and analysis tool provided by Graphic Remedy. Available for Windows and Linux

· GoBug symbolic debugger for Windows

· GNU Debugger (GDB)

· Insight

· Interactive Disassembler (IDA Pro)

· Java Platform Debugger Architecture

· JSwat, open-source Java debugger

· MacsBug

· OLIVER (CICS interactive test/debug)

· OllyDbg

· IBM Rational Purify

· sdb

· SIMMON (Simulation Monitor)

· SIMON (Batch Interactive test/debug)

10.1.4 Interpreter :

In computer science, an interpreter is a computer program that executes, or performs, instructions written in a computer programming language. Interpretation is one of the two major ways in which a programming language can be implemented, the other being compilation. The term interpreter may refer to a program that executes source code that has already been translated to some intermediate form, or it may refer to the program that performs both the translation and execution (e.g., many BASIC implementations).

An interpreter needs to be able to analyze, or parse, instructions written in the source language. It also needs to represent any program state, such as variables or data structures, that a program may create. It needs to be able to move around in the source code when instructed to do so by control flow constructs such as loops or conditionals. Finally, it usually needs to interact with an environment, such as by doing input/output with a terminal or other user interface device.
