System Programming

Spring 2012
Master of Computer Application (MCA) – Semester III
MC0073 – Systems Programming – 4 Credits (Book ID: B0811)
Assignment Set – 1 (60 Marks)
Answer all questions

1. Consider the following C language program and list out the outcomes of the Lexical Analysis, Syntactic Analysis and Semantic Analysis phases respectively. (10 Marks)

main()
{
    int a, b, c,d;
    printf(“ enter the value of a”, &a);
    printf(“ enter the value of a”, &b);
    if ( a > b)
    {
        c = a+b;
        printf ( “%d” %d” “”,c);
    }
    else
    {
        d=a+b;
        printf ( “%d” %d” “”,d);
    }
}

2. What is the limitation of conventional pass-1 pass-2 compilation? How do you overcome it? (5 Marks)

3. Identify the following notations and define them with examples: L, Σ, T, NT, and G. (10 Marks)

The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g.

Σ = {a, b, …, z, 0, 1, …, 9}

Here the symbols {, ‘,’ and } are part of the notation. We call them metasymbols to differentiate them from terminal symbols. Throughout this discussion we assume that metasymbols are distinct from the terminal symbols. If this is not the case, i.e. if a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in quotes to differentiate it from the metasymbol. For example, the set of punctuation symbols of English can be defined as {:, ;, ‘,’, -, …}, where ‘,’ denotes the terminal symbol ‘comma’.

A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed between <…>, e.g. A or < Noun >. During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, < Noun > represents a noun.




Each grammar G defines a language L(G). G contains an NT called the distinguished symbol or the start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G.

Identify the basic elements of Grammar G

Each grammar G defines a language L(G). G contains an NT called the distinguished symbol or the start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G. A valid string α of L(G) is obtained by using the following procedure:

1. Let α= ‘S’.

2. While α is not a string of terminal symbols

(a) Select an NT appearing in α, say X.

(b) Replace X by a string appearing on the RHS of a production of X.

What is a sentential form? Give an example.

4. Classify and define Grammars. Which Grammar is best suitable for Programming Languages and why? ( 5 Marks)

5. How many characters can be represented by ASCII-8 data format? What is the limitation of ASCII-7 format? (5 Marks)

The code called ASCII (pronounced "AS-key"), which stands for American Standard Code for Information Interchange, uses 7 bits for each character. Since there are exactly 128 unique combinations of 7 bits, this 7-bit code can represent only 128 characters. A more common version is ASCII-8, also called extended ASCII, which uses 8 bits per character and can represent 256 different characters. For example, the letter A is represented by 01000001. The ASCII representation has been adopted as a standard by the U.S. government and is found in a variety of computers, particularly minicomputers and microcomputers. The following table shows part of the ASCII-8 code. Note that the byte:

01000011

does represent the character 'C'.

Ascii7 is a Unicode-to-ASCII conversion module for programmers. It converts any Unicode string to 7-bit ASCII preserving information. Available as a source code module, Ascii7 is an easy way to support good Unicode-to-ASCII conversion in your own applications.

Key features:
- Convert Unicode strings to 7-bit US-ASCII.
- Drop diacritics. Remove accents and umlauts.
- Replace special symbols with pure ASCII.
- Convert Cyrillic and Greek letters to their Latin equivalents.
- Get rid of garbage conversion.

Available as source code for Visual Basic 6.0, Visual Basic .NET and Visual Basic for Applications. Other languages can be arranged on demand.

The problem. Today's applications support a large range of Unicode characters. However, compatibility often requires the use of 7-bit ASCII. Character values must be forced to the 0–127 range. What's the best way to convert Unicode text to ASCII? Programming environments, such as Visual Basic and the .NET framework, lack proper support for this conversion. Even where available, conversion loses some non-ASCII characters and converts them to question marks (?). The result is loss of information and garbage text.

The solution. This is where Ascii7 comes to help. Ascii7 converts Unicode text to its ASCII representation. Instead of turning non-ASCII characters to garbage, it provides a meaningful conversion. It does this by dropping diacritics from Latin letters and finding the closest ASCII equivalent for a wide range of characters. Where an exact match is not possible, a reasonable equivalent is used. The text stays as intelligible as possible for a human reader.

Suggested uses. Enforce ASCII filenames for generated files. Produce standards-compliant file formats. Common formats requiring 7-bit ASCII: GIF file comment field, MHT file header lines and email headers. With Ascii7 you convert national characters to an international format that is guaranteed to work everywhere.

6. Compare RISC Architecture with CISC Architecture? What was the necessity to move to RISC architecture? (5 Marks)

The simplest way to examine the advantages and disadvantages of RISC architecture is by contrasting it with its predecessor: CISC (Complex Instruction Set Computers) architecture.

Multiplying Two Numbers in Memory

Consider the storage scheme for a generic computer. The main memory is divided into locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4. The execution unit is responsible for carrying out all computations. However, the execution unit can only operate on data that has been loaded into one of the six registers (A, B, C, D, E, or F). Let's say we want to find the product of two numbers - one stored in location 2:3 and another stored in location 5:2 - and then store the product back in the location 2:3.

The CISC Approach

The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

MULT 2:3, 5:2

MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher level language. For instance, if we let "a" represent the value of 2:3 and "b" represent the value of 5:2, then this command is identical to the C statement "a = a * b."

One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

The RISC Approach

RISC processors only use simple instructions that can be executed within one clock cycle. Thus, the "MULT" command described above could be divided into three separate commands: "LOAD," which moves data from the memory bank to a register, "PROD," which finds the product of two operands located within the registers, and "STORE," which moves data from a register to the memory banks. In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:

LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A

At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form.

However, the RISC strategy also brings some very important advantages. Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle "MULT" command. These RISC "reduced instructions" require fewer transistors of hardware space than the complex instructions, leaving more room for general purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible.

Separating the "LOAD" and "STORE" instructions actually reduces the amount of work that the computer must perform. After a CISC-style "MULT" command is executed, the processor automatically erases the registers. If one of the operands needs to be used for another computation, the processor must re-load the data from the memory bank into a register. In RISC, the operand will remain in the register until another value is loaded in its place.

7. Discuss Addressing Modes of Intel 80X86 with suitable examples. (10 Marks)

ADDRESSING MODES OF 8086

Addressing mode indicates a way of locating data or operands. Depending upon the data types used in the instruction and the memory addressing modes, any instruction may belong to one or more addressing modes, or some instructions may not belong to any of the addressing modes. Thus the addressing modes describe the types of operands and the way they are accessed for executing an instruction. Here, we will present the addressing modes of the instructions depending upon their types. According to the flow of instruction execution, the instructions may be categorized as (i) sequential control flow instructions and (ii) control transfer instructions.

Sequential control flow instructions are the instructions which, after execution, transfer control to the next instruction appearing immediately after them (in the sequence) in the program. For example, the arithmetic, logical, data transfer and processor control instructions are sequential control flow instructions. The control transfer instructions, on the other hand, transfer control to some predefined address somehow specified in the instruction after their execution. For example, INT, CALL, RET and JUMP instructions fall under this category.

The addressing modes for sequential control flow instructions are explained as follows:

Comparison of CISC and RISC:

CISC                                        RISC
------------------------------------------  ------------------------------------------
Emphasis on hardware                        Emphasis on software
Includes multi-clock complex instructions   Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE"        Register-to-register: "LOAD" and "STORE"
incorporated in instructions                are independent instructions
Small code sizes, high cycles per second    Low cycles per second, large code sizes
Transistors used for storing complex        Spends more transistors on memory
instructions                                registers

1. Immediate: In this type of addressing, immediate data is a part of the instruction, and appears in the form of successive byte or bytes.
Example: MOV AX, 0005H
In the above example, 0005H is the immediate data. The immediate data may be 8-bit or 16-bit in size.

2. Direct: In the direct addressing mode, a 16-bit memory address (offset) is directly specified in the instruction as a part of it.
Example: MOV AX, [5000H]
Here, data resides in a memory location in the data segment, whose effective address may be computed using 5000H as the offset address and content of DS as segment address. The effective address, here, is 10H*DS+5000H.

3. Register: In register addressing mode, the data is stored in a register and it is referred to using the particular register. All the registers, except IP, may be used in this mode.
Example: MOV BX, AX

4. Register Indirect: Sometimes, the address of the memory location which contains data or operand is determined in an indirect way, using the offset registers. This mode of addressing is known as register indirect mode. In this addressing mode, the offset address of data is in either BX or SI or DI registers. The default segment is either DS or ES. The data is supposed to be available at the address pointed to by the content of any of the above registers in the default data segment.
Example: MOV AX, [BX]
Here, data is present in a memory location in DS whose offset address is in BX. The effective address of the data is given as 10H*DS+[BX].

5. Indexed: In this addressing mode, the offset of the operand is stored in one of the index registers. DS and ES are the default segments for index registers SI and DI respectively. This mode is a special case of the above discussed register indirect addressing mode.
Example: MOV AX, [SI]
Here, data is available at an offset address stored in SI in DS. The effective address, in this case, is computed as 10H*DS+[SI].

6. Register Relative: In this addressing mode, the data is available at an effective address formed by adding an 8-bit or 16-bit displacement to the content of any one of the registers BX, BP, SI and DI in the default (either DS or ES) segment. The example given below explains this mode.
Example: MOV AX, 50H[BX]
Here, the effective address is given as 10H*DS+50H+[BX].

7. Based Indexed: The effective address of data is formed, in this addressing mode, by adding the content of a base register (any one of BX or BP) to the content of an index register (any one of SI or DI). The default segment register may be ES or DS.
Example: MOV AX, [BX][SI]
Here, BX is the base register and SI is the index register. The effective address is computed as 10H*DS+[BX]+[SI].

8. Relative Based Indexed: The effective address is formed by adding an 8-bit or 16-bit displacement to the sum of the contents of any one of the base registers (BX or BP) and any one of the index registers, in a default segment.
Example: MOV AX, 50H[BX][SI]
Here, 50H is an immediate displacement, BX is a base register and SI is an index register. The effective address of data is computed as 10H*DS+[BX]+[SI]+50H.

For the control transfer instructions, the addressing modes depend upon whether the destination location is within the same segment or a different one. It also depends upon the method of passing the destination address to the processor. Basically, there are two addressing modes for the control transfer instructions, viz. inter-segment and intra-segment addressing modes.

If the location to which the control is to be transferred lies in a different segment other than the current one, the mode is called inter-segment mode. If the destination location lies in the same segment, the mode is called intra-segment mode.

Figure: Addressing Modes for Control Transfer Instructions. The modes divide into inter-segment (direct and indirect) and intra-segment (direct and indirect).

9. Intra-segment Direct Mode: In this mode, the address to which the control is to be transferred lies in the same segment in which the control transfer instruction lies and appears directly in the instruction as an immediate displacement value. In this addressing mode, the displacement is computed relative to the content of the instruction pointer IP.

The effective address to which the control will be transferred is given by the sum of the 8-bit or 16-bit displacement and the current content of IP. In case of a jump instruction, if the signed displacement (d) is of 8 bits (i.e. –128 ≤ d ≤ +127), we term it a short jump, and if it is of 16 bits (i.e. –32768 ≤ d ≤ +32767), it is termed a long jump.

10. Intra-segment Indirect Mode: In this mode, the displacement to which the control is to be transferred is in the same segment in which the control transfer instruction lies, but it is passed to the instruction indirectly. Here, the branch address is found as the content of a register or a memory location. This addressing mode may be used in unconditional branch instructions.

11. Inter-segment Direct Mode: In this mode, the address to which the control is to be transferred is in a different segment. This addressing mode provides a means of branching from one code segment to another code segment. Here, the CS and IP of the destination address are specified directly in the instruction.

12. Inter-segment Indirect Mode: In this mode, the address to which the control is to be transferred lies in a different segment and it is passed to the instruction indirectly, i.e. as the contents of a memory block containing four bytes, i.e. IP (LSB), IP (MSB), CS (LSB) and CS (MSB) sequentially. The starting address of the memory block may be referred to using any of the addressing modes, except immediate mode.

8086 INSTRUCTION FORMAT

The 8086 instruction sizes vary from one to six bytes.

8. List out the pass-1 data structures and pass-2 data structures. ( 5 Marks)

Two Pass Assembler

Pass 1 (define symbols)
- Assign addresses to all statements
- Save addresses assigned to all labels
- Process assembler directives (e.g., WORD, RESB, ...)

Pass 2 (assemble instructions & write output files)
- Translate operation codes
- Assemble operands
- Generate data values for BYTE & WORD directives
- Process START and END directives
- Write to object and listing files

Algorithm:

Pass 1:
BEGIN
  initialize Scnt, Locctr, ENDval, and Errorflag to 0
  WHILE Sourceline[Scnt] is a comment
  BEGIN
    increment Scnt
  END {while}
  Breakup Sourceline[Scnt]
  IF Opcode = 'START' THEN
  BEGIN
    convert Operand from hex and save in Locctr and ENDval
    IF Label not NULL THEN
      Insert (Label, Locctr) into Symtab
    ENDIF
    increment Scnt
    Breakup Sourceline[Scnt]
  END
  ENDIF
  WHILE Opcode <> 'END'
  BEGIN
    IF Sourceline[Scnt] is not a comment THEN
    BEGIN
      IF Label not NULL THEN
        Xsearch Symtab for Label
        IF not found
          Insert (Label, Locctr) into Symtab
        ELSE
          set errors flag in Errors[Scnt]
        ENDIF
      ENDIF
      Xsearch Opcodetab for Opcode
      IF found THEN
        DO CASE
          1. Opcode is 'RESW' or 'RESB'
             BEGIN
               increment Locctr by Storageincr
               IF error THEN
                 set errors flag in Errors[Scnt]
               ENDIF
             END {case 1 (RESW or RESB)}
          2. Opcode is 'WORD' or 'BYTE'
             BEGIN
               increment Locctr by Storageincr
               IF error THEN
                 set errors flag in Errors[Scnt]
               ENDIF
             END {case 2 (WORD or BYTE)}
          3. OTHERWISE
             BEGIN
               increment Locctr by Opcodeincr
               IF error THEN
                 set errors flag in Errors[Scnt]
               ENDIF
             END {case 3 (default)}
        ENDCASE
      ELSE
        /* directives such as BASE handled here or */
        set errors flag in Errors[Scnt]
      ENDIF
    END {IF block}
    ENDIF
    increment Scnt
    Breakup Sourceline[Scnt]
  END {while}
  IF Label not NULL THEN
    Xsearch Symtab for Label
    IF not found
      Insert (Label, Locctr) into Symtab
    ELSE
      set errors flag in Errors[Scnt]
    ENDIF
  ENDIF
  IF Operand not NULL
    Xsearch Symtab for Operand
    IF found
      install in ENDval
    ENDIF
  ENDIF
END {of Pass 1}

Pass 2:
BEGIN
  initialize Scnt, Locctr, Skip, and Errorflag to 0
  write assembler report headings
  WHILE Sourceline[Scnt] is a comment
  BEGIN
    append to assembler report
    increment Scnt
  END {while}
  Breakup Sourceline[Scnt]
  IF Opcode = 'START' THEN
  BEGIN
    convert Operand from hex and save in Locctr
    append to assembler report
    increment Scnt
    Breakup Sourceline[Scnt]
  END
  ENDIF
  format and place the load point on object code array
  format and place ENDval on object code array, index ENDloc
  WHILE Opcode <> 'END'
  BEGIN
    IF Sourceline[Scnt] is not a comment THEN
    BEGIN
      Xsearch Opcodetab for Opcode
      IF found THEN
        DO CASE
          1. Opcode is 'RESW' or 'RESB'
             BEGIN
               increment Locctr by Storageincr
               place '!' on object code array
               replace the value at index ENDloc with loader address
               format and place Locctr on object code array
               format and place ENDval on object code array, index ENDloc
               set Skip to 1
             END
          2. Opcode is 'WORD' or 'BYTE'
             BEGIN
               increment Locctr by Storageincr
               Dostorage to get Objline
               IF error THEN
                 set errors flag in Errors[Scnt]
               ENDIF
             END
          3. OTHERWISE
             BEGIN
               increment Locctr by Opcodeincr
               Doinstruct to get Objline
               IF error THEN
                 set errors flag in Errors[Scnt]
               ENDIF
             END
        ENDCASE
      ELSE
        /* directives such as BASE handled here or */
        set errors flag in Errors[Scnt]
      ENDIF
    END
    ENDIF
    append to assembler report
    IF Errors[Scnt] <> 0 THEN
    BEGIN
      set Errorflag to 1
      append error report to assembler report
    END
    ENDIF
    IF Errorflag = 0 and Skip = 0 THEN
    BEGIN
      place Objline on object code array
    END
    ENDIF
    IF Skip = 1 THEN
      set Skip to 0
    ENDIF
    increment Scnt
    Breakup Sourceline[Scnt]
  END {while}
  place '!' on object code array
  IF Errorflag = 0 THEN
    transfer object code array to file
  ENDIF
END {of Pass 2}

Data Structures

1) OPTAB (operation code table)
» mnemonic, machine code (instruction format, length) etc.
» static table
» instruction length
» array or hash table, easy for search
» Contents
  - Mnemonic codes of all instructions
  - Machine language op code
  - Other information (for architectures with more than one length of instruction format)
    o Length of instruction
    o Instruction formats
» In pass 1
  - Validate op codes
  - Compute instruction length (in SIC/XE)
» In pass 2
  - Translate op codes to machine language
» Organized as hash table
  - Static (no updates)

2) SYMTAB (symbol table)
» label name, value, flag, (type, length) etc.
» dynamic table (insert, delete, search)
» hash table, non-random keys, hashing function
» Contents
  - Name of symbol
  - Address (value)
  - Error flags (from pass 1)
  - Other information: attributes of data or instruction labeled
» In pass 1
  - Enter labels as they are encountered in source program
  - Enter addresses from LOCCTR
» In pass 2
  - Look up operand labels for addresses
» Also organized as a hash table
  - Entries only (no deletions)
  - Non-random keys -- watch hashing function

3) Location Counter (LOCCTR)
» Counted in bytes.
» Initialized to address specified in START directive.
» Length of each assembled instruction is added to LOCCTR.
» LOCCTR points to starting address of each statement in the program.

Intermediate Data (between Passes 1 and 2). For each statement in the source program:
- Address
- Error flags
- Pointers to OPTAB and SYMTAB
- Other (more complicated languages): results of processing of operation and operand fields

9. Define Macro. Write a C program with a macro to find out biggest of two numbers. (5 Marks)

A macro is a fragment of code which has been given a name. Whenever the name is used, it is replaced by the contents of the macro. There are two kinds of macros. They differ mostly in what they look like when they are used. Object-like macros resemble data objects when used; function-like macros resemble function calls.

You may define any valid identifier as a macro, even if it is a C keyword. The preprocessor does not know anything about keywords.

#include <stdio.h>

/* Each argument and the whole body are parenthesized so that the
   macro expands safely inside larger expressions. */
#define Greatest(X, Y) ((X) > (Y) ? (X) : (Y))

int main(void)
{
    int x, y;

    /* Read two integers and print the larger one. */
    if (scanf("%d %d", &x, &y) != 2)
        return 1;
    printf("%d\n", Greatest(x, y));
    return 0;
}

Spring 2012
Master of Computer Application (MCA) – Semester III
MC0073 – Systems Programming – 4 Credits (Book ID: B0811)
Assignment Set – 2 (60 Marks)
Answer All Questions

1. Define Bootstrapping. Distinguish between Software Bootstrapping and Compiler Bootstrapping. ( 5 Marks)

Bootstrap loading

The discussions of loading up to this point have all presumed that there’s already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as bootstrap ROM, as in "pulling one’s self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system’s address space. The bootstrap ROM occupies the top 64K of the address space and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk into memory, or if that fails the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can’t fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it’s tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.

None of this matters much to the application level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

5.2.2 Software Bootstrapping & Compiler Bootstrapping

Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language and so on, until one can have a graphical IDE and an extremely high-level programming language.

5.2.3 Compiler Bootstrapping

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and more recently the Mono C# compiler.

2. What is the function of the following Intel x86 registers? (5 Marks)

AX, DX, CX, DI, SI, SP, BP, BX

Each register name is really an acronym. This is true even for the "alphabetical" registers AX, BX, CX, and DX. The following list shows the register names and their meanings:

· AX - Accumulator Register

· BX - Base Register

· CX - Counter Register

· DX - Data Register

· SI - Source Index

· DI - Destination Index

· BP - Base Pointer

· SP - Stack Pointer

Purpose of Intelx86 registers

· AX – All major calculations take place in EAX, making it similar to a dedicated accumulator register.

· DX – The data register is an extension to the accumulator. It is most useful for storing data related to the accumulator’s current calculation.

· CX – Like the variable i in high-level languages, the count register is the universal loop counter.

· DI – Every loop must store its result somewhere, and the destination index points to that place. With a single-byte STOS instruction to write data out of the accumulator, this register makes data operations much more size-efficient.

· SI – In loops that process data, the source index holds the location of the input data stream. Like the destination index, ESI has a convenient one-byte instruction for loading data out of memory into the accumulator.

· SP – ESP is the sacred stack pointer. With the important PUSH, POP, CALL, and RET instructions requiring its value, there is never a good reason to use the stack pointer for anything else.

· BP – In functions that store parameters or variables on the stack, the base pointer holds the location of the current stack frame. In other situations, however, EBP is a free data-storage register.

· BX – In 16-bit mode, the base register was useful as a pointer. Now it is completely free for extra storage space.

3. Write the algorithm of Boot Strap Loader. (5 Marks)

4. Identify Lexemes and Tokens in the following statements: ( 5 Marks)

A5=B+9

x=(y+z)/ 10

1. A5=B+9; tokenized in the following table:

Lexeme   Token Type
A5       Identifier
=        Assignment Operator
B        Identifier
+        Addition Operator
9        Number
;        End of Statement

2. x=(y+z)/10; tokenized in the following table:

Lexeme   Token Type
x        Identifier
=        Assignment Operator
(        Opening Parenthesis
y        Identifier
+        Addition Operator
z        Identifier
)        Closing Parenthesis
/        Division Operator
10       Number
;        End of Statement

Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. The lexical analyzer (either generated automatically by a tool like lex, or hand-crafted) reads in a stream of characters, identifies the lexemes in the stream, and categorizes them into tokens. This is called "tokenizing." If the lexer finds an invalid token, it will report an error.

Following tokenizing is parsing. From there, the interpreted data may be loaded into data structures for general use, interpretation, or compiling.

5. Given the following Grammar ( 10 Marks)

E -> E+T
E -> T
T -> T*F
T -> F
F -> id

Parse the input string id+id*id by the Bottom-up Parsing or Shift-Reduce Parsing method.

Answer:

Each row shows the stack and the remaining input before the listed action is applied.

Parse Stack   Remaining Input   Parser Action
$             id+id*id$         Shift (push next token from input on stack, advance input)
$id           +id*id$           Reduce: F -> id
$F            +id*id$           Reduce: T -> F
$T            +id*id$           Reduce: E -> T
$E            +id*id$           Shift
$E+           id*id$            Shift
$E+id         *id$              Reduce: F -> id
$E+F          *id$              Reduce: T -> F
$E+T          *id$              Shift
$E+T*         id$               Shift
$E+T*id       $                 Reduce: F -> id
$E+T*F        $                 Reduce: T -> T*F
$E+T          $                 Reduce: E -> E+T
$E            $                 Accept

6. With a neat Block Diagram Explain the Phases of Compiler. ( 10 Marks)

Phases of Compilation:

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.

Program faults caused by incorrect compiler behavior can be very difficult to track down and work around, so compiler implementors invest a lot of time ensuring the correctness of their software.

7. Write short notes on: ( 10 Marks)

Compiler-Writing Tools


Purdue Compiler-Construction Tool Set (PCCTS): A highly integrated lexical analyser generator and parser generator by Terence J. Parr, Will E. Cohen and Henry G. Dietz of Purdue University. ANTLR (ANother Tool for Language Recognition) corresponds to YACC, and DLG (DFA-based Lexical analyser Generator) functions like LEX. PCCTS has many additional features which make it easier to use for a wide range of translation problems. PCCTS grammars contain specifications for lexical and syntactic analysis with selective backtracking ("infinite lookahead"), semantic predicates, intermediate-form construction and error reporting. Rules may employ Extended BNF (EBNF) grammar constructs and may define parameters, return values, and local variables. Languages described in PCCTS are recognised via LL(k) parsers constructed in pure, human-readable C code. Selective backtracking is available to handle non-LL(k) constructs. PCCTS parsers may be compiled with a C++ compiler. PCCTS also includes the SORCERER tree parser generator. Latest version: 1.10, runs under Unix, MS-DOS, OS/2, and Macintosh and is very portable.

If you are thinking of creating your own programming language, writing a compiler or interpreter, adding a scripting facility to your application, or even creating a documentation parsing facility, compiler-writing tools are designed to ease your task. Compiler construction kits, parser generators, lexical analyzer (lexer) generators and code optimizers (optimizer generators) let you define your language and then generate the source code for your software.

Static, Dynamic and Stack Memory allocations

Heap and Garbage Collection

8. Define: Finite State Automaton, Deterministic Finite state Automaton and Non-Deterministic Finite State Automaton with suitable examples. ( 6 Marks)

Deterministic finite automaton (DFA), also known as deterministic finite state

machine, is a finite state machine accepting finite strings of symbols. For each state,

there is a transition arrow leading out to a next state for each symbol. Upon reading a

symbol, a DFA jumps deterministically from a state to another by following the transition

arrow. Deterministic means that there is only one outcome (i.e. move to next state when

the symbol matches (S0 -> S1) or move back to the same state (S0 -> S0)). A DFA has

a start state (denoted graphically by an arrow coming in from nowhere) where

computations begin, and a set of accept states (denoted graphically by a double circle)

which help define when a computation is successful.

DFAs recognize exactly the set of regular languages which are, among other things,

useful for doing lexical analysis and pattern matching. A DFA can be used in either an

accepting mode to verify that an input string is indeed part of the language it represents,

or a generating mode to create a list of all the strings in the language.

A DFA is defined as an abstract mathematical concept, but due to the deterministic

nature of a DFA, it is implementable in hardware and software for solving various

specific problems. For example, a software state machine that decides whether or not

online user-input such as phone numbers and email addresses are valid can be modeled

as a DFA. Another example in hardware is the digital logic circuitry that controls whether

an automatic door is open or closed, using input from motion sensors or pressure pads

to decide whether or not to perform a state transition.

Formal definition

A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of

a finite set of states (Q)

a finite set of input symbols called the alphabet (Σ)

a transition function (δ : Q × Σ → Q)

a start state (q0 ∈ Q)

a set of accept states (F ⊆ Q)

Let w = a1a2 ... an be a string over the alphabet Σ. The automaton M accepts the

string w if a sequence of states, r0,r1, ..., rn, exists in Q with the following conditions:

1. r0 = q0


2. ri+1 = δ(ri, ai+1), for i = 0, ..., n−1

3. rn ∈ F.

In words, the first condition says that the machine starts in the start state q0. The second

condition says that given each character of string w, the machine will transition from

state to state according to the transition function δ. The last condition says that the

machine accepts w if the last input of w causes the machine to halt in one of the

accepting states. Otherwise, it is said that the automaton rejects the string. The set of

strings M accepts is the language recognized by M and this language is denoted

by L(M).

A deterministic finite automaton without accept states and without a starting state is

known as a transition system or semiautomaton.

For a more comprehensive introduction to the formal definition, see automata theory.

DFAs can be built from nondeterministic finite automata through the powerset

construction.

Figure: An example of a deterministic finite automaton that accepts only binary numbers that are

multiples of 3. The state S0 is both the start state and an accept state.
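One way to see why three states suffice for the multiples-of-3 automaton: appending a bit b to a binary number with value v gives 2v + b, so only the remainder of v modulo 3 needs to be remembered. The Python sketch below uses that observation. The function name `is_multiple_of_3` and the numbering of states by remainder (0 for S0, then 1 and 2) are assumptions for illustration, since the original diagram is not reproduced here.

```python
def is_multiple_of_3(bits):
    """Run a three-state DFA on a binary string; state i means 'value so far mod 3 == i'."""
    state = 0                              # S0: start state and the only accept state
    for ch in bits:
        if ch not in ("0", "1"):
            raise ValueError(f"symbol {ch!r} is not in the alphabet {{0, 1}}")
        state = (2 * state + int(ch)) % 3  # transition function applied to (state, ch)
    return state == 0

for n in range(8):
    word = format(n, "b")
    print(word, is_multiple_of_3(word))
```

Because S0 is both the start state and an accept state, the empty string is accepted as well.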

Example

The following example is of a DFA M, with a binary alphabet, which requires that the

input contains an even number of 0s.


The state diagram for M

M = (Q, Σ, δ, q0, F) where

Q = {S1, S2},

Σ = {0, 1},

q0 = S1,

F = {S1}, and

δ is defined by the following state transition table:

State   0    1
S1      S2   S1
S2      S1   S2

The state S1 represents that there has been an even number of 0s in the input so far,

while S2 signifies an odd number. A 1 in the input does not change the state of the

automaton. When the input ends, the state will show whether the input contained an

even number of 0s or not. If the input did contain an even number of 0s, M will finish

in state S1, an accepting state, so the input string will be accepted.

The language recognized by M is the regular language given by the regular

expression 1*( 0 (1*) 0 (1*) )*, where "*" is the Kleene star, e.g., 1* denotes any

non-negative number (possibly zero) of symbols "1".
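The automaton M can be simulated directly from its transition table. The Python sketch below encodes the table as a dictionary and, as a sanity check, compares M against the regular expression 1*(01*01*)* on every binary string of length up to 8. The names `DELTA`, `accepts` and `PATTERN` are illustrative assumptions.

```python
import re
from itertools import product

# State transition table of M: (state, input symbol) -> next state.
DELTA = {
    ("S1", "0"): "S2", ("S1", "1"): "S1",
    ("S2", "0"): "S1", ("S2", "1"): "S2",
}
START, ACCEPT = "S1", {"S1"}

def accepts(word):
    """Return True if M ends in an accept state after reading word."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPT

# The same language written as a regular expression.
PATTERN = re.compile(r"1*(01*01*)*")

# Compare the DFA and the regular expression on all short binary strings.
for length in range(9):
    for letters in product("01", repeat=length):
        w = "".join(letters)
        assert accepts(w) == (PATTERN.fullmatch(w) is not None)

print(accepts("1001"), accepts("10"))   # prints: True False
```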

Nondeterministic finite automaton (NFA) or nondeterministic finite state machine is a finite state machine where from each state and a given input symbol the automaton may jump into several possible next states. This distinguishes it from the deterministic finite automaton (DFA), where the next possible state is uniquely determined. Although the DFA and NFA have distinct definitions, an NFA can be translated to an equivalent DFA using powerset construction, i.e., the constructed DFA and the NFA recognize the same formal language. Both types of automata recognize only regular languages. Nondeterministic finite automata were introduced in 1959 by Michael O. Rabin and Dana Scott,[1] who also showed their equivalence to deterministic finite automata.
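The translation from an NFA to an equivalent DFA can be sketched in Python as below. Each DFA state is a set of NFA states, and only the sets reachable from the start state are generated. The example NFA (states q0, q1, q2, accepting binary strings that end in 01) and the names `powerset_construction` and `dfa_accepts` are hypothetical, introduced only for this sketch.

```python
from collections import deque

def powerset_construction(delta, alphabet, start, accepting):
    """Build a DFA whose states are frozensets of NFA states (reachable ones only)."""
    start_set = frozenset([start])
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of the NFA moves from every state in the current set.
            target = frozenset(t for q in current for t in delta.get((q, symbol), ()))
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_delta, start_set, dfa_accepting

def dfa_accepts(dfa, word):
    dfa_delta, state, dfa_accepting = dfa
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting

# Hypothetical NFA for "binary strings ending in 01".
NFA_DELTA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
dfa = powerset_construction(NFA_DELTA, "01", "q0", {"q2"})
print(dfa_accepts(dfa, "1101"), dfa_accepts(dfa, "10"))   # prints: True False
```

In the worst case the construction produces exponentially many DFA states, which is why this sketch explores only the reachable subsets.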

Non-deterministic finite state machines are sometimes studied by the name subshifts of

finite type. Non-deterministic finite state machines are generalized by probabilistic

automata, which assign a probability to each state transition.

Formal definition

An NFA is represented formally by a 5-tuple, (Q, Σ, Δ, q0, F), consisting of

a finite set of states Q

a finite set of input symbols Σ

a transition function Δ : Q × Σ → P(Q).

an initial (or start) state q0 ∈ Q

a set of states F distinguished as accepting (or final) states F ⊆ Q.

Here, P(Q) denotes the power set of Q. Let w = a1a2 ... an be a word over the alphabet Σ.

The automaton M accepts the word w if a sequence of states, r0,r1, ..., rn, exists in Q with

the following conditions:

1. r0 = q0

2. ri+1 ∈ Δ(ri, ai+1), for i = 0, ..., n−1

3. rn ∈ F.

In words, the first condition says that the machine starts in the start state q0. The second

condition says that given each character of string w, the machine will transition from

state to state according to the transition relation Δ. The last condition says that the

machine accepts w if the last input of w causes the machine to halt in one of the

accepting states. Otherwise, it is said that the automaton rejects the string. The set of

strings M accepts is the language recognized by M and this language is denoted

by L(M).
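The acceptance condition asks whether some sequence of states r0, ..., rn exists. A common way to check this without trying every sequence is to track the set of all states the NFA could be in after each symbol. The Python sketch below does that. The example NFA (states p0, p1, p2, accepting binary strings whose second-to-last symbol is 1) and the name `nfa_accepts` are hypothetical choices for illustration.

```python
def nfa_accepts(delta, start, accepting, word):
    """Return True if at least one run of the NFA on word ends in an accepting state."""
    current = {start}                      # every state some run could be in so far
    for symbol in word:
        current = {t for q in current for t in delta.get((q, symbol), set())}
    return bool(current & accepting)

# Hypothetical NFA: in p0 it guesses that the symbol just read is the second-to-last one.
DELTA = {
    ("p0", "0"): {"p0"},
    ("p0", "1"): {"p0", "p1"},
    ("p1", "0"): {"p2"},
    ("p1", "1"): {"p2"},
}

print(nfa_accepts(DELTA, "p0", {"p2"}, "0110"))   # prints: True
print(nfa_accepts(DELTA, "p0", {"p2"}, "0101"))   # prints: False
```

A missing table entry is treated as the empty set, so a run that has no move for the current symbol simply dies.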

For a more comprehensive introduction to the formal definition, see automata theory.

NFA-ε

The NFA-ε (also sometimes called NFA-λ or NFA with epsilon moves) replaces the

transition function with one that allows the empty string ε as a possible input, so that one

has instead

Δ : Q × (Σ ∪{ε}) → P(Q).

It can be shown that ordinary NFA and NFA-ε are equivalent, in that, given either one, one

can construct the other, which recognizes the same language.


Example

The state diagram for M

Let M be an NFA-ε, with a binary alphabet, that determines if the input contains an even

number of 0s or an even number of 1s. Note that 0 occurrences is an even number of

occurrences as well.

In formal notation, let M = ({S0, S1, S2, S3, S4}, {0, 1}, Δ, S0, {S1, S3}) where the transition

relation Δ can be defined by this state transition table:

State   0      1      ε
S0      {}     {}     {S1, S3}
S1      {S2}   {S1}   {}
S2      {S1}   {S2}   {}
S3      {S3}   {S4}   {}
S4      {S4}   {S3}   {}

M can be viewed as the union of two DFAs: one with states {S1, S2} and the other with

states {S3, S4}. The language of M can be described by the regular language given by

this regular expression (1*(01*01*)*) ∪ (0*(10*10*)*). We define M using ε-moves

but M can be defined without using ε-moves.
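The NFA-ε M from the transition table can be simulated by taking the ε-closure (all states reachable through ε-moves alone) before the first symbol and after every step. The Python sketch below does this. Representing an ε-move by the empty string "" as the dictionary key, and the names `epsilon_closure` and `accepts`, are assumptions of this sketch.

```python
# Transition table of M; the key symbol "" stands for an epsilon-move.
DELTA = {
    ("S0", ""): {"S1", "S3"},
    ("S1", "0"): {"S2"}, ("S1", "1"): {"S1"},
    ("S2", "0"): {"S1"}, ("S2", "1"): {"S2"},
    ("S3", "0"): {"S3"}, ("S3", "1"): {"S4"},
    ("S4", "0"): {"S4"}, ("S4", "1"): {"S3"},
}
START, ACCEPT = "S0", {"S1", "S3"}

def epsilon_closure(states):
    """Return all states reachable from the given states using only epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for t in DELTA.get((q, ""), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(word):
    """Return True if word has an even number of 0s or an even number of 1s."""
    current = epsilon_closure({START})
    for symbol in word:
        moved = {t for q in current for t in DELTA.get((q, symbol), set())}
        current = epsilon_closure(moved)
    return bool(current & ACCEPT)

print(accepts("011"), accepts("01"))   # prints: True False
```

After the initial ε-closure the simulation is in {S0, S1, S3}, which is how the two component DFAs run side by side on the same input.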

9. Bring out the Basic functions of Loader. ( 4 Marks)


