
Self Learning Material

System Programming(MCA-305A)

Course: Masters in Computer Applications

Semester-III

Distance Education Programme

I.K. Gujral Punjab Technical University

Jalandhar


Syllabus

I.K. Gujral Punjab Technical University
MCA-305A System Programming

Section-A

Assemblers and Macro Processors: Language processors, data structures for language processing, General Design Procedure, Single pass and two pass assemblers and their algorithms, assembly language specifications (example MASM). Macro Instructions, Features of Macro Facility: Macro instruction arguments, Conditional macro expansion, Macro calls within macros.

Section-B

Loaders and Linkers & Editors: Loader Schemes: Compile and go loader, general loader scheme, absolute loaders, subroutine linkages, relocating loaders, direct linking loaders, Relocation, Design of Absolute Loader, Bootstrap Loaders, Dynamic Linking, MS-DOS Linker, Text Editors, Line Editor, Stream Editors, Screen editor, Word processors, Structure editors.

Section-C

Compiler Design: Introduction to various translators, interpreters, debuggers, various phases of compiler, Introduction to Grammars and finite automata, Bootstrapping for compilers, Lexical Analysis and syntax analysis, Intermediate Code Generation, Code optimization techniques, Code generation, Introduction to YACC, Just-in-time compilers, Platform Independent systems.

Section-D

Operating System: Operating systems and their functions, Types of operating systems: Real-time OS, Distributed OS, Mobile OS, Network OS, Booting techniques and subroutines, I/O programming, Introduction to Device Drivers, USB and Plug and Play systems, Systems Programming (APIs).

TEXTBOOKS:
• Donovan J.J., Systems Programming, New York, McGraw-Hill, 1972.
• Leland L. Beck, System Software, San Diego State University, Pearson Education, 1997.
• Dhamdhere, D.M., System Programming and Operating Systems, Tata McGraw-Hill, 1996.

REFERENCES:
1. Aho, A.V. and J.D. Ullman, Principles of Compiler Design, Addison Wesley/Narosa, 1985.


Table of Contents

Chapter No.  Title  Written By

1  Fundamental of Assembler  Mr. Mandeep Kumar, Assistant Professor, CEM, Kapurthala
2  Design Procedure: Assemblers  Mr. Mandeep Kumar, Assistant Professor, CEM, Kapurthala
3  Assembly Language  Ms. Sajpreet Kaur, Assistant Professor, DAV University, Jalandhar
4  Macro Instructions  Ms. Sajpreet Kaur, Assistant Professor, DAV University, Jalandhar
5  Loaders  Mr. Manpreet Singh, Assistant Professor, CTIT, Shahpur, Jalandhar
6  Linkers  Mr. Manpreet Singh, Assistant Professor, CTIT, Shahpur, Jalandhar
7  Editors  Mr. Manpreet Singh, Assistant Professor, CTIT, Shahpur, Jalandhar
8  Fundamentals of Compiler Design  Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar
9  Finite automata and grammar  Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar
10  Phases of Compiler Design  Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar
11  YACC  Mr. Abhinav Hans, Assistant Professor, DAVIET, Jalandhar
12  Fundamentals of OS  Mr. Sandeep Sood, GNDU, Amritsar
13  Booting Techniques and Device Drivers  Mr. Sandeep Sood, Assistant Professor, GNDU, Amritsar
14  System Programming API  Mr. Sandeep Sood, Assistant Professor, GNDU, Amritsar

Reviewed by
Mr. Palvinder Singh Mann
Assistant Professor, DAVIET

© IK Gujral Punjab Technical University, Jalandhar
All rights reserved with IK Gujral Punjab Technical University, Jalandhar


Lesson 1 Fundamental of Assembler

Structure of the Chapter

1.0 Objective

1.1 Introduction

1.2 Assembler

1.3 Macro Processor

1.4 Language Processor

1.5 Data Structure used in Language Processor

1.6 Summary

1.7 Glossary

1.8 Answers to check your progress/self assessment questions

1.9 References/ Suggested Readings

1.10 Model Questions

1.0 Objective

After studying this chapter students will be able to:

• Explain assemblers and MACRO processors.
• Describe language processors.
• Describe the different data structures used in language processors.

1.1 Introduction

The assembler is one of the fundamental components of any processing system. In this chapter, the assembler, the MACRO processor, the language processor and the data structures used in language processors will be discussed.

1.2 Assembler

An assembler is a computer program that translates a software program written in assembly language into machine language that can be executed by a computer. An assembler can be regarded as the compiler of assembly language. Each assembly language is specific to a particular computer architecture.

Typically, assemblers make two passes over the assembly language code:


1. First Pass: read each line and record labels in a symbol table.
2. Second Pass: use the information in the symbol table to produce the actual machine code for each line.

Further, we will have a complete chapter on Assembler to discuss it in detail.
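The two passes above can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's algorithm; the toy mnemonic set, the one-word-per-statement assumption and the function names are all hypothetical:

```python
# A tiny sketch of the two passes on a toy instruction set: pass 1 records
# each label's address in a symbol table; pass 2 uses the table to resolve
# operands. Mnemonics and encodings here are illustrative only.

OPS = {"LOAD", "ADD", "JUMP", "HALT"}   # hypothetical mnemonics

def two_pass(lines, start=0):
    # Pass 1: any first token that is not a mnemonic is treated as a label.
    symtab, addr = {}, start
    for line in lines:
        first = line.split()[0]
        if first not in OPS:
            symtab[first] = addr
        addr += 1
    # Pass 2: look each operand up in the symbol table built by pass 1.
    code = []
    for line in lines:
        tokens = line.split()
        if tokens[0] in symtab:
            tokens = tokens[1:]          # drop the leading label
        operand = symtab.get(tokens[-1]) if len(tokens) > 1 else None
        code.append((tokens[0], operand))
    return code

# The forward reference to SKIP resolves because pass 1 saw the whole file:
print(two_pass(["JUMP SKIP", "HALT", "SKIP HALT"]))
# [('JUMP', 2), ('HALT', None), ('HALT', None)]
```

The example shows why the second pass exists: `JUMP SKIP` refers to a label that is only defined two lines later, yet its address is already in the table by the time code is emitted.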

Fig 1.1 Role of Assembler (Assembly Code → Assembler → Machine Code)

1.3 Macro Processor

A macro is a commonly used group of statements in the source programming language. A macro instruction (or macro) is a convenience for the programmer: it allows the programmer to write a short version of a program (modular programming). The macro processor is a program that replaces each macro instruction with the corresponding group of source language statements, although it performs no analysis of the program.

Basic Macro Processor Functions: The design of a macro processor is machine independent. Two assembler directives are used in a macro definition:

1. MACRO: shows the beginning of a macro definition.
2. MEND: shows the end of a macro definition.

Syntax of a macro (each parameter starts with ‘&’):

NAME  MACRO  &parameters
  .
  .   Body
  .
MEND

Example:

SOURCE PROGRAM:

AS   MACRO
     STA D1
     STB D2
     MEND
  .
     AS
  .
     AS
  .

AS is a macro with no arguments. The macro stores the contents of register A in D1 and the contents of register B in D2.

EXPANDED SOURCE PROGRAM:
  .
  .
     STA D1
     STB D2
  .
     STA D1
     STB D2

Further we will have a complete chapter on MACRO PROCESSOR to discuss it in detail.
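The replacement performed by the macro processor can be sketched as follows, assuming the simple MACRO/MEND layout above and macros without arguments (Python and the function name are used only for illustration):

```python
# A sketch of macro expansion: definitions between MACRO and MEND are
# stored in a table, and each later call is replaced by the stored body.

def expand_macros(lines):
    macros, out, i = {}, [], 0
    while i < len(lines):
        tokens = lines[i].split()
        if len(tokens) >= 2 and tokens[1] == "MACRO":
            name, body = tokens[0], []
            i += 1
            while lines[i].strip() != "MEND":   # collect the definition body
                body.append(lines[i])
                i += 1
            macros[name] = body                 # definition stored, not emitted
        elif tokens and tokens[0] in macros:
            out.extend(macros[tokens[0]])       # macro call: splice in the body
        else:
            out.append(lines[i])
        i += 1
    return out

print(expand_macros(["AS MACRO", "STA D1", "STB D2", "MEND", "AS", "AS"]))
# ['STA D1', 'STB D2', 'STA D1', 'STB D2']
```

This reproduces the AS example above: the definition itself produces no output, and each of the two calls is replaced by the two-statement body.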

Check Your Progress/ Self assessment Questions

Q1 What is the need of Assembler ?

………………………………………………………………………………………………………

…………………………………………………………………………………………………….

……………………………………………………………………………………………………..

Q2. ________ generation computers use assembly language:

a. First generation
b. Third generation
c. Second generation
d. Fourth generation

Q3. An assembler works to convert an assembly language program into machine language:

a. Before the computer can execute it
b. After the computer can execute it
c. In between execution
d. All of these

Q4. An assembly language program is called:

a. Object program
b. Source program
c. Oriented program
d. All of these

1.4 Language Processor

Language processing activities come into play due to the difference between the manner in which a software designer describes the ideas concerning the behaviour of a software system and the manner in which those ideas are implemented in a computer system. For example, if we write code in the C++ language, then we need something that converts the code into machine readable form, and that is done by an intermediary called the C++ compiler, i.e. a language processor.

Types of language processors are:

1. Language Translator: this type of language processor converts a higher level language to a machine level language. Examples of this type of processor are assemblers and compilers. Some examples of translators are:
(a) English to French interpreter: this program translates English to French.
(b) Java to C translator: this program translates Java source code to C source code.
In both cases the source program itself remains unchanged; the translator produces an equivalent program in the target language.

2. Detranslator: a type of language processor that takes object code at a low level and regenerates the source code at a higher level. An example of this is a disassembler.

3. Preprocessor: it performs a simple text substitution before translation takes place. An example of this type of processor is the macro, which we studied in the previous topic. Languages like C and C++ have a preprocessor that processes directives in the source before compilation.
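The "simple text substitution" a preprocessor performs can be sketched as below. The function name and the #define-style table are hypothetical, and a real preprocessor works on tokens, so a plain string replace like this can clobber substrings; it is shown only to make the idea concrete:

```python
# A sketch of preprocessor-style text substitution: every occurrence of a
# defined name is replaced by its value before "translation" would begin.

def preprocess(text, definitions):
    for name, value in definitions.items():
        text = text.replace(name, value)
    return text

src = "area = PI * r * r; buf[MAX];"
print(preprocess(src, {"PI": "3.14159", "MAX": "256"}))
# area = 3.14159 * r * r; buf[256];
```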


1.5 Data Structures for Data Processing by the Assembler

Data Structure: Firstly, we should know what a data structure is before starting this topic. A data structure is a way of organizing data so that it can be used efficiently.

The various type of data structures used by assembler are:

1. Symbol Table (SYMTAB)
2. Literal Table (LITTAB)
3. Mnemonics Table or Machine Operation Table (MOT)
4. Pseudo-Opcode Table (POT) or Operation Code Table (OPTAB)
5. Location Counter
6. Pool Table (POOLTAB)

1. Symbol Table (SYMTAB): The symbol table contains all the symbols used in the program, and also stores the location and the value of each symbol, if any. Symbols stored in the SYMTAB include variables, labels etc. It also contains flags which indicate error messages.

During Pass 1: labels and variables are entered into the symbol table along with their assigned address values as they are encountered. All symbol addresses and values should be resolved by the end of Pass 1.

During Pass 2: symbols used as operands are looked up in the symbol table to get the address values to be inserted in the assembled instructions.

SYMTAB

SYMBOL VALUE LOCATION LENGTH

2. Literal Table (LITTAB): The literal table stores all the information about the literals used in the assembly program, including the name and the location assigned to each literal. The LITTAB is created by the analysis phase and used by the synthesis phase to generate machine code.

3. Mnemonic Table: The mnemonic table is also known as the machine operation table (MOT). Its contents remain the same during the lifetime of the assembler; no one can make changes to it, so it remains fixed. When we use a mnemonic code, the assembler substitutes its opcode value during conversion to machine language. Examples of mnemonics are ADD, SUB, MUL etc., and in the MOT each mnemonic's opcode value and length are defined, which helps the assembler use it easily and effectively.


4. Pseudo-Opcode Table (POT) or Operation Code Table (OPTAB): It is also a fixed table and remains the same during the lifetime of the assembler, like the mnemonic table. The OPTAB contains 3 fields: mnemonic opcode, class and mnemonic information.

The mnemonic opcode is the same as in the mnemonic table. The class field indicates whether the opcode relates to an imperative statement (IS), a declarative statement (DS) or an assembler directive (AD).

5. Location Counter (LC): The location counter is a variable which keeps track of the address of the present instruction and is used to initialize the address of the program. For example,

START 100

Now LC has the value 100, which means the starting address of the program is 100.

6. Pool Table (POOLTAB): When multiple LTORG statements are used in a program, the assembler creates a different literal pool for each LTORG statement. The POOLTAB contains information regarding these literal pools; the number of pools is kept in the n_pool variable.

Check Your Progress/ Self assessment Questions

Q5 What is the use of Literal Table?

………………………………………………………………………………………………………

…………………………………………………………………………………………………….

……………………………………………………………………………………………………..

Q6 Mnemonic refers to:

a. Instructions
b. Code
c. Symbolic codes
d. Assembler

Q7 A _______ processor controls repetitious writing of sequences:

a. Macro
b. Micro
c. Nano
d. All of these


1.6 Summary

• An assembler is a computer program that translates a program written in assembly language into machine language that can be executed by a computer.
• A macro is a commonly used group of statements in the source programming language.
• The macro processor is a program that replaces each macro instruction with the corresponding group of source language statements. It performs no analysis of the program.
• A language translator converts a higher level language to a machine level language.
• A detranslator is a type of language processor that takes object code at a low level and regenerates the source code at a higher level.
• A preprocessor performs a simple text substitution before translation takes place.

1.7 Glossary

The important terms discussed in this chapter are:

• Assembler: it converts code written in assembly language into machine language.
• Assembly language: a low level programming language which is machine dependent.
• MACRO: a macro is a commonly used group of statements in the source programming language.
• MACRO Processor: the macro processor is a program that replaces each macro instruction with the corresponding group of source language statements. It performs no analysis of the program.
• Language Processor: a language processor acts like a bridge between two different languages and makes them communicate.
• Data Structure: a way of arranging and storing data so that it can be retrieved in an efficient way.

1.8 Answers to Self Assessment Questions

1. It is a translator that converts the code written in assembly language into machine language.

2. c) Second generation

3. a) Before the computer can execute it

4. b) Source program

5. The literal table stores all the information about the literals used in the assembly program, including the name and the location assigned to each literal. The LITTAB is created by the analysis phase and used by the synthesis phase to generate machine code.

6. c) Symbolic codes

7. a) Macro


1.9 References

1. System Software, Charanjeet Singh, Kalyani publishers.

2. System Programming, Donovan, 2nd Edition, Tata McGraw-Hill Education.

3. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011.

1.10 Model Questions

Q1. How many passes are there in an assembler? Elaborate.

Q2. What is a MACRO PROCESSOR and what are its functions?

Q3. What are the different data structures used in language processors?

Q4. What is a language processor and what are its types?


Lesson 2 Design Procedure: Assemblers

2.0 Objective

2.1 Introduction

2.2 Assemblers

2.3 Assembly Language

2.4 Format of Assembly Language

2.5 Constants and literals

2.6 Assembly Scheme

2.6.1 Phases of Assembler

2.7 One Pass Assembler

2.8 Two Pass Assembler

2.9 Summary

2.10 Glossary

2.11 Answers to check your progress/self assessment questions

2.12 References/ Suggested Readings

2.13 Model Questions

2.0 Objective

After studying this chapter, students will be able to:

• Write assembly language statements
• Learn how an assembler works
• Learn how to design their own assembler

2.1 Introduction

In this chapter the assembler, assembly language, writing instructions in assembly language, how an assembler works and how to design one's own assembler will be discussed. Some algorithms that we can use to create assemblers will also be discussed.

2.2 Assembler

Let us first consider an example to understand the term assembler. Suppose there are two persons, one who can only speak English and another who can only speak Spanish, and they want to communicate with each other. The problem is that neither can understand what the other is saying. To solve this problem they involve a third person who understands English as well as Spanish, so that he can translate between their languages. The same is the case with our computer and programming languages. Our computer can only understand binary language, which is too hard for us to understand, whereas we can understand a language like English very well but the computer cannot. So we need a translator, and here the role of the assembler comes into play. The assembler takes our code written in assembly language (the source program) and converts it into binary language so that the computer can understand it. (The reverse task, recovering assembly language from binary code, is performed by a disassembler.)

Fig 2.1 Working of Assembler

2.2.1 Tasks performed by the Assembler:

• Translate mnemonic codes into binary language.
• Notify errors, if present, in the source code.
• Produce information for linkers and loaders.
• Assign machine addresses to all the labels used in the program.

2.3 Assembly Language

It is a low level programming language which is machine dependent. It consists of a number of mnemonics that represent the operations to be performed, e.g. ADD, a mnemonic representing that the programmer wants to perform addition. A machine cannot understand assembly language directly; an assembler is required to convert assembly language into binary language.


2.4 Format of Assembly Language

An assembly language statement generally consists of 4 components:

[Label] <opcode> <operand specification> [<operand specification> …]

Description of the various components:

1. [...] : Optional field
2. Label : Name given to a memory location
3. Opcode : Mnemonic code
4. Operand specification : Operand type, like register, memory etc.

Note: The first operand of an assembly language statement is always a register, like AREG, BREG, CREG etc.

The second operand of an assembly language statement can be a register, a memory location, an immediate value etc.

2.4.1 Various mnemonic codes, their meaning, opcode value and example:

Mnemonic   Opcode Value   Meaning                           Example
STOP       00             Stops the execution               STOP
ADD        01             Addition                          ADD AREG ONE
SUB        02             Subtraction                       SUB AREG ONE
MULT       03             Multiplication                    MULT AREG ONE
MOVER      04             Move memory content to register   MOVER AREG ONE
MOVEM      05             Move register content to memory   MOVEM AREG ONE
COMP       06             Comparison instruction            COMP AREG ONE
BC         07             Branch on condition               BC LT 2000
DIV        08             Division                          DIV AREG ONE
READ       09             To read data                      READ ONE
PRINT      10             To print data                     PRINT ONE
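Splitting a statement into the [Label] <opcode> <operand specification> fields of Section 2.4 can be sketched as below; the mnemonic set is the illustrative one from the table above, and the function name is hypothetical:

```python
# A sketch of field separation for the statement format of Section 2.4.
# A first token that is not a known mnemonic is taken to be the label.

MNEMONICS = {"STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM",
             "COMP", "BC", "DIV", "READ", "PRINT"}

def parse_statement(line):
    tokens = line.split()
    label = None
    if tokens[0] not in MNEMONICS:       # the optional leading label
        label, tokens = tokens[0], tokens[1:]
    return label, tokens[0], tokens[1:]  # (label, opcode, operands)

print(parse_statement("LOOP MOVER AREG ONE"))   # ('LOOP', 'MOVER', ['AREG', 'ONE'])
```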

2.4.2 Condition Code Table

Code Mnemonics Meaning


1 LT Less Than

2 LE Less than Equal to

3 EQ Equal To

4 GT Greater Than

5 GE Greater than or Equal to

6 ANY Unconditional control transfer

2.4.3 Machine Instruction Format

As we know, an assembly language statement has three parts: opcode, first operand and second operand. Similarly, a machine language statement also has three parts. In machine language, the opcode occupies 2 digits, the register occupies 1 digit and the memory address occupies 3 digits.
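The 2-1-3 digit layout just described can be made concrete with a one-line formatter (an illustrative sketch; the widths follow the description above):

```python
# Encode an instruction in the 2-1-3 layout: 2-digit opcode,
# 1-digit register code, 3-digit memory address.

def encode(opcode, reg, addr):
    return f"{opcode:02d} {reg:1d} {addr:03d}"

# MOVER (opcode 04), AREG (register 1), operand at word 213:
print(encode(4, 1, 213))   # 04 1 213
```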

2.4.4 Type of Assembly Language Statements

Assembly language statements are of three types:

Imperative Statements:
1. Imperative statements are those statements that specify to the processor what operation is to be performed.
2. Examples of imperative statements are ADD, SUB, MOVER etc.

Declarative Statements:
1. These statements are used either to declare storage or to declare a constant.
2. There are only two declarative statements in assembly language, i.e. DS and DC.

DS: DS stands for Declare Storage. As the name describes, this statement is used to reserve a block of memory. In actual fact this statement just puts a label on the particular memory location that you provide. For example,

A DS 10

This statement will reserve a block of 10 memory words with the label A. On accessing A directly you will access the first word of the reserved 10 words. The other words can be accessed by using offsets; for example, to access the 6th word you need to write A+5.

DC: DC stands for Declare Constant. This statement reserves a word in the memory and assigns a constant value to it. To use the stored constant value in our program, we provide a label with it.


TWO DC ‘2’

You can specify constants in decimal, hexadecimal or any other form. The assembler will convert them into binary form and then put them in memory.

Assembler Directives:
As the name describes, assembler directive statements are used to direct the assembler to perform some particular tasks during translation of the source code into the destination code. No memory space is reserved for assembler directives. Some of the assembler directives are START, END etc.

Check Your Progress/ Self assessment Questions

Q1 Discuss the format of Assembly Language.

………………………………………………………………………………………………

………………………………………………………………………………………………..

Q2 Why are Declarative Statements used?

………………………………………………………………………………………………

………………………………………………………………………………………………..

Q3 Assembler directive statements are used to ..............................................................

2.5 Constants and Literals

As we have previously discussed, DC means ‘Declare Constant’. In the actual implementation, however, we do not declare a constant; we actually give a label to a memory location and put the value in that particular memory space. Its value can be changed as per the user’s requirement. To understand this we can take the example of variables in the C language, where we can declare a variable in this way:

int a = 5;

Its value can change with the user’s requirements.

Assembly language can take constants in two ways:

1. Immediate operands
2. Literals

An example of an immediate operand is

ADD AREG, 5

A literal is an operand with the syntax:

=’<value>’

A literal is different from a constant in two ways:

1. Its value can’t be changed during program execution.


2. As the value of a literal can’t be changed, it is more secure than a constant.

A literal is identified by the prefix ‘=’.

2.5.1 Difference between literal and constant

Literal:
1. A literal is an operand specified with the ‘=’ sign.
2. Whenever a literal is found in the program, the assembler first allocates a memory space, adds a label to that memory space, and then puts the value in it.
3. The value of a literal can’t be changed during the execution of the program.
4. A literal is the most secure thing in assembly language.

Constant:
1. A constant is an immediate operand.
2. No such arrangement is made for a constant in assembly language.
3. The value of a constant can be changed during the execution of the program.
4. A constant is not secure, as its value can be changed at any time during program execution.

2.6 Assembly Scheme

The design procedure of an assembler includes some steps that one needs to follow in order to design a new assembler. These steps are:

1. Specify the problem.
2. Specify the data structures to be used.
3. Define the format of the data structures.
4. Specify the algorithm that will obtain and maintain the information.

2.6.1 Phases of Assembler

There are two phases in which an assembler works:

1. Analysis Phase
2. Synthesis Phase

1. Analysis Phase


The main tasks of the analysis phase are to create the symbol table and the literal table. The symbol table stores all the symbols, together with their addresses, and the literal table stores all the literals present in the program or code. The analysis phase also performs the memory allocation task. For memory allocation it uses a data structure called the Location Counter (LC). The location counter always contains the address of the next memory word in the program; initially it contains the value specified by the START statement. Whenever a new label is encountered, it is stored in the symbol table. In order to update the contents of the LC, the analysis phase needs to know the size of each statement; to get the size of each statement, it consults the mnemonic table. This process of maintaining the location counter is known as LC processing.

Tasks performed by the analysis phase are:
1. Separate the label, mnemonic and operands.
2. If a label is present in the statement, put it into the symbol table.
3. Check the validity of the mnemonic through the mnemonic table.
4. Perform LC processing.

2. Synthesis Phase

The main task of this phase is to generate the equivalent machine code for the given assembly code. In generating machine code it uses data structures like SYMTAB (symbol table), LITTAB (literal table) etc. This phase obtains the machine address of each symbol recorded by the analysis phase; the address of each literal used in the program is obtained from the LITTAB, and the opcodes for mnemonics are obtained from the mnemonic table.

Tasks performed by the synthesis phase are:
1. Obtain the addresses of symbols and literals, and the opcodes of mnemonics, from their respective data structure tables.
2. Generate machine code.

2.7 One Pass Assembler

It is also known as a single pass assembler, as it scans the input file only once. It is much faster than a two pass assembler, as a two pass assembler scans the file two times. It creates all the data structures like SYMTAB, LITTAB etc. and also performs LC processing. There exists a problem in this assembler, namely forward referencing: as the program is scanned only once, there exist some symbols which are used earlier in the program and defined later. The problem this causes for the symbol table is that the entry for a symbol cannot be completed until its address is known.

The solution for forward referencing is back patching. In the process of back patching, an additional data structure is required, the TII (Table of Incomplete Instructions). The address fields of instructions whose symbols are not yet known are left blank, and an entry for each is recorded in the TII. Later, when the symbol is defined in the program, its address becomes known, and after the complete scan of the program the instructions recorded in the TII are patched with the addresses now available in the SYMTAB.
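Back patching with a TII can be sketched as follows. This is an illustrative Python rendering, not the book's algorithm; a trailing colon marks a label here purely to keep the sketch short:

```python
# A sketch of a one-pass assembler with back patching: forward references
# are recorded in a Table of Incomplete Instructions (TII) and patched
# once the symbol's address becomes known.

def one_pass(lines, start=100):
    symtab, tii, code, lc = {}, [], [], start
    for line in lines:
        tokens = line.split()
        if tokens[0].endswith(":"):            # label definition: address known now
            symtab[tokens[0][:-1]] = lc
            tokens = tokens[1:]
        op = tokens[0]
        target = tokens[-1] if len(tokens) > 1 else None
        if target and target not in symtab:    # forward reference: leave a hole
            tii.append((len(code), target))
            code.append([op, None])
        else:
            code.append([op, symtab.get(target)])
        lc += 1
    for index, symbol in tii:                  # back patching: fill the holes
        code[index][1] = symtab[symbol]
    return code

print(one_pass(["BC DONE", "ADD X", "DONE: STOP", "X: DC"]))
# [['BC', 102], ['ADD', 103], ['STOP', None], ['DC', None]]
```

Both `DONE` and `X` are used before they are defined; the TII remembers which instruction slots are incomplete, and one patching sweep at the end fills them in.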


2.8 Two Pass Assembler

As the name describes, a two pass assembler uses two phases to scan the whole code written in assembly language. These two phases are generally referred to as Pass 1 and Pass 2. A detailed description of the working of Pass 1 and Pass 2 is given below.

2.8.1 Pass 1

The main task of Pass 1 of the assembler is to assign a memory location to each statement or instruction of the program. It is responsible for the generation of the SYMTAB, LITTAB and IC (Intermediate Code). In this phase, the value of LC is initially set to the starting address (or 0 if none is given), and then its value is incremented step by step as the instructions are scanned. If a symbol is discovered during the scan, it is stored in the SYMTAB data structure; if a literal is discovered, its information is stored in the LITTAB data structure. The various mnemonics are converted into their relevant opcodes by using the OPTAB data structure. When the END statement is discovered, the first phase of the assembler is complete, and the intermediate code of the program is produced as output, which in turn acts as the input for Pass 2 of the assembler.

2.8.2 Algorithm for Pass 1:

begin
    if starting address is given
        LOCCTR = starting address;
    else
        LOCCTR = 0;
    while OPCODE != END do           ;; or EOF
    begin
        read a line from the code
        if there is a label
            if this label is in SYMTAB, then error
            else insert (label, LOCCTR) into SYMTAB
        search OPTAB for the op code
        if found
            LOCCTR += N              ;; N is the length of this instruction (4 for MIPS)
        else if this is an assembly directive
            update LOCCTR as directed
        else error
        write line to intermediate file
    end
    program size = LOCCTR - starting address;
end
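The Pass 1 algorithm above can be rendered as runnable Python under the same simplifications: every instruction is one word long, and only the START, END, DS and DC directives are handled. All names here are illustrative:

```python
# A sketch of Pass 1: build SYMTAB, emit (address, tokens) intermediate
# code, and compute the program size, as in the pseudocode above.

OPTAB = {"MOVER": 1, "ADD": 1, "MOVEM": 1, "STOP": 1}   # length in words

def pass_one(lines):
    symtab, errors, intermediate = {}, [], []
    locctr = start = 0
    for line in lines:
        tokens = line.split()
        if tokens[0] == "START":
            locctr = start = int(tokens[1])    # starting address is given
            continue
        if tokens[0] == "END":
            break
        if tokens[0] not in OPTAB and tokens[0] not in ("DS", "DC"):
            label, tokens = tokens[0], tokens[1:]
            if label in symtab:
                errors.append("duplicate label " + label)
            else:
                symtab[label] = locctr         # insert (label, LOCCTR) into SYMTAB
        intermediate.append((locctr, tokens))  # write line to intermediate file
        if tokens[0] in OPTAB:
            locctr += OPTAB[tokens[0]]
        elif tokens[0] == "DS":
            locctr += int(tokens[1])           # directive: reserve n words
        elif tokens[0] == "DC":
            locctr += 1
        else:
            errors.append("unknown opcode " + tokens[0])
    return symtab, intermediate, locctr - start, errors

prog = ["START 200", "MOVER AREG ONE", "STOP", "ONE DC 1", "BUF DS 5", "END"]
print(pass_one(prog))
```

Running it on `prog` places ONE at 202 and BUF at 203, and reports a program size of 8 words (3 instructions + 1 constant + 5 reserved, with STOP counted as one word).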


2.8.3 Flowchart for Pass 1:

Fig 2.2 Flowchart for Pass 1


2.8.4 Intermediate code:

Intermediate code is a processed form of the source code written in assembly language, generated by Pass 1 of a two pass assembler. It is submitted to Pass 2 of the assembler as input. The intermediate code generated consists of blocks that contain 3 parts:

1. Address
2. Mnemonic opcode
3. Operands

The mnemonic opcode field further contains two things:

1. Statement class, like imperative statement or assembler directive
2. Opcode

2.8.5 Pass 2:

The Pass 2 phase of the assembler is responsible for generating the machine equivalent code for the assembly language code. For the code generation purpose, it uses all the data structures and the intermediate code generated by the Pass 1 phase of the assembler. Initially the location counter is again set as it was at the start of Pass 1. Then Pass 2 starts reading the code blocks of the intermediate code one by one. If an assembler directive is encountered, the value of the location counter is set according to the memory address written in that particular statement; if an imperative statement is encountered, the value of the location counter is updated accordingly. If the operand of a statement is a symbol, the symbol is searched for in the SYMTAB data structure; if the operand is a literal, it is searched for in the LITTAB data structure.

2.8.6 Algorithm for Pass 2:

begin
    read a line;
    if op code = START then
        write header record;
    while op code != END do          ;; or EOF
    begin
        search OPTAB for the op code;
        if found
            if the operand is a symbol then
                replace it with an address using SYMTAB;
            assemble the object code;
        else if it is a defined directive
            convert it to object code;
        add object code to the text;
        read next line;
    end
    write End record to the text;
    output text;
end
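A matching sketch of Pass 2, consuming the (address, tokens) intermediate form that a Pass 1 like the one in Section 2.8.2 might produce; the opcode and register numbers reuse the illustrative table from Section 2.4.1, and all names are assumptions of this sketch:

```python
# A sketch of Pass 2: walk the (address, tokens) intermediate code and
# resolve symbolic operands through SYMTAB to emit object code.

OPCODES = {"MOVER": "04", "ADD": "01", "MOVEM": "05", "STOP": "00"}
REGS = {"AREG": "1", "BREG": "2", "CREG": "3"}

def pass_two(intermediate, symtab):
    object_code = []
    for loc, tokens in intermediate:
        op = tokens[0]
        if op == "DC":                          # declared constant: emit its value
            object_code.append((loc, f"{int(tokens[1]):06d}"))
        elif op == "DS":                        # reserved storage: nothing to emit
            continue
        elif op in OPCODES:
            reg = REGS.get(tokens[1], "0") if len(tokens) > 1 else "0"
            addr = symtab.get(tokens[-1], 0) if len(tokens) > 1 else 0
            object_code.append((loc, f"{OPCODES[op]} {reg} {addr:03d}"))
    return object_code

inter = [(200, ["MOVER", "AREG", "ONE"]), (201, ["STOP"]), (202, ["DC", "5"])]
print(pass_two(inter, {"ONE": 202}))
# [(200, '04 1 202'), (201, '00 0 000'), (202, '000005')]
```

Each imperative statement becomes one word in the 2-1-3 layout of Section 2.4.3, with the symbolic operand replaced by its SYMTAB address.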

2.8.7 Flowchart for Pass 2:


Fig 2.3 Flowchart for Pass 2

Check Your Progress/ Self assessment Questions


Q4. Differentiate between literal and constant.

Q5. In the first pass, the assembler reads the program to collect the symbols defined with offsets in a table called the _______:

a. Hash table
b. Symbol table
c. Both a & b
d. None of these

Q6. In the second pass, the assembler creates _______ in binary format for every instruction in the program and then refers to the symbol table, giving every symbol an ______ relative to the segment.

a. Code and program
b. Program and instruction
c. Code and offset
d. All of these

Q7 The different phases of the assembler are ……………….. and …………………….

2.9 Summary The assembler converts the code written in assembly language into machine language It is a low level programming language which is machine dependent. It consist of number

of mnemonics that represent what operation is to be perform. An assembly language statement generally consists of 4 components.

[Label] <opcode><operand specification> [<operand specification>…..]

There are two phases in which an assembler works:1) Analysis Phase: The main tasks or jobs of Analysis phase are to create symbol

table and literal table. Symbol table is one which stores all the symbols andLiteral table is one that stores all the literals present in the program or code.

2) Synthesis Phase: The main task of this phase is to generate equivalent machinecode for given assembly code. In generating machine code it uses some datastructures like SYMTAB (Symbol Table), LITTAB (Literal Table) etc

Single pass assembler scans the input file only once. It is much faster than two passassembler as two pass assembler scans the file two times. It creates all the data structureslike SYMTAB, LITTAB etc.

Two pass assembler uses two phases to scan the whole code written in the assemblylanguage. These two phases are generally referred as Pass 1 and Pass 2.

Page 26: Self Learning Material System Programming

22

1) Pass 1: its main task is to assign a memory location to each statement or instruction of the program. It is responsible for generating SYMTAB, LITTAB and IC (Intermediate Code).

2) Pass 2: this phase is responsible for generating the machine-equivalent code for the assembly language program.
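The two passes can be sketched in executable form. The following Python sketch is illustrative only: the statement format and the opcode table are simplified assumptions modelled on this chapter's hypothetical machine, not any real assembler.

```python
# Illustrative two-pass assembler sketch. The opcode table mirrors the
# hypothetical machine used in this chapter; it is not a real instruction set.
OPTAB = {"READ": "09", "MOVER": "04", "MOVEM": "05", "ADD": "01",
         "PRINT": "10", "STOP": "00", "DS": "00", "DC": "00"}

def pass1(statements, start=200):
    """Pass 1: assign one location per statement and collect labels in SYMTAB."""
    symtab, lc = {}, start
    for label, _opcode, _operand in statements:
        if label:
            symtab[label] = lc        # record the symbol with its location
        lc += 1                       # one memory block per statement
    return symtab

def pass2(statements, symtab, start=200):
    """Pass 2: look each symbol up in SYMTAB and emit (location, opcode, address)."""
    code, lc = [], start
    for _label, opcode, operand in statements:
        addr = symtab.get(operand, 0)     # 0 when the operand is absent or a literal
        code.append((lc, OPTAB[opcode], addr))
        lc += 1
    return code

program = [
    (None, "READ",  "N"),   # 200
    (None, "PRINT", "N"),   # 201
    (None, "STOP",  None),  # 202
    ("N",  "DS",    "1"),   # 203
]
symtab = pass1(program)
machine_code = pass2(program, symtab)
# N is defined at 203, so both forward references to N resolve to 203 in Pass 2.
```

Note how the forward reference to N in the first statement can only be resolved in Pass 2, after Pass 1 has finished building SYMTAB; this is exactly why a second pass is needed.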

2.10 Glossary

- Assembler: it converts code written in assembly language into machine language.
- Assembly language: a low-level programming language which is machine dependent.
- Imperative statements: statements that tell the processor what operation is to be performed.
- Declarative statements: statements used either to declare storage or to declare a constant.
- Assembler directives: statements used to direct the assembler to perform particular tasks during translation of the source code into destination code.

2.11 ANSWERS TO SELF ASSESSMENT QUESTIONS

1. An assembly language statement generally consists of 4 components:

[Label] <opcode><operand specification> [<operand specification>…..]

2. These statements are used either to declare storage or to declare a constant. There are only two declarative statements in assembly language, i.e. DS and DC. DS stands for Declare Storage, whereas DC stands for Declare Constant.

3. They direct the assembler to perform some particular tasks during translation of the source code into destination code.

4. See topic 2.5.1

5. b) Symbol table

6. Code and Offset

7. Synthesis and Analysis

2.12 REFERENCES

1. System Software, Charanjeet Singh, Kalyani publishers

2. System Programming, Donovan, 2nd Edition, Tata McGraw-Hill Education.


3. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011

2.13 MODEL QUESTIONS

Q1. What is an assembler and how does it work?

Q2. Elaborate the statement format in assembly language. Explain the use of assembly language.

Q3. Explain the role of the Pass 1 phase of an assembler.

Q4. Explain the role of the Pass 2 phase of an assembler.

Q5. Compare single-pass and two-pass assemblers.


Chapter 3 Assembly Language

3.0 Objective

3.1 Introduction

3.2 Assembly language and assembler

3.3 Assembly language program to its mnemonic equivalent code

3.4 MASM

3.5 Using MASM in Visual C++ 2010

3.6 Programs in assembly language

3.7 Summary

3.8 Glossary

3.9 Answers to self-assessment questions

3.10 References

3.11 Model Questions

3.0 Objective

After studying this chapter you will be able to:

- Write your own programs in assembly language
- Understand prewritten assembly language programs
- Have knowledge of the latest assemblers

3.1 Introduction

In this chapter, we will study how to write programs in assembly language, how these programs are decoded by assemblers and converted into machine language, and how the results are converted back into assembly language. In the end, we will learn about some recent assemblers such as MASM.

3.2 Assembly language and Assembler

As we have already discussed assembly language and the assembler in previous chapters, we shall now move further and talk about a few more things. Before moving on, let us take a look at the working of the assembler and how the conversion is done from assembly language to machine language and vice versa. Consider the diagram below, which elaborates the working of the assembler on assembly language.


Fig. 3.1 Working of Assembler

As we can see in the figure, the assembler converts assembly language into machine language and vice versa using the database it has, and always presents the result in assembly language so that the user can understand it. In the previous chapter we discussed concepts such as the assembly language statement format and the mnemonic codes for the various assembly language statements; now we will see how a program is converted into mnemonic codes and then into binary code.

3.3 Assembly language program to its mnemonic equivalent code

Before we start learning how a program written in assembly language is converted to its equivalent mnemonic code, we need to learn a few things. The first statement of every program is the START statement, which is followed by a memory location. That memory location defines the first statement of the program, i.e. the location from where the program begins. After that, each statement gets one block of memory. For ease of understanding, we always use 200 as the address of the first statement.

To understand the conversion, we will take a factorial calculator program in assembly language and find its mnemonic equivalent.

Assembly language program                 Mnemonic code program
Label    Instruction   Operands           Memory     Opcode   Register   Memory
                                          Location
         START         200
         READ          N                  200        09       0          212
         MOVER         BREG, ONE          201        04       2          233
         MOVEM         BREG, TERM         202        05       2          234
AGAIN    MULT          BREG, TERM         203        03       2          234
         MOVER         CREG, TERM         204        04       3          234
         ADD           CREG, ONE          205        01       3          233
         MOVEM         CREG, TERM         206        05       3          234
         COMP          CREG, N            207        06       3          212
         BC            LE, AGAIN          208        07       2          203
         MOVEM         BREG, RESULT       209        05       2          213
         PRINT         RESULT             210        10       0          213
         STOP                             211        00       0          000
N        DS            1                  212
RESULT   DS            20                 213
ONE      DC            '1'                233        00       0          001
TERM     DS            1                  234
         END

Here, as the example shows, instructions and operands are converted according to their relevant mnemonic opcodes, and the memory location is incremented by one block per statement, so this is not a difficult part to understand. The mnemonic code values were all given in the previous chapter; all conversions are done on that basis.
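The conversion of a single statement can be mimicked in a few lines. The sketch below is a simplification for illustration: the opcode table and the register codes (AREG = 1, BREG = 2, CREG = 3) are assumptions modelled on this chapter's hypothetical machine.

```python
# Hypothetical opcode and register code tables for this chapter's machine.
OPTAB = {"READ": "09", "MOVER": "04", "MOVEM": "05", "MULT": "03",
         "ADD": "01", "COMP": "06", "BC": "07", "PRINT": "10", "STOP": "00"}
REGTAB = {"AREG": 1, "BREG": 2, "CREG": 3}

def encode(location, opcode, operands, symtab):
    """Translate one statement into (location, opcode, register, memory address)."""
    reg, mem = 0, 0
    if operands:
        reg = REGTAB.get(operands[0], 0)     # register operand, if any (else 0)
        mem = symtab.get(operands[-1], 0)    # memory operand resolved via SYMTAB
    return (location, OPTAB[opcode], reg, mem)

# Symbol locations taken from the factorial table above.
symtab = {"N": 212, "RESULT": 213, "ONE": 233, "TERM": 234, "AGAIN": 203}

line = encode(203, "MULT", ["BREG", "TERM"], symtab)   # AGAIN MULT BREG, TERM
# Produces the table row (203, "03", 2, 234).
```

Running `encode` over each row of the table reproduces the mnemonic-code column shown above.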

Check Your Progress/ Self-assessment Questions

Q1. What is the need of assembly language?

Ans. ……………………………………………………………………………………………

…………………………………………………………………………………………………

……….…………………………………………………………………………………………

Q2. Convert this assembly language code into its mnemonic equivalent code.

START 200


READ FN

READ SN

MOVER AREG, FN

MOVER BREG, SN

ADD AREG, BREG

MOVEM RS, AREG

FN DC '0'

SN DC '0'

RS DC '0'

Ans.

Assembly language program                 Mnemonic code program
Label    Instruction   Operand            Memory     Opcode   Register   Memory
                                          Location

3.4 MASM

MASM is an acronym for Microsoft Macro Assembler, whose first version was released in 1981. This assembler was specifically designed for MS-DOS and Microsoft Windows. It was first designed for the 16-bit architecture; nowadays it supports two architectures, 32-bit and 64-bit. The last version of MASM that was sold separately was 6.12; after that, Microsoft included MASM with its C compiler, Visual C++, which also supports assembly language. As many versions of assembly language are available in the market (assembly language is machine dependent), we should learn which particular version of assembly language is used by MASM. Early versions of MASM produced object modules in OMF, which were used to generate the binary equivalent of a given assembly language program. From the time Microsoft packaged MASM into its C compiler, it started using the Portable Executable format.

3.5 Using MASM in Visual C++ 2010

Follow these steps to start using visual C++ 2010 as an assembler.

1.) In the given templates for project types, click on 'Other Project Types', select 'Visual Studio Solutions' and choose 'Blank Solution'.

2.) In the 'Other Languages' option, choose 'Visual C++' and select 'Empty Project' under the 'General' option.


3.) Right click on the project in the Solution Explorer and select 'Build Customizations'.

4.) Select 'MASM' and click on the 'OK' button.

5.) Then give the name of the file; its extension will be .asm. Then click on the OK button.

6.) Now, if you want, you can set additional property values such as the start address. If you do not specify these values, the assembler will use default values.

3.6 Programs in assembly language

Now we will write some programs in assembly language to make clear how various programs are built. First we will look at two simple programs, then we will work through some questions.

Write a program to add two numbers in assembly language.


START 200

READ FN

READ SN

MOVER AREG, FN

MOVER BREG, SN

ADD AREG, BREG

MOVEM RS, AREG

FN DC ‘5’

SN DC ‘4’

RS DC ‘0’

This is a simple program that adds the two numbers stored in the memory locations labelled FN and SN; the final result is stored in the memory location labelled RS.

Write a program to print a number

START 200

READ N

PRINT N

N DC ‘5’

This program will print the constant value that is stored at the memory location labelled N.

Check Your Progress/ Self-assessment Questions

Q3. What is MASM? Why do we need it?

Ans. ……………………………………………………………………………………………

…………………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………


……….…………………………………………………………………………………………

Q4. Write a program to multiply three numbers.

Ans. ……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

……….…………………………………………………………………………………………

3.7 Summary

In this chapter, we have studied assembly language and the assembler. We learnt to convert an assembly language program into its mnemonic code, from which the assembler converts our program into binary. Then we learnt about MASM and how to use it through Visual C++ 2010. Finally, we saw a few programs to understand how to write programs in assembly language.

3.8 Glossary

Assembly language: a low-level programming language that is machine dependent and requires an assembler for its conversion into binary and vice versa.

Assembler: a program that converts assembly language to machine language and vice versa.

MASM: it stands for Microsoft Macro Assembler, which comes with Visual C++ and allows the programmer to write and execute programs in assembly language.

3.9 Answers to self-assessment questions


Q1. Before assembly language, programmers used to write code in binary, which was very difficult to understand and debug, so assembly language was designed. It looks like the English language, so it is easy to understand. The only disadvantage of assembly language is that it requires an additional program to convert it into binary, as the processor cannot directly understand assembly language.

Q2.

Assembly language program                 Mnemonic code program
Label    Instruction   Operand            Memory     Opcode   Register   Memory
                                          Location
         START         200
         READ          FN                 201        09       0          207
         READ          SN                 202        09       0          208
         MOVER         AREG, FN           203        04       1          207
         MOVER         BREG, SN           204        04       2          208
         ADD           AREG, BREG         205        01       2
         MOVEM         RS, AREG           206        05       2          209
FN       DC            '0'                207        00       0          207
SN       DC            '0'                208        00       0          208
RS       DC            '0'                209        00       0          209
         END

Q3. MASM is an acronym for Microsoft Macro Assembler, whose first version was released in 1981. This assembler was specifically designed for MS-DOS and Microsoft Windows. It was first designed for the 16-bit architecture; nowadays it supports two architectures, 32-bit and 64-bit. We need MASM when we want to write a program in assembly language: our basic requirement is an assembler, and as nowadays no assembler is sold separately in the market, Microsoft provides the assembler combined with Visual C++ 2010. We can use it to write and execute programs in assembly language.

Q4.

START 200

READ FN


READ SN

READ TN

MOVER AREG, FN

MOVER BREG, SN

MULT AREG, BREG

MOVER BREG, TN

MULT AREG, BREG

MOVEM AREG, RS

FN DC '0'

SN DC ‘0’

TN DC ‘0’

RS DC ‘0’

END

3.10 REFERENCES

1. System Software, Charanjeet Singh, Kalyani publishers

2. System Programming, Donovan, 2nd Edition, Tata McGraw-Hill Education.

3. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011

3.11 MODEL QUESTIONS

1. Write a program to calculate the sum of five odd numbers in assembly language.

2. Write a program to calculate the multiplication of three integers in assembly language.


Chapter 4 Macro Instructions

Structure of the lesson

4.0 Objective

4.1 Introduction

4.2 Macros

4.3Macro Expansion

4.4 Features of Macro Facility

4.4.1 Macro instruction arguments

4.4.2 Conditional macro expansions

4.4.3 Macro calls within macros

4.5 Summary

4.6 Glossary

4.7 Answers to check your progress/self assessment questions

4.8 References/ Suggested Readings

4.9 Model Questions

4.0 Objective

After studying this lesson, student will be able to:

- Explain macros.
- Describe the features of macros.
- Implement arguments of macros.
- Discuss expansion of conditional macros.
- Use macro calls within macros.

4.1 Introduction

In this chapter the concepts of macros, macro instructions and macro expansion will be elaborated. A macro is an abbreviation used to define a sequence of operations. When a program needs to perform the same set of instructions again and again, macros come into the picture. We define a macro for that repeated set of instructions and use it in the program instead of repeating the same set of instructions. Also, when the chances of changing a particular set of operations are high, it is easier to make changes in the defined macro than in the whole program. The changes made in the macro will be


automatically reflected in the whole program. This chapter also describes macro instruction arguments and macro calls within macros.

4.2 Macro

While writing an assembly language program, a programmer often has to repeat blocks of code that perform a particular task. Thus the programmer defines a single instruction to represent a block of code, known as a macro. Once defined, the macro can be used in place of those repeated instructions. There are many definitions of macros. Some of them are given below:

1. A macro is a name or abbreviation for a part of a program or for a sequence of operations.
2. A set of code for a particular operation can be defined as a macro. Defining any subroutine like this is called a macro definition. This is actually a text-replacement capability.
3. A macro represents a commonly used group of statements in the source program.

A macro consists of a name, a set of arguments and a block of code. It is just like a function used to perform a particular task, and it can be used anywhere in the program. Calling a macro is called a macro call; a macro call is also known as a macro invocation.

There are a few differences between a macro call and a procedure call.

1. The macro call is made during the assembly process, whereas the procedure call is made during program execution.

2. In a macro call the body is put into the object program, but in a procedure call, control is transferred to the procedure.

3. There is no need for any return statement in the case of macros, but a procedure is expected to return something.

4. For every macro call, the body of the macro comes into the object program. In a procedure call, the body of the procedure appears only once in the object program.

Prototype for the macro

Each parameter begins with '&'. The following structure shows how to define a macro in a program.

Name MACRO & (parameters list)

:

Body

:

MEND

Name: is the name given to the macro.


MACRO: identify the beginning of a macro definition.

&Parameters list: defines the parameters that can be passed during a macro call.

Body: the set of statements that will be generated as the expansion of the macro.

MEND: identify the end of a macro definition.

The keywords MACRO and MEND are macro directives. Macros can have parameters, as subroutines do; this expands the scope of the macro to various other situations. The parameters can be formal and actual parameters, as in procedures. The formal parameters appear in the macro definition and the actual parameters appear in the macro calls.

Let us consider an example, which shows the use of a pseudo-op named Define Constant (DC).

A 1, DATA    Add contents of DATA to register 1
A 2, DATA    Add contents of DATA to register 2
A 3, DATA    Add contents of DATA to register 3

::

A 1, DATA    Add contents of DATA to register 1
A 2, DATA    Add contents of DATA to register 2
A 3, DATA    Add contents of DATA to register 3

:
:
DATA DC F '2'    // 2 is the value of DATA, defined by the pseudo-op DC
:
:

In this program, the following sequence occurs twice.

A 1, DATA    Add contents of DATA to register 1
A 2, DATA    Add contents of DATA to register 2
A 3, DATA    Add contents of DATA to register 3

So in this case a macro can be used to perform this operation. Let us define a macro named ADD. Following the prototype of the macro, we can define it as follows:

MACRO
ADD                                                  (name of the macro)
A 1, DATA    Add contents of DATA to register 1      (body of
A 2, DATA    Add contents of DATA to register 2       the macro)
A 3, DATA    Add contents of DATA to register 3


MEND

Check Your Progress/ Self assessment Questions

Q1. What is the need of macros?

………………………………………………………………………………………………………

…………………………………………………………………………………………………….

……………………………………………………………………………………………………..

Q2. What is the difference between a macro call and a procedure call?

………………………………………………………………………………………………………

…………………………………………………………………………………………………….

……………………………………………………………………………………………………..

Q3. A macro definition is also called a macro call:

TRUE

FALSE

4.3 MACRO EXPANSION

Macro expansion is done automatically by the interpreter or compiler by replacing the macro name with the pattern described in the macro. In compiled languages, macro expansion always happens at compile time. The tool that performs macro expansion is known as a macro expander. Once a macro is defined, the name of the macro can be used rather than using the entire instruction set again and again, so you need not write the same code repeatedly, and the overhead associated with macros is very small. On encountering the macro ADD defined above, the compiler will replace it with the body of the macro (which defines the set of operations associated with that macro). You can see the source and the corresponding expanded source in the following code:

The macro processor replaces each macro call with the following lines:

A 1, DATA

A 2, DATA

A 3, DATA


The process of such a replacement is known as expanding the macro. The macro definition itself does not appear in the expanded source code, because the macro processor saves the definition of the macro. An occurrence of the macro name in the source program is a macro call. When the macro is called in the program, the sequence of instructions corresponding to the macro name replaces it in the expanded source.

Source            | Expanded source
MACRO             |
ADD               |
A 1, DATA         |
A 2, DATA         |
A 3, DATA         |
MEND              |
.                 |
ADD               | A 1, DATA
.                 | A 2, DATA
.                 | A 3, DATA
.                 |
ADD               |
.                 | A 1, DATA
.                 | A 2, DATA
.                 | A 3, DATA
:                 | .
DATA DC F '2'     | DATA DC F '2'
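The replacement step described above can be simulated directly. The following sketch is a toy macro expander for parameterless macros; the MACRO/MEND handling is deliberately simplified and is not a model of any real macro processor.

```python
def expand(source_lines):
    """Strip MACRO...MEND definitions into a table, then replace each call
    with the saved body (the definition itself never reaches the output)."""
    macros, output = {}, []
    i = 0
    while i < len(source_lines):
        line = source_lines[i].strip()
        if line == "MACRO":                      # start of a definition
            name = source_lines[i + 1].strip()   # next line is the macro name
            body, i = [], i + 2
            while source_lines[i].strip() != "MEND":
                body.append(source_lines[i].strip())
                i += 1
            macros[name] = body                  # save the definition
        elif line in macros:                     # a macro call
            output.extend(macros[line])          # replace the call with the body
        else:
            output.append(line)                  # ordinary statement: copy through
        i += 1
    return output

source = ["MACRO", "ADD", "A 1, DATA", "A 2, DATA", "A 3, DATA", "MEND",
          "ADD", "DATA DC F '2'"]
expanded = expand(source)
# expanded: ["A 1, DATA", "A 2, DATA", "A 3, DATA", "DATA DC F '2'"]
```

Note that the MACRO...MEND block disappears from the output, exactly as the figure shows: only the expansions and the ordinary statements survive.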

4.4 Features of Macro Facility

- Macro instruction arguments
- Conditional macro expansion
- Macro calls within macros

4.4.1 Macro Instruction Arguments

Macro calls are used to replace a repeated set of instructions in a program, but this facility is not as flexible as needed. Whenever a macro call is made, the code that replaces it remains the same; it cannot be changed easily. One way to change it is by using arguments, or parameters, in the macro calls. Consider the following program.

::

Page 43: Self Learning Material System Programming

39

:
A 1, DATA1
A 2, DATA1     Block 1
A 3, DATA1

:::

A 1, DATA2
A 2, DATA2     Block 2
A 3, DATA2

:::

DATA1 DC F '5'
DATA2 DC F '10'

In this example, Block 1 and Block 2 perform the same operation but on different data values: the first sequence operates on the operand DATA1, the second on DATA2. We can use macro instruction arguments to handle such situations. The two different operands, DATA1 and DATA2, can be passed as actual arguments in two separate calls to the same macro, so that the same operation is performed on each. This is shown in the following example. The program below contains a dummy argument, also known as a macro instruction argument. The program above can be rewritten as follows:

Source               | Expanded source
MACRO                |
ADD &PAR             |   (&PAR is the dummy argument)
A 1, &PAR            |
A 2, &PAR            |
A 3, &PAR            |
MEND                 |
.                    |
ADD DATA1            | A 1, DATA1   (the operation will be
.                    | A 2, DATA1    performed on DATA1)
.                    | A 3, DATA1
.                    |
ADD DATA2            | A 1, DATA2   (the operation will be
.                    | A 2, DATA2    performed on DATA2)
.                    | A 3, DATA2
:                    | .
DATA1 DC F '2'       | DATA1 DC F '2'
DATA2 DC F '7'       | DATA2 DC F '7'

In the above program, a dummy argument is specified by inserting an ampersand (&) before it. Any number of arguments can be passed to a macro, depending on the needs of the programmer. The important thing to understand about macro instruction arguments is that each supplied argument must correspond to a dummy argument on the macro name line of the macro definition. The supplied arguments are substituted for the respective dummy arguments whenever a macro call is processed.
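Argument substitution is a pure text replacement: each dummy (&-prefixed) argument in the saved body is replaced by the corresponding actual argument from the call. A minimal sketch, assuming a macro body already stored together with its dummy-argument list:

```python
def expand_call(dummy_args, body, actual_args):
    """Substitute each actual argument for its dummy (&-prefixed) argument."""
    lines = []
    for line in body:
        for dummy, actual in zip(dummy_args, actual_args):
            line = line.replace(dummy, actual)   # plain text replacement
        lines.append(line)
    return lines

# Saved definition of: ADD &PAR
body = ["A 1, &PAR", "A 2, &PAR", "A 3, &PAR"]

call1 = expand_call(["&PAR"], body, ["DATA1"])   # expansion of: ADD DATA1
call2 = expand_call(["&PAR"], body, ["DATA2"])   # expansion of: ADD DATA2
```

The two calls yield the two expanded blocks shown in the figure above, one operating on DATA1 and one on DATA2.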

4.4.2 Conditional Macro Expansion

The concept of conditional statements is very common in programming. Programs are sequential in nature, but in some situations we may need to change the flow of execution based on a condition; this is implemented with the help of conditional statements. Conditional macro expansion serves the same purpose. There are two important macro processor pseudo-operations, namely AIF and AGO.

a) AIF is a conditional branch pseudo-operation: a condition is tested, and if it evaluates to true, expansion branches to the specified label.

b) AGO is an unconditional branch pseudo-operation that behaves like a goto statement: it transfers control to the macro instruction containing the label specified after it. These statements are directives to the macro processor and do not appear in the macro expansion.

The concept of conditional macro expansion can easily be explained with the help of the example given below.

BLOCK1  A 1, DATA1
        A 2, DATA2
        A 3, DATA3

BLOCK2  A 1, DATA3
        A 2, DATA2

BLOCK3  A 1, DATA1
        A 2, DATA3
        A 3, DATA2
        A 4, DATA4

DATA1 DC F'2'
DATA2 DC F'7'
DATA3 DC F'11'
DATA4 DC F'15'

In the above code, the number of instructions, the data operands and the labels are different for each block. This program could instead be written as follows:

MACRO
&PAR0   LOOP &NUMBER, &PAR1, &PAR2, &PAR3, &PAR4
&PAR0   A 1, &PAR1
        A 2, &PAR2
        AIF (&NUMBER EQ 2) .FINISH      Conditional pseudo-ops that test the
        A 3, &PAR3                      value of NUMBER and, when the test
        AIF (&NUMBER EQ 3) .FINISH      is true, transfer control to the
        A 4, &PAR4                      end of the macro
.FINISH MEND

Source                                      | Expanded source
BLOCK1  LOOP 3, DATA1, DATA2, DATA3         | BLOCK1  A 1, DATA1
                                            |         A 2, DATA2
                                            |         A 3, DATA3
BLOCK3  LOOP 4, DATA1, DATA3, DATA2, DATA4  | BLOCK3  A 1, DATA1
                                            |         A 2, DATA3
                                            |         A 3, DATA2
                                            |         A 4, DATA4

It can be seen in the above example that if the value of NUMBER is, say, 2, only two data parameters are used; the value is checked with the help of the AIF statements. The same sequence of parameters is used in the macro expansion as was passed in the macro call. A label starting with a period (.), like .FINISH, is a macro label; the branch transfers control to the statement where .FINISH is written. Thus AIF and AGO control the sequence in which the macro processor expands the instructions in a macro expansion.
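The effect of AIF can be mimicked by cutting the body short while it is being copied. The sketch below is a toy model of just the LOOP macro above; the function name and the encoding of the body are invented for this illustration.

```python
def expand_loop(number, pars):
    """Toy model of the LOOP macro above. The two `if` tests stand in for
    AIF (&NUMBER EQ 2) .FINISH and AIF (&NUMBER EQ 3) .FINISH: when the
    test is true, expansion jumps to .FINISH (i.e. emits nothing further)."""
    out = []
    out.append(f"A 1, {pars[0]}")
    out.append(f"A 2, {pars[1]}")
    if number != 2:                  # AIF (&NUMBER EQ 2) .FINISH
        out.append(f"A 3, {pars[2]}")
        if number != 3:              # AIF (&NUMBER EQ 3) .FINISH
            out.append(f"A 4, {pars[3]}")
    return out

block1 = expand_loop(3, ["DATA1", "DATA2", "DATA3"])
block3 = expand_loop(4, ["DATA1", "DATA3", "DATA2", "DATA4"])
```

The two results match the BLOCK1 and BLOCK3 expansions shown above: the same macro body yields expansions of different lengths depending on &NUMBER.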

4.4.3 Macro calls within Macros

One macro can be used within the definition of another macro; thus one macro can be called from another. Macro calls made within other macros are expanded at several levels. The conditional macro operations AGO and AIF can also be used to design the macro according to the programmer's needs.


MACRO
MUL   &PAR
      L  1, &PAR        Load the passed argument into register 1
      M  1, =F'4'       Multiply the contents of register 1 by the constant 4
      ST 1, &PAR        Store the value of register 1 in the passed variable
MEND

MACRO                   (a new macro calling the already defined macro MUL)
MUL1  &PAR1, &PAR2
      MUL &PAR1
      MUL &PAR2
MEND

In the above example, macro MUL1 calls MUL with different parameters. This macro expansion takes place at several levels, as explained below:

Source              Expanded source (level 1)   Expanded source (level 2)
MUL1 DATA1, DATA2   MUL DATA1                   L  1, DATA1
                                                M  1, =F'4'
                                                ST 1, DATA1
                    MUL DATA2                   L  1, DATA2
                                                M  1, =F'4'
                                                ST 1, DATA2
DATA1 DC F'5'       DATA1 DC F'5'
DATA2 DC F'3'       DATA2 DC F'3'

At level 1, MUL1 is expanded, which in turn makes calls to the two MUL macros. These two MUL macros are expanded at the next level with the respective parameters. Macro calls, when used together with conditional macros, provide ample scope to the programmer and increase the reusability and flexibility of the code.
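Nested calls fall out naturally if the expander re-scans its own output: whenever an expanded line is itself a macro call, it is expanded again at the next level. A toy sketch, with the MUL and MUL1 definitions written out by hand in an assumed table format:

```python
# Saved macro table: name -> (dummy arguments, body). MUL1's body calls MUL,
# so its expansion needs a second level.
MACROS = {
    "MUL":  (["&PAR"], ["L 1, &PAR", "M 1, =F'4'", "ST 1, &PAR"]),
    "MUL1": (["&PAR1", "&PAR2"], ["MUL &PAR1", "MUL &PAR2"]),
}

def expand_line(line):
    """Expand one line; recurse while the result still contains macro calls."""
    name, _, rest = line.partition(" ")
    if name not in MACROS:
        return [line]                       # ordinary statement: copy through
    dummies, body = MACROS[name]
    actuals = [a.strip() for a in rest.split(",")]
    out = []
    for body_line in body:
        for d, a in zip(dummies, actuals):
            body_line = body_line.replace(d, a)
        out.extend(expand_line(body_line))  # next expansion level
    return out

level2 = expand_line("MUL1 DATA1, DATA2")
```

The recursion reproduces the two-level expansion shown above: level 1 produces the MUL calls, and level 2 replaces each with its L/M/ST body.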


Check Your Progress/ Self assessment Questions

Q4. What is macro expander?

………………………………………………………………………………………………………

…………………………………………………………………………………………………….

……………………………………………………………………………………………………..

Q5. What is a macro label? Give its example.

………………………………………………………………………………………………………

…………………………………………………………………………………………………….

……………………………………………………………………………………………………..

Q6.True or False:

a) Macros cannot be nested:................

b) Macro calling involves name of the macro and the arguments to be passed :............

c) AGO is a conditional branch statement. :...................

d) MACRO and MEND are macro directives......................

Q7. The macro processor must:

a) Recognize macro definition and macro call

b) Save the macro definitions

c) Expand macro calls and substitute arguments

d) All of these

4.5 Summary

- A macro is a name or abbreviation for a part of a program or for a sequence of operations.

- A macro consists of a name, a set of arguments and a block of code.


- The macro can be used anywhere in the program. Calling a macro is called a macro call.
- On encountering a defined macro, the compiler replaces it with the body of the macro (the set of operations associated with it). The process of such a replacement is known as expanding the macro.
- The flexibility of macros can be enhanced by passing any number of arguments to them; these are known as macro instruction arguments.
- As elsewhere in programming, the flow of execution of instructions can be changed based on conditions as required. This is implemented with the help of conditional statements.
- There are two important macro processor conditional pseudo-operations, namely AIF and AGO.
- One macro can be used within the definition of another macro. This is called a macro call within a macro.
- Macro calls made within other macros usually expand at several levels.

4.6 Glossary

Macro: a name or abbreviation for a part of a program or for a sequence of operations.

Macro call: the calling of a macro; also called a macro invocation.

Macro expansion: performed automatically by the interpreter or compiler by replacing the macro name with the pattern described in the macro.

Conditional macro expansion: used to change the flow of instructions in a program from sequential to conditional.

Macro label: a label starting with a period (.) that transfers control to where the label is written.

4.7 Answers to Self Assessment questions

1. While writing an assembly language program, a programmer has to repeat blocks of code that perform a particular task. Thus the programmer defines a single instruction to represent a block of code, known as a macro. Once defined, the macro can be used in place of those repeated instructions.

2. The macro call is made during the assembly process, whereas the procedure call is made during program execution. In a macro call the body is put into the object program, but in a procedure call, control is transferred to the procedure. There is no need for any return statement in the case of macros, but a procedure is expected to return something. For every


macro call, the body of the macro comes into the object program; in a procedure call, the body of the procedure appears only once in the object program.

3. FALSE

4. Macro expansion is done automatically by the interpreter or compiler by replacing the pattern described in the macro. In compiled languages, macro expansion always happens at compile time. The tool that performs macro expansion is known as a macro expander.

5. A label starting with a period (.), like .END, is a macro label; it transfers control to the statement where .END is written in the program.

6. a) FALSE

b) TRUE

c) FALSE

d) TRUE

7 d) all of these

4.8 References

1. System Programming , Donovan, 2nd Edition, Tata McGraw-Hill Education.

2. Systems Programming, D M Dhamdhere, Tata McGraw-Hill Education, 2011

4.9 Model Questions

1. What do you mean by macros? Why are they needed in any programming language?
2. Discuss the prototype of a macro.
3. What is the need of conditional expansion of macros?
4. What do you mean by AGO and AIF?
5. How does macro expansion take place in a program?
6. How can macro calls within macros be implemented? Discuss in detail.


Lesson 5: Loaders

5.0 Objectives

5.1 Introduction

5.2 Functions of loader

5.3 Compile and go loader

5.4 General Loader Scheme

5.5 Absolute Loaders

5.6 Summary

5.7 Glossary

5.8 Answers to check your progress

5.9 References/Suggested Readings

5.10 Model Questions

5.0 Objectives

After studying this lesson, the student will be able to:

- List the functions of loaders.
- Discuss the concept of the compile-and-go loader.
- Explain the general loader scheme.
- Define the concept of absolute loaders.

5.1 Introduction

A loader is a system program, part of the operating system, that performs the loading function: it allocates memory, brings the object program into memory and starts its execution.

The period during which the user program runs is called execution time, and the translation period is called assembly (or compile) time.


Figure 5.1 Role of loader

Source program – assembly language.

Object program – produced by the assembler; contains translated instructions and data values from the source program.

Executable code – produced by the linker.

Loader – loads the executable code into the specified memory locations, after which the code gets executed.

5.2 Functions of loaders

The loader performs following functions:

Allocation – the loader examines and allocates the memory space required for the execution of the program.

Linking – it combines two or more different object modules and supplies the information needed to link them.

Relocation – the loader maps and relocates address references to correspond to the newly allocated memory space.

Loading – the loader brings the object program into memory.
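Relocation in particular can be sketched concretely: if the object program was assembled assuming it starts at one address but is loaded at another, the loader adds the difference to every address-sensitive word. A minimal illustration; the (offset, word, relocatable) record format here is invented for this sketch, not a real object-file format.

```python
def relocate_and_load(object_code, assembled_origin, load_origin):
    """Adjust each relocatable word by (load_origin - assembled_origin)
    and place the words into a dict standing in for memory."""
    delta = load_origin - assembled_origin
    memory = {}
    for offset, word, relocatable in object_code:
        if relocatable:
            word += delta              # relocation: patch the address field
        memory[load_origin + offset] = word
    return memory

# (offset, word, needs_relocation): the word 205 is an address, 42 is plain data.
obj = [(0, 205, True), (1, 42, False)]
memory = relocate_and_load(obj, assembled_origin=200, load_origin=500)
# The address word 205 becomes 505; the data word 42 is unchanged.
```

This shows why the object program must mark which words are addresses: data words must never be adjusted, while address words must be.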

5.3 Compile and go loader

It is a loading scheme in which the assembler itself places the assembled instructions directly into the designated memory locations for execution.

The instructions are read line by line, and each is assigned its starting address as the assembly process completes.

An example is the WATFOR FORTRAN compiler. This loading scheme is also called an assemble-and-go or load-and-go system. In this type of loader, assembling (or compiling), linking, and loading happen in one step, so no extra procedures are required.


Figure 5.2 Compile-and-go loader

Advantages

1. It is easy to implement.
2. It is a simple and efficient solution that does not involve extra procedures.

Disadvantages

1. A portion of memory is wasted because the core is occupied by the assembler.
2. It is necessary to retranslate (reassemble) the user's program every time it is run.
3. It is very difficult to handle multiple modules.
4. To execute an assembly program, it has to be assembled again and again.
5. All the code of the program has to be in the same language.

Self assessment questions 1

1. List various functions of loader.

________________________________________________________________________

________________________________________________________________________

2. A _________________ is a loading scheme in which the assembler itself places the assembled instructions directly into the designated memory locations for execution.

3. It is easy to handle multiple modules in a compile-and-go loader. ( TRUE / FALSE )

______________________________________________________________________

5.4 General Loader Scheme


In the general loader scheme, the assembler produces the translated form of the source code. This output, containing the coded form of the instructions, is called the object program.

These object programs are not placed directly into the core. Instead, the instructions and data are saved elsewhere and can be loaded into the core whenever the code is to be executed.

The source program is translated into an object program by the assembler. The object program is then loaded into main memory along with the loader. Since the size of the loader is smaller than that of the assembler, more space is available for the object program.

Using the object program as intermediate data requires the addition of a new program to the system, called the loader. The loader accepts the object program and places it into the core in an executable form.

Figure 5.3 General loader

Advantages
1. The program need not be reassembled in order to run it at a later stage.
2. The loader is assumed to be smaller than the assembler, so more memory is available to the user.
3. The assembler does not reside in memory at all times, so the core is not wasted.

Disadvantages

1. The code of the loader has to be stored in memory.

5.5 Absolute Loaders


The object code is loaded at specified locations in memory, after which the loader jumps to the specified address to begin execution of the loaded program.

The loader reads the file and places the code at the absolute addresses given in the file, so no relocation information needs to be stored as part of the object file. Resolution of external references and linking of interdependent modules is done by the programmer, who is assumed to understand memory management.

Multiple segments are allowed in this scheme.

In the absolute loader scheme, the four functions are performed as follows:
o Allocation – by the programmer
o Linking – by the programmer
o Relocation – by the assembler
o Loading – by the loader

For this, the assembler must give the following information through the object files:
o The starting address and name of each module.
o The length of each module.

Figure 5.4 Absolute loader

It requires two types of cards:

1. Text card
2. Transfer card

Text card: It contains information about what is to be loaded.

Card type: It indicates the type of the card:

0 for Text card.

1 for Transfer card.

Count: It indicates the amount of information to be loaded.

Address: It indicates the location at which the information is to be loaded.

Content: It holds the binary information to be loaded.


Transfer card: It is used to indicate where execution of the loaded program should begin.

Card type is 1.

Count is always 0.

Address: It indicates the location from where execution of the object program should begin.

Content: It is always kept blank.

Self assessment questions 2

4. ___________ card contains information about what is to be loaded.

5. Which card specifies the location at which the program is to be loaded?
______________________________________________________________________

6. The general loader scheme does not require reassembling the program in order to run it at a later stage. ( TRUE / FALSE )

______________________________________________________________________

Algorithm for an absolute loader

begin
    read Header record
    verify program name and length
    read first Text record
    while record type is not 'E' do
        begin
            {if object code is in character form, convert it into internal representation}
            move object code to the specified location in memory
            read next object program record
        end
    jump to address specified in End record
end
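The algorithm above can be sketched in Python. The record layout (one dict per card with `type`, `address`, `count`, and `content` fields, mirroring the text/transfer cards described earlier) is a simplification invented for illustration, not a real object-file format:

```python
def absolute_load(records, memory):
    """Load text cards into memory and return the entry point.

    Each record is a dict (illustrative layout, not a real format):
    card type 0 is a text card carrying `count` bytes of `content`
    for absolute location `address`; card type 1 is the transfer
    card whose `address` is where execution should begin.
    """
    for rec in records:
        if rec["type"] == 0:                       # text card: copy bytes
            addr = rec["address"]
            memory[addr:addr + rec["count"]] = rec["content"]
        elif rec["type"] == 1:                     # transfer card: stop loading
            return rec["address"]                  # address to jump to
    raise ValueError("object program has no transfer card")

memory = bytearray(64)                             # toy 64-byte memory
records = [
    {"type": 0, "address": 8, "count": 3, "content": b"\x01\x02\x03"},
    {"type": 1, "address": 8, "count": 0, "content": b""},
]
entry = absolute_load(records, memory)
print(entry, memory[8:11])   # 8 bytearray(b'\x01\x02\x03')
```

Note how the transfer card carries no content: it only tells the loader where to jump, exactly as in the card descriptions above.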

5.6 Summary


A loader is a system program that brings an executable file stored on disk into memory and starts its execution. It reads the executable file to determine its size and then creates a new address space for the program. After that it copies instructions, arguments, and data into the address space, initializes the machine registers, and jumps to a start routine. A number of schemes are available to implement the concept of a loader.

5.7 Glossary

Loader – System program that loads a program into memory.
Operating system – System program responsible for the overall resource management of the system.
Assembler – It is used to perform the conversion of assembly code into machine code.

5.8 Answers to check your progress

1.

Allocation – The loader examines and allocates the memory space required for the execution of the program.

Linking – It combines two or more different object modules and supplies the information needed to link them.

Relocation – The loader maps and relocates the address references to correspond to the newly allocated memory space during execution.

Loading – The loader brings the object program into memory.

2. Compile-and-go loader

3. FALSE.

4. Text.

5. Transfer card.

6. TRUE.

5.9 References/Suggested Readings

1. Systems Programming by John J. Donovan, Tata McGraw-Hill

2. Systems Programming by D. M. Dhamdhere, Tata McGraw-Hill Education

3. Systems Programming by Charanjeet Singh, Kalyani Publications


5.10 Model Questions

1. List the advantages of using absolute loader.

2. List various functions of a loader.

3. Explain the concept of compile and go loader in detail.

4. Explain general loader scheme.


Lesson 6: Linkers

6.0 Objectives

6.1 Introduction

6.2 Subroutine Linkages

6.3 Relocating Loaders

6.4 Direct Linking Loaders

6.5 Relocation

6.6 Design of Absolute Loader

6.7 Bootstrap Loaders

6.8 Dynamic Linking

6.9 MS-DOS Linker

6.10 Summary

6.11 Glossary

6.12 Answers to check your progress

6.13 References/Suggested Readings

6.14 Model Questions

6.0 Objectives

After studying this lesson, the student will be able to:

Define the notion of linking.
List different types of linkers.
Explain the concept of relocation.
Discuss the benefits of dynamic linking.

6.1 Introduction

A program is generally divided into a number of modules, which may or may not be part of a single object file. Modules refer to each other by means of symbols. An object file contains defined "external" symbols, undefined "external" symbols, and local symbols.

Each source code file, after compilation, results in one object file. The role of the linker is to combine multiple object files into a single executable program. It does so by resolving the symbols. A library is a collection from which the linker can take objects. The linker is also responsible for arranging the objects in a program's address space, which may include relocating code. The compiler


assumes a fixed base address (for example, zero). Re-targeting of absolute jumps, loads, and stores is usually involved in relocating the machine code.

Fig:6.1 Linker

6.2 Subroutine Linkages

Suppose that a program MAINR wishes to jump to some subprogram SUB. The code in MAINR must contain the instruction "BSR SUB", which means branch to subroutine SUB. But the assembler is unaware of this symbolic reference and will generate an error. The solution to this problem is called subroutine linkage. The problem occurs because the subroutine SUB is not written inside the program segment of MAINR.

The problem with the instruction "BSR SUB" is that the assembler is unaware of segment SUB and is not able to find the value of this symbolic reference. The assembler directive EXT is used to declare such a subroutine as external; it should be added at the beginning of the segment MAINR. It informs the assembler that the subroutine is defined in some other segment. Variables in one segment that can be referred to by other segments are declared using the pseudo-op INT. This concept is known as subroutine linkage.

For example

MAIN    START
        EXT  SUB
        .
        .



        .
        CALL SUB
        .
        .
        END
SUB     START
        .
        .
        RET
        END

The subroutine SUB is declared as external at the beginning of MAIN. When a call to the subroutine is made, before the unconditional jump is taken, the current content of the program counter is pushed onto the internally maintained system stack. When the subroutine returns (at RET), a pop is performed to restore the program counter of the caller routine with the address of the next instruction to be executed.

6.3 Relocating Loaders

Relocating loaders are needed by some operating systems; they adjust addresses (pointers) in the executable to compensate for variations in the address at which loading begins. Relocating loaders are needed for programs that are not always loaded at the same memory location, and they help improve memory utilization.

6.4 Direct Linking Loaders

The most common kind of loader is the direct linking loader, which is a relocatable loader. The source code cannot be accessed directly by the loader. Two methods can be used to load the object code into memory: relative addressing and absolute addressing. In the case of relative addressing, it is the responsibility of the assembler to provide the loader with information about the relative addresses.

The list of symbols that are not defined in the current segment but are used in it is stored in a data structure called the USE table. The USE table includes information such as the name of the symbol, its address, and its address relativity.

The list of symbols that are defined in the current segment and can be referred to by other segments is stored in a data structure called the DEFINITION table. The definition table includes information such as the symbol and its address.
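As an illustration of how the USE and DEFINITION tables drive linking, here is a Python sketch of a two-pass link. The segment layout, table formats, and load address (100) are all hypothetical and chosen only for demonstration:

```python
# Hypothetical two-segment program.  A USE-table entry says "the word at
# offset X is an unresolved reference to symbol S"; a DEFINITION-table
# entry says "symbol S is defined at offset Y of this segment".
segments = {
    "MAIN": {"code": [10, 0, 30], "use": [(1, "SUB")], "defs": {}},
    "SUB":  {"code": [40, 50],    "use": [],           "defs": {"SUB": 0}},
}

# Pass 1: assign load addresses and build a global definition table.
global_defs, base, origin = {}, {}, 100
for name, seg in segments.items():
    base[name] = origin
    for sym, off in seg["defs"].items():
        global_defs[sym] = origin + off
    origin += len(seg["code"])

# Pass 2: patch every USE-table reference with the resolved address.
for seg in segments.values():
    for off, sym in seg["use"]:
        seg["code"][off] = global_defs[sym]

print(segments["MAIN"]["code"])   # [10, 103, 30]
```

MAIN is placed at 100 and SUB at 103, so the unresolved reference at offset 1 of MAIN is patched to 103.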

The assembler produces the following types of cards:

1. ESD – The external symbol dictionary contains information about all external symbols that are defined in this program or referenced by it. It contains:

1 Symbol name
2 Type


3 Relative location
4 Length
5 Reference number

2. TXT – The text card comprises the actual object code.

3. RLD – The relocation and linkage directory contains information about the address-dependent instructions of a program. The RLD cards contain the following information:

1 Location of the constant that needs relocation
2 By what it has to be changed
3 The operation to be performed
4 The format of the RLD
5 Reference number
6 Symbol
7 Flag
8 Length

9 Relative Location

4. END – It signifies the end of the program and specifies the starting address for execution.

Advantages: The main task of the loader is to load the program into memory and prepare it for execution. In pass one, the direct linking loader allocates segments and defines symbols; each symbol is assigned the next available location after the preceding segment, in order to minimize the amount of storage required for the total program.

Disadvantages

• It is necessary to allocate, relocate, link, and load all of the subroutines each time in order to execute a program – the loading process can be extremely time consuming.

• Though smaller than the assembler, the loader occupies a considerable amount of space – dividing the loading process into two separate programs, a binder and a module loader, can solve these problems.

6.5 Relocation

Relocation refers to the adjustment of the code and data in a program to reflect their assigned load addresses. Relocation is performed by the linker in association with symbol resolution. Symbol resolution is the process of searching files and libraries to replace symbolic references with actual usable addresses in memory before executing a program. Relocation can be done both at link time and at run time; relocation at run time is achieved using a relocating loader.


Object code generated by the assembler is executed after it is loaded into a specified location in memory. The final addresses of the object code become known only after the assembly process is over. Therefore, after loading:

Address of object code = original address of object code + relocation constant.

Both absolute and relative addresses can be used to map the object code into memory. Direct mapping of the object code in main memory can be achieved using absolute addresses. Mapping can also be achieved by adding the value of the relocation register to the relative address; this is called relocation. It can be achieved as follows:

1. The linker merges all sections (from all object files) of a similar type into a single section of that type. This is done to generate a single executable file. The linker then assigns run-time addresses to each section and each symbol.

2. Each section refers to one or more symbols, which should be modified so that they point to the correct run-time addresses, based on information stored in a relocation table in the object file.

A relocation table is a list of pointers created by the assembler and kept in the object file. Each entry in the table points to an address in the object code that must be changed when the loader relocates the program.
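The relocation step described above can be illustrated with a short Python sketch. The word-addressed object code and the table format are invented for illustration; real formats carry more per-entry detail:

```python
# Object code assembled at base 0; the relocation table lists the
# offsets of the address-dependent words (illustrative, not a real
# object-file format).
object_code = [5, 0, 7, 2, 9]     # words at offsets 1 and 3 hold addresses
relocation_table = [1, 3]

def relocate(code, reloc_table, load_address):
    """Add the relocation constant to every word flagged in the table."""
    relocated = list(code)        # leave the original object code intact
    for off in reloc_table:
        relocated[off] += load_address   # new address = old address + constant
    return relocated

print(relocate(object_code, relocation_table, 2000))  # [5, 2000, 7, 2002, 9]
```

Only the flagged words change; the non-address words (5, 7, 9) pass through untouched, which is exactly why the relocation table is needed.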

Self assessment questions 1

1. An object file consists of which symbols?
________________________________________________________________________

______________________________________________________________________

2. What does the BSR instruction stand for?

______________________________________________________________________

______________________________________________________________________

3. _______________ refers to the adjustment of code and data in the program to reflect the assigned load addresses.

4. A relocation table is a list of pointers created by the assembler. ( TRUE / FALSE )
______________________________________________________________________

6.6 Design of Absolute Loader

An absolute loader loads object files into specified locations in memory. The information related to relocation is resolved before loading. The programmer must have in-depth knowledge of memory management to implement the concept of an absolute loader, and must be capable of handling the resolution of external references and the linking of different subroutines. The programmer should take care of two things. First, the starting address of each module to be used must be specified; if changes are made to any module, the programmer must make the necessary changes in the starting addresses of the other modules. Second, when branching from one segment to another, the absolute starting address of the respective module must be known to the programmer so that it can be specified at the respective JMP instruction.

Fig:6.2 Absolute loader

6.7 Bootstrap Loaders

Instructions stored in the ROM of a computer system are executed when the computer is switched on. These instructions are used to examine the system hardware. The process of examining whether all the system hardware is functioning properly is called POST, or power-on self test. It checks the CPU, memory, and basic input-output system (BIOS) for errors and stores the result in a special memory location. Once the POST is complete, the BIOS begins to activate the computer's disk drives. In most modern computers, when the computer activates the hard disk drive, it finds the first piece of the operating system: the bootstrap loader.

The primary function of the bootstrap loader program is to load the operating system into memory and allow it to begin operation. It reads the hard drive's boot sector to start the process of loading


the computer's operating system. The bootstrap loader sets up the small driver programs that interface with and control the various hardware subsystems of the computer. It sets up the memory partitions that hold the operating system, user information, and applications. It also establishes the data structures that hold the signals, flags, and semaphores used to communicate within and between the subsystems and applications of the computer. The last function of the bootstrap loader program is to hand over control of the system to the operating system.

Alternatively referred to as bootstrapping, a boot loader, or a boot program, a bootstrap loader is a program that resides in the computer's EPROM, ROM, or other non-volatile memory.

6.8 Dynamic Linking

The last step of compiling is called linking. It is performed by a linker or link editor. A program makes use of a number of library files in addition to the source code written by the programmer. The linker inserts code to map the shared libraries, resolving the program's library references. Under static linking, the executable file of each program must include a copy of the library file. Static linking is fast and portable, as it does not require the library files linked to the executable to be present on the system where it is run, but it results in memory wastage. It is possible to share these library files among various programs: the library files are loaded separately in memory, and dynamic linking then links these system libraries to the program just before or during program execution. Multiple programs can be linked to the same library file without having to embed it into their executable files; only references to these sharable library files are specified in the executable image. The linking takes place only when the file is executed, and only a single copy of the shared library file is loaded into memory, even if it is linked to multiple executable files.
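As a concrete, POSIX-flavoured illustration of dynamic linking, Python's `ctypes` module asks the run-time linker to map a shared library into the running process. The library file name is platform-dependent (an assumption of this sketch; e.g. `libm.so.6` on Linux), which is exactly why only a reference, not a copy, lives in the executable:

```python
import ctypes
import ctypes.util

# Resolve the shared C math library at run time; the file name varies
# by platform.  On POSIX, CDLL(None) falls back to the symbols already
# mapped into the current process if no library file is found.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path)           # the dynamic loader maps the library now

libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(9.0))              # 3.0
```

Every process that does this shares the same in-memory copy of the library, which is the memory saving that the paragraph above describes.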

Self assessment questions 2

5. Absolute loader creates ________________ object files.

6. What does POST stand for?

______________________________________________________________________

______________________________________________________________________

7. What is the advantage of static linking?
______________________________________________________________________


6.9 MS DOS Linker

It refers to a linkage editor used to combine multiple object modules into an executable program. An object module has a filename with the extension .OBJ and includes a binary image of the translated instructions and the program's data. The linker combines the various .OBJ files and produces a file with the extension .EXE.

An MS-DOS object module contains several different object record types.

The THEADR record specifies the name of the object module. It is typically derived by the translator from the source file name.

The PUBDEF record comprises a list of public names (external symbols) declared in the segments of this object module.

The EXTDEF record contains a list of external references used in this object module.

Both PUBDEF and EXTDEF records can contain information about the data type designated by an external name; these types are defined in the TYPDEF record.

The SEGDEF record describes a segment in the object module, including the segment's name, its length, the alignment requirement of its base address (e.g. word or paragraph, i.e. 16-byte, alignment), and whether the segment is relocatable or absolute.

The LNAMES record includes a list of all the segment and class names used in the program.

LEDATA records include the translated instructions and data from the source program, i.e. the binary image of the code and data produced by the translator.

LIDATA records specify translated instructions and data that occur repeatedly in the program.

FIXUPP records contain the information needed to resolve external references and to perform the address modifications associated with relocation of segments within the program.

The MODEND record denotes the end of the module and can contain a reference to the starting point of the program.
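These records share one physical layout in the Intel OMF format used by MS-DOS .OBJ files: a 1-byte type code, a 2-byte little-endian length that covers the payload plus a trailing checksum byte, then the payload. The type codes below are as commonly documented for OMF; the byte string is hand-built for illustration and carries dummy zero checksums:

```python
import struct

# Common OMF record type codes (illustrative subset).
RECORD_NAMES = {0x80: "THEADR", 0x8A: "MODEND", 0x8C: "EXTDEF",
                0x90: "PUBDEF", 0x96: "LNAMES", 0x98: "SEGDEF",
                0x9C: "FIXUPP", 0xA0: "LEDATA", 0xA2: "LIDATA"}

def walk_records(data):
    """Yield (name, payload) for each record in an OMF-style byte stream:
    1-byte type, 2-byte little-endian length (payload + checksum byte),
    then the payload; the checksum byte is stripped off."""
    pos = 0
    while pos < len(data):
        rtype, length = struct.unpack_from("<BH", data, pos)
        payload = data[pos + 3 : pos + 3 + length - 1]
        yield RECORD_NAMES.get(rtype, hex(rtype)), payload
        pos += 3 + length

# A synthetic two-record module: THEADR (name "A") then a bare MODEND.
blob = bytes([0x80, 0x03, 0x00, 0x01, ord("A"), 0x00,   # THEADR
              0x8A, 0x02, 0x00, 0x00, 0x00])            # MODEND
print([name for name, _ in walk_records(blob)])         # ['THEADR', 'MODEND']
```

A real linker would additionally verify each checksum and decode every payload; the walker only shows how the record stream is framed.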


6.10 Summary

The role of the linker is to combine multiple object files into a single executable program. The linker is also responsible for arranging the objects in a program's address space, which may include relocating code. Relocating loaders are needed for programs that are not always loaded at the same memory location; they help in improving memory utilization. Relocation refers to the adjustment of code and data in the program to reflect the assigned load addresses. An absolute loader loads object files into specified locations in memory. The primary function of the bootstrap loader program is to load the operating system into memory and allow it to begin operation. Library files can be shared among various programs: the library files are loaded separately in memory, and dynamic linking then links these system libraries to the program just before or during program execution. The MS-DOS linker is used to combine multiple object modules to produce an executable program.

6.11 Glossary

Loader – The component used to load a program into memory.
Linker – Used to combine multiple object files and library files into a single executable program.
Program counter – Special register that points to the location of the next instruction to be executed.
Pointer – A variable used to store the location of a reference or a function.
Relocation – The adjustment of code and data in the program to reflect the assigned load addresses.

6.12 Answers to check your progress

1. An object file contains defined "external" symbols, undefined "external" symbols, and local symbols.

2. Branch to sub routine.

3. Relocation

4. TRUE.

5. re-locatable

6. POWER ON SELF TEST.


7. Static linking is fast and portable, as it does not require the library files linked to the executable file to be present on the system where it is run.

6.13 References/Suggested Readings

1. Systems Programming by John J. Donovan, Tata McGraw-Hill

2. Systems Programming by D. M. Dhamdhere, Tata McGraw-Hill Education

3. Systems Programming by Charanjeet Singh, Kalyani Publications

6.14 Model Questions

1. What is the need of subroutine linkage?

2. Explain the role of bootstrap loader.

3. What is the advantage of dynamic linking?

4. Explain absolute loader.

5. What do you mean by a linker?


Lesson 7: Editors

7.0 Objective

7.1 Introduction

7.2 Types of text editor

7.3 Design of editor

7.4 Line Editor

7.5 Stream Editor

7.6 Screen Editor

7.7 Word Processor

7.8 Structure Editor

7.9 Summary

7.10 Glossary

7.11 Answers to check your progress

7.12 References/Suggested Readings

7.13 Model Questions

7.0 Objectives

After studying this lesson, the student will be able to:

Define the concept of editors.
Discuss the design of editors.
Explain the different types of editors available.

7.1 Introduction

Editors are used to draft or modify code in an operating system. Different types of editors are available for the programmer to use. An editor is an interactive tool that allows programmers to alter text on the run. Earlier, only simple text editors were available; the latest editors allow you to create documents with different formats and provide advanced features to insert various objects in them.

7.2 Types of Text Editors

The following are the various types of editors, based on how editing is performed and the output generated.


Line editors – An end-of-line marker is used to delimit lines during original creation, and during successive revision the line number explicitly specifies the line.

Stream editors – The idea is similar to the line editor, but the whole text is evaluated as a stream of characters, so a line number does not specify the location for revision. Locations for revision are specified either by pattern contexts or through explicit positioning. Text-only documents can best be created using line and stream editors. Example: sed in Unix/Linux.

Screen editors – The document is viewed, and worked on, as a two-dimensional plane. Content for revision can be specified at any place within the displayed portion. Examples: vi, emacs, etc.

Word processors – An advanced type of editor. Besides all the basic functionality of line and stream editors, it also supports the display of various objects like images and graphics, and provides a number of choices of fonts, styles, etc.

Structure editors – A structure editor is used with specific types of documents. It is mainly used to keep the structure/syntax of the document intact.

7.3 Design of Editors

The major functions in editing are travelling, editing, viewing, and display. Travelling refers to the movement of the editing context to a new position inside the text. It may be implied in a user command or explicitly specified by the user. Viewing means formatting the text in the manner required by the user; this is an abstract view, independent of the physical aspects of an I/O device. The display function maps this abstract view onto the physical characteristics of the display device, such as a monitor or printer, and determines where a particular view appears on the user's screen. Separating the viewing and display functions helps in designing multiple windows on the same screen, parallel edit operations using the same display terminal, etc. Most editors, however, tend to combine the two functions.


Figure 7.1 Editor structure

The figure above illustrates the schematic structure of a simple editor. For a given position of the editing context, the editing and viewing filters work on the internal form of the text to prepare the forms suitable for editing and viewing. These forms are then put in the editing and viewing buffers respectively. It is the responsibility of the viewing and display manager to make provisions for appropriate display of the text. When the cursor position changes, the filters operate on a new portion of the text to update the contents of the buffers. Once editing has been performed, the editing filter reflects the changes into the internal form and updates the contents of the viewing buffer.

7.4 Line Editor

A line editor is a text editor in which each editing command is applied to one or more lines of text designated by the user. Line editors are very old and now obsolete. They were mainly used for interaction with a teletypewriter: an approach in which the printer was connected directly to the keyboard, with no video display. No mechanism was provided for navigating interactively in a document.

Line editors are limited to typewriter-keyboard, text-oriented input and output methods. Editing happens one line at a time. Typing, editing, and document display do not occur simultaneously. Typically, typing does not enter text directly into the document; instead, users modify the document text by entering terse commands on a text-only terminal. Commands and text, and the corresponding output from the editor, scroll up from the bottom of the screen in the order in which they are entered or printed to the screen. Although the commands typically indicate the


line(s) they modify, displaying the edited text within the context of larger portions of the document requires a separate command. The line editor keeps a reference to the "current line", to which entered commands are applied. By contrast, modern screen-based editors let the user interactively and directly move, select, and modify portions of the document. Normally, line numbers or a search-based context (especially when making alterations within lines) are used to specify which part of the document is to be edited or displayed.
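The command-per-line interaction model can be sketched in a few lines of Python, in the spirit of the Unix ed editor. The command syntax here (`2d` deletes line 2, `2iTEXT` inserts TEXT before line 2, `p` prints the buffer) is a simplification invented for illustration and handles only single-digit line numbers:

```python
def apply(buffer, command):
    """Apply one terse, line-addressed command to the text buffer."""
    if command == "p":                    # print the whole buffer
        return buffer, "\n".join(buffer)
    n, op, arg = int(command[0]), command[1], command[2:]
    if op == "d":                         # delete line n
        return buffer[:n - 1] + buffer[n:], ""
    if op == "i":                         # insert arg before line n
        return buffer[:n - 1] + [arg] + buffer[n - 1:], ""
    raise ValueError("unknown command")

buf = ["alpha", "beta", "gamma"]
buf, _ = apply(buf, "2d")                 # delete "beta"
buf, _ = apply(buf, "2idelta")            # insert "delta" before line 2
print(buf)                                # ['alpha', 'delta', 'gamma']
```

Note that the user never touches the text directly: every change goes through a command that names the line it acts on, which is the defining trait of line editing described above.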

Self assessment questions 1

1. _____________________ is used to view a document as a two dimensional plane.

2. List various functions of editor.

______________________________________________________________________

______________________________________________________________________

3. ________________ refers to an arrangement in which the printer is connected directly to the keyboard, with no video display.

7.5 Stream Editor

A stream editor views the entire text as a sequence of characters. This means that the text is not delimited by end-of-line characters, and edit operations can be performed across line boundaries. Stream editors support character-, line-, and context-oriented commands. A stream editor performs text transformations on an input stream; that input stream may be a file or input from a pipeline. The pointer can be moved using positioning or search commands. There may be a difference between the way the text is displayed and the way it appears on paper when printed. The main difference between a line editor and a stream editor is that the latter views the complete text as a single stream of characters. Locations for revision are specified either by explicit positioning or by using a pattern context, e.g. sed in Unix/Linux.

sed

sed makes only one pass over its input(s), and is consequently more efficient. Another key feature of sed is that it can filter text in a pipeline. This feature distinguishes sed from other types of editors.

In UNIX operating system, sed can be invoked as follows:

sed OPTIONS... [SCRIPT] [INPUTFILE...]

It is not mandatory to specify the input file. If you wish to filter text from the standard input, you can leave the INPUTFILE parameter empty or specify - for it. If none of the other options specifies a script to be executed, sed treats the first non-option argument as the script rather than as an input file.
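
As a brief illustration of this calling convention (a sketch only; the file name greet.txt is invented for the example), the same script can take its input either from a named file or from standard input:

```shell
# Hypothetical sample file for the demonstration.
printf 'hello\nhello world\n' > greet.txt

# Script as the first non-option argument, input read from the file.
sed 's/hello/goodbye/' greet.txt

# Same script, input arriving on standard input through a pipe.
printf 'hello\n' | sed 's/hello/goodbye/'
```

The substitution is applied line by line in both cases: the first invocation prints "goodbye" and "goodbye world", the second prints "goodbye".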

Working of sed

Two data buffers are maintained by sed, named the active pattern space and the auxiliary hold space. Both buffers are initially empty. For every line of input from the input stream, sed performs the following cycle:

sed reads a single line from the input stream and places it in the pattern space, removing the newline at the end of the line. Commands are then executed for that line. Each command may be linked to an address that acts as a condition; the command is executed only if the current line satisfies that condition. Once the pointer reaches the end of the script, the contents of the pattern space are printed to the output stream, and the cycle is repeated for the next line.
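
The cycle can be observed with a one-line example (the input text is invented for illustration): the address 2 attaches a condition to the s command, so the substitution runs only during the cycle that processes the second input line.

```shell
# Each line enters the pattern space in turn; only the line whose
# number matches the address 2 has the command applied, and the
# pattern space is auto-printed at the end of every cycle.
printf 'one\ntwo\nthree\n' | sed '2s/two/TWO/'
```

This prints "one", "TWO", "three": every line passes through the cycle, but the edit fires only where the address matches.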

Normally the pattern space is deleted between two cycles; the 'D' command, if the pattern space still contains text, restarts the cycle with that text instead of reading a new input line. The hold space, in contrast, retains its contents across cycles. It is beyond the scope of this lesson to discuss all the commands of the sed editor.
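
A minimal sketch of the hold space (input text invented for the example): h copies the first line into the hold space, where it survives between cycles, and G appends it back to the pattern space during the second cycle, so the two lines come out in reverse order.

```shell
# -n suppresses the automatic print at the end of each cycle.
# Cycle 1: 'first' is copied into the hold space by h.
# Cycle 2: G appends the hold space to 'second'; p prints the result.
printf 'first\nsecond\n' | sed -n '1h; 2{G;p;}'
```

This prints "second" followed by "first", demonstrating that the hold space carried text from one cycle to the next.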

7.6 Screen Editor

1. A primitive form of editor that allows you to edit text on the (display) screen by moving the cursor to a specific location.
2. Examples: Visual Studio, HTML editors, etc.
3. A screen editor is based on the principle of WYSIWYG, i.e. What You See Is What You Get.
4. It displays a screen full of text at a time.
5. In screen editors it is possible to see the effect of an edit operation on the screen as output.
6. In such editors the document is printed in exactly the same form as it is displayed on the screen.
7. It allows us to see and access many lines at a time.

Examples

1. Pico
2. GNU Emacs
3. Xedit
4. Vi


7.7 Word Processor

A word processor is an advanced form of editor, popularly categorized as a computer software application. It is capable of performing editing, formatting, insertion of non-textual objects, and printing of documents.

Traditionally, a word processor provided the keyboard text-entry and printing functions of an electric typewriter, with tape or floppy disk used to record the text. Depending on the vendor, word processors supported a monochrome display and the ability to save documents on memory cards or diskettes. Modern word processors are far more advanced. They provide support such as spell-checking, organizing text into tables, applying various font styles, sorting, and advanced searching, along with text formatting options like bold, italics, underline, font, and style. Users are allowed to move a section of text from one place to another, merge text, and search and replace words. Examples: WordStar and MS Word.

Self assessment questions 2

4. sed is an example of _________________ editor.

5. Two buffers maintained by sed are ___________ and _____________ .

______________________________________________________________________

______________________________________________________________________

6. A primitive form of editor that allows you to edit text on the (display) screen by moving the cursor to a specific location is called a screen editor. ( TRUE / FALSE ) ______________________________________________________

7.8 Structure Editor

A structure editor is used with different programming languages such as C++, or with markup such as HTML. It helps to put the document or code into a proper structural format by automatically inserting the various delimiter symbols at the end of a block. It also provides automatic indentation at the beginning and end of each block, and aids in maintaining the particular structure/syntax used by a specified language. Some advanced structure editors also provide hints, for example the names of the methods that may follow a given object. Structure editors are generally embedded within software development tools like Eclipse or .NET.

Text editing can be combined with structure editing in the user interface to form a single hybrid editing tool. Emacs is an example of one such hybrid tool: it is a text editor that also supports the manipulation of words, sentences, and paragraphs as structures inferred from the text. PageMaker and Dreamweaver are two more examples of such editing tools. Dreamweaver is used for marked-up web documents and supports the display and manipulation of raw HTML text.


Structure editors for marked-up text like HTML are useful for browsing through a document. The structure in use determines the editing operations that are required.

7.9 Summary

A number of editors are available, based on how editing is performed and the output they generate. The major functions in editing are travelling, editing, viewing, and display. Each editing command in a line editor is applied to one or more lines of text designated by the user. A stream editor views the entire text as a sequence of characters and performs text transformations on an input stream; that input stream may be a file or input from a pipeline. Screen editors allow you to edit text on the (display) screen by moving the cursor to a specific location. A word processor is an advanced form of editor, popularly categorized as a computer software application; it is capable of performing editing, formatting, insertion of non-textual objects, and printing of documents. A structure editor is used with different programming languages such as C++, or with HTML, and aids in maintaining the particular structure/syntax used by a specified language. Some advanced structure editors also provide hints, such as the names of the methods that may follow a given object.

7.10 Glossary

Editor- Editors are used to draft or modify documents.

Stream- It refers to a sequence of characters.

sed- It is an example of editor used in UNIX/LINUX operating system.

Buffer- It refers to a temporary storage area.

Marked-up text- It is used to provide instructions to a web browser about the look and working of a web page.

7.11 Answers to check your progress

1. Screen editor.

2. Editing, travelling, display and viewing are major functions of an editor.

3. Teleprinter.

4. Stream.

5. The two buffers maintained by sed are active pattern space and auxiliary hold space.

6. TRUE.

7.12 References/Suggested Readings

1. Systems Programming by John J Donovan, Tata McGraw-Hill


2. Systems Programming by D M Dhamdhere, Tata McGraw-Hill Education

3. Systems Programming by Charanjeet Singh, Kalyani Publications

7.13 Model Questions

1. What is the role of an editor?

2. Explain the design of a simple editor.

3. How is a stream editor different from a line editor?

4. Explain the benefits of using a structure editor.


Lesson-8 Fundamentals of Compiler Design

Structure of lesson
8.0 Objective
8.1 Introduction to Compilers
8.2 Introduction to Translators
8.2.1 Various Types of Translators
8.3 Interpreters
8.3.1 Self-Interpreters
8.3.2 Just-in-time compilation
8.3.3 Byte code interpreters
8.3.4 Abstract syntax tree interpreters
8.3.5 Pros and cons of Interpreter
8.4 Debuggers
8.4.1 Features of Debugger
8.4.2 How to Debug the Code
8.5 Bootstrapping For Compilers
8.6 Summary
8.7 Glossary
8.8 Answers to check your progress/self-assessment questions

8.9 References

8.10 Model questions

8.0 Objective

After reading this chapter the students will be able to:

- Discuss the need for and the workings of various translators.

- Explain how the various translators differ from each other in their requirements and process of implementation.

- Explain the concept of interpreters and their types.

- Evaluate the behaviour of debuggers and compilers and their importance.

8.1 Introduction to Compilers

Programming for early computers was written mainly in assembly language. Although the first high-level language is nearly as old as the first computer, the limited memory capacity of early machines posed significant technical difficulties when the first compilers were designed. The first high-level programming language (Plankalkül) was proposed by Konrad Zuse in 1943. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language; A-0 functioned more as a loader or linker than as a compiler in the modern sense. The first autocode and its compiler were developed by Alick Glennie in 1952 for the Mark 1 computer at the University of Manchester, and are considered by some to be the first compiled programming language. The FORTRAN team led by John Backus at IBM is generally credited with introducing the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. In many application domains the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become ever more complex.

Early compilers were written in assembly language. The first self-hosting compiler, capable of compiling its own source code written in a high-level language, was created in 1962 for LISP by Tim Hart and Mike Levin at MIT. Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices of implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by hand, by a compiler written in a different language, or (as in Hart and Levin's LISP compiler) by running the compiler in an interpreter.

A compiler is a special program that processes statements written in a particular programming language and turns them into machine language, or "code", that a computer's processor uses. Typically, a programmer writes language statements in a language such as Pascal or C, one line at a time, using an editor. The file that is created contains what are known as the source statements. The programmer then runs the appropriate language compiler, specifying the name of the file that contains the source statements.

When executing (running), the compiler first parses (analyzes) all of the language statements syntactically, one after the other, and then, in one or more successive stages or "passes", builds the output code, making sure that statements that refer to other statements are referenced correctly in the final code. Traditionally, the output of compilation has been called object code, or sometimes an object module. (Note that the term "object" here is not related to object-oriented programming.) The object code is machine code that the processor can execute one instruction at a time. More recently, the Java programming language, a language used in object-oriented programming, has introduced the possibility of compiling output (called bytecode) that can run on any computer platform for which a Java virtual machine or bytecode interpreter is provided to convert the bytecode into instructions executable by the actual hardware processor. Using this virtual machine, the bytecode can optionally be recompiled at execution time by a just-in-time compiler.

Traditionally, in some operating systems an additional step was needed after compilation: determining the relative locations of instructions and data when more than one object module was to be run at the same time and the modules cross-referenced each other's instruction sequences or data. This process was sometimes called linkage editing, and its output was known as a load module.

A compiler works with what are sometimes called 3GLs and higher-level languages; an assembler works on programs written in a processor's assembly language. (See Fig. 8.1.) A compiler accepts as input a source program, usually written in a high-level language, and produces an equivalent target program, usually in assembly or machine language. An important part of this translation process is that the compiler reports to its user the presence of errors in the source program.

Fig.8.1. A compiler


Two steps are important for executing a program written in an HLL programming language. The source program must first be compiled, i.e. translated into an object program (see Fig. 8.2(a)). Next, the resulting object program is loaded into memory and executed (see Fig. 8.2(b)).

Fig.8.2. (a) Compilation Process

Fig.8.2. (b) Processing of Object Program

The compiler has to go through many processes before achieving its goal of translating the programming language. These processes are called phases. The diagram below shows the different phases of a compiler.

Figure 8.2 (c) Phases of Compiler


8.2 Introduction to Translators

A translator is a computer program that takes as input a program written in one language and produces as output a program in another language. The translator also performs another very important task: error detection. Any violation of the HLL specification is detected and reported to the programmer. The significant role of a translator is to translate the HLL program input into an equivalent ML (machine language) program, and to provide diagnostic messages whenever the programmer violates the specification of the HLL.

8.2.1 Various types of translators

There are various types of translators, described below:

a) Assembler

b) Compiler

c) Interpreter

d) Decompiler

e) Disassembler

a) Assembler: An assembler is a computer program used to translate a program written in assembly language into machine language. The translated program is called the object program. The assembler checks each instruction for correctness and generates diagnostic messages if there are mistakes in the program (see Fig. 8.3). It translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels. Assembler pseudo-instructions, such as START or END, provide directions to the assembler itself; they are not translated into machine instructions. The output of the assembler is called the object code or object program. The object code is usually machine code, also called machine language, which can be understood directly by a specific type of CPU (Central Processing Unit), such as the PowerPC. Flat assembler (FASM) is one example of an assembler.


Fig.8.3 Assembly Process

Compilers are designed to convert source code into assembly language or into some other programming language. An assembly language is a human-readable notation for the machine language that a specific type of CPU uses.

b) Compiler: A compiler is a program that translates a program written in an HLL into machine language for execution. The process of translating an HLL into object code is lengthy and complex compared to assembly. Compilers have diagnostic capabilities and prompt the programmer with appropriate error messages while compiling an HLL program (see Fig. 8.4). The process is repeated until the program is free of mistakes and translated into object code. Compilers also have the ability to link the subroutines of the program. An example of a compiler is the one in Microsoft Visual Studio.

Fig.8.4 Compiler

c) Interpreter: In computer science, an interpreter is a system program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine-language program. An interpreter generally uses one of the following strategies for program execution: (1) parse the source code and perform its behaviour directly; (2) translate the source code into some efficient intermediate representation and immediately execute that; or (3) explicitly execute stored precompiled code produced by a compiler that is part of the interpreter system.

Early versions of the LISP programming language and Dartmouth BASIC are examples of the first type. Perl, Python, MATLAB, and Ruby are examples of the second, while UCSD Pascal is an example of the


third type. Source programs are compiled ahead of time and stored as machine-independent code, which is then linked at run time and executed by an interpreter and/or compiler (for JIT systems). Some systems, such as Smalltalk, contemporary versions of BASIC, Java, and others, may combine strategies two and three. While interpreting and compiling are the two principal means by which programming languages are implemented, they are not mutually exclusive, as most interpreting systems also perform some translation work, just like compilers. The terms "interpreted language" or "compiled language" simply mean that the canonical implementation of that language is an interpreter or a compiler, respectively. A high-level language is ideally an abstraction independent of any particular implementation.

Fig.8.5 Interpreter

d) Decompiler: A decompiler is a computer program that performs the reverse process of a compiler. The term is normally applied to a program that translates executable programs (the output of a compiler) back into source code in a high-level language. It is used for the recovery of lost source code, and is also useful for computer security, interoperability, and error correction.

e) Disassembler: A disassembler is a computer program that translates machine language into assembly language, the inverse of the process performed by an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly language.

8.3 Interpreters

A computer is an integrated collection of hardware components capable of executing instructions, called "object code", stored in the computer's memory. The computer's control unit takes the object code, stored as a string of binary bits (i.e. 0's and 1's), converts the bits to voltage levels, and transmits the voltage levels to its hardware components, which carry out, or execute, operations as directed by the voltages, as specified in the object code. These steps of conversion, transmission, and execution by hardware components are called interpretation. An interpreter is also a translator, but instead of translating the source program into a target program, it interprets each source instruction and produces its result directly, e.g. the result of a multiplication (rather than the target instructions to compute the multiplication result, which is what a compiler would produce). There is no saved translation or target program, just the results of the computations. This assumes that, as the interpreter analyzes the source program, it can produce the appropriate binary code for the voltages that direct the operation of the hardware components. An interpreter is quicker to produce results than a compiler because it has fewer stages (e.g. no optimization stage) and delivers the results of the computation as it interprets the source instructions. An interpreter is also less complicated than a compiler, but the computations specified in the source program are carried out less efficiently. Below we study the different varieties of interpreters.

8.3.1 Self-interpreters

A self-interpreter is a programming language interpreter written in a language which can interpret itself; an example is a BASIC interpreter written in BASIC. Self-interpreters are related to self-hosting compilers. If no compiler exists for the language to be interpreted, creating a self-interpreter requires implementing the language in a host language (which may be another programming language or assembler). By having a first interpreter such as this, the system is bootstrapped, and new versions of the interpreter can be developed in the language itself. It was in this way that Donald Knuth developed the TANGLE interpreter for the language WEB of the industrial-standard TeX typesetting system. Defining a language is usually done in relation to an abstract machine (so-called operational semantics) or as a mathematical function (denotational semantics). A language may also be defined by an interpreter in which the semantics of the host language are given. The definition of a language by a self-interpreter is not well founded (it cannot define a language), but a self-interpreter does tell a reader about the expressiveness and elegance of a language. It also enables the interpreter to interpret its own source code, the first step towards reflective interpreting.

8.3.2 Just-in-time compilation

Further blurring the distinction between interpreters, bytecode interpreters, and compilation is just-in-time compilation (JIT), a technique in which the intermediate representation is compiled to native machine code at runtime. This gives the efficiency of running native code, at the cost of startup time and increased memory use when the bytecode or AST is first compiled. Adaptive optimization is a complementary technique in which the interpreter profiles the running program and compiles its most frequently executed parts into native code. Both techniques are a few decades old, appearing in languages such as Smalltalk in the 1980s. Just-in-time compilation has gained mainstream attention amongst language implementers in recent years, with Java, the .NET Framework, and most modern JavaScript implementations now including JITs.

Page 83: Self Learning Material System Programming

79

8.3.3 Byte code interpreters

There is a spectrum of possibilities between interpreting and compiling, depending on the amount of analysis performed before the program is executed. For instance, Emacs Lisp is compiled to bytecode, which is a highly compressed and optimized representation of the Lisp source, but is not machine code (and is therefore not tied to any particular hardware). This "compiled" code is then interpreted by a bytecode interpreter (itself written in C). The compiled code in this case is machine code for a virtual machine, which is implemented not in hardware but in the bytecode interpreter. The same approach is used with the Forth code used in Open Firmware systems: the source language is compiled into "F code" (a bytecode), which is then interpreted by a virtual machine.
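
The division of labour can be sketched in a few lines of shell (every instruction name here — LOAD, ADD, PRINT — is invented for the example, not taken from any real bytecode): a tiny "virtual machine" loop reads one instruction per line and dispatches it, with a single accumulator standing in for the machine state.

```shell
# A toy virtual machine: each line of "bytecode" is dispatched by the
# case statement, and the accumulator acc holds the machine state.
acc=0
while read -r op arg; do
  case "$op" in
    LOAD)  acc=$arg ;;              # set the accumulator
    ADD)   acc=$((acc + arg)) ;;    # add an immediate operand
    PRINT) echo "$acc" ;;           # write the accumulator to stdout
  esac
done <<'EOF'
LOAD 2
ADD 3
PRINT
EOF
```

This prints 5. A real bytecode interpreter differs mainly in scale: binary opcodes rather than text, a much richer instruction set, and a stack or registers instead of a single accumulator.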

8.3.4 Abstract syntax tree interpreters

In the spectrum between interpreting and compiling, another approach is to transform the source code into an optimized abstract syntax tree (AST), then either execute the program by following this tree structure or use it to generate native code just in time. In this approach, each sentence needs to be parsed just once. As an advantage over bytecode, the AST keeps the global program structure and the relations between statements (which are lost in a bytecode representation), and when compressed it provides a more compact representation. Thus, the AST has been proposed as a better intermediate format for just-in-time compilers than bytecode. It also allows the system to perform better analysis during runtime.

8.3.5 Pros and Cons of Interpreter

The pros and cons of interpreters are discussed below.

Pros of Interpreter

- When execution speed is not essential, interpreters are valuable for program development.

- No separate compilation stage is needed, because execution can be completed in a single step.

- During runtime it is possible to modify the code.

- It is very helpful for debugging, since source code execution can be analyzed in an IDE (Integrated Development Environment).

- It also facilitates interactive code development.

Cons of Interpreter

- In a program with a loop, the same statement is translated every time it is encountered. Therefore, interpreted programs are usually slower in execution than compiled programs.

- Translation has to be repeated every time the program is run, because no object code is produced.

Check your progress/self-assessment questions

Q.1 What is the difference between a compiler and an interpreter?

_____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Q.2 What are the phases of a compiler? Name them.

_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

_____________________________________________________________________________

Q.3 Write down the disadvantages of interpreters.

_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

_________________________________________________________________________________

____________________________________________________________________________

Exercise 1

a. Explain the different phases of a compiler.

b. What is the need for compilers?

c. Explain the different types of interpreters.

d. Explain translators and their different types, with examples.

8.4 Debuggers


A debugger or debugging tool is a computer program that is used to test and debug other programs (the "target" programs). The code to be examined might alternatively be running on an instruction set simulator (ISS), a technique that allows great power in its ability to halt when specific conditions are encountered, but which will typically be somewhat slower than executing the code directly on the appropriate (or the same) processor. Some debuggers offer two modes of operation, full or partial simulation, to limit this impact.

A "trap" occurs when the program cannot continue normally because of a programming bug or invalid data. For example, the program may have tried to use an instruction not available on the current version of the CPU, or attempted to access unavailable or protected memory. When the program "traps" or reaches a preset condition, the debugger typically shows the location in the original code if it is a source-level debugger or symbolic debugger, commonly seen now in integrated development environments. If it is a low-level debugger or a machine-language debugger, it shows the line in the disassembly (unless it also has online access to the original source code and can display the appropriate section of code from the assembly or compilation).

8.4.1 Features of a Debugger

Typically, debuggers offer a query processor, a symbol resolver, an expression interpreter, and a debug support interface at the top level. Debuggers also offer more sophisticated functions, such as running a program step by step (single-stepping or program animation), stopping (breaking) at some event or specified instruction by means of a breakpoint to examine the current state, and tracking the values of variables. Some debuggers can modify the program state while it is running. It may also be possible to continue execution at a different location in the program to bypass a crash or logical error.

The same functionality that makes a debugger useful for removing bugs also allows it to be used as a software cracking tool to bypass copy protection, digital rights management, and other software protection features. It often also makes it useful as a general verification tool and for fault coverage and performance analysis, particularly if instruction path lengths are shown. Most mainstream debugging engines, such as gdb and dbx, provide console-based command-line interfaces. Debugger front-ends are popular extensions to debugger engines that provide IDE integration, program animation, and visualization features. Some early mainframe debuggers, such as Oliver and SIMON, provided this same functionality for the IBM System/360 and later operating systems, as long ago as the 1970s.

8.4.2 How to Debug the Code

The process of debugging is shown in Fig. 8.6. There are four major steps in debugging. First, the location of the error is found. Next, the design of the error repair is determined. Then the error is repaired using a suitable approach. Finally, the program is re-tested.

Fig.8.6. Debugging Process


There are two types of debugger: console-mode debuggers and graphical (visual) debuggers. Console debuggers are usually part of the language itself, i.e., they are included in the language's standard libraries. For a console debugger, the user interface is the keyboard and a console-mode window; during execution, the lines of source code pass through the console window. Visual debuggers are components of full-featured IDEs (integrated development environments). They are powerful and easy to use, and they provide the same user interface as a graphical text editor, with a special margin area on the left for breakpoint symbols, the current-line pointer, and so on. The main features of debuggers are listed below:

They can set breakpoints in source code.

They enable a single-step mode.

They make it possible to view the memory of a process.

They can change the values of variables at run time.

They can display CPU registers.

They can disassemble instructions.

They can load a memory dump after a crash.

They can attach to server processes.

GDB and WinDbg are two popular debuggers. GDB, the GNU project debugger, allows the programmer to see the internal workings of another program. GDB runs on UNIX-like systems and on Microsoft Windows variants. WinDbg is a multipurpose debugger for Microsoft Windows. It can debug user applications, device drivers, and the operating system itself in kernel mode.
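The single-stepping and tracing features described above can be illustrated with Python's built-in tracing hook, which is the same mechanism a debugger's step mode relies on. A minimal sketch (the function and variable names are illustrative):

```python
import sys

def buggy_sum(values):
    total = 0
    for v in values:
        total += v
    return total

executed_lines = []

def tracer(frame, event, arg):
    # Record every line executed inside buggy_sum, much like a
    # debugger's single-step mode walking through the target code.
    if event == "line" and frame.f_code.co_name == "buggy_sum":
        executed_lines.append(frame.f_lineno)
    return tracer

sys.settrace(tracer)          # install the trace hook
result = buggy_sum([1, 2, 3])
sys.settrace(None)            # remove it again

print(result)  # 6; executed_lines now holds the stepped line numbers
```

A real debugger builds breakpoints and variable inspection on top of exactly this kind of hook.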

8.5 Bootstrapping for Compilers

Bootstrapping is the process of writing a compiler in the source language that it is intended to compile; such a compiler is also known as a self-hosting compiler. The notation used to represent it is the T-diagram. Three languages are essential to the construction. The T notation is shown in Fig 8.7.

Fig 8.7 T diagram

i. Source language (S): the language that the newly written compiler compiles.

ii. Implementation language (I): the language in which the new compiler is written.

iii. Target language (T): the language that the new compiler generates.

Suppose we want to create a compiler for a new language 'A' that is written in implementation language 'I' and produces code for 'Z', and we already have a compiler for 'I' that runs on machine 'B' and produces code for 'B'. The two compilers are represented as follows:

Page 87: Self Learning Material System Programming

83

AiZ and IbB

To create the new compiler AbZ, bootstrapping is used as:

AiZ + IbB = AbZ

That is, compiling the A-to-Z compiler (written in I) with the I compiler (running on B) yields an A-to-Z compiler that runs on B. Bootstrapping has the following advantages:

It is a non-trivial test of the language being compiled.

Compiler developers only need to know the language being compiled.

The compiler can itself be developed in a high-level language.

Improvements to the compiler's back end improve not only general programs but the compiler itself.

Compilers for many programming languages have been bootstrapped, including compilers for Java, Lisp, Python, Pascal, ALGOL, and C.
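The composition rule AiZ + IbB = AbZ can be modeled in a few lines of code. In this sketch a compiler is simply a record of the three languages of its T-diagram, and the language names A, I, B, Z follow the text:

```python
from collections import namedtuple

# A compiler is described by its T-diagram: the language it reads
# (source), the language it is written in (impl), and the language
# it emits (target).
Compiler = namedtuple("Compiler", "source impl target")

def compile_compiler(subject, tool):
    """Run `subject` through `tool`: the tool translates the subject's
    implementation language into the tool's target language."""
    if tool.source != subject.impl:
        raise ValueError("tool cannot read the subject's implementation language")
    return Compiler(subject.source, tool.target, subject.target)

# A-to-Z compiler written in I, plus an I compiler running on B,
# yields an A-to-Z compiler running on B: AiZ + IbB = AbZ.
new_compiler = compile_compiler(Compiler("A", "I", "Z"), Compiler("I", "B", "B"))
print(new_compiler)  # Compiler(source='A', impl='B', target='Z')
```

The check inside `compile_compiler` mirrors the requirement that the two T-diagrams fit together: the tool must read exactly the language the subject is written in.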

8.6 Summary

In this chapter the roles of various translators, such as the compiler, interpreter, assembler, and disassembler, along with their pros and cons, have been explained. The working of debuggers, including the debugging process in detail, has also been described, and bootstrapping for compilers has been discussed in detail.

8.7 Glossary

Compiler: A compiler is a program that translates a program written in a high-level language into machine language.

Translator: A translator is a computer program that takes as input a program written in one language and produces as output a program in another language.


Interpreter: A computer program that directly executes instructions written in a programming language, without first compiling them into a machine language program.

Assembler: A computer program used to translate a program written in assembly language into machine language.

Debugger: A program used for testing other programs.

Bootstrapping: The process of writing a compiler in the source language that it is intended to compile.

Check your progress/self-assessment questions

Q.4 Define a debugger.

_____________________________________________________________________________________

Q.5 Write the features of a debugger.

_____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Exercise 2

a. What is the role of a debugger?

b. Explain the process of bootstrapping in compilers.

c. Give the advantages and disadvantages of debuggers.

d. What is the process of debugging code?

8.8 Answers to check your progress/self-assessment questions

1.

No.   Compiler                                           Interpreter
1     Takes the entire program as input.                 Takes a single instruction as input.
2     Intermediate object code is generated.             No intermediate object code is generated.
3     Conditional control statements execute faster.     Conditional control statements execute slower.


2.

1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Machine-Independent Code Optimizer
5. Code Generator
6. Machine-Dependent Code Optimizer

3.

a. Interpreted programs are usually slower in execution than compiled programs.
b. Translation must be repeated every time the program runs, because no object code is produced.

4. A debugger is a computer program used to test and debug other programs (the "target" programs).

5.

a. Debuggers offer a query processor, symbol resolver, expression interpreter, and debug support interface at the top level.
b. Debuggers also offer more sophisticated functions, such as running a program step by step.

8.9 References/Suggested Readings

1. System Programming, John J. Donovan, Tata McGraw-Hill Education, 1st Edition.

2. Systems Programming and Operating Systems, D. M. Dhamdhere, Tata McGraw-Hill Education, 1st Edition.

3. System Programming, A. A. Puntambekar, Technical Publications, 3rd Edition.

8.10 Model Questions

1. Explain the various phases of compilers.
2. Explain the various types of translators.
3. Explain the concept of bootstrapping.


Lesson-9 Finite automata and grammar

Structure of lesson

9.0 Objective

9.1 Introduction

9.2 Deterministic finite automata

9.2.1 Formal definition

9.2.2 String processing in DFA

9.3 Non deterministic finite automata

9.3.1 Formal definition

9.3.2 String processing in NDFA

9.4 Differences between DFA and NDFA

9.5 Equivalence of DFA and NDFA

9.5.1 Conversion of NDFA to DFA

9.5.2 Conversion of NDFA to DFA with the help of lazy creation method

9.6 Grammars

9.6.1 Formal Definition

9.7 Types of Grammars

9.8 Regular Grammar

9.9 Context Free Grammar

9.10 Context Sensitive Language

9.11 Unrestricted Grammars

9.12 Summary

9.13 Glossary

9.14 Answers to check your progress/self-assessment questions.

9.15 Model questions

9.16 References

9.0 Objective

After studying this lesson, students will be able to:

Explain Finite automata


Differentiate between deterministic finite automata and non-deterministic finite automata.

Explain String processing in DFA and NDFA.

Explain Equivalence of DFA and NDFA.

Discuss grammar and types of grammar.

9.1 Introduction to Finite automata

A finite-state automaton, or simply a state machine, is a mathematical model of computation used to design both computer programs and sequential logic circuits. It is viewed as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is known as the current state. It can change from one state to another when initiated by a triggering event or condition, which is known as a transition. A particular FSM is defined by a list of its states and the triggering condition for each transition.

The behavior of state machines can be seen in many everyday devices that perform a predetermined sequence of actions depending on the sequence of events presented to them. Simple examples are vending machines, which dispense products when the proper combination of coins is deposited; elevators, which drop riders off at upper floors before going down; traffic lights, which change sequence when cars are waiting; and combination locks, which require the input of combination numbers in the proper order. Finite-state machines can model a large number of problems, among which are electronic design automation, communication protocol design, language parsing, and other engineering applications. In artificial intelligence research, state machines or hierarchies of state machines have been used to describe neurological systems. In linguistics, they are used to describe simple parts of the grammars of natural languages. Considered as an abstract model of computation, the finite state machine is weak; it has less computational power than some other models of computing, such as the Turing machine. That is, there are tasks which no FSM can do, but some Turing machines can. This is because the FSM's memory is limited by its number of states. We can define an automaton as an abstract model of a digital computer. The following figure shows the essential features of a general automaton.


Figure 9.1 Finite automaton

The above figure shows the features of an automaton, which are explained below:

Input: The first step in the process is taking inputs and producing outputs. The input given to the automaton is a string over a given alphabet. The input is supplied on an input tape, which is divided into cells, each holding one symbol at a time.

Output: From the inputs given through the input tape, the automaton produces its outputs.

States of the automaton: The system has a finite number of internal states, q1 to qn.

State relation: Which internal state the automaton moves to next is determined by the present state and the present input.

Output relation: After reading an input symbol, the next state may be the same as the present state or a new state, depending on the input symbol used.

9.2 Deterministic finite automata (DFA)


A deterministic finite automaton (DFA), also called a deterministic finite state machine, is a finite state machine that accepts or rejects finite strings of symbols and produces a unique run of the automaton for each input string.

Figure 9.2 Informal deterministic finite automata

9.2.1 Formal definition

A deterministic finite automaton consists of 5 tuples (Q, Σ, δ, q0, F):

Q->Finite set of states.

Σ->Finite set of input symbols.

(δ : Q × Σ → Q)->Transition function.

(q0 ∈ Q)->Start state.

(F ⊆ Q)->Accepting states.

A deterministic finite automaton thus consists of the five components listed above. The first element is the finite set of states Q, which keeps the machine finite. The second element, Σ, is the finite set of input symbols. The third element is considered the most important one: the transition function, written δ: Q × Σ → Q. The next element is the start state q0, from which the machine begins; it is one of the states of the finite set discussed above, written q0 ∈ Q. The last element is the set of accepting states F, also known as the final states, where the machine stops working; since each of them is also a state from the finite set, we write F ⊆ Q.


9.2.2 String processing by DFA

The model of a deterministic finite automaton explained below sheds light on how strings are processed by a DFA:

Figure 9.3 String processing by DFA

1. Input tape: The input tape is the medium through which inputs are given to the system for producing outputs. It is divided into a number of cells, each containing a single symbol from the input alphabet Σ. The length of the input tape is determined with the help of left and right end markers (¢, $); in their absence, the tape's length is considered infinite. Only the portion between the end markers is processed.

2. Reading head: The reading head in the model above reads one symbol at a time from the cells of the input tape.

3. Finite control: The finite control takes the symbol under the reading head as input and produces a new state as output, moving the reading head along the tape to the next cell.

Example 1: Draw a DFA for strings containing 01.


Q= {q0,q1,q2}

Σ= {0,1}

Start state=q0

Final state F= {q2}

The transition table of the DFA for strings containing 01 (shown in the figure above) is:

δ       0     1
->q0    q1    q0
q1      q1    q2
*q2     q2    q2
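The transition table above translates directly into code. A minimal DFA simulator for Example 1 (binary strings containing 01), written as a sketch in Python:

```python
# DFA from Example 1: accepts binary strings containing "01".
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
start, finals = "q0", {"q2"}

def dfa_accepts(w):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]  # exactly one move per symbol
    return state in finals

print(dfa_accepts("0011"))  # True
print(dfa_accepts("10"))    # False
```

Because each (state, symbol) pair has exactly one entry in `delta`, there is a unique run for every input string, which is precisely the deterministic property.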

9.3 Non-deterministic finite automata (NDFA)

A nondeterministic finite automaton (NFA), or nondeterministic finite state machine, does not have to obey the restrictions placed on a DFA. In particular, every DFA is also an NFA. Using the subset construction algorithm, every NFA can be translated to an equivalent DFA, i.e., a DFA recognizing the same formal language. Like DFAs, NFAs recognize only regular languages. Sometimes the term NFA is used in a narrower sense, meaning an automaton that properly violates one of the DFA restrictions, i.e., one that is not a DFA.

Figure 9.4: Informal Non-deterministic finite automata.


9.3.1 Formal definition

Non-deterministic finite automata are "non-deterministic" in the sense that the machine can exist in more than one state at the same time: the outgoing transitions may be non-deterministic.

A non-deterministic finite automaton (NFA) consists of 5 tuples (Q, Σ, q0, F, δ):

Q -> a finite set of states

Σ -> a finite set of input symbols (the alphabet)

q0 -> a start state

F -> a set of final states

δ -> a transition function, which is a mapping from Q × Σ to subsets of Q.

9.3.2 String processing in NDFA

The string processing in NDFA is totally different from DFA.For a given input in NDFA there

can be more than one legal sequence of steps that means once a processing state receives the

input symbol it have more than one choice to move further which makes it a non-determinism.

The same result can be achieved by computing all legal sequences in parallel and then

determistically search the legal sequences that accept input but it contradicts to the statement that

it doesn’t directly corresponding to anything in physical computer systems. Following figures

shows the deterministic and non-deterministic computations with branches:


Figure 9.5 DFA and NDFA computations with branches

Example 2: Draw an NDFA for strings containing 01.

Q= {q0,q1,q2}

∑={0,1}

Start state=q0

Final state F={q2}


The transition table of the NDFA for strings containing 01 (shown in the figure above) is:

δ       0            1
->q0    {q0, q1}     {q0}
q1      ɸ            {q2}
*q2     {q2}         {q2}

9.4 Differences between DFA and NDFA

1. DFA stands for deterministic finite automata; NDFA stands for non-deterministic finite automata.

2. In a DFA, each transition leads to exactly one state, which makes the whole transition scenario deterministic; in an NDFA, a transition may lead to a subset of states, making the transition scenario non-deterministic.

3. A DFA does not have the ability to use empty-string transitions; an NDFA does.

4. A DFA accepts an input if the last state reached is in F, i.e., a final state; an NDFA accepts an input if one of the last states reached is in F.

5. In terms of memory usage, a DFA requires more memory space, while an NDFA requires less.

6. A DFA can easily be implemented in practice; an NDFA cannot be implemented without first converting it to a DFA.

7. Backtracking is allowed in a DFA; in an NDFA, backtracking may or may not be possible, depending on the problem statement.
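The parallel branches of a non-deterministic computation can be simulated by tracking the whole set of states the machine could currently be in. A sketch for the NDFA of Example 2:

```python
# NFA of Example 2: after each symbol we keep the set of all states
# the machine could be in, mirroring its parallel branches.
delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "0"): set(),        ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"},       ("q2", "1"): {"q2"},
}
start, finals = "q0", {"q2"}

def nfa_accepts(w):
    current = {start}
    for symbol in w:
        nxt = set()
        for state in current:
            nxt |= delta[(state, symbol)]  # union over all branches
        current = nxt
    return bool(current & finals)  # accept if any branch reached F

print(nfa_accepts("01"))  # True
print(nfa_accepts("10"))  # False
```

Accepting whenever the final state set intersects F is exactly the rule in the comparison above: an NDFA accepts if one of the last states is in F.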


9.5 Equivalence of DFA and NDFA

From the discussion above, both DFA and NDFA recognize the same class of languages, known as the regular languages, which we will cover in the next section. The equivalence of DFA and NDFA is very useful because designing an NDFA is much easier than designing the corresponding DFA directly, which costs both time and memory. For every non-deterministic finite automaton, there exists an equivalent deterministic finite automaton.

The equivalence between DFA and NDFA can be established by simulating the moves of the NFA in parallel. Every state of the DFA is represented by some subset of the set of states of the NDFA. If the NDFA contains n states, the equivalent DFA contains at most 2^n states.

Theorem: A language L is accepted by a DFA if and only if it is accepted by an NDFA.

Proof: Given any NFA N, we can construct a DFA D such that L(N) = L(D).

Construction of a DFA from an NFA can be done as follows:

Observation: the transition function of an NFA maps to subsets of states.

Idea: make one DFA state for every possible subset of the NFA states (2^n subsets).

Let N = (Q_N, Σ, δ_N, q0, F_N).

Goal: build D = (Q_D, Σ, δ_D, {q0}, F_D) such that L(D) = L(N).

Construction:

1. Q_D = all subsets of Q_N (i.e., the power set).

2. F_D = the set of subsets S of Q_N such that S ∩ F_N ≠ Φ.

3. δ_D: for each subset S of Q_N and each input symbol a in Σ:

   δ_D(S, a) = ∪ δ_N(p, a), with the union taken over all p in S.

9.5.1 Conversion of NDFA to DFA

The transition from one state to another is not deterministic in the case of a non-deterministic finite automaton, i.e., for a particular symbol there may be more than one move, leading to different states. Here we discuss how the conversion from NDFA to DFA takes place:


9.5.2 Conversion of NDFA to DFA with the help of lazy creation method
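One way to sketch the lazy creation method in code: instead of enumerating all 2^n subsets of the NFA's states up front, DFA states are created only when they are first reached from the start subset. The function and variable names below are illustrative:

```python
# Lazy (on-the-fly) subset construction: DFA states are created only
# when first reached, so unreachable subsets are never built.
def nfa_to_dfa(nfa_delta, start, finals, alphabet):
    start_set = frozenset({start})
    dfa_delta, seen, worklist = {}, {start_set}, [start_set]
    while worklist:
        subset = worklist.pop()
        for a in alphabet:
            target = frozenset().union(
                *(nfa_delta.get((s, a), set()) for s in subset))
            dfa_delta[(subset, a)] = target
            if target not in seen:       # lazily create the new DFA state
                seen.add(target)
                worklist.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return dfa_delta, start_set, dfa_finals

# The NFA of Example 2; transitions absent from the table are empty.
nfa_delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
}
d_delta, d_start, d_finals = nfa_to_dfa(nfa_delta, "q0", {"q2"}, "01")
print(len({s for (s, a) in d_delta}))  # only 4 of the 2^3 = 8 subsets are built
```

For this NFA only four subsets are reachable, so the lazy method builds half of the full power set.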

Check Your Progress/Self-Assessment Questions.

Q.1 Differentiate between DFA and NDFA.


______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________

Q.2 Write down the formal definition of NDFA

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________

Q.3 Finite automata require minimum _______ number of stacks.

a) 1

b) 0

c) 2

9.6 Grammar

A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar. These production rules build strings over the language's alphabet according to the syntax required by the language. A formal grammar is a set of rules for rewriting strings, along with a "start symbol" from which rewriting starts. A grammar is therefore usually thought of as a language generator. Alternatively, it can sometimes be used as the basis for a "recognizer": a function in computing that determines whether a given string belongs to the language or is grammatically incorrect. To describe such recognizers, formal language theory uses separate formalisms, known as automata theory. One of the interesting results of automata theory is that it is not possible to design a recognizer for certain formal languages. Parsing is the process of recognizing an utterance (a string, in natural languages) by breaking it down into a set of symbols and analyzing each one against the grammar of the language. Most languages have the meanings of their utterances structured according to their syntax, a practice known as compositional semantics.

9.6.1 Formal Definition

Formally, a grammar is described by 4 tuples, as proposed by Noam Chomsky in the 1950s. A grammar, denoted by the variable G, consists of the following components:

G = (V, Σ, P, σ)

where:

V is the set of terminal symbols;

Σ is the set of non-terminal symbols, with the restriction that V and Σ are disjoint;

σ is the start symbol;

P is the set of production rules, i.e., rules of the form A –> B,

where:

A is a sequence of symbols containing at least one non-terminal;

B is the result of replacing some non-terminal symbol in A with a sequence of symbols (possibly empty) from V and Σ.

In order to understand formal grammars, we will take a sentential form and derive a valid sentence from it with a parse tree.

V = {"the", "a", "mouse", "pig", "saw", "chased"}

Σ = {S, NP, VP, D, N, V}

where:

S denotes a sentence, D a determiner,

NP a noun phrase, N a noun,

VP a verb phrase, V a verb.

σ = S. Starting from the initial symbol, we have the production rules given below:

S –> NP VP,

NP –> D N,

VP –> V NP,

D –> "the", D –> "a",

N –> "mouse", N –> "pig",

V –> "saw", V –> "chased"

We will use leftmost derivation to form a sentence using the production rules. The leftmost derivation proceeds as follows:

S –> NP VP

–> D N VP

–> "the" N VP

–> "the" "mouse" VP

–> "the" "mouse" V NP

–> "the" "mouse" "chased" NP

–> "the" "mouse" "chased" D N

–> "the" "mouse" "chased" "a" N

–> "the" "mouse" "chased" "a" "pig"

The last step yields the required sentence, built through a sequence of production-rule applications. Production rules allow us to form the required sentence step by step. We can also use a parse tree to understand the formation of the sentence, which gives a clearer picture. Below, a parse tree is given for the derivation of the sentence through the production rules.
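The leftmost derivation above can be reproduced mechanically. In this sketch the grammar is a table of productions, and a list of hand-picked rule choices drives the derivation (the representation and names are illustrative):

```python
# The toy grammar from the text, as a table of productions.
rules = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D": [["the"], ["a"]],
    "N": [["mouse"], ["pig"]],
    "V": [["saw"], ["chased"]],
}
nonterminals = set(rules)

def derive(choices):
    """Repeatedly expand the leftmost nonterminal; `choices` picks a
    production whenever more than one is available."""
    form, picks = ["S"], list(choices)
    while any(sym in nonterminals for sym in form):
        i = next(k for k, sym in enumerate(form) if sym in nonterminals)
        options = rules[form[i]]
        pick = picks.pop(0) if len(options) > 1 else 0
        form[i:i + 1] = options[pick]   # replace nonterminal in place
    return " ".join(form)

# Choices: D->the, N->mouse, V->chased, D->a, N->pig.
print(derive([0, 0, 1, 1, 1]))  # the mouse chased a pig
```

Each iteration of the loop corresponds to one line of the leftmost derivation above.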

9.7 Types of Grammars


Grammars are used to form sentences using production rules. According to the Chomsky hierarchy, grammars can be divided into the different types discussed below:

Grammar                        Automaton
Regular grammars               Finite automata
Context-free grammars          Pushdown automata
Context-sensitive grammars     Linear-bounded automata
Unrestricted grammars          Turing machine

9.8 Regular Grammar

Regular grammars are restriction-based grammars: the right-hand side may contain only the empty string, a single terminal symbol, or a single terminal symbol followed by a non-terminal symbol, and the left-hand side contains only a single non-terminal symbol.

Consider the language {a^n b^m | m, n ≥ 1}, i.e., strings in which both the number of a's and the number of b's are greater than or equal to 1. This language is regular, in contrast to the language {a^n b^n | n ≥ 1}, which is not. A grammar G for it has N = {S, A, B} and Σ = {a, b}, with S as the starting symbol and the production rules:

1. S -> aA

2. A -> aA

3. A -> bB

4. B -> bB

5. B -> ε
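Because the grammar is regular, membership in {a^n b^m | m, n ≥ 1} can be tested with the equivalent regular expression a+b+, the finite-automaton view of the same production rules. A brief sketch:

```python
import re

# "one or more a's followed by one or more b's" is exactly what the
# regular grammar S->aA, A->aA, A->bB, B->bB, B->eps generates.
def in_language(w):
    return re.fullmatch(r"a+b+", w) is not None

print(in_language("aabbb"))  # True
print(in_language("ba"))     # False
```

A regular-expression engine compiles this pattern into a finite automaton internally, which is why regular grammars, finite automata, and regular expressions all describe the same class of languages.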

9.9 Context Free Grammar

The job of an automaton is to recognize a language: to take a word as input and answer the yes-or-no question of whether it is in the language. But we can also ask what kind of process we need to generate a language. In terms of human speech, recognition corresponds to hearing a sentence and deciding whether or not it is grammatical, while generation corresponds to making up, and speaking, a grammatical sentence of our own. A context-free grammar is a model considered by the Chomsky school of formal linguistics. The idea is that sentences are recursively generated from internal mental symbols through a series of production rules. For instance, if S, N, V, and A correspond to sentences, nouns, verbs, and adjectives, we might have rules such as:

S → N V N,

N → A N,

and so on, and finally rules that replace these symbols with actual words. We call these rules context-free because they can be applied regardless of the context of the symbol on the left-hand side, i.e., independent of the neighboring symbols. Each sentence corresponds to a parse tree, and parsing the sentence allows us to understand what the speaker has in mind. Formally, a context-free grammar G consists of a finite alphabet V of variable symbols, a starting symbol S ∈ V, a finite alphabet T of terminal symbols in which the final word must be written, and a finite set R of production rules that let us replace a single variable symbol with a string made up of variables and terminals:


A → s, where A ∈ V and s ∈ (V ∪ T)*.

We say that the grammar G generates a language L(G) ⊆ T* containing all terminal words w that can be derived from S.

For instance, the grammar with V = {S} and T = {(, ), [, ]} gives rise to the language D2 with two types of brackets:

S → (S)S, [S]S, ε

This production rule gives us three choices: (S)S, [S]S, or replacement by the empty string ε.

A language generated by a context-free grammar is called a context-free language. The language {a^n b^n | n ≥ 0} is a context-free language, as it can be generated by the production rule S → aSb, ε, while the language of palindromes, Lpal, is context-free thanks to the production rule S → aSa, bSb, a, b, ε.

9.10 Context Sensitive Language

In a context-free grammar, the left-hand side of every production rule is a single variable symbol. Context-sensitive grammars allow productions to depend on neighboring variables, and thus to replace one finite string with another. In return, we demand that the grammar be non-contracting, in that productions never decrease the length of the string. Thus the rules are of the form:

u → v, where u, v ∈ (V ∪ T)* and |u| ≤ |v|.

If we like, we can demand that u contain at least one variable. To generate the empty word, we permit the rule S → ε as long as S does not appear in the right-hand side of any production rule. We say that a language is context-sensitive if it can be generated by a context-sensitive grammar. A context-sensitive grammar for the copy language can be written with V = {S, C, A, B}, T = {a, b}, and the production rules:


S → aSA, bSB, C

CA → Ca, a

CB → Cb, b

aA → Aa, aB → Ba, bA → Ab, bB → Bb.

These rules let us produce a palindrome-like string such as abSBA. We then change S to C and use C to convert the upper-case variables A, B into lower-case terminals a, b, moving the terminals to the right to reverse the order of the second half. With the last of these conversions, we erase C, leaving us with a word in Lcopy.

For instance, here is the derivation of abab:

S → aSA

→ abSBA

→ abCBA

→ abCbA

→ abCAb

→ abab .
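Membership in the copy language itself is easy to test directly, even though a context-sensitive grammar is needed to generate it: a word belongs to Lcopy exactly when its two halves are identical. A brief sketch:

```python
# Membership test for the copy language Lcopy = {ww}: the string is
# in the language exactly when its two halves are equal.
def in_copy_language(w):
    half, rem = divmod(len(w), 2)
    return rem == 0 and w[:half] == w[half:]

print(in_copy_language("abab"))  # True (abab = ab + ab)
print(in_copy_language("abba"))  # False
```

The derivation of abab above is the grammar's way of producing a string this check would accept.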

9.11 Unrestricted Grammars

An unrestricted grammar is a grammar G = (N, Σ, P, S), where N denotes the non-terminal symbols, Σ the terminal symbols, P the production rules of the form a -> b (where a and b are strings of symbols from the union of N and Σ, with a non-empty), and S ∈ N the start symbol. As the name suggests, in this kind of grammar essentially no restrictions are imposed on either side of the production rules. It can be shown that unrestricted grammars characterize the recursively enumerable languages. This is the same as saying that for every unrestricted grammar G there exists some Turing machine capable of recognizing L(G), and vice versa. Given an unrestricted grammar, such a Turing machine is simple enough to build, as a two-tape nondeterministic Turing machine: the first tape contains the input word w to be tested, and the second tape is used by the machine to generate sentential forms from G.

The various grammars are classified according to the difficulty of parsing them. Regular grammars are considered the easiest, since their parsing complexity is very low, which puts them at the top; they are followed by context-free grammars, then context-sensitive grammars, and finally unrestricted grammars, with the highest parsing difficulty. Table 1 summarizes this classification, together with the automaton that recognizes each type of grammar, the storage it requires, and its parsing complexity.

Table 1: Tabular classification of the types of grammars

TYPE  NAME                  PRODUCTION RULES                      RECOGNIZING AUTOMATON /
                                                                  STORAGE REQUIRED /
                                                                  PARSING COMPLEXITY

1     Context-sensitive     aAz –> aBC…Dz                         Linear bounded automaton
      grammars              A – non-terminal symbol               (non-deterministic Turing
                            a, z – sequences of zero or more      machine with tape a linear
                            terminal or non-terminal symbols      multiple of the input
                            BC…D – any sequence of terminal       length) / NP-complete
                            or non-terminal symbols

2     Context-free          A –> BC…D                             Pushdown automaton /
      grammars              A – non-terminal symbol               pushdown stack /
                            BC…D – any sequence of terminal       O(n^3)
                            or non-terminal symbols

3     Regular grammars,     A –> xB                               Finite state automaton /
      finite state          C –> y                                finite storage /
      grammars              A, B, C – non-terminal symbols        O(n)
                            x, y – terminal symbols

0     Unrestricted          The production rules may transform    Turing machine /
      grammars, general     any sequence of symbols into any      infinite tape /
      rewrite grammars      other sequence of symbols.            undecidable

9.12 Summary

The machine is in exactly one state at a time; the state it is in at any given moment is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is known as a transition. A particular FSM is defined by a list of its states and the triggering condition for each transition. A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar. These production rules build strings over the language's alphabet according to the syntax required by the language. A formal grammar is thus a set of rules for rewriting strings, along with a start symbol from which the rewriting begins.

9.13 Glossary

Finite automata: A finite-state automaton (plural: automata), or simply a state machine, is a mathematical model of computation used to design both computer programs and sequential logic circuits.

DFA: A deterministic finite automaton is a finite state machine that accepts or rejects finite strings of symbols and produces a unique computation (or run) of the automaton for each input string.

NDFA: Using the subset construction algorithm, every NFA can be translated into an equivalent DFA, i.e. a DFA recognizing the same formal language. Like DFAs, NFAs recognize only regular languages.

Grammar: A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar.

Regular grammar: A regular grammar restricts the right-hand side of each production: it may contain only the empty string, a single terminal symbol, or a single terminal symbol followed by a non-terminal symbol.

Context-free grammar: A context-free grammar is a model considered by the Chomsky school

of formal linguistics. The idea is that sentences are recursively generated from internal mental

symbols through a series of production rules.

Context sensitive language: A language generated by a context-sensitive grammar, in which the variable on the left side does not appear on the right-hand side of the production rules.


Unrestricted grammar: As the name suggests, it is a grammar in which no restrictions are placed on either side of the production rules.

Check Your Progress/Self-Assessment Questions.

Q.4 Define Grammar?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

____________________________

Q.5 What are the different types of grammar?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

_____________________

9.14 Answers to check your progress/self-assessment questions.

1.

DFA stands for deterministic finite automata; NDFA stands for non-deterministic finite automata.

In a DFA, each transition leads to exactly one state, which makes the whole transition scenario deterministic. In an NDFA, a transition can lead to a subset of states, making the transition scenario non-deterministic.

A DFA does not have the ability to use empty-string transitions, whereas an NDFA does.

2. Non-deterministic finite automata are "non-deterministic" in the sense that the machine can be in more than one state at the same time: in an NDFA the outgoing transitions can be non-deterministic.

A Non-deterministic Finite Automaton (NFA) is a 5-tuple (Q, Σ, q0, δ, F):
Q -> a finite set of states
Σ -> a finite set of input symbols (the alphabet)
q0 -> the start state
F -> the set of final states
δ -> a transition function, which is a mapping from Q × Σ to subsets of Q.
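As an illustration of the 5-tuple above, the following Python sketch simulates an NFA by tracking the subset of states the machine may currently occupy. The example automaton, which accepts strings over {a, b} ending in "ab", is our own invention and not taken from the text.

```python
def nfa_accepts(states, alphabet, delta, start, finals, word):
    """Simulate an NFA (Q, Sigma, delta, q0, F) on `word`.

    delta maps (state, symbol) -> set of next states; the machine
    may be in several states at once, so we track a set of states."""
    current = {start}
    for ch in word:
        if ch not in alphabet:
            return False
        # take the union of all moves from every state we might be in
        current = set().union(*(delta.get((q, ch), set()) for q in current))
        if not current:
            return False          # no live states left: reject
    return bool(current & finals)

# Example NFA over {a, b} accepting strings that end in "ab".
Q = {0, 1, 2}
Sigma = {"a", "b"}
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
F = {2}

print(nfa_accepts(Q, Sigma, delta, 0, F, "aab"))   # True
print(nfa_accepts(Q, Sigma, delta, 0, F, "aba"))   # False
```

Applying the subset construction to this NFA would yield the equivalent DFA mentioned in the NDFA glossary entry.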

3. Option b

4. A set of production rules used for string processing in the theory of computation is called a grammar, or formal grammar.

5. a) Regular grammar
b) Context-free grammar
c) Context-sensitive grammar
d) Unrestricted grammar

9.15 Model questions

Q1.Given the language L = {ab, aa, baa}, which of the following strings are in L*?

1) abaabaaabaa

2) aaaabaaaa


3) baaaaabaaaab

4) baaaaabaa

Q2. Let w be any string of length n in {0,1}*. Let L be the set of all substrings of w. What is the minimum number of states in a non-deterministic finite automaton that accepts L?

A n-1

B n

C n+1

D 2n-1

Q3. A minimum-state deterministic finite automaton accepting the language L = {W | W ε {0,1}*, the numbers of 0s and 1s in W are divisible by 3 and 5, respectively} has

A 15 states

B 11 states

C 10 states

D 9 states

Q4. Given an arbitrary non-deterministic finite automaton (NFA) with N states, the maximum

number of states in an equivalent minimized DFA is at least

A N^2
B 2N
C 2^N
D N!

9.16 References/Suggested Readings

1. System programming, John J Donovan, Tata McGraw-Hill Education, 1st Edition.

2. Systems Programming and Operating System, D M Dhamdhere, Tata McGraw-Hill

Education, 1st Edition.

3. System programming, A.A. Puntambekar, Technical publications, 3rd Edition.


Lesson-10 Phases of Compiler Design

Structure of lesson

10.0 Objective

10.1 Major Parts of Compiler Design

10.2 Phases of Compiler Design

10.2.1 Lexical Analysis

10.2.2 Syntax Analysis

10.2.3 Semantic Analysis

10.2.4 Intermediate Code Generation

10.2.5 Code Generation

10.2.6 Code Optimization

10.3 Error Recovery

10.4 Symbol Table

10.5 Summary

10.6 Glossary

10.7 Check your progress/self-assessment questions
10.8 Answers to check your progress/self-assessment questions
10.9 Model questions

10.10 References

10.0 Objective

After reading this chapter the students will be able to:

Explain the detailed process of compilation.
Explain the various phases of a compiler.
Describe how each phase works within the compilation process.
Explain code optimization and its significance.

10.1 Major Parts of Compiler Design

The two major parts of a compiler are the analysis phase and the synthesis phase. In the analysis phase, an intermediate representation is created from the given source program; this phase is also known as the front-end of the compiler. Its parts are the Lexical Analyzer, the Syntax Analyzer, and the Semantic Analyzer.

Fig. 10.1 Parts of compiler

In the synthesis phase, an equivalent target program is created from the intermediate representation; this phase is also known as the back-end of the compiler. Its parts are the Intermediate Code Generator, the Code Generator, and the Code Optimizer. Each phase transforms the source program from one representation into another (see Fig. 10.1). The phases also communicate with the error handler and the symbol table.

10.2 Phases of Compiler Design

A compiler can have many phases and passes. A pass is a traversal by the compiler over the whole program, while a phase is a distinguishable stage that takes input from the preceding stage, processes it, and produces output that can be used as input for the next stage. A pass can contain more than one phase. The compilation process is thus a series of phases, each of which takes input from its predecessor in that stage's own representation of the source program and serves its output to the next phase of the compiler. The phases of a compiler are shown in the figure below (see Fig. 10.2).


Fig. 10.2 Phases of compiler

10.2.1 Lexical Analysis

Lexical analysis, or scanning, is the initial phase of a compiler. It takes the modified source code from language preprocessors, written in the form of sentences, and breaks it into a sequence of tokens, removing any whitespace and comments in the source code. If the lexical analyzer finds an invalid token, it generates an error. The lexical analyzer works closely with the syntax analyzer: it reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer on demand.

Tokens


A token is a sequence of characters that can be treated as a single logical entity. In a programming language, keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols can all be considered tokens.

Specifications of Tokens

Tokens are specified using the following concepts:

a) Alphabets

Any finite set of symbols is an alphabet: {0,1} is the set of binary alphabets, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the set of hexadecimal alphabets, and {a-z, A-Z} is the set of English-language alphabets.

b) Strings

Any finite sequence of alphabet symbols is called a string. The length of a string is the total number of occurrences of alphabet symbols in it. For example, the length of the string true is 4, denoted |true| = 4. A string with no symbols, i.e. a string of zero length, is known as the empty string and is denoted by ε (epsilon).

c) Special Symbols

A typical high-level language contains the following symbols:-

Arithmetic symbols: Addition (+), Subtraction (-), Division (/), Multiplication (*), Modulus

(%).

Punctuation: Comma (,), Semi-colon (;), Dot (.), Arrow (->).

Assignment: =

Special Assignment: +=, /=, *=, -=

Comparison: = =, !=, <, <=, >, >=

Preprocessor: #

Location Specifier: &

Logical: &, &&, |, ||, !

Shift Operator: >>, >>>, <<, <<<

Lexemes

A lexeme is the sequence of characters (generally alphanumeric) that forms a token. There are predefined rules for every lexeme to be recognized as a valid token. These rules are given by the grammar by means of a pattern: a pattern explains what can be a token, and patterns are defined by regular expressions.

Fig.10.3. Lexical Analyzer

Example: A := B + 10 gives the tokens:
A and B (identifiers)
:= (assignment operator)
+ (add operator)
10 (a number)

A lexical analyzer puts information regarding identifiers into the symbol table. Regular expressions are used to describe tokens, and a Deterministic Finite Automaton (DFA) is used in the implementation of the lexical analyzer. A language is considered a finite set of strings over a finite set of alphabet symbols; computer languages are treated as such sets, and set operations can be performed on them mathematically. Finite languages can be described by regular expressions. The lexical analyzer needs to scan and identify only the finite set of valid strings, tokens, or lexemes that belong to the language, searching for the patterns defined by the language rules.
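As a rough sketch of this idea (not part of the prescribed material), the A := B + 10 example above can be tokenized with regular expressions. The token class names and the pattern set below are our own illustrative choices.

```python
import re

# One regular expression per token class, tried in order.  The classes
# and the ":=" operator follow the A := B + 10 example in the text;
# the exact names (ID, NUM, ...) are our own choice.
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"[ \t]+"),      # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(code):
    """Split `code` into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(code):
        m = MASTER.match(code, pos)
        if not m:
            raise SyntaxError(f"invalid token at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("A := B + 10"))
# [('ID', 'A'), ('ASSIGN', ':='), ('ID', 'B'), ('OP', '+'), ('NUM', '10')]
```

A production scanner generator such as lex compiles patterns like these into a single DFA rather than trying them one by one.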

Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar, and the language defined by a regular grammar is known as a regular language. Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings. Programming-language tokens can be described by regular languages, which are simple to specify and have efficient implementations.

A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly; finite automata are recognizers for regular expressions. When a regular-expression string is fed into a finite automaton, the automaton changes its state for each literal. If the input string is processed successfully and the automaton reaches a final state, the string is accepted, i.e. it was held to be a valid token of the language. The mathematical model of a finite automaton consists of:
Q -> a finite set of states.
Σ -> a finite set of input symbols.
(δ : Q × Σ → Q) -> the transition function.
(q0 ∈ Q) -> the start state.
(F ⊆ Q) -> the set of accepting states.
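The five components above can be sketched as a small table-driven simulator. The example automaton below, accepting binary strings with an even number of 1s, is invented purely for illustration.

```python
def dfa_accepts(delta, start, finals, word):
    """Run a DFA given by transition function delta: (Q x Sigma) -> Q,
    start state q0 and accepting states F, on the input `word`."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False          # undefined move: reject
        state = delta[(state, symbol)]
    return state in finals

# Example DFA over {0, 1} accepting strings with an even number of 1s.
delta = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd",  "0"): "odd",  ("odd",  "1"): "even",
}
print(dfa_accepts(delta, "even", {"even"}, "1101"))  # False (three 1s)
print(dfa_accepts(delta, "even", {"even"}, "1001"))  # True  (two 1s)
```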

10.2.2 Syntax Analysis

Syntax analysis, or parsing, is the second phase of a compiler. A lexical analyzer can identify tokens with the help of regular expressions and pattern rules, but it cannot check the syntax of a given sentence, owing to the limitations of regular expressions: regular expressions cannot check balanced tokens such as parentheses. This phase therefore uses context-free grammar (CFG), which is recognized by push-down automata. CFG is a superset of regular grammar: every regular grammar is also context-free, but there exist problems beyond the scope of regular grammar. CFG is a useful tool for describing the syntax of programming languages.

A context-free grammar has a total of four components:

Non-terminals set (V)

Terminal symbols or set of tokens (Σ)

Production rule (P)

Start symbol (S)

Syntax analyzers

A syntax analyzer, or parser, takes its input from a lexical analyzer as a token stream. The parser analyzes the source code (the token stream) against the production rules to detect any errors in the code (see Fig. 10.4). The output of this phase is a parse tree. A parse tree is a graphical depiction of a derivation, and a derivation is a sequence of production rules used to obtain the input string; it shows how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree. In a parse tree, all leaf nodes are terminals, all interior nodes are non-terminals, and an in-order traversal gives the original input string. Ambiguity matters in parse trees: a grammar is said to be ambiguous if it has more than one parse tree (left or right derivation) for at least one string. If the input is scanned and replaced from left to right, it is called a left-most derivation; if we scan and replace the input with production rules from right to left, it is called a right-most derivation. The parser accomplishes two tasks: parsing the code while searching for errors, and producing a parse tree as the output of the phase. Parsers parse the whole code even if some errors exist in the program.

Fig. 10.4 Syntax Analyzer

Limitations of syntax analyzers

Syntax analyzers receive their input as tokens from lexical analyzers, and lexical analyzers are responsible for the validity of the tokens they supply. Syntax analyzers have the following drawbacks; they cannot determine:
whether a token is valid,
whether a token is declared before it is used,
whether a token is initialized before it is used,
whether an operation performed on a token type is valid or not.


Syntax analyzers follow the production rules defined by a context-free grammar. The way the production rules are implemented (the derivation) divides parsing into two types: top-down parsing and bottom-up parsing.

Fig. 10.5 Types of Parsing

Top-down parsing

When the parser starts constructing the parse tree from the start symbol and then tries to

transform the start symbol to the input, it is called top-down parsing.

Recursive descent parsing: It is a type of top-down parsing. It is called recursive because it uses recursive procedures to process the input; it may suffer from backtracking.
Backtracking: If one derivation of a production fails, the syntax analyzer restarts the process using different rules of the same production. This technique may process the input string more than once to determine the right production.
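A minimal recursive-descent parser can make the idea concrete. The toy grammar below, covering expressions over single digits with +, * and parentheses, is our own example; each non-terminal becomes one possibly recursive procedure, and this particular grammar happens to need no backtracking.

```python
# Toy grammar (our own illustrative example):
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> DIGIT | '(' expr ')'

def parse(s):
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {pos}")
        pos += 1

    def factor():
        nonlocal pos
        if peek() == "(":
            eat("("); v = expr(); eat(")")
            return v
        if peek() and peek().isdigit():
            v = int(peek()); pos += 1
            return v
        raise SyntaxError(f"unexpected {peek()!r} at {pos}")

    def term():
        v = factor()
        while peek() == "*":
            eat("*"); v *= factor()
        return v

    def expr():
        v = term()
        while peek() == "+":
            eat("+"); v += term()
        return v

    v = expr()
    if pos != len(s):
        raise SyntaxError("trailing input")
    return v

print(parse("2+3*4"))    # 14
print(parse("(2+3)*4"))  # 20
```

Because the * loop sits inside term and the + loop inside expr, the parser gives * higher precedence than + without any lookahead beyond one character.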

Bottom-up parsing

Bottom-up parsing starts from the leaf nodes of a tree and works upward until it reaches the root node.

Shift-Reduce Parsing: Shift-reduce parsing uses two distinct steps for bottom-up parsing, known as the shift step and the reduce step. The shift step advances the input pointer to the next input symbol, called the shifted symbol; this symbol is pushed onto the stack, where it is treated as a single node of the parse tree. When the parser finds a complete grammar rule (RHS) and replaces it with its left-hand side (LHS), this is the reduce step. It occurs when the top of the stack contains a handle: a POP operation is performed on the stack, which pops off the handle and replaces it with the LHS non-terminal symbol.

LR Parser: The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a broad class of context-free grammars, which makes it a very powerful syntax-analysis technique. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a right-most derivation in reverse, and k denotes the number of lookahead symbols used to make decisions.

10.2.3 Semantic Analysis

In the syntax analysis phase we saw how a parser builds parse trees. The plain parse tree constructed in that phase is generally of no use to a compiler on its own, as it does not carry any information about how to evaluate the tree. The productions of the context-free grammar that make up the rules of the language do not specify how to interpret them.

Semantics

The semantics of a language give meaning to its constructs, such as tokens and syntax structure. Semantics help interpret symbols, their types, and their relations to one another. Semantic analysis judges whether the syntax structure constructed in the source program derives any meaning or not.

CFG + semantic rules = Syntax Directed Definitions

The following tasks should be performed in semantic analysis:

Type check

Array bound check

Scope resolution

Semantic Errors

Some of the semantic errors that the semantic analyzer is expected to recognize are:

Mismatch of type.

Variable which is not declared (undeclared).

Misuse of reserved identifier.


Multiple declarations of a variable in a scope.

Accessing a variable outside of its scope.

Mismatch of actual and formal parameter.

Attribute Grammar

Attribute grammar is a special form of context-free grammar in which additional information, called attributes, is attached to one or more of its non-terminals in order to provide context-sensitive information. Each attribute has a well-defined domain of values, such as integer, float, character, string, or expression. Attribute grammar is used to give semantics to a context-free grammar, and it can also help specify the syntax and semantics of a programming language. An attribute grammar can be viewed as a parse tree that passes values or information along the nodes of the tree.

Semantic attributes

Semantic attributes are assigned values from their domains at parse time and evaluated at the time of assignment or under certain conditions. Based on the way attributes get their values, they can be broadly divided into two categories: synthesized attributes and inherited attributes.

Synthesized Attributes

These attributes get values from the attribute values of their child nodes.

Inherited Attributes

As compared to synthesized attributes, inherited attributes can take value from parent node or

siblings.

10.2.4 Intermediate Code Generation

Intermediate code generation is the final phase of the front-end of the compiler. Intermediate code is needed before target code can be generated: without an intermediate representation, a separate native compiler would be required for every target machine, which is not practical. Code-optimization techniques can be applied directly to intermediate code to improve performance, and intermediate code is simple to convert into assembly code. The input to this phase is the parse tree or abstract syntax tree.


Types of Intermediate Representation:

Syntax tree

Postfix notation

Three address Code

For three-address code generation, the semantic rules are similar to the syntax-tree rules for postfix notation.

Syntax tree

A syntax tree represents the natural hierarchical structure of an input source program. A DAG (Directed Acyclic Graph) provides the same information in a more compact form, because common sub-expressions are identified. A syntax tree for the statement a := t*-s + t*-s appears in Fig. 10.6.

Fig 10.6 (a): syntax tree Fig10.6 (b): DAG

Postfix Notation

The postfix notation of a tree is a linear representation of the syntax tree: a list of the tree's nodes in which each node appears immediately after its children. The postfix notation of the syntax tree in Fig. 10.6 is as follows:

a t s uniminus * t s uniminus * + assign … (1.1)

In postfix notation the edges of the tree do not appear explicitly, but they can be recovered from the order of the nodes and the number of operands each operator takes.
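The listing (1.1) can be reproduced with a short post-order traversal. The tree shape below is our reading of Fig 10.6(a); the operator names follow the notation used in (1.1).

```python
# A syntax-tree node is a tuple (label, children...); leaves have no children.
def postfix(node):
    """Post-order walk: emit the children first, then the node itself."""
    label, *children = node
    out = []
    for child in children:
        out += postfix(child)
    out.append(label)
    return out

# Tree for  a := t * -s + t * -s  (shape assumed from Fig 10.6(a)).
neg_s = ("uniminus", ("s",))
tree = ("assign",
        ("a",),
        ("+",
         ("*", ("t",), neg_s),
         ("*", ("t",), neg_s)))

print(" ".join(postfix(tree)))
# a t s uniminus * t s uniminus * + assign
```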


Three address codes

Three address codes refer to the sequence of statements. Consider the expression below:

a: = b op c

Where a, b and c are the names, constants, temporary variables. op represents the operation to be

performed like boolean, arithmetic , logical etc. There is only one variable on the right hand side

of the statement.

Implementation of Three Address Statements

A three-address statement is an abstract view of intermediate code. These statements can also be implemented as records with fields for the operator and the operands. There are three such representations:

Quadruples

Triples

Indirect Triples

A quadruple is a four-field record structure with fields op, a1, a2, and result. The op field holds the code for the operator to be applied, a1 and a2 are the operands, and result holds the result of the operation. Fig. 10.7 shows the quadruples for the assignment statement below:

a := t*-s + t*-s … (10.1)

Position  op        A1  A2  Result
(0)       uniminus  S       T1
(1)       *         T   T1  T2
(2)       uniminus  S       T3
(3)       *         T   T3  T4
(4)       +         T2  T4  T5
(5)       :=        T5      A

Fig 10.7 Quadruples

Here T1, T2, …, T5 are used to store the temporary results at the different stages, and the final result is stored in the variable "a".
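The table in Fig 10.7 can be reproduced by a small quadruple emitter. The helper names below (new_temp, emit) are our own illustrative choices, not part of any standard API.

```python
# Emit quadruples (op, arg1, arg2, result) for  a := t*-s + t*-s,
# reproducing the rows of Fig 10.7.  Temporaries are named T1, T2, ...
quads = []
temp_count = 0

def new_temp():
    global temp_count
    temp_count += 1
    return f"T{temp_count}"

def emit(op, a1, a2=""):
    """Append one quadruple and return the temporary holding its result."""
    r = new_temp()
    quads.append((op, a1, a2, r))
    return r

t1 = emit("uniminus", "s")         # T1 := -s
t2 = emit("*", "t", t1)            # T2 := t * T1
t3 = emit("uniminus", "s")         # T3 := -s
t4 = emit("*", "t", t3)            # T4 := t * T3
t5 = emit("+", t2, t4)             # T5 := T2 + T4
quads.append((":=", t5, "", "a"))  # a  := T5

for i, q in enumerate(quads):
    print(f"({i})", *q)
```

A real front-end would walk the syntax tree and call emit at each interior node; building a DAG first would let it reuse T2 for both occurrences of t*-s.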


Triples are used to avoid the temporary variables: instead of storing a result in a temporary variable, the position of the statement that computes it is used. The triple representation of statement (10.1) is shown in Fig. 10.8.

Position Op A1 A2

(0) uniminus S

(1) * T (0)

(2) uniminus S

(3) * T (2)

(4) + (1) (3)

(5) assign A (4)

Fig.10.8 Triples

Indirect triple representation uses pointers to the triples instead of listing them directly. The indirect triple representation of (10.1) is shown in Fig. 10.9.

Fig 10.9 (a) Pointers:

Position  Statement
(0)       (14)
(1)       (15)
(2)       (16)
(3)       (17)
(4)       (18)
(5)       (19)

Fig 10.9 (b) Indirect triples:

Position  op        A1    A2
(14)      uniminus  s
(15)      *         t     (14)
(16)      uniminus  s
(17)      *         t     (16)
(18)      +         (15)  (17)
(19)      assign    a     (18)

10.2.5 Code Generation

Code generation is the final phase of compiler design. It takes intermediate code as its source program and produces a target program as output. The real aim of this phase is to preserve the semantic meaning of the input program while making effective use of the available resources. The position of the code generator in the compiler is shown in Fig. 10.10.

Fig 10.10 Code generator position

Design Issues in Code Generator:

Code Generator Input: The code-generation phase assumes its input to be free of errors. The input is provided by the front-end along with the symbol-table information, and can be a syntax tree, DAG, triples, quadruples, postfix notation, byte code, etc.

Output Programs: The target program is the output of the code generator. It can take different forms: assembly language, relocatable machine language, or absolute machine language. Absolute machine code resides at fixed memory locations and can be executed immediately. Relocatable code allows subprograms to be compiled and executed separately. Assembly-language output makes the process of code generation easier because of its symbolic instructions.

Memory management: Mapping the names in the source program to the addresses of data objects at run time is done co-operatively by the code generator and the front-end. The type field in a declaration determines the amount of storage required for the declared variable.

Instruction Selection: The nature of the instruction set plays a very important role. If the target machine does not support a data type directly, special handling beyond the general rule is needed. Instruction selection depends on the level of the instructions (low or high) and their nature (data-type support).

Register allocation and assignment: Selecting which variables will be stored in registers, and picking the particular register for each variable, is also a design issue.


Choice of evaluation order: There exist many evaluation orders, and some require fewer registers than others. The chosen order should improve the efficiency of the overall process.

Different approaches to code generation: Several algorithms are available for code generation; choosing the right one by weighing all the pros and cons is itself a challenge.
Run-time storage management is done using stack data structures.

Basic blocks and flow graphs

Basic block: It is a sequence of consecutive statements in which control enters at the beginning and leaves at the end without halting or branching in between. Algorithms exist to convert three-address statements into basic blocks.
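One such algorithm, the classic "leaders" method, can be sketched as follows; the tuple encoding of the three-address instructions below is invented for the example.

```python
def basic_blocks(code):
    """Partition a list of three-address instructions into basic blocks
    using the leaders rule: the first instruction is a leader, every
    branch target is a leader, and every instruction that follows a
    branch is a leader.  Each block runs from one leader to the next."""
    leaders = {0}
    for i, instr in enumerate(code):
        if instr[0] in ("goto", "if_goto"):
            leaders.add(instr[-1])          # the branch target
            if i + 1 < len(code):
                leaders.add(i + 1)          # instruction after the branch
    cuts = sorted(leaders) + [len(code)]
    return [code[cuts[j]:cuts[j + 1]] for j in range(len(cuts) - 1)]

# Invented encoding: ('goto', target), ('if_goto', cond, target), ('op', ...)
code = [
    ("op", "i := 0"),         # 0
    ("op", "t := i * 4"),     # 1  leader: target of the branch at 3
    ("op", "i := i + 1"),     # 2
    ("if_goto", "i < n", 1),  # 3
    ("op", "return"),         # 4  leader: follows a branch
]
blocks = basic_blocks(code)
print(len(blocks))  # 3
```

The resulting blocks become the nodes of the flow graph described below.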

Transformations on the basic blocks:

Structure preserving transformation: These transformations include elimination of

common sub expressions, dead code elimination, i.e., elimination of code which is never

used, renaming temporary variables, interchange of statements etc.

Algebraic transformations: Many algebraic transformations can be used to change the set of expressions computed by a basic block into an algebraically equivalent set. For example, statements like Y := Y + 0 and Y := Y * 1 can be eliminated from basic blocks.

Flow Graphs

Flow-control information can be added to the basic blocks by constructing a directed graph called a flow graph. The basic blocks are the nodes of the flow graph. There is an initial node whose leader is the first statement. A directed edge from b1 to b2 indicates that b2 immediately follows b1.

A Code Generation Algorithm:

The input to the algorithm is a sequence of three-address statements constituting a basic block. For a three-address statement a := b op c the following steps are carried out:

Step 1: A function getreg is called to find a location L where the result of the three-address statement will be stored. L can be a register or a memory location.

Step 2: The address descriptor is consulted to find the current location of b; b may be in memory, in a register, or both. The value of b is copied into L.

Step 3: The instruction op c' is generated, where c' is the current location of c. For storage, a register is preferred over a memory location. The location of a is then considered again.

Step 4: The operation is performed and the value is stored in the register. Registers that were holding variables no longer needed can be freed to hold the variables of the next instruction.

10.2.6 Code Optimization

Code optimization produces efficient code that executes faster, uses memory efficiently, and gives better overall performance. The code optimizer is usually located between the front-end and the code generator. A code optimizer typically:

• Works with intermediate code.

• Performs control-flow analysis.

• Performs data-flow analysis.

• Applies transformations that improve the intermediate code.

There are various techniques used for code optimization. They are as follows:

Peephole Optimizations: These are performed after machine-code generation. The code is scanned to find adjacent instructions that can be replaced by a single instruction or by fewer instructions.

Loop optimizations: These are applied to loop statements such as for loops. They are very important because programs usually spend a large percentage of their time inside loops.

Branch optimization: It rearranges the code of program to minimize the logic of branching

and to merge physically separate blocks of code.

Code motion: If variables used inside a loop are not changing within the loop, then

computation can be done outside of the loop and the various related results can be used

within the loop.

Common sub-expression elimination: The same value is recalculated in some subsequent expression; the duplicate computation can be removed by reusing the previous result.


Constant propagation: Constants used in various expressions are propagated, and new constants are generated by folding; some implicit conversions between integer values and floating-point types are performed.

Dead code elimination: It eliminates code that can never be reached during execution or whose results are never used.

Dead store elimination: It eliminates a store when the stored value is never referenced again. For example, if two stores to the same location have no intervening load, the first store is unnecessary and is removed.

Global register allocation: Assigns variables and expressions to hardware registers, typically using a "graph coloring" algorithm.

Inlining: Replaces function calls with the body of the called function.

Instruction scheduling: Reorders instructions to reduce execution time.

Interprocedural analysis: Uncovers relationships between function calls, and removes loads, computations, and stores that cannot be removed by more straightforward optimizations.

Invariant IF code floating (unswitching): Moves invariant branching code out of loops, creating opportunities for further optimizations.

Reassociation: Rearranges the sequence of calculations in an expression, exposing more candidates for common subexpression elimination.

Store motion: Moves store instructions outside loops.

Strength reduction: Replaces less efficient instructions with cheaper equivalents, for example replacing a multiplication inside a loop with a running addition.
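A classic instance is replacing the multiplication in an array-indexing loop with a running addition. The C sketch below (names are illustrative) shows the idea:

```c
#include <assert.h>

/* Before: the index i * stride is computed with a multiply
 * on every iteration. */
long strided_sum_before(const long *a, int n, int stride) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i * stride];
    return total;
}

/* After strength reduction: the multiply is replaced by a
 * cheaper addition that advances the index each iteration. */
long strided_sum_after(const long *a, int n, int stride) {
    long total = 0;
    int idx = 0;                 /* runs through 0, stride, 2*stride, ... */
    for (int i = 0; i < n; i++) {
        total += a[idx];
        idx += stride;
    }
    return total;
}
```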

Value numbering: Combines constant propagation, folding of several instructions into a single instruction, and elimination of redundant expressions.

10.3 Error Recovery

A parser should be able to identify and report any error in the program. When an error is encountered, the parser should handle it and carry on parsing the rest of the input. Although most errors are detected by the parser, errors may be encountered at different stages of the compilation process. A program may have the following kinds of errors at different stages:

Lexical: the name of an identifier typed incorrectly

Syntactic: a missing semicolon or unbalanced parentheses

Semantic: a mismatched value assignment


Logical: unreachable code, an infinite loop

10.4 Symbol Table

The symbol table is a data structure created and maintained throughout all the phases of a compiler. All identifier names, along with their types, are stored in the symbol table, which makes it easier for the compiler to quickly search for an identifier's record and retrieve it. The symbol table is also used for scope management. It is an important data structure maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, and so forth. The symbol table is used by both the analysis and the synthesis parts of a compiler.

A symbol table may serve the following purposes, depending on the language at hand:

To store the names of all entities in a structured form at one place.

To check whether a variable has been declared.

To implement type checking by verifying that assignments and expressions in the source code are semantically correct.

To determine the scope of a name (scope resolution).

A symbol table is simply a table, which can be either linear or a hash table. It maintains an entry for each name in the following format:

<symbol name, type, attributes>

For instance, if a symbol table has to store information about the following variable declaration:

static int interest;

then it must store the entry as:

<interest, int, static>

The attribute field contains the entries related to the name.

If a compiler has to handle only a small amount of data, the symbol table can be implemented as an unordered list, which is easy to code but appropriate only for small tables. A symbol table can be implemented in one of the following ways:

Linear (sorted or unsorted) list

Binary search tree


Hash table

Among these, symbol tables are most commonly implemented as hash tables, where the source-code symbol itself is treated as the key for the hash function and the return value is the information about that symbol.
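A minimal sketch of such a chained hash-table symbol table in C follows; the structure, field sizes, and function names are illustrative assumptions, not part of the text:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211                 /* a small prime number of buckets */

/* One <symbol name, type, attributes> entry, as described above. */
struct symbol {
    char name[32];
    char type[16];
    char attribute[16];
    struct symbol *next;               /* chaining resolves collisions */
};

static struct symbol *table[TABLE_SIZE];

/* Simple multiplicative string hash over the symbol name. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insert a <name, type, attribute> entry at the head of its bucket. */
void st_insert(const char *name, const char *type, const char *attr) {
    struct symbol *sym = malloc(sizeof *sym);
    snprintf(sym->name, sizeof sym->name, "%s", name);
    snprintf(sym->type, sizeof sym->type, "%s", type);
    snprintf(sym->attribute, sizeof sym->attribute, "%s", attr);
    unsigned h = hash(name);
    sym->next = table[h];
    table[h] = sym;
}

/* Retrieve the entry for a name, or NULL if it was never declared. */
struct symbol *st_lookup(const char *name) {
    for (struct symbol *s = table[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}
```

For the declaration static int interest; the call st_insert("interest", "int", "static") stores the entry, and st_lookup("interest") retrieves it.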

10.5 Summary

This chapter has described the front end and back end of a compiler. The various phases (lexical analysis, syntax analysis, semantic analysis, intermediate code generation, and code generation) have been discussed with suitable examples, and the interrelationships between the phases have been described. Various code optimization techniques for producing efficient code have also been covered.

10.6 Glossary

Token: A token is a sequence of characters treated as a unit in the grammar of a programming language.

Lexeme: A lexeme is a sequence of characters in a program that is matched by the pattern for a token.

Pattern: A pattern is a rule describing the set of lexemes that represent a particular token.

Symbol table: A symbol table is a data structure used by a language translator such as

a compiler or interpreter.

10.7 Check your progress/self-assessment questions.

Q1 What are the possible error recovery actions in lexical analysis?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

_______________

Q2 What are the two parts of compilation? Explain briefly.

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________


______________________________________________________________________________

_____________________

Q3 Differentiate between tokens, patterns, and lexemes.

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

____________________________

Q4 What does semantic analysis do?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________

Q5 Define ambiguous grammar.

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

____________________________

Q6 What are the problems with top-down parsing?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

_____________________________


Q7 What are the advantages of intermediate code generation?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

____________________________

Q8 What are the various types of intermediate code representation?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________

Q9 Define symbol table.

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

____________________________

Q10 What are the properties of an optimizing compiler?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

____________________________


10.8 Answers to Check your progress/self-assessment questions.

1. a) Deleting an extraneous character

b) Inserting a missing character

c) Replacing an incorrect character with the correct character

d) Transposing two adjacent characters.

2. Analysis and synthesis are the two parts of compilation. The analysis part breaks the source program into its constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.

3. Tokens: sequences of characters that have a collective meaning.

Patterns: for each token there is a set of strings in the input for which that token is produced as output. This set of strings is described by a rule called a pattern associated with the token.

Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.

4. Semantic analysis is the phase in which certain checks are performed to ensure that the components of a program fit together meaningfully. Mainly it performs type checking.

5. A grammar G is said to be ambiguous if it produces more than one parse tree for some sentence of the language L(G), i.e. some sentence has more than one leftmost (or rightmost) derivation.

6. The following are the problems associated with top-down parsing:

• Backtracking

• Left recursion

• Left factoring

• Ambiguity


7. A compiler for a different machine can be created by attaching a different back end to the existing front end. A compiler for a different source language can be created by providing a different front end, for the corresponding source language, to an existing back end. A machine-independent code optimizer can be applied to the intermediate code in order to improve it.

8. There are mainly three types of intermediate code representation:

Syntax tree

Postfix notation

Three-address code

9. A symbol table is a data structure used by the compiler to keep track of the semantics of variables. It stores information about the scope and binding of names.

10. The source code should be such that it produces the least amount of target code. There should not be any unreachable code, and dead code should be completely removed from the source program. An optimizing compiler should apply the following code-improving transformations on the source language:

1. common subexpression elimination

2. dead code elimination

3. code motion

4. strength reduction

10.9 Model Questions

Q1 Explain the various phases of a compiler.

Q2 What is the use of regular expressions and DFAs in lexical analysis?

Q3 How does a lexical analyzer remove white space from a source file?

Q4 How do real compilers deal with symbol tables?

Q5 How would you check that no identifier is declared more than once?

Q6 Specify the lexical form of numeric constants in the C language.

10.10 References/Suggested Readings


4. System programming, John J Donovan, Tata McGraw-Hill Education, 1st Edition.

5. Systems Programming and Operating System, D M Dhamdhere, Tata McGraw-Hill

Education, 1st Edition.

6. System programming, A.A. Puntambekar, Technical publications, 3rd Edition.


Lesson-11 YACC

Structure of the lesson

11.0 Objective

11.1 Introduction to YACC

11.2 Format of a YACC file

11.3 Lex-Yacc Interaction

11.3.1 Header files of code

11.4 Rule Section

11.5 User code section

11.6 YACC Declaration Summary

11.7 Just-in-time compilers

11.7.1 Functioning of just-in-time compilers

11.7.2 Classification of just-in-time compilers

11.8 Platform independent systems

11.9 Various stages accessible

11.9.1 Hardware stages

11.9.2 Software stages

11.10 Summary

11.11 Glossary

11.12 Answers to check your progress/self-assessment questions.

11.13 Model questions

11.14 References

11.0 Objective

After studying this lesson, students will be able to:

Explain the concept of YACC and the format of a YACC file.

Define just-in-time compilers.

Discuss the functioning of just-in-time compilers and their classification.

Define platform-independent systems.


Explain the types of platforms.

11.1 Introduction to YACC

A compiler is a computer program that transforms source code written in a programming language into another language (the target language, often having a binary form known as object code). The most common reason for converting source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). If the compiled program can run on a computer whose CPU or operating system is different from the one on which the compiler runs, the compiler is known as a cross-compiler. More generally, compilers are a particular kind of translator. A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a source-to-source compiler or transpiler. A language rewriter is usually a program that translates the form of expressions without a change of language. The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.

YACC provides a general tool for imposing structure on the input to a computer program. The YACC user prepares a specification of the input process; this includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. YACC then generates a function to control the input process. The parser invokes the scanner for tokens, analyzes the syntactic structure according to the grammar rules, and finally executes the semantic routines.


Figure 11.1: Working of YACC

11.2 Format of a YACC file

A YACC file typically consists of three parts, i.e. definitions, rules, and code, a format quite similar to that of a lex file. A typical yacc file looks like:

...definitions...

%%

...rules...

%%

...code...

The definitions section comes first; all code written between %{ and %} there is copied to the beginning of the resulting C file.

The rules are the productions, formed from patterns combined with actions; an action of more than one statement is enclosed in braces.

The code section is the main part of the file, usually the longest and most elaborate, the most important piece being yylex, the lexical analyzer. A default main is used to call this analyzer if the code part is left out. In some other references the structure of the YACC file is given as:

%{

C declarations

%}

YACC declarations

%%

Grammar rules

%%

Additional C code

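To make the layout concrete, here is a minimal sketch of a complete YACC file (an illustration, assuming a separate lex-generated scanner that returns NUMBER tokens with their values in yylval and passes '+' and '\n' through as character tokens); it reads a line such as "1 + 2 + 3" and prints the sum:

```yacc
%{
#include <stdio.h>
int yylex(void);                       /* supplied by the lex scanner */
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}

%token NUMBER

%%

line : expr '\n'          { printf("= %d\n", $1); }
     ;

expr : expr '+' NUMBER    { $$ = $1 + $3; }
     | NUMBER             { $$ = $1; }
     ;

%%

int main(void)
{
    return yyparse();
}
```

Running yacc -d on this file produces y.tab.c (the parser) and y.tab.h (the token definitions shared with lex).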

11.3 LexYaccInteraction


YACC accepts the stream of tokens generated by Lex, which parses the file of characters and outputs the tokens that YACC uses; YACC then performs a number of actions on the received tokens. When a lex program supplies the tokenizer, the yylex routine is called repeatedly by the YACC-generated parser. The lex rules typically work by executing a return each time they have matched a token. We will now see how lex returns data in a manner that yacc can use for parsing.

11.3.1 Header files of code

If lex is to return tokens that YACC will process, the two must agree on what the tokens are. This is done as follows. The yacc file has token definitions such as "%token NUMBER" in its definitions section. When the yacc file is translated with yacc -d, a header file y.tab.h is created that has definitions like "#define NUMBER 258". This file can then be included in both the lex and the YACC program. The lex file can then execute return NUMBER, and the yacc grammar can match on this token. The token codes assigned by %token definitions typically start at around 258, so that single characters can simply be returned as their integer values:

/* in the lex program */

[0-9]+ { return NUMBER; }

[-+*/] { return *yytext; }

/* in the yacc program */

sum : TERM '+' TERM

11.4 Rule Section

The rules section contains the grammar of the language you want to parse. This looks like

name1 : THING something OTHERTHING {action}

| othersomething THING {other action}

name2 : .....

This is the general form of context-free grammars, with a set of actions associated with each matching right-hand side. It is good convention to keep non-terminals (names that can be expanded further) in lower case and terminals (the symbols that are ultimately matched) in upper case. The terminal symbols correspond to return codes from the lex tokenizer. They are typically defined through %token definitions in the YACC program, or are character values.

11.5 User code section

The minimal main program is

int main()

{

yyparse();

return 0;

}

Extensions for more ambitious programs should be self-evident. In addition to the main program, the code section will usually also contain subroutines to be used either in the YACC or the lex program.

11.6 YACC Declaration Summary

`%start' Specify the grammar's start symbol.

`%union' Declare the collection of data types that semantic values may have.

`%token' Declare a terminal symbol (token type name) with no precedence or associativity specified.

`%type' Declare the type of semantic values for a nonterminal symbol.

`%right' Declare a terminal symbol (token type name) that is right-associative.

`%left' Declare a terminal symbol (token type name) that is left-associative.

`%nonassoc' Declare a terminal symbol (token type name) that is non-associative (using it in a way that would be associative is a syntax error).
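As an illustration (a sketch, not taken from the text), a grammar for arithmetic expressions might combine these declarations as follows; the ambiguity in the expr rules is resolved by the declared precedence and associativity:

```yacc
%token NUMBER
%left '+' '-'               /* lowest precedence, left-associative */
%left '*' '/'               /* binds tighter than '+' and '-'      */
%right UMINUS               /* unary minus binds tightest          */

%%

expr : expr '+' expr           { $$ = $1 + $3; }
     | expr '-' expr           { $$ = $1 - $3; }
     | expr '*' expr           { $$ = $1 * $3; }
     | expr '/' expr           { $$ = $1 / $3; }
     | '-' expr %prec UMINUS   { $$ = -$2; }
     | NUMBER
     ;
```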

11.7 Just-in-Time Compilers


In computing, just-in-time (JIT) compilation, also called dynamic translation, is a technique for improving the runtime performance of computer programs based on byte code. Since byte code is interpreted, it executes more slowly than compiled machine code, unless it is actually translated to machine code, which can be done either before execution (making program loading slow) or during execution. In the latter case, which is the basis of JIT compilation, the program is stored in memory as byte code, but the code sections currently running are compiled on the fly to physical machine code in order to run faster. JIT compilers represent a hybrid approach, with translation occurring continuously, as with interpreters, but with caching of the translated code to minimize performance degradation. JIT compilation also offers advantages over statically compiled code at development time, such as handling late-bound data types and the ability to enforce security guarantees. JIT builds on two earlier ideas in run-time environments: byte code compilation and dynamic compilation. It converts code at runtime prior to executing it natively, for example byte code into native machine code. Several modern runtime environments, such as Microsoft's .NET Framework and most implementations of Java, rely on JIT compilation for fast code execution.

In the Java programming language and environment, a just-in-time (JIT) compiler is a program that turns Java byte code (a program containing instructions that must be interpreted) into instructions that can be sent directly to the processor. After you have written a Java program, the source language statements are compiled by the Java compiler into byte code rather than into code containing instructions for a particular hardware platform's processor (for instance, an Intel Pentium chip or an IBM System/390 processor). The byte code is platform-independent code that can be sent to any platform and run on that platform. Previously, most programs written in any language had to be recompiled, and sometimes modified, for each computer platform. One of the greatest advantages of Java is that a program needs to be written and compiled only once: the Java virtual machine on any platform interprets the compiled byte code into instructions understandable by that platform's processor. However, the virtual machine handles one byte code instruction at a time. Using the Java just-in-time compiler (really a second compiler) on the particular system platform compiles the byte code into platform-specific code (as if the program had been compiled originally on that platform). Once the code has been recompiled by the JIT compiler, it will usually run more quickly on the computer. The just-in-time compiler comes with the virtual machine and is used optionally; it compiles the byte code into platform-specific executable code that is immediately executed. Sun Microsystems suggests that it is often faster to select the JIT compiler option, especially if the executable is repeatedly reused.

Figure 11.2: Structure of a JIT compiler

11.7.1 Functioning of Just-in-Time Compilers

In a byte-code-compiled system, source code is translated to an intermediate representation known as byte code. Byte code is not the machine code for any particular computer, and may be portable among computer architectures. The byte code may then be interpreted by, or run on, a virtual machine. The JIT compiler reads the byte codes in many sections (or occasionally in full) and compiles them dynamically into machine language so the program can run faster. Java performs runtime checks on various sections of the code, which is why the entire code is not compiled at once. Compilation can be done per-file, per-function, or even on any arbitrary code fragment; the code can be compiled when it is about to be executed (hence the name "just-in-time"), then cached and reused later without needing to be recompiled. By contrast, a traditional interpreted virtual machine simply interprets the byte code, generally with much lower performance; some interpreters even interpret source code, without the step of first compiling to byte code, with still worse performance. Statically compiled code, or native code, is compiled before deployment. A dynamic compilation environment is one in which the compiler can be used during execution. For instance, most Common Lisp systems have a compile function which can compile new functions created during the run. This gives many of the advantages of JIT, but the programmer, rather than the runtime, is in control of which parts of the code are compiled. This approach can also compile dynamically generated code, which can, in many circumstances, give substantial performance advantages over statically compiled code, and also over most JIT systems.

A common goal of using JIT techniques is to reach or surpass the performance of static compilation while keeping the advantages of byte code interpretation. Much of the "heavy lifting" of parsing the original source code and performing basic optimization is handled at compile time, prior to deployment, so compilation from byte code to machine code is much faster than compiling from source. The deployed byte code is portable, unlike native code. Since the runtime has control over the compilation, as with interpreted byte code, it can run in a secure sandbox. Compilers from byte code to machine code are easier to write, because the portable byte code compiler has already done a significant part of the work.

JIT code generally offers clearly better performance than interpreters. In addition, it can in some cases offer better performance than static compilation, since many optimizations are only feasible at run-time:

The compilation can be tuned to the targeted CPU and the operating system model where the application runs. For example, JIT can select SSE2 vector CPU instructions when it detects that the CPU supports them. To get this level of optimization specificity with a static compiler, one must either compile a binary for each intended platform/architecture, or else include multiple versions of parts of the code within a single binary.

The system is able to collect statistics about how the program is actually running in its environment, and it can rearrange and recompile for optimal performance. However, some static compilers can also take profile information as input.

The system can do global code optimizations (e.g. inlining of library functions) without losing the advantages of dynamic linking and without the overheads inherent to static compilers and linkers. In particular, when doing global inline substitutions, a static compilation technique may need run-time checks to ensure that a virtual call still happens if the actual class of the object overrides the inlined method, and bounds checks on array accesses may have to be handled within loops. With just-in-time compilation this processing can in many cases be moved out of the loops, often giving large gains in speed.

Although this is possible with statically compiled garbage-collected languages, a byte code system can more easily rearrange executed code for better cache utilization.

11.7.2 Classification of just-in-time compilers

In surveying JIT work, some common attributes emerge, and JIT designs can be classified by the following properties:

(1) Invocation: A JIT compiler is explicitly invoked if the user must take some action to cause compilation at runtime. An implicitly invoked JIT compiler is transparent to the user.

(2) Executability: JIT systems typically involve two languages: a source language to translate from, and a target language to translate to (these languages can be the same, if the JIT system is only performing on-the-fly optimization). We call a JIT system monoexecutable if it can execute only one of these languages, and polyexecutable if it can execute more than one. Polyexecutable JIT systems have the advantage of being able to decide when compiler invocation is justified, since either program representation can be used.

(3) Concurrency: This property describes how the JIT compiler executes in relation to the program itself. If program execution pauses under its own volition to permit compilation, the JIT compiler is not concurrent; in this case it may be invoked by means of a subroutine call, message transmission, or transfer of control to a coroutine. In contrast, a concurrent JIT compiler can work as the program executes, concurrently, in a separate thread or process, or even on a different processor. JIT systems that operate under hard real-time constraints might constitute a fourth classifying property, but there appears to be little research in that area at present, and it is unclear whether hard real-time constraints pose any unique problems for JIT systems. Some patterns are evident: for example, implicitly invoked JIT compilers are clearly predominant in recent work. Transparency varies from system to system, but this is more a matter of design than of JIT technology. Work on concurrent JIT compilers is only just beginning, and will probably grow in importance as processor technology develops.

11.8 Platform independent systems

Computer software, or computing methods and concepts, that are implemented and work across multiple computer platforms are said to be platform independent. Platform-independent software may be divided into two sorts: one requires individual building or compilation for each platform that it supports, and the other can be run directly on any platform without special preparation, e.g., software written in an interpreted language, or pre-compiled portable byte code for which interpreters or run-time packages are common or standard components of all platforms. A cross-platform application may run on Microsoft Windows on the x86 architecture, Linux on the x86 architecture, and Mac OS X on either PowerPC- or x86-based Apple Macintosh systems. Cross-platform programs may run on as many as all existing platforms, or on as few as two.

Platform independence in software means that you can run some code with little or no change on multiple platforms.


It depends on what you define as "the platform". Sometimes this may be a specific hardware/machine architecture. In other cases it may be a "generic PC". In still other cases it may be a virtual machine and runtime environment (which is the case with Java). Nothing is "perfectly" platform independent - there are always a few corner cases that can catch you out. For example, if you hardcode file path separators rather than using the platform-independent File.separator in Java, then your code won't work on both Windows and Linux. As a software engineer, you need to watch out for these things, always use the platform-independent option where possible, and test properly on different platforms if you care about portability.
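The separator pitfall above can be sketched in a few lines of Java. This is a minimal illustration; the directory and file names used here are invented for the example:

```java
import java.io.File;

public class PathDemo {
    public static void main(String[] args) {
        // Hardcoded separator: only correct on platforms that use '\'
        String windowsOnly = "data\\reports\\summary.txt";

        // Platform-independent: File.separator is "\\" on Windows, "/" on Linux
        String portable = "data" + File.separator + "reports"
                + File.separator + "summary.txt";

        System.out.println("Hardcoded: " + windowsOnly);
        System.out.println("Portable : " + portable);
    }
}
```

Running this on Linux and on Windows prints a different "Portable" path on each, which is exactly the point: the code, not the programmer, adapts to the platform.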

There are always a few restrictions on specific platforms that can't be ignored. Examples are things like the maximum length of filenames, or the available RAM on a system. No matter how much you try to be platform independent, your code may fail if you try to run it on a platform that is too tightly constrained. It is important to note that some languages are platform independent at the source code level (C/C++ is a decent example) yet lose platform independence once the code is compiled (since native code is platform specific). Java retains platform independence even after code is compiled, because it compiles to platform-independent bytecode (the actual translation to native code is handled later, after the bytecode is loaded by the JVM). There are occasionally bugs in language implementations that only occur on particular platforms. So even if your code is theoretically 100% portable, you still need to test it on different platforms to confirm that you aren't running into any surprises.

In Java, platform independence can be understood as follows:

Java code is platform independent in the sense that the same Java application or algorithms (typically compiled to Java bytecode and packaged in a .jar file) will run identically on Windows and Linux.

Java libraries (e.g. all the lovely open source toolsets) are generally platform independent, as long as they are written in pure Java. Most libraries try to stick with pure Java in order to maintain platform independence, but there are a few situations where this is impossible (e.g. if the library needs to interface directly with special hardware, or call a C/C++ library that uses native code).

The Java platform/runtime environment is platform independent in the sense that the same libraries (graphics, networking, file I/O, etc.) are available and work in the same way on all platforms. This is done deliberately, to allow applications that use these libraries to be able to run on any platform. For instance, the Java libraries that access the file system know that Windows and Linux use different filename path separators, and take care of this for you. Of course, this implies that under the hood the runtime environment makes use of platform-specific features, so you need a separate JRE for each platform.

The JVM itself (i.e. the Java Virtual Machine that is responsible for JIT compilation and for running Java bytecode) is platform independent in the sense that it is available on many platforms (everything from mainframe computers to mobile phones). However, specific versions of the JVM are required for each underlying platform to account for different native instruction codes and machine capabilities (so you can't run a Linux JVM on Windows, or the other way around). The JVM is packaged as an integral part of the Java platform/runtime environment as above.

By and large, Java is probably about as close to genuine platform independence as you can get, but as you can see there is still a lot of platform-specific work done under the hood.

If you stick to 100% pure Java code and libraries, you can regard Java as being "effectively" platform independent, and it generally fulfils the "Write Once, Run Anywhere" promise.

11.9 Various platforms available

The term platform can refer to the type of processor and/or other hardware on which a given operating system or application runs, the kind of operating system on a computer, or the combination of the kind of hardware and the kind of operating system running on it. An example of a common platform is Microsoft Windows running on the x86 architecture. Other well-known desktop computer platforms include Linux/Unix and Mac OS X - both of which are themselves cross-platform. There are many devices, for instance mobile phones, that are also effectively computer platforms but are less commonly considered in that way. Application software can be written to depend on the features of a particular platform - either the hardware, the operating system, or the virtual machine it runs on. The Java platform is a virtual-machine platform which runs on many operating systems and hardware types, and is a common platform for software to be developed for.

11.9.1 Hardware platforms

A hardware platform can refer to a computer's underlying architecture or processor family. For example: the x86 architecture and its variants, such as IA-32 and x86-64. These machines often run one version of Microsoft Windows, though they can run other operating systems as well, including Linux, OpenBSD, NetBSD, Mac OS X and FreeBSD. The ARM architecture is common on mobile phones and tablet PCs, which run Android, iOS and other mobile operating systems.

11.9.2 Software platforms

Software platforms can be either an operating system or a programming environment, though more commonly they are a combination of both. A notable exception to this is Java, which uses an operating-system-independent virtual machine for its compiled code, referred to in the world of Java as bytecode. Examples of software platforms include: Linux, Java, Microsoft Windows and DOS-type systems.

11.10 Summary

Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process which includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input process. The parser calls the scanner for tokens. The parser analyses the syntactic structure according to the grammar. Finally, the parser executes the semantic routines. In computing, just-in-time compilation (JIT), also known as dynamic translation, is a technique to improve the runtime performance of computer programs based on byte code (virtual machine code). Since byte code is interpreted, it executes more slowly than compiled machine code, unless it is actually compiled to machine code, which could be performed before execution - making program loading slow - or during execution. In this last case, which is the basis for JIT compilation, the program is stored in memory as byte code, but the code section currently running is compiled to physical machine code in order to run faster. Platform independence in software means that you can run some code with little or no adjustment on multiple platforms. It depends on what you describe as "the platform". Sometimes this may be a specific hardware/machine configuration. In other cases it might be a "generic PC". In still other cases it may be a virtual machine and runtime environment (which is the case with Java).

11.11 Glossary

YACC: Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process which includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input process.

JIT: In computing, just-in-time compilation (JIT), also known as dynamic translation, is a technique to improve the runtime performance of computer programs based on byte code (virtual machine code).

Platform independent systems: Platform independence in software means that you can run some code with little or no change on multiple platforms.

Hardware platforms: A hardware platform can refer to a computer's underlying architecture or processor family.

Software platforms: Software platforms can be either an operating system or a programming environment, though more commonly they are a combination of both.


Check Your Progress/Self-Assessment Questions

Q1) Explain lex and yacc tools?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________

Q2) Give the structure of the lex program?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________

Q3) What is an internal command? Give an example?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________


Q4) What is the exit status of a command?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

___________________________________


Q5) Give the structure of the Yacc program?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

_____________________________

11.12 Answer to Check Your Progress/Self-Assessment Questions

Ans 1. Lex: a scanner generator that can identify those tokens.

Yacc: a parser generator. Yacc takes a concise description of a grammar and produces a C routine that can parse that grammar.

Ans 2: Definition section - any initial C program code

%%

Rules section - patterns and actions separated by white space

%%

User subroutines section - consists of any legal C code


Ans 3: A command which is a shell built-in, e.g. echo.

Ans 4: Exit 0 - returns success; the command executed successfully.

Exit 1 - returns failure.

Ans 5: A typical YACC file looks like:

...definitions...

%%

...rules...

%%

...code...

11.13 Model questions

Q1) Explain the procedure for executing a lex program and yacc program?

Q2) What is the difference between Lex and Yacc?

Q3) What is the treatment of lines written between the first %{ and %} ?

Q4) For what purpose is production rule section used for ?

Q5) Define how to “understand” the input language and what action to take for each “sentence”?

11.14 References/Suggested Readings

7. System programming, John J Donovan, Tata McGraw-Hill Education, 1st Edition.

8. Systems Programming and Operating System, D M Dhamdhere, Tata McGraw-Hill

Education, 1st Edition.

9. System programming, A.A. Puntambekar, Technical publications, 3rd Edition.


Lesson 12 Fundamentals of OS

12.0 Objectives

12.1 Introduction

12.2 Operating System: Operating Systems and its functions

12.3 Types of operating systems

12.4 Real-time OS

12.5 Distributed OS

12.6 Mobile OS

12.7 Network OS

12.8 Summary

12.9 Glossary

12.10 Answers to Check Your Progress / Self Assessment Questions

12.11 References / Suggested Readings

12.12 Model Questions

12.0 Objectives

After reading this lesson you will be able to know meaning and type of Operating

Systems

To know about various functions performed by Operating Systems

To know the meaning and importance of RTOS (Real Time Operating Systems)

To know the meaning and importance Mobile OS, Network OS.

12.1 Introduction

Operating system is the most fundamental and most important part of a computer system. It is a

type of system software. Basically it acts as an interface between user and computer and enables

end user to control his/her computer. There are various types of Operating Systems like single

user, multi user, multi tasking, multiprogramming etc. Today, apart from computer operating systems, mobile operating systems, i.e. operating systems for smart phones, are also available.


12.2 Operating Systems and its functions

12.2.1 Meaning of Operating System

A digital computer system consists of two major components – hardware and software. These

two are supplementary to each other. In software part, Operating System (OS) is one major and

core software component of any digital computer system. It performs various important tasks, and in its very basic form it also acts as an interface between the end user and the computer system. This is just one job of the OS; apart from this, it also acts as an interface between the various components of the computer system. Small modules of the OS which provide an interface to use the various devices attached to the computer system are called “device drivers".

various devices attached. Operating System has been defined as “an operating system is a

program that manages the computer hardware”. Thus, in one way it lets user interact with the

computer system and in other way it lets devices work in a proper manner inside a computer

system. This is depicted in following figure also.

Fig 12.1 Role of Operating System

Some important objectives of OS are :

1. Convenience: It lets the user use computer system in a convenient and user friendly way.

2. Efficiency: It lets computer system use various hardware and software resources available

efficiently.


3. Ability to evolve: OS is built in such a way that it permits effective development and

introduction of new functions without interfering with other services.

12.2.2 Functions of OS

Various functions performed by Operating System are explained below.

1. Acts as an interface: First and foremost function of operating system is to enable the user of

a computer system to use it conveniently and efficiently. For this, operating system provides

operating system commands and system calls. To be able to use a computer, the user has to learn

these operating system commands. These commands differ depending upon the type of user

interface provided by OS. User interface may be CUI (Command User Interface) or GUI

(Graphical User Interface). That part of OS which interprets user commands is called shell.

2. Process Management: Every software or its part running inside a computer system is called

a process. At any time many processes may be running. Sometimes different processes may

communicate with each other to share some data. This is called IPC (Inter Process

Communication). Further, sometimes a process may be dependent on the outcome of some

another process. For this process synchronization may be required, this is called IPS (Inter

Process Synchronization). OS provides various software tools to implement and ensure IPC

and IPS.
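From an application programmer's point of view, process management surfaces through system-call wrappers in language libraries. As a hedged sketch (the child command "java -version" is just a convenient choice, since a JVM must already be installed to run the parent), Java's ProcessBuilder API shows a program creating a child process, reading its output, and collecting its exit status:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ProcessDemo {
    public static void main(String[] args) throws Exception {
        // Ask the OS to create a child process running "java -version"
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        pb.redirectErrorStream(true);           // merge stderr into stdout
        Process child = pb.start();

        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);       // echo the child's output
            }
        }

        int status = child.waitFor();           // exit status: 0 means success
        System.out.println("exit status = " + status);
    }
}
```

Reading the child's output through a pipe is itself a simple form of inter-process communication, and waitFor() is a simple form of synchronization: the parent blocks until the child terminates.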

3. CPU Scheduling: The CPU is the most important and most expensive, but limited, resource inside a computer system. Each process needs the CPU to execute its instructions. Processes compete for

using CPU. OS decides which process will use CPU and for how long. This is also known as

processor management. There are various CPU scheduling algorithms inbuilt into the OS.
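One of the classic scheduling algorithms is round robin, where each ready process gets a fixed time slice (quantum) in turn. The simulation below is a hedged illustration only - the process names and burst times are invented, and a real scheduler works on process control blocks rather than plain arrays:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RoundRobinDemo {
    public static void main(String[] args) {
        // Illustrative processes with remaining CPU burst times
        String[] names = {"P1", "P2", "P3"};
        int[] remaining = {5, 3, 8};
        int quantum = 2;                        // fixed time slice

        Queue<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < names.length; i++) ready.add(i);

        int clock = 0;
        while (!ready.isEmpty()) {
            int p = ready.remove();             // pick head of ready queue
            int slice = Math.min(quantum, remaining[p]);
            clock += slice;
            remaining[p] -= slice;
            System.out.println(names[p] + " ran " + slice
                    + " unit(s), clock=" + clock);
            if (remaining[p] > 0) ready.add(p); // re-queue at the back
            else System.out.println(names[p] + " finished at " + clock);
        }
    }
}
```

Because every process is re-queued after its quantum expires, no single process can monopolize the CPU, which is why round robin is popular in time-sharing systems.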

4. Memory Management: Every software or process that runs inside a computer, is loaded into

internal memory (RAM – Random Access Memory). RAM is a costly hardware and is

available in limited capacity. It must be used very efficiently. Memory may have to be shared

among processes. Deciding which process will be allocated which part and how much

memory is decided by OS.

5. Disk or Storage Space management: Storage space refers to external memory. Files of user

are saved on the storage devices because they are non-volatile memory. For this, space is allocated

to store user files. Storage space allocation, de-allocation, defragmentation, division of disk

space into smaller drives etc. is also performed by Operating System.


6. Device management: OS manages and controls various Input/Output and Peripheral

devices connected to computer system. This part of OS is called device driver. For

example, it is only the device driver that makes mouse and printer usable. Part of OS that

controls hardware is called kernel.

Check Your Progress / Self Assessment Questions

Que. 1. Which are two types of computer memory?

Que. 2. What is difference between shell and kernel?

12.3 Types of operating systems

There are various types of OS depending upon capacities, capabilities and usability of OS. Some

popular types are explained below.

Single User and Multi User: Traditional and earlier OS were very small and simple, having the capability to manage only a single user at a time, whereas modern and contemporary OS

are very intelligent and complex and can manage more than one user simultaneously. An

example of single user OS is DOS (Disk Operating System) and example of multi user OS is

Unix.

Single tasking and multi tasking: Some OS allow only one task at a time. The user can initiate

another task only after the completion and termination of first one. Memory management is

very easy as there can be only two things in memory, OS and a user process. Whereas some

other OS being larger, more complex and intelligent can manage more than one task

simultaneously, more than one process can be loaded in memory, and therefore memory

needs to be partitioned in multiple parts. Process scheduling has also to be implemented by

OS to decide which process will use CPU. This increases CPU utilization. Most of today’s

OS are multitasking.

Single Threading and multi threading: A thread is a single flow of execution inside a

process. Some multitasking OS can be single threaded whereas others may be multithreaded.


Multithreading is multitasking within a process. It increases CPU throughput and also

decreases average turnaround time of a process.
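The idea of multiple flows of execution inside one process can be shown in a few lines of Java. This is a minimal sketch; the thread names and printed messages are invented for the example:

```java
public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // One task, executed by two threads within the same process
        Runnable work = () -> {
            String name = Thread.currentThread().getName();
            for (int i = 1; i <= 3; i++) {
                System.out.println(name + " step " + i);
            }
        };

        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.start();                // both threads now run concurrently
        t2.start();
        t1.join();                 // wait for both to finish
        t2.join();
        System.out.println("both threads done");
    }
}
```

The interleaving of "worker-1" and "worker-2" lines varies from run to run, because the OS scheduler decides which thread gets the CPU at any instant.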

Check Your Progress / Self Assessment Questions

Ques 3. What is multithreading?


12.4 Real-time OS

An OS can be a general purpose operating system which is used for a computer that runs general

purpose user applications like word processing, spreadsheets, DBMS etc. But, there are some

other operating systems designed to run some special purpose applications. For example, some

processes running inside a computer are very time critical. They have strict time constraints.

They must be completed exactly within allotted time span or time deadline. Otherwise any type

of loss may occur. Real time operating systems (RTOS) are for real time applications. Real time

applications are broadly of two types, soft and hard. In case of soft real time applications a little flexibility in time limit violation is allowed, but in hard real time applications there is no flexibility to violate timing constraints. An example of a soft real time application is searching the account balance of a bank customer who wants to execute a transaction at a bank ATM. He can wait for the ATM machine to respond within a few minutes only.

Some other examples of soft real time applications are Videoconference applications, E-

commerce transactions, online gaming, online chatting and IM (Instant Messaging) etc.

Hard real time applications have harder timing constraints with no flexibility or no choice of

increasing time limits. Such an application is considered a failure if it does not complete within

predefined allotted time. Few examples of such hard real-time systems include, robotic systems,

components of pacemakers, anti-lock braking system of an automobile and auto-pilot aircraft

control system etc.

CPU scheduler of such a Real Time Operating System (RTOS) is also designed to provide a

predictable or deterministic execution time and pattern. This type of operating system is best suited for embedded systems, as these systems often have real time constraints.

An example of an RTOS is FreeRTOS. It is a real time operating system which is designed to be small so that it can fit on a microcontroller. A microcontroller is a small and resource-constrained computer on a single chip that includes the processor, ROM to store the program to be executed, and RAM.

Some features of Real Time Operating Systems include :

1. They must meet their timing constraints.

2. They are multitasking operating systems.


3. They are usually small sized so that they can fit a microcontroller.

4. They need less RAM.

5. RTOS typically use pre-emptive, priority-based scheduling algorithms.

6. They are fault tolerant.

Check Your Progress / Self Assessment Questions

Ques 4. What is RTOS?

Que. 5. Which are two types of RTOS?

Que. 6. Write any three features of RTOS.

12.5 Distributed OS

Distributed application, distributed system and distributed operating system are different. Let us

understand these one by one. A distributed application is software which is divided into parts,

and parts are executed on different computers connected through a network. These parts interact

with each other in order to achieve a specific and common goal. Being run on multiple

computers in parallel they can complete sooner. A distributed application can also be run in

client server environment. Usually, the front end of the application requires less processing

power and runs on the client machine, whereas the back end, being larger and more complex, requires more processing power and is therefore run on a powerful server computer.

A distributed system is a collection of independent computers that work in a transparent manner

and appears to its users as a single system. Definition of distributed system: “A distributed

system consists of a group of autonomous computers linked to a computer network and equipped

with distributed system software”

Distributed Operating System (DOS) is a design of operating system which manages distributed

applications that are running on multiple computers on a network. Distributed Operating Systems

are extension of Network Operating Systems. Such an operating system has specific goals like,

information sharing, scalability i.e. possibility to add components, improved reliability, fault

tolerance etc. Distributed systems are broadly of two types, loosely coupled and tightly coupled.

Loosely coupled systems do not share memory or resources. They communicate through


message passing. Tightly coupled systems share memory and resources. Parallel computing and

cloud computing are special cases of distributed systems.

Advantages of Distributed Systems

Price Performance Ratio: A group of separate networked microcomputers provide more

efficiency for less cost than what a mainframe computer does.

Higher performance: N processors potentially gives n times the computational power of a

single machine.

Resource sharing: Limited and expensive resources can be shared among multiple

computers without the need to replicate resources for each system.

Scalability: The modular structure of distributed systems makes it easier to add more systems.

Fault tolerant: Replication of processors and resources makes the system fault tolerant.

Check Your Progress / Self Assessment Questions

Que. 7. Which are two types of Distributed Operating System?

Que. 8. Write any two advantages of Distributed Operating System.

12.6 Mobile OS

Mobile phones of today are called smart phones and have a microprocessor inside. They are very

close to a digital computer. Just like a digital computer, a mobile phone also has ROM, RAM,

and external memory. It also performs booting when it is started and can run multiple

applications. So just like a PC a mobile phone also needs an Operating System.

Such an OS allows not only smart phones but also tablet PCs, PDAs etc. to run applications. It is

specifically designed to run mobile applications on mobile devices.

Operating systems for smart phones differ from operating system of a general purpose computer

in many ways. This is basically due to difference in the features of both. A smart phone has

many unique features generally not found in a PC, like a touch screen, screen auto rotation,

finger print recognition, Wi-Fi, GPS mobile navigation, video camera, speech recognition

system etc. Also, mobile phones have comparatively less amount of memory available than a PC.

A mobile phone is like an embedded device that carries its own operating system. The mobile


OS determines which third-party applications called mobile applications can be run on your

smart phone.

There are different types of Operating Systems for mobile phones.

1. Operating systems that are manufacturer-built proprietary OS. This includes

Apple’s iOS for iPhone, iPad etc.

RIM BlackBerry OS for all BlackBerry phones

HP’ s own Palm Web OS for their Palm series of mobiles

2. Third party proprietary operating systems, i.e. the operating systems developed by some

company other than the manufacturer of phone. Such operating systems are:

Microsoft Windows Phone 7

Microsoft Windows Mobile

3. Operating systems which are Free and Open Source OS. Example in this category

include, Google’s Android OS used on many type of smart phones like Samsung smart

phones and Symbian OS used on Nokia mobiles.

Some Popular Mobile Operating Systems

1. Android OS by Google Inc.

2. Bada OS by Samsung Electronics

3. BlackBerry OS

4. iOS by Apple

5. MeeGo OS by Nokia and Intel

6. Symbian OS by Nokia

7. webOS by Palm/HP

8. Windows Mobile and Windows Phone by Microsoft

12.7 Network OS

Network Operating System (NOS) is an operating system that has some special feature for

managing the host computer i.e. the computer on which it is installed and other computers on the

network (LAN). It makes it easier to control various computers on the network. It generally refers


to OS that enhances a basic operating system by adding networking features. Basically, it acts as

a director that keeps the network running in-order.

Advantages of using NOS are resource sharing like printer sharing, database/file sharing,

application sharing, and security, and other housekeeping aspects of a network.

A major difference between network operating system and distributed operating system is that in

NOS, network being LAN (small area), users are aware of multiplicity of machines, whereas in

case of DOS, which may be spread over a large area, users are not aware of the multiplicity of machines.

Also, DOS is an extension of NOS and it supports even higher levels of abstraction, co-operation

and integration of the machines available on the network.

The environment provided and managed by a NOS consists of a group of loosely coupled machines, i.e. machines that do not share memory but are connected by external interfaces, running under the control of the NOS. Just like an ordinary OS, a NOS also provides services to its users and

application software that run on the top of OS layer, but the type of services and the manner in

which they are provided are quite different from that of an ordinary OS.

Broadly, there are two variants of NOS, Client Server and Peer to Peer. In Client Server model, a

host computer acts like a server and provide services demanded by client computers. It acts like a

master controller. In case of peer to peer model any computer can provide any service to any

other. For example, one computer may act as print server, another one may be file server etc.

Operating systems like UNIX, Mac OS, Novell Netware, Microsoft Windows Server, and

Windows NT are examples of a NOS.

Check Your Progress / Self Assessment Questions

Que. 9. True/False: DOS is an extension of NOS?

Que 10. Which are two types of NOS?


12.8 Summary

Operating system is the most fundamental and most important part of a computer system. It is a

type of system software. A digital computer system consists of two major components –

hardware and software. These two are supplementary to each other. In software part, Operating

System (OS) is one major and core software component of any digital computer system. Major

jobs of an OS include: acting as an interface between user and computer, Process Management,

CPU Scheduling, Memory Management, Disk or Storage Space Management, and Device Management. Basic types of OS include Single User and Multi User, Single tasking,

multitasking, single threading and multithreading.

RTOS is Real Time Operating System, its CPU scheduler is designed to provide a predictable or

deterministic execution time and pattern. An example of RTOS is FreeRTOS. Distributed

application, distributed system and distributed operating system are different. Distributed

Operating System (DOS) is a design of operating system which manages distributed applications

that are running on multiple computers on a network.

Mobile phones and smart phones also have an Operating System, these are called mobile OS.

Such an OS allows not only smart phones but also tablet PCs, PDAs etc. to run applications. It is

specifically designed to run mobile applications on mobile devices.

Network Operating System (NOS) is an operating system that has some special feature for

managing the host computer i.e. the computer on which it is installed and other computers on the

network (LAN).

12.9 Glossary

OS : Operating System

DOS : Disk Operating System, Distributed Operating System

NOS : Network Operating System

PDA : Personal Digital Assistant

Android: Mobile Operating System by Google.

Windows: PC Operating System by Microsoft.

CUI : Command User Interface


GUI: Graphical User Interface.

RTOS: Real Time Operating System

LAN : Local Area Network

FreeRTOS: A Real Time Operating System

12.10 Answers to Check Your Progress / Self Assessment Questions

Ans 1. Broadly, there are two types of computer memory: internal memory (RAM and ROM) and

External memory (Hard Disk, CD, Pen Drive).

Ans 2. Shell is that part of OS that interprets user commands. Kernel is that part of OS that

controls and manages hardware.

Ans 3. Multithreading is multitasking within a process.

Ans. 4. RTOS is Real Time Operating Systems

Ans 5. RTOS are of two types, Soft RTOS and Hard RTOS.

Ans 6. Features of RTOS include: they are multitasking, they are small in size, and they use preemptive, priority-based CPU scheduling.

Ans. 7. Loosely coupled and tightly coupled

Ans. 8. Price-performance ratio. Resource sharing.

Ans 9. True.

Ans 10. Client Server model and peer to peer model

12.11 References / Suggested Readings

Operating System Concepts by Abraham Silberschatz, Peter Baer Galvin

Fundamentals of Operating System by Anshuman Sharma, Anurag Gupta

12.12 Model Questions


What are the various functions of an OS?

What are the various types of OS?

What is the difference between a Network Operating System and a Distributed Operating System?

What is a Mobile Operating System?

Which are the various Mobile Operating Systems?


Lesson 13. Booting Techniques and Device Drivers

13.0 Objectives

13.1 Introduction

13.2 Booting techniques and subroutines

13.3 Introduction to Device Drivers

13.4 USB and Plug and Play systems

13.5 Summary

13.6 Glossary

13.7 Answers to Check Your Progress / Self Assessment Questions

13.8 References / Suggested Readings

13.9 Model Questions

13.0 Objectives

To know about the meaning and techniques of booting

To know about subroutines

To explore the details of device drivers

To explore the concept of USB (Universal Serial Bus)

To know about PnP (Plug and Play) devices

13.1 Introduction

When any electronic device like a PC or mobile phone is switched on, it loads its operating system; this is called booting. There are different types of booting. A subroutine is a subprogram that assists the main program and is called by the main program to perform a specific task. A device driver is system software needed by any device that is connected to a PC, for example a mouse or a keyboard. A bus is an electronic circuit or a collection of wires through which data travels from one place to another. A bus can be internal or external; USB is one such bus.


Modern devices when connected to a computer do not ask for a restart. They start functioning

immediately; they are called Plug and Play (PnP) devices.

13.2 Booting Techniques

Every computer needs an operating system. The operating system is system software that provides an interface between a computer and its user; it is a compulsory part of computer software. Initially, the operating system is saved in external memory, which may be a hard disk. When a computer starts up, the operating system is loaded from external memory into internal memory so that the computer can start working. This process of loading the operating system of an electronic device into its memory is called booting, also termed booting up a computer. On larger computers like mainframes and minicomputers an alternative term, "Initial Program Load" (IPL), is used for booting.

Figure 13.1 Booting Techniques and Types

13.2.1 Types of booting

Depending upon how and when booting takes place, it is of different types: cold booting (hard booting), warm booting (soft booting), local booting and remote booting.


Cold Booting: When the computer is switched on initially, booting takes place; this is called hard booting or cold booting. During hard booting, POST (Power On Self Test) is also performed, so it may take comparatively more time. When the booting process completes, the computer lands in the normal, operative, run-time environment and control is given to the user for further working. It is called cold booting because the power supply, motherboard and other resources start their work from a cool condition. A cold boot completely cuts the power from the motherboard, lets it reset all its components, and clears memory completely. A cold boot also means the hard drive stops and then starts spinning again, and various components may cool down.

Warm Booting: Sometimes a computer may need to be restarted because it hangs, or because after installing new hardware or software it asks for a reboot. When a computer is restarted or rebooted, it is called warm booting. Warm booting takes less time because it does not perform POST. The various components on the motherboard do not stop their working during a warm boot, so the BIOS (Basic Input Output System) cannot perform a full start-up sequence with a POST routine. Warm booting can be performed in many ways, for example by hitting the restart button on the CPU cabinet, by pressing Ctrl+Alt+Del on the keyboard, or by using software.

13.2.2 Techniques of Booting

DOS/Local Booting: When the bootstrap loader is available on the hard disk of the machine itself, and the operating system it loads is also available locally on the same machine, this is called local booting; since the concept was very popularly used in PCs with DOS, it is also called DOS booting. Initially, a small piece of software called the boot loader or bootstrap loader (BSL) loads the operating system from external memory into internal memory after completion of the power-on self-tests. The bootstrap loader itself is started by other software that resides in ROM (Read Only Memory). During booting, a PC also performs some initial processing called POST (Power On Self Test). The exact sequence of booting is given below.

User turns on the power.

Internally, processor pins and registers are reset to specific values.

After this, the CPU starts execution of instructions at the first address of the BIOS (0xFFFF0).

In turn, the BIOS runs POST (Power-On Self Test) and performs some other necessary checks (the BIOS is the firmware in the ROM).

After this, the BIOS jumps to the Master Boot Record program, which is in the first sector of the disk.

The MBR runs the primary boot loader.

The boot loader loads the operating system into memory.
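The hand-off chain above can be modelled as a simple walk through the stages. This is a conceptual Python sketch only; the stage names are illustrative, not real firmware symbols:

```python
# Conceptual simulation of the PC boot hand-off chain described above.
# Each entry names the component that holds control at that point.

BOOT_STAGES = [
    "power_on",            # user turns on the power; CPU registers reset
    "bios_post",           # CPU starts at 0xFFFF0; BIOS runs POST
    "mbr",                 # BIOS loads the Master Boot Record from the disk
    "primary_bootloader",  # MBR code runs the primary boot loader
    "os_loaded",           # boot loader copies the OS into memory
]

def boot(stages):
    """Walk the hand-off chain in order and return the final state."""
    state = None
    for stage in stages:
        state = stage  # each stage hands control to the next
    return state

final_state = boot(BOOT_STAGES)
```

The point of the model is that control passes strictly in sequence: if any stage fails (for example, a corrupted MBR), the chain stops and the OS is never reached.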

Flash Booting: There is a need to minimize the booting time of embedded systems by optimizing the processing time from the start of the boot loader to the mounting of the operating system. An efficient fast-booting technique for embedded systems is based on flash memory. In embedded systems, a small flash boot loader program stays in flash memory and is always the first, automatic application that runs when the device is switched on. Commonly used flash boot loader programs are based on the Controller Area Network (CAN) to provide startup data to the Electronic Control Unit of the device.

Remote/Network Booting: This is a technique of booting whereby a computer that does not have its own hard disk and operating system (and is therefore called a dumb terminal) is given booting instructions from a remote, intelligent machine called a server.

Thus remote booting refers to booting up a client machine from a server. Remote boot capability is necessary for diskless workstations and network computers, but it may also prove helpful for restarting failed desktop machines. An advantage of remotely booting a computer via the network is that the machine does not need its own hard disk and OS, which results in a less costly machine. Further upgrades or changes to the operating system are also easier to make. Removing the internal hard drive from a machine makes it diskless, so it consumes less power and generates less heat. It also means that machines can be packed more densely, with less need for localized cooling.

But there is a problem too: if, for some reason, the server that performs remote booting fails, the remote machine will not boot and will not function; having a redundant or backup server can solve this problem. For example, Diskless Remote Boot in Linux (DRBL) is a network file system server providing a diskless or systemless environment for client machines.


13.2.3 Booting Subroutines:

A subroutine is a subprogram called by a main program; it performs some specific task and exits. The main program may send some values, called arguments, to the subroutine, which the subroutine uses for processing the task it has been assigned. A subroutine may, at the end, send some processed results back to the main or calling program; in that case the subroutine is more specifically called a function subprogram. Some boot programs use BIOS I/O routines to read from the disk.
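The distinction between a plain subroutine and a function subprogram can be shown in a few lines. This is an illustrative Python sketch; the names are invented:

```python
# A subroutine performs a task for the caller; a function subprogram
# additionally sends a processed result back to the calling program.

log = []

def subroutine(message):
    """Performs a task (recording a message) but returns no result."""
    log.append(message)

def function_subprogram(a, b):
    """Processes its arguments and returns the result to the caller."""
    return a + b

subroutine("booting started")      # arguments passed by the main program
total = function_subprogram(2, 3)  # result comes back to the main program
```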

Subroutines are heavily used in booting by the ROM BIOS (Read Only Memory - Basic Input Output System); these are known as booting subroutines.

Generally, a hard disk is divided into multiple partitions and the OS is saved on one of them, called the primary partition. For a multi-boot system, multiple operating systems can be loaded on different partitions. A special area, generally a single disk sector of usually 512 bytes, is reserved near the beginning of a partition; it is called the PBR or Partition Boot Record, and the code saved there is called the boot block. After loading this boot block into the internal memory of the computer, the firmware or ROM BIOS passes control to that code. That initial boot code completes loading the operating system in several steps. Fetching data from a disk drive is very complex, and therefore the boot code generally uses subroutines that are in the ROM for reading the disk. But there can be limitations during system booting, since full disk-reading capabilities are not available until booting is complete and the operating system has been loaded. Until that time, all the limitations present in the ROM disk subroutines will affect the booting process.

Check Your Progress / Self Assessment Questions

Que. 1. What are different types of booting?

Que. 2. What is MBR?

Que. 3. Where is Boot block loaded?

13.3 Introduction to Device Drivers


A device driver is a type of system software. It is a small but complex software module that enables communication between a computer and a connected peripheral device. A device driver tells the operating system how the device works. Basically, it is another software layer that lies between the application layer and the hardware layer containing the actual device. A device driver is actually part of the kernel of an operating system, and therefore it can access kernel functions. The very basic purpose of a device driver is to instruct the CPU how to communicate with an input/output device by translating the operating system's I/O commands into a language that the device can understand. A device driver has two interfaces: on one side it has to interact with the OS kernel, and on the other side it has to interact with the device for which it is built.
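The driver's kernel-facing interface is commonly a fixed set of operations (such as read and write) that hide the device-specific logic behind them. A minimal sketch of that idea in Python, using a toy RAM-backed "device" rather than any real kernel API:

```python
# A driver exposes a uniform set of operations to the OS (read, write)
# while hiding the device-specific details inside them.

class RamDiskDriver:
    """Toy 'device' backed by a bytearray instead of real hardware."""

    def __init__(self, size):
        self.storage = bytearray(size)  # stand-in for the device medium

    def read(self, offset, length):
        """Kernel-facing interface: return bytes from the device."""
        return bytes(self.storage[offset:offset + length])

    def write(self, offset, data):
        """Kernel-facing interface: store bytes on the device."""
        self.storage[offset:offset + len(data)] = data
        return len(data)  # report how many bytes were transferred

drv = RamDiskDriver(64)
drv.write(0, b"hello")
```

Because every driver presents the same read/write shape, the kernel can treat very different devices uniformly; this is the abstraction and unification described later in this section.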

One major function of a device driver is to fetch data from the device buffer and pass it on to the operating system kernel for further processing. It also handles and reports I/O errors, if any. Device drivers form a major part of an operating system: by some estimates around 70% of operating system code consists of device drivers, and a typical Linux kernel ships with thousands of different drivers. Most operating system bugs are also due to bugs in device drivers; by similar estimates, around 70% of operating system failures are caused by driver bugs. Device drivers are usually operating system and version specific; for example, Windows XP device drivers may not be compatible with Windows 7 or later versions. Many device drivers are part of the OS kernel, but if a device is produced after the OS release, it will come with its own device driver software on a CD that accompanies the device, or the driver can be downloaded from the manufacturer's website.

Types of drivers

There are broadly two types of device drivers, real and virtual, and there are two modes of working of device drivers: user mode and kernel mode. User mode drivers work in user mode; for example, a printer driver, made for a printer used directly by the user. On the other hand, a kernel mode driver works in kernel mode; for example, drivers for cache memory and various motherboard components are kernel level. Some kernel-mode drivers conform to the Windows Driver Model (WDM), support Plug and Play (PnP) and power management, and are source-compatible across various versions of Windows: Windows 98/Me, and Windows 2000 and later operating systems. Kernel level drivers are


generally divided into three levels, the first being the highest level. Some file-system related drivers (NTFS, FAT, CDFS) fall into this category; they depend on the next-level drivers. Intermediate or middle level drivers are device-specific; drivers for specific peripheral devices and software bus drivers fall into this category. Third is the lowest level; system-supplied hardware bus drivers and legacy drivers fall into this category.

Virtual device drivers

Another type of device driver is the virtual device driver. Virtual device drivers handle software interrupts, in contrast to the hardware interrupts handled by their counterparts. Whereas an ordinary device driver in Windows is a .dll or .exe file, a virtual device driver is a .vxd or .vlm file. The basic purpose of virtual device drivers is to emulate a hardware device in "virtual machine" environments. They are further of two types, static and dynamic. Actually, device drivers in Windows are of two kinds: Virtual Device Drivers (VxD) and Windows Driver Model (WDM) drivers. Virtual device drivers are older and less compatible with new versions of Windows, while WDM drivers are supposed to be fully compatible with all Windows versions from Windows 98 onward.

Features of a device driver

Abstraction: A device driver works as a "black box" that makes a device work through a well-defined internal programming interface, hiding the details of how the device works.

Unification: Device drivers make similar devices look and work in a similar manner.

Protection: Only authorized applications can access the device, via its associated device driver.

Check Your Progress / Self Assessment Questions

Que. 4. What is WDM?

Que. 5. Which are two modes of working of a device driver?

Que. 6. What is NTFS?

13.4 USB and Plug and Play systems


13.4.1 USB :

USB stands for Universal Serial Bus. It is a type of external expansion bus for a PC. An expansion bus is a collection of tiny wires that allows for computer expansion with the use of an expansion board or expansion card, which is a PCB (Printed Circuit Board) plugged into the motherboard that provides additional features to a computer system, such as an input/output pathway for transferring data between computer memory and an expansion device.

Figure : 13.2 USB Logo

Figure 13.3 USB Configuration ( Source : www.usb.org/developers/docs/whitepapers/usb_20g.pdf)

The USB standard was introduced in 1995 to replace the variety of connectors attached to different types of ports, such as serial, parallel, keyboard, mouse, SCSI and Ethernet ports. It was a joint venture of big hardware and software companies: Intel, HP, Compaq, Lucent, Microsoft, NEC and Philips. With USB, any device can be attached at any USB port provided at the back or front of the CPU cabinet. All USB ports are of the same shape and size. Also, the cables required to connect a device to the CPU cabinet are alike, and any cable can be used with


any device. It was not so earlier with the different types of serial and parallel ports, which had their own shapes and sizes and required different types of jacks to be plugged in. Nowadays, most computer manufacturers provide only USB ports at the back of the CPU cabinet for connecting the keyboard, mouse, printer, scanner etc., and every hardware manufacturer supplies only USB cables with the device; so a newly purchased printer can be connected to a PC via a USB cable only. An additional advantage of USB is that it allows a variety of peripheral devices to be self-configuring: the user plugs a device into a USB port and lets the OS install the drivers required by the device. USB promises to one day make adding new devices truly plug and play. As the name implies, USB is a serial bus: data flows as a series of pulses along one pair of wires (in a parallel connection, data flows in parallel along many pairs of wires, and communication is faster). An advantage of a serial bus is that the wires used are tiny, thin, longer and easier to use and carry. Also, USB is a "party line", meaning all devices on the bus share the same communication channel. Up to 127 devices can be daisy-chained on the same USB connection, as shown in figure 13.4 below.

Figure 13.4 USB Configuration ( Source : www.usb.org/developers/docs/whitepapers/usb_20g.pdf)

The USB cable also carries a limited amount of power to the devices, which is why a mobile phone connected to a PC with a USB cable can start charging by drawing some power from the PC. Further, the same USB cable acts as both data cable and power cable when a smart phone is connected to a PC.


USB standards and versions are developed by an industry body called the USB Implementers Forum (USB-IF). Since its release, USB has continuously undergone revisions to improve performance. The initial widely adopted version, USB 1.1, released in 1998, supports 1.5 Mbps (low speed) and 12 Mbps (full speed); USB 2.0 supports 480 Mbps; and USB 3.0, the fastest version discussed here, also called SuperSpeed USB, is theoretically capable of transferring data at about 5 Gbps (Gigabits per second). The various USB versions are shown in the figure below.

Figure 13.5 USB Versions. (Source : http://www.digitalcitizen.life/simple-questions-what-usb-or-universal-serial-bus)

USB connectors: There are different types of USB connectors for the different USB standards. The connectors were named A-type and B-type in the original specification; the C-type connector was developed later. The A-type connector is a flat, rectangular interface, and A-A cables are used to connect A-type connectors to USB ports. The B-type connector is used on USB peripheral devices such as printers; it is square shaped and requires an A-B cable. The C-type connector is the newest type, used with USB 3.1; it is reversible and symmetrically designed and can be plugged into any USB-C port irrespective of its orientation. The Micro-USB connector has a compact 5-pin design and is used with the charging cables of smart phones.

Figure 13.6 USB Connectors.


Check Your Progress / Self Assessment Questions

Que. 7. What is USB ?

Que. 8. Which is latest standard of USB?

Que. 9. Name any three types of connectors of USB ?

13.4.2 Plug and Play

Plug and Play (abbreviated PnP) is a hardware and software technique describing devices that are ready to work as soon as they are plugged in or connected. It is a software technique introduced by Microsoft in its operating system Windows 95: it lets the user plug in a device and lets the PC automatically detect, configure and install it quickly with no user involvement, making it ready to use immediately. Microsoft highlighted the technique to increase sales of Windows 95, though a similar technique had long been built by Apple Inc. into its Macintosh computers. PnP has three requirements:

PnP BIOS: The firmware of the PC must support PnP, so that a device is automatically detected as soon as it is connected.

ESCD (Extended System Configuration Data): A data area that contains information about the various installed PnP devices.

PnP OS: The underlying operating system must also support PnP. For example, in Windows there is a component called the Plug and Play manager that interacts with the Hardware Abstraction Layer of the operating system kernel for PnP to work.

Check Your Progress / Self Assessment Questions

Que. 10. With which OS did Microsoft introduce PnP?

Que. 11. Which are three requirements of PnP?


13.5 Summary

When a computer starts up, the operating system is loaded from external memory into internal memory so that the computer can start working. This process of loading the operating system of an electronic device into its memory is called booting, also termed booting up a computer. Depending upon how and when booting takes place, it is of different types: cold booting or hard booting, warm booting or soft booting, local booting and remote booting. A subroutine is a subprogram called by a main program that performs some specific task and exits. The main program may send some values called arguments to the subroutine, which the subroutine uses for processing the task it has been assigned. A device driver is a type of system software; it is a small but complex software module that enables communication between a computer and its connected peripheral device, and it tells the operating system how the device works. USB is the Universal Serial Bus, a type of external expansion bus for a PC. An expansion bus is a collection of tiny wires that allows for computer expansion with the use of an expansion board or expansion card. Plug and Play (abbreviated PnP) is a hardware and software technique describing devices that are ready to work as soon as they are plugged in or connected; it was introduced by Microsoft in its operating system Windows 95.

13.6 Glossary

Booting : Loading operating system into internal memory of computer at start up.

Cold Booting : When computer starts up from the state when it was powered off.

Warm booting : Rebooting or restarting a computer.

Subroutine: a sub program called by main program to do a specific task.

DOS : Disk Operating System

MBR : Master Boot Record

Sector 0 : First sector of hard disk where MBR is loaded.

USB : Universal Serial Bus

Expansion Bus : Collection of tiny wires that allow for computer expansion with the use

of an expansion board

PnP : Plug and Play, a technique given by Microsoft whereby a device is automatically detected and configured by the computer when the device is attached to it.


13.7 Answers to Check Your Progress / Self Assessment Questions

Ans 1. Cold booting and warm booting

Ans 2. Master Boot Record

Ans 3. Sector 0 of hard disk

Ans. 4. Windows Driver Model. It is a driver model given by Microsoft for kernel-mode device drivers.

Ans 5. User mode and Kernel Mode

Ans 6. NTFS is NT File System (New Technology File System).

Ans. 7. USB is Universal Serial Bus, it is external expansion bus.

Ans. 8. USB 3.1

Ans. 9. Type-A, Type-B, Type-C, Type-A Mini, Type-A Micro

Ans 10. Microsoft Windows 95

Ans. 11. PnP BIOS, ECSD, PnP OS

13.8 References / Suggested Readings

Fundamentals of Operating System by Anshuman Sharma, Anurag Gupta

System programming : Dinesh Gupta, Kalyani Publisher

13.9 Model Questions

1. What are the different types of booting?

2. What are the different techniques of booting?

3. What is a subroutine?

4. Explain USB.

5. What are the different types of connectors used with USB?

6. What is PnP?

Unit 4


Lesson 14.

System Programming API.

Chapter Index

14.0 Objectives

14.1 Introduction

14.2 I/O programming

14.3 Systems Programming (API’s)

14.4 Summary

14.5 Glossary

14.6 Answers to Check Your Progress / Self Assessment Questions

14.7 References / Suggested Readings

14.8 Model Questions

14.0 Objectives

To know the meaning and importance of I/O Programming

To study about various I/O Programming techniques.

To know the meaning of API.

To explore System Programming API.

14.1 Introduction

I/O programming is concerned with programming I/O devices, i.e. how to fetch input from and how to send output to a device. Most devices are accessed via I/O ports. A port is just a special memory address that maps to input/output pins on the device; these pins are the actual hardware interface to the device. I/O programming is of two types: synchronous I/O programming and asynchronous I/O programming.

API stands for Application Programming Interface. It can be available in the form of library files, routines, modules, software tools, built-in functions, DLL (Dynamic Link Library) files etc., and is used in developing other software and applications. APIs act as support software. The importance of an API is to make it easier to develop a program by providing all the building blocks,


which are then put together by the programmer. Depending on its type and domain, an API can be a system API, web API, database API etc. APIs support software reusability.

14.2 I/O programming

I/O programming is also known as device programming. Any input/output device is useless if it does not communicate with the outside world or with the computer it is attached to, so it needs to employ I/O techniques both for getting data from the outside world or user and for giving stored data back to the user. I/O devices are controlled either directly or indirectly by their own processor, called a controller. Each input/output device is accessed by the main processor through input/output ports. A port is just an address, like a memory address, that maps onto input pins or registers on the device. Data can be received from or sent to the device in two ways, synchronously or asynchronously. Synchronous transmission keeps both receiver and sender synchronized with a clock pulse and requires handshaking; asynchronous transmission needs neither a clock signal nor handshaking between sender and receiver. Any programmable hardware device uses two kinds of components: sensors and actuators.

Input/output for most computers is asynchronous. The input/output device only initiates an operation; the actual transfer takes time depending on the nature of the device, the data transfer operation etc. Each device has a controller chip (an electronic circuit) which controls the device as instructed by the CPU.

14.2.1 IN and OUT instructions

Assembly language provides some instructions for interfacing with input/output devices; two popular ones are IN and OUT. IN and OUT are used to transfer data between the microprocessor's accumulator register (AL, AX or EAX) and an I/O device. While executing IN and OUT, the I/O port address is taken from register DX or from a fixed byte address immediately following the op-code.
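The semantics of IN and OUT can be modelled with a small port map. This is a conceptual Python simulation only; real code would use the actual machine instructions or OS-provided port I/O facilities:

```python
# Model of port-mapped I/O: a 'port' is just an address that maps onto
# a device register, and IN/OUT move one value between that register
# and the accumulator.

ports = {}  # port address -> device register value

def out_instr(port, accumulator):
    """OUT: send the accumulator's value to the device at 'port'."""
    ports[port] = accumulator

def in_instr(port):
    """IN: fetch the device register at 'port' into the accumulator."""
    return ports.get(port, 0)

out_instr(0x60, 0xAB)   # write 0xAB to port 0x60 (a DX-style address)
value = in_instr(0x60)  # read it back into the 'accumulator'
```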


Figure 14.1 Sample IN, OUT instructions

(Source : http://ece-research.unm.edu/jimp/310/slides/8086_IO1.html)

The IN instruction fetches information from the device into memory, storing the transferred data at the memory locations given by the effective address. The previous contents of that memory are overwritten by the new information.

The OUT instruction sends data from memory to an output device; the data is transferred from the memory address specified by the effective address in the OUT instruction.

Both the IN and OUT instructions transfer one record of information at a time. A record is the physical unit of information naturally handled by the I/O device.

When an IN or OUT instruction starts a data transfer operation, the device may be in the ready state, waiting for a new command, or it may already be busy with a previous transfer still running. If it is busy, once the device finishes the previous data transfer it immediately starts executing the next waiting command. The device status, whether ready or busy, is indicated by the ready/busy bit in the device controller. The setting and clearing of this bit is controlled entirely by the device itself.

So, if an input/output device is ready, its ready/busy bit will be set and it will take only one unit of time to complete an input/output instruction. But if the device is busy, its ready/busy bit will be cleared, and it may take an unpredictable number of time units before it becomes free to start executing the next instruction and become busy again. This extra waiting time is called interlock time.
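Polling the ready/busy bit before issuing a command can be sketched as follows; the cycles spent in the wait loop correspond to the interlock time described above. This is a toy model with invented names:

```python
# Toy model of a device controller's ready/busy bit. The CPU polls the
# bit; cycles spent waiting are the 'interlock time'.

class Device:
    def __init__(self, busy_cycles):
        self.busy_cycles = busy_cycles  # cycles until the device is free

    def ready(self):
        """Return the ready/busy bit; the device clears it while busy."""
        if self.busy_cycles > 0:
            self.busy_cycles -= 1       # device finishes earlier work
            return False
        return True

def issue_command(device):
    """Poll until the device is ready, then return the interlock time."""
    waited = 0
    while not device.ready():           # interlock: wait for ready bit
        waited += 1
    return waited

interlock = issue_command(Device(busy_cycles=3))
```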


14.2.2 Synchronous and Asynchronous I/O Programming

Synchronous I/O Programming

This technique waits for a function call or an input/output instruction to complete before starting the next piece of work. It is a convenient, easy and efficient method from the programmer's point of view. But it is not a good choice in a multitasking environment where many jobs run simultaneously, as it can create many problems.

Asynchronous I/O Programming

It does not force other processes to wait. Therefore, it is the preferred technique for programming device drivers in many multitasking operating systems like Windows. Supporting asynchronous I/O is one of the major design goals when developing device drivers. It is more complicated than synchronous I/O programming.
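The contrast can be sketched with a callback: the synchronous caller blocks until the data is ready, while the asynchronous caller registers a completion handler and continues. This is a conceptual model only; in a real driver the handler would be triggered later by an interrupt:

```python
# Synchronous I/O: the caller gets the data only when the transfer is
# complete, and does nothing else in the meantime.
def sync_read(device_data):
    return device_data  # returns only when the data is ready

# Asynchronous I/O: the caller supplies a callback and continues; the
# 'driver' invokes the callback when the transfer completes.
completed = []

def async_read(device_data, on_complete):
    # In a real driver this call would happen later, from an interrupt
    # handler; here it is immediate purely for illustration.
    on_complete(device_data)

async_read(b"packet", completed.append)  # caller is free to continue
result = sync_read(b"block")             # caller blocks for the result
```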

Check Your Progress / Self Assessment Questions

Que 1. What is a controller?

Que. 2. In which two states can a device be at any time?

Que. 3. Which two instructions are used largely for device input output?

Que. 4. Which are two types of I/O Programming techniques?

14.3 Systems Programming (API’s)

14.3.1 What is System Programming

Operating systems restrict application software running in the top layer from directly accessing critical system resources as a preventive security measure. For this, operating systems provide different interfaces that application software can use for managing these system resources. For example, an operating system may restrict a user application from directly drawing to


the screen or from reading or writing system memory directly. For performing these operations, an application has to use system services through system calls. This is actually what is done by C library functions like printf or scanf, i.e., they make calls to system routines that perform the task on their behalf. Similarly, the dynamic memory allocation functions of C like malloc, calloc etc. do not allocate memory directly; they also use specific built-in system routines for this purpose.

Such system calls are an interface provided by the OS for application software to access system resources, and they form part of the system API. Writing this type of system API is system programming.
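The same layering is visible from Python, whose os module exposes the underlying system calls directly: here the write and read system calls move bytes through a kernel pipe object, the same calls that buffered functions like printf ultimately rely on (a POSIX-style sketch):

```python
import os

# High-level, buffered I/O (like C's printf) versus direct system
# calls (like write/read): both ultimately ask the kernel to move the
# bytes; os.write/os.read skip the buffering layer.

r, w = os.pipe()                 # kernel object to transfer bytes through
written = os.write(w, b"hello")  # write system call: returns byte count
os.close(w)
data = os.read(r, 16)            # read system call: fetch the bytes back
os.close(r)
```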

14.3.2 What is API?

API stands for Application Programming Interface. It is a set of files, routines, modules, software tools, functions etc. used in developing other software and applications; they act as support software. The importance of an API is to make it easier to develop a program by providing all the building blocks, which are then put together by the programmer. An API provides a way for different software components to interact by enabling data sharing and content sharing among them. It also supports application reuse. APIs have actually existed since operating systems first began to emerge, but with the increasing complexity of operating systems, Internet applications, database management software, system software etc., their popularity and use have grown. Much system software provides its own API to reduce the effort of application programmers, who can then utilise the system library to build their software; it lets them concentrate on the main job rather than worry about how a particular low-level functionality is to be created. APIs are developed by experienced senior application and system programmers. API management refers to ensuring that an API performs well and consistently and does not affect the performance or security of the backend software components it interacts with. An API publisher is an organization that develops APIs and offers them to internal, partner or third-party developers of client applications. There are different types of API: for example, a system programming API (like a file system API or device driver API), an operating system API (for example the Windows API or Android API), a DBMS API (for example an Oracle API), or a web-based API (for


example, the eBay API). Whatever its type, an API makes it convenient to develop applications for that system using a programming language. For example, a developer writing apps for Android-based mobile phones will want to use the Android API to interact with the underlying system hardware, such as the rear camera or the keypad.

Check Your Progress / Self Assessment Questions

Que. 5. What is an API?

Que. 6. Name any two types of API.

14.3.3 Types of System API

An API can be placed in a category depending on the type of action it performs and the system function it relates to. Many APIs are used alone, whereas others can be used together to perform a task or function. Some of the more common types are:

a. List APIs : These are used to return lists of information about something on the system.

b. Retrieve APIs : These APIs return requested information to an application program.

c. Create, change, and delete APIs : This category of APIs works with objects of a specified type on a system.

d. Other APIs : Miscellaneous APIs that perform a variety of other actions on the system.

14.3.4 API Parameters

Once it is known what type of API is available and which API is to be used while developing an application, it is necessary to know the signature of each function the API makes available. A function's signature includes its name, its return type, and its arguments or parameters. An API function can have different types of parameters; the three types are:

Mandatory : These parameters must be given in the order specified. The function will not work, and may return an error, if mandatory parameters are not given.

Optional : You may or may not specify these parameters. If not specified, the function assumes default values.


Omissible : These parameters can be omitted. When an omissible parameter is omitted, a null pointer must be specified as the argument.

14.3.5 Who should use API

APIs can be a little difficult to learn due to their complexity: a large number of functions, procedures, hard-to-remember keywords, etc. They are therefore mostly used by experienced application and system programmers who develop complex application-level and system-level software.

Check Your Progress / Self Assessment Questions

Que. 7. What are the different types of System API?

Que. 8. What are the different types of API parameters?

14.3.6 Windows API

The Windows API is a collection of modules, functions, procedures, interfaces, objects, structures, unions, macros, constants, data types, and many other such programming elements used to create Windows-based applications; it is known as WinAPI or Win32 API. It was created for C and C++ programmers and is used directly to create Windows applications. The Win API is organized into different categories such as Core Services, Security Services, Graphics Programming, User Interface Development, Multimedia, the Windows core shell, and networking-related services.

First, the core Win API services deal with fundamental resources on Windows such as the file system, I/O devices, processes and threads, the Windows registry, etc.

Second, the Security Services provide interfaces for authentication, authorization, encryption, decryption, etc.

The graphics subsystem includes GDI (Graphics Device Interface), GDI+, DirectX, and OpenGL.


UI Development provides functionality to create windows and Windows-based controls such as text boxes and dialog boxes.

The multimedia-related APIs make available various tools for working with audio, video, and graphics devices.

The Windows shell API provides access to the basic functionality of the OS shell.

The networking API provides access to network-related functions such as FTP, Telnet, and SMTP available on Windows.

Microsoft documents its Win API through MSDN (Microsoft Developer Network). The actual implementation of the Win API functions is in the form of DLL (Dynamic Link Library) files such as Kernel32.dll, User32.dll, and GDI32.dll placed in the Windows system directory. With each new release of Windows, the Win API grows in the number of functions available.

14.3.7 UNIX API

The core part of the UNIX OS is its kernel, which provides UNIX's services directly or indirectly. The kernel is a large, complex program, actually a collection of interacting programs with many entry points. These entry points provide the services the kernel performs, and the collection of these kernel entry points constitutes UNIX's API. The kernel is a collection of separate functions bundled together into a large package, and its API is the collection of signatures, or prototypes, of these functions. The UNIX API takes the form of system calls: the interface provided by UNIX for application software to access system resources, which is part of the system API. A system call is just like an ordinary function call in the sense that it also causes control to jump to the function body and return to the caller after the function completes. But it is significantly different, since it is actually a call to a function that is part of the UNIX kernel and is provided by the developer of the kernel.


Figure 14.1 : UNIX API. (Source : http://alandix.com/academic/tutorials/courses/Prog-I.pdf)

14.4 Summary

I/O programming is concerned with the programming of I/O devices; it is also known as device programming. Mostly, devices are accessed via I/O ports. A port is just a special memory address that maps to the input/output pins on a device; these pins are the hardware interface to the device. I/O programming is of two types: synchronous I/O programming and asynchronous I/O programming. Assembly language provides some instructions for interfacing with input/output devices; two popular instructions are IN and OUT.


API stands for Application Programming Interface. It can be available in the form of library files, routines, modules, software tools, built-in functions, DLL (Dynamic Link Library) files, etc., and is used in developing other software and applications. There are different types of API: for example, a system programming API (like a file system API or device driver API), an operating system API (for example, the Windows API or Android API), a DBMS API (for example, the Oracle API), or a web-based API (for example, the eBay API). Whatever its type, an API makes it convenient to develop applications for that system using a programming language. Types of system API are list APIs, retrieve APIs, create/change/delete APIs, etc. The Windows API is a collection of modules, functions, procedures, interfaces, objects, structures, unions, macros, constants, data types, and many other such programming elements used to create Windows-based applications; it is known as WinAPI or Win32 API. The Win API is organized into different categories such as Core Services, Security Services, Graphics Programming, User Interface Development, Multimedia, the Windows core shell, and networking-related services. Microsoft documents its Win API through MSDN (Microsoft Developer Network). The UNIX API is in the form of system calls: the interface provided by UNIX for application software to access system resources, which is part of the system API.

14.5 Glossary

API : Application Programming Interface

Port : A port is just a special memory address that maps to the input/output pins on a device; these pins are the hardware interface to the device.

API management : Ensuring that an API performs well and consistently and does not affect the performance or security of the backend software components it exposes and interacts with.

API Publisher : An organization that develops APIs and offers them to internal, partner, or third-party developers of client applications.

DLL : Dynamic Link Library files on Windows.


Win API : The API provided by Windows, also known as the Win32 API.

MSDN : Microsoft Developer Network, the documentation for the Windows API.

14.6 Answers to Check Your Progress / Self Assessment Questions

Ans. 1. Controller is a chip on the device that controls it as directed by main CPU.

Ans. 2. A device at any time can be in ready state or busy state.

Ans. 3. IN and OUT.

Ans. 4. Synchronous and asynchronous.

Ans 5. An Application Programming Interface is a group of library routines, functions, and tools used in developing other software.

Ans 6. System programming API, Web API.

Ans 7. List APIs, retrieve APIs, and create/change/delete APIs.

Ans. 8. Mandatory, Optional and Omissible.

14.7 References / Suggested Readings

Win32 Programming by Brent E. Rector, Addison-Wesley.

Win32 API Programming with Visual Basic by Steven Roman, O'Reilly.

14.8 Model Questions

1. What is meant by I/O programming?

2. Which system calls does Windows provide for I/O Programming?

3. What is API? What is its use?

4. Write a note on Windows API.