1. Introduction to DBMS - Amizone · 1. Introduction to DBMS ... Data Redundancy ... addition, it may lead to data inconsistency i.e. various copies of same information may

1. Introduction to DBMS 1.1 INTRODUCTION

DataBase Management System has evolved from a specialized computer application to a

central component of computing environment. Database system plays a vital role in

organizing data about a particular enterprise. Consider an example of a company which

stores data about following :

Employees (Employee No., Name, Address, Salary).

Departments (Department No., Name, Location)

Project (Name, Project No., Department No., Location)

Which may have following relations :

An Employee works in a Departments.

An Employee works on many Projects.

A Department handles many Project.

Therefore, a system is needed which can effectively organize the data and also use it to

analyze and guide operations of the company.

Now–a–days, the amount of information to be stored is increasing tremendously and thus

the need of flexible and powerful system is also increasing day–by–day which has the

ability not only to effectively organize or maintain large collection of data but also

provides easy access to data.

1.2 CONCEPT OF DBMS

This section covers basic definitions related to DBMS and also explain various

components of DBMS. Data Base Management System consists of two components :

(i) Database i.e. collection of data

(ii) System or set of programs which are used to access and manage the database.

By incorporating these two components, DBMS organize the information, maintain it and

retrieve it efficiently as and when required. So to use and understand the system, as well

maintained database and set of programs are needed.

1.2.1 Concept of Data and Database

The word Data is derived from a Latin word which means ‘to give’ thus data are given

facts from which additional facts can be inferred. Data, are facts or undoubted

information used for different computations or calculations. For example – the facts

related to an employee in a company like his Employee No., Name, Salary, Designation

etc. are data but when these data are retrieved or processed to find answers of questions

like :

What is Employee No.of employee whose salary is more than 10,000 ?

What is name of employee whose Employee No. is 24 ?

Then it becomes information. Thus information is a processed form of data and database

is a logicially coherent collection of data, not the information, with same meaning.

A database, is a collection of interrelated data which represents some aspects of real

world. Database has some inherent meaning and is related to a particular group of users

or applications. For example – Database of a college, may contain data about students,

faculties, courses etc. which are related to each other with certain relations like – faculty

teaches students, students are enrolled in courses etc. Thus, we can say that database

contains the data, related to a real world enterprise, and is designed, built and populated

with data for specific applications related to the enterprise.

1.2.2 Definition of DBMS

A database management system (DBMS) is essentially a collection of interrelated data

and a set of programs to access this data. This collection of data is usually called the

database. Database systems are designed to maintain large volumes of data. Management

of data involves :

• Defining the structures for the storage of data.

• Providing the mechanisms for the security of data against unauthorized access.

The primary objective of a DBMS is to provide an environment that is convenient

and efficient to use, in retrieving information from and storing information into the

database.

The user of the DBMS is provided the following facilities among others:

• Adding empty files to the database.

• Inserting new data into the existing files.

• Retrieving data from the files.

• Updating data in the files.

• Deleting data from the files.

• Removing existing files from the database.

Therefore, DBMS can be used for different purposes besides data storage which are as

follows :

(i) Efficient access to data.

(ii) For avoiding data redundancy and inconsistency.

(iii) For providing security of data.

(iv) For enforcing different integrity constraints.

(v) For providing access for data to multiple users concurrently.

1.3 HISTORY OF DBMS

Most of the software applications focus on the manipulation of the data from the starting

days of computer. So, there is a need arises for a system that helps in storing and

manipulating the data. The first general purpose DBMS was designed by Charles

Bachman in early 1960’s and called as Integrated Data Store. It founded the basis for the

network data model and influenced database system through 1960’s. In 1966, IBM

released the first commercially available DBMS called IMS (Information Management

System) which based on the hierarchial data model and assumes all data relationship to

be structured as hierarchies. Conference On Data Systems Languages (CODASYL) set

standards for network database product in 1969.

Dr. E. F. Codd, an IBM researcher, proposed relational data model in theoritical paper in

1970. The publication of Codd’s paper in early seventies set off a flurry of activities in

both research and commercial system developments communites and they worked to

bring out a relational DBMS. IBM developed a relational model prototype in 1976. In

1980’s, the relational model was developed as a standard approach for DBMS. SQL is

developed as a part of IBM’s system R project which becomes a standard query

language. So IBM released first commercially available Database product based on

relational model SQL/DS for interactive operating system in 1981. IBM produced DB2

for its mairyraness with batch operating system in 1982. SQL was standardized and was

adopted as a query language by ANSI and ISO. Many developments were being done in

1980’s and 1990’s in the area of database system, which include the release of Paradox,

DBase, Foxpro and Access. Different researches worked out to develop a more powerful

and rich data model which can support complex data types.

Later on, Enterprise Resource Planning (ERP) and Management Resource Planning

(MRP) evolved. Both of these packeges identify a set of common tasks e.g. human

resource planning, inventory management etc. of a large organization and provide a

general application layer to carry out these tasks. Thenafter, DBMS get into the

revolutionary age of internet. Data stored in the DBMS can now be accessed with the

help of web browsers from any where and at any time. Stored data is being provided on

the web in the form of HTML and XML documents. So in 2000, the fashionable area for

innovation is XML database. XML databases aim to remove the traditional division

between documents and data, allowing an organization’s information resource to be held

in one place, whether they are highly structured or not. All the database vendors try to

develop more advanced DBMS which can support complex data like video, streaming

data, digital libraries on the web. Thus, the database system evolved from sequential file

access to the object oriented database system used in present scenerio.

1.4 FILE SYSTEM V/S DBMS

Initially, a computer system used by an enterprise mainly performs data processing tasks

i.e. to insert the information about employees, retrieve information about employees of

particular department, accounting functions on salary of employees etc. Since these

systems performed normal record keeping functions, they were called data processing

system. Thus data processing system is an automated system for processing data of an

organization. The conventional data processing approach is to develop a program (or

many programs) for each application. This results in one or more data files for each

application (fig. 1.1). Some of the data may be common between files. However, one

application may require the file to be arranged on a particular field, e.g. amount. A major

drawback of conventional method is that the storage and access techniques are built into

the program. Therefore, though the same data may be required by two applications, the

data will have to be stored in to different places because each application depends on the

way that the data is stored.

There are various drawbacks of the conventional data file processing environment. Some

of them are listed below.

(i) Data Redundancy

Some data elements like name, address, identification code, are used in various

applications. Since data is required by multiple applications, it is stored in multiple data

files. In most cases, there is a repetition of data files. This is referred to as data

redundancy, and it leads to various other problems.

(ii) Data Integrity Problem

Data redundancy is one reason for problems of data integrity. Since, the same data is

stored in different places, it is inevitable that some inconsistency will creep in. For

example if an instructor of Microprocessor Technology begins to take course in

Computer Architecture this needs to be reflected in more than one place (figure 1.1). If

the change is not made in all the places, the university will have different information in

different places about the same instructor.

(iii) Data Availability Constraints

When data is scattered in different files, the availability of information from combination

of files is constrained to some extent.

Figure 1.1 One to one correspondence between applications and data files

1.4.1 Advantage of DBMS over File System

File system stores data in the form of records and data which are files managed by

operating system and uses application program to extract information from the file.

A major advantage the database approach has over the conventional approach is that a

database system provides centralized control of data.

(i) Reduced Redundancy

Unlike conventional approach each application does not have to maintain its own data

files. Data can be integrated and used by multiple applications at the same time.

(ii) Ensure Consistency

It becomes very difficult to maintain consistent format of files in file system. Different

programmers can use different programming languages, which may cause duplication of

information in several files. This duplication results in higher storage and access cost. In

addition, it may lead to data inconsistency i.e. various copies of same information may

not agree. For example, consider an employee management system, if address of an

employee which is stored at two places and is updated at only one place then the system

will give conflicting information and become inconsistent. The DBMS can guarantee that

the database is never inconsistent, by providing a fix format of data and by ensuring that

a change made to any entry, automatically applies to the other entries as well. This

process is known as propagating updates.

(iii) Data Manipulation Capabilities

File system requires an application program for processing the data stored in files

according to needs of user. If the user needs get changed then a different application

program is required. For example consider the employes management system. Suppose

we want to find name of employees in “Jaipur” then either new application program is

developed or we have to find out the name of employee having city as Jaipur manually in

the case of files system. This method is not an efficient process as developing a new

application program takes a lot of time and it is possible that after development of

program, our needs changes from finding employees in Jaipur to find employees in

‘Malviya Nagar, Jaipur’. Database system can solve such problems by simply firing

queries to the database as needed and retreive answer in response.

(iv) Data Independence (Reduced Programming Efforts)

In non-database systems, the requirements of the application dictate the way in which the

data is stored, and the access techniques. Besides, the knowledge of the organization of

the data and the access techniques are built into the logic and code of the application.

These systems are data dependent. Consider this example, suppose the university

(mentioned previously) has an application that processes the student file. For

performance reasons, the file is indexed on the roll number. The application would be

aware of the existing index, and the internal structure of the application would be built

around this knowledge. Now, consider that for some reason, the file is to be indexed on

the registration date. In this case, it is impossible to change the structure of the stored data

without affecting the application too. Such an application is a data dependent one. It is

desirable to have data independent applications. Suppose two application X and Y need

to access the same file. However, both application require a particular field to be stored in

different formats. Application X requires the field “customer-balance” to be stored in

decimal format, while application Y requires it be stored in binary format. This would

pose a problem in the old systems. In a DBMS, differences may exist in the way that data

is actually stored, and the way that it is seen and used by a given application. To conform

to the changing requirements of the enterprise, the DataBase Administrator (DBA) may

need to change the storage structure or access techniques. The DBA should be able to do

this without having to modify the existing applications. If applications are data

dependent, programmer effort, that could otherwise be available for the creation of new

applications, would be necessary to modify existing applications to match the changes

made.

(v) Atomicity and Transaction Management

File system does not ensure completion of transaction and it may cause problem of data

inconsistency. For example, consider employee management system where company

wants to shift an employee from sales department to finance department. The procedure

for this transaction is to perform two operations, reduction in number of employees in

sales department and increament in the number of employee in finance department, but in

file system may combine of both operations can not be guaranteed as we can not make a

single unit of these two operations and if only one operation is performed and system

crashes then the database will become inconsistent. This problem can easily be solved by

database management system. It ensures completion of whole transaction which

combinses more than one operation or no operation will be performed on behalf of the

transaction. This property is called ‘atomicity’.

(vi) Security

File system does not provide any security to the data stored, as there are no authentication

rights provided to user for the file. Complete file is at expose of user. The DBA has to

guarantee that only authorized persons have access to the database. The DBA defines the

security checks to be carried out. Different checks can be applied to different operation

on the same data. For instance, a person may have the access rights to query on a file, but

may not have the rights to delete or update that file. The DBMS allows such security

checks to be established for each piece of data in the database.

(vii) Integrity

Inconsistency between two entries can lead to integrity problems. However, even if there

is no redundancy, the database can still be inconsistent. For example, a student may be

enrolled in 10 courses in a semester when the maximum number of courses, one can

enroll is 7. Another example could be that of a student enrolling in a course that is not

being offered that semester. Such problems can be avoided in a DBMS by establishing

certain integrity checks to be carried out whenever any update operation is done.

1.5 DISADVANTAGES OF DBMS

Inspite of many advantages, DBMS does not proves to be powerful or advantageous

system in certain scenarios due to following :

(i) Overhead for providing security, integration of data, transaction management,

concurrency control etc.

(ii) More investment is required for hardware and software.

(iii) Special training is required to use DBMS.

(iv) Its performance may not be adequate for certain specialized applications

(v) Many applications may need to manipulate the data in ways not supported

by the query language.

So, it is quite advantageous to use file system in certain situations, which are :

(i) Database and application are simple and not expected to change .

(ii) Concurrent access is not required.

(iii) Real time applications as time constraints are not easy to maintain with

DBMS.

1.6 DESCRIBING AND STORING DATA IN DBMS

DBMS is always concerned with some real world enterprise. Data stored in DBMS

describe real world entities and represent relationships between these entities. For

example, there are employees, departments and projects in a company and data in the

company database describe these entities, in terms of their attributes and relationship to

other entities. Data can be described through different data model and at different levels

of abstraction.

1.6.1 Data Abstraction

Data abstraction is one of the fundamental characteristic of any database management

system, which helps in making data more accurate and easy to use. Abstraction refers to

the act of representing essential features without including background details or

explanations. So, data abstraction refers to the act of representing data without giving

details that how data are stored or maintained. Data abstraction prevents irrelevant

information at a particular level. Complexity of data is hide through several levels of

abstraction so as to simplify user interaction with the system. Different levels of

abstraction are :

(i) Physical Level or Internal Level

It is the lowest level of abstraction which specifies storage details that how data are

actually stored on disks or on tapes. It specifies in the manner in which records are stored

either as the collection of pages or as the collection of records. Complex low level data

structures are described in detail at this level. The design of data structure described at

this level is called physical schema. The data structure at this level may include B trees,.

B+ trees, hashing etc.

(ii) Logical Level or Conceptual View

The next higher level of abstraction describes what data are stored in the database, and

what relationship exists among those data. There is only one conceptual schema per

database. This schema also contains the method of deriving the objects in the conceptual

view from the internal views. The description of data at this level is in a format

independent of its physical representation. It also includes features that specify the checks

to retain data consistency and integrity. The logical level of abstraction is used by

database administrators, who decide what information is to be kept in the database

Figure 1.2 Level of Data Abstraction

(iii) View Level

It is the highest level of abstraction which describes different views of the entire

database. These views are designed according to the requirements of user who wants to

access only a part of the database. A database may have several views, according to the

demand of individual user or the group of users. The data in these views are not exactly

stored in DBMS but they are computed using specification of view described by user. An

analogy to the concept of data types in programming language may clarify the distinction

among levels of abstraction. Most high-level programming languages support the notion

of a record type.

At physical level, a customer, account, employee record can be described as a block of

consecutive storage locations for example words or bytes. The language compiler hides

this level of details from the programmers. Similarly, the database system hides many of

the lowest level storage details from database programmers.

Customer_Loan Cust_ID : 101 Loan_No : 1011 Amount_in_Dollars : 8755.00

CREATE TABLE Customer_Loan ( Cust_ID NUMBER(4) Loan_No NUMBER(4) Amount_in_Dollars NUMBER(7,2))

Cust_ID TYPE = BYTE (4), OFFSET = 0 Loan_No TYPE = BYTE (4), OFFSET = 4 Amount_in_Dollars TYPE = BYTE (7), OFFSET = 8

External

Conceptual

Internal

At logical level, each such record is described by a type definition and the

interrelationship among these record type is defined. Programmers using a programming

language work at this level of abstraction. Similarly, database administrators usually

work at this level of abstraction.

Finally, at the view level, computer users see a set of application programs that hide

details of the data type. Similarly, at the view level, several views of the database are

defined, and database users see these views. In addition to hiding details of the logical

level of the database, the views also provide a security mechanism to prevent users from

accessing parts of the database.

Diagram: Example of three levels

Figure 1.3 Database Abstraction

1.6.2 Instances and Schemas

The collection of information stored in the database at a particular moment is called an

instance of the database.The overall design of the database is called the database

schema. Schemas are changed infrequently. Database systems have several schemas,

partitioned according to the levels of abstraction. At the lowest level is the physical

schema, at the intermediate level is the logical schema and at the highest level is a

subschema. In general database system supports one physical schema, one logical

schema, and several subschemas.

1.7 DATA INDEPENDENCE

Three levels of abstraction, along with the mappings from internal to conceptual and from

conceptual to external, provides two distinct levels of data independence: Logical and

physical data independence.

1.7.1 Logical Data Independence

Indicates that conceptual/logical schema can be changed without affecting the existing

external(view) schemas. The change would be absorbed by the mapping between the

external and conceptual(logical) levels. Consider a change in the conceptual view such as

merging two records into one or adding fields to an existing record. This would require a

change in the mapping from the external view to the conceptual view so as to leave the

external view unchanged. Some changes such as the deletion of a conceptual view field

or record, may require changes in the external view and application program.

1.7.2 Physical Data Independence

Indicates that the physical storage structures or devices used for storing the data could be

changed without necessitating a change in the conceptual view or any of the external

views. The change would be absorbed by the mapping between the conceptual and

internal levels. Modifications at physical level are occasionally to improve performance

1.8 DATABASE LANGUAGES

Most of the database management systems provide specialized languages called database

languages, to interact with database or to get some job done from the database.

Commands of these languages provides facility to the user to operate and manage the

database efficiently. Importance of database to the user is dependent on the ease with

which information can be obtained from it and one of the biggest reason of popularity of

relational database management system is that it allows a rich class of questions to ask

from the database in an easy manner. Consider our sample Employee database, a user

may ask :

(i) Who is getting highest salary ?

(ii). How many employees are working under Mr. Ramesh ?

(iii) In which department employee’s strength is highest ?

Such questions which involve the data stored in a database are called queries and

database management systems provides a specialized language called query language to

ask queries from database.

A DBMS provides facility of data manipulation like retrieval of data, insertion of data,

modification of existing data and deletion of data etc. Such facilities are given by Data

Manipulation Language (DML) commands. Query language is only one part of DML.

DBMS also supports some commands that can make changes in the structure of the

database i.e. schema, such commands are known as Data Definition Language (DDL)

commands. The DML and DDL are collectively known as data sublanguages, when

embedded within a host language. DDL make changes in meta data (data about data)

stored in data dictionary and DML retrieve data from a database (query) or make changes

in an instance of the database.

Relational model supports a powerful mathematical logic based language, called

relational calculus which is a nonprocedural language for defining query solutions.

Queries in this language have precise meaning.

There are two types of relational calculus.

(i) Tuple Relational Calculus

(ii) Domain Relational Calculus

Relation algebra is another formal, procedural query language, based on a collection of

operators for manipulating relations.

1.9 TRANSACTION MANAGEMENT:

A transaction is a sequence of one or more SQL statements that together forms a logical

unit of work. Each statement in the transaction performs a part of the task, but all of them

are required to complete the task. All the statements forming a transaction must be

executed for the database to be in a consistent state. A transaction occurs when the

database is modified. Here is an example of a typical transaction. A customer orders a

product. The order processing program will:

(i) Query the table containing product details to see if the product is in stock.

(ii) Insert the order details into a table that holds the order details.

(iii) Update the table containing product details to reduce the quantity on hand by the

quantity order.

These three actions, in the sequence shown above, form a single logical transaction. The

concept of transaction processing is critical for programs that update a database because

it ensures the integrity of the database.

As a rule, the statements in a transaction are executed as a single unit of work in the

database. Either all the statement will be executed successfully, or none of the statements

will be executed.

The DBMS is responsible for ensuring the consistent state of the database even in the

case of the application program aborting in the middle of the transaction, or a hardware

failure in the middle of a transaction. Consider the above example of a transaction. What

do you suppose will happen if the order processing program aborted after step(ii). The

database would reflect a partial transaction. It would be in an inconsistent state. To

maintain consistency, the DBMS undoes all the changes made if a transaction is not

completed. Thus, a DBMS guarantees that if a transaction executes some updates, and

then for whatever reasons, a failure occurs before the transaction reaches its normal

termination, then those updates will be undone.

To ensure integrity of the data, we require that the database system maintains the

following properties of the transactions:

(i) Atomicity

Either operations of the transactions are reflected properly in the database or none are.

The basic idea behind ensuring atomicity is as follows. The database system keeps track

of the old values of any data on which a transaction performs a write, and, if the

transaction does not complete its execution the old values are restored to make it appear

as though the transaction never executed. It is handled by a component called the

Transaction Manager.

(ii) Consistency

If the database is consistent before an execution of the transaction, the database remains

consistent after the execution of the transaction. Execution of a transaction in isolation

preserves the consistency of the database. Ensuring consistency for an individual

transaction is the responsibility of the application programmer who codes the transaction.

(iii) Durability

The durability property guarantees that, once a transaction completes successfully, all the

updates that it carried out on the database persist, even if there is a system failure after the

transaction completes execution. We can guarantee durability by ensuring that either

(a) The updates carried out by the transaction have been written to disk before the

transaction completes

(b) Information about the updates carried out by the transaction and written to disk is

sufficient to enable the database to reconstruct the updates when the database

system is restarted after the failure. Ensuring durability is the responsibility of a

component of the database system called the Recovery Manager.

(iv) Isolation

Even if the consistency and atomicity properties are ensured for each transaction, if

several transactions are executed concurrently, their operations may interleave in some

undesirable way, resulting in an inconsistent state. The isolation property of a transaction

ensures that the concurrent execution of transaction results in a system state that is

equivalent to a state that could have been obtained had these transactions executed one at

a time in some order.

1.10 STORAGE MANAGEMENT

Main memory of a computer can not be used to store a medium size database which may

be in gigabytes or even in terabytes therefore data are stored in disks and moved between

disk and main memory. Database management system should structure the data in such a

way that movement of data between main memory and disk may be reduced up to the

lowest level possible. Minimizing data movement will improve data access speed because

data movement to and from the disk is slow relative to the speed of central processing

unit.

The goal of database management system is to simplify and facilitate access to data. The

information that how the data are stored on physical media is hidden from the user. The

performance of system regarding data access speed can be measured in terms of response

time i.e. time elapsed between submission of command and data to the system and getting

the result of computation. Response time depends on the efficiency of the data structures

used to represent these data in the database and on how efficiently the system is able to

operate on these data structures. The storage manager component of database

management system is responsible for storing, retrieving and updating data in the

database. It provides an interface between the low–level data stored in the database and

the application programs and queries submitted to the system, it also translates the

various DML statements into low–level file system commands.

1.11 DATABASE USERS

Different people can use the database in a different manner. The interaction between user

and the database may be of several types according to the user. We can have four types of

users differentiated by the way they expect to interact with the DBMS :

1.11.1 Database Administrator

Databases of an enterprise are typically important enough and complex enough that the

task of designing and maintaining it requires a professional, called the DataBase

Administrator (DBA). DBA has central control over the system. The database

administrator is responsible for following functions :

(i) Schema Design and Maintenance

The DBA creates the database schema after interacting with the users of the system

and analyzing what data are to be stored in the database. He or she writes a set of

definitions that are translated by DDL compiler into a set of tables and are stored the in

data dictionary.

(ii) Physical Schema and Organisation Modification

DBA is responsible for designing physical schema and deciding how the data will be

stored on the physical media. DBA defines storage structure and access methods by

writing a set of definitions.

(iii) Authorization and Security

DBA is responsible for ensuring that unauthorized data access is not permitted. The

DBA also decides which parts of the database, various users can access and in which

mode (read, write or both). Database administrator grants different types of

authorizations to different users. For example a clerk may be authorized to view salaries

of different employees but he may not be authorized to update salaries. This authority can

be given to the accounts officer only. So different access rights are given to different

usess according to requirement.

(iv) Integrity Constraint Specification

Data stored in the database must satisfy certain constraints for example in an employee

database, Employee No. must be unique for each employee and Address value must not

be left blank. These constraints are called integrity constraints and it is the responsibility

of DBA to identify all such constraints and apply them to the database.

(v) Recovery From Failure

The DBA is responsible for taking necessary steps required for restoring the database if

the system fails DBA should keep backups of the database time to time and maintains

logs of system activities so that recovery may become possible.

(vi) Database Upgradation

DBA always tries to know, understand and analyze, changing requirements of user and

make upgradations in the database accordingly.

1.11.2 Application Programmers

They are computer professionals and write application programs. They embedded DML

call in the host language program. These DML calls are converted in the host language

normal procedure call by DML precompiler. The resulting program is then run through

the host language compiler, which generates appropriate object code.

1.11.3 Sophisticated Users

They work like an analyst and submit queries to a giving process or directly to the

DBMS, which breaks down the query into instructions that the storage manager

understands.

1.11.4 Specialized Users

Speciallized users write specialized database applications like computer aided design

systems, knowledgebase and expert systems etc. Such systems are different from

traditional data processing framework and uses complex data types.

1.11.5 Naive Users

They use the DBMS only by interacting through application programs written previously.

For example the clerk at ticket booking window, he uses an application program to do his

job of making reservations for a passenger.

1.12 DATABASE STRUCTURE

A database system is divided in different components. The exact architecture of a DBMS

depends on the operating system on which it has to work because the operating system

will provide the basic services, which database system uses base for accomplishing it’s

tasks.

Database system can be divided in to two broad parts.

(i) Query Processor Components

(ii) Storage Manager Components

1.12.1 Query Processor Components

These components are used in evaluating DDL and DML queries and includes following

components.

(i) DML Compiler

It first attempts to transform user’s request into an equivalent but more efficient form and

then translates that into a set of low level instructions that can be used by query

evaluation engine.

(ii) Embedded DML Pre–Compiler

It converts DML statements embedded in an application program to normal procedure

calls in host language. This pre compiler consults DML complier to generate the

appropriate code

(iii) DDL Interpreter

It makes data dictionary, which contains metadata.

(iv) Query Evaluation Engine

It executes low level instructions generated by the DML complier.

1.12.2 Storage Manager Components

It provides the interface between the low level data stored in the database and the

application programs and queries submitted to the system. These components include :

(i) File Manager

It manages disk space allocation and the data structures used to store the data

(ii) Buffer Manager

It is responsible for fetching data from disk to main memory and decides the caching

strategy suitable for the application.

(iii) Transaction Manager

It ensures consistency of the database after any transaction performed on it.

(iv) Authorization and Integrity Manager

It tests for satisfaction of integrity constraints and checks authority of users to access

data. It uses all the integrity constraints and authorization rules specified by the DBA.

1.12.3 Data Structures Used by DBMS

A DBMS can use several kind of data structures as a part of physical system

implementation. Each structure has it’s own importance. Following are some common

data structures.

(i) Data Files

It stores the database itself on the disk.

(ii) Data Dictionary

Information pertaining to the structure and usage of data contained in the database, the

metadata, is maintained in a data dictionary. The data dictionary is a database itself,

documents the data. Each database user can consult the data dictionary to learn what each

piece of data and the various synonyms of the data fields mean. In a system where the

data dictionary is part of the DBMS(Integrated system) the data dictionary stores

information concerning the source of each data-field value, the frequency of its use, and

an audit trail concerning updates, including the who and when of each update.

Currently data dictionary systems are available as add-ons to the DBMS.

The data dictionary stores:

• Names of relations

• Names of the attributes of each relation

• Domains and lengths of attributes

• Names of views defined on the database, and definitions of those views

• Names of authorized users

• Accounting information about users

• Number of tuples in each relation

• Method of storage used for each relation

• Name of the index

• Name of the relation being indexed

• Attributes on which the index is defined

• Type of index formed

(iii) Indices

These are used to provide fast access to data items that hold particular values.

(iv) Statistical Data

It stores statistical information about the data stored in the database, like number of

records, blocks etc. in a table. This information can be used to execute a query efficiently.

Figure 1.4 Diagram of Database Structure

SUMMARY

• Data are raw facts or figures.

• Database is a collection of interrelated, shared, and controlled data, typically describing

activities and functions of an organization.

• DataBase Management System is a software designed to manage all the operations of a

database like insertion, modification, deletion, sorting of data and creation, alteration,

removal of database structure etc.

• Data model is a conceptual method of structuring data.

• Data in the database is abstracted in three levels physical, logical and view level.

• Schema is the overall design of the database.

• Instance is the snapshot of the database or the data stored in the database at a particular

point of time.

• Logical data independence means no need of changes in the logical structure of data

when there are changes at view level.

• Physical data independence means no need of changes in the physical storage details

when there are changes at logical level.

• Data manipulation language (DML) is used to retrieve and manipulate data.

• Data definition language (DDL) is used to specify database schema.

• Transaction is a program unit or set of operations that represents a logical unit of tasks.

• Atomicity, consistency, isolation and durability are four properties that all the

transactions should hold. These properties are known as ACID properties.

• Atomicity ensures that all actions associated with a transaction are executed to

completion, or none is performed.

• Transaction must not leave the database in inconsistent state.

• Concurrent transactions must be executed in serializable manner. Isolation property of

transaction ensures it.

• Durability ensures persistence of successfully completed transactions.

• Storage manager is a program module that provides the interface between the low level

data stored in the database and the application programs and queries submitted to the

system.

• Transaction manager ensures consistency of the database even after system failure.

• Database administrator is a manager whose responsibilities are focused on technical

aspects of the system.

• Query processor components and storage manager components are two broad parts of

database system.

• Query processor components are used in evaluating DDL and DML queries.

• Storage manager components provide the interface between low level data stored in

database and application program and queries submitted to the system.

REVIEW QUESTIONS

Q.1 Why would you choose a database system instead of simply storing data in operating

system files ? When would it make sense not to use database system ?

Q.2 Explain the differences between internal, external and conceptual schemas. How are

these different schema layers related to the concept of logical and physical data

independence ?

Q.3 Explain the role of different users of database ?

Q.4 What is a transaction ? Explain ACID properties of transaction ?

Q.5 What are log files ? How they are helpful in recovery.

Q.6 Explain roles of following for representing information about real world in database ?

(a) Data Definition Language

(b) Data Manipulation Language

(c) Buffer Manager

(d) Data Model

Q.7 Explain different database languages available.

Q.8 Explain transaction management in DBMS in detail.

Q.9 Discuss layered structure of DBMS.

Q.10 Describe main characteristics of the database approach and contrast it with file based

approach. [Main/Back Exam 2004]

Q.11 Describe components of DBMS environment and discuss how they are related to each

other. [Main/Back Exam 2004]

Q.12 What is DBMS ? How are data stored in DBMS ? Explain.

[Main/Back Exam 2003]

Q.13 What is transaction ? How is transaction managed in DBMS ?


Q.14 What are the advantages and disadvantages of using database system ?

Q.15 What is the difference between logical data independence and physical data

independence ? Which is easier to accomplish ? and Why ?


Q.16 Give overall system structure of DBMS and explain function of each component.

[1995]

1. Introduction to DBMS - Amizone · 1. Introduction to DBMS ... Data Redundancy ... addition, it may lead to data inconsistency i.e. various copies of same information may

Documents