1. Introduction to DBMS
1.1 INTRODUCTION
The database management system (DBMS) has evolved from a specialized computer application
into a central component of the computing environment. A database system plays a vital role
in organizing data about a particular enterprise. Consider, for example, a company which
stores data about the following:
Employees (Employee No., Name, Address, Salary)
Departments (Department No., Name, Location)
Projects (Name, Project No., Department No., Location)
These may have the following relationships:
An Employee works in a Department.
An Employee works on many Projects.
A Department handles many Projects.
Therefore, a system is needed which can organize the data effectively and also use it to
analyze and guide the operations of the company.
Nowadays, the amount of information to be stored is increasing tremendously, and so is the
need for a flexible and powerful system that not only organizes and maintains a large
collection of data effectively but also provides easy access to it.
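As a rough illustration of the company example above, the three kinds of records and their
relationships could be declared as follows. This is only a sketch using Python's built-in
sqlite3 module; the table and column names, the dept_no column on employee and the works_on
table are assumptions introduced here to express "works in" and "works on", and the sample
rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_no  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    location TEXT
);
-- An employee works in one department (dept_no is an assumed column).
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT,
    salary  REAL,
    dept_no INTEGER REFERENCES department(dept_no)
);
-- A department handles many projects.
CREATE TABLE project (
    project_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dept_no    INTEGER REFERENCES department(dept_no),
    location   TEXT
);
-- An employee works on many projects (assumed linking table).
CREATE TABLE works_on (
    emp_no     INTEGER REFERENCES employee(emp_no),
    project_no INTEGER REFERENCES project(project_no),
    PRIMARY KEY (emp_no, project_no)
);
""")

# Invented sample rows, inserted parents first so the references are valid.
conn.execute("INSERT INTO department VALUES (1, 'Sales', 'Jaipur')")
conn.execute("INSERT INTO employee VALUES (24, 'Asha', 'Jaipur', 12000, 1)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?, ?)",
    [(101, "Billing", 1, "Jaipur"), (102, "Payroll", 1, "Jaipur")],
)
conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(24, 101), (24, 102)])

# Which projects does employee 24 work on?
projects_of_24 = [
    row[0]
    for row in conn.execute(
        "SELECT p.name FROM works_on w "
        "JOIN project p ON p.project_no = w.project_no "
        "WHERE w.emp_no = 24 ORDER BY p.name"
    )
]
```

Here the relationships are stored as references between tables, so a question that spans
employees and projects can be answered with one query.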
1.2 CONCEPT OF DBMS
This section covers the basic definitions related to DBMS and explains its various
components. A database management system consists of two components:
(i) A database, i.e. a collection of data.
(ii) A system, or set of programs, used to access and manage the database.
By combining these two components, a DBMS organizes information, maintains it and
retrieves it efficiently as and when required. Using the system therefore requires both a
well-maintained database and the set of programs.
1.2.1 Concept of Data and Database
The word data is the plural of the Latin datum, meaning ‘something given’; thus data are
given facts from which additional facts can be inferred. Data are raw facts used for
different computations or calculations. For example, the facts related to an employee in a
company, such as Employee No., Name, Salary and Designation, are data. When these data are
retrieved or processed to answer questions like:
What is the Employee No. of each employee whose salary is more than 10,000?
What is the name of the employee whose Employee No. is 24?
they become information. Thus information is a processed form of data, and a database is a
logically coherent collection of data (not of information) with some inherent meaning.
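The distinction can be made concrete with a small sketch. The employee records below are
invented for illustration; the stored records are the data, and the answers to the two
questions above are the information derived from them.

```python
# Data: raw facts about employees (invented sample values).
employees = [
    {"emp_no": 24, "name": "Asha", "salary": 12000},
    {"emp_no": 31, "name": "Ravi", "salary": 8000},
    {"emp_no": 47, "name": "Meena", "salary": 15000},
]

# Information: Employee No. of each employee whose salary is more than 10,000.
high_paid = [e["emp_no"] for e in employees if e["salary"] > 10000]

# Information: name of the employee whose Employee No. is 24.
name_of_24 = next(e["name"] for e in employees if e["emp_no"] == 24)
```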
A database is a collection of interrelated data which represents some aspect of the real
world. A database has some inherent meaning and is related to a particular group of users
or applications. For example, the database of a college may contain data about students,
faculty, courses, etc., which are related to each other through relationships such as: faculty
teach students, students are enrolled in courses, etc. Thus, we can say that a database
contains the data related to a real-world enterprise, and is designed, built and populated
with data for specific applications related to the enterprise.
1.2.2 Definition of DBMS
A database management system (DBMS) is essentially a collection of interrelated data
and a set of programs to access those data. This collection of data is usually called the
database. Database systems are designed to maintain large volumes of data. Management
of data involves:
• Defining the structures for the storage of data.
• Providing the mechanisms for the security of data against unauthorized access.
The primary objective of a DBMS is to provide an environment that is convenient
and efficient to use, in retrieving information from and storing information into the
database.
The user of the DBMS is provided the following facilities among others:
• Adding empty files to the database.
• Inserting new data into the existing files.
• Retrieving data from the files.
• Updating data in the files.
• Deleting data from the files.
• Removing existing files from the database.
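The six facilities listed above map naturally onto SQL statements. The following is a
minimal sketch using Python's built-in sqlite3 module, where a table plays the role of a
"file"; the table name, columns and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Adding an empty file (table) to the database.
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# Inserting new data into the existing file.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(24, "Asha", 12000), (31, "Ravi", 8000)],
)

# Retrieving data from the file.
names = [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY emp_no")]

# Updating data in the file.
conn.execute("UPDATE employee SET salary = 9000 WHERE emp_no = 31")

# Deleting data from the file.
conn.execute("DELETE FROM employee WHERE emp_no = 24")
remaining = conn.execute("SELECT emp_no, salary FROM employee").fetchall()

# Removing the existing file from the database.
conn.execute("DROP TABLE employee")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```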
Therefore, besides storing data, a DBMS serves the following purposes:
(i) Providing efficient access to data.
(ii) Avoiding data redundancy and inconsistency.
(iii) Providing security of data.
(iv) Enforcing integrity constraints.
(v) Providing concurrent access to data for multiple users.
1.3 HISTORY OF DBMS
From the earliest days of computing, most software applications have focused on the
manipulation of data, so a need arose for a system that helps in storing and
manipulating data. The first general-purpose DBMS was designed by Charles
Bachman in the early 1960s and was called the Integrated Data Store. It formed the basis for
the network data model and influenced database systems through the 1960s. In 1966, IBM
began work on IMS (Information Management System), one of the first commercially
available DBMSs, which is based on the hierarchical data model and assumes all data
relationships to be structured as hierarchies. The Conference on Data Systems Languages
(CODASYL) set standards for network database products in 1969.
Dr. E. F. Codd, an IBM researcher, proposed the relational data model in a theoretical paper
in 1970. The publication of Codd’s paper set off a flurry of activity in both the research
and the commercial system development communities, which worked to
bring out a relational DBMS. IBM developed a relational model prototype in 1976. In the
1980s, the relational model became the standard approach for DBMS. SQL was
developed as a part of IBM’s System R project and became a standard query
language. IBM released SQL/DS, its first commercially available database product based on
the relational model, for interactive operating systems in 1981, and produced DB2
for its mainframes with batch operating systems in 1982. SQL was standardized and
adopted as a query language by ANSI and ISO. Many developments took place in the
1980s and 1990s in the area of database systems, including the release of Paradox,
dBase, FoxPro and Access. Researchers also worked to develop more powerful
and richer data models which can support complex data types.
Later on, Enterprise Resource Planning (ERP) and Management Resource Planning
(MRP) packages evolved. Both identify a set of common tasks of a large organization, e.g.
human resource planning and inventory management, and provide a
general application layer to carry out these tasks. After that, DBMS entered the
revolutionary age of the internet. Data stored in a DBMS can now be accessed with the
help of web browsers from anywhere and at any time. Stored data are provided on
the web in the form of HTML and XML documents. Around 2000, the fashionable area for
innovation was the XML database. XML databases aim to remove the traditional division
between documents and data, allowing an organization’s information resources to be held
in one place, whether they are highly structured or not. Database vendors continue to
develop more advanced DBMSs which can support complex data such as video, streaming
data and digital libraries on the web. Thus, database systems evolved from sequential file
access to the object-oriented database systems in use today.
1.4 FILE SYSTEM V/S DBMS
Initially, the computer systems used by an enterprise mainly performed data processing tasks,
i.e. inserting information about employees, retrieving information about the employees of a
particular department, performing accounting functions on employee salaries, etc. Since these
systems performed routine record-keeping functions, they were called data processing
systems. Thus a data processing system is an automated system for processing the data of an
organization. The conventional data processing approach is to develop a program (or
many programs) for each application. This results in one or more data files for each
application (fig. 1.1). Some of the data may be common between files. However, one
application may require the file to be ordered on a particular field, e.g. amount. A major
drawback of the conventional method is that the storage and access techniques are built into
the program. Therefore, though the same data may be required by two applications, the
data will have to be stored in two different places because each application depends on the
way that the data are stored.
There are various drawbacks of the conventional data file processing environment. Some
of them are listed below.
(i) Data Redundancy
Some data elements, like name, address and identification code, are used in various
applications. Since the data are required by multiple applications, they are stored in
multiple data files. In most cases, the same data are repeated across files. This is referred
to as data redundancy, and it leads to various other problems.
(ii) Data Integrity Problem
Data redundancy is one reason for problems of data integrity. Since the same data are
stored in different places, it is inevitable that some inconsistency will creep in. For
example, if an instructor of Microprocessor Technology begins to teach a course in
Computer Architecture, this needs to be reflected in more than one place (figure 1.1). If
the change is not made in all the places, the university will have different information in
different places about the same instructor.
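The instructor example can be sketched in a few lines. The two dictionaries below stand in
for data files owned by two separate applications; their names, the instructor key and the
instructor's name are hypothetical and chosen only for illustration.

```python
# Each application keeps its own copy of the instructor record (redundancy).
timetable_file = {
    "instructor_7": {"name": "R. Sharma", "course": "Microprocessor Technology"}
}
payroll_file = {
    "instructor_7": {"name": "R. Sharma", "course": "Microprocessor Technology"}
}

# The change of course is recorded by one application only.
timetable_file["instructor_7"]["course"] = "Computer Architecture"

# The two files now disagree about the same instructor.
inconsistent = (
    timetable_file["instructor_7"]["course"] != payroll_file["instructor_7"]["course"]
)
```

Nothing in the file-based setup forces the second copy to be updated, which is how the
inconsistency creeps in.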
(iii) Data Availability Constraints
When data are scattered across different files, the availability of information from a
combination of files is constrained to some extent.
Figure 1.1 One to one correspondence between applications and data files
1.4.1 Advantage of DBMS over File System
A file system stores data as records in files managed by the operating system, and relies
on application programs to extract information from those files.
A major advantage the database approach has over the conventional approach is that a
database system provides centralized control of data.
(i) Reduced Redundancy
Unlike in the conventional approach, each application does not have to maintain its own data
files. Data can be integrated and used by multiple applications at the same time.
(ii) Ensure Consistency
It is very difficult to maintain a consistent format of files in a file system. Different
programmers can use different programming languages, which may cause duplication of
information in several files. This duplication results in higher storage and access costs. In
addition, it may lead to data inconsistency, i.e. the various copies of the same information
may not agree. For example, consider an employee management system: if the address of an
employee is stored in two places and is updated in only one place, then the system
will give conflicting information and become inconsistent. A DBMS can keep the database
consistent by enforcing a fixed format for data and by ensuring that
a change made to any entry is automatically applied to the other entries as well. This
process is known as propagating updates.
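One common way to obtain this effect is to store each fact only once and let every
application read it from the same place, so that there are no other copies to fall out of
step. The sketch below uses Python's built-in sqlite3 module; the table, the two reader
functions and the addresses are assumptions made for illustration, not part of any
particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute("INSERT INTO employee VALUES (24, 'Asha', 'Old Street, Jaipur')")


def payroll_address(emp_no):
    """Address as seen by a (hypothetical) payroll application."""
    row = conn.execute(
        "SELECT address FROM employee WHERE emp_no = ?", (emp_no,)
    ).fetchone()
    return row[0]


def mailing_label(emp_no):
    """Label as seen by a (hypothetical) mailing application."""
    row = conn.execute(
        "SELECT name, address FROM employee WHERE emp_no = ?", (emp_no,)
    ).fetchone()
    return f"{row[0]}, {row[1]}"


# The address is updated in exactly one place.
conn.execute(
    "UPDATE employee SET address = ? WHERE emp_no = ?",
    ("Malviya Nagar, Jaipur", 24),
)

# Both applications now see the new address.
new_payroll_address = payroll_address(24)
new_label = mailing_label(24)
```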
(iii) Data Manipulation Capabilities
A file system requires an application program to process the data stored in files
according to the needs of the user. If the user's needs change, a different application
program is required. For example, consider the employee management system. Suppose
we want to find the names of employees in “Jaipur”. With a file system, either a new
application program must be developed or the names of employees whose city is Jaipur must
be found manually. This is not an efficient process, as developing a new
application program takes a lot of time, and it is possible that after the program has been
developed our needs change from finding employees in Jaipur to finding employees in
‘Malviya Nagar, Jaipur’. A database system solves such problems by simply issuing
queries to the database as needed and retrieving the answers in response.
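The Jaipur example can be sketched with two ad hoc queries. The snippet uses Python's
built-in sqlite3 module; splitting the address into locality and city columns, and the
sample rows, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_no INTEGER PRIMARY KEY, name TEXT, locality TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Malviya Nagar", "Jaipur"),
        (2, "Ravi", "Vaishali Nagar", "Jaipur"),
        (3, "Meena", "Civil Lines", "Delhi"),
    ],
)

# First requirement: names of employees in Jaipur.
in_jaipur = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE city = ? ORDER BY name", ("Jaipur",)
    )
]

# Changed requirement: only the query changes, not the program around it.
in_malviya_nagar = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE city = ? AND locality = ? ORDER BY name",
        ("Jaipur", "Malviya Nagar"),
    )
]
```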
(iv) Data Independence (Reduced Programming Efforts)
In non-database systems, the requirements of the application dictate the way in which the
data are stored and the access techniques used. Moreover, knowledge of the organization of
the data and of the access techniques is built into the logic and code of the application.
Such systems are data dependent. As an example, suppose the university
(mentioned previously) has an application that processes the student file. For
performance reasons, the file is indexed on the roll number. The application would be
aware of the existing index, and the internal structure of the application would be built
around this knowledge. Now, suppose that for some reason the file is to be indexed on
the registration date. In this case, it is impossible to change the structure of the stored data
without affecting the application too. Such an application is a data dependent one. It is
desirable to have data independent applications. Suppose two applications X and Y need
to access the same file, but each requires a particular field to be stored in a
different format. Application X requires the field “customer-balance” to be stored in
decimal format, while application Y requires it to be stored in binary format. This would
pose a problem in the old systems. In a DBMS, differences may exist between the way that
data are actually stored and the way that they are seen and used by a given application. To
conform to the changing requirements of the enterprise, the DataBase Administrator (DBA) may
need to change the storage structure or access techniques. The DBA should be able to do
this without having to modify the existing applications. If applications are data
dependent, programmer effort that could otherwise be available for the creation of new
applications is needed to modify existing applications to match the changes
made.
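The student-file example can be sketched as follows, using Python's built-in sqlite3
module. The application function below never mentions an index, so an index on the
registration date can be added without touching it. The table, column and index names and
the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key gives access by roll number.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, reg_date TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "Asha", "2024-07-01"),
        (2, "Ravi", "2024-06-15"),
        (3, "Meena", "2024-07-01"),
    ],
)


def registered_on(date):
    """Application code: states what is wanted, not how to find it."""
    rows = conn.execute(
        "SELECT name FROM student WHERE reg_date = ? ORDER BY name", (date,)
    )
    return [row[0] for row in rows]


before = registered_on("2024-07-01")

# The access technique changes: an index on registration date is added.
conn.execute("CREATE INDEX idx_student_reg_date ON student(reg_date)")

# The application function is unchanged and returns the same answer.
after = registered_on("2024-07-01")
```

Whether the new index is actually used is decided by the DBMS, which is the point: the
application does not need to know.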
(v) Atomicity and Transaction Management
A file system does not ensure the completion of a transaction, which may cause data
inconsistency. For example, consider an employee management system where the company
wants to shift an employee from the sales department to the finance department. This
transaction consists of two operations: a reduction in the number of employees in the
sales department and an increment in the number of employees in the finance department. In
a file system, the combination of both operations cannot be guaranteed, because the two
operations cannot be made into a single unit; if only one operation is performed and the
system crashes, the database becomes inconsistent. This problem can be solved by a
database management system. It ensures that either the whole transaction, which
combines more than one operation, is completed, or no operation is performed on behalf of
the transaction. This property is called ‘atomicity’.
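The department transfer can be sketched with Python's built-in sqlite3 module, whose
connection object commits a transaction on success and rolls it back if an exception
escapes the with block. The table layout, the transfer function and its fail_midway flag
(which simulates a crash between the two operations) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (name TEXT PRIMARY KEY, headcount INTEGER)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)", [("sales", 10), ("finance", 5)]
)
conn.commit()


def transfer(source, target, fail_midway=False):
    """Move one employee from source to target as a single unit."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE department SET headcount = headcount - 1 WHERE name = ?",
                (source,),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two operations")
            conn.execute(
                "UPDATE department SET headcount = headcount + 1 WHERE name = ?",
                (target,),
            )
    except RuntimeError:
        pass  # the partial update has already been rolled back


def headcounts():
    return dict(conn.execute("SELECT name, headcount FROM department"))


transfer("sales", "finance", fail_midway=True)
after_failure = headcounts()   # neither operation took effect

transfer("sales", "finance")
after_success = headcounts()   # both operations took effect
```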
(vi) Security
A file system by itself provides no fine-grained security for the data stored in a file:
access rights, where they exist, apply to the whole file, so the complete file is exposed to
any user who can open it. In a DBMS, the DBA has to
guarantee that only authorized persons have access to the database. The DBA defines the
security checks to be carried out. Different checks can be applied to different operations
on the same data. For instance, a person may have the right to query a file, but
may not have the right to delete or update that file. The DBMS allows such security
checks to be established for each piece of data in the database.
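A rough sketch of "may query, but may not delete or update" follows. Multi-user DBMSs
usually express this with user accounts and granted privileges; SQLite has neither, so this
sketch substitutes the authorizer callback of Python's built-in sqlite3 module, which is
only a stand-in for the idea. The read_only function name and the sample row are
assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (24, 'Asha')")
conn.commit()


def read_only(action, arg1, arg2, db_name, source):
    """Deny deletes and updates; allow every other action."""
    if action in (sqlite3.SQLITE_DELETE, sqlite3.SQLITE_UPDATE):
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


# From here on, this connection behaves like a user with query rights only.
conn.set_authorizer(read_only)

# Querying is still permitted.
names = [row[0] for row in conn.execute("SELECT name FROM employee")]

# Deleting is refused by the database engine itself.
try:
    conn.execute("DELETE FROM employee WHERE emp_no = 24")
    delete_allowed = True
except sqlite3.DatabaseError:
    delete_allowed = False

rows_left = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```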
(vii) Integrity
Inconsistency between two entries can lead to integrity problems. However, even if there
is no redundancy, the database can still be inconsistent. For example, a student may be
enrolled in 10 courses in a semester when the maximum number of courses in which one can
enroll is 7. Another example could be that of a student enrolling in a course that is not
being offered that semester. Such problems can be avoided in a DBMS by establishing
certain integrity checks to be carried out whenever any update operation is done.
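Both checks from the example can be sketched with Python's built-in sqlite3 module: a
foreign key rejects enrollment in a course that is not offered, and a trigger rejects an
eighth enrollment. The table, trigger and function names are assumptions made for
illustration, and other DBMSs may express the same rules differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE offered_course (course_id TEXT PRIMARY KEY);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL,
    course_id  TEXT NOT NULL REFERENCES offered_course(course_id),
    PRIMARY KEY (student_id, course_id)
);
-- Reject an insert when the student already has 7 enrollments.
CREATE TRIGGER max_seven_courses
BEFORE INSERT ON enrollment
WHEN (SELECT COUNT(*) FROM enrollment WHERE student_id = NEW.student_id) >= 7
BEGIN
    SELECT RAISE(ABORT, 'a student may enroll in at most 7 courses');
END;
""")

# Courses C1..C8 are offered this semester.
conn.executemany(
    "INSERT INTO offered_course VALUES (?)", [(f"C{i}",) for i in range(1, 9)]
)
conn.commit()


def try_enroll(student_id, course_id):
    """Return True if the DBMS accepted the enrollment, False if a check failed."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO enrollment VALUES (?, ?)", (student_id, course_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Student 1 tries to enroll in eight offered courses; the eighth is rejected.
results = [try_enroll(1, f"C{i}") for i in range(1, 9)]

# Student 2 tries to enroll in a course that is not offered.
not_offered = try_enroll(2, "C99")
```

The checks live in the database, so they apply to every application that updates the
enrollment data.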
1.5 DISADVANTAGES OF DBMS
In spite of its many advantages, a DBMS does not prove to be a powerful or advantageous
system in certain scenarios, for the following reasons:
(i) Overhead for providing security, integration of data, transaction management,
concurrency control, etc.
(ii) More investment is required for hardware and software.
(iii) Special training is required to use a DBMS.
(iv) Its performance may not be adequate for certain specialized applications.
(v) Many applications may need to manipulate the data in ways not supported
by the query language.
So, it can be advantageous to use a file system in certain situations, namely when:
(i) The database and applications are simple and not expected to change.
(ii) Concurrent access is not required.
(iii) The application is a real-time one, as time constraints are not easy to meet with a
DBMS.
1.6 DESCRIBING AND STORING DATA IN DBMS
A DBMS is always concerned with some real-world enterprise. Data stored in a DBMS
describe real-world entities and represent relationships between these entities. For
example, there are employees, departments and projects in a company, and the data in the
company database describe these entities in terms of their attributes and relationships to
other entities. Data can be described through different data models and at different levels
of abstraction.
1.6.1 Data Abstraction
Data abstraction is one of the fundamental characteristics of any database management
system, and helps in making data easier to use. Abstraction refers to
the act of representing essential features without including background details or
explanations. So, data abstraction refers to the act of representing data without giving
details of how the data are stored or maintained. Data abstraction hides information that is
irrelevant at a particular level. The complexity of data is hidden through several levels of
abstraction so as to simplify user interaction with the system. The different levels of
abstraction are:
(i) Physical Level or Internal Level
It is the lowest level of abstraction, and specifies the storage details of how data are
actually stored on disks or on tapes. It specifies the manner in which records are stored,
for example as a collection of pages or as a collection of records. Complex low-level data
structures are described in detail at this level. The design of the data structures described
at this level is called the physical schema. The data structures at this level may include
B trees, B+ trees, hashing, etc.
(ii) Logical Level or Conceptual View
The next higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. There is only one conceptual schema per
database. This schema also contains the method of deriving the objects in the conceptual
view from the internal view. The description of data at this level is in a format
independent of its physical representation. It also includes features that specify the checks
needed to retain data consistency and integrity. The logical level of abstraction is used by
database administrators, who decide what information is to be kept in the database.
Figure 1.2 Level of Data Abstraction
(iii) View Level
It is the highest level of abstraction, and describes different views of the entire
database. These views are designed according to the requirements of users who want to
access only a part of the database. A database may have several views, according to the
demands of individual users or groups of users. The data in these views are not actually
stored in the DBMS; they are computed using the specification of the view described by the
user. An analogy to the concept of data types in programming languages may clarify the
distinction among the levels of abstraction. Most high-level programming languages support
the notion of a record type.
At the physical level, a customer, account or employee record can be described as a block of
consecutive storage locations, for example words or bytes. The language compiler hides
this level of detail from programmers. Similarly, the database system hides many of
the lowest-level storage details from database programmers.
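The three levels can be loosely illustrated with Python's built-in sqlite3 module. In this
sketch the table definition stands for the logical level, the index for a physical-level
choice (SQLite implements indexes as B-trees), and the view for the view level. All names
and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data are stored.
conn.execute(
    "CREATE TABLE employee ("
    "emp_no INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(24, "Asha", "sales", 12000), (31, "Ravi", "finance", 8000)],
)

# Physical level: a storage and access choice that queries never mention.
conn.execute("CREATE INDEX idx_employee_dept ON employee(dept)")

# View level: a part of the database for users who should not see salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT emp_no, name, dept FROM employee"
)

directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_no"
).fetchall()

# The view is computed from the table, so it reflects later changes.
conn.execute("UPDATE employee SET dept = 'finance' WHERE emp_no = 24")
refreshed = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_no"
).fetchall()
```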