-
33 SREERAM ACADEMY (FORMERLY SREERAM COACHING POINT)
DATA STORAGE, RETRIEVAL AND DATA BASE MANAGEMENT SYSTEMS
Data
Data are raw facts, observations or occurrences about a physical phenomenon or business transaction.
They are objective measurements of the attributes of entities such as people, places, things and events.
Data is a collection of facts which is unorganized, but which can be organized into useful information.
Data should be accurate, but need not be relevant, timely or concise.
It can exist in different forms, e.g. pictures, text, sound, or all of these together.
CONCEPTS RELATED TO DATA
Double Precision: Real data values are commonly called single
precision data because each
real constant is stored in a single memory location. This
usually gives seven significant digits
for each real value. In many calculations, particularly those
involving iteration or long
sequences of calculations, single precision is not adequate to
express the precision required.
To overcome this limitation, many programming languages provide the double precision data type. Each double precision value is stored in two memory locations, thus providing twice as many significant digits.
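The precision difference above can be seen directly. A small Python sketch follows; note that Python's built-in float is already a double, so the struct module is used here to simulate storing the same value in a single (32-bit) memory representation:

```python
import struct

# A single-precision (32-bit) float keeps roughly 7 significant digits;
# a double-precision (64-bit) float keeps roughly 15-16.
value = 1.0 / 3.0  # Python floats are 64-bit doubles

# Round-trip the value through a 32-bit representation to see the precision lost.
single = struct.unpack('f', struct.pack('f', value))[0]

print(f"double: {value:.17f}")
print(f"single: {single:.17f}")  # diverges after about 7 digits
```

Running this shows the two values agreeing only in their first seven or so digits, which is exactly why iterative calculations accumulate error in single precision.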
Logical Data Type: Use the Logical data type when you want an
efficient way to store data that
has only two values. Logical data is stored as true (.T.) or
false (.F.)
Characters: Choose the Character data type when you want to
include letters, numbers,
spaces, symbols, and punctuation. Character fields or variables
store text information such as
names, addresses, and numbers that are not used in mathematical
calculations. For example,
phone numbers or zip codes, though they include mostly numbers,
are actually best used as
Character values.
Strings: A data type consisting of a sequence of contiguous
characters that represent the
characters themselves rather than their numeric values. A String
can include letters, numbers,
spaces, and punctuation. The String data type can store
fixed-length strings ranging in length
from 0 to approximately 63K characters and dynamic strings
ranging in length from 0 to
approximately 2 billion characters. The dollar sign ($)
type-declaration character represents a
String.
A variable is something that may change in value, e.g. the number of words on different pages of a book.
KEY: A key is the relational means of specifying uniqueness. A database key is an attribute used to sort and/or identify data in some manner. Each table has a primary key, which uniquely identifies records. Foreign keys are used to cross-reference data between relational tables.
The primary key of a relational table uniquely identifies each
record in the table. It can either
be a normal attribute that is guaranteed to be unique (such as
Social Security Number in a
table with no more than one record per person) or it can be
generated by the DBMS (such as a
globally unique identifier, or GUID, in Microsoft SQL Server).
Primary keys may consist of a
single attribute or multiple attributes in combination.
Examples:
Imagine we have a STUDENTS table that contains a record for each
student at a university. The student's unique student ID number
would be a good choice for a primary key in the STUDENTS table. The
student's first and last name would not be a good choice, as there
is always the chance that more than one student might have the same
name.
A candidate key is a combination of attributes that can be
uniquely used to identify a
database record without any extraneous data. Each table may have
one or more candidate
keys. One of these candidate keys is selected as the table
primary key.
Referential integrity: A feature provided by relational database management systems (RDBMSs) that prevents users or applications from entering inconsistent data. Most RDBMSs have various referential integrity rules that you can apply when you create a relationship between two tables.
For example, suppose Table B has a foreign key that points to a
field in Table A. Referential integrity would prevent you from
adding a record to Table B that cannot be linked to Table A. In
addition, the referential integrity rules might also specify that
whenever you delete a record from Table A, any records in Table B
that are linked to the deleted record will also be deleted. This is
called cascading delete. Finally, the referential integrity rules
could specify that whenever you modify the value of a linked field
in Table A, all records in Table B that are linked to it will also
be modified accordingly. This is called cascading update.
Consider the situation where we have two tables: Employees and Managers. The Employees table has a foreign key attribute entitled Managed By, which points to the record for that employee's manager in the Managers table. Referential integrity enforces the following three rules:
1. We may not add a record to the Employees table unless the Managed By attribute points to a valid record in the Managers table.
2. If the primary key for a record in the Managers table changes, all corresponding records in the Employees table must be modified using a cascading update.
3. If a record in the Managers table is deleted, all corresponding records in the Employees table must be deleted using a cascading delete.
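The three rules can be demonstrated with a few lines of SQL. The sketch below uses Python's built-in sqlite3 module; the table and column names are illustrative choices, not taken from any particular system:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FK rules only when asked

con.execute("CREATE TABLE Managers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""
    CREATE TABLE Employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        managed_by INTEGER REFERENCES Managers(id)
            ON DELETE CASCADE ON UPDATE CASCADE)
""")

con.execute("INSERT INTO Managers VALUES (1, 'Asha')")
con.execute("INSERT INTO Employees VALUES (10, 'Ravi', 1)")

# Rule 1: an employee record may not point to a non-existent manager.
rejected = False
try:
    con.execute("INSERT INTO Employees VALUES (11, 'Mia', 99)")
except sqlite3.IntegrityError:
    rejected = True
print("invalid insert rejected:", rejected)

# Rule 3: deleting a manager cascades to the employees linked to it.
con.execute("DELETE FROM Managers WHERE id = 1")
remaining = con.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print("employees left:", remaining)
```

The ON DELETE CASCADE / ON UPDATE CASCADE clauses correspond to rules 3 and 2 respectively; the rejected insert illustrates rule 1.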
Alternate Key: The alternate keys of any table are simply those candidate keys which are not currently selected as the primary key; that is, the set of all candidate keys minus the primary key.
Secondary Key: Secondary keys can be defined for each table to
optimize the data access.
They can refer to any column combination and they help to
prevent sequential scans over the
table. Like the primary key, the secondary key can consist of
multiple columns. A candidate
key which is not selected as a primary key is known as Secondary
Key.
Index Fields: These are used to store relevant information along with a document.
Currency Fields: The currency field accepts data in dollar form by default.
Date Fields: The date field accepts data entered in date format.
Integer Fields: The integer field accepts data as a whole number.
Text Fields: The text field accepts data as an alpha-numeric text string.
Information
It is data that has been converted into a meaningful and useful context for specific end users.
To obtain information, data is aggregated, manipulated and organized, its content analysed and evaluated, and the result placed in the proper context for human use.
Information exists as reports, in a systematic textual format, or as graphics in an organized manner.
Information must be relevant, timely, accurate, concise and complete, and should apply to the current situation.
It should be condensed into usable length.
Data storage hierarchy
Character: It is the basic building block of data which consists
of letters,
numeric digits or special characters. These are put together in
a FIELD.
Field: It is a meaningful collection of related characters. It is the smallest logical data entity that is treated as a single unit in data processing. For example, if we are processing the employee data of a company, we may have:
1. Employee code field
2. Employee name field
3. Hours worked field
4. Hourly pay rate field
5. Tax rate deduction field
Record: Fields are grouped together to form a record. An
employee
record would be a collection of fields of one employee.
A record can be divided into physical and logical records:
Meaning: A physical record refers to the actual portion of a storage medium on which data is stored. A logical record refers to the way a user views a record; it contains all the data related to a single item.
Independence: Portions of the same logical record may be located in different physical records, or parts of several logical records may be located in one physical record. A logical record is independent of its physical environment.
Example: A physical record may be a group of pulses recorded on a magnetic tape or disk, or a series of holes punched into paper tape. A logical record can be a payroll record for an employee, or a record of all the transactions made by a customer in a departmental store.
File: A file is a number of related records that are treated as
a unit. For
example, a collection of employee records for one company would
be
an employee file.
[Figure: data storage hierarchy - a FILE contains records (Employee 1, Employee 2, ...); each record contains fields (e.g. Employee No, Salary); each field is built from characters.]
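The character-field-record-file hierarchy can be sketched in a few lines of Python. The field names below follow the payroll example given earlier; the values are invented for illustration:

```python
from dataclasses import dataclass

# Field -> Record -> File: each record groups the fields for one employee,
# and the file is simply the collection of those records.
@dataclass
class EmployeeRecord:
    code: str           # employee code field
    name: str           # employee name field
    hours_worked: float  # hours worked field
    hourly_rate: float   # hourly pay rate field

# The employee file: a number of related records treated as a unit.
employee_file = [
    EmployeeRecord("E001", "Asha", 40.0, 12.5),
    EmployeeRecord("E002", "Ravi", 35.0, 11.0),
]

# A field of one record being used in processing:
gross = employee_file[0].hours_worked * employee_file[0].hourly_rate
print(gross)
```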
Transaction File and Master File
Data Life: A master file contains relatively permanent records for identification and for summarizing statistical information. Transaction files contain temporary data which is to be processed in combination with the master file.
Content: The master file contains current or nearly current data, which is updated regularly. Transaction files generally contain information used for updating master files.
Data Size: The master file rarely contains detailed transaction data. Transaction files contain detailed data.
Examples: Master files: product files, customer files, employee files, etc. Transaction files: purchase orders, job cards, invoices, etc.
Access Method: Master files are usually maintained on direct access storage devices. Transaction files are usually maintained on sequential as well as direct access storage devices.
Redundancy: The master file can never be redundant, as it has to be updated regularly. Once a transaction file has been used to update the master file, it is no longer required and may be considered redundant.
File Organization
I. Serial File Organization
Records are arranged one after the other in no particular order, other than the chronological order in which they are added to the file. This type of organization is commonly found with transaction data, where records are created in a file in the order in which transactions take place.
II. Sequential File Organization
1. In a sequential file, records are stored one after another in an ascending or descending order determined by the key field of the records.
2. In the payroll example, the records of the employee file may be organized sequentially in employee code sequence.
3. Sequentially organized files that are processed by computer
systems are
normally stored on storage media such as magnetic tape, punched
paper,
punched cards or magnetic disks.
4. To access these records, the computer must read the file in
sequence from the
beginning. The first record is read and processed first, then
the second record
in the file sequence, and so on. To locate a particular record,
the computer
program must read in each record in sequence and compare its key
field to the
one that is needed. The retrieval search ends only when desired
key matches
with the key field of the currently read record.
Merits:
Simple to understand.
Only the record key is required to locate a record.
Efficient and economical if the activity rate (the proportion of file records processed) is high.
Inexpensive I/O devices may be used.
Reconstruction of files is relatively easy, since a built-in backup is usually available.
Demerits:
Even at a low activity rate the entire file is processed.
Transactions must be sorted and placed in sequence prior to processing.
While transactions are accumulated between processing runs, the timeliness of the data deteriorates.
High data redundancy, since the same data may be stored in several files sequenced on different keys.
Applications:
Payroll systems.
Electricity billing, or any other billing where each record needs to be accessed.
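The sequential retrieval described in point 4 above can be sketched as follows; the record keys and a read counter are illustrative additions to show how many records must be touched:

```python
# Records sorted by key field (employee code). To find one record we must
# read from the beginning and compare each key until the desired key matches.
records = [("E001", "Asha"), ("E002", "Ravi"), ("E003", "Mia")]

def sequential_lookup(records, wanted_key):
    reads = 0
    for key, payload in records:   # always start at the first record
        reads += 1
        if key == wanted_key:
            return payload, reads  # search ends only on a key match
    return None, reads             # the whole file was read without a match

payload, reads = sequential_lookup(records, "E003")
print(payload, reads)  # the whole file is read to reach the last record
```

Note how locating the last record costs as many reads as there are records, which is why sequential organization pays off only when the activity rate is high.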
III. Direct File Access Organization
A- Self-Addressing Method: A record key is used as its relative address. Therefore, we can compute the record's address directly from the record key and the physical address of the first record in the file.
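The address computation can be written down directly. In this sketch the base address, first key and record size are assumed values chosen for illustration:

```python
# Self-addressing: the record key maps directly to a physical location.
# address = base_address + (key - first_key) * record_size
RECORD_SIZE = 64      # bytes per fixed-length record (assumed)
BASE_ADDRESS = 4096   # physical address of the first record (assumed)
FIRST_KEY = 1000      # key of the first record in the file (assumed)

def record_address(key):
    return BASE_ADDRESS + (key - FIRST_KEY) * RECORD_SIZE

print(record_address(1000))  # the first record
print(record_address(1005))  # five records further on
```

Because the address follows arithmetically from the key, no index and no scanning are needed; the trade-off is that keys must form a dense, predictable sequence.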
B- Indexed Sequential File Organization:
1. A computer provides a better way to store information like
the card catalogue; indeed, most public libraries today keep their
card catalogues on a computer. For each book in the library, a data
record is created that contains information gathered from the
various card catalogues. For example, the title of the book, the
author's name, the physical location of the book, and any other
relevant information. A record is generally composed of several
fields, with each field used to store a particular piece of
information. For example, we might store the author's last name in
one field and the first name in a separate field. All the records
(one for each book) are collected and stored in a file. The file
containing the records is typically called the data file.
2. Indexes are created so that a particular record in the data
file can be located quickly. For example, we could create an author
index, a title index, and a subject index. The indexes are
typically stored in a separate file called the index file.
3. An index is a collection of "keys", one key for each record
in the data file. A key is a subset of the information stored in a
record. When an index is created, the key values are extracted from
one or more fields of each record. The value of each key determines
its order in the index (i.e., the keys are sorted alphabetically or
numerically). Each key has an associated pointer that indicates the
location in the data file of the corresponding complete record. To
find a particular record, a matching key is quickly located in the
index, and then the associated pointer is used to locate the
complete record.
4. Consider the problem of locating a particular book in a
library containing thousands of books. Public libraries long ago
developed the card catalogue as a means to efficiently locate a
particular book. Usually there were at least three card catalogues,
one with cards arranged in order by the name of the author, another
arranged by the title of the book, and a third arranged by subject
heading. Each card contained information about the book, most
importantly its location in the library. Therefore, by knowing the
name of the author, the title of the book, or the appropriate
subject heading, you could use the card catalogues
to quickly determine the location of a particular book. The card catalogues can be thought of as indexes.
[Figure: classification of direct access organization - DIRECT ACCESS divides into DIRECT SEQUENTIAL ACCESS (A: self-addressing method; B: index sequential addressing method) and RANDOM ACCESS (address generation method; indexed random).]
5. Consider the author index. There is a filing cabinet
containing a card for each book in the library, filed in
alphabetical order by the author's name. Each drawer in the cabinet
is labelled, perhaps "A-E", "F-J", and so on. There are two broad
kinds of searches that you might want to perform on the author
index.
6. First, you might want to make a list containing the name of
every book in the library. To do this you would start in the first
drawer with the first card, and look at each card in order until
you reached the last card in the last drawer. This is called a
"sequential" search because you look at each card in the catalogue
in sequential order.
7. Second, you might want to know the names of the books in the
library that were written by Thomas Jefferson. Instead of examining
every card in the catalogue, you are first guided by the labels on
the drawers to the second drawer, the "F-J" drawer. You are then
guided by the tabs inside the drawer to the names that start with
the letter "J". This is called a "random" search. For any
particular card, you can use the labels (or indexes) to go almost
directly to the desired card.
8. Actually locating the Thomas Jefferson card(s) involves both
a random and sequential search. We use random access to go directly
to the correct drawer and correct tab inside the drawer. The labels
(or indexes) allow us to very quickly get close to the card of
interest. After locating the "J" tab inside the "F-J" drawer, we
then use sequential access to locate the particular Thomas
Jefferson card(s) of interest.
Merits:
Allows efficient and economical use of sequential processing techniques when the activity rate is high.
Permits quick access to records in a relatively efficient way when this activity is a small fraction of the total workload.
Demerits:
Less efficient in the use of storage space than some other organizations.
Access to records is slower because of the use of indexes. Relatively expensive hardware and software resources are required.
Applications:
Inventory control, where both sequential access and inquiry are required.
Student registration systems.
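The key-plus-pointer mechanism described in point 3 above can be sketched in a few lines. Here an in-memory byte buffer stands in for the data file, and a plain dictionary stands in for the index file; the record contents are invented:

```python
import io

# Index file: key -> byte offset of the full record in the data file.
# The index is searched first; the pointer then seeks straight to the record.
data_file = io.BytesIO()
index = {}
for key, record in [("E001", "Asha,Payroll"), ("E002", "Ravi,Stores")]:
    index[key] = data_file.tell()          # remember where this record starts
    data_file.write((record + "\n").encode())

def fetch(key):
    offset = index[key]                    # quick lookup in the index
    data_file.seek(offset)                 # jump directly to the record
    return data_file.readline().decode().rstrip()

print(fetch("E002"))
```

This mirrors the card catalogue: the index entry is the card, and the stored offset is the shelf location written on it.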
C- Random File Organization
The randomizing procedure is characterised by the fact that records are stored in such a way that there is no relationship between the keys of adjacent records. The technique provides for converting the record key number to a physical location, represented by a disk address, through a computational procedure.
Transactions can be processed in any order and written at any location throughout the stored file. The desired records can be directly accessed using the randomizing procedure without accessing all other records in the file.
Merits:
Immediate access to records for inquiry and updating is possible.
Immediate updating of several files as a result of a single transaction is possible.
No need for sorting.
Demerits:
Risk to records in the on-line file: loss of accuracy and breach of security. Special backup and reconstruction procedures must be established.
Less efficient in the use of storage space than a sequentially organized file.
Relatively expensive software and hardware resources are required.
Applications:
Any type of inquiry system, such as railway or air reservation systems.
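The randomizing procedure above amounts to hashing the key into a storage location. A minimal sketch, with invented reservation records and an assumed bucket count:

```python
# Randomizing (hashing): a computation on the key gives the physical bucket,
# so a record is reached without scanning its neighbours.
NUM_BUCKETS = 7
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key):
    return hash(key) % NUM_BUCKETS   # the "randomizing procedure"

def store(key, record):
    buckets[bucket_for(key)].append((key, record))

def fetch(key):
    for k, record in buckets[bucket_for(key)]:  # only one bucket is inspected
        if k == key:
            return record

store(1042, "PNR-1042: Chennai -> Delhi")
store(2711, "PNR-2711: Mumbai -> Pune")
print(fetch(2711))
```

Note that records with adjacent keys typically land in unrelated buckets, which is exactly the "no relationship between the keys of adjacent records" property described above.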
The Best File Organization
File management involves the logical organization of data supplied to a computer in a predetermined way. Data stored in a particular place is called a FILE. The file is created using a set of instructions called a PROGRAM. The data created in the file depends on the following factors:
1. Data Dependence
2. Data Redundancy
3. Data Integrity
File Management Software
It is a software package that helps users to organize data into files, process them and retrieve information.
Users can create report formats, enter data into records, search and sort records, and prepare reports.
These packages are designed for microcomputers and are menu-driven, allowing end users to create files by giving easy-to-use instructions.
Following are the criteria in choosing a file organisation method:
1. File Volatility
(i) File volatility is the number of additions and deletions to the file in a given period of time. E.g. the payroll file of a company where the employee register is constantly changing is a highly volatile file, and therefore the direct access method is better.
2. File Activity
(i) File activity refers to the proportion of records accessed in a run to the number of records in the file.
(ii) In the case of real-time files, where each transaction is processed immediately and only one master record is accessed at a time, the direct access method is appropriate.
(iii) In cases where almost every record is accessed for processing, a sequentially ordered file is appropriate.
3. File Interrogation
(i) File interrogation refers to the retrieval of information
from a file.
(ii) If the retrieval of individual records must be fast to support a real-time operation, such as airline reservations, then some kind of direct organization is required.
(iii) If, on the other hand, requirements for data can be delayed, then all the individual requests for information can be batched and run in a single processing run with a sequential file organization.
4. File Size
(i) Large files which require many individual references to
records with immediate
response must be organized under direct access method.
(ii) In case of small files, it is better to search the entire
file sequentially or with a more
efficient binary search, to find an individual record than to
maintain complex
indexes or complex direct addressing schemes.
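The binary search mentioned for small files can be sketched with Python's bisect module; the keys and payloads below are invented:

```python
import bisect

# For a small file kept in key order, a binary search halves the search
# space at each probe, so no index or direct addressing scheme is needed.
keys = ["E001", "E002", "E003", "E004", "E005"]
payloads = ["Asha", "Ravi", "Mia", "Tom", "Lee"]

def binary_lookup(wanted):
    i = bisect.bisect_left(keys, wanted)   # O(log n) probes instead of O(n)
    if i < len(keys) and keys[i] == wanted:
        return payloads[i]
    return None                            # key not present in the file

print(binary_lookup("E004"))
```

For a handful of records this is simpler to maintain than any index, which is the point being made above.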
Problems of the File Processing Systems:
i. Data Redundancy: The same data is stored in different files, since the data files are independent. This results in a lot of duplicated data, and a separate file maintenance program is necessary to update each file.
ii. Data Dependence: The components of a file processing system depend on one another; therefore, if changes are made in the format and structure of the data in a file, changes have to be made in all the programs that use this file.
iii. Data integrity: The same data is found in different forms
in different files. Checking the
validity of data could not be uniformly implemented with the
result that data in one file
may be correct and in another file wrong. Special computer programs have to be written to retrieve data from such independent files, which is time consuming and expensive.
iv. Data Availability: Since data is scattered across many files, it may be necessary to look into many files before relying on a particular data item. Due to non-uniformity in file design, the data may have different identification numbers in different files, and obtaining the necessary data will be difficult.
v. Management control: Uniform policies and standards cannot be
set since the data is
scattered in different files. It is difficult to relate such
files and difficult to implement a
decision due to non- uniform coding of the data files.
DATA BASE MANAGEMENT SYSTEMS
Database
A database is a collection of related and ordered information organised in such a way that the information can be accessed quickly and easily. Hence, an organised logical group of related files would constitute a database.
According to G.M. Scott, "A database is a computer file system that uses a particular file organisation to facilitate rapid updating of individual records, simultaneous updating of related records, easy access to all records by all application programs, and rapid access to all stored data which must be brought together for a particular routine report or inquiry or a special purpose report or inquiry."
Types of Databases:
1. Operational Databases: These databases keep the information needed to support the operations of an organization. These are mainly day-to-day working databases, e.g. customer, employee and inventory databases.
2. Management Databases: These databases keep selected information and data extracted mainly from operational and external databases.
3. Information warehouse Databases: A Data warehouse stores the
data of current and
previous years. It is a central source of data that has been
standardized and integrated
so that it can be used by managers and other end user
professionals throughout an
organization.
4. Distributed Databases: These are the databases of local workgroups and departments at branch offices, manufacturing plants, regional offices and other work sites. The main aim of these databases is to ensure that the organization's database is distributed yet updated concurrently.
Advantages:
Local computers on the network offer immediate response to local needs.
Systems can be expanded in a modular fashion as needed.
Since many small computers are used, the system is not dependent on one large computer whose failure could shut down the network.
Equipment operating and management costs are often lower.
Microcomputers tend to be less complex than large systems; therefore the system is more useful to local users.
5. End User Databases: These databases consist of the various data files (Word, Excel and database files) which end users have generated.
6. External Databases: These are also known as online databases, provided by various data banks or organizations at a nominal fee.
7. Text Databases: These are informative databases, normally available on CD-ROM disks for a certain price.
8. Image Databases: These databases contain images along with alphanumeric information. They are available either on the Internet or on CD for a certain price.
9. Object Oriented Databases: This is a type of database structure developed to suit changing application needs. As integrated database structures developed, the need for OODBs was felt. Organisations use databases with relational qualities that are capable of manipulating text, data, objects, images and audio/video clips. Alongside OODBs, OOP (object oriented programming) has developed. In OOP, every object is described by a set of attributes describing what the object is. The behaviour of the object is also included in the program. Objects with
similar qualities and behaviour can be grouped together. OOP is
more useful in
decision making.
10. Partitioned Database (Partial Distribution): Some databases are centrally managed and some are managed in a decentralised manner. This approach is called a partitioned database. For example, financial, marketing and administrative data can be maintained at headquarters, whereas production data may be maintained at decentralised locations.
Factors to be addressed in maintaining a database:
1. Installation of the Database:
Correct installation of the DBMS product.
Ensuring that adequate file space is available.
Allocating the disc space for the database properly.
Allocation of data files in standard sizes for input/output balancing.
2. Memory Usage:
How are buffers being used?
How does the DBMS use main memory?
Which programs are held in main memory?
3. Input/Output (I/O) Contention:
Achieving maximum I/O performance is one of the most important aspects of tuning. Understanding how the data are accessed by end users is critical to managing I/O contention.
Higher CPU clock speeds demand better management of I/O.
Simultaneous or separate use of I/O devices.
Spooling, buffering, etc. can be used.
4. CPU Usage:
Multiprogramming and multiprocessing improve performance in query processing.
Monitoring CPU load.
The mixture of online/background processing may need to be adjusted.
Mark jobs that can be processed in off-peak periods to unload the machine during peak working hours.
Components of a Database Environment
1. Database files: These files have data elements stored in database file organization formats. The database is created in such a way as to balance the data management objectives of speed, multiple access paths, minimum storage, program-data independence and preservation of data integrity.
2. A Database Management System (DBMS): A DBMS is a set of system software programs that manages the database files. Requests for access to files, updating of records and retrieval of data are handled by the DBMS. The DBMS has the responsibility for data security, which is vital in a database environment since the database is accessed by many users.
3. The users: Users consist of both traditional users and
application programmers, who are
not traditionally considered as users. Users interact with the
DBMS indirectly via
application programs or directly via a simple query
language.
Classification of DBMS Users:
Naive users, who are not aware of the presence of the database system supporting their usage.
Online users who may communicate with database either directly
through online
terminal or indirectly through user interface or application
programs. Usually they
acquire some skill and experience in communicating with the
database.
Application programmers who are responsible for developing the
application
programs and user interfaces.
The DBA, who exercises centralized control and is responsible for maintaining the database.
The user's interaction with the DBMS includes the definition of the logical relationships in the database, and the input, maintenance, changing, deletion and manipulation of data.
4. A host language interface system: This is the part of the DBMS which communicates with application programs. The host language interface interprets instructions in high-level language application programs, such as COBOL and BASIC programs, that request data from files so that the data needed can be retrieved. During this period the OS interacts with the DBMS. Application programs do not contain information about the file, so the program is independent of the database system.
5. The application programs: These programs perform the same
functions as they do in
conventional system, but they are independent of the data files
and use standard data
definitions. This independence and standardisation make rapid
special purpose program
development easier and faster.
6. A natural language interface system: The query language permits online update and inquiry by users who are relatively unsophisticated about computer systems. This language is often termed English-like, because its instructions are usually simple English-like commands used to accomplish an inquiry task. A query language also permits online programming of simple routines by managers who wish to interact with the data. The natural language interface may also help managers to generate special reports.
7. The data dictionary: A data dictionary is a centralized depository of information, in computerized form, about the data in the database. The data dictionary contains the schema of the database, i.e. the name of each item in the database and a description and definition of its attributes, along with the names of the programs that use them, who is responsible for the data, and authorization tables that specify the users and the data and programs authorized for their use. These descriptions and definitions are referred to as the data standards. Maintenance of the data dictionary is the responsibility of the DBA.
8. Online access and update terminals: These may be adjacent to
computer or even
thousands of miles away. They may be dumb terminals, smart
terminals or
microcomputers.
9. The output system or report generators: This provides routine
job reports, documents and
special reports. It allows programmers, managers and other users
to design output reports
without writing an application program in a programming
language.
10. File Pointer: A file pointer is placed in the last field of a record and contains the address of another related record, thus establishing a link between records. It directs the computer system to move to that related record.
11. Linked List: A Linked list is a group of data records
arranged in an order, which is based on
embedded pointers. An embedded pointer is a special data field
that links one record to
another by referring to the other record. The field is embedded in the first record, i.e. it is a data element within the record.
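The embedded-pointer idea above can be sketched directly. Here dictionary keys stand in for the record addresses, and the last field of each record holds the address of the next related record; the data values are invented:

```python
# Embedded pointers: the last field of each record holds the address (here,
# the dict key) of the next related record, forming a linked list.
records = {
    100: {"data": "order #1", "next": 205},
    205: {"data": "order #2", "next": 317},
    317: {"data": "order #3", "next": None},  # end of the chain
}

def walk(start):
    address = start
    while address is not None:
        record = records[address]
        yield record["data"]
        address = record["next"]   # follow the embedded pointer

print(list(walk(100)))
```

Note that the records need not be stored adjacently; the pointers alone define the logical order.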
Factors contributing to the Architecture of a Database:
1. External View
It is also known as the user view.
As the name suggests, it includes only that portion of the database with which the user (or application program) is concerned.
It is described by users/programmers by means of an external schema.
2. Conceptual View
It is also known as the global view.
It represents the entire database and includes all database entities.
It is defined by the conceptual schema and describes all records, relationships, constraints and boundaries.
3. Internal View
It is also known as the physical view.
It describes the data structures and the access methods.
It is defined by the internal schema and indicates how the data will be stored.
Of the above three, the external view is USER DEPENDENT and the other two are USER INDEPENDENT.
Data Independence
1. In a database, data independence is the ability to modify a schema definition at one level without affecting the schema at the next higher level.
2. It facilitates logical data independence.
3. It assures physical data independence.
Structure of Database
The logical organizational approach of the database is called the database structure. There are three basic structures available, viz. Hierarchical, Network and Relational database structures.
[Figure: three-level architecture - several external schemas (Ext. Schema 1, 2, 3) map to a single conceptual schema, which in turn maps to the physical schema.]
Hierarchical Database Structure
In this type of architecture records are logically arranged into
a hierarchy of
relationships.
Records are logically arranged in a tree pattern. The
hierarchical structure implements one-to-one and one-to-many
relationships. All records in the hierarchy are called nodes.
Each node is related to the others in a parent-child
relationship: each parent record may have one or more child
records, but no child record may have more than one parent
record.
The top parent record in the hierarchy is called the root.
Features of Hierarchical Database:-
i. Hierarchically structured databases are less flexible than
other database structures because the hierarchy of records must
be determined and implemented before a search can be conducted;
in other words, the relationships between records are relatively
fixed by the structure.
ii. Managerial use of a query language to solve a problem may
require multiple searches through the tree, which is very time
consuming. Thus, analysis and planning activities, which
frequently involve ad-hoc management queries of the database,
may not be supported as effectively by a hierarchical DBMS as
they are by other database structures.
iii. Ad-hoc queries made by managers that require relationships
other than those already implemented in the database may be
difficult or time consuming to accomplish.
iv. Records are logically structured in an inverted tree
pattern.
v. It implements one-to-one and one-to-many relationships.
vi. Each record or node in the hierarchy is related to other
records in a parent-child relationship.
vii. Logical structures in which a child relates to many parents
are difficult to process.
viii. Processing of records grouped in natural (hierarchical)
relationships can be done faster.
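The parent-child rule (each parent may have many children, but each child has exactly one parent) can be sketched with a small Python tree; the node names are invented, and a search must start from the root, exactly as in a hierarchical DBMS:

```python
# Sketch of a hierarchical (tree) structure: each child node has
# exactly one parent; the top record is the root.

tree = {
    "COMPANY": ["SALES", "ACCOUNTS"],   # root and its children
    "SALES": ["ORDER-1", "ORDER-2"],
    "ACCOUNTS": ["INVOICE-1"],
}

def find_path(tree, root, target, path=None):
    """Search top-down from the root, as a hierarchical DBMS must."""
    path = (path or []) + [root]
    if root == target:
        return path
    for child in tree.get(root, []):
        result = find_path(tree, child, target, path)
        if result:
            return result
    return None

print(find_path(tree, "COMPANY", "INVOICE-1"))
# ['COMPANY', 'ACCOUNTS', 'INVOICE-1']
```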
Relational Database Structure
An example of such a situation may be the representation of
Actors, Movies, and Theatres. In order to know who plays what and
where, we need the combination of these three attributes.
However, they each relate to each other cyclically. So to resolve
this, we would need to establish linking tables for Actor-Movie,
Movie-Theatre, and Theatre-Actor. Each of these would contain a
portion of the primary key of the Actor, Movie, and Theatre
tables.
ACTOR         MOVIE            THEATRE
Kamalhaasan   Manmadhan Ambu   Satyam
Dhanush       Aadukalam        PVR
Karthi        Siruthai         INOX
Trisha        Manmadhan Ambu   Satyam
Tammanna      Siruthai         PVR
i. This is a model where more than one data file is
compared.
ii. More than one file is compared at a time with the help of a
common key field.
iii. Each file is converted into a table and the analysis is
done on the tables with the
help of common key field.
iv. The row of the table represents the list of records and the
column represents data
field.
v. It is not necessary to maintain the entire file in a single
physical location but it can
be maintained geographically at any place.
vi. This is more suitable for wider analysis of data from
different locations.
[Figure: the cyclic Actor-Movie, Movie-Theatre and Theatre-Actor
relationships linking the ACTOR, MOVIE and THEATRE tables]
vii. Queries are easily possible because software interacts with
different records at the
same time.
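The points above can be illustrated with Python's built-in sqlite3 module, using a subset of the sample data from the table: two tables are compared through a common key field (the movie name). This is only a sketch; the table and column names are invented.

```python
import sqlite3

# Two tables linked through a common key field (movie).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE casting (actor TEXT, movie TEXT)")
con.execute("CREATE TABLE screening (movie TEXT, theatre TEXT)")
con.executemany("INSERT INTO casting VALUES (?, ?)",
                [("Kamalhaasan", "Manmadhan Ambu"),
                 ("Dhanush", "Aadukalam"),
                 ("Karthi", "Siruthai")])
con.executemany("INSERT INTO screening VALUES (?, ?)",
                [("Manmadhan Ambu", "Satyam"),
                 ("Aadukalam", "PVR"),
                 ("Siruthai", "INOX")])

# Who plays what and where: join the tables on the common key field.
rows = con.execute("""SELECT c.actor, c.movie, s.theatre
                      FROM casting c JOIN screening s
                        ON c.movie = s.movie
                      ORDER BY c.actor""").fetchall()
print(rows)
```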
Network Database Structure
This structure is more useful when data must be related in
one-to-many and many-to-many modes. This type of structure is
found in organizations where online data processing is carried
out.
DBMS (Language)
I. Data Definition Language:
DDL defines the conceptual schema, providing a link between the
logical and physical structures of the database. The logical
structure of a database is called the schema. A subschema is the
way a specific application views the data from the database.
Following are the functions of DDL:
i. They define the physical characteristics of each record and
of each field in the record: the field's type and length and the
field's logical name. They also specify relationships among the
records.
ii. They describe the schema and subschema.
iii. They indicate the keys of the record
iv. They provide means for associating related records or
fields
v. They provide for data security measures.
vi. They provide for logical and physical data independence.
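A hedged sketch of what such DDL statements look like in practice, using Python's sqlite3 (all table and field names here are invented): they define each field's type and length, the keys of the records, and a relationship between records, and the definitions become metadata the DBMS can report back.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,   -- key of the record
        dept_name VARCHAR(30) NOT NULL   -- field type and length
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    VARCHAR(40),
        dept_id INTEGER REFERENCES department(dept_id)  -- relationship
    );
""")

# The definitions become metadata that the DBMS stores and reports.
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['department', 'employee']
```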
II. Data manipulation Language
DML is a Database Language used by database users to retrieve,
insert, delete and
update data in a database.
Following are the functions of DML:
They provide the data manipulation techniques like deletion,
modification,
insertion, replacement, retrieval, sorting and display of data
or records.
They facilitate use of relationships between the records
They enable the user and application program to be independent
of the physical data structure and of database structure
maintenance, by allowing data to be processed on a logical and
symbolic basis rather than on a physical location basis.
They provide for independence of programming languages by
supporting several
high-level procedural languages like COBOL, C++, etc.
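The DML operations listed above can be sketched with sqlite3 as well (the table and data are invented): an insertion, a modification, a deletion and a sorted retrieval.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
con.executemany("INSERT INTO stock VALUES (?, ?)",           # insertion
                [("pen", 10), ("ink", 5), ("pad", 7)])
con.execute("UPDATE stock SET qty = 12 WHERE item = 'pen'")  # modification
con.execute("DELETE FROM stock WHERE item = 'ink'")          # deletion
rows = con.execute(
    "SELECT item, qty FROM stock ORDER BY item").fetchall()  # sorted retrieval
print(rows)  # [('pad', 7), ('pen', 12)]
```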
STRUCTURE OF DBMS
I. DDL Compiler
It converts data definition statements into a set of tables.
Tables contain meta data (data about the data) concerning the
database.
It gives rise to a format that can be used by other components
of database.
II. Data Manager
It is the central software component
It is referred to as database control system
It converts operations in user queries into operations on the
physical file system.
III. File manager
It is responsible for file structure
It is responsible for managing the space
It is responsible for locating the block containing the required
record.
It is responsible for requesting block from disk manager.
It is responsible for transmitting required record to data
manager.
IV. Disk Manager
It is a part of the operating system
It carries out all physical input/output operations.
It transfers block/page requested by file manager.
V. Query manager
It interprets a user's online query.
It converts the query into an efficient series of operations, in
a form capable of being sent to the data manager.
It uses data dictionary to find structure of relevant portion of
database.
It uses information to modify query.
It prepares an optimal plan to access database for efficient
data retrieval.
VI. Data Dictionary
It maintains information pertaining to structure and usage of
data and meta data.
It is consulted by the database users to learn what each piece
of data and the various synonyms of the data fields mean.
DATA BASE ADMINISTRATOR
A DBA is a person who actually creates and maintains the
database and also carries out the policies developed by the Data
Administrator (DA). The job of the DBA is a technical one. He is
responsible for defining the
He is responsible for defining the
internal layout of the database and also for ensuring that the
internal layout optimizes system
performance, especially in main business processing areas.
Main functions of a DBA are:-
1. Determining the physical design of a database and specifying
the hardware resource requirements for the purpose. This can be
done by determining the
data requirement
schedule and accuracy requirements, the way and frequency of
data access, search
strategies, physical storage requirements of data, level of
security needed and the
response time requirement.
2. Define the contents of the database.
3. Use of data definition language (DDL) to describe formats and
relationships among various data elements and their usage.
4. Maintain standards and controls for the database.
5. Specify various rules, which must be adhered to while
describing data for a database.
6. Allow only specified users to access the database by using
access controls thus prevent
unauthorised access.
7. DBA also prepares documentation which includes recording the
procedures, standard
guidelines and data descriptions necessary for the efficient and
continuous use of
database environment.
8. DBA ensures that the operating staff perform their database
processing related responsibilities, which include loading the
database, following maintenance and security procedures, taking
backups, scheduling the database for use, and following restart
and recovery procedures in a proper way after some hardware or
software failure.
9. DBA monitors the database environments.
10. DBA incorporates any enhancements into the database
environment, which may include
new utility program or new system releases.
Structured Query Language
SQL is a query language that enables users to create relational
databases, which are sets of related information stored in
tables.
It is a set of commands for creating, updating and accessing
data from a database.
It allows managers and other users to ask ad-hoc queries of the
database interactively without the aid of programmers. It is a
set of about 30 English-like commands such as
SELECT...FROM...WHERE.
SQL has following features:
a. Simple English like commands
b. command syntax is easy
c. Can be used by non-programmers.
d. Can be used for different types of DBMS.
e. Allows users to create and update databases.
f. Allows retrieving data from the database without having
detailed information about the structure of the records and
without being concerned about the processes the DBMS uses to
retrieve the data.
g. Has become a standard for DBMS.
Since SQL is used in many DBMS, managers who understand SQL are
able to use the same set of
commands regardless of the DBMS software that they may use.
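An ad-hoc, English-like query of the SELECT...FROM...WHERE form might look like this (the table and figures are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("South", 400), ("North", 150), ("South", 250)])

# A manager's ad-hoc question: what are the total sales in the South?
total = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'South'").fetchone()[0]
print(total)  # 650
```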
PROGRAM LIBRARY MANAGEMENT SYSTEM
Program library management system provides several functional
capabilities to facilitate effective
and efficient management of the data centre software inventory.
The inventory may include
application and system software program code, job control
statements that identify resources
used and processes to be performed and processing parameters
which direct processing.
Some of the capabilities are as follows:
a. Integrity- Each source program is assigned a modification
number and version number, and each source statement is
associated with a creation date. Security for program libraries,
job control language sets and parameter files is provided
through the use of passwords, encryption, data compression
facilities and automatic backup creation.
b. Update- Library management systems facilitate the addition,
deletion, re-sequencing, and
editing of library members.
c. Reporting- With use of its facilities a list of additions,
deletions and modifications along
with library catalogue and library member attributes can be
prepared for management
and auditor review.
d. Interface- Library software packages may interface with the
operating system, job
scheduling, access control system and online program
management.
Need for Documentation:
It provides a method to understand the various issues related to
software development.
It provides a means to access details related to system study,
system development, system testing and system operational
details.
It provides details associated with further modification of
software.
4 types of documentation are required prior to delivery of
customized software to a customer:
Strategic and application plans
Application systems and program documentation
Systems software and utility program documentation
Database documentation, Operation manual, User manual,
Standard
manual, Backup manual and others.
DATA WAREHOUSE
A Data warehouse is a computer database that collects,
integrates and stores an organisation's data with the aim of
producing accurate and timely management information and
supporting data analysis. It provides tools to satisfy the
information needs of employees at all organizational levels, not
just for complex data queries. It makes it possible to extract
archived operational data and overcome inconsistencies between
different legacy data formats.
A Data Mart is a subset of a Data Warehouse. Most organizations
do start designing a data mart to
attend to immediate needs. To keep it simple, consider Data Mart
as a data reserve that satisfies
a certain aspect of the business or just one application (or a
process). A Data Warehouse is a superset that engulfs all such
mini Data Marts to form one big reservoir of information.
Characteristics of Data warehouse
1. It is subject oriented, meaning data are organized according
to subject instead of application. The organized data (according
to subject) contain only the information necessary for decision
support processing.
2. Encoding of data is often inconsistent when the data resides
in many separate
applications in the operational environment but when data are
moved from the
operational environment into the data warehouse they assume a
consistent coding
convention.
3. Data warehouse contains a place for storing historical data
to be used for comparison,
trends and forecasting.
4. Data are not updated or changed in any way once they enter
the data warehouse; they are only loaded and accessed.
COMPONENTS OF A DATA WAREHOUSE (W.R.T figure)
Data Sources
Data sources refer to any electronic repository of information
that contains data of interest for management use or analytics.
This definition covers mainframe databases (e.g. IBM DB2, ISAM,
Adabas, Teradata, etc.),client-server databases (e.g. IBM DB2,
Oracle database, Informix, Microsoft SQL Server etc.), PC databases
(eg Microsoft Access), spreadsheets (e.g. Microsoft Excel) and any
other electronic store of data. Data needs to be passed from
these
systems to the data warehouse either on a
transaction-by-transaction basis for real-time data warehouses or
on a regular cycle (e.g. daily or weekly) for offline data
warehouses.
Data Transformation
The Data Transformation layer receives data from the data
sources, cleans and standardises it, and loads it into the data
repository. This is often called "staging" data as data often
passes through a temporary database whilst it is being transformed.
This activity of transforming data can be performed either by
manually created code or a specific type of software could be used
called an ETL tool. Regardless of the nature of the software used,
the following types of activities occur during data
transformation:
Comparing data from different systems to improve data quality
(e.g. Date of birth for a customer may be blank in one system but
contain valid data in a second system. In this instance, the data
warehouse would retain the date of birth field from the second
system)
standardising data and codes (e.g. If one system refers to
"Male" and "Female", but a second refers to only "M" and "F", these
codes sets would need to be standardised)
integrating data from different systems (e.g. if one system
keeps orders and another stores customers, these data elements need
to be linked)
performing other system housekeeping functions such as
determining change (or "delta") files to reduce data load times,
generating or finding surrogate keys for data etc.
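Two of the transformation activities above, filling a blank date of birth from a second system and standardising "Male"/"Female" codes to "M"/"F", can be sketched in Python (the record layout and code mapping are invented):

```python
# Staging sketch: merge and standardise records before loading them.
system_a = {"C001": {"dob": "", "gender": "Male"}}
system_b = {"C001": {"dob": "1980-05-14", "gender": "M"}}

CODE_MAP = {"Male": "M", "Female": "F"}  # target standard: "M"/"F"

def transform(a, b):
    staged = {}
    for key, rec in a.items():
        merged = dict(rec)
        # Improve quality: take the dob from the second system if blank.
        if not merged["dob"] and key in b:
            merged["dob"] = b[key]["dob"]
        # Standardise codes before loading into the warehouse.
        merged["gender"] = CODE_MAP.get(merged["gender"], merged["gender"])
        staged[key] = merged
    return staged

print(transform(system_a, system_b))
# {'C001': {'dob': '1980-05-14', 'gender': 'M'}}
```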
Data Warehouse
The data warehouse is a relational database organised to hold
information in a structure that best supports reporting and
analysis. Most data warehouses hold information for at least 1 year,
and sometimes for as long as half a century, depending on the
Business/Operations data retention requirement. As a result these
databases can become very large.
Reporting
The data in the data warehouse must be available to the
organisation's staff if the data warehouse is to be useful. There
are a very large number of software applications that perform this
function, or reporting can be custom-developed. Examples of types
of reporting tools include:
Business intelligence tools: These are software applications
that simplify the process of development and production of business
reports based on data warehouse data.
Executive information systems: These are software applications
that are used to display complex business metrics and information
in a graphical way to allow rapid understanding.
OLAP Tools: OLAP tools form data into logical multi-dimensional
structures and allow users to select which dimensions to view data
by.
Data Mining: Data mining tools are software that allows users to
perform detailed mathematical and statistical calculations on
detailed data warehouse data to detect trends, identify patterns
and analyse data.
Metadata
Metadata, or "data about data", is used to inform operators and
users of the data warehouse about its status and the information
held within the data warehouse. Examples of data warehouse metadata
include the most recent data load date, the business meaning of a
data item and the number of users that are logged in currently.
Operations
Data warehouse operations comprise the processes of loading,
manipulating and extracting data from the data warehouse.
Operations also cover user management, security, capacity
management and related functions.
Optional Components
In addition, the following components also exist in some data
warehouses:
1. Dependent Data Marts: A dependent data mart is a physical
database (either on the same hardware as the data warehouse or on a
separate hardware platform) that receives all its information from
the data warehouse. The purpose of a Data Mart is to provide a
sub-set of the data warehouse's data for a specific purpose or to a
specific sub-group of the organisation.
2. Logical Data Marts: A logical data mart is a filtered view of
the main data warehouse but does not physically exist as a separate
data copy. This approach to data marts delivers the same benefits
but has the additional advantages of not requiring additional
(costly) disk space and it is always as current with data as the
main data warehouse.
3. Operational Data Store: An ODS is an integrated database of
operational data. Its sources include legacy systems and it
contains current or near term data. An ODS may contain 30 to 60
days of information, while a data warehouse typically contains
years of data. ODS's are used in some data warehouse architectures
to provide near real time reporting capability in the event that
the Data Warehouse's loading time or architecture prevents it being
able to provide near real time reporting capability.
Different methods of storing data in a data warehouse
All data warehouses store their data grouped together by subject
areas that reflect the general usage of the data (Customer,
Product, Finance etc.). The general principle used in the majority
of data warehouses is that data is stored at its most elemental
level for use in reporting and information analysis.
Within this generic intent, there are two primary approaches to
organising the data in a data warehouse.
The first is using a "dimensional" approach. In this style,
information is stored as "facts" which are numeric or text data
that capture specific data about a single transaction or event, and
"dimensions" which contain reference information that allows each
transaction or event to be classified in various ways. As an
example, a sales transaction would be broken up into facts such as
the number of products ordered, and the price paid, and dimensions
such as date, customer, product, geographical location and sales
person. The main advantages of a dimensional approach are that the
Data Warehouse is easy for business staff with limited information
technology
experience to understand and use. Also, because the data is
pre-processed into the dimensional form, the Data Warehouse tends
to operate very quickly. The main disadvantage of the dimensional
approach is that it is quite difficult to add or change later if
the company changes the way in which it does business.
The second approach uses database normalisation. In this style,
the data in the data warehouse is stored in third normal form. The
main advantage of this approach is that it is quite straightforward
to add new information into the database, whilst the primary
disadvantage of this approach is that it can be quite slow to
produce information and reports.
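The dimensional approach can be sketched as fact rows classified by keys into dimension tables (all keys and values here are invented):

```python
# Dimensions hold reference information; facts hold the numbers.
dim_product = {1: "Pen", 2: "Notebook"}
dim_customer = {10: "Asha", 11: "Ravi"}

# Each fact captures one sales transaction:
# (product_key, customer_key, quantity ordered, price paid)
facts = [(1, 10, 3, 30.0), (2, 10, 1, 55.0), (1, 11, 2, 20.0)]

# Classify the facts along the customer dimension (price paid)...
sales_by_customer = {}
for prod, cust, qty, price in facts:
    name = dim_customer[cust]
    sales_by_customer[name] = sales_by_customer.get(name, 0.0) + price

# ...and along the product dimension (quantity ordered).
sales_by_product = {}
for prod, cust, qty, price in facts:
    pname = dim_product[prod]
    sales_by_product[pname] = sales_by_product.get(pname, 0) + qty

print(sales_by_customer)  # {'Asha': 85.0, 'Ravi': 20.0}
print(sales_by_product)   # {'Pen': 5, 'Notebook': 1}
```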
The Advantages of using a Data Warehouse are:
1. Enhanced end-user access to a wide variety of data.
2. Increased Data consistency
3. Increased productivity and decreased computational cost.
4. It is able to combine data from different sources, in one
place.
5. It provides an infrastructure that could support change to
data and replication of the
changed data back into the operational systems.
Concerns in using data warehouse
Extracting, cleaning and loading data could be time consuming.
Data warehousing project scope might increase.
Problems with compatibility with systems already in place, e.g.
the transaction processing system.
Providing training to end-users, who may end up not using the
data warehouse.
Security could develop into a serious issue, especially if the
data warehouse is web accessible.
Types of Data Warehouses
With improvements in technology, as well as innovations in using
data warehousing techniques, data warehouses have evolved from
Offline Operational Databases to Online Integrated data
warehouses.
Offline Operational Data Warehouses are data warehouses where
data is usually copied and pasted from real time data networks into
an offline system where it can be used. It is usually the simplest
and least technical type of data warehouse.
Offline Data Warehouses are data warehouses that are updated
frequently, daily, weekly or monthly and that data is then stored
in an integrated structure, where others can access it and perform
reporting.
Real Time Data Warehouses are data warehouses that are updated
moment by moment with the influx of new data. For instance, a
Real Time Data Warehouse might incorporate data from a Point of
Sale system and be updated with each sale that is made.
Integrated Data Warehouses are data warehouses that other
systems can access for operational purposes. Some Integrated
Data Warehouses are used by other data warehouses, allowing them
to access them to process reports, as well as look up current
data.
BACKUP AND RECOVERY
Recovery is a sequence of tasks performed to restore a database
to some point-in-time.
'Disaster recovery' differs from a database recovery scenario
because the operating system
and all related software must be recovered before any database
recovery can begin.
Database files that make up a database: Databases consist of
disk files that store Data.
When you create a database using any database software's
command-line utility, a main database file or root file is
created. This main database
file contains database tables,
system tables, and indexes. Additional database files expand the
size of the database and
are called dbspaces.
A dbspace contains tables and indexes, but not system
tables.
A transaction log is a file that records database modifications.
Database modifications
consist of inserts, updates, deletes, commits, rollbacks, and
database schema changes. A
transaction log is not required but is recommended. The database
engine uses a
transaction log to apply any changes made between the most
recent checkpoint and the
system failure. The checkpoint ensures that all committed
transactions are written to disk.
During recovery the database engine must find the log file at
the specified location. When the transaction log file is not
specifically identified, the database engine presumes that the
log file is in the same directory as the database file.
A mirror log is an optional file and has a file extension of
.mlg. It is a copy of a transaction
log and provides additional protection against the loss of data
in the event the transaction
log becomes unusable.
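Conceptually, recovery with a transaction log applies the changes recorded between the most recent checkpoint and the failure. A toy sketch (the log format here is invented; real engines work with pages and physical log records):

```python
# A pretend transaction log: modifications plus a checkpoint marker.
log = [
    ("insert", "A", 100),
    ("checkpoint",),            # all data so far is safely on disk
    ("insert", "B", 200),
    ("update", "A", 150),
]

def recover(database, log):
    """Replay only the entries recorded after the last checkpoint."""
    last_cp = max(i for i, e in enumerate(log) if e[0] == "checkpoint")
    for op, key, value in log[last_cp + 1:]:
        if op in ("insert", "update"):
            database[key] = value
        elif op == "delete":
            database.pop(key, None)
    return database

on_disk = {"A": 100}            # state as of the checkpoint
print(recover(on_disk, log))    # {'A': 150, 'B': 200}
```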
Online backup, offline backup, and live backup: Database backups
can be performed while
the database is being actively accessed (online) or when the
database is shutdown (offline)
When a database goes through a normal shutdown process (the
process is not being cancelled), the database engine commits the
data to the database files. An online database backup is
performed by executing the command-line utility or from the
'Backup Database' utility.
When an online backup process begins the database engine
externalizes all cached data
pages kept in memory to the database file(s) on disk. This
process is called a checkpoint.
The database engine continues recording activity in the
transaction log file while the
database is being backed up. The log file is backed up after the
backup utility finishes
backing up the database. The log file contains all of the
transactions recorded since the last
database backup. For this reason the log file from an online
full backup must be 'applied'
to the database during recovery. The log file from an offline
backup does not have to
participate in recovery but it may be used in recovery if a
prior database backup is used.
A Live backup is carried out by using the backup utility with
the command-line option. A
live backup provides a redundant copy of the transaction log for
restart of your system on
a secondary machine in the event the primary database server
machine becomes
unusable.
Full and Incremental database backup: Full backup is the
starting point for all other types
of backup and contains all the data in the folders and files
that are selected to be backed
up. Because full backup stores all files and folders, frequent
full backups result in faster
and simpler restore operations.
Incremental backup stores all files that have changed since the
last FULL, DIFFERENTIAL
OR INCREMENTAL backup. The advantage of an incremental backup is
that it takes the
least time to complete.
For example, suppose you run a backup on Friday: this first
backup will always be a full backup by default. Then, after you
work with these files on Monday, Leo Backup performs the
incremental backup: this backup will transfer only those files
that changed since Friday. A Tuesday backup will carry only those
files that changed since Monday, and so on for the following
days.
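An incremental backup can be sketched as "copy only the files whose modification time is later than the last backup" (the file names and timestamps below are invented, and real backup tools also track deletions and keep catalogues):

```python
import os
import shutil
import tempfile
from pathlib import Path

def incremental_backup(src, dst, last_backup_time):
    """Copy only files modified after the last backup time."""
    os.makedirs(dst, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if os.path.isfile(path) and os.path.getmtime(path) > last_backup_time:
            shutil.copy2(path, os.path.join(dst, name))
            copied.append(name)
    return copied

src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
Path(src, "old.txt").write_text("unchanged since the full backup")
Path(src, "new.txt").write_text("changed on Monday")
os.utime(os.path.join(src, "old.txt"), (1000, 1000))  # mtime before backup
os.utime(os.path.join(src, "new.txt"), (3000, 3000))  # mtime after backup
last_full_backup_time = 2000  # pretend the full backup ran at this time

print(incremental_backup(src, dst, last_full_backup_time))  # ['new.txt']
```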
Core phases in developing a backup and recovery strategy
1. Create backup and recovery commands: The commands should be
verified against the actual results produced, to ensure that the
desired results are obtained.
2. Time estimates from executing backup and recovery commands
help to get a feel for how long these tasks will take. This
information helps in identifying what command will be executed
and when.
3. Document the backup commands and create procedures outlining
the backups, which are kept in a file. Also identify the naming
convention used as well as the kind of backups performed.
4. Incorporate health checks into the backup procedures to
ensure that the database is not
corrupt. A database health check can be performed prior to
backing up a database or on a copy of the database from the
backup.
5. Deployment of backup and recovery consists of setting up
backup procedures on the production server. Verify that the
necessary hardware is in place, along with any other supporting
software required to perform these tasks. Modify procedures to
reflect any changes in the environment.
6. Monitor backup procedures to avoid unexpected errors. Make
sure that any changes in
the process are reflected in the documentation.
Data Centre and the challenges faced by the management of a data centre:
i. A Data centre is a centralized repository for the storage,
management and dissemination
of data and information.
ii. Data centre is a facility used for housing a large amount of
electronic equipment, typically
computers and communication equipment.
iii. The purpose of a data centre is to provide space and
bandwidth connectivity for servers in a reliable, secure and
scalable environment.
iv. It also provides facilities like housing websites, providing
data serving and other services for companies. Such a data
centre may contain a network operations centre (NOC), which is a
restricted-access area containing automated systems that
constantly monitor server activity, web traffic and network
performance and report even slight irregularities to engineers
so that they can stop potential problems before they occur.
Challenges:
Maintaining Infrastructure - A Data centre needs to set up an
infrastructure comprising a number of pieces of electronic
equipment, typically computers, and bandwidth connectivity for
servers in a reliable, secure and scalable environment.
Skilled Human Resources - A Data centre needs skilled staff who
are expert at network management and have software and hardware
operating skills.
Selection of Technology - A Data centre also faces the challenge
of proper selection of technology crucial to the operation of
the data centre.
Maintaining system performance - A Data centre has to maintain
maximum uptime and system performance, while establishing
sufficient redundancy and maintaining security.
DATA MINING
Data mining is the extraction of implicit, previously unknown
and potentially useful information
from data. It searches for relationships and global patterns that
exist in large databases but are
hidden among the vast amount of data. These relationships
represent valuable knowledge about
database and objects in the database that can be put to use in
the areas such as decision support,
prediction, forecasting and estimation.
In other words, data mining is concerned with the analysis of
data and the use of software techniques for finding patterns and
regularities in sets of data. It is the computer that is
responsible for finding the patterns, by identifying the
underlying rules and features in the data.
Stages in data mining
1. Selection: Selecting or segmenting the data according to some
criteria so that sub sets of
the data can be determined.
2. Pre- processing: This is the data cleansing stage where
certain information is removed
which is deemed unnecessary and may slow down queries. Also the
data is re-configured
to ensure a consistent format as there is a possibility of
inconsistent formats because the
data is drawn from several sources.
3. Transformation: The data is not merely transferred across but
transformed in that overlays
may be added. For example, Demographic overlays are commonly
used in market
research. The data is made usable and navigable.
4. Data mining: This stage is concerned with the extraction of
patterns from the data. A
pattern can be defined as a given set of facts. One popular
example of data mining is using
past behaviour to rank customers. Such tactics have been
employed by financial
companies for years as a means of deciding whether or not to
approve loans and credit
cards.
5. Integration and Evaluation: The patterns identified by the
systems are interpreted into
knowledge which can then be used to support human decision
making. For example,
prediction and classification tasks, summarising the contents of
a database, or explaining observed phenomena.
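The loan-approval example above, ranking customers by past behaviour, can be sketched with an invented scoring rule (the names, figures and weights are illustrative only):

```python
# Rank customers by past payment behaviour using a simple score.
customers = [
    {"name": "P. Kumar", "on_time_payments": 24, "missed_payments": 0},
    {"name": "S. Devi",  "on_time_payments": 10, "missed_payments": 5},
    {"name": "R. Mohan", "on_time_payments": 18, "missed_payments": 1},
]

def score(c):
    # Invented rule: missed payments weigh four times as heavily.
    return c["on_time_payments"] - 4 * c["missed_payments"]

ranked = sorted(customers, key=score, reverse=True)
print([c["name"] for c in ranked])
# ['P. Kumar', 'R. Mohan', 'S. Devi']
```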