
Data Storage, Retrieval and DBMS

Sep 24, 2015


Vivek Reddy

Transcript
  • SREERAM ACADEMY (FORMERLY SREERAM COACHING POINT)

    DATA STORAGE, RETRIEVAL AND DATA BASE MANAGEMENT SYSTEMS

    Data

    Data are raw facts, observations, or occurrences concerning a physical phenomenon or a business transaction.

    They are objective measurements of the attributes of entities such as people, places, things and events.

    Data is a collection of facts which is unorganized but can be organized into useful information.

    Data should be accurate, but need not be relevant, timely or concise.

    It can exist in different forms, e.g. picture, text, sound, or all of these together.

    CONCEPTS RELATED TO DATA

    Double Precision: Real data values are commonly called single precision data because each real constant is stored in a single memory location. This usually gives seven significant digits for each real value. In many calculations, particularly those involving iteration or long sequences of calculations, single precision is not adequate to express the precision required. To overcome this limitation, many programming languages provide the double precision data type. Each double precision value is stored in two memory locations, thus providing twice as many significant digits.
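    The difference can be seen by round-tripping a value through 32-bit (single) and 64-bit (double) storage. A minimal Python sketch (Python's own `float` is double precision; the `struct` module is used here only to simulate single-precision storage):

```python
import struct

# Round-tripping a value through single precision (one 32-bit word)
# keeps only about 7 significant digits; double precision (two words)
# keeps about 15-16.
value = 3.14159265358979

single = struct.unpack('f', struct.pack('f', value))[0]  # 32-bit storage
double = struct.unpack('d', struct.pack('d', value))[0]  # 64-bit storage

print(single)  # accurate to roughly 7 digits only
print(double)  # the value survives intact
```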

    Logical Data Type: Use the Logical data type when you want an efficient way to store data that has only two values. Logical data is stored as true (.T.) or false (.F.).

    Characters: Choose the Character data type when you want to include letters, numbers, spaces, symbols, and punctuation. Character fields or variables store text information such as names, addresses, and numbers that are not used in mathematical calculations. For example, phone numbers or zip codes, though they consist mostly of digits, are best stored as Character values.

    Strings: A data type consisting of a sequence of contiguous characters that represent the characters themselves rather than their numeric values. A String can include letters, numbers, spaces, and punctuation. The String data type can store fixed-length strings ranging in length from 0 to approximately 63K characters, and dynamic strings ranging in length from 0 to approximately 2 billion characters. The dollar sign ($) type-declaration character represents a String.

    Variable: A variable is something that may change in value, e.g. the number of words on different pages of a book.
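    The point about zip codes can be illustrated in Python (the zip code value below is invented for illustration):

```python
# Why store a zip code as Character rather than as a number: a leading
# zero is part of the data, and arithmetic on a zip code is meaningless.
zip_as_number = int("02134")   # the leading zero is lost
zip_as_text = "02134"          # preserved exactly

print(zip_as_number)  # 2134
print(zip_as_text)    # 02134
```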


    KEY: A key is the relational means of specifying uniqueness. A database key is an attribute used to sort and/or identify data in some manner. Each table has a primary key, which uniquely identifies records. Foreign keys are used to cross-reference data between relational tables.

    The primary key of a relational table uniquely identifies each record in the table. It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single attribute or of multiple attributes in combination.

    Example: Imagine we have a STUDENTS table that contains a record for each student at a university. The student's unique student ID number would be a good choice for a primary key in the STUDENTS table. The student's first and last name would not be a good choice, as there is always the chance that more than one student might have the same name.

    A candidate key is a combination of attributes that can be used to uniquely identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key.
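    The STUDENTS example can be sketched with SQLite in Python; the table and column names below are illustrative, not from the source:

```python
import sqlite3

# The unique student ID is the primary key, so the DBMS itself
# rejects an attempt to insert a duplicate ID.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, "
            "first_name TEXT, last_name TEXT)")
con.execute("INSERT INTO students VALUES (1, 'Asha', 'Rao')")

duplicate_rejected = False
try:
    con.execute("INSERT INTO students VALUES (1, 'Vikram', 'Rao')")
except sqlite3.IntegrityError:
    duplicate_rejected = True  # primary-key uniqueness enforced

print(duplicate_rejected)  # True
```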

    Referential integrity: A feature provided by relational database management systems (RDBMSs) that prevents users or applications from entering inconsistent data. Most RDBMSs have various referential integrity rules that you can apply when you create a relationship between two tables.

    For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record are also deleted. This is called a cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it are modified accordingly. This is called a cascading update.

    Consider the situation where we have two tables, Employees and Managers. The Employees table has a foreign key attribute entitled Managed By, which points to the record for that employee's manager in the Managers table. Referential integrity enforces the following three rules:

    1. We may not add a record to the Employees table unless the Managed By attribute points to a valid record in the Managers table.

    2. If the primary key for a record in the Managers table changes, all corresponding records in the Employees table must be modified using a cascading update.

    3. If a record in the Managers table is deleted, all corresponding records in the Employees table must be deleted using a cascading delete.
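    Rules 1 and 3 of the Employees/Managers example can be sketched with SQLite in Python (SQLite enforces them once foreign keys are switched on; the table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enable referential integrity
con.execute("CREATE TABLE managers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, "
            "managed_by INTEGER REFERENCES managers(id) ON DELETE CASCADE)")
con.execute("INSERT INTO managers VALUES (1, 'Meena')")
con.execute("INSERT INTO employees VALUES (10, 'Ravi', 1)")

# Rule 1: an employee may not point at a non-existent manager.
orphan_rejected = False
try:
    con.execute("INSERT INTO employees VALUES (11, 'Sita', 99)")
except sqlite3.IntegrityError:
    orphan_rejected = True

# Rule 3: deleting the manager cascades to the linked employees.
con.execute("DELETE FROM managers WHERE id = 1")
remaining = con.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```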

    Alternate Key: The alternate keys of a table are simply those candidate keys which are not currently selected as the primary key. The alternate keys are the set of all candidate keys minus the primary key.


    Secondary Key: Secondary keys can be defined for each table to optimize data access. They can refer to any column combination, and they help to prevent sequential scans over the table. Like the primary key, a secondary key can consist of multiple columns. A candidate key which is not selected as the primary key is also known as a secondary key.

    Index Fields: Index fields are used to store relevant information along with a document.

    Currency Fields: The currency field accepts data in dollar form by default.

    Date Fields: The date field accepts data entered in date format.

    Integer Fields: The integer field accepts data as a whole number.

    Text Fields: The text field accepts data as an alphanumeric text string.

    Information

    Information is data that has been converted into a meaningful and useful context for specific end users.

    To obtain information, data is aggregated, manipulated and organized, its content is analysed and evaluated, and the result is placed in a proper context for human use.

    Information exists as reports, in a systematic textual format, or as graphics presented in an organized manner.

    Information must be relevant, timely, accurate, concise and complete, and should apply to the current situation.

    It should be condensed into a usable length.

    Data storage hierarchy

    Character: The character is the basic building block of data; it may be a letter, a numeric digit or a special character. Characters are put together to form a FIELD.

    Field: A field is a meaningful collection of related characters. It is the smallest logical data entity that is treated as a single unit in data processing. For example, if we are processing the employee data of a company, we may have:

    1. An employee code field
    2. An employee name field
    3. An hours-worked field
    4. An hourly pay rate field
    5. A tax deduction rate field


    Record: Fields are grouped together to form a record. An employee record would be a collection of the fields of one employee.

    Records can be divided into physical and logical records:

    Meaning:
      Physical Record - A physical record refers to the actual portion of a storage medium on which data is stored.
      Logical Record - A logical record refers to the way a user views a record. It contains all the data related to a single item.

    Independence:
      Physical Record - Portions of the same logical record may be located in different physical records, or several logical records may be located in one physical record.
      Logical Record - A logical record is independent of its physical environment.

    Example:
      Physical Record - A group of pulses recorded on a magnetic tape or disk, or a series of holes punched into paper tape.
      Logical Record - A payroll record for an employee, or a record of all the purchases made by a customer in a departmental store.

    File: A file is a number of related records that are treated as a unit. For example, a collection of employee records for one company would be an employee file.

    [Diagram: a FILE contains employee records (Employee 1, Employee 2, ...); each record contains fields such as Employee No and Salary; each field is made up of characters.]
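    The character-field-record-file hierarchy can be sketched in Python (the field names and values are illustrative):

```python
# One logical record: the fields of one employee. Each field value is
# itself a collection of characters (or a number built from digits).
employee_record = {
    "emp_code": "E101",
    "name": "Anand",
    "hours_worked": 40,
    "hourly_rate": 50.0,
}

# A file: a number of related records treated as a unit.
employee_file = [employee_record]
employee_file.append({"emp_code": "E102", "name": "Bina",
                      "hours_worked": 35, "hourly_rate": 60.0})

print(len(employee_file))  # 2 records in the employee file
```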


    Transaction File and Master File

    Data Life:
      Master File - Contains relatively permanent records used for identification and for summarizing statistical information.
      Transaction File - Contains temporary data which is to be processed in combination with the master file.

    Content:
      Master File - Contains current or nearly current data, which is updated regularly.
      Transaction File - Generally contains information used for updating the master files.

    Data Size:
      Master File - Rarely contains detailed transaction data.
      Transaction File - Contains detailed data.

    Examples:
      Master File - Product files, customer files, employee files, etc.
      Transaction File - Purchase orders, job cards, invoices, etc.

    Access method:
      Master File - Usually maintained on direct access storage devices.
      Transaction File - Usually maintained on sequential as well as direct access storage devices.

    Redundancy:
      Master File - Can never be redundant, as it has to be updated regularly.
      Transaction File - Once a transaction file has been used to update the master file, it is no longer required and can be considered redundant.
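    A master-file update run can be sketched in Python (the records and transaction amounts are invented for illustration):

```python
# The master file holds relatively permanent records; the transaction
# file holds temporary records applied against it in one run.
master = {101: {"name": "Anand", "balance": 500.0},
          102: {"name": "Bina", "balance": 300.0}}

transactions = [(101, -120.0), (102, +75.0), (101, +40.0)]  # (key, amount)

for key, amount in transactions:
    master[key]["balance"] += amount
# After the run, the transaction file is no longer required.

print(master[101]["balance"], master[102]["balance"])  # 420.0 375.0
```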

    File Organization

    I. Serial File Organization

    Records are arranged one after the other in no particular order, other than the chronological order in which the records are added to the file. This type of organization is commonly found with transaction data, where records are created in a file in the order in which the transactions take place.

    II. Sequential File Organization

    1. In a sequential file, records are stored one after another in an ascending or descending order determined by the key field of the records.

    2. In the payroll example, the records of the employee file may be organized sequentially in employee code sequence.


    3. Sequentially organized files that are processed by computer systems are normally stored on storage media such as magnetic tape, punched paper tape, punched cards or magnetic disks.

    4. To access these records, the computer must read the file in sequence from the beginning. The first record is read and processed first, then the second record in the file sequence, and so on. To locate a particular record, the program must read each record in sequence and compare its key field to the one that is needed. The retrieval search ends only when the desired key matches the key field of the currently read record.
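    The retrieval in point 4 can be sketched in Python; the function name and sample records are illustrative:

```python
# Sequential retrieval: records are sorted by key and scanned from the
# start until the desired key matches.
records = [
    {"emp_code": 101, "name": "Anand"},
    {"emp_code": 204, "name": "Bina"},
    {"emp_code": 350, "name": "Chitra"},
]

def sequential_find(records, key):
    reads = 0
    for rec in records:            # must start at the first record
        reads += 1
        if rec["emp_code"] == key:
            return rec, reads
    return None, reads

rec, reads = sequential_find(records, 350)
print(rec["name"], reads)  # Chitra 3
```

    Note that finding the last record costs as many reads as there are records, which is why sequential files suit runs that touch most of the file.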

    Merits:

    Simple to understand.

    Only the record key is required to locate a record.

    Efficient and economical if the activity rate is high, i.e. a large proportion of the file's records are processed in each run.

    Inexpensive I/O devices may be used.

    Reconstruction of files is relatively easy, since a built-in backup is usually available.

    Demerits:

    Even at a low activity rate, the entire file is processed.

    Transactions must be sorted and placed in sequence prior to processing.

    While transactions are accumulated between runs, the timeliness of the data deteriorates.

    High data redundancy, since the same data may be stored in several files sequenced on different keys.

    Applications:

    Payroll systems.

    Electricity billing, or any other billing where each record needs to be accessed.


    III. Direct File Access Organization

    A - Self-Addressing Method: A record's key is used as its relative address. Therefore, we can compute the record's address directly from the record key and the physical address of the first record in the file.
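    A minimal Python sketch of self-addressing, assuming fixed-length records and consecutive keys starting at 1 (the base address and record size are invented for illustration):

```python
BASE_ADDRESS = 4096   # physical address of the first record (assumed)
RECORD_SIZE = 128     # bytes per fixed-length record (assumed)

def record_address(key):
    # With consecutive keys 1, 2, 3, ... the address is computed
    # directly from the key; no search is needed.
    return BASE_ADDRESS + (key - 1) * RECORD_SIZE

print(record_address(1))   # 4096
print(record_address(10))  # 5248
```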

    B - Indexed Sequential File Organization:

    1. A computer provides a better way to store information than the card catalogue; indeed, most public libraries today keep their catalogues on a computer. For each book in the library, a data record is created that contains information gathered from the various card catalogues: for example, the title of the book, the author's name, the physical location of the book, and any other relevant information. A record is generally composed of several fields, with each field used to store a particular piece of information. For example, we might store the author's last name in one field and the first name in a separate field. All the records (one for each book) are collected and stored in a file, typically called the data file.

    2. Indexes are created so that a particular record in the data file can be located quickly. For example, we could create an author index, a title index, and a subject index. The indexes are typically stored in a separate file called the index file.

    3. An index is a collection of "keys", one key for each record in the data file. A key is a subset of the information stored in a record. When an index is created, the key values are extracted from one or more fields of each record. The value of each key determines its order in the index (i.e., the keys are sorted alphabetically or numerically). Each key has an associated pointer that indicates the location in the data file of the corresponding complete record. To find a particular record, a matching key is quickly located in the index, and then the associated pointer is used to locate the complete record.

    4. Consider the problem of locating a particular book in a library containing thousands of books. Public libraries long ago developed the card catalogue as a means to efficiently locate a particular book. Usually there were at least three card catalogues: one with cards arranged in order by the name of the author, another arranged by the title of the book, and a third arranged by subject heading. Each card contained information about the book, most importantly its location in the library. Therefore, by knowing the name of the author, the title of the book, or the appropriate subject heading, you could use the card catalogues to quickly determine the location of a particular book. The card catalogues can be thought of as indexes.

    [Diagram: classification of direct access methods - direct sequential access, comprising (A) the self-addressing method and (B) the index sequential addressing method; and random access, comprising the address generation method and the indexed random method.]

    5. Consider the author index. There is a filing cabinet containing a card for each book in the library, filed in alphabetical order by the author's name. Each drawer in the cabinet is labelled, perhaps "A-E", "F-J", and so on. There are two broad kinds of searches that you might want to perform on the author index.

    6. First, you might want to make a list containing the name of every book in the library. To do this you would start in the first drawer with the first card, and look at each card in order until you reached the last card in the last drawer. This is called a "sequential" search, because you look at each card in the catalogue in sequential order.

    7. Second, you might want to know the names of the books in the library that were written by Thomas Jefferson. Instead of examining every card in the catalogue, you are first guided by the labels on the drawers to the second drawer, the "F-J" drawer. You are then guided by the tabs inside the drawer to the names that start with the letter "J". This is called a "random" search. For any particular card, you can use the labels (or indexes) to go almost directly to the desired card.

    8. Actually locating the Thomas Jefferson card(s) involves both a random and a sequential search. We use random access to go directly to the correct drawer and the correct tab inside the drawer. The labels (or indexes) allow us to very quickly get close to the card of interest. After locating the "J" tab inside the "F-J" drawer, we then use sequential access to locate the particular Thomas Jefferson card(s) of interest.
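    The key-plus-pointer idea of points 2 and 3 can be sketched in Python, using byte offsets into an in-memory data file as the pointers (the record layout and sample books are illustrative):

```python
import io

# A tiny "data file" of fixed-length book records, plus an index that
# maps each author key to the byte offset of the complete record.
data_file = io.BytesIO()
index = {}  # key -> pointer (byte offset) into the data file
for author, title in [("Jefferson", "Notes on Virginia"),
                      ("Austen", "Emma"),
                      ("Dickens", "Hard Times")]:
    index[author] = data_file.tell()
    data_file.write(f"{author:<12}{title:<28}".encode())  # 40-byte record

# Random access: locate the key in the index, then follow the pointer.
data_file.seek(index["Jefferson"])
record = data_file.read(40).decode()
print(record.strip())
```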

    Merits:

    Allows efficient and economical use of sequential processing techniques when the activity rate is high.

    Permits quick access to individual records in a relatively efficient way, when this activity is a small fraction of the total workload.

    Demerits:

    Less efficient in the use of storage space than some other organizations.

    Access to records is slower because the indexes must be consulted, and relatively expensive hardware and software resources are required.

    Applications:

    Inventory control, where sequential access as well as individual inquiry is required.

    Student registration systems.

    C - Random File Organization

    A randomizing procedure is characterised by the fact that records are stored in such a way that there is no relationship between the keys of adjacent records. The technique converts the record key number to a physical location, represented by a disk address, through a computational procedure.

    Transactions can be processed in any order and written at any location throughout the stored file. A desired record can be accessed directly using the randomizing procedure, without accessing all the other records in the file.
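    One common randomizing procedure is division-remainder hashing; a minimal Python sketch (the bucket count and sample keys are invented for illustration):

```python
NUM_BUCKETS = 97  # a prime number of buckets (assumed file capacity)

def bucket_address(key):
    # Division-remainder randomizing: the disk address is computed
    # from the key, not found by searching other records.
    return key % NUM_BUCKETS

for key in (1043, 5821, 77):
    print(key, "->", bucket_address(key))
```

    Note that nearby keys can land in unrelated positions, which is exactly the "no relationship between the keys of adjacent records" property described above.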

    Merits:

    Access to records for inquiry and updating is possible immediately.

    Immediate updating of several files as a result of a single transaction is possible.

    No need for sorting.

    Demerits:

    Records in an on-line file are at risk: loss of accuracy and breaches of security are possible, so special backup and reconstruction procedures must be established.

    Less efficient in the use of storage space than a sequentially organized file.

    Relatively expensive software and hardware resources are required.

    Applications:

    Any type of inquiry system, such as a railway reservation or airline reservation system.

    o The Best File Organization

    File management involves the logical organization of data supplied to a computer in a predetermined way. Data stored together in a particular place is called a FILE, and a file is created using a set of instructions called a PROGRAM. The organization of the data in the file depends on the following factors:

    1. Data Dependence
    2. Data Redundancy
    3. Data Integrity

    File Management Software

    File management software is a software package that helps users to organize data into files, process them, and retrieve information.

    Users can create report formats, enter data into records, search records, sort them, and prepare reports.

    Such packages are designed for microcomputers and are menu-driven, allowing end users to create files by giving easy-to-use instructions.

    The following are the criteria for choosing a file organisation method:

    1. File Volatility

    (i) File volatility is the number of additions and deletions to the file in a given period of time. E.g. the payroll file of a company, where the employee register is constantly changing, is a highly volatile file, and therefore the direct access method is better for it.

    2. File Activity

    (i) File activity refers to the proportion of records accessed in a run relative to the number of records in the file.

    (ii) In the case of real-time files, where each transaction is processed immediately and only one master record is accessed at a time, the direct access method is appropriate.

    (iii) In cases where almost every record is accessed during processing, a sequentially ordered file is appropriate.

    3. File Interrogation

    (i) File interrogation refers to the retrieval of information from a file.

    (ii) If the retrieval of individual records must be fast to support a real-time operation, such as airline reservation, then some kind of direct organization is required.

    (iii) If, on the other hand, requirements for data can be delayed, then all the individual requests for information can be batched and run in a single processing run with a sequential file organization.

    4. File Size

    (i) Large files which require many individual references to records with immediate response must be organized under the direct access method.

    (ii) In the case of small files, it is better to search the entire file sequentially, or with a more efficient binary search, to find an individual record than to maintain complex indexes or complex direct addressing schemes.
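    The binary search mentioned in 4(ii) can be sketched in Python using the standard `bisect` module (the sample keys are illustrative):

```python
import bisect

emp_codes = [101, 204, 350, 407, 512]  # a small file, sorted by key

def binary_find(keys, key):
    # Repeated halving of the search range instead of a full scan.
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1

print(binary_find(emp_codes, 407))  # 3 (position of the record)
print(binary_find(emp_codes, 999))  # -1 (not present)
```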

    Problems of File Processing Systems:

    i. Data Redundancy: The same data is stored in different files, since the data files are independent. This results in a lot of duplicated data, and a separate file maintenance program is needed to update each file.

    ii. Data Dependence: The components of a file processing system depend on one another; therefore, when changes are made in the format and structure of data in a file, changes have to be made in all the programs that use this file.

    iii. Data Integrity: The same data is found in different forms in different files. Checking the validity of data cannot be uniformly implemented, with the result that data in one file may be correct while in another file it is wrong. Special computer programs have to be written to retrieve data from such independent files, which is time-consuming and expensive.

    iv. Data Availability: Since data is scattered across many files, it is necessary to look into many files before relying on a particular piece of data. Due to non-uniformity in file design, the same data may have different identification numbers in different files, and obtaining the necessary data will be difficult.

    v. Management Control: Uniform policies and standards cannot be set, since the data is scattered in different files. It is difficult to relate such files, and difficult to implement a decision, due to non-uniform coding of the data files.

    DATA BASE MANAGEMENT SYSTEMS

    Database

    A database is a collection of related and ordered information, organised in such a way that the information can be accessed quickly and easily. Hence, an organised logical group of related files would constitute a database.

    According to G. M. Scott, "A database is a computer file system that uses a particular file organisation to facilitate rapid updating of individual records, simultaneous updating of related records, easy access to all records by all application programs, and rapid access to all stored data which must be brought together for a particular routine report or inquiry or for a special purpose report or inquiry."

    Types of Databases:

    1. Operational Databases: These databases keep the information needed to support the operations of an organization. These are mainly day-to-day working databases, e.g. customer, employee and inventory databases, etc.

    2. Management Databases: These databases keep selected information and data extracted mainly from operational and external databases.

    3. Information Warehouse Databases: A data warehouse stores the data of current and previous years. It is a central source of data that has been standardized and integrated so that it can be used by managers and other end-user professionals throughout an organization.

    4. Distributed Databases: These are the databases of local work groups and departments at branch offices, manufacturing plants and other work sites, regional offices, etc. The main aim of these databases is to ensure that the organization's database is distributed but updated concurrently.


    Advantages:

    A local computer on the network offers immediate response to local needs.

    Systems can be expanded in a modular fashion as needed.

    Since many small computers are used, the system is not dependent on one large computer whose failure could shut down the whole network.

    Equipment operating and management costs are often lower.

    Microcomputers tend to be less complex than large systems; therefore the system is more usable by local users.

    5. End User Databases: These databases consist of the various data files, Word documents, Excel sheets and databases which end users have generated.

    6. External Databases: These are also known as online databases, provided by various data banks or organizations at a nominal fee.

    7. Text Databases: These are informative databases, available normally on CD-ROM disks at a certain price.

    8. Image Databases: These databases contain image and graphic information. They are available either on the Internet or on CD at a certain price.

    9. Object Oriented Databases: This is a type of database structure developed to suit changing application needs. When integrated database structures were developed, the need for OODBs was felt. Databases with relational qualities that are capable of manipulating text, data, objects, images and audio/video clips are used by organisations. Along with the OODB, OOP has been developed. In OOP (object oriented programming), every object is described by a set of attributes describing what the object is. The behaviour of the object is also included in the program. Objects with similar qualities and behaviour can be grouped together. OOP is more useful in decision making.
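    The attributes-plus-behaviour idea behind OODBs can be sketched in Python (the Employee class and its fields are invented for illustration):

```python
class Employee:
    def __init__(self, code, name, hourly_rate):
        # attributes: the data describing what the object is
        self.code = code
        self.name = name
        self.hourly_rate = hourly_rate

    def gross_pay(self, hours_worked):
        # behaviour: stored together with the object's attributes
        return self.hourly_rate * hours_worked

e = Employee(101, "Anand", 50.0)
print(e.gross_pay(40))  # 2000.0
```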

    10. Partitioned Databases (Partial Distribution): Some databases are centrally managed and some are managed in a decentralised manner. This approach is called a partitioned database. For example, financial, marketing and administrative data can be maintained at headquarters, whereas production data may be maintained at decentralised locations.

    Factors to be addressed in maintaining a database:

    1. Installation of the Database:

    Correct installation of the DBMS product.

    Ensuring that adequate file space is available.

    Allocating the disc space for the database properly.

    Allocating data files in standard sizes for input/output balancing.

    2. Memory Usage:

    How are the buffers being used?

    How does the DBMS use main memory?

    What priority do the programs in main memory have?

    3. Input/Output (I/O) Contention:

    Achieving maximum I/O performance is one of the most important aspects of tuning. Understanding how the data are accessed by end users is critical to managing I/O contention.

    A higher CPU clock speed requires more careful management of I/O.

    Simultaneous or separate use of I/O devices.

    Spooling, buffering, etc. can be used.

    4. CPU Usage:

    Multiprogramming and multiprocessing improve performance in query processing.

    Monitoring the CPU load.

    The mixture of online and background processing needs to be adjusted.

    Mark jobs that can be processed in off-peak periods to unload the machine during peak working hours.


    Components of a Database Environment

    1. Database files: These files contain the data elements, stored in database file organization formats. The database is created in such a way as to balance the data management objectives of speed, multiple access paths, minimum storage, program-data independence and preservation of data integrity.

    2. A Database Management System (DBMS): The DBMS is a set of system software programs that manages the database files. Requests for access to files, updating of records and retrieval of data are handled by the DBMS. The DBMS has the responsibility for data security, which is vital in a database environment, since the database is accessed by many users.

    3. The users: Users consist of both traditional users and application programmers, who are not traditionally considered users. Users interact with the DBMS indirectly via application programs or directly via a simple query language.

    Classification of DBMS Users:

    Nave users who are not aware of the presence of the database system supporting

    the usage.

    Online users, who may communicate with the database either directly through an online terminal or indirectly through a user interface or application programs. They usually acquire some skill and experience in communicating with the database.

    Application programmers who are responsible for developing the application

    programs and user interfaces.

    The DBA, who can exercise centralized control and is responsible for maintaining the database.

    User interaction with the DBMS includes the definition of the logical relationships in the database, and the input, maintenance, change, deletion and manipulation of data.

    4. A host interface system: This is the part of the DBMS that communicates with application programs. The host language interface interprets instructions in high-level-language application programs, such as COBOL and BASIC programs, that request data from files so that the needed data can be retrieved. During this process the OS interacts with the DBMS. Application programs do not contain information about the files; thus a program is independent of the database system.

    5. The application programs: These programs perform the same functions as they do in

    conventional system, but they are independent of the data files and use standard data

    definitions. This independence and standardisation make rapid special purpose program

    development easier and faster.


    6. A Natural Language Interface System: A query language permits online update and inquiry by users who are relatively unsophisticated about computer systems. This language is often termed English-like because its instructions are usually simple commands in English, used to accomplish an inquiry task. A query language also permits online programming of simple routines by managers who wish to interact with the data. The natural language interface may also help managers generate special reports.

    7. The data dictionary: The data dictionary is a centralized repository of information, in computerized form, about the data in the database. It contains the schema of the database, i.e. the name of each item in the database and a description and definition of its attributes, along with the names of the programs that use them, and authorization tables that specify the users and the data and programs authorized for their use. These descriptions and definitions are referred to as the data standards. Maintenance of the data dictionary is the responsibility of the DBA.
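    As a small illustration of the idea, SQLite's built-in `sqlite_master` catalogue behaves like a minimal data dictionary, recording the name, type and definition of every object in the database. The `sales` table below is invented for this sketch.

```python
import sqlite3

# In-memory database used purely for illustration; the sales table is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")

# sqlite_master acts as a built-in data dictionary: it records the name,
# type and defining SQL of every object in the database.
for row in conn.execute("SELECT type, name, sql FROM sqlite_master"):
    print(row)
```

A full data dictionary would additionally track which programs and users are authorized to use each item, which SQLite's catalogue does not do.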

    8. Online access and update terminals: These may be adjacent to computer or even

    thousands of miles away. They may be dumb terminals, smart terminals or

    microcomputers.

    9. The output system or report generators: This provides routine job reports, documents and

    special reports. It allows programmers, managers and other users to design output reports

    without writing an application program in a programming language.

    10. File Pointer: A file pointer is placed in the last field of a record and contains the address of another related record, thus establishing a link between records. It directs the computer system to move to that related record.

    11. Linked List: A linked list is a group of data records arranged in an order that is based on embedded pointers. An embedded pointer is a special data field that links one record to another by referring to the other record. The field is embedded in the first record, i.e. it is a data element within the record.
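    The embedded-pointer idea can be sketched in a few lines of Python. The record addresses and name fields below are invented for illustration; each record's last field holds the address of the next related record.

```python
# A minimal sketch of a linked list built from records with an embedded
# pointer field; record addresses and contents are illustrative only.
records = {
    101: {"name": "Anand",  "next": 205},   # pointer field links to record 205
    205: {"name": "Bala",   "next": 309},
    309: {"name": "Chitra", "next": None},  # None marks the end of the list
}

def traverse(records, start):
    """Follow the embedded pointers from the starting record."""
    names, addr = [], start
    while addr is not None:
        rec = records[addr]
        names.append(rec["name"])
        addr = rec["next"]   # move to the related record
    return names

print(traverse(records, 101))  # ['Anand', 'Bala', 'Chitra']
```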

    Factors contributing to the Architecture of a Database:

    1. External View

    It is also known as user view.

    As the name suggests, it includes only that portion of the database with which the user's application programs are concerned.

    It is described by users/ programmers by means of external schema.

    2. Conceptual View

    It is also known as global view.

    It represents the entire database and includes all database entries.


    It is defined by conceptual schema and describes all records, relationships,

    constraints and boundaries.

    3. Internal view

    It is also known as physical view

    It describes the data structure and the access methods

    It is defined by internal schema and indicates how data will be stored

    Of the above three, the external view is USER DEPENDENT and the other two are USER INDEPENDENT.
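    One way to picture the external view sitting above the conceptual view is with a SQL view over a base table. This is a hedged sketch: the base `employee` table stands in for the conceptual schema, and `payroll_view` is one invented external view; a real system would define several such views for different user groups.

```python
import sqlite3

# The base table plays the role of the conceptual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Anand', 50000), (2, 'Bala', 60000)")

# External view: payroll users see only names and salaries; other user
# groups could be given different views over the same conceptual schema.
conn.execute("CREATE VIEW payroll_view AS SELECT name, salary FROM employee")

rows = conn.execute("SELECT * FROM payroll_view ORDER BY name").fetchall()
print(rows)
```

The internal (physical) view, how SQLite lays the rows out on disk, is hidden from both levels above it.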

    Data Independence

    1. Data independence is the ability to modify a schema definition at one level without affecting the schema at the next higher level.

    2. It facilitates logical data independence.

    3. It assures physical data independence.

    Structure of Database

    The logical organizational approach of the database is called the database structure. There are three basic structures available, viz. the Hierarchical, Network and Relational database structures.

    [Figure: Three-level schema architecture. External Schemas 1, 2 and 3 map onto the Conceptual Schema, which in turn maps onto the Physical Schema.]


    Hierarchical Database Structure

    In this type of architecture records are logically arranged into a hierarchy of

    relationships.

    Records are logically arranged in a tree pattern. Hierarchy structure implements one to

    one and one to many relationships. All records in hierarchy are called nodes.

    Each node is related to others in a parent-child relationship: each parent record may have one or more child records, but no child record may have more than one parent record.

    The top parent record in the hierarchy is called the root.

    Features of Hierarchy Database:-

    i. Hierarchically structured databases are less flexible than other database structures because the hierarchy of records must be determined and implemented before a search can be conducted; in other words, the relationships between records are relatively fixed by the structure.

    ii. Managerial use of a query language to solve a problem may require multiple searches, which is very time consuming. Thus analysis and planning activities, which frequently involve ad-hoc management queries of the database, may not be supported as effectively by a hierarchical DBMS as by other database structures.

    iii. Ad-hoc queries made by managers that require relationships other than those already implemented in the database may be difficult or time consuming to accomplish.

    iv. Records are logically structured in an inverted tree pattern.

    v. It implements one to one and one to many relationships.

    vi. Each record or node in hierarchy is related to other records in a parent- child

    relationship.

    vii. Logical structures in which a child has many parents are difficult to process.

    viii. Processing of records grouped by natural relationships can be done faster.
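    The fixed parent-child search path described above can be sketched as a tree that is always searched from the root. The node names are invented; the point is that every query must descend the predetermined hierarchy.

```python
# Sketch of a hierarchical (tree) structure: each node has exactly one
# parent, and searching always starts at the root. Node names are invented.
tree = {
    "company":      ["sales", "accounts"],              # root node
    "sales":        ["north_region", "south_region"],
    "accounts":     [],
    "north_region": [],
    "south_region": [],
}

def find_path(tree, root, target, path=None):
    """Return the root-to-target path, following the fixed hierarchy."""
    path = (path or []) + [root]
    if root == target:
        return path
    for child in tree[root]:
        found = find_path(tree, child, target, path)
        if found:
            return found
    return None

print(find_path(tree, "company", "south_region"))
# ['company', 'sales', 'south_region']
```

Because the relationships are fixed in the structure, an ad-hoc query that needs a different relationship (say, region to account) would require restructuring or multiple searches.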


    Relational Database Structure

    An example of such a situation is the representation of Actors, Movies and Theatres. To know who plays what and where, we need the combination of these three attributes; however, they relate to each other cyclically. To resolve this, we establish linking tables for Actor-Movie, Movie-Theatre and Theatre-Actor, each containing a portion of the primary key of the Actor, Movie and Theatre tables.

    ACTOR          MOVIE            THEATRE
    Kamalhaasan    Manmadhan Ambu   Satyam
    Dhanush        Aadukalam        PVR
    Karthi         Siruthai         INOX
    Trisha         Manmadhan Ambu   Satyam
    Tammanna       Siruthai         PVR

    i. This is a model where more than one data file is compared.

    ii. More than one file is compared at a time with the help of a common key field.

    iii. Each file is converted into a table and the analysis is done on the tables with the

    help of common key field.

    iv. The rows of the table represent records and the columns represent data fields.

    v. It is not necessary to maintain the entire file in a single physical location; it can be maintained at geographically dispersed places.

    vi. This is more suitable for wider analysis of data from different locations.

    [Figure: Linking tables ACTOR-MOVIE, MOVIE-THEATRE and THEATRE-ACTOR connect the ACTOR, MOVIE and THEATRE tables.]


    vii. Queries are easily possible because the software can interact with different records at the same time.
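    The Actor/Movie/Theatre example can be sketched with tables joined on a common key field. The schema below is a simplified assumption (a single `movie_id` key rather than the full set of linking tables), meant only to show how a query compares two tables at once.

```python
import sqlite3

# Simplified sketch of the relational example; schema and key are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, theatre TEXT);
CREATE TABLE actor (name TEXT, movie_id INTEGER);  -- movie_id is the common key field
INSERT INTO movie VALUES (1, 'Manmadhan Ambu', 'Satyam'), (2, 'Aadukalam', 'PVR');
INSERT INTO actor VALUES ('Kamalhaasan', 1), ('Trisha', 1), ('Dhanush', 2);
""")

# Two tables are compared at a time with the help of the common key field:
rows = conn.execute("""
    SELECT a.name, m.title, m.theatre
    FROM actor a JOIN movie m ON a.movie_id = m.movie_id
    WHERE m.theatre = 'Satyam'
    ORDER BY a.name
""").fetchall()
print(rows)
```

The join answers "who plays what, and where" without the tables needing to live in the same physical location in a distributed relational system.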

    Network Database Structure

    This structure is useful when data must be related in one-to-one as well as many-to-many fashion, i.e. a record may have multiple parents as well as multiple children. This type of structure is found in organizations where online data processing is carried out.

    DBMS (Language)

    I. Data Definition Language:

    DDL defines the conceptual schema, providing a link between the logical and physical structures of the database. The logical structure of a database is called the schema. A subschema is the way a specific application views the data from the database.

    Following are the functions of DDL:

    i. They define the physical characteristics of each record: the fields in the record, each field's type and length, its logical name, and the relationships among the records.

    ii. They describe the schema and subschema.

    iii. They indicate the keys of the records.

    iv. They provide means for associating related records or fields.

    v. They provide for data security measures.

    vi. They provide for logical and physical data independence.

    II. Data manipulation Language

    DML is a Database Language used by database users to retrieve, insert, delete and

    update data in a database.


    Following are the functions of DML:

    They provide the data manipulation techniques like deletion, modification,

    insertion, replacement, retrieval, sorting and display of data or records.

    They facilitate use of relationships between the records

    They enable the user and application programs to be independent of the physical data structure and of database structure maintenance, by allowing data to be processed on a logical and symbolic basis rather than on a physical-location basis.

    They provide for independence of programming languages by supporting several

    high-level procedural languages like COBOL, C++, etc.
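    The DDL/DML split above can be illustrated with a short sketch; the table and column names are invented, and SQLite stands in for a generic DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the record, its fields, types and key.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# DML: insertion, modification, retrieval and deletion of data.
conn.execute("INSERT INTO student VALUES (1, 'Ravi')")
conn.execute("UPDATE student SET name = 'Ravi K' WHERE roll_no = 1")
print(conn.execute("SELECT name FROM student").fetchall())  # [('Ravi K',)]
conn.execute("DELETE FROM student WHERE roll_no = 1")
```

Note that the DML statements refer to the table only by its logical names; they say nothing about how or where the rows are physically stored.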

    STRUCTURE OF DBMS

    I. DDL Compiler

    It converts data definition statements into a set of tables.

    Tables contain meta data (data about the data) concerning the database.

    It gives rise to a format that can be used by other components of database.

    II. Data Manager

    It is the central software component

    It is referred to as the database control system.

    It converts operations in users' queries into operations on the physical file system.

    III. File manager

    It is responsible for file structure

    It is responsible for managing the space

    It is responsible for locating the block containing the required record.

    It is responsible for requesting block from disk manager.

    It is responsible for transmitting required record to data manager.

    IV. Disk Manager

    It is a part of the operating system

    It carries out all physical input/output operations.

    It transfers block/page requested by file manager.


    V. Query manager

    It interprets users' online queries.

    It converts them to an efficient series of operations in a form capable of being sent to the data manager.

    It uses data dictionary to find structure of relevant portion of database.

    It uses information to modify query.

    It prepares an optimal plan to access database for efficient data retrieval.

    VI. Data Dictionary

    It maintains information pertaining to structure and usage of data and meta data.

    It is consulted by database users to learn what each piece of data and the various synonyms of a data field mean.

    DATA BASE ADMINISTRATOR

    A DBA is a person who actually creates and maintains the database and also carries out the policies developed by the DA (Data Administrator). The job of the DBA is a technical one. He is responsible for defining the internal layout of the database and for ensuring that the internal layout optimizes system performance, especially in the main business processing areas.

    Main functions of a DBA are:-

    1. Determining the physical design of the database and specifying the hardware resource requirements for the purpose. This can be done by determining the data requirements, schedule and accuracy requirements, the manner and frequency of data access, search strategies, physical storage requirements of data, the level of security needed and the response time requirements.

    2. Define the contents of the database.

    3. Use the data definition language (DDL) to describe formats, relationships among various data elements and their usage.

    4. Maintain standards and controls for the database.

    5. Specify various rules, which must be adhered to while describing data for a database.

    6. Allow only specified users to access the database by using access controls, thus preventing unauthorised access.

    7. DBA also prepares documentation which includes recording the procedures, standard

    guidelines and data descriptions necessary for the efficient and continuous use of

    database environment.


    8. The DBA ensures that the operating staff performs its database processing related responsibilities, which include loading the database, following maintenance and security procedures, taking backups, scheduling the database for use, and following restart and recovery procedures after a hardware or software failure.

    9. DBA monitors the database environments.

    10. DBA incorporates any enhancements into the database environment, which may include

    new utility program or new system releases.

    Structured Query Language

    SQL is a query language that enables users to create relational databases, which are sets of related information stored in tables.

    It is a set of commands for creating, updating and accessing data in a database. It allows programmers, managers and other users to ask ad-hoc queries of the database interactively without the aid of programmers. It is a set of about 30 English-like commands such as SELECT...FROM...WHERE.

    SQL has the following features:

    a. Simple English like commands

    b. Command syntax is easy.

    c. Can be used by non-programmers.

    d. Can be used with different types of DBMS.

    e. Allows users to create and update databases.

    f. Allows retrieving data from the database without detailed information about the structure of the records and without being concerned about the processes the DBMS uses to retrieve the data.

    g. Has become standard practice for DBMS.

    Since SQL is used in many DBMS, managers who understand SQL are able to use the same set of

    commands regardless of the DBMS software that they may use.

    PROGRAM LIBRARY MANAGEMENT SYSTEM

    Program library management system provides several functional capabilities to facilitate effective

    and efficient management of the data centre software inventory. The inventory may include

    application and system software program code, job control statements that identify resources

    used and processes to be performed and processing parameters which direct processing.

    Some of the capabilities are as follows:


    a. Integrity: each source program is assigned a modification number and version number, and each source statement is associated with a creation date. Security for program libraries, job control language sets and parameter files is provided through the use of passwords, encryption, data compression facilities and automatic backup creation.

    b. Update- Library management systems facilitate the addition, deletion, re-sequencing, and

    editing of library members.

    c. Reporting- With use of its facilities a list of additions, deletions and modifications along

    with library catalogue and library member attributes can be prepared for management

    and auditor review.

    d. Interface- Library software packages may interface with the operating system, job

    scheduling, access control system and online program management.

    Need for Documentation:

    It provides a method to understand the various issues related to software development.

    It provides a means to access details related to the system study, system development, system testing and system operation.

    It provides details associated with further modification of software.

    Four types of documentation are required prior to delivery of customized software to a customer:

    Strategic and application plans

    Application systems and program documentation

    Systems software and utility program documentation

    Database documentation, Operation manual, User manual, Standard

    manual, Backup manual and others.

    DATA WAREHOUSE

    A data warehouse is a computer database that collects, integrates and stores an organisation's data with the aim of producing accurate and timely management information and supporting data analysis. It provides tools to satisfy the information needs of employees at all organizational levels, and not just for complex data queries. It makes it possible to extract archived operational data and to overcome inconsistencies between different legacy data formats.

    A Data Mart is a subset of a data warehouse. Most organizations start by designing a data mart to attend to immediate needs. To keep it simple, consider a data mart as a data reserve that satisfies a certain aspect of the business or just one application (or process). The data warehouse is a superset that engulfs all such mini data marts to form one big reservoir of information.

    Characteristics of Data warehouse

    1. It is subject oriented, meaning data are organized according to subject instead of application. The data organized by subject contain only the information necessary for decision support processing.

    2. Encoding of data is often inconsistent when the data resides in many separate

    applications in the operational environment but when data are moved from the

    operational environment into the data warehouse they assume a consistent coding

    convention.

    3. Data warehouse contains a place for storing historical data to be used for comparison,

    trends and forecasting.

    4. Data are not updated or changed in any way once they enter the data warehouse; they are only loaded and accessed.

    COMPONENTS OF A DATA WAREHOUSE (W.R.T figure)

    Data Sources

    Data sources refer to any electronic repository of information that contains data of interest for management use or analytics. This definition covers mainframe databases (e.g. IBM DB2, ISAM, Adabas, Teradata, etc.),client-server databases (e.g. IBM DB2, Oracle database, Informix, Microsoft SQL Server etc.), PC databases (eg Microsoft Access), spreadsheets (e.g. Microsoft Excel) and any other electronic store of data. Data needs to be passed from these


    systems to the data warehouse either on a transaction-by-transaction basis for real-time data warehouses or on a regular cycle (e.g. daily or weekly) for offline data warehouses.

    Data Transformation

    The Data Transformation layer receives data from the data sources, cleans and standardises it, and loads it into the data repository. This is often called "staging" data as data often passes through a temporary database whilst it is being transformed. This activity of transforming data can be performed either by manually created code or a specific type of software could be used called an ETL tool. Regardless of the nature of the software used, the following types of activities occur during data transformation:

    Comparing data from different systems to improve data quality (e.g. Date of birth for a customer may be blank in one system but contain valid data in a second system. In this instance, the data warehouse would retain the date of birth field from the second system)

    standardising data and codes (e.g. If one system refers to "Male" and "Female", but a second refers to only "M" and "F", these codes sets would need to be standardised)

    integrating data from different systems (e.g. if one system keeps orders and another stores customers, these data elements need to be linked)

    performing other system housekeeping functions such as determining change (or "delta") files to reduce data load times, generating or finding surrogate keys for data etc.
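    The comparison and standardisation steps described above can be sketched as follows. All system names, field names and code mappings below are invented for illustration: two source systems use different gender code sets, and one record's blank date of birth is filled from the other system.

```python
# Hedged ETL sketch: standardise codes and keep the first non-blank value.
CODE_MAP = {"Male": "M", "Female": "F", "M": "M", "F": "F"}

system_a = {"cust_id": 7, "gender": "Male", "dob": None}           # dob is blank
system_b = {"cust_id": 7, "gender": "M",    "dob": "1980-04-12"}   # dob is valid

def transform(*sources):
    """Merge records for one customer, standardising codes on the way."""
    merged = {}
    for rec in sources:
        for field, value in rec.items():
            if field == "gender":
                value = CODE_MAP[value]       # standardise the code set
            if merged.get(field) is None:     # retain the first non-blank value
                merged[field] = value
    return merged

print(transform(system_a, system_b))
```

A real ETL tool would also handle delta files, surrogate keys and error logging; this sketch shows only the cleaning and integration idea.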

    Data Warehouse

    The data warehouse is a relational database organised to hold information in a structure that best supports reporting and analysis. Most data warehouses hold information for at least one year, and some retain it for as much as half a century, depending on the business/operations data retention requirements. As a result these databases can become very large.

    Reporting

    The data in the data warehouse must be available to the organisation's staff if the data warehouse is to be useful. There are a very large number of software applications that perform this function, or reporting can be custom-developed. Examples of types of reporting tools include:

    Business intelligence tools: These are software applications that simplify the process of development and production of business reports based on data warehouse data.

    Executive information systems: These are software applications that are used to display complex business metrics and information in a graphical way to allow rapid understanding.

    OLAP Tools: OLAP tools form data into logical multi-dimensional structures and allow users to select which dimensions to view data by.

    Data Mining: Data mining tools are software that allows users to perform detailed mathematical and statistical calculations on detailed data warehouse data to detect trends, identify patterns and analyse data.


    Metadata

    Metadata, or "data about data", is used to inform operators and users of the data warehouse about its status and the information held within the data warehouse. Examples of data warehouse metadata include the most recent data load date, the business meaning of a data item and the number of users that are logged in currently.

    Operations

    Data warehouse operations comprise the processes of loading, manipulating and extracting data from the data warehouse. Operations also cover user management, security, capacity management and related functions.

    Optional Components

    In addition, the following components also exist in some data warehouses:

    1. Dependent Data Marts: A dependent data mart is a physical database (either on the same hardware as the data warehouse or on a separate hardware platform) that receives all its information from the data warehouse. The purpose of a Data Mart is to provide a sub-set of the data warehouse's data for a specific purpose or to a specific sub-group of the organisation.

    2. Logical Data Marts: A logical data mart is a filtered view of the main data warehouse that does not physically exist as a separate data copy. This approach to data marts delivers the same benefits but has the additional advantages of not requiring additional (costly) disk space and of always being as current as the main data warehouse.

    3. Operational Data Store: An ODS is an integrated database of operational data. Its sources include legacy systems and it contains current or near term data. An ODS may contain 30 to 60 days of information, while a data warehouse typically contains years of data. ODS's are used in some data warehouse architectures to provide near real time reporting capability in the event that the Data Warehouse's loading time or architecture prevents it being able to provide near real time reporting capability.

    Different methods of storing data in a data warehouse

    All data warehouses store their data grouped together by subject areas that reflect the general usage of the data (Customer, Product, Finance etc.). The general principle used in the majority of data warehouses is that data is stored at its most elemental level for use in reporting and information analysis.

    Within this generic intent, there are two primary approaches to organising the data in a data warehouse.

    The first is using a "dimensional" approach. In this style, information is stored as "facts" which are numeric or text data that capture specific data about a single transaction or event, and "dimensions" which contain reference information that allows each transaction or event to be classified in various ways. As an example, a sales transaction would be broken up into facts such as the number of products ordered, and the price paid, and dimensions such as date, customer, product, geographical location and sales person. The main advantages of a dimensional approach are that the Data Warehouse is easy for business staff with limited information technology


    experience to understand and use. Also, because the data is pre-processed into the dimensional form, the Data Warehouse tends to operate very quickly. The main disadvantage of the dimensional approach is that it is quite difficult to add or change later if the company changes the way in which it does business.

    The second approach uses database normalisation. In this style, the data in the data warehouse is stored in third normal form. The main advantage of this approach is that it is quite straightforward to add new information into the database, whilst the primary disadvantage of this approach is that it can be quite slow to produce information and reports.
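    A minimal sketch of the dimensional approach: one fact table holding numeric measures, classified by two dimension tables. The table and column names are invented, and SQLite again stands in for a warehouse database.

```python
import sqlite3

# Star-schema sketch: fact_sales holds measures; the dim_ tables classify them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales   (product_id INTEGER, customer_id INTEGER,
                           qty INTEGER, price REAL);
INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_customer VALUES (10, 'North'), (11, 'South');
INSERT INTO fact_sales   VALUES (1, 10, 3, 9.5), (1, 11, 2, 9.5), (2, 10, 1, 20.0);
""")

# Slice the facts along a chosen dimension (here: customer region).
rows = conn.execute("""
    SELECT c.region, SUM(f.qty * f.price) AS revenue
    FROM fact_sales f JOIN dim_customer c ON f.customer_id = c.customer_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)
```

Swapping `dim_customer` for `dim_product` in the join gives the same facts sliced by a different dimension, which is exactly the flexibility the dimensional approach is chosen for.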

    The Advantages of using a Data Warehouse are:

    1. Enhanced end-user access to a wide variety of data.

    2. Increased Data consistency

    3. Increased productivity and decreased computational cost.

    4. It is able to combine data from different sources, in one place.

    5. It provides an infrastructure that could support change to data and replication of the

    changed data back into the operational systems.

    Concerns in using a data warehouse

    Extracting, cleaning and loading data could be time consuming.

    The data warehousing project's scope might increase.

    Problems with compatibility with systems already in place, e.g. the transaction processing system.

    Providing training to end-users, who end up not using the data warehouse.

    Security could develop into a serious issue, especially if the data warehouse is web accessible.

    Types of Data Warehouses

    With improvements in technology, as well as innovations in using data warehousing techniques, data warehouses have changed from Offline Operational Databases to include an Online Integrated data warehouse.

    Offline Operational Data Warehouses are data warehouses where data is usually copied from real-time operational systems into an offline system where it can be used. This is usually the simplest and least technical type of data warehouse.

    Offline Data Warehouses are data warehouses that are updated frequently (daily, weekly or monthly), with the data then stored in an integrated structure where others can access it and perform reporting.

    Real Time Data Warehouses are data warehouses that are updated moment by moment with the influx of new data. For instance, a real-time data warehouse might incorporate data from a point-of-sale system and be updated with each sale that is made.


    Integrated Data Warehouses are data warehouses that other operational systems can access. Some integrated data warehouses are also used by other data warehouses, which access them to process reports as well as to look up current data.

    BACKUP AND RECOVERY

    Recovery is a sequence of tasks performed to restore a database to some point-in-time.

    'Disaster recovery' differs from a database recovery scenario because the operating system

    and all related software must be recovered before any database recovery can begin.

    Database files that make up a database: Databases consist of disk files that store data. When you create a database, using either database software or a command-line utility, a main database file or root file is created. This main database file contains database tables, system tables, and indexes. Additional database files expand the size of the database and are called dbspaces. A dbspace contains tables and indexes, but not system tables.

    A transaction log is a file that records database modifications. Database modifications

    consist of inserts, updates, deletes, commits, rollbacks, and database schema changes. A

    transaction log is not required but is recommended. The database engine uses a

    transaction log to apply any changes made between the most recent checkpoint and the

    system failure. The checkpoint ensures that all committed transactions are written to disk.

    During recovery the database engine must find the log file at the specified location. When the transaction log file is not specifically identified, the database engine presumes that the log file is in the same directory as the database file.

    A mirror log is an optional file and has a file extension of .mlg. It is a copy of a transaction

    log and provides additional protection against the loss of data in the event the transaction

    log becomes unusable.

    Online backup, offline backup, and live backup: Database backups can be performed while the database is being actively accessed (online) or when the database is shut down (offline). When a database goes through a normal shutdown process (one that is not cancelled), the database engine commits the data to the database files. An online database backup is performed by executing the command-line backup utility or from the 'Backup Database' utility.

    When an online backup process begins the database engine externalizes all cached data

    pages kept in memory to the database file(s) on disk. This process is called a checkpoint.

    The database engine continues recording activity in the transaction log file while the

    database is being backed up. The log file is backed up after the backup utility finishes

    backing up the database. The log file contains all of the transactions recorded since the last

    database backup. For this reason the log file from an online full backup must be 'applied'

    to the database during recovery. The log file from an offline backup does not have to

    participate in recovery but it may be used in recovery if a prior database backup is used.


    A live backup is carried out by using the backup utility with the appropriate command-line option. A live backup provides a redundant copy of the transaction log for restarting your system on a secondary machine in the event that the primary database server machine becomes unusable.

Full and incremental database backup: A full backup is the starting point for all other types of backup and contains all the data in the folders and files that are selected to be backed up. Because a full backup stores all files and folders, frequent full backups result in faster and simpler restore operations.

    Incremental backup stores all files that have changed since the last FULL, DIFFERENTIAL

    OR INCREMENTAL backup. The advantage of an incremental backup is that it takes the

    least time to complete.

For example, suppose you run a backup on Friday: this first backup is always a full backup by default. Then, after you work with these files on Monday, Leo Backup performs an incremental backup: it transfers only those files that changed since Friday. A Tuesday backup carries only the files that changed since Monday, and so on for the following days.
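The Friday/Monday example above can be sketched as a simple modification-time comparison: an incremental backup selects only the files changed since the previous backup. The file names and the mtime-based change test are illustrative assumptions; real backup tools may use archive bits or checksums instead.

```python
import os
import tempfile
import time

def incremental_candidates(folder, last_backup_time):
    """Return the files modified since the previous (full or incremental) backup."""
    changed = []
    for name in sorted(os.listdir(folder)):
        if os.path.getmtime(os.path.join(folder, name)) > last_backup_time:
            changed.append(name)
    return changed

tmp = tempfile.mkdtemp()
for name in ("report.txt", "notes.txt"):
    with open(os.path.join(tmp, name), "w") as f:
        f.write(name)

friday_full = time.time()  # the first backup is always a full backup
# Simulate editing one file on Monday by bumping its modification time.
monday = friday_full + 60
os.utime(os.path.join(tmp, "notes.txt"), (monday, monday))

monday_incremental = incremental_candidates(tmp, friday_full)  # only the edited file
```

Only the edited file qualifies for Monday's incremental backup, which is why incremental backups take the least time to complete.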

    Core phases in developing a backup and recovery strategy

1. Create backup and recovery commands: The commands should be tested and their output verified to ensure that the desired results are produced.

2. Time estimates from executing the backup and recovery commands help to get a feel for how long these tasks will take. This information helps in identifying which commands will be executed and when.


3. Document the backup commands and create procedures outlining the backups, which are kept in a file. Also identify the naming convention used as well as the kind of backups performed.

4. Incorporate health checks into the backup procedures to ensure that the database is not corrupt. A database health check can be performed prior to backing up a database, or on a copy of the database from the backup.

5. Deployment of backup and recovery consists of setting up the backup procedures on the production server. Verify that the necessary hardware is in place, along with any other supporting software required to perform these tasks. Modify the procedures to reflect any changes from the development environment.

    6. Monitor backup procedures to avoid unexpected errors. Make sure that any changes in

    the process are reflected in the documentation.
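The phases above can be sketched as one small, documented backup routine: a health check runs before the copy (phase 4), a naming convention labels the copy (phase 3), and a checksum comparison monitors the result for unexpected errors (phase 6). Everything here is an illustrative stand-in; a real health check would verify the database's internal consistency, not just its size.

```python
import hashlib
import os
import shutil
import tempfile

def health_check(db_path):
    # Stand-in health check: a real one would validate the database's contents.
    return os.path.exists(db_path) and os.path.getsize(db_path) > 0

def run_backup(db_path, backup_dir):
    if not health_check(db_path):
        raise RuntimeError("health check failed; refusing to back up the database")
    # Naming convention: prefix the copy with the kind of backup performed.
    dest = os.path.join(backup_dir, "full_" + os.path.basename(db_path))
    shutil.copy2(db_path, dest)
    # Monitoring: verify the copy matches the source so errors surface immediately.
    with open(db_path, "rb") as src, open(dest, "rb") as cpy:
        verified = hashlib.sha256(src.read()).digest() == hashlib.sha256(cpy.read()).digest()
    return dest, verified

tmp = tempfile.mkdtemp()
db_file = os.path.join(tmp, "sales.db")
with open(db_file, "w") as f:
    f.write("page1 page2")
backup_path, verified = run_backup(db_file, tmp)
```

Keeping the whole procedure in one script makes it easy to document, time, and deploy unchanged on the production server.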

    Data Centre and the challenges faced by the management of a data

    centre:

    i. A Data centre is a centralized repository for the storage, management and dissemination

    of data and information.

    ii. Data centre is a facility used for housing a large amount of electronic equipment, typically

    computers and communication equipment.

iii. The purpose of a data centre is to provide space and bandwidth connectivity for servers in a reliable, secure and scalable environment.

iv. It also provides facilities like housing websites, providing data serving and other services for companies. Such a data centre may contain a network operations centre (NOC), which is a restricted-access area containing automated systems that constantly monitor server activity, web traffic and network performance, and report even slight irregularities to engineers so that they can stop potential problems before they occur.

    Challenges:

Maintaining Infrastructure - A data centre needs to set up an infrastructure comprising a number of pieces of electronic equipment, typically computers, and bandwidth connectivity for servers in a reliable, secure and scalable environment.

Skilled Human Resources - A data centre needs skilled staff who are expert at network management and have software and hardware operating skills.

    Selection of Technology- A Data centre also faces the challenge of proper selection of

    technology crucial to the operation of the data centre.

Maintaining System Performance - A data centre has to maintain maximum uptime and system performance, while establishing sufficient redundancy and maintaining security.


    DATA MINING

Data mining is the extraction of implicit, previously unknown and potentially useful information from data. It searches for relationships and global patterns that exist in large databases but are hidden among vast amounts of data. These relationships represent valuable knowledge about the database and the objects in it, which can be put to use in areas such as decision support, prediction, forecasting and estimation.

In other words, data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. It is the computer that is responsible for finding the patterns, by identifying the underlying rules and features in the data.

    Stages in data mining

1. Selection: Selecting or segmenting the data according to some criteria so that subsets of the data can be determined.

2. Pre-processing: This is the data-cleansing stage, where information deemed unnecessary is removed, since it may slow down queries. The data is also re-configured to ensure a consistent format, as inconsistent formats are likely when the data is drawn from several sources.

    3. Transformation: The data is not merely transferred across but transformed in that overlays

    may be added. For example, Demographic overlays are commonly used in market

    research. The data is made usable and navigable.

    4. Data mining: This stage is concerned with the extraction of patterns from the data. A

    pattern can be defined as a given set of facts. One popular example of data mining is using

    past behaviour to rank customers. Such tactics have been employed by financial

    companies for years as a means of deciding whether or not to approve loans and credit

    cards.

5. Integration and Evaluation: The patterns identified by the system are interpreted into knowledge, which can then be used to support human decision-making: for example, prediction and classification tasks, summarising the contents of a database, or explaining observed phenomena.
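The five stages above can be walked through on a tiny customer dataset. The field names, the age-band overlay, and the missed-payments scoring rule are all illustrative assumptions standing in for real demographic overlays and mining algorithms.

```python
# A toy dataset: raw records drawn from several sources, with mixed formats.
raw = [
    {"id": 1, "age": "34", "missed_payments": 0, "note": "vip"},
    {"id": 2, "age": "51", "missed_payments": 3, "note": ""},
    {"id": 3, "age": None, "missed_payments": 1, "note": ""},
]

# 1. Selection: segment the data by a criterion (keep complete records only).
selected = [r for r in raw if r["age"] is not None]

# 2. Pre-processing: drop unnecessary fields and enforce a consistent format.
cleaned = [{"id": r["id"], "age": int(r["age"]),
            "missed_payments": r["missed_payments"]} for r in selected]

# 3. Transformation: add an overlay, like a demographic age band.
for r in cleaned:
    r["age_band"] = "under_40" if r["age"] < 40 else "40_plus"

# 4. Data mining: rank customers by past behaviour (fewer missed payments first).
ranked = sorted(cleaned, key=lambda r: r["missed_payments"])

# 5. Interpretation/evaluation: turn the pattern into a decision rule.
approve_loan = [r["id"] for r in ranked if r["missed_payments"] == 0]
```

Each stage's output feeds the next, which is why the cleansing and transformation steps come before any pattern extraction.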