dbms mod 1.2

Apr 09, 2018

Jen Cruz
  • 8/8/2019 dbms mod 1.2

    1/15

    Course Plan DBMS

    Introduction

    Databases and database systems have become an essential component of everyday life in modern society.
    In the course of a day, most of us encounter several activities that involve some interaction with a
    database. You have probably heard the term database used in many settings, such as institutions,
    libraries and shops, and it plays a critical role in each of them. So let us see what a database is.
    A database is a collection of related data. For example, consider a bank. The bank has a number of
    customers, and it must record the details of each customer, such as name, account number, type of
    account and so on. All the data that we want to keep for future use are collectively known as a
    database. A database should also have the following implicit properties:

    It should represent some aspect of the real world, called universe of discourse.

    It is a logical collection of data.

    It is designed, built and populated with data for a specific purpose.

    E.g.: Student Database

    Name | Roll No | Branch | % of marks | Grade

    What is DBMS?

    DBMS is the acronym for Database Management System. A DBMS is a collection of interrelated data and
    a set of programs to access this data in a convenient and efficient way. It controls the organization,
    storage, retrieval, security and integrity of data in a database. In other words, it is a collection of
    programs that enables users to create and maintain a database. It is general-purpose software that
    facilitates the processes of defining, constructing, manipulating and sharing a database among various
    users and applications.

    Defining a database means specifying the data types, structures and constraints for the data to be
    stored in the database. Constructing the database is the process of storing the data itself on some
    storage medium that is controlled by the DBMS. Manipulating the database means processing the
    database. It includes functions such as querying the database to retrieve specific data, updating the
    database to reflect changes in the miniworld and generating reports from the data. Sharing a database
    allows multiple users and programs to access the database concurrently.
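The first three of these processes can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table, its columns and the sample rows are assumptions loosely based on the Student Database example, not part of the original notes, and sharing (concurrent access) is not shown.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    """CREATE TABLE student (
           roll_no INTEGER PRIMARY KEY,
           name    TEXT NOT NULL,
           branch  TEXT NOT NULL,
           marks   REAL CHECK (marks BETWEEN 0 AND 100)
       )"""
)

# Constructing: store the data itself under control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, branch, marks) VALUES (?, ?, ?, ?)",
    [(1, "Anu", "CSE", 82.5), (2, "Binu", "ECE", 67.0), (3, "Cini", "CSE", 91.0)],
)

# Manipulating: update a record, then query for specific data.
conn.execute("UPDATE student SET marks = 70.0 WHERE roll_no = 2")
cse_students = conn.execute(
    "SELECT name, marks FROM student WHERE branch = 'CSE' ORDER BY roll_no"
).fetchall()
print(cse_students)
```

The CHECK and PRIMARY KEY clauses are part of the definition step, so the DBMS, rather than each application program, enforces them.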

    RT503 1 Prepared by Lakshmi K.S


    File System Versus Database Approach

    One way to store information in a computer system is to keep it in a traditional file system. In this
    method the data are stored in separate files, and a separate application program is needed for each
    application.

    The disadvantages of file system are:

    Data redundancy and inconsistency

    In traditional file systems the data may be duplicated. For e.g.: Consider a bank that offers two kinds
    of accounts, a savings bank (SB) account and a checking account. In this case, the address of a
    customer is stored in two files: one with the SB account records and the other with the checking
    account records. This duplication increases the storage space required. It also leads to
    inconsistency: if the address of a customer changes, the change may be reflected in only one of the
    files.

    Difficulty in accessing information

    Suppose the bank needs a list of customers with an account balance higher than Rs. 10,000, but we do
    not have an application at hand to produce this list. To get this information we have two choices.
    The first is to list all the SB account customers and then extract the needed list manually. The
    second is to develop a new program to satisfy the new request. Both are difficult.

    Data Isolation

    Data are scattered in different files and files may be in various formats. So it is difficult to extract the

    appropriate data.

    Integrity problems

    Constraints on the data are enforced by appropriate code in the application programs. So if we need
    to add a new constraint, we have to change the code, which makes it very difficult to add or change
    constraints. The problem is compounded when a constraint involves several data items from different
    files.

    Atomicity problems

    Suppose a failure occurs during execution of a program. The execution then stops in the middle of the
    program, leaving the data in an inconsistent state. But the execution of a program should always end
    in a consistent state. In a traditional file system, a failure mostly results in an inconsistent state.
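For contrast, here is a minimal sketch of how a DBMS transaction avoids this problem, using Python's sqlite3 module. The account table, the transfer function and the simulated failure are all hypothetical and serve only to illustrate all-or-nothing execution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # the debit has been rolled back; nothing more to undo


def balances(conn):
    return conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no").fetchall()


transfer(conn, 1, 2, 200, fail_midway=True)
after_failure = balances(conn)   # unchanged: the partial debit was undone
transfer(conn, 1, 2, 200)
after_success = balances(conn)   # both updates applied together
```

With plain files, the program would have to implement this undo logic itself.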


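    [Figure: Traditional File System. Application Program 1, 2 and 3 each keep their own data.]

    [Figure: Database System. Application Program 1, 2 and 3 share a single database.]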

    Features (characteristics) of DBMS

    A database system has the following features:

    1. Self-describing

    The database system contains not only the database itself but also a complete definition or
    description of the database structure. This definition is stored in a catalog, which records the type,
    storage format and constraints of the data. The information stored in the catalog is called meta-data.
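As one concrete illustration of a catalog, SQLite keeps table definitions in a built-in table named sqlite_master and exposes column details through PRAGMA table_info. The student table below is a hypothetical example; other DBMSs store their meta-data differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, branch TEXT)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [
    (row[1], row[2], bool(row[3]), bool(row[5]))
    for row in conn.execute("PRAGMA table_info(student)")
]

print(tables)
for name, col_type, not_null, is_pk in columns:
    print(name, col_type, "NOT NULL" if not_null else "", "PK" if is_pk else "")
```

The program learns the structure of the data by asking the DBMS, which is what "self-describing" means in practice.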

    2. Data Security

    The DBMS can prevent unauthorized users from viewing or updating the database. Using passwords,
    users are allowed access to the entire database or to a subset of it known as a "subschema." For
    example, in a student database, some users may be able to view payment details while others may view
    only the mark list of students.

    3. Data Integrity

    The DBMS can ensure that no more than one user can update the same record at the same time. It

    can keep duplicate records out of the database; for example, no two customers with the same

    customer number can be entered.
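The duplicate-record part of this feature can be sketched with sqlite3 as follows. The customer table and names are assumptions for illustration, and the sketch does not cover the control of simultaneous updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring customer_no as the primary key asks the DBMS to keep it unique.
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer VALUES (101, 'Asha')")

try:
    # A second customer with the same customer number must be rejected.
    conn.execute("INSERT INTO customer VALUES (101, 'Ravi')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, customer_count)
```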


    4. Interactive Query

    Most DBMSs provide query languages and report writers that let users interactively interrogate

    the database and analyze its data. This important feature gives users access to all management

    information as needed. For example, we can easily retrieve all the details of any student at any time.

    5. Interactive Data Entry and Updating

    Many DBMSs provide a way to interactively enter and edit data, allowing you to manage your own files

    and databases. However, an interactive operation does not leave an audit trail and does not provide the

    controls necessary in a large organization. These controls must be programmed into the data entry and

    update programs of the application.

    6. Data Independence

    With DBMSs, the details of the data structure are not stated in each application program. The program

    asks the DBMS for data by field name. Without a DBMS, the programmer must reserve space for the full

    structure of the record in the program. Any change in data structure requires changing all application

    programs.

    DBMS Components

    Data:

    Data stored in a database include numerical data which may be integers (whole numbers only) or floating

    point numbers (decimal), and non-numerical data such as characters (alphabetic and numeric characters),

    date or logical (true or false). More advanced systems may include more complicated data entities
    such as pictures and images as data types.

    Standard operations:

    Standard operations are provided by most DBMS. These operations provide the user basic capabilities for

    data manipulation. Examples of these standard operations are sorting, deleting and selecting records.

    Data definition language (DDL):

    DDL is the language used to describe the contents of the database. It is used to describe, for example,

    attribute names (field names), data types, location in the database, etc.

    Data manipulation and query language:

    Normally a query language is supported by a DBMS to form commands for input, edit, analysis, output,

    reformatting, etc. Some degree of standardization has been achieved with SQL (Structured Query

    Language).


    Programming tools:

    Besides commands and queries, the database should be accessible directly from application programs

    through function calls (subroutine calls) in conventional programming languages.

    File structures:

    Every DBMS has its own internal structures used to organize the data although some common data

    models are used by most DBMS.

    Abstraction

    Each application program has some data relevant to a particular task, and it often needs a portion of
    the data that is also used by other programs. In the early days of computerization, each application
    programmer designed the file structure, the metadata of the file and the access method for each
    record. That is, each application program used its own data, together with the details of the
    structure of the data and the methods to access and interpret them. The application programs were
    implemented independently, so any change in storage media required changes to these structures and
    access methods. Because the files were structured for one application, it was difficult to use the
    data in these files in new applications requiring data from several files belonging to different
    existing applications. E.g.: Consider two application programs that require the data on an entity set
    EMPLOYEE. The first application program involves the public relations department sending each
    employee a newsletter and related material. This application program is interested in the record type
    EMPLOYEE, containing the values for the attributes EMPL_Name and EMPL_Address.

    Architecture of DBMS

    The generalized architecture of a DBMS is called the ANSI/SPARC model. The architecture is divided
    into three levels: external level, conceptual level and internal level.

    A schema describes the view at each of these levels: it describes the records and their relationships
    in that view.

    a. External view or User view

    It is the highest level of data abstraction. It includes only those portions of the database that
    concern a user or application program. Each user has a different external view, which is described by
    means of a schema called the external schema. The external schema contains the definition of the
    logical records and relationships in the external view. It also contains the method of deriving the
    objects in the external view from the objects in the conceptual view.


    b. Conceptual view

    At this level of database abstraction, all the database entities and the relationships among them are
    included. One conceptual view represents the entire database; it is defined by the conceptual schema.
    The conceptual schema describes the method of deriving the objects in the conceptual view from the
    objects in the internal view. It also specifies the checks needed to retain data consistency and
    integrity.

    c. Internal view

    It is the lowest level of abstraction, closest to the physical storage method. It describes how the data is

    stored, what is the structure of data storage and the method of accessing these data. It is represented by

    internal schema.

    [Figure: the three levels. View level (View 1 to View 4): defined by user. Logical level: defined by
    DBA. Physical level: defined by DBA for optimization.]

    Data independence

    Data independence is the capacity to change the schema at one level of a database system without
    having to change the schema at the next higher level. The three-schema architecture can be used to
    achieve this data independence. Data independence is of two types:

    1. Logical data independence

    It is the capacity to change the conceptual schema without having to change the external schema.
    Sometimes we may need to change the conceptual schema to expand the database, to change the
    constraints, or to reduce the database. In a DBMS that supports logical data independence, only the
    view definitions and mappings need to be changed. The application programmer does not notice any
    change in the schema constructs of the DBMS.
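A rough sketch of logical data independence, assuming a SQL view plays the role of the external schema. The table names (student, student_info, result), the view mark_list and the sample row are hypothetical. The conceptual schema is reorganized from one table into two, only the view definition is changed, and the application query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original conceptual schema: a single student table.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT, marks REAL)"
)
conn.execute("INSERT INTO student VALUES (1, 'Anu', 'CSE', 82.5)")

# External view used by the application.
conn.execute("CREATE VIEW mark_list AS SELECT roll_no, name, marks FROM student")

APP_QUERY = "SELECT roll_no, name, marks FROM mark_list ORDER BY roll_no"
before = conn.execute(APP_QUERY).fetchall()

# Change the conceptual schema: split the data across two tables.
conn.execute("CREATE TABLE student_info (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.execute("CREATE TABLE result (roll_no INTEGER PRIMARY KEY, marks REAL)")
conn.execute("INSERT INTO student_info SELECT roll_no, name, branch FROM student")
conn.execute("INSERT INTO result SELECT roll_no, marks FROM student")
conn.execute("DROP VIEW mark_list")
conn.execute("DROP TABLE student")

# Only the view definition (the external/conceptual mapping) is rewritten.
conn.execute(
    "CREATE VIEW mark_list AS "
    "SELECT s.roll_no, s.name, r.marks "
    "FROM student_info AS s JOIN result AS r ON r.roll_no = s.roll_no"
)

after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```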


    2. Physical data independence

    Physical data independence is the capacity to change the internal schema without having to change the
    conceptual schema and external schema. The internal schema may be changed to improve the performance
    of retrieval or update. The conceptual schema need not change as long as the data remains the same.
    For e.g.: We need not change the query that retrieves a student progress report even though the DBMS
    adopts a new method to store the student records.
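One common internal-schema change is adding an index. The sketch below, with an assumed student table and index name, runs the same query before and after the index is created. The query text and its result do not change; only the access path chosen by the DBMS may differ, and the plan text printed depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT, marks REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Anu", "CSE", 82.5), (2, "Binu", "ECE", 67.0), (3, "Cini", "CSE", 91.0)],
)

QUERY = "SELECT name, marks FROM student WHERE branch = ? ORDER BY roll_no"


def plan(conn):
    """Return the textual query plan; the last column holds the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("CSE",)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


before = conn.execute(QUERY, ("CSE",)).fetchall()
plan_before = plan(conn)

# Internal-schema change: a new storage structure to speed up retrieval.
conn.execute("CREATE INDEX idx_student_branch ON student (branch)")

after = conn.execute(QUERY, ("CSE",)).fetchall()
plan_after = plan(conn)

print(before == after)
print(plan_before)
print(plan_after)
```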

    Advantages

    1. Controlling Redundancy

    In traditional file processing, every user group maintains its own files. Each group independently
    keeps files on the same data, e.g., students. Therefore, much of the data is stored twice or more.
    This redundancy leads to several problems:

    a. Duplication of effort and wasted storage space when the same data is stored repeatedly.

    b. Files that represent the same data may become inconsistent (since the updates are applied
    independently by each user group).

    2. Restricting Unauthorized Access

    A DBMS should provide a security and authorization subsystem. Some db users will not be authorized to
    access all information in the db (e.g., financial data). Some users are allowed only to retrieve
    data. Some users are allowed both to retrieve and to update the database.

    3. Providing Persistent Storage for Program Objects and Data Structures

    The data structures provided by the DBMS must be compatible with the programming language's data
    structures. E.g., object-oriented DBMSs are compatible with programming languages such as C++ and
    SMALLTALK, and the DBMS software automatically performs conversions between programming data
    structures and file formats.

    4. Permitting Inference and Actions Using Deduction Rules

    Deductive database systems provide capabilities for defining deduction rules to infer new
    information from the stored database facts.

    5. Providing Multiple User Interfaces

    (e.g., query languages, programming languages interfaces, forms, menu-driven interfaces, etc.)


    6. Representing Complex Relationships Between Data - The complex relationship between data is

    easily represented.

    7. Enforce Integrity Constraints - The integrity constraint for information is reasonably enforced by the

    database management system.

    DBMS Disadvantages

    A database system generally provides on-line access to the database for many users. In contrast, a

    conventional system is often designed to meet a specific need and therefore generally provides access to

    only a small number of users. Because of the larger number of users accessing the data when a database

    is used, the enterprise may face additional risks as compared to a conventional data processing system

    in the following areas.

    1. Confidentiality, Privacy and Security - When information is centralized and is made available to

    users from remote locations, the possibilities of abuse are often more than in a conventional system. To

    reduce the chances of unauthorized users accessing sensitive information, it is necessary to take

    technical, administrative and, possibly, legal measures. Most databases store valuable information that

    must be protected from deliberate attack and destruction.

    2. Data Quality - Since the database is accessible to users remotely, adequate controls are needed to

    control users updating data and to control data quality. With increased number of users accessing data

    directly, there are enormous opportunities for users to damage the data. Unless there are suitable

    controls, the data quality may be compromised.

    3. Data Integrity - Since a large number of users could be using a database concurrently, we must
    ensure that the data remain correct during operation. The main threat to data integrity comes from

    several different users attempting to update the same data at the same time. The database therefore

    needs to be protected against accidental changes by the users.

    4. Enterprise Vulnerability - Centralizing all data of an enterprise in one database may mean that the

    database becomes a critical resource. The survival of the enterprise may depend on reliable information

    being available from its database. The enterprise therefore becomes vulnerable to the destruction of the

    database or to unauthorized modification of the database.


    5. The Cost of using a DBMS - Conventional data processing systems are typically designed to run a

    number of well-defined preplanned processes. Such systems are often "tuned" to run efficiently for the

    processes that they were designed for. Although the conventional systems are usually fairly inflexible
    in that new applications may be difficult to implement and/or expensive to run, they are usually very

    efficient for the applications they are designed for. The database approach on the other hand provides a

    flexible alternative where new applications can be developed relatively inexpensively. The flexible

    approach is not without its costs and one of these costs is the additional cost of running applications that

    the conventional system was designed for. Using standardized software is almost always less machine

    efficient than specialized software.

    Entities and Attributes - Entities are distinguishable objects of concern and are modeled using their
    characteristics or attributes. A database usually contains a large number of similar entities. For
    e.g.: A company with a large number of employees may want to store similar information for each
    employee. Each of the employees can then be termed an entity. An entity can be an object with
    physical existence, e.g., a car, a person or an employee. Each entity has properties that describe
    it, called the attributes of that entity, and each entity has its own value for each attribute. A
    collection of entities with the same attributes is termed an entity type.

    For e.g.: Employee (Employee_id, Address, Designation, Salary)

    Here Employee is an entity type and Employee_id, Address, Designation and Salary are the attributes
    of Employee. There can be several types of attributes, such as simple versus composite, single-valued
    versus multi-valued and stored versus derived.

    1. Composite versus Simple - Composite attributes are those attributes that can be divided into
    smaller subparts with independent meaning. In the above e.g., the attribute Address can be divided
    into smaller subparts such as City, State and Street_address. Attributes that are not divisible are
    called simple or atomic attributes. The value of a composite attribute is the concatenation of the
    values of its constituent simple attributes.

    2. Single-valued versus multi-valued - Most attributes have only a single value for a particular
    entity. Such attributes are called single-valued. In some cases an attribute may have more than one
    value for a particular entity. These attributes are called multi-valued. The attribute age of an
    entity person will have only one value, while the college degree attribute of that person can have
    more than one value. So the attribute age can be considered single-valued and college degree
    multi-valued.

    3. Stored versus derived - In some cases attribute values are related so that one can be derived from
    the other. Consider a person as an entity. The attributes age and DateOfBirth of a person are
    related, i.e. the age of a person can be derived from the current date and his DateOfBirth. The age
    attribute is hence called a derived attribute, and DateOfBirth is called a stored attribute, from
    which the age of the person is calculated.
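The three distinctions above can be sketched with Python dataclasses. The class and field names are illustrative assumptions: Address is a composite attribute, college_degrees is multi-valued, date_of_birth is stored, and age is derived from it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    """Composite attribute: divisible into simple subparts."""
    street_address: str
    city: str
    state: str


@dataclass
class Person:
    name: str                       # simple, single-valued
    date_of_birth: date             # stored attribute
    address: Address                # composite attribute
    college_degrees: List[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth and the given date."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


person = Person(
    name="Asha",
    date_of_birth=date(2000, 6, 15),
    address=Address("12 MG Road", "Kochi", "Kerala"),
    college_degrees=["B.Tech", "M.Tech"],
)
print(person.age(date(2018, 4, 9)))
```

Because age is computed on demand, it never goes out of date, which is the usual reason for not storing a derived attribute.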

    Database Design - Database design is the process of organizing the data into related record types.

    The DBMS is chosen so that it can support the organization and can process the transaction for the

    organization.

    Data model - A data model is a plan for building a database. The model represents data conceptually, the

    way the user sees it, rather than how computers store it. Data models focus on required data elements and

    associations; most often they are expressed graphically using entity-relationship diagrams. On a more
    abstract level, the term is also used in describing a database's overall structure. The most widely
    used data models are:

    1. Entity-relational model

    2. Hierarchical model

    3. Network model

    4. Object-oriented model

    Hierarchical Model

    The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and
    child data segments. This structure implies that a record can have repeating information, generally
    in the child data segments. Each record in a series of records has a set of field values attached to
    it. The model collects all the instances of a specific record together as a record type. These record
    types are the equivalent of tables in the relational model, with the individual records being the
    equivalent of rows. To create links between these record types, the hierarchical model uses
    parent-child relationships. Hierarchical databases link records like an organization chart. A record
    type can be owned by only one owner. In the following example, orders are owned by only one customer.
    Hierarchical structures were widely used with early mainframe systems; however, they are often
    restrictive in linking real-world structures.

    [Figure: hierarchical example. Customer is the owner of Order.]

    Advantages:

    Hierarchical Model is simple to construct and operate on

    Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in

    manufacturing, personnel organization in companies

    Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN

    PARENT etc.

    Disadvantages:

    Navigational and procedural nature of processing

    Database is visualized as a linear arrangement of records

    Little scope for "query optimization"
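The single-owner rule and the navigational style of access can be sketched in Python as below. The Segment class and the function get_next_within_parent are hypothetical; the function name is only loosely modeled on the GET NEXT WITHIN PARENT construct mentioned above and is not the API of any real hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Segment:
    """One record (segment) in the tree: at most one parent, many children."""
    record_type: str
    key: str
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, child: "Segment") -> "Segment":
        # A child can be owned by only one parent in the hierarchical model.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent")
        child.parent = self
        self.children.append(child)
        return child


def get_next_within_parent(parent: Segment, record_type: str) -> Iterator[Segment]:
    """Yield the children of one record type under a parent, in stored order."""
    for child in parent.children:
        if child.record_type == record_type:
            yield child


customer = Segment("Customer", "C1")
order1 = customer.add_child(Segment("Order", "O1"))
customer.add_child(Segment("Order", "O2"))

order_keys = [seg.key for seg in get_next_within_parent(customer, "Order")]

# Attaching the same order to a second customer violates the single-owner rule.
try:
    Segment("Customer", "C2").add_child(order1)
    second_owner_allowed = True
except ValueError:
    second_owner_allowed = False
```

The application walks from parent to child record by record, which is the procedural processing listed among the disadvantages.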

    Network Model

    In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The
    basic data-modeling construct in the network model is the set construct. A set consists of an owner
    record type, a set name, and a member record type. A member record type can have that role in more
    than one set, hence the multiparent concept is supported. An owner record type can also be a member
    or owner in another set. In network databases, a record type can have multiple owners. In the example
    below, orders are owned by both customers and products, reflecting their natural relationship in
    business.

    [Figure: network example. Order is owned by both Customer and Product.]

    Advantages:

    Network Model is able to model complex relationships and represents semantics of add/delete on the

    relationships.

    Can handle most situations for modeling using record types and relationship types.

    Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within

    set, GET etc. Programmers can do optimal navigation through the database.

    Disadvantages:

    Navigational and procedural nature of processing

    Database contains a complex array of pointers that thread through a set of records.

    Little scope for automated "query optimization"
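A minimal sketch of the set construct, assuming hypothetical set names CUSTOMER_ORDER and PRODUCT_ORDER. The NetworkDB class and its methods are illustrative only and loosely echo the FIND owner and FIND member constructs; they are not a real CODASYL interface.

```python
class NetworkDB:
    """Toy store of named sets linking owner records to member records."""

    def __init__(self):
        self._members = {}  # (set_name, owner_key) -> list of member keys
        self._owner = {}    # (set_name, member_key) -> owner key

    def connect(self, set_name, owner_key, member_key):
        # Within one set, a member record has a single owner.
        if (set_name, member_key) in self._owner:
            raise ValueError("member already has an owner in this set")
        self._owner[(set_name, member_key)] = owner_key
        self._members.setdefault((set_name, owner_key), []).append(member_key)

    def find_owner(self, set_name, member_key):
        return self._owner[(set_name, member_key)]

    def find_members(self, set_name, owner_key):
        return list(self._members.get((set_name, owner_key), []))


db = NetworkDB()
# The Order record type is a member in two different sets, so each order
# has two owners: one customer and one product.
db.connect("CUSTOMER_ORDER", "C1", "O1")
db.connect("PRODUCT_ORDER", "P7", "O1")
db.connect("CUSTOMER_ORDER", "C1", "O2")
db.connect("PRODUCT_ORDER", "P9", "O2")

owners_of_o1 = (db.find_owner("CUSTOMER_ORDER", "O1"), db.find_owner("PRODUCT_ORDER", "O1"))
orders_of_c1 = db.find_members("CUSTOMER_ORDER", "C1")
```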

    Object-Oriented Model

    Object DBMSs add database functionality to object programming languages. They bring much more than
    persistent storage of programming language objects. Object DBMSs extend the semantics of the C++,
    Smalltalk and Java object programming languages to provide full-featured database programming
    capability, while retaining native language compatibility. A major benefit of this approach is the
    unification of the application and database development into a seamless data model and language
    environment. As a result, applications require less code, use more natural data modeling, and code
    bases are easier to maintain.

    are easier to maintain. Object developers can write complete database applications with a modest

    Objects

    Entity set

    An entity set is a set of entities of the same type that share the same properties, or attributes. It
    is represented by a set of attributes. An attribute, as used in the E-R model, can be characterized
    by the following attribute types:

    Simple and composite attributes

    Single and multi-valued attributes

    Null attributes

    Derived attributes

    A relationship is an association among several entities. And a relationship set is a set of relationships of

    the same type.

    Keys

    Before designing a database we should be able to specify how entities within a given entity set and

    relationships within a given relationship set are distinguished. Conceptually the individual entities and

    relationships are distinct; but from a database perspective, the difference must be expressed by their

    attributes. The concept of key is used to make such distinctions.

    A superkey is a set of one or more attributes that, taken collectively, uniquely identify an entity
    in the entity set.

    For e.g.: the social_security_no attribute of the entity set employee is sufficient to distinguish one

    employee entity from another. Thus social_security_no is a superkey for the entity set employee.


    A superkey for which no proper subset is also a superkey is known as a candidate key. For e.g.: the
    attributes employ_id and employ_name can be combined to form a superkey. But social_security_no
    alone is sufficient to distinguish any two employees. Thus social_security_no is a candidate key. The
    term primary key denotes the candidate key that is chosen by the database designer to identify an
    entity in an entity set.

    A key (super, candidate and primary) is a property of the entity set rather than the individual entities.
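The superkey and candidate key definitions can be checked against sample data with a short Python sketch. The function names and the employee rows are assumptions. Since a key is a property of the entity set as a whole, a check on sample rows can only show that a set of attributes is not a key; it cannot prove that it is one for all possible data.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper non-empty subset is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


employees = [
    {"social_security_no": "111", "employ_id": 1, "employ_name": "Asha"},
    {"social_security_no": "222", "employ_id": 2, "employ_name": "Asha"},
    {"social_security_no": "333", "employ_id": 3, "employ_name": "Ravi"},
]

name_is_superkey = is_superkey(employees, ["employ_name"])
pair_is_superkey = is_superkey(employees, ["employ_id", "employ_name"])
pair_is_candidate = is_candidate_key(employees, ["employ_id", "employ_name"])
ssn_is_candidate = is_candidate_key(employees, ["social_security_no"])
```

In this sample the pair (employ_id, employ_name) is a superkey but not a candidate key, because employ_id alone already distinguishes the rows.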

    Entity- Relationship (E-R) Diagram

    The overall logical structure of a database can be expressed graphically by an E-R diagram. The diagram

    consists of the following major components.

    Rectangles: represent entity sets.

    Ellipses: represent attributes.

    Diamonds: represent relationship sets.

    Lines: link attributes to entity sets and entity sets to relationship sets.

    Double ellipses: represent multi-valued attributes.

    Dashed ellipses: denote derived attributes.

    For e.g.: Consider an E-R diagram that consists of two entity sets, customer and loan.

    [Figure: E-R diagram. Recovered labels: Employee, Emp_id, Addr, Designation, Salary, Works For,
    Company, Location, Product.]

    Entity relational model (RDBMS - relational database management system)

    An RDBMS is a database system based on the relational model developed by E.F. Codd. A relational
    database allows the definition of data structures, storage and retrieval operations and integrity
    constraints. In such a database the data and relations between them are organized in tables. A table
    is a collection of records and each record in a table contains the same fields.

    It permits the database designer to create a consistent, logical representation of

    information. Consistency is achieved by including declared constraints in the database design, which is

    usually referred to as the logical schema. The theory includes a process of database normalization

    whereby a design with certain desirable properties can be selected from a set of logically equivalent

    alternatives. The access plans and other implementation and operation details are handled by the DBMS

    engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs

    in which performance tuning often requires changes to the logical model.

    The basic relational building block is the domain or data type, usually abbreviated

    nowadays to type. A tuple is an unordered set of attribute values. An attribute is an ordered pair of

    attribute name and type name. An attribute value is a specific valid value for the type of the attribute.

    This can be either a scalar value or a more complex type. Relational databases do not link records

    together physically, but the design of the records must provide a common field, such as account number,

    to allow for matching. Often, the fields used for matching are indexed in order to speed up the process.

    In the following example, customers, orders and products are linked by comparing

    data fields and/or indexes when information from more than one record type is needed. This method is

    more flexible for ad hoc inquiries. Many hierarchical and network DBMSs also provide this capability.
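Matching on common fields can be sketched with sqlite3 as follows. The tables customer, product and orders, their columns and the sample rows are all hypothetical. The records are not linked physically; the join compares customer_no and product_no values, and the index on orders.customer_no is the kind of index used to speed up such matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product  (product_no  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER NOT NULL REFERENCES customer (customer_no),
        product_no  INTEGER NOT NULL REFERENCES product (product_no)
    );
    -- Index on a field used for matching.
    CREATE INDEX idx_orders_customer ON orders (customer_no);

    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO product  VALUES (10, 'Keyboard'), (20, 'Monitor');
    INSERT INTO orders   VALUES (100, 1, 20), (101, 2, 10), (102, 1, 10);
    """
)

# Ad hoc inquiry: which customer ordered which product?
rows = conn.execute(
    """
    SELECT c.name, o.order_no, p.title
    FROM orders AS o
    JOIN customer AS c ON c.customer_no = o.customer_no
    JOIN product  AS p ON p.product_no  = o.product_no
    ORDER BY o.order_no
    """
).fetchall()
print(rows)
```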

    [Figure: Relational model. Customer, Order and Product record types.]