DBMS

Definition of Database

Data are facts that can be recorded and have implicit meaning. Data refers to values
such as names, telephone numbers, and addresses that can easily be stored in a diary, on a
PC, or on a floppy disk. Data is what is actually stored in the database; information refers
to the meaning of that data as understood by the user.

A database is a collection of related data. A database has the following implicit
properties.

i. A database represents some aspect of the real world, sometimes called the miniworld or

the Universe of Discourse (UoD). Changes to the miniworld are reflected in the

database.

ii. A database is a logically coherent collection of data with some inherent meaning.

iii. A database is designed, built, and populated with data for a specific purpose. It has an

intended group of users and some applications.

A database can be of any size. Examples of sources of databases are patient records in a
hospital and records in a bank, university, or government department.

Definition of DBMS

DBMS stands for Database Management System. It is a collection of programs that enables
users to create and maintain a database and to store, modify, and extract information from
it. A DBMS is software for defining, constructing, and manipulating databases. It is also
called a database manager or database server. Examples of DBMSs are MS Access, Oracle,
MySQL, and MS SQL Server.

Thus the goal of a DBMS is to provide an environment that is both convenient and
efficient to use in retrieving and storing database information. A user issues a request
for information, the DBMS analyzes the request and processes it internally, and the
result is then sent back to the user.

Definition of Database System

A database system is a computerized record-keeping system, e.g. a computerized library
system, a flight reservation system, or an automated teller machine. The database and the
DBMS software are collectively known as a database system.

The following operations take place in a database system.

i. Adding new, empty files to the database.

ii. Inserting, retrieving, updating, and deleting data in the existing database.

iii. Removing existing files from the database.
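The three kinds of operation listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name book and its columns are assumptions made for the example, and a table stands in for a "file".

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# i. Add a new, empty table (the "file") to the database.
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")

# ii. Insert, update, delete and retrieve data.
conn.execute("INSERT INTO book (id, title, copies) VALUES (1, 'Database Basics', 3)")
conn.execute("INSERT INTO book (id, title, copies) VALUES (2, 'File Systems', 1)")
conn.execute("UPDATE book SET copies = copies + 1 WHERE id = 1")
conn.execute("DELETE FROM book WHERE id = 2")
rows = conn.execute("SELECT id, title, copies FROM book ORDER BY id").fetchall()

# iii. Remove the existing table from the database.
conn.execute("DROP TABLE book")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

After the script runs, rows holds the single surviving record and remaining is empty because the table has been dropped.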

Fig. Simplified picture of a database system (components shown: end users, application
programs, DBMS, database)

Fig. A simplified database system environment (users/programmers submit application
programs/queries; the DBMS consists of software to process queries/programs and software
to access stored data; the stored data consists of the stored database definition and the
stored database)

The advantages of a database system over paper-based methods of record keeping are (i)
compactness, (ii) speed, and (iii) accuracy.

Characteristics of Database Approach

A number of characteristics distinguish the database approach from the traditional
approach of programming with files. In the traditional approach, many users may each keep
the same data, such as student names, in separate files. Thus data is duplicated, which
wastes storage space.

The main characteristics of the database approach versus the file-processing approach are
as follows.

i) Self describing nature of a database system

The definition or description of the database is stored separately in the system catalog
and is thus available to users. The system catalog stores only the structure and details of
the database, not the data itself. Thus the system catalog inside the DBMS describes the
database itself.
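As a small illustration of this self-describing property, the sketch below asks SQLite for its own catalog. SQLite is used only because it ships with Python; the table student and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, class_name TEXT, major TEXT)")

# sqlite_master is SQLite's catalog table: it holds definitions, not student rows.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'student'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk). Index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
```

The program learns the structure of the table from the DBMS at run time instead of having it hard-coded, which is the point of keeping the description in the catalog.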

ii) Insulation between programs and data, and data abstraction

In traditional file processing, a change to the structure of a data file may require
changing all programs that access the file, whereas in a DBMS only the catalog information
changes. The programs and the data are thus independent of each other; this is called
program-data independence.

Data abstraction: A DBMS provides users with a conceptual representation of data that
omits many details of how the data is stored or how the operations are implemented.
Consider a car. People don't think of a car as a set of tens of thousands of individual
parts; they think of it as a well-defined object with its own behavior. Similarly, data
abstraction hides complexity. A data model is a type of data abstraction.

iii) Support of multiple views of data

A database typically has many users, each of whom may require a different perspective
or view of the database. A view may be a portion or subset of the database, or it may
contain virtual data that is derived from the database but not explicitly stored; for this
reason a view is also called a virtual table. For security reasons, users should not be
given access to the whole database. Some users may not need to be aware of whether the
data they refer to is stored or derived. A multi-user DBMS must support multiple views of
the data.
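A minimal sketch of two views over one table, using SQLite through Python. The employee table, the view names, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, job TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ram", "Manager", 5000), ("Shyam", "Operator", 3000)],
)

# A view for users who may see names and jobs but not salaries (a subset).
conn.execute("CREATE VIEW employee_public AS SELECT name, job FROM employee")

# A view holding derived (virtual) data: the total is computed, not stored.
conn.execute("CREATE VIEW payroll_total AS SELECT SUM(salary) AS total FROM employee")

public_rows = conn.execute("SELECT * FROM employee_public ORDER BY name").fetchall()
total = conn.execute("SELECT total FROM payroll_total").fetchone()[0]
```

A user querying employee_public cannot tell that a salary column exists, and a user querying payroll_total cannot tell whether the total is stored or derived.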

iv) Sharing of data and multi-user transaction processing

Many users can select and update data at the same time, so a DBMS must support concurrency
control. For example, in applications such as train/bus or flight reservation systems, many
users use the system from different locations at the same time; this requires sharing of
data and multi-user transaction processing.
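The reservation example can be hinted at with a small sketch. It does not show real concurrency control (that is the DBMS's job and involves locking or similar mechanisms); it only shows one way a seat can be given to at most one passenger by making the check and the update a single statement inside a transaction. The seat table and the reserve function are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.execute("INSERT INTO seat VALUES ('12A', NULL)")
conn.commit()

def reserve(conn, seat_no, passenger):
    """Book the seat only if it is still free; return True on success."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE seat SET passenger = ? WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no),
        )
        return cur.rowcount == 1

first = reserve(conn, "12A", "Ram")    # the seat is free, so this succeeds
second = reserve(conn, "12A", "Sita")  # the seat is taken, so nothing is updated
holder = conn.execute(
    "SELECT passenger FROM seat WHERE seat_no = '12A'"
).fetchone()[0]
```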

Advantages and benefits using DBMS

Advantages of using DBMS are as follows

i) Controlling redundancy

In a traditional file-processing system, each user maintains their own files, so data
may be duplicated. Storing the same data multiple times leads to several problems: wasted
space, duplicated effort in entering data, and data that may become inconsistent. With a
DBMS, the data can be stored once and shared, which keeps this redundancy under control.

ii) Restricting unauthorized access (security)

Confidential data should not be available to all users. User accounts with certain
restrictions on data may be created for security. Similarly, multiple views can be created
for database security. In traditional file processing, anyone who obtains a file gets all
the data in it.

iii) Providing persistent storage for program objects and data structure

The values of program variables are discarded once the program terminates, as in C,
C++, or Pascal programs, unless the programmer writes them to files. A complex object in
C++ can be stored permanently in an object-oriented DBMS.

iv) Permitting inferencing and actions using rules

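Database systems may be deductive or active. Deductive databases have capabilities for
defining deduction rules for inferring new information from the stored database; in this
respect they work like a reporting system. Active databases can automatically initiate
actions when certain events and conditions occur.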

v) Providing multiple user interfaces

A DBMS provides a variety of interfaces for different kinds of users: query languages
for casual users, programming language interfaces for application programmers, forms and
command codes for parametric users, and menu-driven interfaces for stand-alone users.
Forms-style and menu-driven interfaces are collectively called GUIs (graphical user
interfaces).

vi) Representing complex relationship among data

Relationships may be created among data using DBMS which helps in managing the

data and defining constraints for updating and deleting.

vii) Enforcing integrity constraints

A rule that restricts the values data may take is called a constraint. For example, a
bank may require that an account balance never fall below a minimum amount; this is a
constraint. Some common constraints are PRIMARY KEY, NOT NULL, and CHECK.
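The three constraints named above can be tried out with SQLite through Python. This is a sketch under assumptions: the account table, the helper try_insert, and the minimum balance of 500 are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,               -- no two accounts share a number
        holder     TEXT NOT NULL,                  -- every account needs a holder
        balance    INTEGER CHECK (balance >= 500)  -- assumed minimum balance
    )
    """
)
conn.execute("INSERT INTO account VALUES ('A-1', 'Ram', 700)")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("A-1", "Sita", 900)),  # duplicate primary key
    try_insert(("A-2", None, 900)),    # missing holder
    try_insert(("A-3", "Hari", 100)),  # below the minimum balance
    try_insert(("A-4", "Gita", 500)),  # satisfies every constraint
]
```

Only the last row is accepted. The rules are enforced by the DBMS itself, so no application program can bypass them by mistake.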

viii) Providing backup and recovery

DBMS provides facilities for taking backup of the database which can be used for

recovery in case of failure of computer system or hardware system.

ix) Easy in accessing data

Accessing data in a database through a DBMS is easy and fast. Reports can be used for
easy access to data.

x) Concurrent access to database

Many users can share the data at the same time; the DBMS thus allows users to access
the database concurrently.

Database system concepts and architecture

Data Model

A data model is a collection of tools for describing data, data relationships, and
consistency constraints. It is used to describe the structure of a database and the basic
operations for specifying retrievals and updates on the database.

A data model is a type of data abstraction. The three levels of data abstraction are as
follows.

i) Physical level: Also called the internal or low-level data model. It describes how
data is actually stored in the database.

ii) Logical level: The next higher level, which describes what data are stored and the
relationships among them.

iii) View level: The highest level, which describes multiple views of the data.

Many data models have been proposed.

Categories of data model:

1) Object based logical models

2) Record based logical models

3) Physical models

1. Object based logical models

Object-based logical models are used in describing data at the logical and view levels.
There are many different models. Some of them are:

i) Entity-relationship model

ii) Object-oriented model

iii) Semantic data model

iv) Functional data model

i) Entity-relationship model:

The entity relationship (ER) data model is based on a perception of a real world that

consists of a collection of basic objects, called entities, and of relationships among these

objects.

Fig. A sample E-R diagram (entity set Customer with attributes Customer Name and Address;
relationship Depositor)

ii) Object oriented model

The object oriented model is based on a collection of objects. An object contains values

stored in instance variables within the object. An object also contains bodies of code that

operate on the object. These bodies of code are called methods. Objects that contain the same

types of values and the same methods are grouped together into classes.

iii) Semantic data models

Semantic data models are similar to E-R modeling and are also called object modeling.
They also support entities, which have properties and relationships.

iv) Functional data model

It is based on functions instead of relations. The functional approach shares certain
ideas with the object approach. It addresses objects that are functionally related to
other objects.

2. Record based logical models

Record based logical models are used in describing data at the logical and view levels.
They are used to specify the overall logical structure of the database.

Record based models are so named because the database is structured in fixed-format
records of several types. Each record type defines a fixed number of fields, or attributes,
and each field is usually of a fixed length. The three most widely accepted record based
data models are the relational, network and hierarchical models.

i) Relational model

The relational model uses a collection of tables to represent both data and the

relationships among those data. Each table has multiple columns, and each column has a

unique name.

Customer Name   Address    Account No.        Account No.   Balance
Ram             KTM        A-1                A-1           500
Laxman          Lalitpur   A-2                A-2           700
Bharat          Jhapa      A-3                A-3           900

Fig. A sample relational database
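The sample relational database can be loaded into SQLite to show how the shared Account No. column relates the two tables. The table and column names (customer, account, customer_name, account_no) are adaptations chosen for this sketch; the rows are those of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, address TEXT, account_no TEXT)")
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Ram", "KTM", "A-1"), ("Laxman", "Lalitpur", "A-2"), ("Bharat", "Jhapa", "A-3")],
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-1", 500), ("A-2", 700), ("A-3", 900)],
)

# The relationship is not stored as a pointer; it is recovered by matching
# the account_no values that appear in both tables.
balances = conn.execute(
    """
    SELECT c.customer_name, a.balance
    FROM customer AS c
    JOIN account AS a ON a.account_no = c.account_no
    ORDER BY a.account_no
    """
).fetchall()
```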

ii) Network model

Data in the network model are represented by collections of records and relationships

among data are represented by links, which can be viewed as pointers.

Ram KTM 0001 A-1 500

Laxman Lalitpur 0002 A-2 900

Bharat Janakpur 0003 A-3 6000

Fig. A sample Network database

Customer: CustomerName, CustomerStreet, CustomerCity
Account: Account No, Balance
Link between Customer and Account: deposit

Fig. Data structure diagram for network data model

iii) Hierarchical model

It is similar to the network model in the sense that data and relationships among data

are represented by records and links, respectively. Records are organized as collections of

trees rather than arbitrary graphs.

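Customer records: (Ram, KTM, …), (Laxman, Jhapa, …), (Bharat, KTM, …), (Sita, KTM, …)
Account records: (A-1, 700), (A-2, 500), (A-3, 900), (A-4, 700), (A-5, 500), (A-6, 600)

Fig. A sample hierarchical database (customer records with their account records beneath
them)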

Customer: CustomerName, CustomerStreet, CustomerCity
Account: Account No., Balance

Fig. Tree structure diagram for hierarchical model

3. Physical models

Physical data models are used to describe data at the lowest level. Only a few physical
data models are in use. Two widely known ones are the unifying model and the frame-memory
model.

Schemas and instances

Database=description of database + database itself

The overall design of the database is called the database schema. The database schema is
specified during database design and is not expected to change frequently.

Student: Name | Class | Major
Course: Course Name | Duration | Remarks

Fig. Schema Diagram

A database changes over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the
database. It is also called the database state, snapshot, or current set of occurrences.
When a database is first defined, it is in the empty state, with no data. It reaches its
initial state when it is first loaded with data. Thus at any point in time the database has
a current state. A change to the schema itself, such as adding a field, is called schema
evolution.
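The difference between schema, state, and schema evolution can be made concrete with a short sketch. It uses SQLite through Python; the student table, the helper functions schema and state, and the added email column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, class_name TEXT, major TEXT)")

def schema(conn):
    """Column names of student: the description of the data."""
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]

def state(conn):
    """All rows of student: the data held at this moment."""
    return conn.execute("SELECT * FROM student ORDER BY name").fetchall()

empty_state = state(conn)                      # defined but not yet loaded
conn.execute("INSERT INTO student VALUES ('Ram', 'BCA-1', 'Computing')")
initial_state = state(conn)                    # after the first load
schema_before = schema(conn)

# Schema evolution: the description changes, existing rows are kept.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
schema_after = schema(conn)
state_after = state(conn)
```

Inserting a row changed the state but not the schema; adding the column changed the schema, and the existing row simply has no value (None) for the new field.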

DBMS Architecture

The main characteristics of the database approach are (i) insulation of programs and
data, (ii) support of multiple user views, and (iii) use of a catalog to store the database
description. The three-schema architecture was proposed to help achieve and visualize
these characteristics. It is also called the ANSI/SPARC (American National Standards
Institute / Standards Planning and Requirements Committee) architecture.

The goal of the architecture is to separate the user applications from the physical
database. In this architecture, schemas can be defined at the following three levels.

i) The internal level

It has an internal schema which describes the physical structure of the database. It

describes the complete details of data storage and access paths for the database.

ii) The conceptual level

It has a conceptual schema, which describes the structure of the whole database for a

community of users. It hides the details of physical storage structures and concentrates on

describing entities, data types, relationships, user operations and constraints.

iii) The external or view level

It includes a number of external schemas or user views. Each external schema describes

the part of the database that a particular user group is interested in and hides the rest of the

database from that user group.

External Level:     USER                  USER
                    External view   ...   External view
                        (External/conceptual mapping)
Conceptual Level:   Conceptual schema
                        (Conceptual/internal mapping)
Internal Level:     Internal schema
                    Stored database

Fig. Three schema Architecture

The three-schema architecture is a tool for the user to visualize the schema levels in a
database system. Most DBMSs do not separate the three levels completely. The data actually
exists only at the physical level. Each user group refers only to its own external schema,
so the DBMS must transform a request from a user into a request against the conceptual
schema and then into a request on the internal schema for processing over the stored
database. If the request is a database retrieval, the data extracted from the stored
database must be reformatted to match the user's external view. The processes of
transforming requests and results between levels are called mappings.

DATA INDEPENDENCE

Data independence is defined as the capacity of DBMS to change the schema at one

level of a database system without having to change the schema at the next higher level. We

can define two types of data independence.

i) Logical data independence

Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the conceptual
schema to expand the database (by adding a record type or data item) or to reduce the
database (by removing a record type or data item). This results in a change to the E-R
diagram, but the application programs and external schemas are not changed.

ii) Physical data independence

Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal schema may
be needed because some physical files had to be reorganized, for example by creating
additional access structures to improve the performance of retrieval or update. If the same
data as before remains in the database, we should not have to change the conceptual schema.

In a multiple-level DBMS, the catalog must be expanded to include information on how to
map requests and data among the various levels. With data independence, when the schema is
changed at some level, the schema at the next higher level remains unchanged; only the
mappings change.
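Physical data independence can be observed in a small experiment: an access structure (an index) is added at the internal level, while the query text and its result stay the same. The sketch uses SQLite through Python; the account table and the index name idx_account_no are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-1", 500), ("A-2", 700), ("A-3", 900)],
)

# The application's query, written once against the logical structure.
QUERY = "SELECT balance FROM account WHERE account_no = 'A-2'"
before = conn.execute(QUERY).fetchall()

# Internal-level change: create an additional access structure.
conn.execute("CREATE INDEX idx_account_no ON account (account_no)")

# The same, unmodified query still works and returns the same answer.
after = conn.execute(QUERY).fetchall()
```

Only the way the DBMS finds the row may have changed; the table definition and the program that issues QUERY did not.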

DATABASE LANGUAGES

A database system provides mainly two different types of languages: one to specify the

database schema, called data definition language and the other to express database queries

and updates called data manipulation language.

i) Data definition language (DDL):

A database schema is specified by a set of definitions expressed in a special language
called a data definition language. The result of compiling DDL statements is a set of
tables that is stored in a special file called the data dictionary or data directory. A
data dictionary is a file that contains metadata, that is, data about data. A DBMS has a
DDL compiler, which processes DDL statements. Where a clear separation is maintained
between the levels, the DDL is used to specify the conceptual schema only. Similarly, the
SDL (storage definition language) is used to specify the internal schema, and the VDL (view
definition language) is used to specify user views and their mappings to the conceptual
schema.

ii) Data manipulation language (DML)

A data manipulation language is a language that enables users to access or manipulate

data as organized by the appropriate data model.

Data manipulation consists of

The retrieval of information stored in the database

The insertion of new information into the database

The deletion of information from the database

The modification/update of information stored in the database

DML is of the following 2 types

a) Non-procedural DMLs

The language requires a user to specify what data are needed without specifying how to

get those data. It is easier to learn and use. Many DBMS allow it either to be entered

interactively from a terminal or to be embedded in a general purpose programming language.

For example SQL (structured query language). SQL can retrieve many records in a single

DML statement and hence it is also called set at a time or set oriented language. It is also

called high level language.

b) Procedural (Low Level) DMLs

The language requires a user to specify what data are needed and how to get those data.

It is embedded in a general purpose programming language. This type of DML retrieves

records one by one and processes each record separately using programming language

construct such as looping and hence it is also called record at a time DML.

When a DML is embedded in a general-purpose programming language, that language is
called the host language and the DML is called the data sublanguage.
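The contrast between the two styles can be sketched with Python as the host language and SQL as the data sublanguage. The loop below only imitates the record-at-a-time style (records are still fetched through SQL), and the account table with its rows is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-1", 500), ("A-2", 700), ("A-3", 900)],
)

# Non-procedural, set-at-a-time: state WHAT is wanted in a single statement.
set_result = [
    row[0]
    for row in conn.execute(
        "SELECT account_no FROM account WHERE balance > 600 ORDER BY account_no"
    )
]

# Record-at-a-time style: the host language spells out HOW, visiting each
# record in a loop and testing it separately.
loop_result = []
for account_no, balance in conn.execute(
    "SELECT account_no, balance FROM account ORDER BY account_no"
):
    if balance > 600:
        loop_result.append(account_no)
```

Both approaches select the same accounts; the difference lies in who does the searching, the DBMS or the program.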

DBMS Interfaces

DBMS provides the following user-friendly interfaces.

i) Menu based interfaces for browsing

These interfaces present the user with lists of options, called menus, that lead the user
through the formulation of a request. The query is composed step by step by picking options
from a menu that is displayed by the system. Pull-down menus are a popular technique in
window-based user interfaces.

ii) Forms based interfaces

A forms-based interface displays a form to each user. Users can fill out all of the form
entries to insert new data, or they can fill out only certain entries, in which case the
DBMS retrieves matching data for the remaining entries. Forms are usually designed and
programmed for naïve users as interfaces to canned transactions.

iii) Graphical user interface

A graphical interface (GUI) typically displays a schema to the user in diagrammatic

form. The user can then specify a query by manipulating the diagram. GUIs utilize both

menus and forms.

iv) Natural language interfaces

These interfaces accept requests written in English or some other language and attempt

to understand them. The natural language interface usually has its own schema, which is

similar to database conceptual schema. The natural language interface refers to words in its

schema to interpret the request. If the interpretation is successful, the interface generates a

high level query corresponding to the natural language request and submits it to the DBMS

for processing.

v) Interfaces for parametric users

Parametric users such as bank tellers, often have a small set of operations that they

must perform repeatedly. System analysts and programmers design and implement a special

interface for a known class of naïve users. For example, function keys in a terminal can be

programmed to initiate the various commands. This allows the parametric user to proceed

with a minimal number of keystrokes.

vi) Interfaces for DBA

Most database systems contain privileged commands that can be used only by the
DBA's staff. These include commands for creating accounts, setting system parameters,
granting account authorization, changing a schema, and reorganizing the storage structure
of a database.

Classification of Database Management System

DBMS is classified on the basis of data model, number of users, number of sites, cost

and types of access path.

On the basis of data model, DBMS is classified into

i. Relational data model

ii. Object data model

iii. Hierarchical data model

iv. Network data model

On the basis of numbers of users, DBMS is classified into

i. Single user system – supports only one user at a time

ii. Multi-user system – supports multiple users concurrently

On the basis of numbers of sites, DBMS is classified into

i. Centralized – if the data is stored at a single computer site.

ii. Distributed – database and DBMS software distributed over many sites, connected by a

computer network.

iii. Homogeneous – A distributed DBMS that uses the same DBMS software at multiple sites.

iv. Heterogeneous – A distributed DBMS in which the participating DBMSs are loosely
coupled and have a degree of local autonomy.

Many DBMSs use a client-server architecture.

On the basis of cost, DBMS is classified into

i. DBMS packages between $10,000 and $100,000

ii. DBMS packages costing more than $100,000

On the basis of types of access path, DBMS is classified into

i. General purpose – Designed for general-purpose use.

ii. Special purpose – Designed and built for a specific application such as airline
reservations or a telephone directory system. Such a DBMS can't be used for other
applications without major changes.

Data Dictionary

The data dictionary can be regarded as a system database that contains data about
data, also called metadata. It contains definitions of other objects in the system instead
of raw data. It also stores schemas, mapping details, and various security and integrity
constraints. It is also called the data directory, system catalog, or simply catalog or
data repository.

TABLE
TABNAME   COLCOUNT   ROWCOUNT
DEPT      2          2
EMP       3          3

COLUMN
TABNAME   COLNAME
DEPT      Dept No.
DEPT      Dept Name
EMP       Emp Name
EMP       Emp No.
EMP       Emp Telephone No.

Fig. Catalog for department and Employee database

DATA DICTIONARY

Human interfaces: Application Programmers, End Users, Database Administrators

Software and DBMS interfaces: Security and Authority subsystem, Compilers / Precompilers,
Application Programs / Report generators, Integrity constraint Enforcer, Query optimizer

Fig. Human & Software interfaces to a data dictionary

The data dictionary is accessed by various software modules of the DBMS itself, such as
the DDL/DML compilers, the query optimizer, and the constraint enforcer. If the data
dictionary is used only by designers, users, and administrators, and not by the DBMS
software, it is called a passive data dictionary; otherwise it is called an active data
dictionary.

E-R MODEL

E-R model stands for entity-relationship model, a popular high-level conceptual data
model. The E-R model describes data as entities, relationships, and attributes. It is based
on a perception of a real world that consists of a set of basic objects called entities,
and of relationships among these objects.

Entity types and entity sets: An entity is a thing or object in the real world that is

distinguishable from all other objects. For example, each person in an enterprise is an entity.

An entity has a set of properties, which may uniquely identify an entity. An entity may be

concrete, such as a person or a book, or it may be abstract such as loan or a holiday or a

concept.

An entity set is a set of entities of the same type that share the same properties or

attributes. The set of all persons who are customers at a given bank for example can be

defined as the entity set customer. Similarly, the entity set loan might represent set of all

loans awarded by a particular bank.

Attributes: Attributes are descriptive properties possessed by each member of an entity

set. Each entity has attributes. For example an employee entity may be described by the

employee's name, age, address, salary and job. Possible attributes of the loan entity set are

loan number and amount. For each attribute, there is a set of permitted values, called the

domain or value set.

Customer                                             Loan
Employee Name   Address   Age   Salary   Job         Loan No.   Loan Amount
Ram             KTM       15    5000     Manager     L-1        5000
Shyam           KTM       20    3000     Operator    L-2        3000
Sita            BRT       17    4000     CEO         L-3        10,000

Fig.: Entity sets customer and loan

An attribute, as used in the E-R model, can be characterized by the following attribute

types.

i) Simple (atomic) and composite attributes: Attributes that are not divisible are called
simple or atomic attributes; age, as shown in the figure, is a simple attribute. Composite
attributes can be divided into smaller subparts, which represent more basic attributes
with independent meaning. For example, employee name could be structured as a composite
attribute consisting of first name, middle name and last name.

ii) Single valued and multivalued attributes: Single-valued attributes have a single value for a

particular entity. For example, the loan number attribute for a specific loan entity refers

to only one loan number and so it is single valued. Consider the employee entity set

with the attribute dependent name. Any particular employee may have zero, one or

more dependents. So, different employee entities within the entity set will have

different numbers of values for the dependent name attribute and this type of attribute is

said to be multivalued.

iii) Null (missing): A null value is used when an entity does not have a value for an
attribute. Null can mean that the value is unknown or that it is not applicable. For
example, if a particular employee has no dependents, the dependent name value for that
employee will be null.

iv) Derived attribute: The value for this type of attribute can be derived from the values
of other related attributes or entities. For instance, suppose the customer entity has an
attribute loans-held, which represents how many loans a customer has from the bank. We can
derive the value for this attribute by counting the number of loan entities associated
with that customer.
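The loans-held example can be sketched as a value that is computed on demand instead of being stored. The loan table, the function loans_held, and the assignment of loans to customers are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan (loan_no TEXT PRIMARY KEY, customer_name TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [("L-1", "Ram", 5000), ("L-2", "Ram", 3000), ("L-3", "Sita", 10000)],
)

def loans_held(conn, customer_name):
    """Derive loans-held by counting the loan rows linked to the customer."""
    return conn.execute(
        "SELECT COUNT(*) FROM loan WHERE customer_name = ?", (customer_name,)
    ).fetchone()[0]

ram_loans = loans_held(conn, "Ram")
shyam_loans = loans_held(conn, "Shyam")  # a customer with no loans
```

Because the count is derived each time it is needed, it cannot drift out of step with the loan entities it summarizes.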

v) Complex attributes: Composite and multivalued attributes can be nested in an arbitrary
way. We can represent arbitrary nesting by grouping the components of a composite
attribute between parentheses ( ) and separating the components with commas, and by
displaying multivalued attributes between braces { }. Such attributes are called
complex attributes. For example, if a person can have more than one residence and each
residence can have multiple phones, an attribute AddressPhone for a person entity
type can be specified as below:

{AddressPhone( {Phone(AreaCode, PhoneNumber)},
Address(StreetAddress(Number, Street, ApartmentNumber), City, State, Zip) )}
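One way to picture this nesting is with Python data classes, where a composite attribute becomes a class and a multivalued attribute becomes a list. This is only an analogy to the notation, not how a DBMS stores such attributes; the class names follow the attribute names, and the sample person is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phone:                      # composite: (AreaCode, PhoneNumber)
    area_code: str
    phone_number: str

@dataclass
class StreetAddress:              # composite nested inside Address
    number: str
    street: str
    apartment_number: str

@dataclass
class Address:                    # composite: (StreetAddress, City, State, Zip)
    street_address: StreetAddress
    city: str
    state: str
    zip_code: str

@dataclass
class AddressPhone:               # one residence together with its phones
    phones: List[Phone]           # multivalued: a residence may have many phones
    address: Address

@dataclass
class Person:
    name: str
    # multivalued: a person may have several residences
    address_phones: List[AddressPhone] = field(default_factory=list)

# Hypothetical sample data for illustration only.
home = AddressPhone(
    phones=[Phone("01", "555-0101"), Phone("01", "555-0102")],
    address=Address(StreetAddress("12", "Main Road", "3B"), "KTM", "Bagmati", "44600"),
)
person = Person("Ram", [home])
```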

Key attributes of an entity type:

An entity type defines a collection of entities that have the same attributes. The

collection of all entities of a particular entity type in the database at any point in time is called

an entity set.

Entity Type Name:   Employee                      Company
Attributes:         EmpID, Name, Age, Salary      Name, Headquarters, President

Entity Set:         e1: (1, Ram, 10, 2000)        C1: (wlink, Jawalakhel, Dr. Ashish)
                    e2: (2, Shyam, 20, 5000)      C2: (Nepasoft, Ratnapark, S.P. Joshi)
                    e3: (3, Mohan, 25, 1000)      C3: (NEA, KTM, Dr. S.R. Malla)

Fig.: Two entity types named employee and company and some of the member entities
in the entity set.

It is important to be able to specify how entities within a given entity set and

relationships within a given relationship set are distinguished. An entity type usually has an

attribute whose values are distinct for each individual entity in the collection. Such an

attribute is called a key attribute and its values can be used to identify each entity uniquely.

For example, the EmpId attribute is a key of the employee entity type. The kinds of key
discussed here are the superkey, the candidate key and the primary key.

Superkey: A superkey is a set of one or more attributes that, taken collectively, allows
us to identify uniquely an entity in the entity set. For example, EmpId is a superkey for
the entity set employee.

As another example, suppose the attributes of the customer entity set are customer-name,
social-security, customer-street, and customer-city. Then social-security is a superkey,
and so is any set of attributes that contains it, such as {social-security, customer-name}.

Candidate key: A superkey may contain extraneous attributes. We are often interested in
superkeys for which no proper subset is a superkey. Such minimal superkeys are called
candidate keys. If the combination of customer-name and customer-street is sufficient to
distinguish among customers, then both {social-security} and {customer-name,
customer-street} are candidate keys. Although the attributes social-security and
customer-name together can distinguish customer entities, their combination does not form
a candidate key, since the attribute social-security alone is a candidate key.

Primary key: A primary key is a candidate key that is chosen by the database designer as
the principal means of identifying entities within an entity set. A key (primary, candidate,
or super) is a property of the entity set, rather than of the individual entities.
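The definitions of superkey and candidate key can be turned into a small check over sample rows. One caveat matters: a key is a property of the entity set as designed, so a test on one particular set of rows can only show that an attribute set fails to be a key, never prove that it is one. The functions is_superkey and candidate_keys and the sample employees are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal superkeys of these rows, found by trying smaller sets first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any set that contains an already-found key: it is not minimal.
            contains_smaller = any(set(k) <= set(attrs) for k in found)
            if not contains_smaller and is_superkey(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical rows in which names, ages and salaries repeat.
employees = [
    {"EmpID": 1, "Name": "Ram", "Age": 20, "Salary": 2000},
    {"EmpID": 2, "Name": "Shyam", "Age": 20, "Salary": 5000},
    {"EmpID": 3, "Name": "Ram", "Age": 25, "Salary": 5000},
]
keys = candidate_keys(employees, ["EmpID", "Name", "Age", "Salary"])
```

Here ("EmpID", "Name") is a superkey but not a candidate key, because EmpID alone already identifies every row. The pairs that also come out as minimal for these three rows illustrate the caveat: they are accidents of the sample, not keys a designer would rely on.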

Relationships & Relationship Types

A relationship is an association between entities. Each relationship is identified so that

its name is descriptive of the relationship. Verbs such as takes, teaches, and employs make

good relationship names. For example, a student takes a class, a professor teaches a class, a

department employs a professor and so on.

Rectangles represent entity sets, ellipses represent attributes. Similarly relationships are

represented by diamond shaped symbols as shown below in fig. and the lines link attributes to

entity sets and entity sets to relationship sets.

Professor --<Teaches>-- Class

Fig.: An entity relationship

The figure shows a relationship between two entities (also known as participants in the

relationship) named professor and class respectively.

A relationship's degree indicates the number of associated entities or participants. A

unary relationship exists when an association is maintained within a single entity. A binary

relationship exists when two entities are associated. A ternary relationship exists when three

entities are associated. Although higher degrees exist, they are rare and are not specifically

named.

Unary:    Course --<Prerequisite>-- Course
Binary:   Professor --<Teaches>-- Class
Ternary:  Contributor, Fund, Recipient --<CRF>

Fig.: Three types of relationships

A course within the course entity is a prerequisite for another course within that entity.

The existence of a course prerequisite means that a course requires a course i.e. a course has a

relationship with itself. Such a relationship is also called a recursive relationship.

Connectivity: In the ternary example above, the relationships are all classified as M:N.
A fund can have many donors, and contributors can make donations to many funds. A fund may
support many researchers, who become the fund's recipients, and a researcher may draw
support from many funds.

The term connectivity is used to describe the relationship classification.

Professor (1) --<Teaches>-- (M) Class        One-to-Many relationship

Student (M) --<Enrolls in>-- (N) Class       Many-to-Many relationship

Fig.: Connectivity in an E-R diagram

Cardinality: Cardinality expresses the specific number of entity occurrences associated
with one occurrence of the related entity. The actual number of associated entities is
usually a function of an organization's policy. For example, Purbanchal University limits
a professor to teaching a maximum of three classes per week. Therefore, the cardinality
rule governing the professor-class association is expressed as "one professor teaches up to
three classes per week". The cardinality is indicated by placing the appropriate numbers
beside the entities, as shown in the figure.

[Diagram: Professor (1) Teaches (M) Class, a one-to-many relationship, with (0,3) written beside Professor and (1,1) written beside Class.]

Fig.: Cardinality in an E-R diagram

The relationship between Professor and Class is 1:M.

The cardinality limits are (0,3) for Professor, indicating that a professor may teach a minimum of zero and a maximum of three classes.

The cardinality limits for the Class entity are (1,1), indicating that the minimum number of professors required to teach a class is one, as is the maximum number of professors.

For a binary relationship between entity sets A and B, the mapping cardinality must be one of the following.

i) One to one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.

ii) One to many: An entity in A is associated with any number of entities in B. An entity in B, however, can be associated with at most one entity in A.

iii) Many to one: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number of entities in A.

iv) Many to many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.
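The four cases above can be illustrated with a small Python sketch. The function name mapping_cardinality and the sample pairs are assumptions made for illustration. The function only reports what a given set of (a, b) pairs exhibits; the real mapping cardinality is a rule of the schema, not of one sample.

```python
def mapping_cardinality(pairs):
    """Classify a set of (a, b) pairs taken from a binary relationship between A and B."""
    b_per_a = {}  # for each entity in A, the entities in B it is associated with
    a_per_b = {}  # for each entity in B, the entities in A it is associated with
    for a, b in pairs:
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)

    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())

    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


print(mapping_cardinality({("a1", "b1"), ("a1", "b2"), ("a2", "b3")}))
```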

[Diagram: four panels mapping elements of set A to elements of set B. One to One: a1 to a3 with b1 to b3. One to Many: a1 to a3 with b1 to b4. Many to One: a1 to a5 with b1 to b3. Many to Many: a1 to a4 with b1 to b4.]

Fig.: Mapping Cardinalities

A relationship is an association among several entities.

Entity – Relationship Diagram (E-R diagram)

The E-R diagram is used to represent the E-R model. The E-R diagram consists of the following major components.

i) Rectangles: to represent entity sets

ii) Ellipses: to represent attributes

iii) Diamonds: to represent relationships

iv) Lines: to link attributes to entity sets and entity sets to relationship sets

v) Double ellipses: to represent multivalued attributes

vi) Dashed ellipses: to denote derived attributes

vii) Double lines: to indicate total participation of an entity in a relationship set

[Diagram: entity sets E1 and E2 connected to the relationship set R]

Fig.: Total participation of E2 in R

For example, suppose the attributes associated with customer are customerName, socialSecurity, customerStreet and customerCity. The attributes associated with loan are loanNumber and amount. The relationship set borrower may be many to many, one to many, many to one or one to one. To distinguish among these types, we draw either a directed line (→) or an undirected line (-) between the relationship set and the entity set.

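[Diagram: entity set Customer (attributes: social security, customer name, customer street, customer city) connected through the relationship set Borrower to entity set Loan (attributes: loan number, amount).]

Fig. E-R diagram corresponding to customers and loans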

Similarly, we can see another example as follows:

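[Diagram: entity set Customer (attributes: social security, customer name, customer city) connected through the relationship set depositor to entity set Account (attributes: account number, balance).]

Fig. E-R diagram showing one to many relationship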

A directed line from the relationship set depositor to the entity set account specifies that a customer can deposit to many accounts, so depositor is a one-to-many relationship.

Weak Entity Types

An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set. An entity set that has a primary key is termed a strong entity set. For example, consider the entity set payment, which has three attributes: paymentNumber, paymentDate and paymentAmount. Although each payment entity is distinct, payments for different loans may share the same paymentNumber. Thus, this entity set does not have a primary key. Hence it is a weak entity set.

The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator.

In this case, the existence of a payment entity depends on the existence of a loan entity. If a loan is deleted, its associated payment entities must be deleted. So the entity set loan is dominant and payment is subordinate. The discriminator of a weak entity set is a set of attributes that can uniquely identify weak entities that are related to the same owner entity. For example, the discriminator of the weak entity set payment is the attribute paymentNumber, since, for each loan, a paymentNumber uniquely identifies one single payment for that loan. Hence, in the case of the entity set payment, its primary key is (loanNumber, paymentNumber).
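As a rough sketch of how this composite key behaves in practice, the following Python program uses the standard sqlite3 module. The table layout follows the loan and payment example; the dates and amounts are made-up sample values, and the ON DELETE CASCADE clause is one possible way to model existence dependency, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to

conn.execute("CREATE TABLE loan (loanNumber INTEGER PRIMARY KEY, amount REAL)")
conn.execute("""
    CREATE TABLE payment (
        loanNumber    INTEGER NOT NULL REFERENCES loan(loanNumber) ON DELETE CASCADE,
        paymentNumber INTEGER NOT NULL,   -- discriminator of the weak entity set
        paymentDate   TEXT,
        paymentAmount REAL,
        PRIMARY KEY (loanNumber, paymentNumber)
    )
""")
conn.execute("INSERT INTO loan VALUES (15, 1500), (16, 1300)")
# The same paymentNumber may appear under different loans.
conn.execute("INSERT INTO payment VALUES (15, 1, '2024-01-10', 100), (16, 1, '2024-01-12', 200)")

# A second payment number 1 for loan 15 violates the composite primary key.
try:
    conn.execute("INSERT INTO payment VALUES (15, 1, '2024-02-10', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the owner loan removes its dependent payments.
conn.execute("DELETE FROM loan WHERE loanNumber = 15")
remaining = conn.execute("SELECT loanNumber, paymentNumber FROM payment").fetchall()
print(duplicate_rejected, remaining)
```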

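[Diagram: strong entity set Loan (attributes: loan number, amount) connected through the relationship set loanPayment to the weak entity set Payment (attributes: paymentNumber, paymentDate, paymentAmount).]

Fig. E-R diagram with a weak entity set.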

Roles On Relationships

Each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance and helps to explain what the relationship means. For example, in the works_for relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays the role of department or employer.

Role names are not technically necessary in relationship types where all the participating entity types are distinct, since each entity type name can be used as the role name. However, in some cases the same entity type participates more than once in a relationship type in different roles. In such cases, the role names become essential for distinguishing the meaning of each participation. Such relationships are called recursive relationships.

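[Diagram: EMPLOYEE entities e1 to e5 linked to SUPERVISION relationship instances r1 to r5. Each link is labeled α or β to mark whether the employee takes part as supervisor or as supervisee.]

Fig.: Recursive relationship where the EMPLOYEE entity type plays two roles: supervisee and supervisor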

The supervision relationship type relates an employee to a supervisor, where both

employee and supervisor entities are members of the same EMPLOYEE entity type.

Structural Constraints On Relationships Types

Relationship types usually have certain constraints that limit the possible combinations of entities that may participate in the corresponding relationship set. These constraints are determined from the miniworld situation that the relationship represents.

[Diagram: EMPLOYEE entities e1 to e5 and DEPARTMENT entities d1 to d3 connected through works_for relationship instances r1 to r5.]

Fig.: Some instances of the works_for relationship between EMPLOYEE and DEPARTMENT.

Suppose the company has a rule that each employee must work for exactly one department.

There are two types of relationship constraints: cardinality ratio and participation.

i. Cardinality ratios for binary relationships: The cardinality ratio for a binary relationship specifies the number of relationship instances that an entity can participate in. For example, in the works_for binary relationship type, DEPARTMENT : EMPLOYEE has cardinality ratio 1:N, meaning that each department can be related to numerous employees but an employee can be related to only one department. The possible cardinality ratios for binary relationship types are 1:1, 1:N, N:1 and M:N.

ii. Participation constraints: The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type. There are two types of participation constraints: total and partial. If a company policy states that every employee must work for a department, then an employee entity can exist only if it participates in a works_for relationship instance. Thus, the participation of EMPLOYEE in works_for is called total participation, meaning that every entity in the total set of employee entities must be related to a department entity via works_for. Total participation is also called existence dependency. Participation is partial when only some of the entities need to be related through the relationship type.

Cardinality ratio and participation constraints, taken together, are called the structural constraints of a relationship type.

NAMING CONVENTIONS

The choice of names for entity types, attributes, relationship types and roles is not always straightforward. One should choose names that convey, as much as possible, the meanings attached to the different constructs in the schema. We choose to use singular names for entity types, rather than plural ones, because the entity type name applies to each individual entity belonging to that entity type.

In E-R diagrams, we will use the convention that entity type and relationship type names are in uppercase letters, attribute names are capitalized and role names are in lowercase letters. Generally, the nouns appearing in the narrative tend to give rise to entity type names and the verbs tend to indicate names of relationship types.

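[Diagram: entity type CUSTOMER (attributes: CustomerId, CustomerName, CustomerCity) connected through relationship type DEPOSITOR to entity type ACCOUNT (attributes: AccountNumber, Balance).]

Fig. E-R diagram using naming conventions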

Another naming convention involves choosing relationship names to make the ER

diagram of the schema readable from left to right and from top to bottom.

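[Diagram: entity type EMPLOYEE (attributes: EmployeeName, EmployeeId) connected through relationship type WORKS_ON to entity type PROJECT (attributes: ProjectName, Location).]

Fig. E-R diagram using the convention for relationship names read from left to right.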

Relational Model:

Relational Model Concept:

Student:

Name      Std. ID   Class   Major
Laxman    05        11      Science
Pema      07        12      Management
Dil       12        11      English

Course:

Course Name        Course Number   Credit Hours   Department
Computer Science   CS 310          3              CS
Data base          DB 315          3              CS
Maths              MAT 305         3              Math
Nepali             NEP 318         3              Nepali

Section:

Section ID   Course Number   Semester   Year   Instructor
75           MAT 305         Fall       99     J. Gupta
77           DB 315          Spring     98     P. Gurung
80           NEP 318         Spring     97     B. Twari
82           CS 310          Fall       99     K. Bista

Grade Report:

Symbol No.   Std ID   Section ID   Grade
30115        5        75           B+
30116        7        77           A
30117        12       80           C

Fig: A Database that stores Students and Course Information.

The relational model represents the database as a collection of relations. Informally, each relation resembles a table of values or, to some extent, a file of records. For example, the database of files that is shown above is similar to the relational model representation.

When a relation is thought of as a table of values, each row in the table represents a collection of related data values. We introduced entity types and relationship types as concepts for modeling real world data. In the relational model, each row in the table represents a fact that typically corresponds to a real world entity or relationship. The table names and column names are used to help in interpreting the meaning of the values in each row. For example, the first table of the above figure is called STUDENT because each row represents facts about a particular student entity. The column names (Name, Std ID, Class and Major) specify how to interpret the data values in each row, based on the column each value is in. All values in a column are of the same data type.

In formal relational model terminology, a row is called a tuple, a column header is called an attribute, and the table is called a relation. The data type describing the types of values that can appear in each column is represented by a domain of possible values.

Fundamental Concepts on Relational Data Model:-

Relations:-

The relational data model organizes and represents data in the form of tables or relations. Relation is a term that comes from mathematics and represents a simple two-dimensional table, consisting of rows and columns of data.

WORKER:

Worker ID   Name            Hourly-Rate   Skill-Type   SUPV-ID
123         Birendra Negi   12            Electric     131
141         Pema Tamang     11            Plumbing     152
292         Dil Thapa       10            Roofing      -
323         Shyam Shah      11            Driving      -
152         Hariom Shah     15            Teaching     -

Fig: A portion of the relation WORKER.

The above figure shows a relation with sample data values, which represents the WORKER object set and its attributes. Each column in the relation is an attribute of the relation. The name of the column is called the attribute name. We use the terms attribute and attribute name rather than column name.

The number of attributes in a relation is called the degree of the relation. The degree of WORKER is five.

The rows of a relation are also called tuples. It is assumed that there is no prescribed order to the rows or tuples of a relation and that no two tuples have an identical set of values. The set of all possible values that an attribute may have is the domain of the attribute.

* Null Values:

A null value is the value given to an attribute in a tuple if the attribute is inapplicable or its value is unknown. For example, some employees in the WORKER relation do not have a supervisor. Consequently, no values exist for SUPV-ID for three employees. In addition, when we are entering data for a row in a relation, we might not know the values of one or more attributes for that row. In either case, we enter nothing, and that row is recorded in the database with null values for those attributes. A null value is not blank or zero. It is simply unknown or inapplicable and may be supplied at a later time.

* Key:

A key is a minimal set of attributes that uniquely identifies each row in a relation. In the above figure, let us assume that the WORKER-ID attribute uniquely identifies a row in WORKER. We then say that WORKER-ID is a key in the WORKER relation.

Any set of attributes that uniquely identifies each tuple in relation is called a super key. A key

of a relation is a minimal set of such attributes. That is, a key is a minimal super key. By

minimal, we mean that no subset of the set of key attributes will uniquely identify tuples in a

relation.
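The difference between a super key and a key can be checked mechanically on a sample of tuples. The sketch below uses the WORKER rows from the figure; the attribute spellings and the helper names is_superkey and is_key are assumptions for illustration. Note that a test on one sample can only refute uniqueness. Whether an attribute set is really a key depends on the rules of the miniworld, not on the rows that happen to be stored.

```python
from itertools import combinations

ATTRS = ("worker_id", "name", "hourly_rate", "skill_type", "supv_id")
WORKER = [
    (123, "Birendra Negi", 12, "Electric", 131),
    (141, "Pema Tamang", 11, "Plumbing", 152),
    (292, "Dil Thapa", 10, "Roofing", None),
    (323, "Shyam Shah", 11, "Driving", None),
    (152, "Hariom Shah", 15, "Teaching", None),
]


def is_superkey(rows, attrs, candidate):
    """True if no two rows agree on all attributes in candidate."""
    positions = [attrs.index(a) for a in candidate]
    seen = {tuple(row[i] for i in positions) for row in rows}
    return len(seen) == len(rows)


def is_key(rows, attrs, candidate):
    """True if candidate is a super key and no proper subset of it is one."""
    if not is_superkey(rows, attrs, candidate):
        return False
    return not any(
        is_superkey(rows, attrs, subset)
        for size in range(1, len(candidate))
        for subset in combinations(candidate, size)
    )


print(is_superkey(WORKER, ATTRS, ("worker_id", "name")))  # unique, but not minimal
print(is_key(WORKER, ATTRS, ("worker_id", "name")))
print(is_key(WORKER, ATTRS, ("worker_id",)))
```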

ASSIGNMENT

WORKER-ID   BLDG-ID   START DATE   NO.-DAYS
123         312       10/10        5
141         312       05/10        10
123         312       12/08        5
141         315       12/12        12

In the ASSIGNMENT relation, the key consists of the WORKER-ID and the BLDG-ID attributes. Neither WORKER-ID alone nor BLDG-ID alone uniquely identifies every row, but the two attributes together provide the unique identification required for a key. A key consisting of more than one attribute is called a composite key.

In any given relation, there may be more than one set of attributes that could be chosen as a key. These are called candidate keys. A candidate key is defined as "any set of attributes that could be chosen as a key of a relation". For example, WORKER-ID is a candidate key in the WORKER relation if it will always be unique. When one of the candidate keys is selected as the relation key, it may be called the primary key. The candidate key that is the easiest to use in day to day data entry is normally selected as the primary key.

A foreign key is a set of attributes in one relation that constitutes a key in some other relation, or possibly in the same relation. Foreign keys are used to indicate logical links between relations. WORKER-ID in the ASSIGNMENT relation is an example of a foreign key, since WORKER-ID is the key of the WORKER relation.

RELATIONAL ALGEBRA:

A query language is a language in which a user requests information from the database. The relational algebra is a procedural query language (in a procedural language, the user instructs the system to perform a sequence of operations on the database to compute the desired result). It consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product and rename. In addition to the fundamental operations, there are several other operations, namely set intersection, natural join, division and assignment.

* Fundamental Operations:

Among the fundamental operations, the select, project and rename operations are called unary operations because they operate on one relation. The other three operations, i.e. union, set difference and Cartesian product, operate on pairs of relations and are therefore called binary operations. Let us describe all the fundamental operations in brief.

1. The Select operation:

The select operation selects tuples that satisfy a given predicate. We use the Greek letter sigma (σ) to denote selection. The predicate appears as a subscript to sigma. The argument relation is in parentheses after the sigma. The general format of the select operation is:

σ <selection condition> (R)

LOAN

Loan Number   Branch Name   Amount
11            Round Hill    900
14            Down Town     1500
15            Perryridge    1500
16            Perryridge    1300
17            Down Town     1000
23            Red Wood      2000
93            Milanus       500

Figure: Loan Relation

To select those tuples of the loan relation where the branch name is "Perryridge", we write

σ Branch Name = "Perryridge" (Loan)

The result of applying this predicate to the loan relation is shown below:

Loan Number   Branch Name   Amount
15            Perryridge    1500
16            Perryridge    1300

Figure: Result of σ Branch Name = "Perryridge" (Loan)

2. The Project Operation:

The project operation is a unary operation that returns its argument relation with certain attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is denoted by the Greek letter pi (π). We list those attributes that we wish to appear in the result as a subscript to pi. The argument relation follows in parentheses. The general format of the project operation is:

π <attribute list> (R)

Thus we write the query to list all loan numbers and the amount of each loan as:

π loan number, amount (loan)

The result of this query is given below:

Loan Number   Amount
11            900
14            1500
15            1500
16            1300
17            1000
23            2000
93            500

Figure: Result of π Loan-Number, Amount (Loan)
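Select and project can be mimicked in a few lines of Python, treating a relation as a list of dictionaries. This is only a sketch: the function names select and project and the attribute spellings are assumptions, and the loan data is copied from the loan relation (with 2000 as the amount of loan 23, as in the projection result).

```python
LOAN = [
    {"loan_number": 11, "branch_name": "Round Hill", "amount": 900},
    {"loan_number": 14, "branch_name": "Down Town", "amount": 1500},
    {"loan_number": 15, "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": 16, "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": 17, "branch_name": "Down Town", "amount": 1000},
    {"loan_number": 23, "branch_name": "Red Wood", "amount": 2000},
    {"loan_number": 93, "branch_name": "Milanus", "amount": 500},
]


def select(relation, predicate):
    """Sigma: keep the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Pi: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


perryridge = select(LOAN, lambda row: row["branch_name"] == "Perryridge")
numbers_and_amounts = project(LOAN, ["loan_number", "amount"])
amounts = project(LOAN, ["amount"])  # 1500 occurs twice in LOAN but once here
print(perryridge)
print(len(numbers_and_amounts), len(amounts))
```

Projecting on amount alone shows the duplicate elimination mentioned above: the two loans of 1500 collapse into a single tuple.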

3. The Rename Operation:

We can also define a formal rename operation, which can rename either the relation name or the attribute names, or both, in a manner similar to the way we define the select and project operations. The general rename operation, when applied to a relation R of degree n, is denoted by any of the following three forms:

ρ S(B1, B2, ..., Bn) (R)
ρ S (R)
ρ (B1, B2, ..., Bn) (R)

where the symbol ρ (rho) denotes the rename operation, S is the new relation name and B1, B2, ..., Bn are the new attribute names. The first expression renames both the relation and its attributes, the second expression renames the relation only and the third expression renames the attributes only. If the attributes of R are (A1, A2, ..., An) in that order, then each Ai is renamed as Bi.

4. The Union Operation:

The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. Let us see an example.

Customer Name   Account Number
Shyam           101
Hariom          102
Chudamani       103
Dev Laxmi       104
Lekha           105

Figure: The Depositor Relation

Customer Name   Loan Number
Sanjeep         16
Bibek           17
Shahina         18
Dil Bd          19
Pema            20

Figure: The Borrower Relation

Using the union operation, we can find the names of all customers who have either a loan or an account at the bank. The expression is:

π customer name (borrower) ∪ π customer name (depositor)

The result of this expression for the two relations above is shown below:

Customer Name
Sanjeep
Bibek
Shahina
Dil Bd
Pema
Shyam
Hariom
Chudamani
Dev Laxmi
Lekha

Figure: Names of all customers who have either a loan or an account.

5. The Set Difference (Minus) Operation:

The set difference operation, denoted by R - S, allows us to find tuples that are in one relation but are not in another. The expression R - S produces a relation containing those tuples that are in R but not in S.

For example, we can find all customers of the bank who have an account but not a loan by writing:

π Customer-name (depositor) - π Customer-name (borrower)

For the depositor and borrower relations shown earlier, no depositor also appears as a borrower, so the result contains every depositor:

Customer Name
Shyam
Hariom
Chudamani
Dev Laxmi
Lekha

Figure: Customers with an account but not a loan
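Because relations are sets, union and set difference map directly onto Python's set operators. The two small relations below are made-up sample data chosen so that some customers appear in both; they are not the relations from the figures, and the helper project_names is a hypothetical stand-in for π customer name.

```python
# Hypothetical sample relations: (customer name, account number) and (customer name, loan number).
depositor = {("Hayes", 101), ("Vanes", 102), ("Lekha", 105)}
borrower = {("Hayes", 18), ("Vanes", 19), ("Smith", 20)}


def project_names(relation):
    """Pi on customer name; building a set removes duplicates automatically."""
    return {name for name, _ in relation}


# Union: customers with a loan, an account, or both.
either = project_names(borrower) | project_names(depositor)

# Set difference: customers with an account but no loan.
account_only = project_names(depositor) - project_names(borrower)

print(sorted(either))
print(sorted(account_only))
```

Both operands of a union or difference must have the same attributes, which is why each relation is projected onto the customer name before the operator is applied.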

6. The Cartesian Product Operation:

The Cartesian product operation, also known as the cross product or cross join operation, is denoted by (*). It allows us to combine tuples from two relations in a combinatorial fashion. We write the Cartesian product of relations R and S as R*S.

In general, the result of R(A1, A2, ..., An) * S(B1, B2, ..., Bm) is a relation Q of degree n+m with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order. The resulting relation Q has one tuple for each combination of tuples, one from R and one from S.

For example, suppose that we want to find the names of all customers who have a loan at the Perryridge branch. We need the information in both the loan relation and the borrower relation to do so. If we write

σ branch name = "Perryridge" (borrower * loan)

the result is shown in the following table:

Customer Name   Borrower Loan Number   Loan Loan Number   Branch Name   Amount
Adams           16                     15                 Perryridge    1500
Adams           16                     16                 Perryridge    1300
Curry           17                     15                 Perryridge    1500
Curry           17                     16                 Perryridge    1300
Hayes           18                     15                 Perryridge    1500
Hayes           18                     16                 Perryridge    1300
Vanes           19                     15                 Perryridge    1500
Vanes           19                     16                 Perryridge    1300
Smith           20                     15                 Perryridge    1500
Smith           20                     16                 Perryridge    1300

This table still pairs every borrower with every Perryridge loan. To keep only the customers who actually hold those loans, we must also require that the borrower loan number equals the loan loan number.
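The following Python sketch reproduces the Perryridge example with itertools.product. The borrower and loan tuples are taken from the tables in this section; the tuple layout and variable names are assumptions for illustration. The last step adds the equality test on the two loan numbers, which is needed before the customer names are meaningful.

```python
from itertools import product

# (customer name, loan number)
BORROWER = [("Adams", 16), ("Curry", 17), ("Hayes", 18), ("Vanes", 19), ("Smith", 20)]
# (loan number, branch name, amount)
LOAN = [
    (11, "Round Hill", 900), (14, "Downtown", 1500), (15, "Perryridge", 1500),
    (16, "Perryridge", 1300), (17, "Downtown", 1000), (23, "Redwood", 2000),
    (93, "Milanus", 500),
]

# Cartesian product: every borrower tuple glued to every loan tuple.
# Resulting layout: (customer, borrower loan no., loan loan no., branch, amount)
pairs = [b + l for b, l in product(BORROWER, LOAN)]

# Select the combinations whose branch is Perryridge (the 10-row table above).
at_perryridge = [t for t in pairs if t[3] == "Perryridge"]

# Keep only combinations where both loan numbers agree, then project the name.
matched = [t for t in at_perryridge if t[1] == t[2]]
names = sorted({t[0] for t in matched})

print(len(pairs), len(at_perryridge), names)
```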

Some Other Operations Of Relational Algebra:

• The Set-Intersection Operation:

This operation, denoted by ∩, produces a relation that includes all the tuples that are in both R and S. For example, suppose we wish to find all customers who have both a loan and an account. Using set intersection, we can write

Π customername (borrower) ∩ Π customername (depositor)

The result of this query is:

Customer Name
Hayes
Vanes

• The Natural Join Operation:

The natural join is a binary operation that produces all the combinations of tuples from R and S that satisfy a join condition with only equality comparisons on their common attributes, except that the join attributes of S are not repeated in the resulting relation. It is denoted by the join symbol ⋈.

Let us consider an example: find the names of all customers who have a loan at the bank, and find the amount of the loan. We express this query by using the natural join as:

Π customer name, loan-number, amount (borrower ⋈ loan)

The borrower relation is:

Customer Name   Loan Number
Adams           16
Curry           17
Hayes           18
Vanes           19
Smith           20

The loan relation is:

Loan Number   Branch Name   Amount
11            Roundhill     900
14            Downtown      1500
15            Perryridge    1500
16            Perryridge    1300
17            Downtown      1000
23            Redwood       2000
93            Milanus       500

The result of the above query is:

Customer Name   Loan Number   Amount
Adams           16            1300
Curry           17            1000

Only Adams and Curry appear, because loan numbers 18, 19 and 20 have no matching tuple in the loan relation.
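A natural join can be sketched in Python by matching tuples on the attributes the two relations share. The function natural_join and the attribute spellings are assumptions for illustration; the data is the borrower and loan relations from this example.

```python
BORROWER = [
    {"customer_name": "Adams", "loan_number": 16},
    {"customer_name": "Curry", "loan_number": 17},
    {"customer_name": "Hayes", "loan_number": 18},
    {"customer_name": "Vanes", "loan_number": 19},
    {"customer_name": "Smith", "loan_number": 20},
]
LOAN = [
    {"loan_number": 11, "branch_name": "Roundhill", "amount": 900},
    {"loan_number": 14, "branch_name": "Downtown", "amount": 1500},
    {"loan_number": 15, "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": 16, "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": 17, "branch_name": "Downtown", "amount": 1000},
    {"loan_number": 23, "branch_name": "Redwood", "amount": 2000},
    {"loan_number": 93, "branch_name": "Milanus", "amount": 500},
]


def natural_join(r, s):
    """Combine tuples of r and s that agree on every attribute name they share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    joined = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                joined.append({**x, **y})  # shared attributes are stored only once
    return joined


joined = natural_join(BORROWER, LOAN)
result = [(t["customer_name"], t["loan_number"], t["amount"]) for t in joined]
print(result)
```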

• The Division Operation:

The division operation, denoted by ÷, is useful for a special kind of query that sometimes occurs in database applications. Formally, let R(X) and S(Y) be relations and let Y ⊆ X, that is, every attribute of schema Y is also in schema X. The relation R ÷ S is a relation on schema X - Y (that is, on the schema containing all attributes of schema X that are not in schema Y). A tuple t is in R ÷ S if and only if both of the following conditions hold:

1. t is in π X-Y (R)

2. For every tuple tS in S, there is a tuple tR in R satisfying both of the following:

• tR (Y) = tS (Y)

• tR (X-Y) = t
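The two conditions can be read as "keep each X - Y value that is paired in R with every tuple of S". A minimal Python sketch for the common case where X - Y and Y are single attributes follows; the function name divide and the sample data (customers and the branches where they hold accounts) are hypothetical.

```python
def divide(r, s):
    """r: set of (x, y) pairs; s: set of y values.
    Returns the x values that appear in r together with every y in s."""
    candidates = {x for x, _ in r}  # condition 1: t is in the projection of R on X - Y
    return {x for x in candidates if all((x, y) in r for y in s)}  # condition 2


# Hypothetical data: which customer has an account at which branch.
has_account_at = {
    ("Hayes", "Downtown"), ("Hayes", "Perryridge"),
    ("Vanes", "Downtown"),
    ("Smith", "Perryridge"), ("Smith", "Downtown"),
}
branches = {"Downtown", "Perryridge"}

# Customers who have an account at every listed branch.
everywhere = divide(has_account_at, branches)
print(sorted(everywhere))
```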

• The Assignment Operation:

It is convenient at times to write a relational algebra expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by ←, works like assignment in a programming language. For example:

Temp1 ← R

Temp2 ← S

Result ← Temp1 - Temp2

Integrity Constraints

Integrity constraints guard against accidental damage to database.

Entity Integrity

Entity integrity ensures that each row in the table is uniquely identified. In other words, entity integrity ensures that a table does not have any duplicate rows. Example: two separate customers should not have the same customer number. SQL Server will allow duplicate rows if entity integrity is not enforced. Entity integrity is a key concept in the relational database model. Data in a relational database is independent of physical storage; there is no such thing as the '5th customer row' in a table. Physical independence is achieved by being able to reference each row by a unique value, sometimes referred to as a 'key'. Entity integrity ensures that each row in a table has a unique identifier that allows one row to be distinguished from another.

Entity integrity is most often enforced by placing a primary key (PK) constraint on a specific column (it can also be enforced with a UNIQUE constraint or a unique index; the IDENTITY property generates distinct values automatically but is normally paired with one of these). The PK constraint forces each value inserted into a column (or combination of columns) to be unique; if a user attempts to insert a duplicate value into the column(s), the PK constraint will cause the insert to fail. A PK will not allow any nulls to be inserted into the column(s). (A NULL entry would be disallowed even if it would be the only NULL in the column and therefore unique.)

A PK is referred to as a 'surrogate key' if the column contains no real data other than a uniqueness identifier. If 'real' data can be used as a PK (e.g., a social security number), then it is referred to as an 'intelligent key'. There can be only one PK per table. A composite PK is a PK that consists of more than one column; it is used when none of the columns in the composite key is unique by itself. Thus, there can be only one PK in a table, but the PK can consist of more than one column.

If you need to enforce uniqueness on more than one column, use a PK constraint on one column and a UNIQUE constraint on any other columns that must not contain duplicates. Example: if the 'customer ID' column is the PK in the 'customers' table and you also want to make sure there are no duplicate customer names, you can place a UNIQUE constraint on the 'customer name' column. Non-PK columns on which uniqueness is enforced are referred to as alternative keys or AKs; they get their name from the fact that they are 'alternatives' to the PK and, as such, make good candidates for indexing or 'joining' on.

Domain Constraint

A domain of possible values must be associated with every attribute. SQL allows the domain declaration of an attribute to include the specification "not null" and thus prohibits insertion of a null value for this attribute. Any database modification that would cause a null to be inserted in a not null domain generates an error diagnostic. There are many situations where the prohibition of null values is desirable. A particular case where it is essential to prohibit null values is in the primary key of a relation schema.

The SQL-92 allows us to define domains using a create domain clause, as shown in the

following example.

create domain personName char (60)

We can then use the domain name personName to define the type of an attribute, just

like a built-in domain.

Domain constraints are the most elementary form of integrity constraint. They are

tested easily by the system whenever a new data item is entered into the database. It is

possible for several attributes to have the same domain. The principle behind attribute

domains is similar to that behind typing of variables in programming languages.

The check clause in SQL-92 permits the schema designer to specify a predicate that must be satisfied by any value assigned to a variable whose type is the domain. For instance, a check clause can ensure that an hourly wage domain allows only values greater than a specified value (such as the minimum wage), as shown below.

create domain hourlyWage numeric(5, 2)
    constraint wageValueTest check (value >= 4.00)

The domain hourlyWage is declared to be a decimal number with a total of five digits, two of which are placed after the decimal point, and the domain has a constraint that ensures that the hourlyWage is equal to or greater than 4.00.

The check clause can also be used to restrict a domain so that it does not contain any null values, or to restrict it to a fixed set of values, as shown below.

create domain accountNumber char(10)
    constraint accountNumberNullTest check (value not null)

create domain gender char(10)
    constraint checkGender check (value in ('Male', 'Female'))
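To see such constraints reject bad data, here is a small Python sketch using the standard sqlite3 module. SQLite does not support create domain, so the same predicates are written as NOT NULL and CHECK constraints on the columns of a hypothetical worker table; the table, the helper try_insert and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE worker (
        name        TEXT NOT NULL,
        gender      TEXT CHECK (gender IN ('Male', 'Female')),
        hourly_wage NUMERIC CHECK (hourly_wage >= 4.00)
    )
""")


def try_insert(row):
    """Return True if the row satisfies every constraint, False if it is rejected."""
    try:
        conn.execute("INSERT INTO worker VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("Pema", "Female", 11.00)))  # satisfies all constraints
print(try_insert(("Dil", "Male", 3.50)))      # wage below the allowed minimum
print(try_insert((None, "Male", 10.00)))      # null in a not null column
print(try_insert(("Shyam", "Other", 10.00)))  # value outside the gender domain
```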

Referential Integrity

It is also required that a value that appears in one relation for a given set of attributes

also appears for a certain set of attributes in another relation. This condition is called

referential integrity.

The referential integrity constraint is specified between two relations and is used to

maintain the consistency among tuples of the two relations. Informally, the referential

integrity constraint states that a tuple in one relation that refers to another relation must refer

to an existing tuple in that relation. Consider the two relations EMPLOYEE and

DEPARTMENT as follows.

EMPLOYEE (Name, SSN, Address, Sex, Salary, DeptNo)

DEPARTMENT (DeptNo, DeptName, MGRSSN)

The attribute DeptNo of EMPLOYEE gives the department number of the department for which each employee works. Hence, its value in every EMPLOYEE tuple must match the DeptNo value of some tuple in the DEPARTMENT relation. To define referential integrity more formally, we must first define the concept of a foreign key. The conditions for a foreign key between two relation schemas R1 and R2 state that a set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following two rules.

i. The attributes in FK have the same domain as the primary key attributes PK of R2. The

attributes FK are said to reference or refer to the relation R2.

ii. A value of FK in a tuple t1 of the current state r1 (R1) either occurs as a value of PK for

some tuple t2 in the current state r2 (R2) or is null. In the former case, we have t1 [FK] =

t2 [PK], and we say that the tuple t1 references or refers to the tuple t2. R1 is called the

referencing relation and R2 is the referenced relation.
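Rule ii can be checked directly: every non-null foreign key value in the referencing relation must occur as a primary key value in the referenced relation. The sketch below uses made-up EMPLOYEE and DEPARTMENT tuples; the function name fk_violations and the attribute spellings are assumptions.

```python
def fk_violations(referencing, fk, referenced, pk):
    """Return the tuples of the referencing relation whose foreign key value
    is neither null nor present as a primary key value in the referenced relation."""
    pk_values = {row[pk] for row in referenced}
    return [row for row in referencing
            if row[fk] is not None and row[fk] not in pk_values]


# Hypothetical sample data.
DEPARTMENT = [
    {"dept_no": 1, "dept_name": "Research"},
    {"dept_no": 2, "dept_name": "Accounts"},
]
EMPLOYEE = [
    {"name": "Pema", "dept_no": 1},     # refers to an existing department
    {"name": "Dil", "dept_no": None},   # null is allowed by rule ii
    {"name": "Shyam", "dept_no": 9},    # no department 9: violates rule ii
]

bad = fk_violations(EMPLOYEE, "dept_no", DEPARTMENT, "dept_no")
print([row["name"] for row in bad])
```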

In a database of many relations, there are usually many referential integrity constraints.

To specify these constraints, we must first have a clear understanding of the meaning or role

that each set of attributes plays in the various relation schemas of the database.

In the EMPLOYEE relation, the attribute DeptNo refers to the department for which an employee works. Hence, we designate DeptNo to be a foreign key of EMPLOYEE, referring to the DEPARTMENT relation. This means that a value of DeptNo in any tuple t1 of the EMPLOYEE relation must match a value of the primary key of DEPARTMENT in some tuple of the DEPARTMENT relation, or be null.

We can diagrammatically display referential integrity constraints by drawing a directed

arc from each foreign key to the relation it references. For clarity, the arrowhead may point to

primary key of the referenced relation.

EMPLOYEE (Name, SSN, Address, Sex, Salary, DeptNo)

DEPARTMENT (DeptNo, DeptName, MGRSSN)

DEPARTMENT_LOCATIONS (DeptNo, Locations)

[Diagram: directed arcs drawn from each foreign key to the relation it references.]

Fig. Referential integrity constraints

Referential integrity in SQL :

Primary and foreign keys can be specified as part of the SQL create table statement.

The PRIMARY KEY clause of the create table statement includes a list of the attributes that constitute the primary key.

The UNIQUE clause of the create table statement includes a list of the attributes that constitute a candidate key.

The FOREIGN KEY clause of the create table statement includes both a list of the attributes that constitute the foreign key and the name of the relation referenced by the foreign key.

Assertion

An assertion is a predicate expressing a condition that we wish the database always to

satisfy. Domain constraints and referential integrity constraints are special forms of

assertions. However, there are many constraints that we can't express using only these special

forms.

For example suppose the constraints are:

i. The sum of all loan amounts for each branch must be less than the sum of all account

balances at that branch.

ii. Every loan has at least one customer who maintains an account with a minimum

balance of 50,000.


An assertion in SQL-92 takes the form

create assertion <assertionName> check <Predicate>

e.g.:

create assertion sumConstraint check

(not exists (select * from branch

where (select sum(amount) from loan

where loan.branchName = branch.branchName)

>= (select sum(balance) from account

where account.branchName = branch.branchName)))

When an assertion is created, the system tests it for validity. If the assertion is valid,

then any further modification to the database is allowed only if it does not cause that assertion

to be violated.

Triggers

A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database. To design a trigger, we must:

i. specify the conditions under which the trigger is to be executed.

ii. specify the actions to be taken when the trigger executes.

Triggers are useful mechanisms for alerting humans, or for performing certain tasks automatically when certain conditions are met. Triggers are sometimes called rules or active rules. Triggers can be written in both the front end and the back end. Triggers written in the back end are called database triggers.

Types:

i. row level triggers

ii. statement level triggers

iii. before and after triggers

iv. database level triggers

Triggers can be written for events such as insert, update, delete, create, alter, drop etc.

For example suppose we want to store username and the system date into a table

logdata. For this purpose, the trigger can be written as follows.

CREATE OR REPLACE TRIGGER tg_before_update_user

BEFORE INSERT OR UPDATE

ON Policies

FOR EACH ROW

BEGIN

-- No COMMIT here: the trigger runs inside the triggering transaction.

INSERT INTO LOGDATA

VALUES (USER, SYSDATE);

END;

In this trigger, if any insert or update is made in the policies table, then the user name

& current date is stored in the logdata table.
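The same logging idea can be sketched in a runnable form. This version uses Python's built-in sqlite3 module, which is an assumption of the example; SQLite needs one trigger per event and has no USER function, so the log stores the kind of action instead. The table layouts and the trigger names are hypothetical.

```python
import sqlite3

def run_trigger_demo():
    """Log every insert and update on policies into logdata via triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE policies (policy_no INTEGER PRIMARY KEY, holder TEXT);
        CREATE TABLE logdata (action TEXT, policy_no INTEGER, logged_on TEXT);

        CREATE TRIGGER tg_after_insert_policy
        AFTER INSERT ON policies
        FOR EACH ROW
        BEGIN
            INSERT INTO logdata VALUES ('insert', NEW.policy_no, date('now'));
        END;

        CREATE TRIGGER tg_after_update_policy
        AFTER UPDATE ON policies
        FOR EACH ROW
        BEGIN
            INSERT INTO logdata VALUES ('update', NEW.policy_no, date('now'));
        END;

        INSERT INTO policies VALUES (1, 'Ram');
        INSERT INTO policies VALUES (2, 'Sita');
        UPDATE policies SET holder = 'Hari' WHERE policy_no = 1;
    """)
    # One log row per modified row, in the order the modifications happened.
    return conn.execute(
        "SELECT action, policy_no FROM logdata ORDER BY rowid").fetchall()
```

Two inserts and one update on policies produce three rows in logdata without any explicit insert into the log by the application.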

QUERY PROCESSING

Introduction

Query Processing refers to the range of activities involved in extracting data from a

database. The range of activities include translation of queries from high level database

language into expressions that can be used at the physical level of the file system, a variety of

query optimizing transformation and actual evaluation of queries.

The basic steps involved in processing a query are given below:

a) Parsing and Translation:

At the initial step, a query expressed in a high level query language such as SQL must be translated into its internal form. This translation process is similar to the work performed by the parser of a compiler. In generating the internal form of the query, the parser checks the syntax of the user's query, verifies that the relation names appearing in the query are names of relations in the database, and so on. The system constructs a parse-tree representation of the query, which it then translates into a relational algebra expression.

b) Optimization:

A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization. The main task of the query optimizer is to produce a suitable execution plan.


c) Evaluation:

A sequence of primitive operations that can be used to evaluate a query is called a query execution plan or query evaluation plan. The query execution engine takes a query evaluation plan, executes that plan and returns the answer to the query.

The following figure illustrates the steps used in query processing.

[Figure: Query -> Parser & translator -> Relational algebra expression -> Optimizer -> Execution plan -> Evaluation engine -> Query output]

Figure: Steps in Query Processing

For example, consider the query

Select balance

From account

Where balance < 2500

The query can be translated into either of the following relational algebra expressions:

• σ balance<2500 (Π balance (account))   (outermost operation: select)

• Π balance (σ balance<2500 (account))   (outermost operation: project)

A query evaluation plan for the second expression is:

Π balance

|

σ balance<2500

|

account
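That the two expressions give the same answer can be checked with a small sketch. The select and project helpers below are hypothetical stand-ins for σ and Π over lists of dictionaries, and the three account tuples are made-up sample data.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Projection (multiset version): keep only the named attributes."""
    return [{a: row[a] for a in attributes} for row in rows]

account = [
    {"accountNumber": "A-101", "branchName": "Kathmandu", "balance": 500},
    {"accountNumber": "A-215", "branchName": "Pokhara", "balance": 3000},
    {"accountNumber": "A-305", "branchName": "Kathmandu", "balance": 2400},
]

def is_small(row):
    return row["balance"] < 2500

# First expression: project, then select.
plan_a = select(project(account, ["balance"]), is_small)
# Second expression: select, then project.
plan_b = project(select(account, is_small), ["balance"])
```

Both plans return the same tuples. Which one is cheaper depends on the data and the available access paths, and making that choice is the optimizer's job.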

Query Optimization:

For a given query, there are generally a variety of methods for computing the answer. It is the responsibility of the database system to transform the query as entered by the user into an equivalent query that can be computed more efficiently. The process of finding a good strategy for processing a query is called Query Optimization.

In other words, Query Optimization is the process of selecting the most efficient

query evaluation plan from among the many strategies usually possible for processing a given

query, especially if the query is complex. Users are not expected to write their queries so that

they can be processed efficiently but the system is expected to construct a query evaluation

plan that minimizes the cost of query evaluation. This is the place where query optimization

comes into play.

Optimization has two aspects. One aspect of optimization occurs at the relational algebra

level, where the system attempts to find an expression that is equivalent to the given

expression, but more efficient to execute. Another aspect is selecting a detailed strategy for

processing the query such as choosing the algorithm to use for executing an operation,

choosing the specific indices to use, and so on.

Equivalence of Expression

Two relational algebra expressions are said to be equivalent if they generate the same set of tuples on every legal database instance. A legal database instance is one that satisfies all the integrity constraints specified in the database schema. The two expressions may generate the tuples in different orders, but they are considered equivalent as long as the set of tuples is the same.

An equivalence rule says that expressions of two forms are equivalent: we can replace an expression of the first form by an expression of the second form, or vice versa.

In SQL, the inputs and outputs are multisets of tuples, and a multiset version of the relational algebra is used for evaluating SQL queries. Two expressions in the multiset version of the relational algebra are said to be equivalent if, on every legal database, the two expressions generate the same multiset of tuples.

Query Decomposition:

Query decomposition is the process by which a DBMS breaks up (decomposes) a query into subqueries that can be executed at the individual sites. The query decomposer generates these subqueries from the internal form of the query produced by the initial processing.

OBJECT ORIENTED DATA MODEL

Introduction:

Traditional Data Model and systems, such as relational, network and hierarchical,

have been quite successful in developing the database technology required for many

traditional business database applications. However, they have certain shortcomings when more complex database applications must be designed and implemented, for example databases for engineering design and manufacturing, scientific experiments, telecommunications, geographic information systems, multimedia and many more. These new applications have requirements and characteristics that differ from those of traditional business applications, such as more complex structures for objects, longer-duration transactions, new data types for storing images or large textual items, etc.

Object oriented databases were proposed to meet the needs of the more complex applications mentioned above. The object oriented approach offers the flexibility to handle some of these requirements without being limited by the data types and query languages available in traditional database systems. Some of the key reasons for creating object oriented databases are given below.

1. The first key feature is the power they give the designer to specify both the structure of complex objects and the operations that can be applied to these objects.

2. Another reason for the creation of object oriented databases is the increasing use of object oriented programming languages in developing software applications.

3. The need for additional data modeling features has been recognized by relational DBMS vendors, and thus newer versions of relational systems are incorporating many of the features that were proposed for object oriented databases.

Design of Object Oriented Data Model

The design of the object oriented data model is based on the principles of object oriented programming languages; that is, it includes the elements found in object oriented programming languages. The model contains further sub-elements, but the core elements are given below.

1. Objects:

Simply, an object corresponds to an entity in the E-R model. The object oriented

paradigm is based on encapsulation of data and code related to an object into a single

unit, whose contents are not visible to the outside world. Object is also known as the

combination of data and function in single unit. Conceptually, all interactions between

an object and the rest of the system are via message.

2. Class:

Usually, there are many similar objects in a database. By similar, we mean that they respond to the same messages, use the same methods, and have variables of the same name and type. It would be wasteful to define each such object separately. Therefore we group similar objects to form a class. Each such object is called an instance of its class. All objects in a class share a common definition, although they differ in the values assigned to the variables. The notion of a class in the object oriented data model corresponds to the notion of an entity set in the E-R model. Examples of classes in our bank database are Employee, Customers, Accounts and Loans.

3. Inheritance:

Inheritance is the process of creating a new class, called the derived class, from an existing class, called the base class. Each subclass (derived class) shares the common characteristics of its base class. An object oriented database schema usually requires a large number of classes, and several of these classes are often similar. In such situations inheritance plays a vital role, because it lets the similar classes be derived from a common base class. For example, bus and truck are both members of the class vehicle. In addition to the shared characteristics, each member of the class vehicle has its own unique characteristics: a bus carries passengers, whereas a truck carries goods.

4. Polymorphism:

It is another important feature of object oriented databases. The word polymorphism is derived from the Greek words 'poly', meaning many, and 'morph', meaning form. The concept of using functions and operators in different ways, depending on what they are operating on, is called polymorphism. This concept is frequently used in object oriented databases. For example, consider the addition operation. In the case of numbers, the operation adds the numbers, but in the case of strings, the operation concatenates the given strings.

Besides these, other elements may be considered while designing an object oriented database, but the four above are considered the most important.
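The four elements can be illustrated with a short sketch in an object oriented programming language. Python is used here as an assumption of the example; the Vehicle, Bus and Truck classes follow the bus and truck example above, and the registration numbers are made up.

```python
class Vehicle:
    """Base class: data (variables) and code (methods) in a single unit."""

    def __init__(self, registration_no, wheels):
        self.registration_no = registration_no
        self.wheels = wheels

    def carries(self):
        return "nothing in particular"

    def describe(self):
        # The same message gives a different answer for each subclass.
        return f"{self.registration_no} carries {self.carries()}"


class Bus(Vehicle):
    """Derived class: inherits the variables of Vehicle, overrides carries()."""

    def carries(self):
        return "passengers"


class Truck(Vehicle):
    def carries(self):
        return "goods"


# Objects are instances of their class.
fleet = [Bus("BA-1", 6), Truck("BA-2", 10)]
descriptions = [vehicle.describe() for vehicle in fleet]

# Polymorphism of an operator: + adds numbers but concatenates strings.
number_sum = 2 + 3
string_sum = "ab" + "cd"
```

The list fleet holds objects of two derived classes, and describe() works on both without knowing which subclass it is dealing with.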

SQL

Introduction

The history of SQL begins in an IBM laboratory in San Jose, California, where SQL

was developed in the late 1970s. The initials stand for Structured Query Language, and the

language itself is often referred to as "sequel." It was originally developed for IBM's DB2

product (a relational database management system, or RDBMS, that can still be bought today

for various platforms and environments). In fact, SQL makes an RDBMS possible. SQL is a

nonprocedural language, in contrast to the procedural or third-generation languages (3GLs)

such as COBOL and C that had been created up to that time.

Background

SQL = Structured Query Language

Created in the late 1970s at IBM, under the name SEQUEL.

Went through major standardizations (which contributed to its wide acceptance):

SQL-86 (SQL1): queries plus some schema definition and manipulation

SQL-89: referential integrity

SQL-92 (SQL2): revised and expanded

SQL-99 (SQL3): active rules and triggers, recursive operations, aggregate operations, object-oriented features

Consists of

A Data Definition Language (DDL) for declaring database schemas. e.g.

create table

Data Manipulation Language (DML) for modifying and querying database instances. It includes commands to insert tuples into, delete tuples from, and modify tuples in the database. e.g. select, insert, update, delete, explain, lock table etc.

Embedded DML: The embedded form of SQL is designed for use within general-purpose programming languages, such as Cobol, Pascal, Fortran and C.

View Definition: The SQL DDL includes commands for defining views.

Authorization: The SQL DDL includes commands for specifying access rights

to relations and views.

Integrity: The SQL DDL includes commands for specifying integrity

constraints that the data stored in the database must satisfy.

Transaction Control: SQL includes commands for specifying the beginning

and ending of transactions. e.g. set transaction, commit, rollback

Session Control: Manages the properties of user session. e.g. alter session

System Control : manipulates the properties of database. e.g. alter system

Basic Structure

The basic structure of an sql expression consists of three clauses: select, from and

where.

The select clause corresponds to the projection operation of the relational algebra. It is used

to list the attributes desired in the result of query.

The FROM CLAUSE corresponds to the Cartesian product operation of the relational

algebra. It lists the relations to be scanned in the evaluation of the expression.

The WHERE clause corresponds to the selection predicate of the relational algebra. It

consists of a predicate involving attributes of the relations that appear in the from clause.


select attribute-expression

from table

[where condition]

i.e. A typical SQL query has the form

SELECT A1,A2,A3,….,An

FROM R1,R2,R3,…..,Rm

WHERE P

Each Ai represents an attribute, and each Ri a relation. P is a predicate. The query is equivalent to the relational algebra expression

Π A1,A2,…,An (σ P (R1 × R2 × … × Rm))

select Name

from Students

where Name < 'N';

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

Result
Name
Ben
Dan

The result of this select statement is a relation consisting of a single attribute with the

heading Name.

If we want duplicates to be removed, we can rewrite the preceding query as

select distinct Name from Students

SQL allows us to use the keyword all to specify explicitly that duplicates are not removed:

select all Name from Students

Since duplicate retention is the default, we will not use all in the queries that follow.

The asterisk symbol "*" is used to denote all attributes.

The select clause can also contain arithmetic expressions involving the operators, +,-,*, and /,

and operating on constants or attributes of tuples.

Select branchName, loanNumber, loanAmount*100

From Loan


Similarly SQL uses the logical connectives and, or, and not rather than the

mathematical symbols in the where clause.

e.g. Select loanNumber from loan where loanAmount<50000 and

branchName='Kathmandu';

If we wish to find the loan number of those loans with loan amounts between 100000

and 500000, we can use

Select loanNumber

From loan

Where loanAmount between 100000 and 500000;

Operators in conditions: =, <> , < , > , <=, >=, or, and, not
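The where-clause forms above can be run against a small table. The sketch uses Python's built-in sqlite3 module (an assumption of this example), and the three loan rows are hypothetical sample data.

```python
import sqlite3

def query(sql):
    """Run one query against a fresh in-memory loan table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE loan (loanNumber TEXT, branchName TEXT, loanAmount INTEGER);
        INSERT INTO loan VALUES ('L-1', 'Kathmandu', 40000);
        INSERT INTO loan VALUES ('L-2', 'Kathmandu', 250000);
        INSERT INTO loan VALUES ('L-3', 'Pokhara', 100000);
    """)
    return conn.execute(sql).fetchall()

# Logical connective in the where clause.
small_ktm = query(
    "SELECT loanNumber FROM loan "
    "WHERE loanAmount < 50000 AND branchName = 'Kathmandu'")

# between includes both end points.
in_range = query(
    "SELECT loanNumber FROM loan "
    "WHERE loanAmount BETWEEN 100000 AND 500000 ORDER BY loanNumber")

# distinct removes duplicate branch names.
branches = query("SELECT DISTINCT branchName FROM loan ORDER BY branchName")
```

Note that the string literal is written with single quotes, which is the standard SQL form.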

Renaming Attributes

SQL provides a mechanism for renaming both relations and attributes. It uses as clause,

taking the form

oldName as NewName

select attribute-expression [as] target-attribute

from table

[where condition]

select Name as Names

from Students

where Name < 'N'

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

Result
Names
Ben
Dan

Multiple Attributes

select Name as Names, Number * 10 as Num

from Students

where Name < 'N'

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

Result
Names   Num
Ben     34120
Dan     12340

The full list of attributes may be referenced through the character ‘*’

select *

from Students

where Name < 'N'

Multiple Tables

select Code

from Students, Classes

where Students.Number = Classes.Num

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

CLASSES
Code   Num
670    1234
680    1234
680    4123

Result
Code
670
680

The dot operator is available for distinguishing attributes of different tables

When no ambiguity arises, the identifying tables are not needed.

select Classes.Code

from Students, Classes

where Number = Num

Aliases for Tables

from table [as] alias,....

select S1.FN, S1.LN

from Students as S1, Students as S2

where S1.LN = S2.LN and S1.FN <> S2.FN

STUDENTS
FN    LN
Ben   Smith
Dan   McLean
Nel   Smith

Result
FN    LN
Ben   Smith
Nel   Smith

Duplicate Tuples

In general, SQL tables are multisets, allowing duplicated rows in the tables

Some SQL tables (e.g., with key attributes) are forced to be sets

Requests to remove the extra entries can be made with the ‘distinct’ keyword,

following the ‘select’ keyword.

select S1.LN

from Students as S1, Students as S2

where S1.LN = S2.LN and S1.FN <> S2.FN

Result
LN
Smith
Smith

select distinct S1.LN

from Students as S1, Students as S2

where S1.LN = S2.LN and S1.FN <> S2.FN

Result
LN
Smith

Set Operations

The set operations include UNION, INTERSECT AND EXCEPT operations on

relations.

select ... <union | intersect | except> [all] select ...

a. The UNION Operation:

To find all customers having a loan, an account, or both at the bank we write

(SELECT customerName from depositor)

UNION

(SELECT customerName from borrower)

If we want to retain all duplicates , we must write UNION ALL in place of union as

follows.


(SELECT customerName from depositor)

UNION ALL

(SELECT customerName from borrower)

select FN as Name from Students

union

select LN as Name from Students

STUDENTS
FN    LN
Ben   Smith
Dan   McLean
Nel   Smith

Result
Name
Ben
Dan
Nel
Smith
McLean

The ‘all’ requests to retain duplicates. The default is to eliminate them

After aliasing, the tables involved in the operations must agree on their attributes,

and the ordering of the attributes

b. The INTERSECT operation: To find all customers who have both a loan and an account at

the bank, we write

(select customerName from depositor)

intersect

(select distinct customerName from borrower)

The intersect operation also automatically eliminates duplicates . If we want to retain

all duplicates, we must write INTERSECT ALL in place of intersect.

(select customerName from depositor)

intersect all

(select distinct customerName from borrower)

c. The EXCEPT operation: To find all customers who have an account but no loan at the

bank, we write


(select distinct customerName from depositor)

except

(select customerName from borrower)

If we want to retain all duplicates , we must write EXCEPT ALL in place of except as

follows.

(select distinct customerName from depositor)

except all

(select customerName from borrower)
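The set operations can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module, which is an assumption of this example; SQLite does not accept parentheses around the operand queries and provides union all but not intersect all or except all, so only four operators are shown. The depositor and borrower rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customerName TEXT);
    CREATE TABLE borrower (customerName TEXT);
    INSERT INTO depositor VALUES ('Ram'), ('Sita'), ('Sita');
    INSERT INTO borrower VALUES ('Sita'), ('Gita');
""")

def names(op):
    """Combine the two customer lists with the given set operator."""
    sql = (f"SELECT customerName FROM depositor {op} "
           "SELECT customerName FROM borrower ORDER BY customerName")
    return [row[0] for row in conn.execute(sql)]

union_names = names("UNION")          # duplicates eliminated
union_all_names = names("UNION ALL")  # duplicates retained
both = names("INTERSECT")             # account and loan
account_only = names("EXCEPT")        # account but no loan
```

Sita has two accounts and one loan, so she appears three times under union all but only once under union.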

Ordering

Ordering may be imposed on rows of tables, based on values of attributes.

order by attribute [asc |desc],...

The default assumes ascending order

select Name,Number

from Students order by Name

String Comparisons

=, !=,... standard comparison operations

like, binary operator for comparing string patterns

%, wild card for strings. The % character matches any substring.

_, wild card for characters. The _ character matches any single character.

(Name like '%a_') is true for all names having 'a' as the second letter from the end.

like 'ab\%cd%' escape '\' matches all strings beginning with 'ab%cd'.

Suppose we want to "Find the names of all customers whose street address includes the substring 'PUR'". This query can be written as

Select customerName

From customer

Where customerStreet like '%PUR%';
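The patterns above can be checked with a small sketch, again using Python's built-in sqlite3 module as an assumption of the example. The customer rows and the codes table are hypothetical. In SQLite, like ignores the case of ASCII letters by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerName TEXT, customerStreet TEXT);
    INSERT INTO customer VALUES
        ('Ram', 'JANAKPUR ROAD'), ('Sita', 'NEW ROAD'), ('Hari', 'BHAKTAPUR');
    CREATE TABLE codes (code TEXT);
    INSERT INTO codes VALUES ('ab%cdef'), ('abXcd'), ('ab%cd');
""")

def match(sql):
    """Return the first column of every row produced by the query."""
    return [row[0] for row in conn.execute(sql)]

# % matches any substring, so the street only has to contain PUR.
pur = match("SELECT customerName FROM customer "
            "WHERE customerStreet LIKE '%PUR%' ORDER BY customerName")

# With escape '\', the sequence \% stands for a literal percent sign.
escaped = match(r"SELECT code FROM codes "
                r"WHERE code LIKE 'ab\%cd%' ESCAPE '\' ORDER BY code")

# _ matches exactly one character: 'a' must be second from the end.
second_last_a = match("SELECT customerName FROM customer "
                      "WHERE customerName LIKE '%a_' ORDER BY customerName")
```

The value 'abXcd' is not matched by the escaped pattern, because its third character is not a literal percent sign.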

Null Values

SQL allows the use of null values to indicate the absence of information about the value of an attribute. An attribute can be tested with the predicates 'is null' and 'is not null'.

The result of an arithmetic expression (involving, for example, +, -, * or /) is null if any of the input values is null. The result of any comparison involving a null value is unknown; a where clause treats unknown as false, so such tuples are not selected.

select ...

from ...

where (x is null) and (y is not null)
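The behaviour of null in arithmetic and in comparisons can be observed with a small sketch. It uses Python's built-in sqlite3 module (an assumption of this example), where a SQL null is returned as Python's None; the account rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (accountNumber TEXT, balance INTEGER);
    INSERT INTO account VALUES ('A-1', 1000), ('A-2', NULL);
""")

def scalar(sql):
    """Return the single value produced by the query."""
    return conn.execute(sql).fetchone()[0]

# Arithmetic on null yields null.
null_sum = scalar(
    "SELECT balance + 100 FROM account WHERE accountNumber = 'A-2'")

# A comparison with null is never true, so A-2 satisfies neither test.
matches_lt = scalar("SELECT count(*) FROM account WHERE balance < 5000")
matches_ge = scalar("SELECT count(*) FROM account WHERE balance >= 5000")

# is null is the way to find the missing value.
missing = scalar("SELECT count(*) FROM account WHERE balance IS NULL")
```

The two comparison counts add up to one row, not two: the tuple with the null balance is selected by neither condition.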

Aggregate Queries

Aggregate functions are functions that take a collection ( a set or multiset) of values as

input and return a single value. SQL offers five built-in aggregate function.

Average: avg

Minimum: min

Maximum: max

Total: sum

Count: count

count(*), count([distinct] attributes)

Counts the number of tuples.

select count(*)

from Students as S1, Students as S2

where S1.LN = S2.LN and S1.FN <> S2.FN

STUDENTS
FN    LN       Number
Ben   Smith    3412
Dan   McLean   1234
Nel   Smith    2341

Result
Count
2

sum ([distinct] attributes)

max ([distinct] attributes)

min ([distinct] attributes)

avg ([distinct] attributes)


select min(Number), max(Number), avg(Number)

from Students

There are circumstances where we would like to apply aggregate function not only to a

single set of tuples, but also to a group of sets of tuples; we specify this in SQL using group

by clause. The attribute or attributes given in the group by clause are used to form groups.

e.g. "Find the average account balance at each branch". We write this query as follows.

Select branchName, avg(balance)

From account

Group by branchName;

Group Clauses

Tables might be partitioned to subsets of rows which agree on their entries for given

attributes. The attributes are to be specified within a ‘group by’ clause, and are the only ones

allowed in the projection specified by the ‘select’ component.

select Sex, sum(Number)

from Students

group by Sex

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

Result
Sex   sum_Number
M     4646
F     2341

Group predicates through ‘having’ clause may be added, to exclude subgroups which

don’t satisfy desirable conditions.

select Sex, sum(Number)

from Students

group by Sex

having sum(Number) < 3000

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

Result
Sex   sum_Number
F     2341
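The group by and having examples can be reproduced with a small sketch that loads the STUDENTS table into an in-memory database. Python's built-in sqlite3 module is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (Name TEXT, Number INTEGER, Sex TEXT);
    INSERT INTO Students VALUES
        ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
""")

# One output row per group; the aggregate is computed inside each group.
grouped = conn.execute(
    "SELECT Sex, sum(Number) FROM Students "
    "GROUP BY Sex ORDER BY Sex").fetchall()

# having filters whole groups after the aggregate has been computed.
filtered = conn.execute(
    "SELECT Sex, sum(Number) FROM Students "
    "GROUP BY Sex HAVING sum(Number) < 3000").fetchall()
```

The where clause filters rows before grouping, whereas having filters groups afterwards, which is why a condition on sum(Number) has to go in having.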

Nested Subqueries

A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.

a. Set Membership

SQL draws on the relational calculus for operations that allow testing tuples for membership in a relation. The IN connective tests for set membership, where the set is a collection of values produced by a select clause. The NOT IN connective tests for the absence of set membership. For example, to find all customers who have both a loan and an account at the bank, we begin by finding all account holders, and we write the subquery

(Select CustomerName

From depositor)

We then need to find those customers who are borrowers from the bank and who appear in the list of account holders obtained in the subquery. We do so by nesting the subquery in an outer select. The resulting query is

Select distinct customerName

From borrower

Where customerName in (Select customerName from depositor)

Similarly, to "find all the customers who have both an account and a loan at the Kathmandu branch", the query is

Select distinct customerName

From borrower, loan

Where borrower.loanNumber=loan.loanNumber and branchName='Kathmandu' and

(branchName,customerName) IN (Select branchName,customerName from depositor,account

where depositor.accountNumber=account.accountNumber)

Similarly, to "find all customers who have a loan at the bank and whose names are neither Ram, Mohan nor Barsha", the query is

Select distinct customerName

From borrower

Where customerName NOT IN ('Ram','Mohan','Barsha')

b. Set Comparison

Some, any, all are used for set comparison.

Consider the query "Find the names of all branches that have assets greater than those of at least one branch located in Kathmandu". The query is

Select distinct T.branchName

From branch as T, branch as S

Where T.assets > S.assets and S.branchCity='Kathmandu'

An alternative style for writing the preceding query uses SOME. The phrase "greater than at least one" is represented in SQL by > some, as follows.

Select branchName

From branch

Where assets > some (select assets

from branch where branchCity='Kathmandu')

SQL also allows < some, <= some, >= some, = some and <> some comparisons.

Similarly, for "Find the names of all branches that have assets greater than that of each branch in Kathmandu", the query is

Select branchName

From branch

Where assets > all (select assets

from branch

where branchCity='Kathmandu')

select Name

from Students

where Number = all (select Number

from Same)

STUDENTS
Name   Number
Ben    3412
Dan    1234
Nel    2341

SAME
Number
3412
3412

Result
Name
Ben

any

select Name

from Students

where Number = any (select Number

from Diff)

STUDENTS
Name   Number
Ben    3412
Dan    1234
Nel    2341

DIFF
Number
3412
2341

Result
Name
Ben
Nel

select Name

from Students

where (Name,Number) not in

(select Name,Number

from Students

where Number <> 1234)

STUDENTS
Name   Number
Ben    3412
Dan    1234
Nel    2341

Result
Name
Dan

c. Test for empty Relations

SQL includes a feature for testing whether a subquery has any tuples in its result. The EXISTS construct returns the value TRUE if the argument subquery is nonempty. Using the EXISTS construct, we can write the query "Find all customers who have both an account and a loan at the bank" in another way as follows.

Select customerName

From borrower

Where exists (select * from depositor

where depositor.customerName=borrower.customerName)
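The in and exists formulations can be compared on the same data with a small sketch. Python's built-in sqlite3 module is an assumption of this example, and the depositor and borrower rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customerName TEXT, accountNumber TEXT);
    CREATE TABLE borrower (customerName TEXT, loanNumber TEXT);
    INSERT INTO depositor VALUES ('Ram', 'A-1'), ('Sita', 'A-2');
    INSERT INTO borrower VALUES ('Sita', 'L-1'), ('Sita', 'L-2'), ('Gita', 'L-3');
""")

def names(sql):
    """Return the customer names produced by the query."""
    return [row[0] for row in conn.execute(sql)]

# Set membership: the borrower must appear in the list of depositors.
with_in = names(
    "SELECT DISTINCT customerName FROM borrower "
    "WHERE customerName IN (SELECT customerName FROM depositor)")

# Same question with a correlated subquery and exists.
with_exists = names(
    "SELECT DISTINCT customerName FROM borrower WHERE EXISTS "
    "(SELECT * FROM depositor "
    " WHERE depositor.customerName = borrower.customerName)")

# Absence of membership: borrowers without an account.
loan_only = names(
    "SELECT DISTINCT customerName FROM borrower "
    "WHERE customerName NOT IN (SELECT customerName FROM depositor)")
```

Both formulations return the same customers. The distinct keyword is needed because Sita has two loans and would otherwise be listed twice.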


d. Test for the absence of duplicate tuples

SQL includes a feature for testing whether a subquery has any duplicate tuples in its

result. The UNIQUE construct returns the value true if the argument subquery contains no

duplicate tuples. Using the unique construct , we can write the query " Find all customers

who have only one account at the Kathmandu branch," as follows.

Select T.customerName

From depositor as T

Where unique (select R.customerName

From account, depositor as R

Where T.customerName=R.customerName and

R.accountNumber=account.accountNumber and

account.branchName='Kathmandu')

Similarly, we can test for the existence of duplicate tuples in a subquery by using the not unique construct. The query "Find all customers who have at least two accounts at the Kathmandu branch" can be written as

Select distinct T.customerName

From depositor as T

Where not unique (select R.customerName

From account, depositor as R

Where T.customerName=R.customerName and

R.accountNumber=account.accountNumber and

account.branchName='Kathmandu')

Views

Views are virtual tables whose contents depend on other tables. Views are defined in

SQL using CREATE VIEW command.

Create or replace view viewName as <sql expression>

e.g. create or replace view v_employee as

select emid, empname from employee

create view Males (Nm, Num) as

select Name, Number

from Students

where Sex = 'M'

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

MALES
Nm    Num
Ben   3412
Dan   1234

In Oracle, a view can be created even when its defining query does not compile (for example, because a base table does not exist yet) by using the force option:

Create force view viewname as <sql expression>

Such a view is created with compilation errors; once the errors have been fixed, it can be compiled using

Alter view viewname compile;

A view can also be created as read only or with a check option constraint. The optional WITH READ ONLY clause specifies that the view is read only. Similarly, WITH CHECK OPTION specifies that inserts and updates done through the view must satisfy the where clause of the view.

e.g. Create or replace view v_employee(empid, empname) as select employeeid, empname from employee where salary>2999 with check option constraint top_emp_sal;
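That a view is a virtual table can be seen in a small sketch: the view stores only its defining query, so later changes to the base table show up in it. Python's built-in sqlite3 module is an assumption of this example; the view columns are named with as aliases in the select list, and the added student Tom is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (Name TEXT, Number INTEGER, Sex TEXT);
    INSERT INTO Students VALUES
        ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
    CREATE VIEW Males AS
        SELECT Name AS Nm, Number AS Num FROM Students WHERE Sex = 'M';
""")

def males():
    """Query the view exactly as if it were a table."""
    return conn.execute("SELECT Nm, Num FROM Males ORDER BY Nm").fetchall()

before = males()
# Insert into the base table; the view itself is not touched.
conn.execute("INSERT INTO Students VALUES ('Tom', 5000, 'M')")
after = males()
```

The new row appears in the view without the view being redefined, because the view is evaluated against the current contents of Students each time it is queried.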

Joined Relation

Instead of providing simple tables, combinations of them may be specified using the

inner, left outer, right outer, and full outer operations.

table-1 join-op table-2 on condition

select Name, Code as Course

from Students inner join Classes

on Students.Number = Classes.Number

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

CLASSES
Code   Number
670    1234
680    1234
680    4123

Result
Name   Course
Dan    670
Dan    680


select Name, Code as Course

from Students natural inner join Classes

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

CLASSES
Code   Number
670    1234
680    1234
680    4123

Result
Name   Course
Dan    670
Dan    680

For natural joins, the 'natural' keyword can be specified before the operator instead of providing the condition.

Attribute names might be renamed, to facilitate the natural join operation.

select Name, Code as Course

from Students natural inner join

Classes as C(Code, Number)

STUDENTS
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

CLASSES
Code   Num
670    1234
680    1234
680    4123

Result
Name   Course
Dan    670
Dan    680

The keyword NATURAL appears before the join type. The meaning of the join condition natural, in terms of which tuples from the two relations match, is straightforward: tuples match when they agree on all attributes that have the same name. The ordering of the attributes in the result of a natural join is as follows. The join attributes appear first, in the order in which they appear in the left hand side relation. Next come all nonjoin attributes of the left hand side relation, and finally all nonjoin attributes of the right hand side relation.


Consider the following two tables with data.

LOAN
branchname   loannumber   amount
downtown     L-170        3000
redwood      L-230        4000
perryridge   L-260        1700

BORROWER
customername   loannumber
jones          L-170
smith          L-230
hayes          L-155

branchname   loannumber   amount   customername   loannumber
downtown     L-170        3000     jones          L-170
redwood      L-230        4000     smith          L-230

fig. Result of loan inner join borrower on loan.loannumber=borrower.loannumber

branchname   loannumber   amount   customername   loannumber
downtown     L-170        3000     jones          L-170
redwood      L-230        4000     smith          L-230
perryridge   L-260        1700     null           null

fig. Result of loan left outer join borrower on loan.loannumber=borrower.loannumber

branchname   loannumber   amount   customername
downtown     L-170        3000     jones
redwood      L-230        4000     smith
null         L-155        null     hayes

fig. Result of loan natural right outer join borrower

branchname   loannumber   amount   customername
downtown     L-170        3000     jones
redwood      L-230        4000     smith
perryridge   L-260        1700     null
null         L-155        null     hayes

fig. Result of loan full outer join borrower using(loannumber)
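The join results above can be reproduced with a small sketch that loads the loan and borrower tables into an in-memory database. Python's built-in sqlite3 module is an assumption of this example. To stay independent of the SQLite version, the right outer join is written as a left outer join with the two relations swapped, which yields the same tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (branchname TEXT, loannumber TEXT, amount INTEGER);
    CREATE TABLE borrower (customername TEXT, loannumber TEXT);
    INSERT INTO loan VALUES
        ('downtown', 'L-170', 3000), ('redwood', 'L-230', 4000),
        ('perryridge', 'L-260', 1700);
    INSERT INTO borrower VALUES
        ('jones', 'L-170'), ('smith', 'L-230'), ('hayes', 'L-155');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Only loans that have a matching borrower.
inner = rows(
    "SELECT loan.loannumber, amount, customername "
    "FROM loan INNER JOIN borrower ON loan.loannumber = borrower.loannumber "
    "ORDER BY loan.loannumber")

# Every loan; borrower attributes are null when there is no match.
left = rows(
    "SELECT loan.loannumber, amount, customername "
    "FROM loan LEFT OUTER JOIN borrower ON loan.loannumber = borrower.loannumber "
    "ORDER BY loan.loannumber")

# Every borrower (the right outer join of the text, operands swapped).
right = rows(
    "SELECT borrower.loannumber, amount, customername "
    "FROM borrower LEFT OUTER JOIN loan ON loan.loannumber = borrower.loannumber "
    "ORDER BY borrower.loannumber")
```

In the last result, loan L-155 has a null amount, since no loan tuple with that number exists.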

Data Definition Language (DDL) in SQL

Table Definition

create table name (attributes)

create table Student(

Name varchar (5),

registrationNo number(4),

class char(1),

gender char(1),

Joining_dt date

)

Default Values

default value | user | null

create table Student(

Name varchar (5),

registrationNo number(4),

class char(1),

gender char(1) default 'M',

Joining_dt date

)

Constraints on Attributes

not null, unique, primary key, foreign key, check

create table Student(

Name varchar (5),

registrationNo number(4) constraint pk_registrationNo Primary key,

class char(1) constraint uniq_class UNIQUE,

gender char(1) constraint chk_gender check(gender in('M','F')),

Joining_dt date

)

Referential Triggers

Foreign key:

create table bank(

bank_code number(10),

bank_name varchar2(60),

constraint pk_bank_code primary key(bank_code)

)

Create table branch(

branch_code number(10) constraint pk_branch_code primary key,

bank_code number(10),

branch_name varchar2(60),

constraint fk_bank_code foreign key(bank_code) references bank(bank_code)

on delete cascade

);

When violating referential constraints:

In the default case, requested updates are rejected.

Alternative actions may be requested:

on <delete | update> <cascade | set null | no action>

These actions apply to updates and deletes on the external (master or parent) table:

cascade: the update or delete is propagated automatically to the child records

set null: the foreign key is changed to null in the internal (child or detail) table

no action: the update or delete is rejected
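The cascade and set null actions can be observed in a small sketch based on the bank and branch tables. Python's built-in sqlite3 module is an assumption of this example (foreign keys must be switched on with a PRAGMA), and the atm table and the sample rows are hypothetical additions used to show set null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE bank (bank_code INTEGER PRIMARY KEY, bank_name TEXT);
    CREATE TABLE branch (
        branch_code INTEGER PRIMARY KEY,
        bank_code   INTEGER REFERENCES bank(bank_code) ON DELETE CASCADE,
        branch_name TEXT
    );
    CREATE TABLE atm (
        atm_code  INTEGER PRIMARY KEY,
        bank_code INTEGER REFERENCES bank(bank_code) ON DELETE SET NULL
    );
    INSERT INTO bank VALUES (1, 'First Bank'), (2, 'Second Bank');
    INSERT INTO branch VALUES (10, 1, 'Kathmandu'), (20, 2, 'Pokhara');
    INSERT INTO atm VALUES (100, 1), (200, 2);

    -- Deleting the parent row triggers the referential actions.
    DELETE FROM bank WHERE bank_code = 1;
""")

# cascade: the branch of bank 1 has been deleted as well.
branches = conn.execute(
    "SELECT branch_code FROM branch ORDER BY branch_code").fetchall()

# set null: the ATM of bank 1 survives with a null bank_code.
atms = conn.execute(
    "SELECT atm_code, bank_code FROM atm ORDER BY atm_code").fetchall()
```

Without either clause, the delete on bank would have been rejected, which is the default case described above.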

User-defined Data Types

create domain [name] as known-domain [default-value] [constraints]

create domain person_name varchar2(60);

Data Manipulation Language(DML) in SQL

Inserting Rows

insert into table [(attributes)]

<values (values) | SQL-query >

insert into Students (Name,Number,Sex)

values ('Don',4123,'F')

STUDENTS (before)
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F

STUDENTS (after)
Name   Number   Sex
Ben    3412     M
Dan    1234     M
Nel    2341     F
Don    4123     F

The previous example inserts a single record; the following one copies rows

from another table.

insert into Students

(select Name,Number,Sex

from Applicants

where State = 'OH')


Students before the insert:

Name Number Sex

Ben 3412 M

Dan 1234 M

Nel 2341 F

Applicants:

Name Number Sex State

Don 4123 F OH

Pam 3421 F MI

Students after the insert:

Name Number Sex

Ben 3412 M

Dan 1234 M

Nel 2341 F

Don 4123 F

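Insertions that supply only some of the columns are written in the same way; columns that are not listed receive their default value (null if none is defined):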

insert into Students (Name,Number)

(select Name,Number

from Applicants

where State = 'OH')
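A partial insert from a query can be checked quickly. The sketch below assumes SQLite via Python's sqlite3 module and column types chosen for illustration; the table contents are those shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Students(Name varchar(5), Number integer, Sex char(1));
    create table Applicants(Name varchar(5), Number integer,
                            Sex char(1), State char(2));
    insert into Students values
        ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
    insert into Applicants values
        ('Don', 4123, 'F', 'OH'), ('Pam', 3421, 'F', 'MI');
""")

# Only Name and Number are copied; Sex is not listed and has no default,
# so the new row stores null there.
conn.execute("""
    insert into Students (Name, Number)
    select Name, Number from Applicants where State = 'OH'
""")

rows = conn.execute(
    "select Name, Number, Sex from Students order by rowid"
).fetchall()
for row in rows:
    print(row)
```

The last row printed is the one copied from Applicants, with None (SQL null) in the Sex column.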

Deleting Rows

delete from table

[where condition]

delete from Students

where Number < 2000


Students before the delete:

Name Number Sex

Ben 3412 M

Dan 1234 M

Nel 2341 F

Students after the delete:

Name Number Sex

Ben 3412 M

Nel 2341 F

Updating Attributes

update table

set attribute = <expr | SQL-query | null | default > ,...

[where condition]

update Students

set Name = 'Tom', Number = Number + 5

where Name = 'Dan'


Students before the update:

Name Number Sex

Ben 3412 M

Dan 1234 M

Nel 2341 F

Students after the update:

Name Number Sex

Ben 3412 M

Tom 1239 M

Nel 2341 F
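The delete and update examples can be replayed on the same three rows. This is a small sketch assuming SQLite through Python's sqlite3 module; fresh_students is a helper name invented here so that each statement starts from the original table.

```python
import sqlite3


def fresh_students():
    """Return a connection holding the original three-row Students table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Students(Name varchar(5), Number integer, Sex char(1))"
    )
    conn.executemany(
        "insert into Students values (?, ?, ?)",
        [("Ben", 3412, "M"), ("Dan", 1234, "M"), ("Nel", 2341, "F")],
    )
    return conn


# delete: only Dan has a Number below 2000.
conn = fresh_students()
deleted = conn.execute("delete from Students where Number < 2000").rowcount
after_delete = conn.execute(
    "select Name from Students order by rowid"
).fetchall()

# update: both assignments apply to the single row selected by the where clause.
conn = fresh_students()
updated = conn.execute(
    "update Students set Name = 'Tom', Number = Number + 5 where Name = 'Dan'"
).rowcount
after_update = conn.execute("select * from Students order by rowid").fetchall()

print(deleted, after_delete)
print(updated, after_update)
```

Omitting the where clause in either statement would affect every row of the table, which is why it is worth checking the affected row count.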

Updating Table Definitions

alter table name

<

add constraint def |

drop constraint constraint |

add column def |

drop column name |

alter column name < set default value | drop default >

>

Names can be assigned to constraints by a prefix of the form constraint name.


create table Student(

Name varchar (5) not null,

Number numeric(4),

constraint foo primary key(Number)

)

alter table Student

add column BirthDate date

alter table bank

add constraint pk_bankcode primary key(bank_code);
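Adding a column to a table that already holds data can be sketched as follows. The example assumes SQLite via Python's sqlite3 module; SQLite accepts add column but does not support add constraint in alter table, so only the first statement above is reproduced. The helper column_names is invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Student(Name varchar(5) not null, Number integer primary key)"
)
conn.execute("insert into Student values ('Ben', 3412)")


def column_names(table):
    # pragma table_info is SQLite-specific: one row per column, name at index 1.
    return [row[1] for row in conn.execute(f"pragma table_info({table})")]


before = column_names("Student")
conn.execute("alter table Student add column BirthDate date")
after = column_names("Student")

# Rows that existed before the change hold null in the new column.
existing_birthdate = conn.execute("select BirthDate from Student").fetchone()[0]
print(before, after, existing_birthdate)
```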

Removing Components

drop < table | view | assertion > name [restrict | cascade ]

restrict performs the drop only if no other component (for example a view or a constraint) depends on the one being dropped

cascade removes the component and its dependents

MySQL Tutorial

9.5 Creating and Using a Database

Now that you know how to enter commands, it's time to access a database.

Suppose you have several pets in your home (your menagerie) and you'd like to keep track of

various types of information about them. You can do so by creating tables to hold your data

and loading them with the desired information. Then you can answer different sorts of

questions about your animals by retrieving data from the tables. This section shows you how

to:

• Create a database

• Create a table

• Load data into the table

• Retrieve data from the table in various ways

• Use multiple tables

The menagerie database will be simple (deliberately), but it is not difficult to think of

real-world situations in which a similar type of database might be used. For example, a database

like this could be used by a farmer to keep track of livestock, or by a veterinarian to keep

track of patient records.

Use the SHOW statement to find out what databases currently exist on the server:

mysql> SHOW DATABASES;

+----------+

| Database |

+----------+

| mysql |

| test |

| tmp |

+----------+

The list of databases is probably different on your machine, but the mysql and test databases

are likely to be among them. The mysql database is required because it describes user access

privileges. The test database is often provided as a workspace for users to try things out.

If the test database exists, try to access it:

mysql> USE test

Database changed

Note that USE, like QUIT, does not require a semicolon. (You can terminate such statements

with a semicolon if you like; it does no harm.) The USE statement is special in another way,

too: it must be given on a single line.

You can use the test database (if you have access to it) for the examples that follow, but

anything you create in that database can be removed by anyone else with access to it. For this

reason, you should probably ask your MySQL administrator for permission to use a database

of your own. Suppose you want to call yours menagerie. The administrator needs to execute a

command like this:

mysql> GRANT ALL ON menagerie.* TO your_mysql_name;

where your_mysql_name is the MySQL user name assigned to you.

9.5.1 Creating and Selecting a Database

If the administrator creates your database for you when setting up your permissions, you can

begin using it. Otherwise, you need to create it yourself:

mysql> CREATE DATABASE menagerie;

Under Unix, database names are case sensitive (unlike SQL keywords), so you must always

refer to your database as menagerie, not as Menagerie, MENAGERIE, or some other variant.

This is also true for table names. (Under Windows, this restriction does not apply, although

you must refer to databases and tables using the same lettercase throughout a given query.)

Creating a database does not select it for use; you must do that explicitly. To make menagerie

the current database, use this command:

mysql> USE menagerie

Database changed

Your database needs to be created only once, but you must select it for use each time you

begin a mysql session. You can do this by issuing a USE statement as shown above.

Alternatively, you can select the database on the command line when you invoke mysql. Just

specify its name after any connection parameters that you might need to provide. For

example:

shell> mysql -h host -u user -p menagerie

Enter password: ********

Note that menagerie is not your password on the command just shown. If you want to supply

your password on the command line after the -p option, you must do so with no intervening

space (for example, as -pmypassword, not as -p mypassword). However, putting your

password on the command line is not recommended, because doing so exposes it to snooping

by other users logged in on your machine.

9.5.2 Creating a Table

Creating the database is the easy part, but at this point it's empty, as SHOW TABLES will tell

you:

mysql> SHOW TABLES;

Empty set (0.00 sec)

The harder part is deciding what the structure of your database should be: what tables you

will need and what columns will be in each of them.

You'll want a table that contains a record for each of your pets. This can be called the pet

table, and it should contain, as a bare minimum, each animal's name. Because the name by

itself is not very interesting, the table should contain other information. For example, if more

than one person in your family keeps pets, you might want to list each animal's owner. You

might also want to record some basic descriptive information such as species and sex.

How about age? That might be of interest, but it's not a good thing to store in a database. Age

changes as time passes, which means you'd have to update your records often. Instead, it's

better to store a fixed value such as date of birth. Then, whenever you need age, you can

calculate it as the difference between the current date and the birth date. MySQL provides

functions for doing date arithmetic, so this is not difficult. Storing birth date rather than age

has other advantages, too:

• You can use the database for tasks such as generating reminders for upcoming pet

birthdays. (If you think this type of query is somewhat silly, note that it is the same

question you might ask in the context of a business database to identify clients to

whom you'll soon need to send out birthday greetings, for that computer-assisted

personal touch.)

• You can calculate age in relation to dates other than the current date. For example, if

you store death date in the database, you can easily calculate how old a pet was when

it died.

You can probably think of other types of information that would be useful in the pet table, but

the ones identified so far are sufficient for now: name, owner, species, sex, birth, and death.

Use a CREATE TABLE statement to specify the layout of your table:

mysql> CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20),

-> species VARCHAR(20), sex CHAR(1), birth DATE, death DATE);

VARCHAR is a good choice for the name, owner, and species columns because the column

values will vary in length. The lengths of those columns need not all be the same, and need

not be 20. You can pick any length from 1 to 255 (MySQL 5.0.3 and later allow up to 65,535), whatever seems most reasonable to you. (If

you make a poor choice and it turns out later that you need a longer field, MySQL provides

an ALTER TABLE statement.)

Animal sex can be represented in a variety of ways, for example, "m" and "f" , or perhaps

"male" and "female". It's simplest to use the single characters "m" and "f" .

The use of the DATE data type for the birth and death columns is a fairly obvious choice.

Now that you have created a table, SHOW TABLES should produce some output:

mysql> SHOW TABLES;

+---------------------+

| Tables in menagerie |

+---------------------+

| pet |

+---------------------+

To verify that your table was created the way you expected, use a DESCRIBE statement:

mysql> DESCRIBE pet;

+---------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+---------+-------------+------+-----+---------+-------+

| name | varchar(20) | YES | | NULL | |

| owner | varchar(20) | YES | | NULL | |

| species | varchar(20) | YES | | NULL | |

| sex | char(1) | YES | | NULL | |

| birth | date | YES | | NULL | |

| death | date | YES | | NULL | |

+---------+-------------+------+-----+---------+-------+

You can use DESCRIBE any time, for example, if you forget the names of the columns in

your table or what types they are.

9.5.3 Loading Data into a Table

After creating your table, you need to populate it. The LOAD DATA and INSERT statements

are useful for this.

Suppose your pet records can be described as shown below. (Observe that MySQL expects

dates in YYYY-MM-DD format; this may be different from what you are used to.)

name      owner   species  sex  birth       death
Fluffy    Harold  cat      f    1993-02-04
Claws     Gwen    cat      m    1994-03-17
Buffy     Harold  dog      f    1989-05-13
Fang      Benny   dog      m    1990-08-27
Bowser    Diane   dog      m    1998-08-31  1995-07-29
Chirpy    Gwen    bird     f    1998-09-11
Whistler  Gwen    bird          1997-12-09
Slim      Benny   snake    m    1996-04-29

Because you are beginning with an empty table, an easy way to populate it is to create a text

file containing a row for each of your animals, then load the contents of the file into the table

with a single statement.

You could create a text file `pet.txt' containing one record per line, with values separated by

tabs, and given in the order in which the columns were listed in the CREATE TABLE

statement. For missing values (such as unknown sexes or death dates for animals that are still

living), you can use NULL values. To represent these in your text file, use \N. For example,

the record for Whistler the bird would look like this (where the whitespace between values is

a single tab character):

Whistler Gwen bird \N 1997-12-09 \N

To load the text file `pet.txt' into the pet table, use this command:

mysql> LOAD DATA LOCAL INFILE "pet.txt" INTO TABLE pet;

You can specify the column value separator and end of line marker explicitly in the LOAD

DATA statement if you wish, but the defaults are tab and linefeed. These are sufficient for

the statement to read the file `pet.txt' properly.

When you want to add new records one at a time, the INSERT statement is useful. In its

simplest form, you supply values for each column, in the order in which the columns were

listed in the CREATE TABLE statement. Suppose Diane gets a new hamster named Puffball.

You could add a new record using an INSERT statement like this:

mysql> INSERT INTO pet

-> VALUES ('Puffball','Diane','hamster','f','1999-03-30',NULL);

Note that string and date values are specified as quoted strings here. Also, with INSERT, you

can insert NULL directly to represent a missing value. You do not use \N like you do with

LOAD DATA.

From this example, you should be able to see that there would be a lot more typing involved

to load your records initially using several INSERT statements rather than a single LOAD

DATA statement.
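The two loading routes can be imitated in a few lines. This is only a sketch: it assumes SQLite via Python's sqlite3 module, which has no LOAD DATA statement, so the tab-separated text is parsed by hand and the \N marker is mapped to NULL explicitly. PET_TXT holds three sample lines rather than the full file, and parse_pet_lines is a name invented here.

```python
import sqlite3

# Three lines in the pet.txt layout: tab-separated, \N for missing values.
PET_TXT = (
    "Fluffy\tHarold\tcat\tf\t1993-02-04\t\\N\n"
    "Whistler\tGwen\tbird\t\\N\t1997-12-09\t\\N\n"
    "Bowser\tDiane\tdog\tm\t1989-08-31\t1995-07-29\n"
)


def parse_pet_lines(text):
    """Split each line on tabs and turn the \\N marker into None (SQL NULL)."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        rows.append(tuple(None if f == "\\N" else f for f in fields))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table pet (name varchar(20), owner varchar(20), species varchar(20),"
    " sex char(1), birth date, death date)"
)

# Bulk load, playing the role of LOAD DATA.
conn.executemany("insert into pet values (?, ?, ?, ?, ?, ?)",
                 parse_pet_lines(PET_TXT))

# Single-row INSERT: NULL is written directly, not as \N.
conn.execute(
    "insert into pet values ('Puffball','Diane','hamster','f','1999-03-30',NULL)"
)

loaded = conn.execute("select name, sex, death from pet order by rowid").fetchall()
for row in loaded:
    print(row)
```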

9.5.4 Retrieving Information from a Table

The SELECT statement is used to pull information from a table. The general form of the

statement is:

SELECT what_to_select

FROM which_table

WHERE conditions_to_satisfy

what_to_select indicates what you want to see. This can be a list of columns, or * to indicate

``all columns.'' which_table indicates the table from which you want to retrieve data. The

WHERE clause is optional. If it's present, conditions_to_satisfy specifies conditions that rows

must satisfy to qualify for retrieval.

9.5.4.1 Selecting All Data

The simplest form of SELECT retrieves everything from a table:

mysql> SELECT * FROM pet;

+----------+--------+---------+------+------------+------------+

| name | owner | species | sex | birth | death |

+----------+--------+---------+------+------------+------------+

| Fluffy | Harold | cat | f | 1993-02-04 | NULL |

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

| Fang | Benny | dog | m | 1990-08-27 | NULL |

| Bowser | Diane | dog | m | 1998-08-31 | 1995-07-29 |

| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |

| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |

| Slim | Benny | snake | m | 1996-04-29 | NULL |

| Puffball | Diane | hamster | f | 1999-03-30 | NULL |

+----------+--------+---------+------+------------+------------+

This form of SELECT is useful if you want to review your entire table, for instance, after

you've just loaded it with your initial dataset. As it happens, the output just shown reveals an

error in your data file: Bowser appears to have been born after he died! Consulting your

original pedigree papers, you find that the correct birth year is 1989, not 1998.

There are at least a couple of ways to fix this:

• Edit the file `pet.txt' to correct the error, then empty the table and reload it using

DELETE and LOAD DATA:

mysql> SET AUTOCOMMIT=1; # Used for quick re-create of the table

mysql> DELETE FROM pet;

mysql> LOAD DATA LOCAL INFILE "pet.txt" INTO TABLE pet;

However, if you do this, you must also re-enter the record for Puffball.

• Fix only the erroneous record with an UPDATE statement:

mysql> UPDATE pet SET birth = "1989-08-31" WHERE name = "Bowser";

As shown above, it is easy to retrieve an entire table. But typically you don't want to do that,

particularly when the table becomes large. Instead, you're usually more interested in

answering a particular question, in which case you specify some constraints on the

information you want. Let's look at some selection queries in terms of questions about your

pets that they answer.

9.5.4.2 Selecting Particular Rows

You can select only particular rows from your table. For example, if you want to verify the

change that you made to Bowser's birth date, select Bowser's record like this:

mysql> SELECT * FROM pet WHERE name = "Bowser";

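+--------+-------+---------+------+------------+------------+

| name | owner | species | sex | birth | death |

+--------+-------+---------+------+------------+------------+

| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |

+--------+-------+---------+------+------------+------------+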

The output confirms that the year is correctly recorded now as 1989, not 1998.

String comparisons are normally case insensitive, so you can specify the name as "bowser",

"BOWSER", etc. The query result will be the same.

You can specify conditions on any column, not just name. For example, if you want to know

which animals were born during or after 1998, test the birth column:

mysql> SELECT * FROM pet WHERE birth >= "1998-1-1";

+----------+-------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+----------+-------+---------+------+------------+-------+

| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |

| Puffball | Diane | hamster | f | 1999-03-30 | NULL |

+----------+-------+---------+------+------------+-------+

You can combine conditions, for example, to locate female dogs:

mysql> SELECT * FROM pet WHERE species = "dog" AND sex = "f";

+-------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+-------+--------+---------+------+------------+-------+

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+-------+--------+---------+------+------------+-------+

The preceding query uses the AND logical operator. There is also an OR operator:

mysql> SELECT * FROM pet WHERE species = "snake" OR species = "bird";

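+----------+-------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+----------+-------+---------+------+------------+-------+

| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |

| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |

| Slim | Benny | snake | m | 1996-04-29 | NULL |

+----------+-------+---------+------+------------+-------+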

AND and OR may be intermixed. If you do that, it's a good idea to use parentheses to indicate

how conditions should be grouped:

mysql> SELECT * FROM pet WHERE (species = "cat" AND sex = "m")

-> OR (species = "dog" AND sex = "f");

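+-------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+-------+--------+---------+------+------------+-------+

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+-------+--------+---------+------+------------+-------+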
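Why the parentheses matter can be shown with two queries that differ only in grouping. The sketch assumes SQLite via Python's sqlite3 module with the corrected pet data from this section; string literals use single quotes because SQLite reserves double quotes for identifiers. The helper names is invented for the illustration.

```python
import sqlite3

PETS = [
    ("Fluffy", "Harold", "cat", "f", "1993-02-04", None),
    ("Claws", "Gwen", "cat", "m", "1994-03-17", None),
    ("Buffy", "Harold", "dog", "f", "1989-05-13", None),
    ("Fang", "Benny", "dog", "m", "1990-08-27", None),
    ("Bowser", "Diane", "dog", "m", "1989-08-31", "1995-07-29"),
    ("Chirpy", "Gwen", "bird", "f", "1998-09-11", None),
    ("Whistler", "Gwen", "bird", None, "1997-12-09", None),
    ("Slim", "Benny", "snake", "m", "1996-04-29", None),
    ("Puffball", "Diane", "hamster", "f", "1999-03-30", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name, owner, species, sex, birth, death)")
conn.executemany("insert into pet values (?, ?, ?, ?, ?, ?)", PETS)


def names(where):
    """Return pet names matching a WHERE condition, in insertion order."""
    sql = f"select name from pet where {where} order by rowid"
    return [row[0] for row in conn.execute(sql)]


# The query from the text: male cats or female dogs.
male_cats_or_female_dogs = names(
    "(species = 'cat' and sex = 'm') or (species = 'dog' and sex = 'f')"
)

# AND binds more tightly than OR, so without parentheses this reads as
# "any cat, or a female dog".
cats_or_female_dogs = names("species = 'cat' or species = 'dog' and sex = 'f'")

# Parentheses change the question to "female cats or female dogs".
female_cats_or_dogs = names("(species = 'cat' or species = 'dog') and sex = 'f'")

print(male_cats_or_female_dogs)
print(cats_or_female_dogs)
print(female_cats_or_dogs)
```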

9.5.4.3 Selecting Particular Columns

If you don't want to see entire rows from your table, just name the columns in which you're

interested, separated by commas. For example, if you want to know when your animals were

born, select the name and birth columns:

mysql> SELECT name, birth FROM pet;

+----------+------------+

| name | birth |

+----------+------------+

| Fluffy | 1993-02-04 |

| Claws | 1994-03-17 |

| Buffy | 1989-05-13 |

| Fang | 1990-08-27 |

| Bowser | 1989-08-31 |

| Chirpy | 1998-09-11 |

| Whistler | 1997-12-09 |

| Slim | 1996-04-29 |

| Puffball | 1999-03-30 |

+----------+------------+

To find out who owns pets, use this query:

mysql> SELECT owner FROM pet;

+--------+

| owner |

+--------+

| Harold |

| Gwen |

| Harold |

| Benny |

| Diane |

| Gwen |

| Gwen |

| Benny |

| Diane |

+--------+

However, notice that the query simply retrieves the owner field from each record, and some

of them appear more than once. To minimize the output, retrieve each unique output record

just once by adding the keyword DISTINCT:

mysql> SELECT DISTINCT owner FROM pet;

+--------+

| owner |

+--------+

| Benny |

| Diane |

| Gwen |

| Harold |

+--------+

You can use a WHERE clause to combine row selection with column selection. For example,

to get birth dates for dogs and cats only, use this query:

mysql> SELECT name, species, birth FROM pet

-> WHERE species = "dog" OR species = "cat";

+--------+---------+------------+

| name | species | birth |

+--------+---------+------------+

| Fluffy | cat | 1993-02-04 |

| Claws | cat | 1994-03-17 |

| Buffy | dog | 1989-05-13 |

| Fang | dog | 1990-08-27 |

| Bowser | dog | 1989-08-31 |

+--------+---------+------------+

9.5.4.4 Sorting Rows

You may have noticed in the preceding examples that the result rows are displayed in no

particular order. However, it's often easier to examine query output when the rows are sorted

in some meaningful way. To sort a result, use an ORDER BY clause.

Here are animal birthdays, sorted by date:

mysql> SELECT name, birth FROM pet ORDER BY birth;

+----------+------------+

| name | birth |

+----------+------------+

| Buffy | 1989-05-13 |

| Bowser | 1989-08-31 |

| Fang | 1990-08-27 |

| Fluffy | 1993-02-04 |

| Claws | 1994-03-17 |

| Slim | 1996-04-29 |

| Whistler | 1997-12-09 |

| Chirpy | 1998-09-11 |

| Puffball | 1999-03-30 |

+----------+------------+

To sort in reverse order, add the DESC (descending) keyword to the name of the column you

are sorting by:

mysql> SELECT name, birth FROM pet ORDER BY birth DESC;

+----------+------------+

| name | birth |

+----------+------------+

| Puffball | 1999-03-30 |

| Chirpy | 1998-09-11 |

| Whistler | 1997-12-09 |

| Slim | 1996-04-29 |

| Claws | 1994-03-17 |

| Fluffy | 1993-02-04 |

| Fang | 1990-08-27 |

| Bowser | 1989-08-31 |

| Buffy | 1989-05-13 |

+----------+------------+

You can sort on multiple columns. For example, to sort by type of animal, then by birth date

within animal type with youngest animals first, use the following query:

mysql> SELECT name, species, birth FROM pet ORDER BY species, birth DESC;

+----------+---------+------------+

| name | species | birth |

+----------+---------+------------+

| Chirpy | bird | 1998-09-11 |

| Whistler | bird | 1997-12-09 |

| Claws | cat | 1994-03-17 |

| Fluffy | cat | 1993-02-04 |

| Fang | dog | 1990-08-27 |

| Bowser | dog | 1989-08-31 |

| Buffy | dog | 1989-05-13 |

| Puffball | hamster | 1999-03-30 |

| Slim | snake | 1996-04-29 |

+----------+---------+------------+

Note that the DESC keyword applies only to the column name immediately preceding it

(birth); species values are still sorted in ascending order.
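The effect of a per-column DESC can be reproduced outside MySQL. The sketch below assumes SQLite via Python's sqlite3 module and, for comparison, performs the same two-key sort in plain Python by sorting on the minor key first and relying on a stable sort for the major key.

```python
import sqlite3

# (name, species, birth) from the corrected pet table.
PETS = [
    ("Fluffy", "cat", "1993-02-04"),
    ("Claws", "cat", "1994-03-17"),
    ("Buffy", "dog", "1989-05-13"),
    ("Fang", "dog", "1990-08-27"),
    ("Bowser", "dog", "1989-08-31"),
    ("Chirpy", "bird", "1998-09-11"),
    ("Whistler", "bird", "1997-12-09"),
    ("Slim", "snake", "1996-04-29"),
    ("Puffball", "hamster", "1999-03-30"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name, species, birth)")
conn.executemany("insert into pet values (?, ?, ?)", PETS)

# DESC applies to birth only; species stays ascending.
sql_order = [row[0] for row in conn.execute(
    "select name from pet order by species, birth desc"
)]

# Same ordering in Python: youngest first, then a stable sort by species.
rows = sorted(PETS, key=lambda pet: pet[2], reverse=True)
rows.sort(key=lambda pet: pet[1])
python_order = [pet[0] for pet in rows]

print(sql_order)
print(python_order)
```

Dates in YYYY-MM-DD form sort correctly as text, which is what both versions rely on here.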

9.5.4.5 Date Calculations

MySQL provides several functions that you can use to perform calculations on dates, for

example, to calculate ages or extract parts of dates.

To determine how many years old each of your pets is, compute age as the difference

between the birth date and the current date. Do this by converting the two dates to days, taking

the difference, and dividing by 365 (the approximate number of days in a year):

mysql> SELECT name, (TO_DAYS(NOW())-TO_DAYS(birth))/365 FROM pet;

+----------+-------------------------------------+

| name | (TO_DAYS(NOW())-TO_DAYS(birth))/365 |

+----------+-------------------------------------+

| Fluffy | 6.15 |

| Claws | 5.04 |

| Buffy | 9.88 |

| Fang | 8.59 |

| Bowser | 9.58 |

| Chirpy | 0.55 |

| Whistler | 1.30 |

| Slim | 2.92 |

| Puffball | 0.00 |

+----------+-------------------------------------+

Although the query works, there are some things about it that could be improved. First, the

result could be scanned more easily if the rows were presented in some order. Second, the

heading for the age column isn't very meaningful.

The first problem can be handled by adding an ORDER BY name clause to sort the output by

name. To deal with the column heading, provide a name for the column so that a different

label appears in the output (this is called a column alias):

mysql> SELECT name, (TO_DAYS(NOW())-TO_DAYS(birth))/365 AS age

-> FROM pet ORDER BY name;

+----------+------+

| name | age |

+----------+------+

| Bowser | 9.58 |

| Buffy | 9.88 |

| Chirpy | 0.55 |

| Claws | 5.04 |

| Fang | 8.59 |

| Fluffy | 6.15 |

| Puffball | 0.00 |

| Slim | 2.92 |

| Whistler | 1.30 |

+----------+------+

To sort the output by age rather than name, just use a different ORDER BY clause:

mysql> SELECT name, (TO_DAYS(NOW())-TO_DAYS(birth))/365 AS age

-> FROM pet ORDER BY age;

+----------+------+

| name | age |

+----------+------+

| Puffball | 0.00 |

| Chirpy | 0.55 |

| Whistler | 1.30 |

| Slim | 2.92 |

| Claws | 5.04 |

| Fluffy | 6.15 |

| Fang | 8.59 |

| Bowser | 9.58 |

| Buffy | 9.88 |

+----------+------+

A similar query can be used to determine age at death for animals that have died. You

determine which animals these are by checking whether or not the death value is NULL.

Then, for those with non-NULL values, compute the difference between the death and birth

values:

mysql> SELECT name, birth, death, (TO_DAYS(death)-TO_DAYS(birth))/365 AS age

-> FROM pet WHERE death IS NOT NULL ORDER BY age;

+--------+------------+------------+------+

| name | birth | death | age |

+--------+------------+------------+------+

| Bowser | 1989-08-31 | 1995-07-29 | 5.91 |

+--------+------------+------------+------+

The query uses death IS NOT NULL rather than death != NULL because NULL is a special

value. This is explained later. See section 9.5.4.6 Working with NULL Values.
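The arithmetic behind the age column can be checked by hand. The following sketch uses Python's datetime module (not MySQL) to redo the day-difference calculation and to compare it with a count of completed years; approx_age and exact_age are names invented for this illustration.

```python
from datetime import date


def approx_age(birth, on):
    """Age as in the queries above: difference in days divided by 365."""
    return (on - birth).days / 365


def exact_age(birth, on):
    """Completed years: one less if the birthday has not yet come round."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


bowser_birth = date(1989, 8, 31)
bowser_death = date(1995, 7, 29)

print(round(approx_age(bowser_birth, bowser_death), 2))  # 5.91, as in the table
print(exact_age(bowser_birth, bowser_death))             # 5
```

Because leap years make the true year slightly longer than 365 days, the divided value can creep past a whole number just before a birthday, so counting completed years is the safer choice when an integer age is needed.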

What if you want to know which animals have birthdays next month? For this type of

calculation, year and day are irrelevant; you simply want to extract the month part of the birth

column. MySQL provides several date-part extraction functions, such as YEAR(),

MONTH(), and DAYOFMONTH(). MONTH() is the appropriate function here. To see how

it works, run a simple query that displays the value of both birth and MONTH(birth):

mysql> SELECT name, birth, MONTH(birth) FROM pet;

+----------+------------+--------------+

| name | birth | MONTH(birth) |

+----------+------------+--------------+

| Fluffy | 1993-02-04 | 2 |

| Claws | 1994-03-17 | 3 |

| Buffy | 1989-05-13 | 5 |

| Fang | 1990-08-27 | 8 |

| Bowser | 1989-08-31 | 8 |

| Chirpy | 1998-09-11 | 9 |

| Whistler | 1997-12-09 | 12 |

| Slim | 1996-04-29 | 4 |

| Puffball | 1999-03-30 | 3 |

+----------+------------+--------------+

Finding animals with birthdays in the upcoming month is easy, too. Suppose the current

month is April. Then the month value is 4 and you look for animals born in May (month 5)

like this:

mysql> SELECT name, birth FROM pet WHERE MONTH(birth) = 5;

+-------+------------+

| name | birth |

+-------+------------+

| Buffy | 1989-05-13 |

+-------+------------+

There is a small complication if the current month is December, of course. You don't just add

one to the month number (12) and look for animals born in month 13, because there is no

such month. Instead, you look for animals born in January (month 1).

You can even write the query so that it works no matter what the current month is. That way

you don't have to use a particular month number in the query. DATE_ADD() allows you to

add a time interval to a given date. If you add a month to the value of NOW(), then extract

the month part with MONTH(), the result produces the month in which to look for birthdays:

mysql> SELECT name, birth FROM pet

-> WHERE MONTH(birth) = MONTH(DATE_ADD(NOW(), INTERVAL 1 MONTH));

A different way to accomplish the same task is to add 1 to get the next month after the current

one (after using the modulo function (MOD) to wrap around the month value to 0 if it is

currently 12):

mysql> SELECT name, birth FROM pet

-> WHERE MONTH(birth) = MOD(MONTH(NOW()), 12) + 1;

Note that MONTH returns a number between 1 and 12, and MOD(something,12) returns a

number between 0 and 11. The addition therefore has to come after the MOD(); otherwise we would go

from November (11) to January (1).
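The order of the two operations is easy to verify with plain integers. This sketch uses Python's % operator in place of MOD(); both function names are invented for the illustration.

```python
def next_month(month):
    """MOD(month, 12) + 1: wrap first, then add one."""
    return month % 12 + 1


def wrong_next_month(month):
    """MOD(month + 1, 12): adding before wrapping turns December into 0."""
    return (month + 1) % 12


for month in (10, 11, 12):
    print(month, next_month(month), wrong_next_month(month))
```

For November the first form gives 12 and the second gives 0, a month number that matches no birthday.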

9.5.4.6 Working with NULL Values

The NULL value can be surprising until you get used to it. Conceptually, NULL means

missing value or unknown value and it is treated somewhat differently than other values. To

test for NULL, you cannot use the arithmetic comparison operators such as =, <, or !=. To

demonstrate this for yourself, try the following query:

mysql> SELECT 1 = NULL, 1 != NULL, 1 < NULL, 1 > NULL;

+----------+-----------+----------+----------+

| 1 = NULL | 1 != NULL | 1 < NULL | 1 > NULL |

+----------+-----------+----------+----------+

| NULL | NULL | NULL | NULL |

+----------+-----------+----------+----------+

Clearly you get no meaningful results from these comparisons. Use the IS NULL and IS NOT

NULL operators instead:

mysql> SELECT 1 IS NULL, 1 IS NOT NULL;

+-----------+---------------+

| 1 IS NULL | 1 IS NOT NULL |

+-----------+---------------+

| 0 | 1 |

+-----------+---------------+

In MySQL, 0 means false and 1 means true.

This special treatment of NULL is why, in the previous section, it was necessary to determine

which animals are no longer alive using death IS NOT NULL instead of death != NULL.
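The same behaviour can be observed in other SQL engines. This sketch assumes SQLite via Python's sqlite3 module, where SQL NULL is returned as None; the two-row pet table is a cut-down version used only for this test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary comparisons with NULL yield NULL, never true or false.
comparisons = conn.execute(
    "select 1 = NULL, 1 != NULL, 1 < NULL, 1 > NULL"
).fetchone()

# IS NULL and IS NOT NULL yield a definite 0 or 1.
null_tests = conn.execute("select 1 is null, 1 is not null").fetchone()

conn.execute("create table pet (name, death)")
conn.executemany(
    "insert into pet values (?, ?)",
    [("Fluffy", None), ("Bowser", "1995-07-29")],
)

# death != NULL is never true, so no row qualifies.
with_not_equal = conn.execute(
    "select name from pet where death != NULL"
).fetchall()
with_is_not_null = conn.execute(
    "select name from pet where death is not null"
).fetchall()

print(comparisons, null_tests)
print(with_not_equal, with_is_not_null)
```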

9.5.4.7 Pattern Matching

MySQL provides standard SQL pattern matching as well as a form of pattern matching based

on extended regular expressions similar to those used by Unix utilities such as vi, grep, and

sed.

SQL pattern matching allows you to use `_' to match any single character and `%' to match an

arbitrary number of characters (including zero characters). In MySQL, SQL patterns are case

insensitive by default. Some examples are shown below. Note that you do not use = or !=

when you use SQL patterns; use the LIKE or NOT LIKE comparison operators instead.

To find names beginning with `b':

mysql> SELECT * FROM pet WHERE name LIKE "b%";

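+--------+--------+---------+------+------------+------------+

| name | owner | species | sex | birth | death |

+--------+--------+---------+------+------------+------------+

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |

+--------+--------+---------+------+------------+------------+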

To find names ending with `fy':

mysql> SELECT * FROM pet WHERE name LIKE "%fy";

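+--------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+--------+--------+---------+------+------------+-------+

| Fluffy | Harold | cat | f | 1993-02-04 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+--------+--------+---------+------+------------+-------+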

To find names containing a `w':

mysql> SELECT * FROM pet WHERE name LIKE "%w%";

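+----------+-------+---------+------+------------+------------+

| name | owner | species | sex | birth | death |

+----------+-------+---------+------+------------+------------+

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |

| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |

+----------+-------+---------+------+------------+------------+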

To find names containing exactly five characters, use the `_' pattern character:

mysql> SELECT * FROM pet WHERE name LIKE "_____";

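+-------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+-------+--------+---------+------+------------+-------+

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+-------+--------+---------+------+------------+-------+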

The other type of pattern matching provided by MySQL uses extended regular expressions.

When you test for a match for this type of pattern, use the REGEXP and NOT REGEXP

operators (or RLIKE and NOT RLIKE, which are synonyms).

Some characteristics of extended regular expressions are:

• `.' matches any single character.

• A character class `[...]' matches any character within the brackets. For example, `[abc]'

matches `a', `b', or `c'. To name a range of characters, use a dash. `[a-z]' matches any

lowercase letter, whereas `[0-9]' matches any digit.

• `*' matches zero or more instances of the thing preceding it. For example, `x*'

matches any number of `x' characters, `[0-9]*' matches any number of digits, and `.*'

matches any number of anything.

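• Regular expressions are case sensitive in the MySQL version described here (newer versions match nonbinary strings case-insensitively by default), but you can use a character class to match both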

lettercases if you wish. For example, `[aA]' matches lowercase or uppercase `a' and

`[a-zA-Z]' matches any letter in either case.

• The pattern matches if it occurs anywhere in the value being tested. (SQL patterns

match only if they match the entire value.)

• To anchor a pattern so that it must match the beginning or end of the value being

tested, use `^' at the beginning or `$' at the end of the pattern.

To demonstrate how extended regular expressions work, the LIKE queries shown above are

rewritten below to use REGEXP.

To find names beginning with `b', use `^' to match the beginning of the name and `[bB]' to

match either lowercase or uppercase `b':

mysql> SELECT * FROM pet WHERE name REGEXP "^[bB]";

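+--------+--------+---------+------+------------+------------+

| name | owner | species | sex | birth | death |

+--------+--------+---------+------+------------+------------+

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |

+--------+--------+---------+------+------------+------------+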

To find names ending with `fy', use `$' to match the end of the name:

mysql> SELECT * FROM pet WHERE name REGEXP "fy$";

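+--------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+--------+--------+---------+------+------------+-------+

| Fluffy | Harold | cat | f | 1993-02-04 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+--------+--------+---------+------+------------+-------+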

To find names containing a `w', use `[wW]' to match either lowercase or uppercase `w':

mysql> SELECT * FROM pet WHERE name REGEXP "[wW]";

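+----------+-------+---------+------+------------+------------+

| name | owner | species | sex | birth | death |

+----------+-------+---------+------+------------+------------+

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |

| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |

+----------+-------+---------+------+------------+------------+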

Because a regular expression pattern matches if it occurs anywhere in the value, it is not

necessary in the previous query to put a wildcard on either side of the pattern to get it to

match the entire value, as it would be if you used a SQL pattern.

To find names containing exactly five characters, use `^' and `$' to match the beginning and

end of the name, and five instances of `.' in between:

mysql> SELECT * FROM pet WHERE name REGEXP "^.....$";

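+-------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+-------+--------+---------+------+------------+-------+

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+-------+--------+---------+------+------------+-------+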

You could also write the previous query using the `{n}' ``repeat-n-times'' operator:

mysql> SELECT * FROM pet WHERE name REGEXP "^.{5}$";

+-------+--------+---------+------+------------+-------+

| name | owner | species | sex | birth | death |

+-------+--------+---------+------+------------+-------+

| Claws | Gwen | cat | m | 1994-03-17 | NULL |

| Buffy | Harold | dog | f | 1989-05-13 | NULL |

+-------+--------+---------+------+------------+-------+
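The contrast between the two matching styles (whole value for LIKE, anywhere for REGEXP) can be sketched with Python's re module. This is an emulation under stated assumptions, not MySQL's implementation: the functions like and regexp are invented here, and Python's regular expression syntax is used, which agrees with the simple patterns in this section.

```python
import re

NAMES = ["Fluffy", "Claws", "Buffy", "Fang", "Bowser",
         "Chirpy", "Whistler", "Slim", "Puffball"]


def like(pattern, value):
    """Emulate SQL LIKE: '_' is one character, '%' any run of characters.

    The whole value must match, and letter case is ignored.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def regexp(pattern, value):
    """Emulate REGEXP: the pattern may match anywhere in the value."""
    return re.search(pattern, value) is not None


starts_with_b = [n for n in NAMES if like("b%", n)]
ends_with_fy = [n for n in NAMES if regexp("fy$", n)]
contains_w = [n for n in NAMES if regexp("[wW]", n)]
five_chars_like = [n for n in NAMES if like("_____", n)]
five_chars_regexp = [n for n in NAMES if regexp("^.{5}$", n)]

print(starts_with_b, ends_with_fy, contains_w)
print(five_chars_like, five_chars_regexp)
```

Without the `^' and `$' anchors, the pattern ".{5}" would also accept every name longer than five characters, since a match anywhere in the value is enough.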

9.5.4.8 Counting Rows

Databases are often used to answer the question, ``How often does a certain type of data

occur in a table?'' For example, you might want to know how many pets you have, or how

many pets each owner has, or you might want to perform various kinds of censuses on your

animals.

Counting the total number of animals you have is the same question as ``How many rows are

in the pet table?'' because there is one record per pet. The COUNT() function counts the

number of non-NULL results, so the query to count your animals looks like this:

mysql> SELECT COUNT(*) FROM pet;

+----------+
| COUNT(*) |
+----------+
|        9 |
+----------+

Earlier, you retrieved the names of the people who owned pets. You can use COUNT() if you

want to find out how many pets each owner has:

mysql> SELECT owner, COUNT(*) FROM pet GROUP BY owner;

+--------+----------+
| owner  | COUNT(*) |
+--------+----------+
| Benny  |        2 |
| Diane  |        2 |
| Gwen   |        3 |
| Harold |        2 |
+--------+----------+

Note the use of GROUP BY to group together all records for each owner. Without it, all you

get is an error message:

mysql> SELECT owner, COUNT(owner) FROM pet;

ERROR 1140 at line 1: Mixing of GROUP columns (MIN(),MAX(),COUNT()...)
with no GROUP columns is illegal if there is no GROUP BY clause

COUNT() and GROUP BY are useful for characterizing your data in various ways. The

following examples show different ways to perform animal census operations.

Number of animals per species:

mysql> SELECT species, COUNT(*) FROM pet GROUP BY species;

+---------+----------+
| species | COUNT(*) |
+---------+----------+
| bird    |        2 |
| cat     |        2 |
| dog     |        3 |
| hamster |        1 |
| snake   |        1 |
+---------+----------+

Number of animals per sex:

mysql> SELECT sex, COUNT(*) FROM pet GROUP BY sex;

+------+----------+
| sex  | COUNT(*) |
+------+----------+
| NULL |        1 |
| f    |        4 |
| m    |        4 |
+------+----------+

(In this output, NULL indicates sex unknown.)
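
The NULL group is a good place to see the difference between COUNT(*), which counts rows,
and COUNT(sex), which counts only non-NULL values of that column. The sketch below is an
illustration, not MySQL output: it assumes Python's built-in sqlite3 module as a stand-in
database, and the sample rows are assumptions chosen to agree with the per-species and
per-sex counts shown above. In particular, the hamster's name and the choice of which bird
has an unknown sex are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not MySQL
conn.execute(
    "CREATE TABLE pet (name VARCHAR(20), species VARCHAR(20), sex CHAR(1))"
)
# Assumed sample rows; None is stored as NULL (sex unknown).
conn.executemany(
    "INSERT INTO pet VALUES (?, ?, ?)",
    [
        ("Fluffy", "cat", "f"),
        ("Claws", "cat", "m"),
        ("Buffy", "dog", "f"),
        ("Fang", "dog", "m"),
        ("Bowser", "dog", "m"),
        ("Chirpy", "bird", "f"),
        ("Whistler", "bird", None),
        ("Slim", "snake", "m"),
        ("Puffball", "hamster", "f"),  # placeholder name
    ],
)

# COUNT(*) counts every row; COUNT(sex) skips rows where sex is NULL.
all_rows, known_sex = conn.execute(
    "SELECT COUNT(*), COUNT(sex) FROM pet"
).fetchone()

# GROUP BY puts all NULL values of sex into one group of their own.
per_sex = conn.execute(
    "SELECT sex, COUNT(*) FROM pet GROUP BY sex ORDER BY sex"
).fetchall()
```

Here all_rows and known_sex differ by exactly the size of the NULL group in per_sex.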

Number of animals per combination of species and sex:

mysql> SELECT species, sex, COUNT(*) FROM pet GROUP BY species, sex;

+---------+------+----------+
| species | sex  | COUNT(*) |
+---------+------+----------+
| bird    | NULL |        1 |
| bird    | f    |        1 |
| cat     | f    |        1 |
| cat     | m    |        1 |
| dog     | f    |        1 |
| dog     | m    |        2 |
| hamster | f    |        1 |
| snake   | m    |        1 |
+---------+------+----------+

You need not retrieve an entire table when you use COUNT(). For example, the previous

query, when performed just on dogs and cats, looks like this:

mysql> SELECT species, sex, COUNT(*) FROM pet

-> WHERE species = "dog" OR species = "cat"

-> GROUP BY species, sex;

+---------+------+----------+
| species | sex  | COUNT(*) |
+---------+------+----------+
| cat     | f    |        1 |
| cat     | m    |        1 |
| dog     | f    |        1 |
| dog     | m    |        2 |
+---------+------+----------+

Or, if you wanted the number of animals per sex only for known-sex animals:

mysql> SELECT species, sex, COUNT(*) FROM pet

-> WHERE sex IS NOT NULL

-> GROUP BY species, sex;

+---------+------+----------+
| species | sex  | COUNT(*) |
+---------+------+----------+
| bird    | f    |        1 |
| cat     | f    |        1 |
| cat     | m    |        1 |
| dog     | f    |        1 |
| dog     | m    |        2 |
| hamster | f    |        1 |
| snake   | m    |        1 |
+---------+------+----------+

9.5.5 Using More Than One Table

The pet table keeps track of which pets you have. If you want to record other information

about them, such as events in their lives like visits to the vet or when litters are born, you

need another table. What should this table look like? It needs:

• The pet name, so you know which animal each event pertains to.

• A date, so you know when the event occurred.

• A field to describe the event.

• An event type field, if you want to be able to categorize events.

Given these considerations, the CREATE TABLE statement for the event table might look

like this:

mysql> CREATE TABLE event (name VARCHAR(20), date DATE,

-> type VARCHAR(15), remark VARCHAR(255));

As with the pet table, it's easiest to load the initial records by creating a tab-delimited text file

containing the information:

Fluffy    1995-05-15  litter    4 kittens, 3 female, 1 male
Buffy     1993-06-23  litter    5 puppies, 2 female, 3 male
Buffy     1994-06-19  litter    3 puppies, 3 female
Chirpy    1999-03-21  vet       needed beak straightened
Slim      1997-08-03  vet       broken rib
Bowser    1991-10-12  kennel
Fang      1991-10-12  kennel
Fang      1998-08-28  birthday  Gave him a new chew toy
Claws     1998-03-17  birthday  Gave him a new flea collar
Whistler  1998-12-09  birthday  First birthday

Load the records like this:

mysql> LOAD DATA LOCAL INFILE "event.txt" INTO TABLE event;
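
To see what loading a tab-delimited file involves, here is a rough Python sketch of the same
idea. It is not how LOAD DATA is implemented: it assumes Python's built-in sqlite3 module as
a stand-in database, writes three of the records above to a temporary event.txt, and pads
records that lack a remark with NULL. That padding is a choice made for this sketch;
MySQL's own handling of missing fields may differ.

```python
import csv
import os
import sqlite3
import tempfile

# Three of the event records from the text, with real tab characters
# between fields. The kennel record has no remark field.
sample = (
    "Fluffy\t1995-05-15\tlitter\t4 kittens, 3 female, 1 male\n"
    "Chirpy\t1999-03-21\tvet\tneeded beak straightened\n"
    "Bowser\t1991-10-12\tkennel\n"
)

conn = sqlite3.connect(":memory:")  # stand-in database, not MySQL
conn.execute(
    "CREATE TABLE event (name VARCHAR(20), date DATE, "
    "type VARCHAR(15), remark VARCHAR(255))"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "event.txt")
    with open(path, "w", newline="") as f:
        f.write(sample)
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Pad short records with None so every row has four values.
        rows = [(fields + [None] * 4)[:4] for fields in reader]

conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", rows)
loaded = conn.execute(
    "SELECT name, type, remark FROM event ORDER BY date"
).fetchall()
```

After this runs, loaded holds one tuple per record, oldest event first, and the kennel
record carries None in place of a remark.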

Based on what you've learned from the queries you've run on the pet table, you should be

able to perform retrievals on the records in the event table; the principles are the same. But

when is the event table by itself insufficient to answer questions you might ask?

Suppose you want to find out the age of each pet when she had her litters. The event table

indicates when this occurred, but to calculate the age of the mother, you need her birth date.

Because that is stored in the pet table, you need both tables for the query:

mysql> SELECT pet.name, (TO_DAYS(date) - TO_DAYS(birth))/365 AS age, remark

-> FROM pet, event

-> WHERE pet.name = event.name AND type = "litter";

+--------+------+-----------------------------+
| name   | age  | remark                      |
+--------+------+-----------------------------+
| Fluffy | 2.27 | 4 kittens, 3 female, 1 male |
| Buffy  | 4.12 | 5 puppies, 2 female, 3 male |
| Buffy  | 5.10 | 3 puppies, 3 female         |
+--------+------+-----------------------------+

There are several things to note about this query:

• The FROM clause lists two tables because the query needs to pull information from

both of them.

• When combining (joining) information from multiple tables, you need to specify how

records in one table can be matched to records in the other. This is easy because they

both have a name column. The query uses a WHERE clause to match up records in the

two tables based on the name values.

• Because the name column occurs in both tables, you must be specific about which

table you mean when referring to the column. This is done by prepending the table

name to the column name.
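
The three points above can be tried out in a small Python sketch. Several things here are
assumptions rather than part of the tutorial: Python's built-in sqlite3 module stands in for
MySQL, SQLite's julianday() replaces TO_DAYS() (which SQLite does not provide), the join
is written with JOIN ... ON instead of a comma and a WHERE condition, and the birth dates
are sample values chosen to be consistent with the ages shown in the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not MySQL
conn.execute("CREATE TABLE pet (name VARCHAR(20), birth DATE)")
conn.execute(
    "CREATE TABLE event (name VARCHAR(20), date DATE, "
    "type VARCHAR(15), remark VARCHAR(255))"
)
# Assumed birth dates, consistent with the ages in the text.
conn.executemany(
    "INSERT INTO pet VALUES (?, ?)",
    [("Fluffy", "1993-02-04"), ("Buffy", "1989-05-13"), ("Slim", "1996-04-29")],
)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    [
        ("Fluffy", "1995-05-15", "litter", "4 kittens, 3 female, 1 male"),
        ("Buffy", "1993-06-23", "litter", "5 puppies, 2 female, 3 male"),
        ("Buffy", "1994-06-19", "litter", "3 puppies, 3 female"),
        ("Slim", "1997-08-03", "vet", "broken rib"),
    ],
)

# Both tables have a name column, so every column reference is
# qualified with its table name. The ON condition matches each event
# to the pet it belongs to.
query = """
SELECT pet.name,
       ROUND((julianday(event.date) - julianday(pet.birth)) / 365, 2) AS age,
       event.remark
FROM pet JOIN event ON pet.name = event.name
WHERE event.type = 'litter'
ORDER BY event.date
"""
litters = conn.execute(query).fetchall()
```

Each tuple in litters combines a value that exists only in pet (the birth date, through the
computed age) with values that exist only in event (the remark), which is the point of
the join.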

You need not have two different tables to perform a join. Sometimes it is useful to join a

table to itself, if you want to compare records in a table to other records in that same table.

For example, to find breeding pairs among your pets, you can join the pet table with itself to

pair up males and females of like species:

mysql> SELECT p1.name, p1.sex, p2.name, p2.sex, p1.species

-> FROM pet AS p1, pet AS p2

-> WHERE p1.species = p2.species AND p1.sex = "f" AND p2.sex = "m";

+--------+------+--------+------+---------+
| name   | sex  | name   | sex  | species |
+--------+------+--------+------+---------+
| Fluffy | f    | Claws  | m    | cat     |
| Buffy  | f    | Fang   | m    | dog     |
| Buffy  | f    | Bowser | m    | dog     |
+--------+------+--------+------+---------+

In this query, we specify aliases for the table name in order to refer to the columns and keep

straight which instance of the table each column reference is associated with.
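
A self-join can be sketched in Python as well. As before, this is an illustration under
assumptions: Python's built-in sqlite3 module stands in for MySQL, the join is written
with JOIN ... ON, and the table holds only the name, species and sex of a few pets from
this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not MySQL
conn.execute(
    "CREATE TABLE pet (name VARCHAR(20), species VARCHAR(20), sex CHAR(1))"
)
conn.executemany(
    "INSERT INTO pet VALUES (?, ?, ?)",
    [
        ("Fluffy", "cat", "f"),
        ("Claws", "cat", "m"),
        ("Buffy", "dog", "f"),
        ("Fang", "dog", "m"),
        ("Bowser", "dog", "m"),
        ("Slim", "snake", "m"),
    ],
)

# p1 and p2 are two aliases for the same table: p1 supplies the
# females, p2 the males, and the ON condition keeps only pairs of
# the same species.
query = """
SELECT p1.name, p2.name, p1.species
FROM pet AS p1 JOIN pet AS p2 ON p1.species = p2.species
WHERE p1.sex = 'f' AND p2.sex = 'm'
ORDER BY p1.name, p2.name
"""
pairs = conn.execute(query).fetchall()
```

Slim does not appear in pairs because the sample data contains no female snake to pair
him with.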

9.6 Getting Information About Databases and Tables

What if you forget the name of a database or table, or what the structure of a given table is

(for example, what its columns are called)? MySQL addresses this problem through several

statements that provide information about the databases and tables it supports.

You have already seen SHOW DATABASES, which lists the databases managed by the

server. To find out which database is currently selected, use the DATABASE() function:

mysql> SELECT DATABASE();

+------------+
| DATABASE() |
+------------+
| menagerie  |
+------------+

If you haven't selected any database yet, the result is blank.

To find out what tables the current database contains (for example, when you're not sure

about the name of a table), use this command:

mysql> SHOW TABLES;

+---------------------+
| Tables in menagerie |
+---------------------+
| event               |
| pet                 |
+---------------------+

If you want to find out about the structure of a table, the DESCRIBE command is useful; it

displays information about each of a table's columns:

mysql> DESCRIBE pet;

+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name    | varchar(20) | YES  |     | NULL    |       |
| owner   | varchar(20) | YES  |     | NULL    |       |
| species | varchar(20) | YES  |     | NULL    |       |
| sex     | char(1)     | YES  |     | NULL    |       |
| birth   | date        | YES  |     | NULL    |       |
| death   | date        | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+

Field indicates the column name, Type is the data type for the column, Null indicates whether

or not the column can contain NULL values, Key indicates whether or not the column is

indexed, and Default specifies the column's default value.

If you have indexes on a table, SHOW INDEX FROM tbl_name produces information about

them.
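
Other database systems expose the same kind of information through different commands. As a
rough comparison only, the sketch below assumes Python's built-in sqlite3 module instead of
MySQL: SQLite has no SHOW TABLES or DESCRIBE, so the sqlite_master table and
PRAGMA table_info play those roles here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite, used only for comparison
conn.execute(
    "CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20), "
    "species VARCHAR(20), sex CHAR(1), birth DATE, death DATE)"
)
conn.execute(
    "CREATE TABLE event (name VARCHAR(20), date DATE, "
    "type VARCHAR(15), remark VARCHAR(255))"
)

# Rough counterpart of SHOW TABLES: list the tables in the database.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Rough counterpart of DESCRIBE pet: one row per column, where
# row[1] is the column name and row[2] is the declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(pet)")]
```

The names and declared types in columns correspond to the Field and Type columns of the
DESCRIBE output above.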
