Definition of Database
Data are facts that can be recorded and have implicit meaning. Data refers to values
such as names, telephone numbers and addresses that can easily be stored in a diary, on a PC
or on a floppy disk. Data is what is actually stored in the database, whereas information
refers to the meaning of that data as understood by the user.
A database is a collection of related data. A database has the following implicit
properties.
i. A database represents some aspect of the real world, sometimes called the miniworld or
the Universe of Discourse (UoD). Changes to the miniworld are reflected in the
database.
ii. A database is a logically coherent collection of data with some inherent meaning.
iii. A database is designed, built and populated with data for a specific purpose. It has an
intended group of users and some applications.
A database can be of any size. Examples of sources of databases are patient records in a
hospital, a bank, a university, a government department etc.
Definition of DBMS
DBMS stands for Database Management System. It is a collection of programs that enables
users to create and maintain a database, and to store, modify and extract
information from the database. A DBMS is software for defining, constructing and
manipulating databases. It is also called a database manager or database server. Examples of
DBMSs are MS Access, Oracle, MySQL and MS SQL Server.
Thus the goal of a DBMS is to provide an environment that is both convenient and
efficient to use in retrieving and storing database information. In a DBMS, the user issues a
request for information, the DBMS analyzes the request and processes it internally, and the
result is then sent back to the user.
Definition of Database System
A database system is a computerized record keeping system, e.g. a computerized library
system, a flight reservation system or an automated teller machine. The database and the DBMS
software are collectively known as the database system.
The following operations take place in the database system.
i. Adding new / empty files to database.
ii. Inserting, retrieving, updating, deleting data from existing database.
iii. Removing existing files from database.
Fig. Simplified picture of a database system (end users, application programs, DBMS, database)
Fig. A simplified database system environment (users/programmers; application
programs/queries; DBMS software to process queries/programs and software to access stored
data; stored database definition; stored database)
The advantages of a database system over paper based methods of record keeping are (i)
compactness, (ii) speed and (iii) accuracy.
Characteristics of Database Approach
There are a number of characteristics that distinguish the database approach from the
traditional approach of programming with files. In the traditional approach of programming
with files, many users may each keep their own separate copy of the same data, such as student
names. Thus data is duplicated, which leads to wastage of storage space.
The main characteristics of the database approach versus the file processing approach are as
follows.
i) Self describing nature of a database system
The definition or description of the database is stored in the system catalog, separately
from the data itself, and is thus available to users.
Fig. The DBMS with the stored database definition and the stored database
The system catalog stores only the structure and details of the database, and no other data.
Thus the system catalog inside the DBMS describes the database itself.
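The self-describing nature can be seen in a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as the DBMS, which keeps its catalog in a table named sqlite_master. The student table and the helper function names are illustrative only.

```python
import sqlite3

# In-memory database; the table and its columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, class TEXT, major TEXT)")


def catalog_tables(conn):
    """Return the names of the tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def catalog_columns(conn, table):
    """Return the column names the catalog records for one table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(catalog_tables(conn))
print(catalog_columns(conn, "student"))
```

The program never hard-codes the structure of student when reading it back. It asks the catalog, which is what a DBMS itself does before accessing the data.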
ii) Insulation between programs and data, and data abstraction
In traditional file processing, a change to the structure of a data file may require
changing all programs that access this file, whereas with a DBMS only the catalog information
needs to change. Thus programs and data are independent of each other. This is called
program-data independence.
Data abstraction: A DBMS provides users with a conceptual representation of data that
does not include many of the details of how the data is stored or how the operations are
implemented. Consider the example of a car. People don't think of a car as a set of tens of
thousands of individual parts. They think of it as a well defined object with its own behavior.
Similarly, data abstraction hides complexity. A data model is a type of data abstraction.
iii) Support of multiple views of data
A database typically has many users, each of whom may require a different perspective
or view of the database. A view may be a portion or subset of the database. It is also called
a virtual table, as it may contain virtual data that is derived from the database but not
explicitly stored. For security purposes, users should not be given access to the whole
database. Some users may not be aware of whether the data they refer to is
stored or derived. A multi-user DBMS supports multiple views of the data.
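A minimal sketch of a view, assuming SQLite through Python's sqlite3 module. The employee table, the employee_public view and the sample rows are hypothetical. SQLite has no user accounts, so the sketch only shows how a view exposes a subset of the data; granting a user access to the view alone is left to a multi-user DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Ram', 5000);
    INSERT INTO employee VALUES (2, 'Shyam', 3000);
    -- The view is a virtual table: it stores no rows of its own
    -- and hides the confidential salary column.
    CREATE VIEW employee_public AS SELECT emp_no, name FROM employee;
""")


def public_rows(conn):
    """Read through the view, as a restricted user would."""
    return conn.execute(
        "SELECT * FROM employee_public ORDER BY emp_no"
    ).fetchall()


print(public_rows(conn))
```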
iv) Sharing of data and multi-user transaction processing
Many users can select and update data at the same time, so the DBMS must support
concurrency control. For example, in applications such as train/bus reservation systems and
flight reservation systems, many users use the system from different locations at the same
time. This requires sharing of data and multi-user transaction processing.
Advantages and benefits using DBMS
Advantages of using DBMS are as follows
i) Controlling redundancy
In a traditional file processing system, each user maintains their own files, so there
may be duplication of data. Storing the same data multiple times leads to several problems,
such as wastage of space, duplicated effort for entering data, and data that may become
inconsistent. With a DBMS, each data item is ideally stored in only one place.
ii) Restricting unauthorized access (security)
Confidential data should not be available to all users. User accounts with certain
restrictions on data access may be created for security. Similarly, multiple views can be
created for database security. In traditional file processing, anyone who gets access to a
file gets all the data in it.
iii) Providing persistent storage for program objects and data structure
The values of program variables are discarded once the program terminates, as in a C,
C++ or Pascal program, unless the programmer explicitly writes them to files. A complex
object in C++ can be stored permanently in an object oriented DBMS.
iv) Permitting inferencing and actions using rules
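A database system may be deductive or active. Deductive databases have capabilities for
defining deduction rules for inferring new information from the stored database. In this
respect a deductive database works like a reporting system. Active databases provide rules
that automatically initiate actions when certain events occur.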
v) Providing multiple user interfaces
A DBMS provides a variety of interfaces for different kinds of users: query languages
for casual users, programming language interfaces for application programmers, forms and
command codes for parametric users, and menu driven interfaces for stand alone users. Forms
style and menu driven interfaces are collectively called GUIs (graphical user interfaces).
vi) Representing complex relationship among data
Using a DBMS, relationships may be defined among data. This helps in managing the
data and in defining constraints for updating and deleting related data.
vii) Enforcing integrity constraints
A rule that limits the values that data may take is called a constraint. For example, the
rule that the balance of a bank account must not fall below a minimum balance is a
constraint. Some of the constraints are PRIMARY KEY, NOT NULL and CHECK.
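The three constraints named above can be sketched with SQLite through Python's sqlite3 module. The account table, the minimum balance of 500 and the helper try_insert are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,                       -- no duplicate account numbers
        owner      TEXT NOT NULL,                          -- every account needs an owner
        balance    INTEGER NOT NULL CHECK (balance >= 500) -- assumed minimum balance
    )
""")


def try_insert(conn, account_no, owner, balance):
    """Insert one row; return False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO account VALUES (?, ?, ?)",
                (account_no, owner, balance),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(conn, "A-1", "Ram", 500))   # satisfies all constraints
print(try_insert(conn, "A-1", "Sita", 900))  # violates PRIMARY KEY
print(try_insert(conn, "A-2", None, 900))    # violates NOT NULL
print(try_insert(conn, "A-3", "Hari", 100))  # violates CHECK
```

The point of the sketch is that the rules live in the database definition, so every program that inserts data is checked in the same way.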
viii) Providing backup and recovery
DBMS provides facilities for taking backup of the database which can be used for
recovery in case of failure of computer system or hardware system.
ix) Easy in accessing data
Accessing data from a database through a DBMS is easy and fast.
Reports can be used for easy access to data.
x) Concurrent access to database
Many users can share the data at the same time, and thus the DBMS allows users to access
the database concurrently.
Database system concepts and architecture
Data Model
A data model is a collection of tools for describing data, data relationships and consistency
constraints. It is used to describe the structure of a database and the basic operations for
specifying retrievals and updates on the database.
A data model is a type of data abstraction. The three levels of data abstraction are as follows.
i) Physical level: It is also called the internal or low level data model. It describes how
data is actually stored in the database.
ii) Logical level: The next higher level is the logical level, which describes what data are
stored and the relationships among those data.
iii) View level: It is the highest level and describes multiple views of the data.
Many data models have been proposed.
Categories of data model:
1) Object based logical models
2) Record based logical models
3) Physical models
1. Object based logical models
Object based logical models are used in describing data at the logical and view levels.
There are many different models. Some of them are:
i) Entity-relationship model
ii) Object-oriented model
iii) Semantic data model
iv) Functional data model
i) Entity-relationship model:
The entity relationship (ER) data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of relationships among these
objects.
Fig. A sample E-R diagram (entity set Customer with attributes Customer Name and Address;
relationship Depositor)
ii) Object oriented model
The object oriented model is based on a collection of objects. An object contains values
stored in instance variables within the object. An object also contains bodies of code that
operate on the object. These bodies of code are called methods. Objects that contain the same
types of values and the same methods are grouped together into classes.
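The terms above map directly onto a class in an object oriented programming language. The following is a minimal sketch in Python. The Account class, its instance variables and its deposit method are hypothetical names chosen for illustration, and the sketch shows the object concepts only, not an object oriented DBMS.

```python
class Account:
    """A class groups objects with the same instance variables and methods."""

    def __init__(self, account_no, balance):
        # Values are stored in instance variables within the object.
        self.account_no = account_no
        self.balance = balance

    def deposit(self, amount):
        """A method: a body of code that operates on the object."""
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount
        return self.balance


a1 = Account("A-1", 500)
print(a1.deposit(200))
```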
iii) Semantic data models
Semantic data modeling is similar to E-R modeling. It is also called object modeling. It
also supports entities, which have properties and relationships.
iv) Functional data model
It is based on functions instead of relations. The functional approach shares certain
ideas with the object approach. It addresses objects that are functionally related to each other.
2. Record based logical models
Record based logical models are used in describing data at the logical and view levels.
They are used to specify the overall logical structure of the database.
Record based models are so named because the database is structured in fixed format
records of several types. Each record type defines a fixed number of fields, or attributes, and
each field is usually of a fixed length. The three most widely accepted record based data
models are the relational, network and hierarchical models.
i) Relational model
The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a
unique name.
Customer Name   Address    Account No.
Ram             KTM        A-1
Laxman          Lalitpur   A-2
Bharat          Jhapa      A-3

Account No.   Balance
A-1           500
A-2           700
A-3           900
Fig. A sample relational database
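The sample relational database in the figure can be sketched with SQLite through Python's sqlite3 module. The data values come from the figure; the lowercase table and column names and the helper balance_of are assumptions. The relationship between a customer and an account is carried by the shared account number, not by a pointer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE customer (
        customer_name TEXT,
        address       TEXT,
        account_no    TEXT REFERENCES account(account_no)
    );
    INSERT INTO account VALUES ('A-1', 500);
    INSERT INTO account VALUES ('A-2', 700);
    INSERT INTO account VALUES ('A-3', 900);
    INSERT INTO customer VALUES ('Ram', 'KTM', 'A-1');
    INSERT INTO customer VALUES ('Laxman', 'Lalitpur', 'A-2');
    INSERT INTO customer VALUES ('Bharat', 'Jhapa', 'A-3');
""")


def balance_of(conn, customer_name):
    """Join the two tables on the common column account_no."""
    row = conn.execute(
        "SELECT a.balance FROM customer c "
        "JOIN account a ON a.account_no = c.account_no "
        "WHERE c.customer_name = ?",
        (customer_name,),
    ).fetchone()
    return row[0] if row else None


print(balance_of(conn, "Laxman"))
```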
ii) Network model
Data in the network model are represented by collections of records, and relationships
among data are represented by links, which can be viewed as pointers.
Customer records             Account records
Ram      KTM        0001 ---- A-1  500
Laxman   Lalitpur   0002 ---- A-2  900
Bharat   Janakpur   0003 ---- A-3  6000
Fig. A sample network database
Customer (CustomerName, CustomerStreet, CustomerCity) ---- deposit ---- Account (Account No, Balance)
Fig. Data structure diagram for network data model
iii) Hierarchical model
It is similar to the network model in the sense that data and relationships among data
are represented by records and links, respectively. Records are organized as collections of
trees rather than arbitrary graphs.
Fig. A sample hierarchical database (customer records such as Ram KTM, Laxman Jhapa,
Bharat KTM and Sita KTM, linked in trees to account records A-1 700, A-2 500, A-3 900,
A-4 700, A-5 500 and A-6 600)
Customer (CustomerName, CustomerStreet, CustomerCity)
    Account (Account No., Balance)
Fig. Tree structure diagram for hierarchical model
3. Physical models
Physical data models are used to describe data at the lowest level. There are only a few
physical data models in use. Two widely known ones are the unifying model and the frame
memory model.
Schemas and instances
Database=description of database + database itself
The overall design of the database is called database schema. Database schema is
specified during database design and not expected to change frequently.
Student:  Name | Class | Major
Course:   Course Name | Duration | Remarks
Fig. Schema diagram
A database changes over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
It is also called the database state, snapshot or current set of occurrences. When a database
is first defined, it is in the empty state with no data. It is in its initial state when it is
first loaded with data. Thus at any point in time the database has a current state. A change
to the schema itself, such as adding a field to the database, is called schema evolution.
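The difference between the schema and the database state can be sketched with SQLite through Python's sqlite3 module. The student table follows the schema diagram above; the sample row, the email column and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the schema: the database exists but is in the empty state.
conn.execute("CREATE TABLE student (name TEXT, class TEXT, major TEXT)")


def state(conn):
    """The current database state (instance) of the student table."""
    return conn.execute("SELECT * FROM student ORDER BY name").fetchall()


def columns(conn):
    """The current schema of the student table, read from the catalog."""
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]


empty_state = state(conn)

# Loading data changes the state, not the schema.
conn.execute("INSERT INTO student VALUES ('Ram', 'BCA', 'Computing')")
initial_state = state(conn)

# Adding a field changes the schema itself: schema evolution.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

print(empty_state, initial_state, columns(conn))
```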
DBMS Architecture
The main characteristics of the database approach are (i) insulation of programs and data,
(ii) support of multiple user views and (iii) use of a catalog to store the database
description. The three schema architecture was proposed to help achieve and visualise these
characteristics. It is also called the ANSI/SPARC (American National Standards
Institute / Standards Planning and Requirements Committee) architecture.
The goal of the architecture is to separate the user applications from the physical database.
In this architecture, schemas can be defined at the following three levels.
i) The internal level
It has an internal schema which describes the physical structure of the database. It
describes the complete details of data storage and access paths for the database.
ii) The conceptual level
It has a conceptual schema, which describes the structure of the whole database for a
community of users. It hides the details of physical storage structures and concentrates on
describing entities, data types, relationships, user operations and constraints.
iii) The external or view level
It includes a number of external schemas or user views. Each external schema describes
the part of the database that a particular user group is interested in and hides the rest of the
database from that user group.
Fig. Three schema architecture (External level: users and external views; external/conceptual
mapping; Conceptual level: conceptual schema; conceptual/internal mapping; Internal level:
internal schema; stored database)
The three schema architecture is a tool for the user to visualize the schema levels in a
database system. Most DBMSs do not separate the three levels completely. The data actually
exists only at the physical level. Each user group refers only to its own external schema. So
the DBMS must transform a request from a user into a request against the conceptual schema,
and then into a request on the internal schema for processing over the stored database. If the
request is a database retrieval, the data extracted from the stored database must be
reformatted to match the user's external view. The processes of transforming requests and
results between levels are called mappings.
DATA INDEPENDENCE
Data independence is defined as the capacity of a DBMS to change the schema at one
level of a database system without having to change the schema at the next higher level. We
can define two types of data independence.
i) Logical data independence
Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the conceptual
schema to expand the database (by adding a record type or data item) or to reduce the
database (by removing a record type or data item). This results in a change to the E-R diagram,
but the application programs and external schemas are not changed.
ii) Physical data independence
Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal schema may
be needed because some physical files have to be reorganized, for example by creating
additional access structures to improve the performance of retrieval or update. If the same
data as before remains in the database, we should not have to change the conceptual schema.
In a multiple level DBMS, the catalog must be expanded to include information on how to
map requests and data among the various levels. With data independence, when the schema is
changed at some level, the schema at the next higher level remains unchanged. Only the
mappings change.
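Physical data independence can be sketched with SQLite through Python's sqlite3 module. An index is one kind of additional access structure. The account table, the index name idx_account_no and the helper lookup are assumptions for illustration. The same query text is run before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-1", 500), ("A-2", 700), ("A-3", 900)],
)

# The application's query is written once, against the logical structure only.
QUERY = "SELECT balance FROM account WHERE account_no = ?"


def lookup(conn, account_no):
    return conn.execute(QUERY, (account_no,)).fetchone()


before = lookup(conn, "A-2")

# Change at the internal level: add an access structure.
conn.execute("CREATE INDEX idx_account_no ON account (account_no)")

after = lookup(conn, "A-2")
print(before, after)
```

The query and its result are unchanged. Only the way the DBMS may locate the row internally is different.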
DATABASE LANGUAGES
A database system provides mainly two different types of languages: one to specify the
database schema, called data definition language and the other to express database queries
and updates called data manipulation language.
i) Data definition language (DDL):
A database schema is specified by a set of definitions expressed in a special language
called a data definition language. The result of compilation of DDL statements is a set of
tables that is stored in a special file called the data dictionary or data directory. A data
dictionary is a file that contains metadata, that is, data about data. A DBMS has a DDL
compiler, which processes DDL statements. In DBMSs that keep a strict separation between
levels, the DDL is used to specify the conceptual schema only. Similarly,
SDL (storage definition language) is used to specify the internal schema and VDL (view
definition language) is used to specify user views and their mappings to the conceptual
schema.
ii) Data manipulation language (DML)
A data manipulation language is a language that enables users to access or manipulate
data as organized by the appropriate data model.
Data manipulation consists of:
The retrieval of information stored in the database
The insertion of new information into the database
The deletion of information from the database
The modification/update of information stored in the database
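The four kinds of data manipulation can be sketched with SQL statements issued through Python's sqlite3 module. The CREATE TABLE statement is DDL, and the remaining statements are DML. The account table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema.
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER)")

# DML: insertion of new information.
conn.execute("INSERT INTO account VALUES ('A-1', 500)")
conn.execute("INSERT INTO account VALUES ('A-2', 700)")

# DML: modification/update of stored information.
conn.execute("UPDATE account SET balance = balance + 100 WHERE account_no = 'A-1'")

# DML: deletion of information.
conn.execute("DELETE FROM account WHERE account_no = 'A-2'")

# DML: retrieval of stored information.
remaining = conn.execute("SELECT account_no, balance FROM account").fetchall()
print(remaining)
```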
DML is of the following two types.
a) Non-procedural (high level) DMLs
The language requires a user to specify what data are needed without specifying how to
get those data. It is easier to learn and use. Many DBMSs allow it either to be entered
interactively from a terminal or to be embedded in a general purpose programming language.
An example is SQL (Structured Query Language). SQL can retrieve many records in a single
DML statement and hence it is also called a set at a time or set oriented language. It is also
called a high level language.
b) Procedural (low level) DMLs
The language requires a user to specify what data are needed and how to get those data.
It is embedded in a general purpose programming language. This type of DML retrieves
records one by one and processes each record separately using programming language
constructs such as loops, and hence it is also called a record at a time DML.
When a DML is embedded in a general purpose programming language, that
language is called the host language and the DML is called the data sublanguage.
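The contrast between the two styles can be sketched as follows, with Python as the host language and SQL (through the sqlite3 module) as the data sublanguage. The account table and both function names are hypothetical. The second function only imitates the record at a time style: it still uses SQL to fetch the rows, but the selection logic and the loop are written by the programmer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-1", 500), ("A-2", 700), ("A-3", 900)],
)


def total_set_at_a_time(conn):
    """One declarative statement says what is wanted, not how to get it."""
    return conn.execute(
        "SELECT SUM(balance) FROM account WHERE balance >= 700"
    ).fetchone()[0]


def total_record_at_a_time(conn):
    """Fetch records one by one and process each in a host language loop."""
    total = 0
    for _account_no, balance in conn.execute(
        "SELECT account_no, balance FROM account"
    ):
        if balance >= 700:
            total += balance
    return total


print(total_set_at_a_time(conn), total_record_at_a_time(conn))
```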
DBMS Interfaces
DBMS provides the following user-friendly interfaces.
i) Menu based interfaces for browsing
These interfaces present the user with lists of options, called menus, that lead the user
through the formulation of a request. The query is composed step by step by picking options
from a menu that is displayed by the system. Pull down menus are a popular technique
in window based user interfaces.
ii) Forms based interfaces
A forms-based interface displays a form to each user. Users can fill out all of the form
entries to insert new data, or they can fill out only certain entries. Forms are usually
designed and programmed for naïve users as interfaces to canned transactions.
iii) Graphical user interface
A graphical interface (GUI) typically displays a schema to the user in diagrammatic
form. The user can then specify a query by manipulating the diagram. GUIs utilize both
menus and forms.
iv) Natural language interfaces
These interfaces accept requests written in English or some other language and attempt
to understand them. The natural language interface usually has its own schema, which is
similar to database conceptual schema. The natural language interface refers to words in its
schema to interpret the request. If the interpretation is successful, the interface generates a
high level query corresponding to the natural language request and submits it to the DBMS
for processing.
v) Interfaces for parametric users
Parametric users such as bank tellers, often have a small set of operations that they
must perform repeatedly. System analysts and programmers design and implement a special
interface for a known class of naïve users. For example, function keys in a terminal can be
programmed to initiate the various commands. This allows the parametric user to proceed
with a minimal number of keystrokes.
vi) Interfaces for DBA
Most database systems contain privileged commands that can be used only by the
DBA's staff. These include commands for creating accounts, setting system parameters,
granting account authorization, changing a schema, and reorganizing the storage structure of
the database.
Classification of Database Management System
DBMS is classified on the basis of data model, number of users, number of sites, cost
and types of access path.
On the basis of data model, DBMS is classified into
i. Relational data model
ii. Object data model
iii. Hierarchical data model
iv. Network data model
On the basis of number of users, DBMS is classified into
i. Single user system – supports only one user at a time
ii. Multi-user system – supports multiple users concurrently
On the basis of number of sites, DBMS is classified into
i. Centralized – the data is stored at a single computer site.
ii. Distributed – the database and DBMS software are distributed over many sites, connected
by a computer network.
iii. Homogeneous – a distributed DBMS that uses the same DBMS software at multiple sites.
iv. Heterogeneous – a distributed DBMS in which the participating DBMSs are loosely coupled
and have a degree of local autonomy. Many distributed DBMSs use a client server architecture.
On the basis of cost, DBMS is classified into
i. DBMS packages between $10,000 and $100,000
ii. DBMS packages costing more than $100,000
On the basis of types of access path, DBMS is classified into
i. General purpose – designed for general use
ii. Special purpose – designed and built for a specific application such as airline
reservation or telephone directory systems. Such a DBMS cannot be used for other applications
without major change.
Data Dictionary
The data dictionary can be regarded as a system database which contains data about
data, also called metadata. It contains definitions of other objects in the system instead
of raw data. It also stores schemas and mapping details, and various security and integrity
constraints. It is also called the data directory, system catalog, catalog or data
repository.
TABLE
TABNAME   COLCOUNT   ROWCOUNT
DEPT      2          2
EMP       3          3

COLUMN
TABNAME   COLNAME
DEPT      Dept No.
DEPT      Dept Name
EMP       Emp Name
EMP       Emp No.
EMP       Emp Telephone No.
Fig. Catalog for department and employee database
Fig. Human and software interfaces to a data dictionary (human interfaces: application
programmers, end users, database administrators; software and DBMS interfaces: security and
authority subsystem, compilers/precompilers, application programs/report generators,
integrity constraint enforcer, query optimizer)
The data dictionary is accessed by various software modules of the DBMS itself, such as
the DDL/DML compilers, the query optimizer and the constraint enforcer. If the data
dictionary is used only by designers, users and administrators, and not by the DBMS software,
it is called a passive data dictionary. Otherwise it is called an active data dictionary.
E-R MODEL
E-R model stands for entity-relationship model, which is a popular high level conceptual
data model. The E-R model describes data as entities, relationships and attributes. It is
based on a perception of the real world that consists of a set of basic objects called
entities, and of relationships among these objects.
Entity types and entity sets: An entity is a thing or object in the real world that is
distinguishable from all other objects. For example, each person in an enterprise is an entity.
An entity has a set of properties, which may uniquely identify an entity. An entity may be
concrete, such as a person or a book, or it may be abstract such as loan or a holiday or a
concept.
An entity set is a set of entities of the same type that share the same properties or
attributes. The set of all persons who are customers at a given bank for example can be
defined as the entity set customer. Similarly, the entity set loan might represent set of all
loans awarded by a particular bank.
Attributes: Attributes are descriptive properties possessed by each member of an entity
set. Each entity has attributes. For example an employee entity may be described by the
employee's name, age, address, salary and job. Possible attributes of the loan entity set are
loan number and amount. For each attribute, there is a set of permitted values, called the
domain or value set.
Customer
Employee Name   Address   Age   Salary   Job
Ram             KTM       15    5000     Manager
Shyam           KTM       20    3000     Operator
Sita            BRT       17    4000     CEO

Loan
Loan No.   Loan Amount
L-1        5000
L-2        3000
L-3        10,000
Fig.: Entity sets customer and loan
An attribute, as used in the E-R model, can be characterized by the following attribute
types.
i) Simple (atomic) and composite attributes: Attributes that are not divisible are called
simple or atomic attributes. For example, age as shown in the figure is a simple attribute.
Composite attributes can be divided into
smaller subparts, which represent more basic attributes with independent meaning. For
example, employee name could be structured as a composite attribute consisting of
first name, middle name and last name.
ii) Single valued and multivalued attributes: Single valued have a single value for a
particular entity. For example, the loan number attribute for a specific loan entity refers
to only one loan number and so it is single valued. Consider the employee entity set
with the attribute dependent name. Any particular employee may have zero, one or
more dependents. So, different employee entities within the entity set will have
different numbers of values for the dependent name attribute and this type of attribute is
said to be multivalued.
iii) Null (missing): A null value is used when an entity does not have a value for an
attribute. Null can mean that the value is unknown or not applicable. For example, if a
particular employee has no dependents, the dependent name value for that employee will be null.
iv) Derived attribute: The value for this type of attribute can be derived from the values of
other related attributes or entities. For instance, let us say that the customer entity has an
attribute loans-held, which represents how many loans a customer has. We can derive the value
for this attribute by counting the number of loan entities associated with that customer.
v) Complex attributes: Composite and multivalued attributes can be nested in an arbitrary
way. We can represent arbitrary nesting by grouping the components of a composite
attribute between parentheses ( ) and separating the components with commas, and by
displaying multivalued attributes between braces { }. Such attributes are called
complex attributes. For example, suppose a person can have more than one residence and each
residence can have multiple phones. An attribute AddressPhone for a person entity
type can be specified as follows:
{AddressPhone( {Phone(AreaCode, PhoneNumber)},
Address(StreetAddress(Number, Street, ApartmentNumber), City, State, Zip) )}
Key attributes of an entity type:
An entity type defines a collection of entities that have the same attributes. The
collection of all entities of a particular entity type in the database at any point in time is called
an entity set.
Entity type:   Employee                        Company
Attributes:    EmpID, Name, Age, Salary        Name, Headquarters, President
Entity set:    e1: (1, Ram, 10, 2000)          c1: (wlink, Jawalakhel, Dr. Ashish)
               e2: (2, Shyam, 20, 5000)        c2: (Nepasoft, Ratnapark, S.P. Joshi)
               e3: (3, Mohan, 25, 1000)        c3: (NEA, KTM, Dr. S.R. Malla)
Fig.: Two entity types named employee and company and some of the member entities
in the entity set.
It is important to be able to specify how entities within a given entity set and
relationships within a given relationship set are distinguished. An entity type usually has an
attribute whose values are distinct for each individual entity in the collection. Such an
attribute is called a key attribute and its values can be used to identify each entity uniquely.
For example, the EmpId attribute is a key of the employee entity type. Some keys are
superkey, Candidate key and Primary Key.
Superkey: A superkey is a set of one or more attributes that, taken collectively, allows
us to identify uniquely an entity in the entity set. For example, EmpId is a superkey for the
entity set employee.
As another example, suppose the attributes of the customer entity set are customerName,
socialSecurity, customerStreet and customerCity. Then socialSecurity is a superkey, and so is
any set of attributes that contains it.
Candidate key: There may be superkeys for which no proper subset is a superkey. Such
minimal superkeys are called candidate keys. If the combination of customerName and
customerStreet is sufficient to distinguish customers, then both {socialSecurity} and
{customerName, customerStreet} are candidate keys. Although the attributes socialSecurity and
customerName together can distinguish customer entities, their combination does not form a
candidate key, since the attribute socialSecurity alone is a candidate key.
Primary key: A primary key is a candidate key that is chosen by the database designer as
the principal means of identifying entities within an entity set. A key (primary, candidate or
super) is a property of the entity set, rather than of the individual entities.
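The definitions of superkey and candidate key can be sketched in Python. The sample customer rows, the short attribute names and both function names are hypothetical. Because a key is a property of the entity set, a sample of entities can only show that a set of attributes is not a key. The sketch therefore assumes that the sample is representative of every legal state.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys: superkeys for which no proper subset is a superkey."""
    keys = []
    # Smaller attribute sets are tried first, so any superkey that contains
    # an already found key is known not to be minimal.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_key = any(set(k).issubset(attrs) for k in keys)
            if not contains_key and is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


customers = [
    {"ssn": "111", "name": "Ram", "street": "Main", "city": "KTM"},
    {"ssn": "222", "name": "Ram", "street": "Ring", "city": "KTM"},
    {"ssn": "333", "name": "Sita", "street": "Main", "city": "KTM"},
]

print(is_superkey(customers, ("ssn", "name")))
print(candidate_keys(customers, ("ssn", "name", "street", "city")))
```

In this sample, ssn together with name is a superkey but not a candidate key, because ssn alone already identifies each customer.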
Relationships & Relationship Types
A relationship is an association between entities. Each relationship is identified so that
its name is descriptive of the relationship. Verbs such as takes, teaches, and employs make
good relationship names. For example, a student takes a class, a professor teaches a class, a
department employs a professor and so on.
Rectangles represent entity sets, ellipses represent attributes. Similarly relationships are
represented by diamond shaped symbols as shown below in fig. and the lines link attributes to
entity sets and entity sets to relationship sets.
Professor ---- Teaches ---- Class
Fig.: An entity relationship
The figure shows a relationship between two entities (also known as participants in the
relationship) named professor and class respectively.
A relationship's degree indicates the number of associated entities or participants. A
unary relationship exists when an association is maintained within a single entity. A binary
relationship exists when two entities are associated. A ternary relationship exists when three
entities are associated. Although higher degrees exist, they are rare and are not specifically
named.
Unary:    Course ---- Prerequisite ---- Course
Binary:   Professor ---- Teaches ---- Class
Ternary:  Contributor, Fund and Recipient ---- CRF
Fig.: Three types of relationships
A course within the course entity is a prerequisite for another course within that entity.
The existence of a course prerequisite means that a course requires a course i.e. a course has a
relationship with itself. Such a relationship is also called a recursive relationship.
Connectivity: The term connectivity is used to describe the relationship classification
(1:1, 1:M or M:N). For example, the relationships among contributor, fund and recipient are
all classified as M:N. A fund can have many donors, and contributors can make donations to
many funds. A fund may support many researchers, who become the fund recipients, and a
researcher may draw support from many funds.
Professor 1 ---- Teaches ---- M Class        (One-to-many relationship)
Student M ---- Enrolls in ---- N Class       (Many-to-many relationship)
Fig.: Connectivity in an E-R diagram
Cardinality: Cardinality expresses the specific number of entity occurrences associated
with one occurrence of the related entity. The actual number of associated entities is usually
a function of an organization's policy. For example, Purbanchal University limits a
professor to teaching a maximum of three classes per week. Therefore, the cardinality rule
governing the professor-class association is expressed as "one professor teaches up to three
classes per week". The cardinality is indicated by placing the appropriate numbers beside the
entities as shown in the figure.
Fig.: Cardinality in an E-R diagram (one-to-many relationship Teaches; Professor on the 1 side
with limits (0,3), Class on the M side with limits (1,1))
The relationship between Professor and Class is 1:M.
The cardinality limits are (0,3) for Professor, indicating that a professor may teach a
minimum of zero and a maximum of three classes.
The cardinality limits for the Class entity are (1,1), indicating that the minimum number of
professors required to teach a class is one, as is the maximum number of professors.
For binary relationship between entity sets A and B, the mapping cardinality must be
one of the following.
i) One to one: An entity in A is associated with at most one entity in B, and an entity in B
is associated with at most one entity in A.
ii) One to many: An entity in A is associated with any number of entities in B. An
entity in B, however, can be associated with at most one entity in A.
iii) Many to one: An entity in A is associated with at most one entity in B. An entity in
B, however, can be associated with any number of entities in A.
iv) Many to many: An entity in A is associated with any number of entities in B, and an
entity in B is associated with any number of entities in A.
Fig.: Mapping Cardinalities (four panels mapping entities a1, a2, ... of set A to entities
b1, b2, ... of set B: One to One, One to Many, Many to One, Many to Many)
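The four cases can be checked mechanically on a sample of relationship instances. The
following Python sketch is illustrative only: the function name mapping_cardinality and the
sample pairs are assumptions, not part of these notes. A sample can only show the most
restrictive category it is consistent with; the real mapping cardinality is a rule of the
miniworld, not of one set of rows.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship given as (a, b) pairs, from A to B."""
    pairs = set(pairs)
    # Count how many B entities each A entity is linked to, and vice versa.
    b_per_a = Counter(a for a, _ in pairs)
    a_per_b = Counter(b for _, b in pairs)
    a_has_many = any(n > 1 for n in b_per_a.values())
    b_has_many = any(n > 1 for n in a_per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"   # some a is linked to several b
    if b_has_many:
        return "many to one"   # some b is linked to several a
    return "one to one"


print(mapping_cardinality([("a1", "b1"), ("a2", "b2")]))
print(mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b3")]))
```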
A relationship is an association among several entities.
Entity – Relationship Diagram (E-R diagram)
The E-R diagram is used to represent the E-R model. The E-R diagram consists of the
following major components.
i) Rectangles – to represent entity sets
ii) Ellipses – to represent attributes
iii) Diamonds – to represent relationship sets
iv) Lines – to link attributes to entity sets and entity sets to relationship sets
v) Double ellipses – to represent multivalued attributes
vi) Dashed ellipses – to denote derived attributes
vii) Double lines – to indicate total participation of an entity in a relationship set.
Fig.: Total participation of E2 in R (entity sets E1 and E2 linked to relationship set R; E2 is
linked by a double line)
For example: Suppose the attributes associated with customer are customerName,
socialSecurity, customerStreet and customerCity. The attributes associated with loan are
loanNumber and amount. The relationship set borrower may be many to many, one to many,
many to one or one to one. To distinguish among these types, we draw either a directed line
(→) or an undirected line (-) between the relationship set and the entity set.
Fig. E-R diagram corresponding to customers and loans (entity set Customer with attributes
SocialSecurity, CustomerStreet, CustomerCity and CustomerName; entity set Loan with
attributes LoanNumber and Amount; relationship set Borrower)
Similarly, we can see another example as follows:
Fig. E-R diagram showing one to many relationship (entity set Customer with attributes
SocialSecurity, CustomerCity and CustomerName; entity set Account with attributes
AccountNumber and Balance; relationship set Depositor)
A directed line from the relationship set depositor to the entity set account specifies that
a customer can deposit to many accounts. So it is a one-to-many relationship.
Weak Entity Types
An entity set may not have sufficient attributes to form a primary key. Such an entity
set is termed a weak entity set. An entity set that has a primary key is termed a strong entity
set. For example, consider the entity set payment, which has three attributes:
paymentNumber, paymentDate and paymentAmount. Although each payment entity is
distinct, payments for different loans may share the same paymentNumber. Thus, this entity
set does not have a primary key. Hence it is a weak entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set
on which the weak entity set is existence dependent, plus the weak entity set's discriminator.
In this case, the existence of entity payment depends on the existence of entity loan. If a
loan is deleted, its associated payment entities must be deleted. So, entity set loan is dominant
and payment is subordinate. The discriminator of a weak entity set is a set of attributes that can
uniquely identify weak entities that are related to the same owner entity. For example, the
discriminator of the weak entity set payment is the attribute paymentNumber, since, for each
loan, a paymentNumber uniquely identifies one single payment for that loan. Hence, in the
case of the entity set payment, its primary key is {loanNumber, paymentNumber}.
Fig. E-R diagram with a weak entity set (strong entity set Loan with attributes LoanNumber
and Amount; weak entity set Payment with attributes PaymentNumber, PaymentDate and
PaymentAmount; relationship set loanPayment)
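The loan and payment example can be tried out in a small relational sketch. The following
Python code uses the standard sqlite3 module; the column types, the sample rows and the
choice of ON DELETE CASCADE are assumptions made for illustration, not something these
notes prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE loan (
    loanNumber INTEGER PRIMARY KEY,
    amount     NUMERIC
);
-- Weak entity set: key = owner's key (loanNumber) + discriminator (paymentNumber)
CREATE TABLE payment (
    loanNumber    INTEGER NOT NULL REFERENCES loan(loanNumber) ON DELETE CASCADE,
    paymentNumber INTEGER NOT NULL,
    paymentDate   TEXT,
    paymentAmount NUMERIC,
    PRIMARY KEY (loanNumber, paymentNumber)
);
""")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [(16, 1300), (17, 1000)])
# The same paymentNumber may be used under different loans.
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?)",
    [(16, 1, "2024-01-10", 100), (17, 1, "2024-01-12", 150)],
)

# A second payment number 1 for the same loan violates the composite key.
try:
    conn.execute("INSERT INTO payment VALUES (16, 1, '2024-02-10', 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the owner loan removes its dependent payments.
conn.execute("DELETE FROM loan WHERE loanNumber = 16")
remaining = conn.execute("SELECT loanNumber, paymentNumber FROM payment").fetchall()
print(duplicate_rejected, remaining)
```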
Roles On Relationships
Each entity type that participates in a relationship type plays a particular role in the
relationship. The role name signifies the role that a participating entity from the entity type
plays in each relationship instance and helps to explain what the relationship means. For
example, in the works_for relationship type, EMPLOYEE plays the role of employee or
worker and DEPARTMENT plays the role of department or employer.
Role names are not technically necessary in relationship types where all the
participating entity types are distinct, since each entity type name can be used as the role
name. However, in some cases the same entity type participates more than once in a
relationship type in different roles. In such cases, the role names become essential for
distinguishing the meaning of each participation. Such relationships are called recursive
relationships.
Fig.: Recursive relationship Supervision, where the Employee entity type plays two roles,
supervisor and supervisee (marked α and β in the figure)
The supervision relationship type relates an employee to a supervisor, where both
employee and supervisor entities are members of the same EMPLOYEE entity type.
Structural Constraints On Relationships Types
Relationship types usually have certain constraints that limit the possible combinations
of entities that may participate in the corresponding relationship set. These constraints are
determined from the miniworld situation that the relationship represents.
Fig.: Some instances of the works_for relationship between employee and department.
Suppose the company has a rule that each employee must work for exactly one
department.
There are two types of relationship constraints: cardinality ratio and participation.
i. Cardinality ratios for binary relationships: The cardinality ratio for a binary relationship
specifies the number of relationship instances that an entity can participate in. For example, in
the works_for binary relationship type, department : employee is of cardinality ratio 1:N,
meaning that each department can be related to numerous employees but an employee can be
related to only one department. The possible cardinality ratios for binary relationship types
are 1:1, 1:N, N:1 and M:N.
ii. Participation constraints: The participation constraint specifies whether the existence
of an entity depends on its being related to another entity via the relationship type.
There are two types of participation constraints, total and partial. If a company policy
states that every employee must work for a department, then an employee entity can exist
only if it participates in a works_for relationship instance. Thus, the participation of
employee in works_for is called total participation, meaning that every entity in the total
set of employee entities must be related to a department entity via works_for. Total
participation is also called existence dependency. If only some of the entities in an entity
set need to be related through the relationship type, the participation is partial.
Cardinality ratio and participation constraints, taken together, are called the structural
constraints.
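The rule "each employee must work for exactly one department" combines both structural
constraints, and it can be checked on a set of relationship instances. The sketch below is a
minimal Python illustration; the function name check_works_for and the sample instances
(e1 to e4, d1, d2) are assumptions chosen for the example.

```python
def check_works_for(employees, works_for):
    """Return violations of 'exactly one department per employee'.

    employees: list of employee names.
    works_for: list of (employee, department) relationship instances.
    """
    violations = []
    for emp in employees:
        departments = {d for e, d in works_for if e == emp}
        if not departments:
            # The employee takes part in no works_for instance.
            violations.append((emp, "total participation"))
        elif len(departments) > 1:
            # The employee is on the N side and may relate to one department only.
            violations.append((emp, "cardinality ratio 1:N"))
    return violations


employees = ["e1", "e2", "e3", "e4"]
works_for = [("e1", "d1"), ("e2", "d1"), ("e3", "d1"), ("e3", "d2")]
print(check_works_for(employees, works_for))
```

Here e1 and e2 satisfy both constraints (a department may have many employees), e3 breaks
the cardinality ratio and e4 breaks total participation.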
NAMING CONVENTIONS
The choice of names for entity types, attributes, relationship types and roles is not
always straightforward. One should choose names that convey, as much as possible, the
meanings attached to the different constructs in the schema. We choose to use singular names
for entity types, rather than plural ones, because the entity type name applies to each
individual entity belonging to that entity type.
In E-R diagrams, we will use the convention that entity type and relationship type
names are in uppercase letters, attribute names are capitalized and role names are in
lowercase letters. Generally, the nouns appearing in the narrative tend to give rise to entity
type names and the verbs tend to indicate names of relationship types.
Fig. E-R diagram using Naming conventions (entity type CUSTOMER with attributes
CustomerId, CustomerName and CustomerCity; entity type ACCOUNT with attributes
AccountNumber and Balance; relationship type DEPOSITOR)
Another naming convention involves choosing relationship names to make the ER
diagram of the schema readable from left to right and from top to bottom.
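Fig. E-R diagram using convention for relationship Name from Left to right (entity type
EMPLOYEE with attributes EmployeeName and EmployeeId; entity type PROJECT with
attributes ProjectName and Location; relationship type WORKS_ON, read as EMPLOYEE
WORKS_ON PROJECT)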
Relational Model:
Relational Model Concept:
STUDENT:
Name      Std. ID   Class   Major
Laxman    05        11      Science
Pema      07        12      Management
Dil       12        11      English

COURSE:
Course Name        Course Number   Credit Hours   Department
Computer Science   CS 310          3              CS
Data base          DB 315          3              CS
Maths              MAT 305         3              Math
Nepali             NEP 318         3              Nepali

SECTION:
Section ID   Course Number   Semester   Year   Instructor
75           MAT 305         Fall       99     J. Gupta
77           DB 315          Spring     98     P. Gurung
80           NEP 318         Spring     97     B. Twari
82           CS 310          Fall       99     K. Bista

GRADE REPORT:
Symbol No.   Std ID   Section ID   Grade
30115        5        75           B+
30116        7        77           A
30117        12       80           C

Fig: A Database that stores Students and Course Information.
The relational model represents the database as a collection of relations. Informally, each
relation resembles a table of values or, to some extent, a file of records. For example, the
database of files shown above is similar to the relational model representation.
When a relation is thought of as a table of values, each row in the table represents a collection
of related data values. We introduced entity types and relationship types as concepts for
modeling real world data. In the relational model, each row in the table represents a fact that
typically corresponds to a real world entity or relationship. The table names and column
names are used to help in interpreting the meaning of the values in each row. For example,
the first table of the above figure is called STUDENT because each row represents facts about a
particular student entity. The column names (Name, Std ID, Class and Major) specify how to
interpret the data values in each row, based on the column each value is in. All values in a
column are of the same data type.
In the formal relational model terminology, a row is called a tuple, a column header is called
an attribute, and the table is called a relation. The data type describing the types of values that
can appear in each column is represented by a domain of possible values.
Fundamental Concepts on Relational Data Model:-
Relations:-
The relational data model organizes and represents data in the form of tables or relations.
Relation is a term that comes from mathematics and represents a simple two-dimensional
table, consisting of rows and columns of data.
WORKER:
Worker ID   Name            Hourly-Rate   Skill-Type   SUPV-ID
123         Birendra Negi   12            Electric     131
141         Pema Tamang     11            Plumbing     152
292         Dil Thapa       10            Roofing      -
323         Shyam Shah      11            Driving      -
152         Hariom Shah     15            Teaching     -
Fig: A portion of the relation worker.
The above figure shows a relation with sample data values, which represents the WORKER
object set and its attributes. Each column in the relation is an attribute of the relation. The
name of the column is called the attribute name. We use the terms attribute and attribute
name rather than column name.
The number of attributes in a relation is called the degree of the relation. The degree of
WORKER is five.
The rows of a relation are also called tuples. It is assumed that there is no prescribed order to
the rows or tuples of a relation and that no two tuples have an identical set of values. The set
of all possible values that an attribute may have is the domain of the attribute.
* Null Values:
A null value is the value given to an attribute in a tuple if the attribute is inapplicable or its
value is unknown. For example, some employees in the WORKER relation do not have a
supervisor. Consequently, no values exist for SUPV-ID for three employees. In addition,
when we are entering data for a row in a relation, we might not know the values of one or more
attributes for that row. In either case, we enter nothing, and that row is recorded in the database
with null values for those attributes. A null value is not blank or zero. It is simply unknown
or inapplicable and may be supplied at a later time.
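The difference between a null value and zero can be seen in a short experiment. The sketch
below uses Python's standard sqlite3 module; the simplified three-column worker table and
its column names are assumptions for illustration, with two rows taken from the WORKER
relation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, name TEXT, supv_id INTEGER)")
conn.executemany(
    "INSERT INTO worker VALUES (?, ?, ?)",
    [(123, "Birendra Negi", 131), (292, "Dil Thapa", None)],  # None is stored as NULL
)

# NULL is found with IS NULL, not by comparing with zero or with NULL itself.
is_null = conn.execute("SELECT name FROM worker WHERE supv_id IS NULL").fetchall()
equals_zero = conn.execute("SELECT name FROM worker WHERE supv_id = 0").fetchall()
equals_null = conn.execute("SELECT name FROM worker WHERE supv_id = NULL").fetchall()
print(is_null, equals_zero, equals_null)
```

Only the IS NULL test finds the worker without a supervisor; a comparison with NULL is
never true, because an unknown value cannot be said to equal anything.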
* Key:
A key is a minimal set of attributes that uniquely identifies each row in a relation. In the
above figure, let us assume that the WORKER-ID attribute uniquely identifies a row in
WORKER; we then say that WORKER-ID is a key of the WORKER relation.
Any set of attributes that uniquely identifies each tuple in a relation is called a super key. A key
of a relation is a minimal set of such attributes. That is, a key is a minimal super key. By
minimal, we mean that no proper subset of the set of key attributes will uniquely identify
tuples in the relation.
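The difference between a super key and a key can be tested on the sample WORKER rows.
The following Python sketch is only an illustration: the helper names is_superkey and is_key
are assumptions, and a sample can only refute a key, since a key must hold for every possible
state of the relation, not just for the rows shown.

```python
from itertools import combinations

ATTRS = ("WORKER-ID", "NAME", "HOURLY-RATE", "SKILL-TYPE", "SUPV-ID")
WORKER = [
    (123, "Birendra Negi", 12, "Electric", 131),
    (141, "Pema Tamang", 11, "Plumbing", 152),
    (292, "Dil Thapa", 10, "Roofing", None),
    (323, "Shyam Shah", 11, "Driving", None),
    (152, "Hariom Shah", 15, "Teaching", None),
]


def is_superkey(attrs):
    """True if no two sample rows agree on all the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in WORKER]
    return len(set(projected)) == len(projected)


def is_key(attrs):
    """True if attrs is a super key and no proper subset of it is one."""
    if not is_superkey(attrs):
        return False
    return not any(
        is_superkey(subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_key(("WORKER-ID",)))                # unique and minimal
print(is_superkey(("WORKER-ID", "NAME")))    # unique, but not minimal
print(is_key(("WORKER-ID", "NAME")))
print(is_superkey(("HOURLY-RATE",)))         # the rate 11 occurs twice
```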
ASSIGNMENT
WORKER-ID   BLDG-ID   START-DATE   NO.-DAYS
123         312       10/10        5
141         312       05/10        10
123         315       12/08        5
141         315       12/12        12
In the ASSIGNMENT relation, the key consists of the WORKER-ID and the BLDG-ID
attributes. Neither WORKER-ID alone nor BLDG-ID alone uniquely identifies every row,
but the two attributes together do provide the unique identification required for a key. A key
consisting of more than one attribute is called a composite key.
In any given relation, there may be more than one set of attributes that could be chosen as a
key; these are called candidate keys. A candidate key is defined as "any set of attributes that
could be chosen as a key of a relation". For example, WORKER-ID is a candidate key in the
WORKER relation if it will always be unique. When one of the candidate keys is selected as the
relation key, it may be called the primary key. The candidate key that is the easiest to use in
day to day data entry is normally selected as the primary key.
A foreign key is a set of attributes in one relation that constitutes a key in some other, or
possibly the same, relation and that is used to indicate logical links between relations. For
example, WORKER-ID in the ASSIGNMENT relation is a foreign key, since WORKER-ID is
the key of the WORKER relation. Similarly, SUPV-ID in WORKER refers to WORKER-ID
in the same relation.
RELATIONAL ALGEBRA:
A query language is a language in which a user requests information from the database. The
relational algebra is a procedural query language (in a procedural language, the user instructs
the system to perform a sequence of operations on the database to compute the desired result).
It consists of a set of operations that take one or two relations as input and produce a new
relation as their result. The fundamental operations in the relational algebra are select, project,
union, set difference, Cartesian product and rename. In addition to the fundamental
operations, there are several other operations, namely set intersection, natural join, division
and assignment.
* Fundamental Operations:
Among the fundamental operations, the select, project and rename operations are called unary
operations because they operate on one relation. The other three operations, i.e. union,
set difference and Cartesian product, operate on pairs of relations and are therefore called
binary operations. Let us describe all the fundamental operations in brief.
1. The Select operation:
The select operation selects tuples that satisfy a given predicate. We use the Greek letter sigma
(σ) to denote selection. The predicate appears as a subscript to σ. The argument
relation is in parentheses after the σ. The general format of the select operation is:
σ <selection condition> (R)
LOAN
Loan Number   Branch Name   Amount
11            Round Hill    900
14            Down Town     1500
15            Perryridge    1500
16            Perryridge    1300
17            Down Town     1000
23            Red Wood      2000
93            Milanus       500
Figure: Loan Relation
Now, in order to select those tuples of the loan relation where the branch name is "Perryridge",
we write
σ Branch Name = "Perryridge" (Loan)
The result of this selection on the loan relation is shown below:
Loan Number   Branch Name   Amount
15            Perryridge    1500
16            Perryridge    1300
Figure: Result of σ Branch Name = "Perryridge" (Loan)
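Selection can be pictured as filtering a set of tuples with a predicate. The Python sketch
below is a minimal illustration, not a DBMS implementation: the function name select and
the tuple layout (loan number, branch name, amount) are assumptions, and loan 23 is taken
with amount 2000, as in the projection result.

```python
# Each tuple is (loan_number, branch_name, amount).
LOAN = {
    (11, "Round Hill", 900),
    (14, "Down Town", 1500),
    (15, "Perryridge", 1500),
    (16, "Perryridge", 1300),
    (17, "Down Town", 1000),
    (23, "Red Wood", 2000),
    (93, "Milanus", 500),
}


def select(relation, predicate):
    """Return the tuples of the relation that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


perryridge = select(LOAN, lambda t: t[1] == "Perryridge")
print(sorted(perryridge))
```

Predicates can be combined in the usual way, for example
select(LOAN, lambda t: t[1] == "Perryridge" and t[2] > 1400).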
2. The Project Operation:
The project operation is a unary operation that returns its argument relation with certain
attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is
denoted by the Greek letter pi (π). We list those attributes that we wish to appear in the
result as a subscript to π. The argument relation follows in parentheses. The general format
of the project operation is:
π <attribute list> (R)
Thus we write the query to list all loan numbers and the amount of each loan as
π loan-number, amount (loan)
The result of this query is given below:
Loan Number   Amount
11            900
14            1500
15            1500
16            1300
17            1000
23            2000
93            500
Figure: Result of π Loan-Number, Amount (Loan)
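Projection keeps the chosen columns of every tuple and relies on set semantics to drop
duplicates. A minimal Python sketch follows; the function name project and the use of column
positions instead of attribute names are assumptions made to keep the example short.

```python
# Each tuple is (loan_number, branch_name, amount).
LOAN = {
    (11, "Round Hill", 900),
    (14, "Down Town", 1500),
    (15, "Perryridge", 1500),
    (16, "Perryridge", 1300),
    (17, "Down Town", 1000),
    (23, "Red Wood", 2000),
    (93, "Milanus", 500),
}


def project(relation, positions):
    """Keep only the columns at the given positions; a set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}


numbers_and_amounts = project(LOAN, (0, 2))
branch_names = project(LOAN, (1,))
print(len(numbers_and_amounts), len(branch_names))
```

Projecting on the branch name alone yields five tuples from seven loans, because Down Town
and Perryridge each occur twice and the duplicates are eliminated.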
3. The Rename Operation:
We can also define a formal rename operation, which can rename either the relation name
or the attribute names, or both, in a manner similar to the way we define the select and
project operations. The general rename operation, when applied to a relation R of degree n,
is denoted by any of the following three forms.
ρ S(B1, B2, ..., Bn) (R)    or    ρ S (R)    or    ρ (B1, B2, ..., Bn) (R)
Here the symbol ρ (rho) denotes the rename operation, S is the new relation name and B1, B2,
..., Bn are the new attribute names. The first expression renames both the
relation and its attributes, the second expression renames the relation only and the third
expression renames the attributes only. If the attributes of R are (A1, A2, ..., An)
in that order, then each Ai is renamed as Bi.
4. The Union Operation:
The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are
either in R or in S or in both R and S. Duplicate tuples are eliminated. Let us see
an example.
Customer Name   Account Number
Shyam           101
Hariom          102
Chudamani       103
Dev Laxmi       104
Lekha           105
Figure: The Depositor Relation
Customer Name   Loan Number
Sanjeep         16
Bibek           17
Shahina         18
Dil Bd          19
Pema            20
Figure: The Borrower Relation
Now, by the use of the union operation, we can find the names of all customers who have
either a loan or an account in the bank, or both. The expression is:
π customer-name (borrower) ∪ π customer-name (depositor)
The result of this expression, computed from the above two relations, is shown below:
Customer Name
Sanjeep
Bibek
Shahina
Dil Bd
Pema
Shyam
Hariom
Chudamani
Dev Laxmi
Lekha
Figure: Names of all customers who have either a loan or an account.
5. The Set Difference (Minus) Operation:
The set difference operation, denoted by R − S, allows us to find tuples that are in one
relation but are not in another. The expression R − S produces a relation containing those
tuples that are in R but not in S.
For example, we can find all customers of the bank who have an account but not a loan
by writing:
π customer-name (depositor) − π customer-name (borrower)
Since no depositor in the above relations also appears in the borrower relation, the result
contains every depositor name:
Customer Name
Shyam
Hariom
Chudamani
Dev Laxmi
Lekha
Figure: Customers with an Account but not a Loan
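Union and set difference map directly onto set operators once both arguments have the same
attributes. The Python sketch below recomputes both expressions from the Depositor and
Borrower relations shown in the union example; the helper name project_names is an
assumption, and the intersection is included only for comparison.

```python
DEPOSITOR = {
    ("Shyam", 101), ("Hariom", 102), ("Chudamani", 103),
    ("Dev Laxmi", 104), ("Lekha", 105),
}
BORROWER = {
    ("Sanjeep", 16), ("Bibek", 17), ("Shahina", 18),
    ("Dil Bd", 19), ("Pema", 20),
}


def project_names(relation):
    """Projection on customer name; both results then have the same single attribute."""
    return {(name,) for name, _ in relation}


depositor_names = project_names(DEPOSITOR)
borrower_names = project_names(BORROWER)

either = borrower_names | depositor_names        # union
account_only = depositor_names - borrower_names  # set difference
both = borrower_names & depositor_names          # intersection

print(len(either), len(account_only), len(both))
```

With these sample relations no customer appears on both sides, so the union has ten names,
the difference returns every depositor and the intersection is empty.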
6. The Cartesian Product Operation:
The Cartesian product operation, also known as the cross product or cross join
operation, is denoted by × and allows us to combine tuples from two relations in a
combinatorial fashion. We write the Cartesian product of relations R and S as R × S.
In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation
Q with degree n + m and attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that
order. The resulting relation Q has one tuple for each combination of tuples, one from R
and one from S.
For example, suppose that we want to find the names of all customers who have a loan at
the Perryridge branch. We need the information in both the loan relation and the borrower
relation to do so. If we write
σ branch-name = "Perryridge" (borrower × loan)
the result is as shown in the following table:
Customer Name   Borrower.Loan Number   Loan.Loan Number   Branch Name   Amount
Adams           16                     15                 Perryridge    1500
Adams           16                     16                 Perryridge    1300
Curry           17                     15                 Perryridge    1500
Curry           17                     16                 Perryridge    1300
Hayes           18                     15                 Perryridge    1500
Hayes           18                     16                 Perryridge    1300
Vanes           19                     15                 Perryridge    1500
Vanes           19                     16                 Perryridge    1300
Smith           20                     15                 Perryridge    1500
Smith           20                     16                 Perryridge    1300
Because the product pairs every borrower tuple with every Perryridge loan, a further
selection borrower.loan-number = loan.loan-number is needed to keep only the tuples that
belong together. In the table above, only the tuple (Adams, 16, 16, Perryridge, 1300)
satisfies it.
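The whole query can be traced step by step on small sets of tuples. The Python sketch below
is illustrative only: it uses itertools.product for the Cartesian product, and the tuple
layouts and variable names are assumptions. The borrower and loan rows are those used in
the table above.

```python
from itertools import product

# (customer_name, loan_number)
BORROWER = {("Adams", 16), ("Curry", 17), ("Hayes", 18), ("Vanes", 19), ("Smith", 20)}
# (loan_number, branch_name, amount)
LOAN = {
    (11, "Roundhill", 900), (14, "Downtown", 1500), (15, "Perryridge", 1500),
    (16, "Perryridge", 1300), (17, "Downtown", 1000), (23, "Redwood", 2000),
    (93, "Milanus", 500),
}

# borrower x loan: every borrower tuple joined to every loan tuple (5 * 7 tuples).
# Layout: (customer_name, borrower.loan_number, loan.loan_number, branch_name, amount)
cross = {b + l for b, l in product(BORROWER, LOAN)}

# Selection on the branch name leaves the ten rows of the table.
at_perryridge = {t for t in cross if t[3] == "Perryridge"}

# The extra selection keeps only pairs that describe the same loan.
matched = {t for t in at_perryridge if t[1] == t[2]}
names = {t[0] for t in matched}

print(len(cross), len(at_perryridge), names)
```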
Some Other Operations Of Relational Algebra:
• The Set-Intersection Operation:
This operation, denoted by ∩, produces a relation that includes all the tuples that are in
both R and S. For example, suppose we wish to find all customers who have both a loan
and an account. Using set intersection, we can write
π customer-name (borrower) ∩ π customer-name (depositor)
The result of this query is:
Customer Name
Hayes
Vanes
• The Natural Join Operation:
The natural join is a binary operation that produces all the combinations of
tuples from relations R and S that have equal values on their common attributes
(a join condition with only equality comparisons), except that the join attributes of
S are not repeated in the resulting relation. It is denoted by the join symbol ⋈.
Let us consider an example: find the names of all customers who have a loan at the
bank, together with the loan number and the amount of the loan. We express this query
by using the natural join as:
π customer-name, loan-number, amount (borrower ⋈ loan)
The borrower and loan relations and the result of the above query are:
borrower:
Customer Name   Loan Number
Adams           16
Curry           17
Hayes           18
Vanes           19
Smith           20
loan:
Loan Number   Branch Name   Amount
11            Roundhill     900
14            Downtown      1500
15            Perryridge    1500
16            Perryridge    1300
17            Downtown      1000
23            Redwood       2000
93            Milanus       500
Result:
Customer Name   Loan Number   Amount
Adams           16            1300
Curry           17            1000
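The natural join can be sketched as "pair the tuples that agree on the common attributes,
then keep each common attribute once". The Python function below is a minimal illustration
under that reading; the name natural_join and the representation of a relation as an
attribute list plus a set of tuples are assumptions.

```python
def natural_join(r_attrs, r, s_attrs, s):
    """Join r and s on their common attribute names; shared columns appear once."""
    common = [a for a in r_attrs if a in s_attrs]
    s_rest = [a for a in s_attrs if a not in common]
    rows = set()
    for t in r:
        for u in s:
            # Equality comparison on every common attribute.
            if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):
                rows.add(t + tuple(u[s_attrs.index(a)] for a in s_rest))
    return list(r_attrs) + s_rest, rows


borrower_attrs = ["customer-name", "loan-number"]
borrower = {("Adams", 16), ("Curry", 17), ("Hayes", 18), ("Vanes", 19), ("Smith", 20)}
loan_attrs = ["loan-number", "branch-name", "amount"]
loan = {
    (11, "Roundhill", 900), (14, "Downtown", 1500), (15, "Perryridge", 1500),
    (16, "Perryridge", 1300), (17, "Downtown", 1000), (23, "Redwood", 2000),
    (93, "Milanus", 500),
}

attrs, joined = natural_join(borrower_attrs, borrower, loan_attrs, loan)
# Projection on customer-name, loan-number, amount.
keep = [attrs.index(a) for a in ("customer-name", "loan-number", "amount")]
result = {tuple(t[i] for i in keep) for t in joined}
print(attrs, sorted(result))
```

Only loans 16 and 17 appear in both relations, so Hayes, Vanes and Smith drop out of the
result.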
• The Division Operation:
The division operation, denoted by ÷, is useful for a special kind of query that
sometimes occurs in database applications. Formally, let R(X) and S(Y) be relations
and let Y ⊆ X, that is, every attribute of schema Y is also in schema X. The relation R ÷ S is a
relation on schema X − Y (that is, on the schema containing all attributes of schema X
that are not in schema Y). A tuple t is in R ÷ S if and only if both of these conditions hold:
1. t is in π X−Y (R)
2. For every tuple tS in S, there is a tuple tR in R satisfying both of the following:
• tR (Y) = tS (Y)
• tR (X − Y) = t
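The two conditions translate almost word for word into code. The Python sketch below is a
hedged illustration: the function name divide and the sample relations (which customers hold
an account at which branch) are hypothetical and do not come from these notes.

```python
def divide(r_attrs, r, s_attrs, s):
    """Return R / S: the X-Y parts of R that are paired with every tuple of S."""
    x_minus_y = [a for a in r_attrs if a not in s_attrs]

    def part(t, attrs):
        # Restrict a tuple of R to the named attributes.
        return tuple(t[r_attrs.index(a)] for a in attrs)

    # Condition 1: candidates come from the projection of R on X - Y.
    candidates = {part(t, x_minus_y) for t in r}
    # Condition 2: for every tuple u of S some tuple of R combines the candidate with u.
    return {
        c for c in candidates
        if all(any(part(t, x_minus_y) == c and part(t, s_attrs) == u for t in r)
               for u in s)
    }


# Hypothetical data: R(customer-name, branch-name) and S(branch-name).
r_attrs = ["customer-name", "branch-name"]
r = {("Hayes", "Perryridge"), ("Hayes", "Downtown"), ("Smith", "Perryridge")}
s_attrs = ["branch-name"]
s = {("Perryridge",), ("Downtown",)}

print(divide(r_attrs, r, s_attrs, s))
```

The result answers the query "which customers have an account at every branch listed in S";
Smith is excluded because there is no Smith tuple for Downtown.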
• The Assignment Operation:
It is convenient at times to write a relational algebra expression by assigning parts of
it to temporary relation variables. The assignment operation, denoted by ←, works like
assignment in a programming language. For example:
Temp1 ← R
Temp2 ← S
Result = Temp1 − Temp2
Integrity Constraints
Integrity constraints guard against accidental damage to database.
Entity Integrity
Entity integrity ensures that each row in the table is uniquely identified. In other words,
entity integrity ensures that a table does not have any duplicate rows. Example: Two separate
customers should not have the same customer number. SQL Server will allow duplicate rows
if entity integrity is not enforced. Entity integrity is a key concept in the relational database
model. Data in the relational database is independent of physical storage; there is no such
thing as the '5th customer row' in a table. Physical independence is achieved by being able to
reference each row by a unique value, sometimes referred to as a 'key'. Entity integrity
ensures that each row in a table has a unique identifier that allows one row to be
distinguished from another.
Entity integrity is most often enforced by placing a primary key (PK) constraint on a specific
column (although uniqueness can also be enforced with a UNIQUE constraint or a unique
index). The PK constraint forces each value inserted into a column (or combination of
columns) to be unique; if a user attempts to insert a duplicate value into the column(s), the PK
constraint will cause the insert to fail. A PK will not allow any NULLs to be inserted into the
column(s). (A NULL entry would be disallowed even if it would be the only NULL in the
column and therefore unique.)
A PK is referred to as a 'surrogate key' if the column contains no real data other than a
uniqueness identifier. The IDENTITY property is often used to generate such values, but it
does not by itself guarantee uniqueness, so it should be combined with a PK or UNIQUE
constraint. If 'real' data can be used as a PK (e.g., a social security number), then it is referred
to as an 'intelligent key'. There can be only one PK per table. A composite PK is a PK that
consists of more than one column; it is used when none of the columns in the composite key
is unique by itself. Thus, there can be only one PK in a table, but the PK can consist of more
than one column.
If you need to enforce uniqueness on more than one column, use a PK constraint on one
column and a UNIQUE constraint on any other columns that must not contain duplicates.
Example: If the 'customer ID' column is the PK in the 'customers' table and you also want to
make sure there are no duplicate customer names, you can place a UNIQUE constraint on the
'customer name' column. Non-PK columns on which uniqueness is enforced are referred to as
alternate keys or AKs; they get their name from the fact that they are 'alternatives' to the PK
and, as such, make good candidates for indexing or 'joining' on.
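The customers example can be tried out with a PK on one column and a UNIQUE constraint on
another. The sketch below uses Python's standard sqlite3 module, with SQLite standing in for
SQL Server; the column names, the sample rows and the helper try_insert are assumptions, and
details such as error messages differ between DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,   -- the PK
        customer_name TEXT NOT NULL UNIQUE   -- an alternate key
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Shyam')")


def try_insert(customer_id, customer_name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, customer_name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_id = try_insert(1, "Hariom")    # same PK value as an existing row
duplicate_name = try_insert(2, "Shyam")   # same value in the UNIQUE column
new_row = try_insert(2, "Hariom")         # both values are new
print(duplicate_id, duplicate_name, new_row)
```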
Domain Constraint
A domain of possible values must be associated with every attribute. SQL allows the domain
declaration of an attribute to include the specification "not null" and thus prohibits insertion
of a null value for this attribute. Any database modification that would cause a null to be
inserted in a not null domain generates an error diagnostic. There are many situations where
the prohibition of null values is desirable. A particular case where it is essential to prohibit
null values is in the primary key of a relation schema.
SQL-92 allows us to define domains using a create domain clause, as shown in the
following example.
create domain personName char (60)
We can then use the domain name personName to define the type of an attribute, just
like a built-in domain.
Domain constraints are the most elementary form of integrity constraint. They are
tested easily by the system whenever a new data item is entered into the database. It is
possible for several attributes to have the same domain. The principle behind attribute
domains is similar to that behind typing of variables in programming languages.
The check clause in SQL-92 permits the schema designer to specify a predicate that
must be satisfied by any value assigned to a variable whose type is the domain. For instance,
a check clause can ensure that an hourly wage domain allows only values greater than a
specified value (such as the minimum wage), as shown below.
create domain hourlywage numeric (5, 2)
constraint wageValueTest check (value >= 4.00)
The domain hourlywage is declared to be a decimal number with a total of five digits,
two of which are placed after the decimal point, and the domain has a constraint that ensures
that the hourlywage is equal to or greater than 4.00.
The check clause can also be used to restrict a domain not to contain any null values, or to
restrict it to a fixed set of values, as shown below.
e.g.: create domain accountNumber char (10)
constraint accountNumberNullTest check (value is not null)
create domain gender char (10)
constraint checkGender check (value in ('Male', 'Female'))
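Many systems do not implement create domain, but the same checks can be attached to
individual columns. The sketch below uses Python's standard sqlite3 module; SQLite has no
create domain statement, so the column-level CHECK constraints, the employee table and the
helper accepted are assumptions that only approximate the SQL-92 domains above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        name       TEXT NOT NULL,
        hourlywage NUMERIC NOT NULL CHECK (hourlywage >= 4.00),
        gender     TEXT CHECK (gender IN ('Male', 'Female'))
    )
""")


def accepted(name, hourlywage, gender):
    """Return True if the row passes every constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (name, hourlywage, gender))
        return True
    except sqlite3.IntegrityError:
        return False


valid_row = accepted("Pema", 11.00, "Female")
wage_too_low = accepted("Dil", 3.50, "Male")      # violates hourlywage >= 4.00
missing_wage = accepted("Laxman", None, "Male")   # violates NOT NULL
bad_gender = accepted("Shyam", 12.00, "Unknown")  # not in the allowed set
print(valid_row, wage_too_low, missing_wage, bad_gender)
```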
Referential Integrity
It is also required that a value that appears in one relation for a given set of attributes
also appears for a certain set of attributes in another relation. This condition is called
referential integrity.
The referential integrity constraint is specified between two relations and is used to
maintain the consistency among tuples of the two relations. Informally, the referential
integrity constraint states that a tuple in one relation that refers to another relation must refer
to an existing tuple in that relation. Consider the two relations EMPLOYEE and
DEPARTMENT as follows.
EMPLOYEE (Name, SSN, Address, Sex, Salary, DeptNo)
DEPARTMENT (DeptNo, DeptName, MgrSSN)
The attribute DeptNo of EMPLOYEE gives the number of the department for which each
employee works. Hence, its value in every EMPLOYEE tuple must match the DeptNo value
of some tuple in the DEPARTMENT relation. To define referential integrity more formally,
we must first define the concept of a foreign key. The conditions for a foreign key between
two relation schemas R1 and R2 state that a set of attributes FK in relation schema R1 is a
foreign key of R1 that references relation R2 if it satisfies the following two rules.
i. The attributes in FK have the same domain as the primary key attributes PK of R2. The
attributes FK are said to reference or refer to the relation R2.
ii. A value of FK in a tuple t1 of the current state r1 (R1) either occurs as a value of PK for
some tuple t2 in the current state r2 (R2) or is null. In the former case, we have t1 [FK] =
t2 [PK], and we say that the tuple t1 references or refers to the tuple t2. R1 is called the
referencing relation and R2 is the referenced relation.
In a database of many relations, there are usually many referential integrity constraints.
To specify these constraints, we must first have a clear understanding of the meaning or role
that each set of attributes plays in the various relation schemas of the database.
In the EMPLOYEE relation, the attribute DeptNo refers to the department for which an
employee works. Hence, we designate DeptNo to be a foreign key of EMPLOYEE, referring to
the DEPARTMENT relation. This means that a value of DeptNo in any tuple t1 of the
EMPLOYEE relation must match a value of the primary key of DEPARTMENT in some
tuple t2 of the DEPARTMENT relation, or be null.
We can diagrammatically display referential integrity constraints by drawing a directed
arc from each foreign key to the relation it references. For clarity, the arrowhead may point to
primary key of the referenced relation.
EMPLOYEE (Name, SSN, Address, Sex, Salary, DeptNo)
DEPARTMENT (DeptNo, DeptName, MgrSSN)
DEPARTMENT_LOCATIONS (DeptNo, Locations)
Fig. Referential integrity constraints (a directed arc runs from each foreign key to the
relation it references)
Referential integrity in SQL :
Primary and foreign keys can be specified as part of the SQL create table statement.
The PRIMARY KEY clause of the create table statement includes a list of the attributes
that constitute the primary key.
The UNIQUE clause of the create table statement includes a list of the attributes that
constitute a candidate key.
The FOREIGN KEY clause of the create table statement includes both a list of the
attributes that constitute the foreign key and the name of the relation referenced by the
foreign key.
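The three clauses can be seen together in a small experiment. The sketch below uses
Python's standard sqlite3 module; the reduced EMPLOYEE and DEPARTMENT schemas, the sample
rows and the helper runs_ok are assumptions for illustration. SQLite enforces foreign keys
only after PRAGMA foreign_keys = ON, which is a property of SQLite rather than of SQL in
general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    deptNo   INTEGER,
    deptName TEXT,
    PRIMARY KEY (deptNo),
    UNIQUE (deptName)
);
CREATE TABLE employee (
    ssn    TEXT,
    name   TEXT,
    deptNo INTEGER,
    PRIMARY KEY (ssn),
    FOREIGN KEY (deptNo) REFERENCES department (deptNo)
);
INSERT INTO department VALUES (1, 'Research');
""")


def runs_ok(sql, params=()):
    """Return True if the statement succeeds, False on an integrity violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert = "INSERT INTO employee VALUES (?, ?, ?)"
existing_dept = runs_ok(insert, ("111", "Laxman", 1))   # refers to an existing tuple
missing_dept = runs_ok(insert, ("222", "Pema", 9))      # no department 9
null_dept = runs_ok(insert, ("333", "Dil", None))       # a null foreign key is allowed
delete_referenced = runs_ok("DELETE FROM department WHERE deptNo = 1")
print(existing_dept, missing_dept, null_dept, delete_referenced)
```

The last statement fails because an EMPLOYEE tuple still references department 1; deleting it
would leave that tuple referring to a department that no longer exists.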
Assertion
An assertion is a predicate expressing a condition that we wish the database always to
satisfy. Domain constraints and referential integrity constraints are special forms of
assertions. However, there are many constraints that we can't express using only these special
forms.
For example suppose the constraints are:
i. The sum of all loan amounts for each branch must be less than the sum of all account
balances at that branch.
ii. Every loan has at least one customer who maintains an account with a minimum
balance of 50,000.
An assertion in SQL-92 takes the form
create assertion <assertionName> check <Predicate>
e.g.:
create assertion sumConstraint check
(not exists (select * from branch
where (select sum(amount) from loan
where loan.branchName = branch.branchName)
>= (select sum(balance) from account
where account.branchName = branch.branchName)))
When an assertion is created, the system tests it for validity. If the assertion is valid,
then any further modification to the database is allowed only if it does not cause that assertion
to be violated.
Triggers
A trigger is a statement that is executed automatically by the system as a side effect of a
modification to the database. To design a trigger, we must meet the following two requirements.
i. specify the conditions under which the trigger is to be executed.
ii. specify the actions to be taken when the trigger executes.
Triggers are useful mechanisms for alerting humans, or for performing certain tasks
automatically when certain conditions are met. Triggers are sometimes called rules or active
rules. Triggers can be written at both the front end and the back end. Triggers written at the
back end are called database triggers.
Types:
i. row level triggers
ii. statement level triggers
iii. before and after triggers
iv. database level triggers
Triggers can be written for events such as insert, update, delete, create, alter, drop etc.
For example suppose we want to store username and the system date into a table
logdata. For this purpose, the trigger can be written as follows.
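CREATE OR REPLACE TRIGGER tg_before_update_user
BEFORE INSERT OR UPDATE
ON Policies
FOR EACH ROW
BEGIN
    -- No COMMIT here: a trigger body cannot commit; the log row is
    -- committed together with the triggering transaction.
    INSERT INTO LOGDATA
    VALUES (USER, SYSDATE);
END;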
With this trigger, whenever a row is inserted or updated in the policies table, the user name
and the current date are stored in the logdata table.
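The same idea can be tried out with the sketch below. It assumes SQLite through Python's sqlite3 module, which differs from the Oracle syntax above: one trigger is needed per event, there is no USER function (a fixed 'app_user' string stands in for it) and datetime('now') replaces SYSDATE. The column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table policies (policyNo integer primary key, holder text);
create table logdata  (username text, logged_at text);

create trigger tg_before_insert_policy
before insert on policies
for each row
begin
    insert into logdata values ('app_user', datetime('now'));
end;

create trigger tg_before_update_policy
before update on policies
for each row
begin
    insert into logdata values ('app_user', datetime('now'));
end;
""")

# Each statement below fires one trigger, which adds one row to logdata.
conn.execute("insert into policies values (1, 'Ram')")
conn.execute("update policies set holder = 'Mohan' where policyNo = 1")
log_count = conn.execute("select count(*) from logdata").fetchone()[0]
print(log_count)
```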
QUERY PROCESSING
Introduction
Query processing refers to the range of activities involved in extracting data from a
database. These activities include translation of queries from a high level database
language into expressions that can be used at the physical level of the file system, a variety of
query optimizing transformations, and the actual evaluation of queries.
The basic steps involved in processing a query are given below:
a) Parsing and Translation:
At the initial step, a query expressed in a high level query language such as SQL must
be translated into its internal form. This translation process is similar to the work performed
by the parser of a compiler. In generating the internal form of the query, the parser checks the
syntax of the user's query, verifies that the relation names appearing in the query are names of
relations in the database, and so on. The system constructs a parse-tree representation of
the query, which it then translates into a relational algebra expression.
b) Optimization:
A query typically has many possible execution strategies, and the process of
choosing a suitable one for processing a query is known as query optimization. The main task
of the query optimizer is to produce a suitable execution plan.
c) Evaluation:
A sequence of primitive operations that can be used to evaluate a query is called a query
execution plan or query evaluation plan. The query execution engine takes a query evaluation
plan, executes that plan and returns the answer to the query.
The following figure illustrates the steps used in query processing.
Figure: Steps in Query Processing (Query, Parser & translator, Relational algebra expression, Optimizer, Execution plan, Evaluation engine, Query output)
For example, consider the query
Select balance
From account
Where balance < 2500
The query can be translated into either of the following relational algebra expressions:
• σ balance<2500 (∏ balance (account))   (select applied last)
• ∏ balance (σ balance<2500 (account))   (project applied last)
A query evaluation plan for the second expression is
∏ balance
|
σ balance<2500
|
account
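To see that the two relational algebra expressions give the same answer, here is a small sketch in plain Python. The functions select and project and the sample account rows are assumptions made for the illustration; a real DBMS works on stored relations, not Python lists.

```python
from collections import Counter

account = [
    {"accountNumber": "A-101", "branchName": "Kathmandu", "balance": 500},
    {"accountNumber": "A-102", "branchName": "Pokhara", "balance": 3000},
    {"accountNumber": "A-103", "branchName": "Kathmandu", "balance": 1200},
]

def select(predicate, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """Projection: keep only the named attributes (duplicates retained, as in SQL)."""
    return [{a: t[a] for a in attributes} for t in relation]

def low_balance(t):
    return t["balance"] < 2500

# Select applied last versus project applied last.
plan_a = select(low_balance, project(["balance"], account))
plan_b = project(["balance"], select(low_balance, account))

def as_multiset(relation):
    """Compare results as multisets, ignoring the order of tuples."""
    return Counter(tuple(sorted(t.items())) for t in relation)

same = as_multiset(plan_a) == as_multiset(plan_b)
print(plan_b, same)
```

Both plans return the balances 500 and 1200; they differ only in how much data the intermediate step has to handle.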
Query Optimization:
For a given query, there are generally a variety of methods for computing the answer.
It is the responsibility of the database system to transform the query as entered by the user into an
equivalent query that can be computed more efficiently. The process of finding a good
strategy for processing a query is called query optimization.
In other words, query optimization is the process of selecting the most efficient
query evaluation plan from among the many strategies usually possible for processing a given
query, especially if the query is complex. Users are not expected to write their queries so that
they can be processed efficiently; instead, the system is expected to construct a query evaluation
plan that minimizes the cost of query evaluation. This is where query optimization
comes into play.
Optimization has two aspects. One aspect occurs at the relational algebra
level, where the system attempts to find an expression that is equivalent to the given
expression, but more efficient to execute. The other aspect is selecting a detailed strategy for
processing the query, such as choosing the algorithm to use for executing an operation,
choosing the specific indices to use, and so on.
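The choice of indices can be observed in a real optimizer. The sketch below assumes SQLite through Python's sqlite3 module and its explain query plan statement; the index name idx_account_balance and the generated rows are made up. The exact wording of the plan text varies between SQLite versions, so only the presence of the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (accountNumber text, branchName text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [(f"A-{i}", "Kathmandu", i * 10) for i in range(1000)],
)

QUERY = "select balance from account where balance < 2500"

def plan(sql):
    """Return the human-readable detail column of the plan SQLite chooses."""
    return [row[3] for row in conn.execute("explain query plan " + sql)]

plan_without_index = plan(QUERY)
conn.execute("create index idx_account_balance on account (balance)")
plan_with_index = plan(QUERY)

print(plan_without_index)  # a full scan of account
print(plan_with_index)     # a search that uses idx_account_balance
```

The query text is unchanged between the two calls; only the evaluation plan picked by the optimizer differs.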
Equivalence of Expression
Two relational algebra expressions are said to be equivalent if they
generate the same set of tuples on every legal database instance. A legal
database instance is one that satisfies all the integrity constraints specified in the database
schema. The two expressions may generate the tuples in different orders, but they
are considered equivalent as long as the set of tuples is the same.
An equivalence rule says that expressions of two forms are always
equivalent, so we can replace an expression of the first form by an expression of the second
form, or vice versa.
In SQL, the inputs and outputs are multisets of tuples, and a multiset version of the relational
algebra is used for evaluating SQL queries. Two expressions in the multiset version of the
relational algebra are said to be equivalent if, on every legal database, the two expressions
generate the same multiset of tuples.
Query Decomposition:
Query decomposition is the process by which a DBMS breaks up, or decomposes, a query
into subqueries that can be executed at the individual sites.
The query decomposer generates these subqueries from the conventional-format
query produced by the initial processing.
OBJECT ORIENTED DATA MODEL
Introduction:
Traditional data models and systems, such as relational, network and hierarchical,
have been quite successful in developing the database technology required for many
traditional business database applications. However, they have certain shortcomings when
more complex database applications must be designed and implemented, for example databases for
engineering design and manufacturing, scientific experiments, telecommunications,
geographic information systems, multimedia and many more. These new applications have
requirements and characteristics that differ from those of traditional business applications,
such as more complex structures for objects, longer duration transactions, new data types for
storing images or large textual items, etc.
Object oriented databases were proposed to meet the needs of these more complex
applications. The object oriented approach offers the flexibility to handle
some of these requirements without being limited by the data types and query languages
available in traditional database systems. Some of the key reasons for creating object oriented
databases are given below.
1. The first key feature is the power they give the designer to specify both the structure
of complex objects and the operations that can be applied to these objects.
2. Another reason for the creation of object oriented databases is the increasing use of
object oriented programming languages in developing software applications.
3. The need for additional data modeling features has been recognized by relational DBMS
vendors, and thus newer versions of relational systems are incorporating many of the
features that were proposed for object oriented databases.
Design of Object Oriented Data Model
The design of the object oriented data model is based on the principles of object
oriented programming languages. That is, the object oriented data model includes the
elements found in object oriented programming languages, which are given
below. The model contains further sub-elements, but those given below are the core
elements.
1. Objects:
Simply, an object corresponds to an entity in the E-R model. The object oriented
paradigm is based on encapsulation of data and code related to an object into a single
unit, whose contents are not visible to the outside world. An object is also known as the
combination of data and functions in a single unit. Conceptually, all interactions between
an object and the rest of the system are via messages.
2. Class:
Usually, there are many similar objects in a database. By similar, we mean that they
respond to the same messages, use the same methods, and have variables of the same name
and type. It would be wasteful to define each such object separately. Therefore we group
similar objects to form a class. Each such object is called an instance of its class. All
objects in a class share a common definition, although they differ in the values assigned
to the variables. The notion of a class in the object oriented data model corresponds to
the notion of an entity set in the E-R model. Examples of classes in our bank database are
Employee, Customers, Accounts and Loans.
3. Inheritance:
Inheritance is the process of creating a new class, called the derived class, from an
existing class, called the base class. Each subclass or derived class shares the common
characteristics of its base class. An object oriented database schema usually requires a large
number of classes, and several of them are often similar. In such situations,
inheritance plays a vital role because it lets the similar classes be derived from a common
base class. For example, bus and truck are considered to be members of the class vehicle. In
addition to the shared characteristics, each member of the class vehicle has its own unique
characteristics; for example, a bus carries passengers whereas a truck carries goods.
4. Polymorphism:
It is another important feature of object oriented databases. The word polymorphism is
derived from the Greek words ‘poly’, meaning many, and ‘morph’, meaning form.
The concept of using functions and operators in different ways, depending on
what they are operating on, is called polymorphism. This concept is frequently used in
object oriented databases. For example, consider the addition operation. In the case of numbers, the
operation adds the numbers, but in the case of strings, the operation concatenates the given
strings.
Besides these, other elements may be considered while designing an
object oriented database, but the above four are considered the most important.
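The four elements above can be sketched in an object oriented programming language. The following is a small illustration in Python; the class names follow the vehicle example in the text, while the attribute registration and the methods carries and describe are invented for the sketch.

```python
class Vehicle:
    """Base class: data and behaviour shared by every vehicle (encapsulated in one unit)."""

    def __init__(self, registration):
        self.registration = registration

    def carries(self):
        raise NotImplementedError

    def describe(self):
        # The same message gives different results depending on the receiving object.
        return f"{self.registration} carries {self.carries()}"


class Bus(Vehicle):
    """Derived class: inherits from Vehicle and adds its own characteristic."""

    def carries(self):
        return "passengers"


class Truck(Vehicle):
    def carries(self):
        return "goods"


fleet = [Bus("BA-1"), Truck("BA-2")]  # two objects, each an instance of its class
descriptions = [v.describe() for v in fleet]
print(descriptions)
print(1 + 2, "ab" + "cd")  # '+' adds numbers but concatenates strings
```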
SQL
Introduction
The history of SQL begins in an IBM laboratory in San Jose, California, where SQL
was developed in the late 1970s. The initials stand for Structured Query Language, and the
language itself is often referred to as "sequel." It was originally developed for IBM's DB2
product (a relational database management system, or RDBMS, that can still be bought today
for various platforms and environments). In fact, SQL makes an RDBMS possible. SQL is a
nonprocedural language, in contrast to the procedural or third-generation languages (3GLs)
such as COBOL and C that had been created up to that time.
Background
SQL = Structured Query Language
Created in late 70’s at IBM, under the name of SEQUEL
Went through major standardizations (which contributed to its wide acceptance):
SQL-86 (SQL1): Queries + some schema definitions & manipulation
SQL-89: Referential integrity
SQL-92 (SQL2): Revised and expanded
SQL-99 (SQL3): Active rules & triggers, recursive operations, aggregate operations, object-oriented
features
SQL consists of:
A Data Definition Language (DDL) for declaring database schemas, e.g.
create table.
A Data Manipulation Language (DML) for modifying and querying database
instances. It includes commands to insert tuples into, delete tuples from,
and modify tuples in the database, e.g. select, insert, update, delete, explain, lock
table etc.
Embedded DML: The embedded form of SQL is designed for use within
general-purpose programming languages, such as Cobol, Pascal, Fortran and C.
View Definition: The SQL DDL includes commands for defining views.
Authorization: The SQL DDL includes commands for specifying access rights
to relations and views.
Integrity: The SQL DDL includes commands for specifying integrity
constraints that the data stored in the database must satisfy.
Transaction Control: SQL includes commands for specifying the beginning
and ending of transactions, e.g. set transaction, commit, rollback.
Session Control: Manages the properties of a user session, e.g. alter session.
System Control: Manipulates the properties of the database, e.g. alter system.
Basic Structure
The basic structure of an SQL expression consists of three clauses: select, from and
where.
The select clause corresponds to the projection operation of the relational algebra. It is used
to list the attributes desired in the result of a query.
The from clause corresponds to the Cartesian product operation of the relational
algebra. It lists the relations to be scanned in the evaluation of the expression.
The where clause corresponds to the selection predicate of the relational algebra. It
consists of a predicate involving attributes of the relations that appear in the from clause.
select attribute-expression
from table
[where condition]
That is, a typical SQL query has the form
SELECT A1, A2, A3, …, An
FROM r1, r2, r3, …, rm
WHERE P
Each Ai represents an attribute, and each ri a relation. P is a predicate. The query is
equivalent to the relational algebra expression
∏ A1,A2,…,An (σ P (r1 × r2 × … × rm))
select Name
from Students
where Name < 'N';

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

Result
Name
Ben
Dan
The result of this select statement is a relation consisting of a single attribute with the
heading Name.
If we want duplicates to be removed, we can rewrite the preceding query as
Select distinct name from students
SQL allows us to use the keyword ALL to specify explicitly that duplicates are not removed.
Select all name from students
Since duplicate retention is the default, we will not use all in the query.
The asterisk symbol "*" is used to denote all attributes.
The select clause can also contain arithmetic expressions involving the operators, +,-,*, and /,
and operating on constants or attributes of tuples.
Select branchName, loanNumber, loanAmount*100
From Loan
Similarly SQL uses the logical connectives and, or, and not rather than the
mathematical symbols in the where clause.
e.g. Select loanNumber from loan where loanAmount < 50000 and
branchName = 'Kathmandu';
If we wish to find the loan number of those loans with loan amounts between 100000
and 500000, we can use
Select loanNumber
From loan
Where loanAmount between 100000 and 500000;
Operators in conditions: =, <> , < , > , <=, >=, or, and, not
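The select and where clauses discussed above can be tried with the following sketch. It assumes SQLite through Python's sqlite3 module; the three loan rows and the helper column are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table loan (loanNumber text, branchName text, loanAmount integer);
insert into loan values ('L-1', 'Kathmandu',  40000);
insert into loan values ('L-2', 'Kathmandu', 200000);
insert into loan values ('L-3', 'Pokhara',   600000);
""")

def column(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# 'all' keeps duplicates (the default); 'distinct' removes them.
all_branches = column("select all branchName from loan order by branchName")
distinct_branches = column("select distinct branchName from loan order by branchName")

# Logical connective 'and', and the 'between' comparison.
small_ktm = column(
    "select loanNumber from loan "
    "where loanAmount < 50000 and branchName = 'Kathmandu'"
)
mid_range = column(
    "select loanNumber from loan where loanAmount between 100000 and 500000"
)
print(all_branches, distinct_branches, small_ktm, mid_range)
```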
Renaming Attributes
SQL provides a mechanism for renaming both relations and attributes. It uses the as clause,
taking the form
oldName as newName
select attribute-expression [as] target-attribute
from table
[where condition]

select Name as Names
from Students
where Name < 'N'

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

Result
Names
Ben
Dan
Multiple Attributes
select Name as Names, Number * 10 as Num
from Students
where Name < 'N'

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

Result
Names  Num
Ben    34120
Dan    12340
The full list of attributes may be referenced through the character ‘*’
select *
from Students
where Name < 'N'
Multiple Tables
select Code
from Students, Classes
where Students.Number = Classes.Num

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

CLASSES
Code  Num
670   1234
680   1234
680   4123

Result
Code
670
680

The dot operator is available for distinguishing attributes of different tables.
When no ambiguity arises, the identifying table names are not needed.
select Classes.Code
from Students, Classes
where Number = Num
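The two ways of writing the join can be compared with the sketch below, which assumes SQLite through Python's sqlite3 module and reuses the STUDENTS and CLASSES data from this section. The Name column is added to the output only to make the matching visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Students (Name text, Number integer, Sex text);
create table Classes  (Code integer, Num integer);
insert into Students values ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
insert into Classes  values (670, 1234), (680, 1234), (680, 4123);
""")

# Attribute names qualified with the dot operator.
qualified = conn.execute(
    "select Classes.Code, Students.Name from Students, Classes "
    "where Students.Number = Classes.Num order by Classes.Code"
).fetchall()

# No qualification needed: Code, Name, Number and Num are each unique to one table.
unqualified = conn.execute(
    "select Code, Name from Students, Classes "
    "where Number = Num order by Code"
).fetchall()
print(qualified)
```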
Aliases for Tables
from table [as] alias, ...

select S1.FN, S1.LN
from Students as S1, Students as S2
where S1.LN = S2.LN and S1.FN <> S2.FN

STUDENTS
FN   LN
Ben  Smith
Dan  McLean
Nel  Smith

Result
FN   LN
Ben  Smith
Nel  Smith
Duplicate Tuples
In general, SQL tables are multisets, allowing duplicated rows in the tables.
Some SQL tables (e.g., with key attributes) are forced to be sets.
Requests to remove the extra entries can be made with the ‘distinct’ keyword,
following the ‘select’ keyword.

select S1.LN
from Students as S1,
     Students as S2
where S1.LN = S2.LN
and S1.FN <> S2.FN

Result
LN
Smith
Smith

select distinct S1.LN
from Students as S1,
     Students as S2
where S1.LN = S2.LN
and S1.FN <> S2.FN

Result
LN
Smith
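The effect of table aliases and of distinct on the self-join above can be checked with this sketch, assuming SQLite through Python's sqlite3 module and the three STUDENTS rows shown in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Students (FN text, LN text);
insert into Students values ('Ben', 'Smith'), ('Dan', 'McLean'), ('Nel', 'Smith');
""")

# Students is joined with itself, so the aliases S1 and S2 are required.
SAME_LAST_NAME = (
    " from Students as S1, Students as S2"
    " where S1.LN = S2.LN and S1.FN <> S2.FN"
)

with_duplicates = conn.execute("select S1.LN" + SAME_LAST_NAME).fetchall()
without_duplicates = conn.execute("select distinct S1.LN" + SAME_LAST_NAME).fetchall()
print(with_duplicates, without_duplicates)
```

The pair (Ben, Nel) is found once in each direction, which is why 'Smith' appears twice without distinct.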
Set Operations
The set operations include UNION, INTERSECT AND EXCEPT operations on
relations.
select ... <union | intersect | except> [all] select ...
a. The UNION Operation:
To find all customers having a loan, an account, or both at the bank, we write
(SELECT customerName from depositor)
UNION
(SELECT customerName from borrower)
The union operation automatically eliminates duplicates. If we want to retain all
duplicates, we must write UNION ALL in place of UNION as follows.
(SELECT customerName from depositor)
UNION ALL
(SELECT customerName from borrower)
select FN as Name from Students
union
select LN as Name from Students

STUDENTS
FN   LN
Ben  Smith
Dan  McLean
Nel  Smith

Result
Name
Ben
Dan
Nel
Smith
McLean

The ‘all’ keyword requests that duplicates be retained. The default is to eliminate them.
After aliasing, the tables involved in the operations must agree on their attributes
and on the ordering of the attributes.
b. The INTERSECT operation: To find all customers who have both a loan and an account at
the bank, we write
(select distinct customerName from depositor)
intersect
(select distinct customerName from borrower)
The intersect operation also automatically eliminates duplicates. If we want to retain
all duplicates, we must write INTERSECT ALL in place of intersect.
(select customerName from depositor)
intersect all
(select customerName from borrower)
c. The EXCEPT operation: To find all customers who have an account but no loan at the
bank, we write
(select distinct customerName from depositor)
except
(select customerName from borrower)
The except operation automatically eliminates duplicates. If we want to retain all
duplicates, we must write EXCEPT ALL in place of except as follows.
(select customerName from depositor)
except all
(select customerName from borrower)
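The set operations can be tried with the sketch below. It assumes SQLite through Python's sqlite3 module, which has two differences from the SQL shown above: the operands are written without surrounding parentheses, and intersect all and except all are not available, so only union, union all, intersect and except are shown. The sample customers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table depositor (customerName text);
create table borrower  (customerName text);
insert into depositor values ('Ram'), ('Sita'), ('Ram');  -- Ram holds two accounts
insert into borrower  values ('Ram'), ('Hari');
""")

def names(operator):
    """Combine depositor and borrower names with the given set operator."""
    sql = (
        "select customerName from depositor "
        f"{operator} "
        "select customerName from borrower "
        "order by customerName"
    )
    return [row[0] for row in conn.execute(sql)]

union = names("union")
union_all = names("union all")
intersect = names("intersect")
except_ = names("except")
print(union, union_all, intersect, except_)
```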
Ordering
Ordering may be imposed on rows of tables, based on values of attributes.
order by attribute [asc |desc],...
The default assumes ascending order
select Name,Number
from Students order by Name
String Comparisons
=, !=, ... standard comparison operations
like, binary operator for comparing string patterns
%, wild card for strings. The % character matches any substring.
_, wild card for characters. The _ character matches any single character.
(name like '%a_') is true for all names having ‘a’ as the second letter from the end.
like 'ab\%cd%' escape '\' matches all strings beginning with "ab%cd".
For example, the query "Find the names of all customers whose street address includes the
substring 'PUR'" can be written as
Select customerName
From customer
Where customerStreet like '%PUR%';
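The patterns above can be checked with the following sketch. It assumes SQLite through Python's sqlite3 module; note that SQLite's like is case-insensitive for ASCII letters by default, so '%PUR%' also matches 'Lalitpur'. The customer and samples tables are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customer (customerName text, customerStreet text);
insert into customer values ('Ram',  'Lalitpur Road');
insert into customer values ('Sita', 'New Road');
insert into customer values ('Hari', 'PURANO BAZAR');
create table samples (s text);
insert into samples values ('ab%cd-1'), ('abXcd-2');
""")

def column(sql):
    return [row[0] for row in conn.execute(sql)]

# % matches any substring.
pur_streets = column(
    "select customerName from customer "
    "where customerStreet like '%PUR%' order by customerName"
)
# _ matches exactly one character: 'a' must be the second letter from the end.
second_last_a = column("select customerName from customer where customerName like '%a_'")
# The escape character makes the first % a literal percent sign.
literal_percent = column(r"select s from samples where s like 'ab\%cd%' escape '\'")
print(pur_streets, second_last_a, literal_percent)
```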
Null Values
SQL allows the use of null values to indicate absence of information about the value of
an attribute. An attribute can be tested with the predicates ‘is null’ and ‘is not null’.
The result of an arithmetic expression (involving, for example, +, -, * or /) is null if any of the
input values is null. The result of any comparison involving a null value is unknown, which a
where clause treats as false.
select ...
from ...
where (x is null) and (y is not null)
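The behaviour of nulls can be observed with this sketch, which assumes SQLite through Python's sqlite3 module (null is returned to Python as None). The table pairs and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table pairs (x integer, y integer);
insert into pairs values (null, 1), (2, null), (null, null), (3, 4);
""")

# Arithmetic and comparison with null give null; 'is null' gives a definite answer.
arithmetic, comparison, is_null = conn.execute(
    "select null + 1, null = null, null is null"
).fetchone()

tested_with_is = conn.execute(
    "select x, y from pairs where (x is null) and (y is not null)"
).fetchall()
# 'x = null' is never true, so no row qualifies.
tested_with_equals = conn.execute("select x, y from pairs where x = null").fetchall()
print(arithmetic, comparison, is_null, tested_with_is, tested_with_equals)
```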
Aggregate Queries
Aggregate functions are functions that take a collection (a set or multiset) of values as
input and return a single value. SQL offers five built-in aggregate functions.
Average: avg
Minimum: min
Maximum: max
Total: sum
Count: count

count(*), count([distinct] attributes)
Counts the number of tuples.

select count(*)
from Students as S1,
     Students as S2
where S1.LN = S2.LN
and S1.FN <> S2.FN

STUDENTS
FN   LN      Number
Ben  Smith   3412
Dan  McLean  1234
Nel  Smith   2341

Result
Count
2

sum([distinct] attributes)
max([distinct] attributes)
min([distinct] attributes)
avg([distinct] attributes)
select min(Number), max(Number), avg(Number)
from Students
There are circumstances where we would like to apply aggregate function not only to a
single set of tuples, but also to a group of sets of tuples; we specify this in SQL using group
by clause. The attribute or attributes given in the group by clause are used to form groups.
e.g. "Find the average account balance at each branch". We write this query as follows.
Select branchName, avg(balance)
From account
Group by branchName;
Group Clauses
Tables might be partitioned into subsets of rows which agree on their entries for given
attributes. The attributes are specified within a ‘group by’ clause. Apart from aggregate
functions, they are the only attributes allowed in the ‘select’ clause.

select Sex, sum(Number)
from Students
group by Sex

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

Result
Sex  sum_Number
M    4646
F    2341

Group predicates may be added through a ‘having’ clause, to exclude groups which
don’t satisfy the desired conditions.

select Sex, sum(Number)
from Students
group by Sex
having sum(Number) < 3000

Result (for the same STUDENTS table)
Sex  sum_Number
F    2341
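The aggregate, group by and having examples can be reproduced with the sketch below, assuming SQLite through Python's sqlite3 module and the STUDENTS rows from this section. An order by clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Students (Name text, Number integer, Sex text);
insert into Students values ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
""")

# Aggregates over the whole table.
stats = conn.execute("select min(Number), max(Number), avg(Number) from Students").fetchone()

# One result row per group.
by_sex = conn.execute(
    "select Sex, sum(Number) from Students group by Sex order by Sex"
).fetchall()

# 'having' filters groups after aggregation.
small_groups = conn.execute(
    "select Sex, sum(Number) from Students group by Sex having sum(Number) < 3000"
).fetchall()
print(stats, by_sex, small_groups)
```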
Nested Subqueries
A common use of subqueries is to perform tests for set membership, set comparisons,
and set cardinality.
a. Set Membership
SQL draws on the relational calculus for operations that allow testing tuples for
membership in a relation. The IN connective tests for set membership, where the set is a
collection of values produced by a select clause. The NOT IN connective tests for the absence
of set membership. We begin by finding all account holders, and we write the subquery
(Select customerName
From depositor)
We then need to find those customers who are borrowers from the bank and who appear
in the list of account holders obtained in the subquery. We do so by nesting the subquery in
an outer SELECT. The resulting query is
Select distinct customerName
From borrower
Where customerName in (Select customerName from depositor)
Similarly, to "find all the customers who have both an account and a loan at the
Kathmandu branch", the query is
Select distinct customerName
From borrower, loan
Where borrower.loanNumber = loan.loanNumber and branchName = 'Kathmandu' and
(branchName, customerName) IN (Select branchName, customerName from depositor, account
where depositor.accountNumber = account.accountNumber)
Similarly, to "find all customers who have a loan at the bank and whose names are neither
'Ram' nor 'Mohan' nor 'Barsha'", the query is
Select distinct customerName
From borrower
Where customerName NOT IN ('Ram', 'Mohan', 'Barsha')
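The in and not in connectives can be tried with this sketch, assuming SQLite through Python's sqlite3 module. The borrower and depositor rows are made up, and each table is reduced to a single customerName column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table borrower  (customerName text);
create table depositor (customerName text);
insert into borrower  values ('Ram'), ('Sita'), ('Barsha'), ('Hari');
insert into depositor values ('Sita'), ('Ram');
""")

def column(sql):
    return [row[0] for row in conn.execute(sql)]

# Membership in the result of a subquery.
borrowers_with_account = column(
    "select distinct customerName from borrower "
    "where customerName in (select customerName from depositor) "
    "order by customerName"
)
# Absence from an enumerated set.
other_borrowers = column(
    "select distinct customerName from borrower "
    "where customerName not in ('Ram', 'Mohan', 'Barsha') "
    "order by customerName"
)
print(borrowers_with_account, other_borrowers)
```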
b. Set Comparison
The keywords some, any and all are used for set comparison.
Consider the query "Find the names of all branches that have assets greater than those of at
least one branch located in Kathmandu". The query is
Select distinct T.branchName
From branch as T, branch as S
Where T.assets > S.assets and S.branchCity = 'Kathmandu'
The alternative style for writing the preceding query uses SOME. The phrase
"greater than at least one" is represented in SQL by > SOME as follows.
Select branchName
From branch
Where assets > some (select assets
                     from branch
                     where branchCity = 'Kathmandu')
SQL also allows < some, <= some, >= some, = some and <> some comparisons.
Similarly, for "Find the names of all branches that have assets greater than that of each
branch in Kathmandu", the query is
Select branchName
From branch
Where assets > all (select assets
                    from branch
                    where branchCity = 'Kathmandu')
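SQLite does not accept the some and all keywords, so the sketch below, which assumes SQLite through Python's sqlite3 module, uses an equivalent rewriting instead: for a non-empty subquery without nulls, "> some" is the same as "greater than the minimum" and "> all" is the same as "greater than the maximum". The branch rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table branch (branchName text, branchCity text, assets integer);
insert into branch values ('A', 'Kathmandu', 100);
insert into branch values ('B', 'Kathmandu', 300);
insert into branch values ('C', 'Pokhara',   200);
insert into branch values ('D', 'Pokhara',   400);
""")

def column(sql):
    return [row[0] for row in conn.execute(sql)]

# assets > some (...)  rewritten as  assets > (select min(...))
greater_than_some = column(
    "select branchName from branch "
    "where assets > (select min(assets) from branch where branchCity = 'Kathmandu') "
    "order by branchName"
)
# assets > all (...)  rewritten as  assets > (select max(...))
greater_than_all = column(
    "select branchName from branch "
    "where assets > (select max(assets) from branch where branchCity = 'Kathmandu') "
    "order by branchName"
)
print(greater_than_some, greater_than_all)
```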
select Name
from Students
where Number = all (select Number
                    from Same)

STUDENTS
Name  Number
Ben   3412
Dan   1234
Nel   2341

SAME
Number
3412
3412

Result
Name
Ben

any
select Name
from Students
where Number = any (select Number
                    from Diff)

STUDENTS
Name  Number
Ben   3412
Dan   1234
Nel   2341

DIFF
Number
3412
2341

Result
Name
Ben
Nel

select Name
from Students
where (Name, Number) not in
      (select Name, Number
       from Students
       where Number <> 1234)

STUDENTS
Name  Number
Ben   3412
Dan   1234
Nel   2341

Result
Name
Dan
c. Test for empty Relations
SQL includes a feature for testing whether a subquery has any tuples in its result. The
EXISTS construct returns the value TRUE if the argument subquery is nonempty. Using the
EXISTS construct, we can write the query "Find all customers who have both an account and
a loan at the bank" in another way as follows.
Select customerName
From borrower
Where exists (select * from depositor
where depositor.customerName=borrower.customerName)
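The exists test, and its negation not exists, can be tried with the following sketch. It assumes SQLite through Python's sqlite3 module; the customer names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table borrower  (customerName text);
create table depositor (customerName text);
insert into borrower  values ('Ram'), ('Sita'), ('Hari');
insert into depositor values ('Sita'), ('Gita');
""")

def column(sql):
    return [row[0] for row in conn.execute(sql)]

# The subquery is evaluated once per borrower row (it refers to borrower.customerName).
loan_and_account = column(
    "select customerName from borrower "
    "where exists (select * from depositor "
    "              where depositor.customerName = borrower.customerName) "
    "order by customerName"
)
loan_only = column(
    "select customerName from borrower "
    "where not exists (select * from depositor "
    "                  where depositor.customerName = borrower.customerName) "
    "order by customerName"
)
print(loan_and_account, loan_only)
```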
d. Test for the absence of duplicate tuples
SQL includes a feature for testing whether a subquery has any duplicate tuples in its
result. The UNIQUE construct returns the value true if the argument subquery contains no
duplicate tuples. Using the unique construct, we can write the query "Find all customers
who have at most one account at the Kathmandu branch" as follows.
Select T.customerName
From depositor as T
Where unique (select R.customerName
              From account, depositor as R
              Where T.customerName = R.customerName and
                    R.accountNumber = account.accountNumber and
                    account.branchName = 'Kathmandu')
Similarly, we can test for the existence of duplicate tuples in a subquery by using the
not unique construct. The query "Find all customers who have at least two accounts
at the Kathmandu branch" can be written as
Select distinct T.customerName
From depositor as T
Where not unique (select R.customerName
                  From account, depositor as R
                  Where T.customerName = R.customerName and
                        R.accountNumber = account.accountNumber and
                        account.branchName = 'Kathmandu')
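The unique construct is rarely implemented, so the sketch below, which assumes SQLite through Python's sqlite3 module, expresses the same two queries by counting the rows of the correlated subquery instead: "unique" becomes "count at most 1" and "not unique" becomes "count at least 2". The rows and the constant name KTM_ACCOUNTS are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table account   (accountNumber text, branchName text);
create table depositor (customerName text, accountNumber text);
insert into account values ('A-1', 'Kathmandu'), ('A-2', 'Kathmandu'),
                           ('A-3', 'Kathmandu'), ('A-4', 'Pokhara');
insert into depositor values ('Ram', 'A-1'), ('Ram', 'A-2'),
                             ('Sita', 'A-3'), ('Hari', 'A-4');
""")

# Number of Kathmandu accounts held by the customer of the outer row T.
KTM_ACCOUNTS = (
    "(select count(*) from account, depositor as R "
    " where T.customerName = R.customerName "
    "   and R.accountNumber = account.accountNumber "
    "   and account.branchName = 'Kathmandu')"
)

def column(sql):
    return [row[0] for row in conn.execute(sql)]

at_most_one = column(
    f"select distinct T.customerName from depositor as T "
    f"where {KTM_ACCOUNTS} <= 1 order by T.customerName"
)
at_least_two = column(
    f"select distinct T.customerName from depositor as T "
    f"where {KTM_ACCOUNTS} >= 2 order by T.customerName"
)
print(at_most_one, at_least_two)
```

Hari has no account at Kathmandu and is still returned by the first query, which is why the wording is "at most one" rather than "exactly one".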
Views
Views are virtual tables whose contents depend on other tables. Views are defined in
SQL using CREATE VIEW command.
Create or replace view viewName as <sql expression>
e.g. create or replace view v_employee as
select emid, empname from employee
create view Males (Nm, Num) as
select Name, Number
from Students
where Sex = 'M'

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

MALES
Nm   Num
Ben  3412
Dan  1234

In Oracle, a view can be created even if its defining query has compilation errors by using
the FORCE keyword.
Create force view viewname as <sql expression>
The errors can be fixed later on and the view recompiled using
Alter view viewname compile;
A view can be created read only or with a check option constraint. The
optional WITH READ ONLY clause specifies that the view will be read only. Similarly, WITH
CHECK OPTION specifies that inserts and updates done through the view must satisfy the
where clause of the view.
e.g. Create or replace view v_employee(empid, empname) as select
employeeid, empname from employee where salary > 2999 with check option constraint
top_emp_sal;
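That a view is a virtual table whose contents follow the base table can be seen in the sketch below. It assumes SQLite through Python's sqlite3 module, which supports plain create view but not the Oracle-specific force, read only or check option clauses. The extra student 'Tom' is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Students (Name text, Number integer, Sex text);
insert into Students values ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
create view Males (Nm, Num) as
    select Name, Number from Students where Sex = 'M';
""")

def males():
    return conn.execute("select Nm, Num from Males order by Nm").fetchall()

males_before = males()
# The view stores no rows of its own, so a change to Students shows up immediately.
conn.execute("insert into Students values ('Tom', 5555, 'M')")
males_after = males()
print(males_before, males_after)
```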
Joined Relation
Instead of providing simple tables, combinations of them may be specified using the
inner, left outer, right outer, and full outer join operations.
table-1 join-op table-2 on condition

select Name, Code as Course
from Students inner join Classes
on Students.Number = Classes.Number

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

CLASSES
Code  Number
670   1234
680   1234
680   4123

Result
Name  Course
Dan   670
Dan   680
select Name, Code as Course
from Students natural inner join Classes

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

CLASSES
Code  Number
670   1234
680   1234
680   4123

Result
Name  Course
Dan   670
Dan   680
For natural joins, the ‘natural’ keyword can be specified before the operator instead of
providing the condition.
Attribute names might be renamed, to facilitate the natural join operation.

select Name, Code as Course
from Students natural inner join
     Classes as C(Code, Number)

STUDENTS
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

CLASSES
Code  Num
670   1234
680   1234
680   4123

Result
Name  Course
Dan   670
Dan   680

The keyword NATURAL appears before the join type. The meaning of the join
condition natural, in terms of which tuples from the two relations match, is straightforward:
tuples match when they agree on all attributes that have the same name. The
ordering of the attributes in the result of a natural join is as follows. The join attributes appear
first, in the order in which they appear in the left hand side relation. Next come
all nonjoin attributes of the left hand side relation and finally all nonjoin attributes of the right
hand side relation.
Consider the following two tables with data.

LOAN
branchname  loannumber  amount
downtown    L-170       3000
redwood     L-230       4000
perryridge  L-260       1700

BORROWER
customername  loannumber
jones         L-170
smith         L-230
hayes         L-155

branchname  loannumber  amount  customername  loannumber
downtown    L-170       3000    jones         L-170
redwood     L-230       4000    smith         L-230
fig. Result of loan inner join borrower on loan.loannumber=borrower.loannumber

branchname  loannumber  amount  customername  loannumber
downtown    L-170       3000    jones         L-170
redwood     L-230       4000    smith         L-230
perryridge  L-260       1700    null          null
fig. Result of loan left outer join borrower on loan.loannumber=borrower.loannumber

branchname  loannumber  amount  customername
downtown    L-170       3000    jones
redwood     L-230       4000    smith
null        L-155       null    hayes
fig. Result of loan natural right outer join borrower

branchname  loannumber  amount  customername
downtown    L-170       3000    jones
redwood     L-230       4000    smith
perryridge  L-260       1700    null
null        L-155       null    hayes
fig. Result of loan full outer join borrower using(loannumber)
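The inner and left outer join results above can be reproduced with the sketch below. It assumes SQLite through Python's sqlite3 module; right and full outer joins are left out because older SQLite versions do not support them. Null appears as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table loan     (branchname text, loannumber text, amount integer);
create table borrower (customername text, loannumber text);
insert into loan values ('downtown', 'L-170', 3000), ('redwood', 'L-230', 4000),
                        ('perryridge', 'L-260', 1700);
insert into borrower values ('jones', 'L-170'), ('smith', 'L-230'), ('hayes', 'L-155');
""")

COLUMNS = "loan.branchname, loan.loannumber, loan.amount, borrower.customername"

def join(join_op):
    """Join loan with borrower on loannumber using the given join operator."""
    sql = (
        f"select {COLUMNS} from loan {join_op} borrower "
        "on loan.loannumber = borrower.loannumber "
        "order by loan.loannumber"
    )
    return conn.execute(sql).fetchall()

inner = join("inner join")
left_outer = join("left outer join")
print(inner)
print(left_outer)
```

The left outer join keeps the perryridge loan, which has no borrower, and pads the missing customer name with null.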
Data Definition Language (DDL) in SQL
Table Definition
create table name (attributes)
create table Student(
Name varchar (5),
registrationNo number(4),
class char(1),
gender char(1),
Joining_dt date
)
Default Values
default value | user | null
create table Student(
Name varchar (5),
registrationNo number(4),
class char(1),
gender char(1) default 'M',
Joining_dt date
)
Constraints on Attributes
not null, unique, primary key ,foreign key, check
create table Student(
Name varchar (5),
registrationNo number(4) constraint pk_registrationNo Primary key,
class char(1) constraint uniq_class UNIQUE,
gender char(1) constraint chk_gender check(gender in('M','F')),
Joining_dt date
)
Referential Triggers
Foreign key:
create table bank(
    bank_code number(10),
    bank_name varchar2(60),
    constraint pk_bank_code primary key(bank_code)
);
create table branch(
    branch_code number(10) constraint pk_branch_code primary key,
    bank_code number(10),
    branch_name varchar2(60),
    constraint fk_bank_code foreign key(bank_code) references bank(bank_code)
        on delete cascade
);
When violating referential constraints
In the default case, requested updates are rejected.
Alternative actions may be requested:
on <delete | update> <cascade | set null | no action>
The action applies when a row of the referenced (external/master/parent) table is updated or deleted.
cascade    updates/deletes the child records automatically
set null   changes the foreign key to null in the internal table (child or detail table)
no action  rejects the action
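The cascade action can be observed with the following sketch. It assumes SQLite through Python's sqlite3 module, where foreign keys must be switched on with a pragma and integer replaces the Oracle number(10) type; the bank and branch rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table bank (
    bank_code integer,
    bank_name varchar(60),
    constraint pk_bank_code primary key (bank_code)
);
create table branch (
    branch_code integer constraint pk_branch_code primary key,
    bank_code   integer,
    branch_name varchar(60),
    constraint fk_bank_code foreign key (bank_code) references bank (bank_code)
        on delete cascade
);
insert into bank   values (1, 'First Bank'), (2, 'Second Bank');
insert into branch values (10, 1, 'Kathmandu'), (11, 1, 'Pokhara'), (20, 2, 'Lalitpur');
""")

# Deleting the parent row also deletes its child rows in branch.
conn.execute("delete from bank where bank_code = 1")
remaining = conn.execute("select branch_code from branch order by branch_code").fetchall()
print(remaining)
```

With set null instead of cascade, branches 10 and 11 would remain with bank_code set to null.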
User-defined Data Types
create domain [name] as known-domain [default-value] [constraints]
create domain person_name varchar2(60);
Data Manipulation Language(DML) in SQL
Inserting Rows
insert into table [(attributes)]
<values (values) | SQL-query>

insert into Students (Name, Number, Sex)
values ('Don', 4123, 'F')

STUDENTS (before)
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

STUDENTS (after)
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F
Don   4123    F
The previous example inserts a single record; the following incorporates information
from another table.

insert into Students
(select Name, Number, Sex
 from Applicants
 where State = 'OH')

STUDENTS (before)
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

APPLICANTS
Name  Number  Sex  State
Don   4123    F    OH
Pam   3421    F    MI

STUDENTS (after)
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F
Don   4123    F
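Incomplete insertions, which supply only some of the attributes, are written in a similar way;
the remaining attributes receive their default value or null.
insert into Students (Name, Number)
(select Name, Number
 from Applicants
 where State = 'OH')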
Deleting Rows
delete from table
[where condition]

delete from Students
where Number < 2000

STUDENTS (before)
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

STUDENTS (after)
Name  Number  Sex
Ben   3412    M
Nel   2341    F
Updating Attributes
update table
set attribute = <expr | SQL-query | null | default>, ...
[where condition]

update Students
set Name = 'Tom', Number = Number + 5
where Name = 'Dan'

STUDENTS (before)
Name  Number  Sex
Ben   3412    M
Dan   1234    M
Nel   2341    F

STUDENTS (after)
Name  Number  Sex
Ben   3412    M
Tom   1239    M
Nel   2341    F
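The insert, update and delete statements of this section can be run one after another with the sketch below. It assumes SQLite through Python's sqlite3 module and the STUDENTS and APPLICANTS rows shown above; unlike the separate examples in the text, each statement here works on the table left by the previous one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Students   (Name text, Number integer, Sex text);
create table Applicants (Name text, Number integer, Sex text, State text);
insert into Students values ('Ben', 3412, 'M'), ('Dan', 1234, 'M'), ('Nel', 2341, 'F');
insert into Applicants values ('Don', 4123, 'F', 'OH'), ('Pam', 3421, 'F', 'MI');
""")

def snapshot():
    """Current contents of Students, in insertion order."""
    return conn.execute("select Name, Number, Sex from Students order by rowid").fetchall()

# Insert the rows produced by a query.
conn.execute("insert into Students select Name, Number, Sex from Applicants where State = 'OH'")
after_insert = snapshot()

# Update two attributes of the matching row.
conn.execute("update Students set Name = 'Tom', Number = Number + 5 where Name = 'Dan'")
after_update = snapshot()

# Delete the rows that satisfy the condition (Tom now has Number 1239).
conn.execute("delete from Students where Number < 2000")
after_delete = snapshot()
print(after_insert, after_update, after_delete, sep="\n")
```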
Updating Table Definitions
alter table name
<
add constraint def |
drop constraint constraint |
add column def |
drop column name |
alter column name < set default value | drop default >
>
Names can be assigned to constraints by a prefix of the form constraint name.
create table Student(
Name varchar(5) not null,
Number numeric(4),
constraint foo primary key(Number)
)
alter table Student
add column BirthDate date
alter table bank
add constraint pk_bankcode primary key(bank_code);
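The sketch below tries the named constraint and the add column form using Python's built-in sqlite3 module. This is an assumption made for runnability: SQLite accepts add column but not add constraint, so only the former is shown, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table Student (
    Name varchar(5) not null,
    Number numeric(4),
    constraint foo primary key (Number))""")
conn.execute("insert into Student values ('Ben', 3412)")

# Add a column to the existing table; rows already present get null in it.
conn.execute("alter table Student add column BirthDate date")
columns = [row[1] for row in conn.execute("pragma table_info(Student)")]
ben = conn.execute("select * from Student").fetchone()

# The primary key constraint named foo rejects a duplicate Number.
try:
    conn.execute("insert into Student (Name, Number) values ('Dan', 3412)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(columns, ben, duplicate_rejected)
```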
Removing Components
drop < table | view | assertion > name [restrict | cascade ]
restrict performs the removal only if no other component (such as a view or a constraint) depends on it
cascade removes the component and its dependents
MySQL Tutorial
9.5 Creating and Using a Database
Now that you know how to enter commands, it's time to access a database.
Suppose you have several pets in your home (your menagerie) and you'd like to keep track of
various types of information about them. You can do so by creating tables to hold your data
and loading them with the desired information. Then you can answer different sorts of
questions about your animals by retrieving data from the tables. This section shows you how
to:
• Create a database
• Create a table
• Load data into the table
• Retrieve data from the table in various ways
• Use multiple tables
The menagerie database will be simple (deliberately), but it is not difficult to think of real-
world situations in which a similar type of database might be used. For example, a database
like this could be used by a farmer to keep track of livestock, or by a veterinarian to keep
track of patient records.
Use the SHOW statement to find out what databases currently exist on the server:
mysql> SHOW DATABASES;
+----------+
| Database |
+----------+
| mysql |
| test |
| tmp |
+----------+
The list of databases is probably different on your machine, but the mysql and test databases
are likely to be among them. The mysql database is required because it describes user access
privileges. The test database is often provided as a workspace for users to try things out.
If the test database exists, try to access it:
mysql> USE test
Database changed
Note that USE, like QUIT, does not require a semicolon. (You can terminate such statements
with a semicolon if you like; it does no harm.) The USE statement is special in another way,
too: it must be given on a single line.
You can use the test database (if you have access to it) for the examples that follow, but
anything you create in that database can be removed by anyone else with access to it. For this
reason, you should probably ask your MySQL administrator for permission to use a database
of your own. Suppose you want to call yours menagerie. The administrator needs to execute a
command like this:
mysql> GRANT ALL ON menagerie.* TO your_mysql_name;
where your_mysql_name is the MySQL user name assigned to you.
9.5.1 Creating and Selecting a Database
If the administrator creates your database for you when setting up your permissions, you can
begin using it. Otherwise, you need to create it yourself:
mysql> CREATE DATABASE menagerie;
Under Unix, database names are case sensitive (unlike SQL keywords), so you must always
refer to your database as menagerie, not as Menagerie, MENAGERIE, or some other variant.
This is also true for table names. (Under Windows, this restriction does not apply, although
you must refer to databases and tables using the same lettercase throughout a given query.)
Creating a database does not select it for use; you must do that explicitly. To make menagerie
the current database, use this command:
mysql> USE menagerie
Database changed
Your database needs to be created only once, but you must select it for use each time you
begin a mysql session. You can do this by issuing a USE statement as shown above.
Alternatively, you can select the database on the command line when you invoke mysql. Just
specify its name after any connection parameters that you might need to provide. For
example:
shell> mysql -h host -u user -p menagerie
Enter password: ********
Note that menagerie is not your password on the command just shown. If you want to supply
your password on the command line after the -p option, you must do so with no intervening
space (for example, as -pmypassword, not as -p mypassword). However, putting your
password on the command line is not recommended, because doing so exposes it to snooping
by other users logged in on your machine.
9.5.2 Creating a Table
Creating the database is the easy part, but at this point it's empty, as SHOW TABLES will tell
you:
mysql> SHOW TABLES;
Empty set (0.00 sec)
The harder part is deciding what the structure of your database should be: what tables you
will need and what columns will be in each of them.
You'll want a table that contains a record for each of your pets. This can be called the pet
table, and it should contain, as a bare minimum, each animal's name. Because the name by
itself is not very interesting, the table should contain other information. For example, if more
than one person in your family keeps pets, you might want to list each animal's owner. You
might also want to record some basic descriptive information such as species and sex.
How about age? That might be of interest, but it's not a good thing to store in a database. Age
changes as time passes, which means you'd have to update your records often. Instead, it's
better to store a fixed value such as date of birth. Then, whenever you need age, you can
calculate it as the difference between the current date and the birth date. MySQL provides
functions for doing date arithmetic, so this is not difficult. Storing birth date rather than age
has other advantages, too:
• You can use the database for tasks such as generating reminders for upcoming pet
birthdays. (If you think this type of query is somewhat silly, note that it is the same
question you might ask in the context of a business database to identify clients to
whom you'll soon need to send out birthday greetings, for that computer-assisted
personal touch.)
• You can calculate age in relation to dates other than the current date. For example, if
you store death date in the database, you can easily calculate how old a pet was when
it died.
You can probably think of other types of information that would be useful in the pet table, but
the ones identified so far are sufficient for now: name, owner, species, sex, birth, and death.
Use a CREATE TABLE statement to specify the layout of your table:
mysql> CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20),
-> species VARCHAR(20), sex CHAR(1), birth DATE, death DATE);
VARCHAR is a good choice for the name, owner, and species columns because the column
values will vary in length. The lengths of those columns need not all be the same, and need
not be 20. You can pick any length from 1 to 255, whatever seems most reasonable to you. (If
you make a poor choice and it turns out later that you need a longer field, MySQL provides
an ALTER TABLE statement.)
Animal sex can be represented in a variety of ways, for example, "m" and "f", or perhaps
"male" and "female". It's simplest to use the single characters "m" and "f".
The use of the DATE data type for the birth and death columns is a fairly obvious choice.
Now that you have created a table, SHOW TABLES should produce some output:
mysql> SHOW TABLES;
+---------------------+
| Tables in menagerie |
+---------------------+
| pet |
+---------------------+
To verify that your table was created the way you expected, use a DESCRIBE statement:
mysql> DESCRIBE pet;
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name | varchar(20) | YES | | NULL | |
| owner | varchar(20) | YES | | NULL | |
| species | varchar(20) | YES | | NULL | |
| sex | char(1) | YES | | NULL | |
| birth | date | YES | | NULL | |
| death | date | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+
You can use DESCRIBE any time, for example, if you forget the names of the columns in
your table or what types they are.
9.5.3 Loading Data into a Table
After creating your table, you need to populate it. The LOAD DATA and INSERT statements
are useful for this.
Suppose your pet records can be described as shown below. (Observe that MySQL expects
dates in YYYY-MM-DD format; this may be different than what you are used to.)
name      owner   species  sex  birth       death
Fluffy    Harold  cat      f    1993-02-04
Claws     Gwen    cat      m    1994-03-17
Buffy     Harold  dog      f    1989-05-13
Fang      Benny   dog      m    1990-08-27
Bowser    Diane   dog      m    1998-08-31  1995-07-29
Chirpy    Gwen    bird     f    1998-09-11
Whistler  Gwen    bird          1997-12-09
Slim      Benny   snake    m    1996-04-29
Because you are beginning with an empty table, an easy way to populate it is to create a text
file containing a row for each of your animals, then load the contents of the file into the table
with a single statement.
You could create a text file `pet.txt' containing one record per line, with values separated by
tabs, and given in the order in which the columns were listed in the CREATE TABLE
statement. For missing values (such as unknown sexes or death dates for animals that are still
living), you can use NULL values. To represent these in your text file, use \N. For example,
the record for Whistler the bird would look like this (where the whitespace between values is
a single tab character):
Whistler Gwen bird \N 1997-12-09 \N
To load the text file `pet.txt' into the pet table, use this command:
mysql> LOAD DATA LOCAL INFILE "pet.txt" INTO TABLE pet;
You can specify the column value separator and end of line marker explicitly in the LOAD
DATA statement if you wish, but the defaults are tab and linefeed. These are sufficient for
the statement to read the file `pet.txt' properly.
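To make the file format concrete, here is a rough sketch of what such a load does, written with Python's built-in sqlite3 module rather than MySQL. The string PET_TXT stands in for the contents of the file, and parse_rows is a hypothetical helper; this is not how LOAD DATA is implemented.

```python
import sqlite3

# Stand-in for the contents of pet.txt: one record per line, tab-separated,
# with a backslash followed by N marking a missing value.
PET_TXT = (
    "Fluffy\tHarold\tcat\tf\t1993-02-04\t\\N\n"
    "Whistler\tGwen\tbird\t\\N\t1997-12-09\t\\N\n"
    "Bowser\tDiane\tdog\tm\t1998-08-31\t1995-07-29\n"
)

def parse_rows(text):
    # Hypothetical helper: split each line on tabs and map the marker to None.
    for line in text.splitlines():
        yield tuple(None if field == r"\N" else field for field in line.split("\t"))

conn = sqlite3.connect(":memory:")
conn.execute("""create table pet (name text, owner text, species text,
                                  sex text, birth text, death text)""")
conn.executemany("insert into pet values (?, ?, ?, ?, ?, ?)", parse_rows(PET_TXT))

loaded = conn.execute("select count(*) from pet").fetchone()[0]
whistler = conn.execute("select sex, death from pet where name = 'Whistler'").fetchone()
print(loaded, whistler)
```

The values must appear in the same order as the columns in the table definition, which is why the helper needs no column names.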
When you want to add new records one at a time, the INSERT statement is useful. In its
simplest form, you supply values for each column, in the order in which the columns were
listed in the CREATE TABLE statement. Suppose Diane gets a new hamster named Puffball.
You could add a new record using an INSERT statement like this:
mysql> INSERT INTO pet
-> VALUES ('Puffball','Diane','hamster','f','1999-03-30',NULL);
Note that string and date values are specified as quoted strings here. Also, with INSERT, you
can insert NULL directly to represent a missing value. You do not use \N like you do with
LOAD DATA.
From this example, you should be able to see that there would be a lot more typing involved
to load your records initially using several INSERT statements rather than a single LOAD
DATA statement.
9.5.4 Retrieving Information from a Table
The SELECT statement is used to pull information from a table. The general form of the
statement is:
SELECT what_to_select
FROM which_table
WHERE conditions_to_satisfy
what_to_select indicates what you want to see. This can be a list of columns, or * to indicate
``all columns.'' which_table indicates the table from which you want to retrieve data. The
WHERE clause is optional. If it's present, conditions_to_satisfy specifies conditions that rows
must satisfy to qualify for retrieval.
9.5.4.1 Selecting All Data
The simplest form of SELECT retrieves everything from a table:
mysql> SELECT * FROM pet;
+----------+--------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+----------+--------+---------+------+------------+------------+
| Fluffy | Harold | cat | f | 1993-02-04 | NULL |
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
| Fang | Benny | dog | m | 1990-08-27 | NULL |
| Bowser | Diane | dog | m | 1998-08-31 | 1995-07-29 |
| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |
| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
| Slim | Benny | snake | m | 1996-04-29 | NULL |
| Puffball | Diane | hamster | f | 1999-03-30 | NULL |
+----------+--------+---------+------+------------+------------+
This form of SELECT is useful if you want to review your entire table, for instance, after
you've just loaded it with your initial dataset. As it happens, the output just shown reveals an
error in your data file: Bowser appears to have been born after he died! Consulting your
original pedigree papers, you find that the correct birth year is 1989, not 1998.
There are at least a couple of ways to fix this:
• Edit the file `pet.txt' to correct the error, then empty the table and reload it using
DELETE and LOAD DATA:
mysql> SET AUTOCOMMIT=1; # Used for quick re-create of the table
mysql> DELETE FROM pet;
mysql> LOAD DATA LOCAL INFILE "pet.txt" INTO TABLE pet;
However, if you do this, you must also re-enter the record for Puffball.
• Fix only the erroneous record with an UPDATE statement:
mysql> UPDATE pet SET birth = "1989-08-31" WHERE name = "Bowser";
As shown above, it is easy to retrieve an entire table. But typically you don't want to do that,
particularly when the table becomes large. Instead, you're usually more interested in
answering a particular question, in which case you specify some constraints on the
information you want. Let's look at some selection queries in terms of questions about your
pets that they answer.
9.5.4.2 Selecting Particular Rows
You can select only particular rows from your table. For example, if you want to verify the
change that you made to Bowser's birth date, select Bowser's record like this:
mysql> SELECT * FROM pet WHERE name = "Bowser";
+--------+-------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+--------+-------+---------+------+------------+------------+
| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
+--------+-------+---------+------+------------+------------+
The output confirms that the year is correctly recorded now as 1989, not 1998.
String comparisons are normally case insensitive, so you can specify the name as "bowser",
"BOWSER", etc. The query result will be the same.
You can specify conditions on any column, not just name. For example, if you want to know
which animals were born during or after 1998, test the birth column:
mysql> SELECT * FROM pet WHERE birth >= "1998-1-1";
+----------+-------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+----------+-------+---------+------+------------+-------+
| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |
| Puffball | Diane | hamster | f | 1999-03-30 | NULL |
+----------+-------+---------+------+------------+-------+
You can combine conditions, for example, to locate female dogs:
mysql> SELECT * FROM pet WHERE species = "dog" AND sex = "f";
+-------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+-------+--------+---------+------+------------+-------+
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+-------+--------+---------+------+------------+-------+
The preceding query uses the AND logical operator. There is also an OR operator:
mysql> SELECT * FROM pet WHERE species = "snake" OR species = "bird";
+----------+-------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+----------+-------+---------+------+------------+-------+
| Chirpy | Gwen | bird | f | 1998-09-11 | NULL |
| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
| Slim | Benny | snake | m | 1996-04-29 | NULL |
+----------+-------+---------+------+------------+-------+
AND and OR may be intermixed. If you do that, it's a good idea to use parentheses to indicate
how conditions should be grouped:
mysql> SELECT * FROM pet WHERE (species = "cat" AND sex = "m")
-> OR (species = "dog" AND sex = "f");
+-------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+-------+--------+---------+------+------------+-------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+-------+--------+---------+------+------------+-------+
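Parentheses matter because AND binds more tightly than OR. The sketch below illustrates this with Python's built-in sqlite3 module standing in for MySQL; the table holds only the three columns the queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name text, species text, sex text)")
conn.executemany("insert into pet values (?, ?, ?)", [
    ("Fluffy", "cat", "f"), ("Claws", "cat", "m"), ("Buffy", "dog", "f"),
    ("Fang", "dog", "m"), ("Bowser", "dog", "m"), ("Chirpy", "bird", "f"),
    ("Whistler", "bird", None), ("Slim", "snake", "m"), ("Puffball", "hamster", "f"),
])

def names(where):
    # Return the pet names matching a WHERE clause, sorted for stable output.
    query = "select name from pet where " + where + " order by name"
    return [row[0] for row in conn.execute(query)]

# Read as: dog OR (cat AND female), so every dog qualifies.
ungrouped = names("species = 'dog' or species = 'cat' and sex = 'f'")
# Read as: (dog OR cat) AND female.
grouped = names("(species = 'dog' or species = 'cat') and sex = 'f'")

print(ungrouped)
print(grouped)
```

The first query returns all three dogs plus Fluffy, while the second returns only Buffy and Fluffy.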
9.5.4.3 Selecting Particular Columns
If you don't want to see entire rows from your table, just name the columns in which you're
interested, separated by commas. For example, if you want to know when your animals were
born, select the name and birth columns:
mysql> SELECT name, birth FROM pet;
+----------+------------+
| name | birth |
+----------+------------+
| Fluffy | 1993-02-04 |
| Claws | 1994-03-17 |
| Buffy | 1989-05-13 |
| Fang | 1990-08-27 |
| Bowser | 1989-08-31 |
| Chirpy | 1998-09-11 |
| Whistler | 1997-12-09 |
| Slim | 1996-04-29 |
| Puffball | 1999-03-30 |
+----------+------------+
To find out who owns pets, use this query:
mysql> SELECT owner FROM pet;
+--------+
| owner |
+--------+
| Harold |
| Gwen |
| Harold |
| Benny |
| Diane |
| Gwen |
| Gwen |
| Benny |
| Diane |
+--------+
However, notice that the query simply retrieves the owner field from each record, and some
of them appear more than once. To minimize the output, retrieve each unique output record
just once by adding the keyword DISTINCT:
mysql> SELECT DISTINCT owner FROM pet;
+--------+
| owner |
+--------+
| Benny |
| Diane |
| Gwen |
| Harold |
+--------+
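DISTINCT applies to the whole output record, not to a single column. The following sketch, using Python's built-in sqlite3 module as a stand-in for MySQL, compares distinct owners with distinct owner and species pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name text, owner text, species text)")
conn.executemany("insert into pet values (?, ?, ?)", [
    ("Fluffy", "Harold", "cat"), ("Claws", "Gwen", "cat"), ("Buffy", "Harold", "dog"),
    ("Fang", "Benny", "dog"), ("Bowser", "Diane", "dog"), ("Chirpy", "Gwen", "bird"),
    ("Whistler", "Gwen", "bird"), ("Slim", "Benny", "snake"), ("Puffball", "Diane", "hamster"),
])

# One row per owner.
owners = [row[0] for row in
          conn.execute("select distinct owner from pet order by owner")]

# One row per (owner, species) combination; Gwen's two birds collapse into one.
pairs = conn.execute("select distinct owner, species from pet").fetchall()

print(owners)
print(len(pairs))
```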
You can use a WHERE clause to combine row selection with column selection. For example,
to get birth dates for dogs and cats only, use this query:
mysql> SELECT name, species, birth FROM pet
-> WHERE species = "dog" OR species = "cat";
+--------+---------+------------+
| name | species | birth |
+--------+---------+------------+
| Fluffy | cat | 1993-02-04 |
| Claws | cat | 1994-03-17 |
| Buffy | dog | 1989-05-13 |
| Fang | dog | 1990-08-27 |
| Bowser | dog | 1989-08-31 |
+--------+---------+------------+
9.5.4.4 Sorting Rows
You may have noticed in the preceding examples that the result rows are displayed in no
particular order. However, it's often easier to examine query output when the rows are sorted
in some meaningful way. To sort a result, use an ORDER BY clause.
Here are animal birthdays, sorted by date:
mysql> SELECT name, birth FROM pet ORDER BY birth;
+----------+------------+
| name | birth |
+----------+------------+
| Buffy | 1989-05-13 |
| Bowser | 1989-08-31 |
| Fang | 1990-08-27 |
| Fluffy | 1993-02-04 |
| Claws | 1994-03-17 |
| Slim | 1996-04-29 |
| Whistler | 1997-12-09 |
| Chirpy | 1998-09-11 |
| Puffball | 1999-03-30 |
+----------+------------+
To sort in reverse order, add the DESC (descending) keyword to the name of the column you
are sorting by:
mysql> SELECT name, birth FROM pet ORDER BY birth DESC;
+----------+------------+
| name | birth |
+----------+------------+
| Puffball | 1999-03-30 |
| Chirpy | 1998-09-11 |
| Whistler | 1997-12-09 |
| Slim | 1996-04-29 |
| Claws | 1994-03-17 |
| Fluffy | 1993-02-04 |
| Fang | 1990-08-27 |
| Bowser | 1989-08-31 |
| Buffy | 1989-05-13 |
+----------+------------+
You can sort on multiple columns. For example, to sort by type of animal, then by birth date
within animal type with youngest animals first, use the following query:
mysql> SELECT name, species, birth FROM pet ORDER BY species, birth DESC;
+----------+---------+------------+
| name | species | birth |
+----------+---------+------------+
| Chirpy | bird | 1998-09-11 |
| Whistler | bird | 1997-12-09 |
| Claws | cat | 1994-03-17 |
| Fluffy | cat | 1993-02-04 |
| Fang | dog | 1990-08-27 |
| Bowser | dog | 1989-08-31 |
| Buffy | dog | 1989-05-13 |
| Puffball | hamster | 1999-03-30 |
| Slim | snake | 1996-04-29 |
+----------+---------+------------+
Note that the DESC keyword applies only to the column name immediately preceding it
(birth); species values are still sorted in ascending order.
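The multi-column sort can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module in place of MySQL and stores dates as YYYY-MM-DD text, which sorts in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name text, species text, birth text)")
conn.executemany("insert into pet values (?, ?, ?)", [
    ("Fluffy", "cat", "1993-02-04"), ("Claws", "cat", "1994-03-17"),
    ("Buffy", "dog", "1989-05-13"), ("Fang", "dog", "1990-08-27"),
    ("Bowser", "dog", "1989-08-31"), ("Chirpy", "bird", "1998-09-11"),
    ("Whistler", "bird", "1997-12-09"), ("Slim", "snake", "1996-04-29"),
    ("Puffball", "hamster", "1999-03-30"),
])

# species ascending (the default), then youngest first within each species.
ordered = [row[0] for row in
           conn.execute("select name from pet order by species, birth desc")]
print(ordered)
```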
9.5.4.5 Date Calculations
MySQL provides several functions that you can use to perform calculations on dates, for
example, to calculate ages or extract parts of dates.
To determine how many years old each of your pets is, compute age as the difference
between the birth date and the current date. Do this by converting the two dates to days, taking
the difference, and dividing by 365 (the approximate number of days in a year; leap years make the result slightly inexact):
mysql> SELECT name, (TO_DAYS(NOW())-TO_DAYS(birth))/365 FROM pet;
+----------+-------------------------------------+
| name | (TO_DAYS(NOW())-TO_DAYS(birth))/365 |
+----------+-------------------------------------+
| Fluffy | 6.15 |
| Claws | 5.04 |
| Buffy | 9.88 |
| Fang | 8.59 |
| Bowser | 9.58 |
| Chirpy | 0.55 |
| Whistler | 1.30 |
| Slim | 2.92 |
| Puffball | 0.00 |
+----------+-------------------------------------+
Although the query works, there are some things about it that could be improved. First, the
result could be scanned more easily if the rows were presented in some order. Second, the
heading for the age column isn't very meaningful.
The first problem can be handled by adding an ORDER BY name clause to sort the output by
name. To deal with the column heading, provide a name for the column so that a different
label appears in the output (this is called a column alias):
mysql> SELECT name, (TO_DAYS(NOW())-TO_DAYS(birth))/365 AS age
-> FROM pet ORDER BY name;
+----------+------+
| name | age |
+----------+------+
| Bowser | 9.58 |
| Buffy | 9.88 |
| Chirpy | 0.55 |
| Claws | 5.04 |
| Fang | 8.59 |
| Fluffy | 6.15 |
| Puffball | 0.00 |
| Slim | 2.92 |
| Whistler | 1.30 |
+----------+------+
To sort the output by age rather than name, just use a different ORDER BY clause:
mysql> SELECT name, (TO_DAYS(NOW())-TO_DAYS(birth))/365 AS age
-> FROM pet ORDER BY age;
+----------+------+
| name | age |
+----------+------+
| Puffball | 0.00 |
| Chirpy | 0.55 |
| Whistler | 1.30 |
| Slim | 2.92 |
| Claws | 5.04 |
| Fluffy | 6.15 |
| Fang | 8.59 |
| Bowser | 9.58 |
| Buffy | 9.88 |
+----------+------+
A similar query can be used to determine age at death for animals that have died. You
determine which animals these are by checking whether or not the death value is NULL.
Then, for those with non-NULL values, compute the difference between the death and birth
values:
mysql> SELECT name, birth, death, (TO_DAYS(death)-TO_DAYS(birth))/365 AS age
-> FROM pet WHERE death IS NOT NULL ORDER BY age;
+--------+------------+------------+------+
| name | birth | death | age |
+--------+------------+------------+------+
| Bowser | 1989-08-31 | 1995-07-29 | 5.91 |
+--------+------------+------------+------+
The query uses death IS NOT NULL rather than death != NULL because NULL is a special
value. This is explained later. See section 9.5.4.6 Working with NULL Values.
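Dividing a day count by 365 ignores leap days, so the result can drift past a birthday that has not yet happened. The sketch below is plain Python rather than MySQL, and both function names are hypothetical; it compares the day-count approach with a calendar-based count of whole years.

```python
from datetime import date

def age_by_days(birth, on):
    # The approach used in the queries above: elapsed days divided by 365.
    return round((on - birth).days / 365, 2)

def age_in_years(birth, on):
    # Whole years elapsed, decided by whether the birthday has occurred yet.
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years

# Bowser: born 1989-08-31, died 1995-07-29.
bowser_days = age_by_days(date(1989, 8, 31), date(1995, 7, 29))
bowser_years = age_in_years(date(1989, 8, 31), date(1995, 7, 29))

# A week before a 40th birthday, the day count already reads as 40.
drift_days = age_by_days(date(1980, 1, 1), date(2019, 12, 25))
drift_years = age_in_years(date(1980, 1, 1), date(2019, 12, 25))

print(bowser_days, bowser_years, drift_days, drift_years)
```

For the pet table the difference is negligible, but over several decades the accumulated leap days add up to more than a week.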
What if you want to know which animals have birthdays next month? For this type of
calculation, year and day are irrelevant; you simply want to extract the month part of the birth
column. MySQL provides several date-part extraction functions, such as YEAR(),
MONTH(), and DAYOFMONTH(). MONTH() is the appropriate function here. To see how
it works, run a simple query that displays the value of both birth and MONTH(birth):
mysql> SELECT name, birth, MONTH(birth) FROM pet;
+----------+------------+--------------+
| name | birth | MONTH(birth) |
+----------+------------+--------------+
| Fluffy | 1993-02-04 | 2 |
| Claws | 1994-03-17 | 3 |
| Buffy | 1989-05-13 | 5 |
| Fang | 1990-08-27 | 8 |
| Bowser | 1989-08-31 | 8 |
| Chirpy | 1998-09-11 | 9 |
| Whistler | 1997-12-09 | 12 |
| Slim | 1996-04-29 | 4 |
| Puffball | 1999-03-30 | 3 |
+----------+------------+--------------+
Finding animals with birthdays in the upcoming month is easy, too. Suppose the current
month is April. Then the month value is 4 and you look for animals born in May (month 5)
like this:
mysql> SELECT name, birth FROM pet WHERE MONTH(birth) = 5;
+-------+------------+
| name | birth |
+-------+------------+
| Buffy | 1989-05-13 |
+-------+------------+
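The same month test can be tried outside MySQL. The sketch below assumes Python's built-in sqlite3 module, which has no MONTH() function; strftime('%m', ...) is SQLite's way of extracting the month, and the function name born_in_month is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name text, birth text)")
conn.executemany("insert into pet values (?, ?)", [
    ("Fluffy", "1993-02-04"), ("Claws", "1994-03-17"), ("Buffy", "1989-05-13"),
    ("Fang", "1990-08-27"), ("Bowser", "1989-08-31"), ("Chirpy", "1998-09-11"),
    ("Whistler", "1997-12-09"), ("Slim", "1996-04-29"), ("Puffball", "1999-03-30"),
])

def born_in_month(month):
    # strftime returns text such as '05', so cast it before comparing.
    query = """select name from pet
               where cast(strftime('%m', birth) as integer) = ?
               order by rowid"""
    return [row[0] for row in conn.execute(query, (month,))]

may = born_in_month(5)
august = born_in_month(8)
print(may, august)
```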
There is a small complication if the current month is December, of course. You don't just add
one to the month number (12) and look for animals born in month 13, because there is no
such month. Instead, you look for animals born in January (month 1).
You can even write the query so that it works no matter what the current month is. That way
you don't have to use a particular month number in the query. DATE_ADD() allows you to
add a time interval to a given date. If you add a month to the value of NOW(), then extract
the month part with MONTH(), the result produces the month in which to look for birthdays:
mysql> SELECT name, birth FROM pet
-> WHERE MONTH(birth) = MONTH(DATE_ADD(NOW(), INTERVAL 1 MONTH));
A different way to accomplish the same task is to add 1 to get the next month after the current
one (after using the modulo function (MOD) to wrap around the month value to 0 if it is
currently 12):
mysql> SELECT name, birth FROM pet
-> WHERE MONTH(birth) = MOD(MONTH(NOW()), 12) + 1;
Note that MONTH() returns a number between 1 and 12, and MOD(something,12) returns a
number between 0 and 11. So the addition has to come after the MOD(); otherwise November (11)
would map to 0, which is not a valid month number, instead of to December (12).
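The order of the two operations is easy to check with ordinary arithmetic. This sketch is in Python, where % plays the role of MOD(), and both function names are hypothetical.

```python
def next_month(month):
    # Wrap first, then add: 12 becomes 0 and then 1 (January).
    return month % 12 + 1

def next_month_wrong(month):
    # Adding first and wrapping afterwards turns November into 0.
    return (month + 1) % 12

correct = [next_month(m) for m in (1, 11, 12)]
wrong = [next_month_wrong(m) for m in (1, 11, 12)]
print(correct, wrong)
```

With the correct order, every month from 1 to 12 maps to a value from 1 to 12.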
9.5.4.6 Working with NULL Values
The NULL value can be surprising until you get used to it. Conceptually, NULL means
missing value or unknown value and it is treated somewhat differently than other values. To
test for NULL, you cannot use the arithmetic comparison operators such as =, <, or !=. To
demonstrate this for yourself, try the following query:
mysql> SELECT 1 = NULL, 1 != NULL, 1 < NULL, 1 > NULL;
+----------+-----------+----------+----------+
| 1 = NULL | 1 != NULL | 1 < NULL | 1 > NULL |
+----------+-----------+----------+----------+
| NULL | NULL | NULL | NULL |
+----------+-----------+----------+----------+
Clearly you get no meaningful results from these comparisons. Use the IS NULL and IS NOT
NULL operators instead:
mysql> SELECT 1 IS NULL, 1 IS NOT NULL;
+-----------+---------------+
| 1 IS NULL | 1 IS NOT NULL |
+-----------+---------------+
| 0 | 1 |
+-----------+---------------+
In MySQL, 0 means false and 1 means true.
This special treatment of NULL is why, in the previous section, it was necessary to determine
which animals are no longer alive using death IS NOT NULL instead of death != NULL.
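The effect on a WHERE clause can be seen in the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, where SQL NULL is returned as Python's None, and uses a two-row table invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every arithmetic comparison with NULL yields NULL.
comparisons = conn.execute("select 1 = null, 1 != null, 1 < null, 1 > null").fetchone()
null_tests = conn.execute("select 1 is null, 1 is not null").fetchone()

conn.execute("create table pet (name text, death text)")
conn.executemany("insert into pet values (?, ?)",
                 [("Bowser", "1995-07-29"), ("Fluffy", None)])

# A NULL condition is never true, so this matches no rows at all.
with_not_equal = conn.execute("select name from pet where death != null").fetchall()
# IS NOT NULL is the test that works.
with_is_not_null = conn.execute("select name from pet where death is not null").fetchall()

print(comparisons, null_tests, with_not_equal, with_is_not_null)
```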
9.5.4.7 Pattern Matching
MySQL provides standard SQL pattern matching as well as a form of pattern matching based
on extended regular expressions similar to those used by Unix utilities such as vi, grep, and
sed.
SQL pattern matching allows you to use `_' to match any single character and `%' to match an
arbitrary number of characters (including zero characters). In MySQL, SQL patterns are case
insensitive by default. Some examples are shown below. Note that you do not use = or !=
when you use SQL patterns; use the LIKE or NOT LIKE comparison operators instead.
To find names beginning with `b':
mysql> SELECT * FROM pet WHERE name LIKE "b%";
+--------+--------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+--------+--------+---------+------+------------+------------+
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
+--------+--------+---------+------+------------+------------+
To find names ending with `fy':
mysql> SELECT * FROM pet WHERE name LIKE "%fy";
+--------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+--------+--------+---------+------+------------+-------+
| Fluffy | Harold | cat | f | 1993-02-04 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+--------+--------+---------+------+------------+-------+
To find names containing a `w':
mysql> SELECT * FROM pet WHERE name LIKE "%w%";
+----------+-------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+----------+-------+---------+------+------------+------------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
+----------+-------+---------+------+------------+------------+
To find names containing exactly five characters, use the `_' pattern character:
mysql> SELECT * FROM pet WHERE name LIKE "_____";
+-------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+-------+--------+---------+------+------------+-------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+-------+--------+---------+------+------------+-------+
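The four LIKE patterns above can be tried together in the sketch below. It assumes Python's built-in sqlite3 module, whose LIKE is also case insensitive for ASCII letters by default; the function name names_like is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table pet (name text)")
conn.executemany("insert into pet values (?)", [
    ("Fluffy",), ("Claws",), ("Buffy",), ("Fang",), ("Bowser",),
    ("Chirpy",), ("Whistler",), ("Slim",), ("Puffball",),
])

def names_like(pattern):
    # The pattern is passed as a bound parameter rather than pasted into the SQL.
    query = "select name from pet where name like ? order by rowid"
    return [row[0] for row in conn.execute(query, (pattern,))]

starts_with_b = names_like("b%")     # % matches any run of characters
ends_with_fy = names_like("%fy")
contains_w = names_like("%w%")
five_chars = names_like("_____")     # each _ matches exactly one character

print(starts_with_b, ends_with_fy, contains_w, five_chars)
```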
The other type of pattern matching provided by MySQL uses extended regular expressions.
When you test for a match for this type of pattern, use the REGEXP and NOT REGEXP
operators (or RLIKE and NOT RLIKE, which are synonyms).
Some characteristics of extended regular expressions are:
• `.' matches any single character.
• A character class `[...]' matches any character within the brackets. For example, `[abc]'
matches `a', `b', or `c'. To name a range of characters, use a dash. `[a-z]' matches any
lowercase letter, whereas `[0-9]' matches any digit.
• `*' matches zero or more instances of the thing preceding it. For example, `x*'
matches any number of `x' characters, `[0-9]*' matches any number of digits, and `.*'
matches any number of anything.
• Regular expressions are case sensitive, but you can use a character class to match both
lettercases if you wish. For example, `[aA]' matches lowercase or uppercase `a' and
`[a-zA-Z]' matches any letter in either case.
• The pattern matches if it occurs anywhere in the value being tested. (SQL patterns
match only if they match the entire value.)
• To anchor a pattern so that it must match the beginning or end of the value being
tested, use `^' at the beginning or `$' at the end of the pattern.
To demonstrate how extended regular expressions work, the LIKE queries shown above are
rewritten below to use REGEXP.
To find names beginning with `b', use `^' to match the beginning of the name and `[bB]' to
match either lowercase or uppercase `b':
mysql> SELECT * FROM pet WHERE name REGEXP "^[bB]";
+--------+--------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+--------+--------+---------+------+------------+------------+
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
+--------+--------+---------+------+------------+------------+
To find names ending with `fy', use `$' to match the end of the name:
mysql> SELECT * FROM pet WHERE name REGEXP "fy$";
+--------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+--------+--------+---------+------+------------+-------+
| Fluffy | Harold | cat | f | 1993-02-04 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+--------+--------+---------+------+------------+-------+
To find names containing a `w', use `[wW]' to match either lowercase or uppercase `w':
mysql> SELECT * FROM pet WHERE name REGEXP "[wW]";
+----------+-------+---------+------+------------+------------+
| name | owner | species | sex | birth | death |
+----------+-------+---------+------+------------+------------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
| Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
+----------+-------+---------+------+------------+------------+
Because a regular expression pattern matches if it occurs anywhere in the value, it is not
necessary in the previous query to put a wild card on either side of the pattern to get it to
match the entire value like it would be if you used a SQL pattern.
To find names containing exactly five characters, use `^' and `$' to match the beginning and
end of the name, and five instances of `.' in between:
mysql> SELECT * FROM pet WHERE name REGEXP "^.....$";
+-------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+-------+--------+---------+------+------------+-------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+-------+--------+---------+------+------------+-------+
You could also write the previous query using the `{n}' ``repeat-n-times'' operator:
mysql> SELECT * FROM pet WHERE name REGEXP "^.{5}$";
+-------+--------+---------+------+------------+-------+
| name | owner | species | sex | birth | death |
+-------+--------+---------+------+------------+-------+
| Claws | Gwen | cat | m | 1994-03-17 | NULL |
| Buffy | Harold | dog | f | 1989-05-13 | NULL |
+-------+--------+---------+------+------------+-------+
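As a possible alternative, the same length test can be written without a regular expression, and the repeat operator also accepts a range. This is only a sketch against the pet table used above. CHAR_LENGTH() counts characters, which matches the regular expression result for plain ASCII names such as the ones in this table.

```sql
-- Same intent as REGEXP '^.{5}$', expressed with a string function.
SELECT * FROM pet WHERE CHAR_LENGTH(name) = 5;

-- Bounded repeat: names that are four to six characters long.
SELECT * FROM pet WHERE name REGEXP '^.{4,6}$';
```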
9.5.4.8 Counting Rows
Databases are often used to answer the question, ``How often does a certain type of data
occur in a table?'' For example, you might want to know how many pets you have, or how
many pets each owner has, or you might want to perform various kinds of censuses on your
animals.
Counting the total number of animals you have is the same question as ``How many rows are
in the pet table?'' because there is one record per pet. COUNT(*) counts the number of rows
(COUNT(expr), by contrast, counts only the non-NULL values of expr), so the query to count
your animals looks like this:
mysql> SELECT COUNT(*) FROM pet;
+----------+
| COUNT(*) |
+----------+
| 9 |
+----------+
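The difference between counting rows and counting non-NULL values can be seen by asking for both in one statement. The sketch below assumes the same pet table. The column aliases all_rows and known_sex are made up for illustration.

```sql
SELECT COUNT(*)   AS all_rows,   -- every row, whether or not sex is NULL
       COUNT(sex) AS known_sex   -- only rows where sex is not NULL
FROM pet;
```

Since one animal in the sample data has a NULL sex, the second count should come out one lower than the first.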
Earlier, you retrieved the names of the people who owned pets. You can use COUNT() if you
want to find out how many pets each owner has:
mysql> SELECT owner, COUNT(*) FROM pet GROUP BY owner;
+--------+----------+
| owner | COUNT(*) |
+--------+----------+
| Benny | 2 |
| Diane | 2 |
| Gwen | 3 |
| Harold | 2 |
+--------+----------+
Note the use of GROUP BY to group together all records for each owner. Without it, all you
get is an error message:
mysql> SELECT owner, COUNT(owner) FROM pet;
ERROR 1140 at line 1: Mixing of GROUP columns (MIN(),MAX(),COUNT()...)
with no GROUP columns is illegal if there is no GROUP BY clause
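The failing statement can be repaired in either of two ways, depending on what you actually want to know. Both forms below are sketches against the pet table, and the exact wording of the error differs between MySQL versions.

```sql
-- One count per owner: name the non-aggregated column in GROUP BY.
SELECT owner, COUNT(owner) FROM pet GROUP BY owner;

-- One count for the whole table: drop the non-aggregated column.
SELECT COUNT(owner) FROM pet;
```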
COUNT() and GROUP BY are useful for characterizing your data in various ways. The
following examples show different ways to perform animal census operations.
Number of animals per species:
mysql> SELECT species, COUNT(*) FROM pet GROUP BY species;
+---------+----------+
| species | COUNT(*) |
+---------+----------+
| bird | 2 |
| cat | 2 |
| dog | 3 |
| hamster | 1 |
| snake | 1 |
+---------+----------+
Number of animals per sex:
mysql> SELECT sex, COUNT(*) FROM pet GROUP BY sex;
+------+----------+
| sex | COUNT(*) |
+------+----------+
| NULL | 1 |
| f | 4 |
| m | 4 |
+------+----------+
(In this output, NULL indicates sex unknown.)
Number of animals per combination of species and sex:
mysql> SELECT species, sex, COUNT(*) FROM pet GROUP BY species, sex;
+---------+------+----------+
| species | sex | COUNT(*) |
+---------+------+----------+
| bird | NULL | 1 |
| bird | f | 1 |
| cat | f | 1 |
| cat | m | 1 |
| dog | f | 1 |
| dog | m | 2 |
| hamster | f | 1 |
| snake | m | 1 |
+---------+------+----------+
You need not retrieve an entire table when you use COUNT(). For example, the previous
query, when performed just on dogs and cats, looks like this:
mysql> SELECT species, sex, COUNT(*) FROM pet
-> WHERE species = "dog" OR species = "cat"
-> GROUP BY species, sex;
+---------+------+----------+
| species | sex | COUNT(*) |
+---------+------+----------+
| cat | f | 1 |
| cat | m | 1 |
| dog | f | 1 |
| dog | m | 2 |
+---------+------+----------+
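When the list of wanted values grows, a chain of OR conditions becomes hard to read. One way to write the same restriction, sketched here for the pet table, is the IN operator:

```sql
-- Equivalent to: species = 'dog' OR species = 'cat'
SELECT species, sex, COUNT(*) FROM pet
WHERE species IN ('dog', 'cat')
GROUP BY species, sex;
```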
Or, if you wanted the number of animals per sex only for known-sex animals:
mysql> SELECT species, sex, COUNT(*) FROM pet
-> WHERE sex IS NOT NULL
-> GROUP BY species, sex;
+---------+------+----------+
| species | sex | COUNT(*) |
+---------+------+----------+
| bird | f | 1 |
| cat | f | 1 |
| cat | m | 1 |
| dog | f | 1 |
| dog | m | 2 |
| hamster | f | 1 |
| snake | m | 1 |
+---------+------+----------+
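WHERE removes rows before they are grouped. To filter on the counts themselves, a HAVING clause can be used, because it is applied after grouping. The following is a minimal sketch against the pet table and is not part of the original example set:

```sql
-- Keep only the species that have more than one animal.
SELECT species, COUNT(*) FROM pet
GROUP BY species
HAVING COUNT(*) > 1;
```

Judging by the per-species counts shown earlier, this should leave the bird, cat and dog rows.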
9.5.5 Using More Than One Table
The pet table keeps track of which pets you have. If you want to record other information
about them, such as events in their lives like visits to the vet or when litters are born, you
need another table. What should this table look like? It needs:
• The pet name, so you know which animal each event pertains to.
• A date so you know when the event occurred.
• A field to describe the event.
• An event type field, if you want to be able to categorize events.
Given these considerations, the CREATE TABLE statement for the event table might look
like this:
mysql> CREATE TABLE event (name VARCHAR(20), date DATE,
-> type VARCHAR(15), remark VARCHAR(255));
As with the pet table, it's easiest to load the initial records by creating a tab-delimited text file
containing the information:
Fluffy    1995-05-15  litter    4 kittens, 3 female, 1 male
Buffy     1993-06-23  litter    5 puppies, 2 female, 3 male
Buffy     1994-06-19  litter    3 puppies, 3 female
Chirpy    1999-03-21  vet       needed beak straightened
Slim      1997-08-03  vet       broken rib
Bowser    1991-10-12  kennel
Fang      1991-10-12  kennel
Fang      1998-08-28  birthday  Gave him a new chew toy
Claws     1998-03-17  birthday  Gave him a new flea collar
Whistler  1998-12-09  birthday  First birthday
Load the records like this:
mysql> LOAD DATA LOCAL INFILE "event.txt" INTO TABLE event;
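Once the initial records are loaded, a single new event can be added with INSERT instead of editing the text file. The row below is purely hypothetical and is not part of the sample data. It only illustrates the column order of the event table.

```sql
-- Hypothetical record, for illustration only.
INSERT INTO event (name, date, type, remark)
VALUES ('Fluffy', '1999-04-02', 'vet', 'annual checkup');
```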
Based on what you've learned from the queries you've run on the pet table, you should be
able to perform retrievals on the records in the event table; the principles are the same. But
when is the event table by itself insufficient to answer questions you might ask?
Suppose you want to find out the age at which each pet had its litters. The event table
indicates when each litter was born, but to calculate the age of the mother, you need her
birth date. Because that is stored in the pet table, you need both tables for the query:
mysql> SELECT pet.name, (TO_DAYS(date) - TO_DAYS(birth))/365 AS age, remark
-> FROM pet, event
-> WHERE pet.name = event.name AND type = "litter";
+--------+------+-----------------------------+
| name | age | remark |
+--------+------+-----------------------------+
| Fluffy | 2.27 | 4 kittens, 3 female, 1 male |
| Buffy | 4.12 | 5 puppies, 2 female, 3 male |
| Buffy | 5.10 | 3 puppies, 3 female |
+--------+------+-----------------------------+
There are several things to note about this query:
• The FROM clause lists two tables because the query needs to pull information from
both of them.
• When combining (joining) information from multiple tables, you need to specify how
records in one table can be matched to records in the other. This is easy because they
both have a name column. The query uses a WHERE clause to match up records in the
two tables based on the name values.
• Because the name column occurs in both tables, you must be specific about which
table you mean when referring to the column. This is done by prepending the table
name to the column name.
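The same join can also be written with explicit JOIN ... ON syntax, which keeps the matching condition apart from the row filter. This is a sketch of an equivalent form and assumes the pet and event tables defined above. Every column is qualified here to show which table it comes from.

```sql
SELECT pet.name,
       (TO_DAYS(event.date) - TO_DAYS(pet.birth)) / 365 AS age,
       event.remark
FROM pet
INNER JOIN event ON pet.name = event.name   -- how the rows are matched
WHERE event.type = 'litter';                -- which matched rows to keep
```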
You need not have two different tables to perform a join. Sometimes it is useful to join a
table to itself, if you want to compare records in a table to other records in that same table.
For example, to find breeding pairs among your pets, you can join the pet table with itself to
pair up males and females of like species:
mysql> SELECT p1.name, p1.sex, p2.name, p2.sex, p1.species
-> FROM pet AS p1, pet AS p2
-> WHERE p1.species = p2.species AND p1.sex = "f" AND p2.sex = "m";
+--------+------+--------+------+---------+
| name | sex | name | sex | species |
+--------+------+--------+------+---------+
| Fluffy | f | Claws | m | cat |
| Buffy | f | Fang | m | dog |
| Buffy | f | Bowser | m | dog |
+--------+------+--------+------+---------+
In this query, we specify aliases for the table name so that each column reference makes
clear which instance of the table it is associated with.
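The aliasing technique carries over to other self-joins. As a further sketch, the query below lists pairs of pets that share an owner. The aliases first_pet and second_pet are made up for illustration, and the comparison on name is one possible way to avoid pairing a pet with itself or listing each pair twice.

```sql
SELECT p1.owner, p1.name AS first_pet, p2.name AS second_pet
FROM pet AS p1, pet AS p2
WHERE p1.owner = p2.owner
  AND p1.name < p2.name;   -- skip self-pairs and mirrored duplicates
```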
9.6 Getting Information About Databases and Tables
What if you forget the name of a database or table, or what the structure of a given table is
(for example, what its columns are called)? MySQL addresses this problem through several
statements that provide information about the databases and tables it supports.
You have already seen SHOW DATABASES, which lists the databases managed by the
server. To find out which database is currently selected, use the DATABASE() function:
mysql> SELECT DATABASE();
+------------+
| DATABASE() |
+------------+
| menagerie |
+------------+
If you haven't selected any database yet, the result is NULL (very old MySQL versions show
a blank result instead).
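A short sketch of the usual sequence, assuming the menagerie database from this chapter already exists: select the database first, then confirm the selection.

```sql
USE menagerie;       -- make menagerie the current database
SELECT DATABASE();   -- confirm which database is now selected
```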
To find out what tables the current database contains (for example, when you're not sure
about the name of a table), use this command:
mysql> SHOW TABLES;
+---------------------+
| Tables in menagerie |
+---------------------+
| event |
| pet |
+---------------------+
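SHOW TABLES also accepts a pattern or a database name, which may help when a database holds many tables. The statements below are a sketch using the menagerie database from this chapter:

```sql
-- Only tables whose names begin with 'p'.
SHOW TABLES LIKE 'p%';

-- Tables of a named database, without selecting it first.
SHOW TABLES FROM menagerie;
```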
If you want to find out about the structure of a table, the DESCRIBE command is useful; it
displays information about each of a table's columns:
mysql> DESCRIBE pet;
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name | varchar(20) | YES | | NULL | |
| owner | varchar(20) | YES | | NULL | |
| species | varchar(20) | YES | | NULL | |
| sex | char(1) | YES | | NULL | |
| birth | date | YES | | NULL | |
| death | date | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+
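Field indicates the column name, Type is the data type for the column, Null indicates whether
the column can contain NULL values, Key indicates whether the column is indexed, Default
specifies the column's default value, and Extra displays special information about the
column (for example, auto_increment for a column created with the AUTO_INCREMENT option).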
If you have indexes on a table, SHOW INDEX FROM tbl_name produces information about
them.
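The pet table as created in this chapter has no indexes, so SHOW INDEX has nothing to report for it. The sketch below first adds one and then lists it. The index name idx_pet_owner is a hypothetical choice, not something defined elsewhere in this text.

```sql
-- Hypothetical index on the owner column.
CREATE INDEX idx_pet_owner ON pet (owner);

-- List the indexes that now exist on pet.
SHOW INDEX FROM pet;
```

After this, DESCRIBE pet should show MUL in the Key column for owner, indicating a non-unique index.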