Page 1: Dbms Complete Notes With Addons

UNIT-I

1: Introduction

Purpose of Database Systems

View of Data

Database Languages

Relational Databases

Database Design

Object-based and semistructured databases

Data Storage and Querying

Transaction Management

Database Architecture

Database Users and Administrators

Overall Structure



S.No   Module                                        Lecture
1      DBS Applications and DBMS vs. File Systems    L1
2      View of Data                                  L2
3      DB Languages (DML, DDL)                       L3
4      DB Users and Administrator                    L4
5      Data Storage and Querying                     L5
6      DBMS Architecture                             L6


Database System Applications

• DBMS contains information about a particular enterprise

– Collection of interrelated data

– Set of programs to access the data

– An environment that is both convenient and efficient to use

• Database Applications:

– Banking: all transactions

– Airlines: reservations, schedules

– Universities: registration, grades

– Sales: customers, products, purchases

– Online retailers: order tracking, customized recommendations

– Manufacturing: production, inventory, orders, supply chain

– Human resources: employee records, salaries, tax deductions

• Databases touch all aspects of our lives

What Is a DBMS?

• A database is a very large, integrated collection of data.

• Models real-world enterprise.

– Entities (e.g., students, courses)

– Relationships (e.g., Madonna is taking CS564)

• A Database Management System (DBMS) is a software package designed to store and manage databases.

Why Use a DBMS?

• Data independence and efficient access.

• Reduced application development time.

• Data integrity and security.


• Uniform data administration.

• Concurrent access, recovery from crashes.

Why Study Databases?

• Shift from computation to information

– at the “low end”: scramble to webspace (a mess!)

– at the “high end”: scientific applications

• Datasets increasing in diversity and volume.

– Digital libraries, interactive video, Human Genome project, EOS project

– ... need for DBMS exploding

• DBMS encompasses most of CS

– OS, languages, theory, AI, multimedia, logic

Files vs. DBMS

• Application must stage large datasets between main memory and secondary storage (e.g., buffering, page-oriented access, 32-bit addressing, etc.)

• Special code for different queries

• Must protect data from inconsistency due to multiple concurrent users

• Crash recovery

• Security and access control

Purpose of Database Systems

• In the early days, database applications were built directly on top of file systems

• Drawbacks of using file systems to store data:

– Data redundancy and inconsistency

• Multiple file formats, duplication of information in different files

– Difficulty in accessing data

• Need to write a new program to carry out each new task


– Data isolation — multiple files and formats

– Integrity problems

• Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly

• Hard to add new constraints or change existing ones

• Drawbacks of using file systems (cont.)

– Atomicity of updates

• Failures may leave database in an inconsistent state with partial updates carried out

• Example: Transfer of funds from one account to another should either complete or not happen at all

– Concurrent access by multiple users

• Concurrent access is needed for performance

• Uncontrolled concurrent accesses can lead to inconsistencies

• Example: Two people reading a balance and updating it at the same time

– Security problems

• Hard to provide user access to some, but not all, data

• Database systems offer solutions to all the above problems
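
The funds-transfer example above can be sketched in a few lines. This is an illustration only, written in Python with the standard-library sqlite3 module (an assumption; the notes do not name a host language or a DBMS). The account table layout, the sample balances, the CHECK constraint standing in for the balance rule, and the helper name transfer are all hypothetical. The credit and the debit run inside one transaction, so a failure between them leaves neither update in place.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst))
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-102", 100)])
conn.commit()

ok = transfer(conn, "A-101", "A-102", 200)       # both updates are applied
failed = transfer(conn, "A-102", "A-101", 1000)  # overdraws A-102: nothing is applied
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(ok, failed, balances)
```

With plain files, the application itself would have to detect the failed debit and undo the credit; here the DBMS does it.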

Levels of Abstraction

• Physical level: describes how a record (e.g., customer) is stored.

• Logical level: describes data stored in database, and the relationships among the data.

type customer = record
    customer_id : string;
    customer_name : string;
    customer_street : string;
    customer_city : string;
end;


• View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.

• DBMS used to maintain, query large datasets.

• Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.

• Levels of abstraction give data independence.

• A DBMS typically has a layered architecture.

• DBAs hold responsible jobs and are well-paid!

• DBMS R&D is one of the broadest, most exciting areas in CS.

View of Data

An architecture for a database system

Instances and Schemas

• Similar to types and variables in programming languages

• Schema – the logical structure of the database


– Example: The database consists of information about a set of customers and accounts and the relationship between them

– Analogous to type information of a variable in a program

– Physical schema: database design at the physical level

– Logical schema: database design at the logical level

Instances and Schemas

• Instance – the actual content of the database at a particular point in time

– Analogous to the value of a variable

• Physical Data Independence – the ability to modify the physical schema without changing the logical schema

– Applications depend on the logical schema

– In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

Data Models

• A collection of tools for describing

– Data

– Data relationships

– Data semantics

– Data constraints

• Relational model

• Entity-Relationship data model (mainly for database design)

• Object-based data models (Object-oriented and Object-relational)

• Semistructured data model (XML)

• Other older models:

– Network model

– Hierarchical model


Data Models

• A data model is a collection of concepts for describing data.

• A schema is a description of a particular collection of data, using a given data model.

• The relational model of data is the most widely used model today.

– Main concept: relation, basically a table with rows and columns.

– Every relation has a schema, which describes the columns, or fields.

Example: University Database

• Conceptual schema:

– Students(sid: string, name: string, login: string, age: integer, gpa: real)

– Courses(cid: string, cname:string, credits:integer)

– Enrolled(sid:string, cid:string, grade:string)

• Physical schema:

– Relations stored as unordered files.

– Index on first column of Students.

• External Schema (View):

– Course_info(cid:string,enrollment:integer)
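
The three schema levels of the university example can be tried out with a short sketch. It is only an illustration, assuming Python's standard-library sqlite3 module; the index name idx_students_sid, the TEXT column types, and the sample Enrolled rows are hypothetical. The tables play the role of the conceptual schema, the index is a physical-level choice, and the Course_info view is the external schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the relations themselves
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: an index on the first column of Students
CREATE INDEX idx_students_sid ON Students(sid);

-- External schema: a view computed from Enrolled
CREATE VIEW Course_info AS
    SELECT cid AS cid, COUNT(*) AS enrollment
    FROM Enrolled
    GROUP BY cid;
""")

# Hypothetical sample enrollments
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "CS101", "A"), ("s2", "CS101", "B"), ("s1", "MA201", "B")])

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)
```

Dropping or adding the index changes only the physical level; the query against Course_info stays the same.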

Data Independence

• Applications insulated from how data is structured and stored.

• Logical data independence: protection from changes in the logical structure of data.

• Physical data independence: protection from changes in the physical structure of data.

Database Languages

Data Manipulation Language (DML)

• Language for accessing and manipulating the data organized by the appropriate data model


– DML also known as query language

• Two classes of languages

– Procedural – user specifies what data is required and how to get those data

– Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data

• SQL is the most widely used query language
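
The difference between the two classes of languages can be seen side by side in a small sketch. It assumes Python's standard-library sqlite3 module, and the account rows are hypothetical sample data. The loop states how to find the data; the SQL statement states only what data is wanted.

```python
import sqlite3

# Hypothetical sample rows: (account_number, branch_name, balance)
rows = [("A-101", "Downtown", 500),
        ("A-102", "Perryridge", 400),
        ("A-201", "Perryridge", 900)]

# Procedural style: scan every row, test it, collect the matches.
procedural = []
for account_number, branch_name, balance in rows:
    if branch_name == "Perryridge":
        procedural.append(account_number)

# Declarative style: describe the result and let the DBMS choose the access path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
declarative = [r[0] for r in conn.execute(
    "SELECT account_number FROM account "
    "WHERE branch_name = 'Perryridge' ORDER BY account_number")]

print(procedural, declarative)
```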

Data Definition Language (DDL)

• Specification notation for defining the database schema

Example:

    create table account (
        account_number char(10),
        branch_name char(10),
        balance integer)

• DDL compiler generates a set of tables stored in a data dictionary

• Data dictionary contains metadata (i.e., data about data)

– Database schema

– Data storage and definition language

• Specifies the storage structure and access methods used

– Integrity constraints

• Domain constraints

• Referential integrity (e.g. branch_name must correspond to a valid branch in the branch table)

– Authorization
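
To see that schema definitions really are stored as data about data, one can query a system catalog. The sketch below assumes SQLite through Python's standard-library sqlite3 module, where the catalog table is called sqlite_master and PRAGMA table_info lists a table's columns; other systems expose their data dictionary under different names. The branch table is a hypothetical addition so that the referential-integrity example has something to refer to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name CHAR(10) PRIMARY KEY)")
conn.execute("""CREATE TABLE account (
    account_number CHAR(10),
    branch_name CHAR(10) REFERENCES branch(branch_name),
    balance INTEGER)""")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(tables)
print(columns)
```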

Relational Model

• Example of tabular data in the relational model


A Sample Relational Database

SQL

• SQL: widely used non-procedural language


– Example: Find the name of the customer with customer-id 192-83-7465

    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'

– Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465

    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465' and
          depositor.account_number = account.account_number
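
The two example queries can be run as follows. This is a sketch under assumptions: it uses Python's standard-library sqlite3 module, and the table layouts and every sample row (customer names, account numbers, balances) are hypothetical, chosen only so that the queries return something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
CREATE TABLE account   (account_number TEXT, balance INTEGER);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                             ('192-83-7465', 'A-201'),
                             ('019-28-3746', 'A-215');
""")

# Query 1: name of the customer with the given id
names = conn.execute("""
    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'
""").fetchall()

# Query 2: balances of all accounts held by that customer
# (sorted in Python because the query itself imposes no order)
balances = sorted(b for (b,) in conn.execute("""
    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465' and
          depositor.account_number = account.account_number
"""))

print(names, balances)
```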

SQL

• Application programs generally access databases through one of

– Language extensions to allow embedded SQL

– Application program interface (e.g., ODBC/JDBC) which allows SQL queries to be sent to a database

Database Users

Users are differentiated by the way they expect to interact with the system

• Application programmers – interact with system through DML calls

• Sophisticated users – form requests in a database query language

• Specialized users – write specialized database applications that do not fit into the traditional data processing framework

• Naïve users – invoke one of the permanent application programs that have been written previously

– Examples: people accessing the database over the web, bank tellers, clerical staff

Database Administrator

• Coordinates all the activities of the database system

– has a good understanding of the enterprise’s information resources and needs.

• Database administrator's duties include:


– Storage structure and access method definition

– Schema and physical organization modification

– Granting users authority to access the database

– Backing up data

– Monitoring performance and responding to changes

• Database tuning

Data storage and Querying

• Storage management

• Query processing

• Transaction processing

Storage Management

• The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

• The storage manager is responsible for the following tasks:

– Interaction with the file manager

– Efficient storing, retrieving and updating of data

• Issues:

– Storage access

– File organization

– Indexing and hashing

Query Processing

1. Parsing and translation

2. Optimization

3. Evaluation


• Alternative ways of evaluating a given query

– Equivalent expressions

– Different algorithms for each operation

• Cost difference between a good and a bad way of evaluating a query can be enormous

• Need to estimate the cost of operations

– Depends critically on statistical information about relations which the database must maintain

– Need to estimate statistics for intermediate results to compute cost of complex expressions
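
One way to watch an optimizer pick between alternative evaluation strategies is to ask the DBMS for its plan. The sketch below assumes SQLite through Python's standard-library sqlite3 module and uses its EXPLAIN QUERY PLAN statement; the table, the index name idx_account_branch, and the helper plan are hypothetical. The exact wording of the plan text varies between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")

query = "SELECT * FROM account WHERE branch_name = 'Perryridge'"


def plan(conn, sql):
    """Return the human-readable detail column of each plan step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index the only available strategy is to scan the whole table.
plan_without_index = plan(conn, query)

# After an index is added, the same query can be answered by an index lookup.
conn.execute("CREATE INDEX idx_account_branch ON account(branch_name)")
plan_with_index = plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the evaluation strategy chosen by the system changes.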

Transaction Management

• A transaction is a collection of operations that performs a single logical function in a database application

• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.


• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Database Architecture

The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:

• Centralized

• Client-server

• Parallel (multiple processors and disks)

• Distributed

Overall System Structure


Database Application Architectures


UNIT-II

History of Database Systems

1950s and early 1960s:

Data processing using magnetic tapes for storage

Tapes provide only sequential access

Punched cards for input

Late 1960s and 1970s:

Hard disks allow direct access to data

Network and hierarchical data models in widespread use

Ted Codd defines the relational data model

Would win the ACM Turing Award for this work

IBM Research begins System R prototype

UC Berkeley begins Ingres prototype

High-performance (for the era) transaction processing

History (cont.)

1980s:

Research relational prototypes evolve into commercial systems

SQL becomes industry standard


Parallel and distributed database systems

Object-oriented database systems

1990s:

Large decision support and data-mining applications

Large multi-terabyte data warehouses

Emergence of Web commerce

2000s:

XML and XQuery standards

Automated database administration

Increasing use of highly parallel database systems

Web-scale distributed data storage systems

Database design:

Conceptual design: (ER Model is used at this stage.)

What are the entities and relationships in the enterprise?

What information about these entities and relationships should we store in the database?

What are the integrity constraints or business rules that hold?

A database 'schema' in the ER Model can be represented pictorially (ER diagrams).

Can map an ER diagram into a relational schema

Modeling:

A database can be modeled as:

a collection of entities,

relationships among entities.

An entity is an object that exists and is distinguishable from other objects.

Example: specific person, company, event, plant

Entities have attributes


Example: people have names and addresses

An entity set is a set of entities of the same type that share the same properties.

Example: set of all persons, companies, trees, holidays

Entity Sets customer and loan:

An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.

Domain – the set of permitted values for each attribute

Attribute types:

Simple and composite attributes.

Single-valued and multi-valued attributes

Example: multivalued attribute: phone_numbers

Derived attributes

Can be computed from other attributes

Example: age, given date_of_birth
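
A derived attribute is computed on demand rather than stored. As a minimal sketch in Python (the function name age_on and its parameters are hypothetical), age can be derived from date_of_birth and a reference date:

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive age in whole years from a stored date_of_birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in `today`'s year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


day_before_birthday = age_on(date(2000, 6, 15), date(2024, 6, 14))
on_birthday = age_on(date(2000, 6, 15), date(2024, 6, 15))
print(day_before_birthday, on_birthday)
```

Storing date_of_birth and deriving age avoids having to update a stored age every year.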


Mapping Cardinality Constraints:

Express the number of entities to which another entity can be associated via a relationship set.

Most useful in describing binary relationship sets.

For a binary relationship set the mapping cardinality must be one of the following types:

One to one

One to many

Many to one

Many to many

Mapping Cardinalities:


ER Model Basics:

Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes.

Entity Set: A collection of similar entities. E.g., all employees.

All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)

Each entity set has a key.

Each attribute has a domain.

ER Model Basics (Contd.):

Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department.

Relationship Set: Collection of similar relationships.


An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En

Same entity set could participate in different relationship sets, or in different “roles” in same set.

A relationship is an association among several entities

Example:

    Hayes              depositor            A-102
    customer entity    relationship set     account entity

A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets

{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}

where (e1, e2, …, en) is a relationship

Example:

(Hayes, A-102) ∈ depositor

Relationship Set borrower:

Relationship Sets (Cont.):

An attribute can also be property of a relationship set.

For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date


Degree of a Relationship Set:

Refers to number of entity sets that participate in a relationship set.

Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.

Relationship sets may involve more than two entity sets.

Additional features of the ER model

Participation Constraints

Does every department have a manager?

If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).

Every Departments entity must appear in an instance of the Manages relationship.


Weak Entities:

A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.

Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).

Weak entity set must have total participation in this identifying relationship set.

Weak Entity Sets (Cont.):

We depict a weak entity set by double rectangles.

We underline the discriminator of a weak entity set with a dashed line.

payment_number – discriminator of the payment entity set

Primary key for payment – (loan_number, payment_number)


Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.

If loan_number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan_number common to payment and loan

More Weak Entity Set Examples:

In a university, a course is a strong entity and a course_offering can be modeled as a weak entity

The discriminator of course_offering would be semester (including year) and section_number (if there is more than one section)

If we model course_offering as a strong entity we would model course_number as an attribute.

Then the relationship with course would be implicit in the course_number attribute


UNIT-III

1. Introduction to relational model

2. Enforcing integrity constraints

3. Logical Database Design

4. Logical Database Design

5. Introduction to Views

6. Relational Algebra

7. Tuple Relational Calculus

8. Domain Relational Calculus


Relational Database: Definitions

• Relational database: a set of relations

• Relation: made up of 2 parts:

– Instance : a table, with rows and columns. #Rows = cardinality, #fields = degree / arity.

– Schema : specifies name of relation, plus name and type of each column.

• E.g., Students (sid: string, name: string, login: string, age: integer, gpa: real).

• Can think of a relation as a set of rows or tuples (i.e., all rows are distinct).

Example Instance of Students Relation

Cardinality = 3, degree = 5, all rows distinct

Do all columns in a relation instance have to be distinct?

Relational Query Languages

• A major strength of the relational model: supports simple, powerful querying of data.

• Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.

– The key: precise semantics for relational queries.

– Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change.

The SQL Query Language


SELECT *

FROM Students S

WHERE S.age=18

• To find just names and logins, replace the first line:

SELECT S.name, S.login

Querying Multiple Relations

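• What does the following query compute?

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'

• It pairs each student's name with the cid of every course in which that student received the grade A.

[Table: result of the query on the sample instances of Students and Enrolled]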

Creating Relations in SQL

CREATE TABLE Students
    (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)

• Creates the Students relation. Observe that the type of each field is specified, and enforced by the DBMS whenever tuples are added or modified.

• As another example, the Enrolled table holds information about courses that students take.

CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2))

Destroying and Altering Relations

DROP TABLE Students

• Destroys the relation Students. The schema information and the tuples are deleted.

ALTER TABLE Students ADD COLUMN firstYear INTEGER


The schema of Students is altered by adding a new field; every tuple in the current instance is extended with a null value in the new field.

Adding and Deleting Tuples

• Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = 'Smith'):

DELETE
FROM Students S
WHERE S.name = 'Smith'

Integrity Constraints (ICs)

• IC: condition that must be true for any instance of the database; e.g., domain constraints.

– ICs are specified when schema is defined.

– ICs are checked when relations are modified.

• A legal instance of a relation is one that satisfies all specified ICs.

– DBMS should not allow illegal instances.

• If the DBMS checks ICs, stored data is more faithful to real-world meaning.

– Avoids data entry errors, too!

Primary Key Constraints

• A set of fields is a key for a relation if :

1. No two distinct tuples can have same values in all key fields, and

2. This is not true for any subset of the key.

– Part 2 false? A superkey.

– If there’s >1 key for a relation, one of the keys is chosen (by DBA) to be the primary key.

• E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey.
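
The uniqueness half of the key definition can be tested against a concrete instance. The sketch below is in Python; the function name violates_uniqueness and the three sample tuples are hypothetical. Note what it can and cannot show: an instance can refute a proposed key (as it does for name here), but passing the check on one instance never proves that a set of fields is a key.

```python
def violates_uniqueness(rows, fields):
    """Return True if two distinct tuples agree on all of `fields`."""
    seen = set()
    for row in rows:
        value = tuple(row[f] for f in fields)
        if value in seen:
            return True
        seen.add(value)
    return False


# Hypothetical instance of Students
students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "smith@ee", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
]

name_refuted = violates_uniqueness(students, ["name"])             # two Smiths
sid_refuted = violates_uniqueness(students, ["sid"])               # no clash found
superkey_refuted = violates_uniqueness(students, ["sid", "gpa"])   # no clash found

print(name_refuted, sid_refuted, superkey_refuted)
```

Both {sid} and {sid, gpa} pass the uniqueness test, but only {sid} can be a key, because {sid, gpa} has a proper subset that is already unique; that is what makes it a superkey.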


Primary and Candidate Keys in SQL

Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.

“For a given student and course, there is a single grade.” vs. “Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade.”

Used carelessly, an IC can prevent the storage of database instances that arise in practice!

CREATE TABLE Enrolled
    (sid CHAR(20),
     cid CHAR(20),
     grade CHAR(2),
     PRIMARY KEY (sid, cid))

CREATE TABLE Enrolled
    (sid CHAR(20),
     cid CHAR(20),
     grade CHAR(2),
     PRIMARY KEY (sid),
     UNIQUE (cid, grade))
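
The effect of the two declarations can be compared by feeding both the same data. This sketch assumes SQLite through Python's standard-library sqlite3 module; the helper name try_inserts and the three sample enrollments are hypothetical. The rows describe an ordinary situation: one student takes two courses, and two students earn the same grade in one course.

```python
import sqlite3


def try_inserts(create_sql, rows):
    """Create Enrolled with the given DDL and return the rows the DBMS accepts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    accepted = []
    for row in rows:
        try:
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
            accepted.append(row)
        except sqlite3.IntegrityError:
            pass  # the row violates a key or uniqueness constraint
    return accepted


rows = [("53666", "CS101", "A"),
        ("53666", "MA201", "B"),   # same student, second course
        ("53688", "CS101", "A")]   # second student, same course and grade

sensible = """CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid, cid))"""

careless = """CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid),
     UNIQUE (cid, grade))"""

accepted_sensible = try_inserts(sensible, rows)
accepted_careless = try_inserts(careless, rows)
print(len(accepted_sensible), len(accepted_careless))
```

The second declaration rejects two of the three rows, which is the kind of over-constraining the warning above is about.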

Foreign Keys, Referential Integrity

• Foreign key: Set of fields in one relation that is used to 'refer' to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a 'logical pointer'.

• E.g. sid is a foreign key referring to Students:

– Enrolled(sid: string, cid: string, grade: string)

– If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references.

– Can you name a data model w/o referential integrity?


• Links in HTML!

Foreign Keys in SQL

• Only students listed in the Students relation should be allowed to enroll for courses.

CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid, cid),
     FOREIGN KEY (sid) REFERENCES Students)

[Figure: sample instances of Enrolled and Students]

Enforcing Referential Integrity

• Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.

• What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)

• What should be done if a Students tuple is deleted?

– Also delete all Enrolled tuples that refer to it.

– Disallow deletion of a Students tuple that is referred to.

– Set sid in Enrolled tuples that refer to it to a default sid.

– (In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting 'unknown' or 'inapplicable'.)

• Similar if primary key of Students tuple is updated.


Referential Integrity in SQL

• SQL/92 and SQL:1999 support all 4 options on deletes and updates.

• Default is NO ACTION (delete/update is rejected)

• CASCADE (also delete all tuples that refer to deleted tuple)

• SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)

CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid, cid),
     FOREIGN KEY (sid) REFERENCES Students
         ON DELETE CASCADE
         ON UPDATE SET DEFAULT)
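
The reject-on-insert and cascade-on-delete behaviour can be observed directly. The sketch assumes SQLite through Python's standard-library sqlite3 module, where foreign keys are enforced only after PRAGMA foreign_keys = ON. The Students table is reduced to two columns, the sample rows are hypothetical, and the ON UPDATE SET DEFAULT clause is left out because no default sid is defined here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute("""CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid, cid),
     FOREIGN KEY (sid) REFERENCES Students
         ON DELETE CASCADE)""")

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'CS101', 'A')")

# Inserting an Enrolled tuple for a non-existent student is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'CS101', 'B')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# Deleting the student also deletes the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]

print(dangling_rejected, remaining)
```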

Where do ICs Come From?

• ICs are based upon the semantics of the real-world enterprise that is being described in the database relations.

• We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.

– An IC is a statement about all possible instances!

– From the example, we know name is not a key, but the assertion that sid is a key is given to us.

• Key and foreign key ICs are the most common; more general ICs supported too.

Logical DB Design: ER to Relational

CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER,

PRIMARY KEY (ssn))

Relationship Sets to Tables

In translating a relationship set to a relation, attributes of the relation must include:

Keys for each participating entity set (as foreign keys).


This set of attributes forms a superkey for the relation.

All descriptive attributes.

CREATE TABLE Works_In(
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments)

Review: Key Constraints

• Each dept has at most one manager, according to the key constraint on Manages.

Translating ER Diagrams with Key Constraints

Map relationship to a table:

Note that did is the key now!

Separate tables for Employees and Departments.

Since each department has a unique manager, we could instead combine Manages and Departments.


CREATE TABLE Manages(

ssn CHAR(11),

did INTEGER,

since DATE,

PRIMARY KEY (did),

FOREIGN KEY (ssn) REFERENCES Employees,

FOREIGN KEY (did) REFERENCES Departments)

CREATE TABLE Dept_Mgr(
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    ssn CHAR(11),
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees)

Review: Participation Constraints

• Does every department have a manager?

– If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).

• Every did value in Departments table must appear in a row of the Manages table (with a non-null ssn value!)

Participation Constraints in SQL

We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints).


CREATE TABLE Dept_Mgr(
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    ssn CHAR(11) NOT NULL,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE NO ACTION)

Review: Weak Entities

• A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.

– Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).

– Weak entity set must have total participation in this identifying relationship set.

Translating Weak Entity Sets

Weak entity set and identifying relationship set are translated into a single table.

When the owner entity is deleted, all owned weak entities must also be deleted.

CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE)

Review: ISA Hierarchies

As in C++, or other PLs, attributes are inherited.

If we declare A ISA B, every A entity is also considered to be a B entity.


• Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)

• Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)

Translating ISA Hierarchies to Relations

• General approach:

• 3 relations: Employees, Hourly_Emps and Contract_Emps.

• Hourly_Emps: Every employee is recorded in Employees. For hourly emps, extra info is recorded in Hourly_Emps (hourly_wages, hours_worked, ssn); the Hourly_Emps tuple must be deleted if the referenced Employees tuple is deleted.

• Queries involving all employees easy, those involving just Hourly_Emps require a join to get some attributes.

• Alternative: Just Hourly_Emps and Contract_Emps.

• Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked.

• Each employee must be in one of these two subclasses.

Review: Binary vs. Ternary Relationships

What are the additional constraints in the 2nd diagram?


The key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependents.

Participation constraints lead to NOT NULL constraints.

What if Policies is a weak entity set?

CREATE TABLE Policies (
    policyid INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE)

CREATE TABLE Dependents (
    pname CHAR(20),
    age INTEGER,
    policyid INTEGER,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policies
        ON DELETE CASCADE)

Views


A view is just a relation, but we store a definition, rather than a set of tuples.

CREATE VIEW YoungActiveStudents (name, grade)

AS SELECT S.name, E.grade

FROM Students S, Enrolled E

WHERE S.sid = E.sid and S.age<21

Views can be dropped using the DROP VIEW command.

How to handle DROP TABLE if there’s a view on the table?

• DROP TABLE command has options to let the user specify this.

Views and Security

Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s).

Given YoungActiveStudents, but not Students or Enrolled, we can find the students who are enrolled, but not the cid’s of the courses they are enrolled in.

• View Definition

• A relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.

• A view is defined using the create view statement which has the form

create view v as < query expression >

where <query expression> is any legal SQL expression. The view name is represented by v.

• Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.

Example Queries

A view consisting of branches and their customers

create view all_customer as
(select branch_name, customer_name
from depositor, account
where depositor.account_number = account.account_number)
union
(select branch_name, customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number)

• Find all customers of the Perryridge branch

select customer_name
from all_customer
where branch_name = 'Perryridge'
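The all_customer view and the Perryridge query can be tried out with Python's sqlite3 module. The table layouts and rows below are assumed sample data. SQLite does not accept parentheses around the operands of union, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account   (account_number TEXT, branch_name TEXT, balance INTEGER);
CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
CREATE TABLE loan      (loan_number TEXT, branch_name TEXT, amount INTEGER);
CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);

INSERT INTO account   VALUES ('A-102', 'Perryridge', 400), ('A-215', 'Mianus', 700);
INSERT INTO depositor VALUES ('Hayes', 'A-102'), ('Smith', 'A-215');
INSERT INTO loan      VALUES ('L-15', 'Perryridge', 1500);
INSERT INTO borrower  VALUES ('Adams', 'L-15');

-- Only the definition is stored; no tuples are materialised.
CREATE VIEW all_customer AS
    SELECT branch_name, customer_name
    FROM depositor, account
    WHERE depositor.account_number = account.account_number
  UNION
    SELECT branch_name, customer_name
    FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number;
""")

# The view name is used like any other relation.
perryridge = conn.execute(
    "SELECT customer_name FROM all_customer "
    "WHERE branch_name = 'Perryridge' ORDER BY customer_name"
).fetchall()
print(perryridge)
```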

• Uses of Views

• Hiding some information from some users

– Consider a user who needs to know a customer’s name, loan number and branch name, but has no need to see the loan amount.

– Define a view (create view cust_loan_data as select customer_name, borrower.loan_number, branch_name from borrower, loan where borrower.loan_number = loan.loan_number )

– Grant the user permission to read cust_loan_data, but not borrower or loan

• Predefined queries to make writing of other queries easier

– Common example: Aggregate queries used for statistical analysis of data

• Processing of Views

• When a view is created

– the query expression is stored in the database along with the view name

– the expression is substituted into any query using the view

• View definitions containing views

– One view may be used in the expression defining another view

– A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1

– A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2

– A view relation v is said to be recursive if it depends on itself.

• View Expansion

• A way to define the meaning of views defined in terms of other views.

• Let view v1 be defined by an expression e1 that may itself contain uses of view relations.

• View expansion of an expression repeats the following replacement step:

repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1

• As long as the view definitions are not recursive, this loop will terminate
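The replacement loop can be sketched in a few lines of Python. This is a string-level illustration only: the view names v1 and v2 and their definitions are hypothetical, and a real system expands parsed query trees rather than text.

```python
import re

def expand(expr, views):
    """Replace view names in expr by their parenthesised definitions until none remain."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, views)) + r")\b")
    while True:
        match = pattern.search(expr)      # find any view relation vi in the expression
        if match is None:
            return expr                   # no more view relations are present
        definition = views[match.group(1)]
        expr = expr[:match.start()] + "(" + definition + ")" + expr[match.end():]

# Hypothetical definitions: v1 depends directly on v2, v2 only on a base relation.
views = {
    "v1": "select * from v2 where amount > 1000",
    "v2": "select * from loan",
}

expanded = expand("select loan_number from v1", views)
print(expanded)
```

If v2 were defined in terms of v1, the loop would never run out of view names, which is why recursive definitions are excluded.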

• With Clause

• The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.

• Find all accounts with the maximum balance

with max_balance (value) as
select max (balance)
from account
select account_number
from account, max_balance
where account.balance = max_balance.value

Complex Queries using With Clause

Find all branches where the total account deposit is greater than the average of the total account deposits at all branches.

with branch_total (branch_name, value) as
select branch_name, sum (balance)
from account
group by branch_name
with branch_total_avg (value) as
select avg (value)
from branch_total
select branch_name
from branch_total, branch_total_avg
where branch_total.value >= branch_total_avg.value

• Note: the exact syntax supported by your database may vary slightly.

E.g. Oracle syntax is of the form

with branch_total as ( select .. ),
branch_total_avg as ( select .. )
select …
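Both with queries can be run in SQLite through Python's sqlite3 module. SQLite expects the comma-separated, parenthesised form, so that is the form used below. The account rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
INSERT INTO account VALUES
    ('A-101', 'Downtown',   500),
    ('A-102', 'Perryridge', 400),
    ('A-201', 'Perryridge', 900),
    ('A-215', 'Mianus',     700);
""")

# Accounts holding the maximum balance.
max_accounts = conn.execute("""
    WITH max_balance(value) AS (SELECT max(balance) FROM account)
    SELECT account_number
    FROM account, max_balance
    WHERE account.balance = max_balance.value
""").fetchall()

# Branches whose total deposit is at least the average branch total.
above_average = conn.execute("""
    WITH branch_total(branch_name, value) AS (
             SELECT branch_name, sum(balance) FROM account GROUP BY branch_name),
         branch_total_avg(value) AS (
             SELECT avg(value) FROM branch_total)
    SELECT branch_name
    FROM branch_total, branch_total_avg
    WHERE branch_total.value >= branch_total_avg.value
""").fetchall()
print(max_accounts, above_average)
```

With these rows the branch totals are 500, 1300 and 700, so only Perryridge reaches the average of about 833.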

• Update of a View

• Create a view of all loan data in the loan relation, hiding the amount attribute

create view loan_branch as
select loan_number, branch_name
from loan

• Add a new tuple to loan_branch

insert into loan_branch
values ('L-37', 'Perryridge')

This insertion must be represented by the insertion of the tuple

('L-37', 'Perryridge', null )

into the loan relation

• Formal Relational Query Languages

• Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

– Relational Algebra : More operational, very useful for representing execution plans.

– Relational Calculus : Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)

• Preliminaries

• A query is applied to relation instances, and the result of a query is also a relation instance.

– Schemas of input relations for a query are fixed (but query will run regardless of instance!)

– The schema for the result of a given query is also fixed! Determined by definition of query language constructs.

• Positional vs. named-field notation:

– Positional notation easier for formal definitions, named-field notation more readable.

– Both used in SQL

• Example Instances

• “Sailors” and “Reserves” relations for our examples.

• We’ll use positional or named field notation, assume that names of fields in query results are `inherited’ from names of fields in query input relations.

• Relational Algebra

• Basic operations:

– Selection ($\sigma$) Selects a subset of rows from relation.

– Projection ($\pi$) Deletes unwanted columns from relation.

– Cross-product ($\times$) Allows us to combine two relations.

– Set-difference ($-$) Tuples in reln. 1, but not in reln. 2.

– Union ($\cup$) Tuples in reln. 1 or in reln. 2 (or both).

• Additional operations:

– Intersection, join, division, renaming: Not essential, but (very!) useful.

• Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)

• Projection

• Deletes attributes that are not in projection list.

• Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.

• Projection operator has to eliminate duplicates! (Why??)

– Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)

• Selection

• Selects rows that satisfy selection condition.

• No duplicates in result! (Why?)

• Schema of result identical to schema of (only) input relation.

• Result relation can be the input for another relational algebra operation! (Operator composition.)
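Selection, projection and their composition can be sketched in Python by modelling a relation instance as a schema plus a set of tuples. The instance S2 below is assumed sample data. Using a set makes duplicate elimination automatic.

```python
def select(rel, pred):
    """sigma: keep the rows for which pred holds; the schema is unchanged."""
    schema, rows = rel
    return schema, {r for r in rows if pred(dict(zip(schema, r)))}

def project(rel, fields):
    """pi: keep only the listed fields; the set drops duplicate rows."""
    schema, rows = rel
    idx = [schema.index(f) for f in fields]
    return tuple(fields), {tuple(r[i] for i in idx) for r in rows}

# Assumed sample instance of Sailors.
S2 = (("sid", "sname", "rating", "age"),
      {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
       (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)})

ages = project(S2, ["age"])                      # three sailors share age 35.0
high_rated = select(S2, lambda t: t["rating"] > 8)
names_and_ratings = project(high_rated, ["sname", "rating"])  # operator composition
print(ages[1], names_and_ratings[1])
```

Because every operator returns a (schema, rows) pair, the output of select can be passed straight to project, which is what "closed" means here.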

Union, Intersection, Set-Difference

• Cross-Product

• Each row of S1 is paired with each row of R1.

• Result schema has one field per field of S1 and R1, with field names `inherited’ if possible.

– Conflict: Both S1 and R1 have a field called sid.

Renaming operator:

• Joins

• Equi-Join : A special case of condition join where the condition c contains only equalities.

• Result schema similar to cross-product, but only one copy of fields for which equality is specified.

Natural Join: Equijoin on all common fields.

• Division

• Not supported as a primitive operator, but useful for expressing queries like: Find sailors who have reserved all boats.

• Let A have 2 fields, x and y; B have only field y:

– A/B = $\{ \langle x \rangle \mid \forall \langle y \rangle \in B \ (\langle x, y \rangle \in A) \}$

– i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A.

– Or: If the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is in A/B.

• In general, x and y can be any lists of fields; y is the list of fields in B, and $x \cup y$ is the list of fields of A.

• Expressing A/B Using Basic Operators

• Division is not essential op; just a useful shorthand.

– (Also true of joins, but joins are so common that systems implement joins specially.)

• Idea: For A/B, compute all x values that are not `disqualified’ by some y value in B.

– x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A.

Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$

A/B: $\pi_x(A) -$ all disqualified tuples
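The "disqualified values" idea translates directly into set operations. The sketch below is in Python. The instances A, B1, B2 and B3 are assumed sample data, with x standing for a sailor and y for a boat.

```python
def divide(A, B):
    """A is a set of (x, y) pairs, B a set of y values; returns A/B as a set of x values."""
    xs = {x for (x, _) in A}                          # pi_x(A)
    all_pairs = {(x, y) for x in xs for y in B}       # pi_x(A) x B
    disqualified = {x for (x, _) in all_pairs - A}    # pi_x((pi_x(A) x B) - A)
    return xs - disqualified                          # pi_x(A) - disqualified x values

A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"),
     ("s3", "p2"),
     ("s4", "p2"), ("s4", "p4")}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

print(divide(A, B1), divide(A, B2), divide(A, B3))
```

Only set difference, cross-product and projection are used, which is why division counts as shorthand rather than a primitive.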

Find names of sailors who’ve reserved boat #103

• Solution 1: $\pi_{sname}((\sigma_{bid=103}(Reserves)) \bowtie Sailors)$

• Solution 2:

$\rho(Temp1, \sigma_{bid=103}(Reserves))$

$\rho(Temp2, Temp1 \bowtie Sailors)$

$\pi_{sname}(Temp2)$

• Solution 3: $\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$

Find names of sailors who’ve reserved a red boat

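• Information about boat color only available in Boats; so need an extra join:

$\pi_{sname}((\sigma_{color='red'}(Boats)) \bowtie Reserves \bowtie Sailors)$

A more efficient solution:

$\pi_{sname}(\pi_{sid}((\pi_{bid}(\sigma_{color='red'}(Boats))) \bowtie Reserves) \bowtie Sailors)$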

A query optimizer can find this, given the first solution!

• Find sailors who’ve reserved a red or a green boat

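• Can identify all red or green boats, then find sailors who’ve reserved one of these boats:

$\rho(Tempboats, \sigma_{color='red' \lor color='green'}(Boats))$

$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$

Can also define Tempboats using union! (How?)

What happens if $\lor$ is replaced by $\land$ in this query?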

• Find sailors who’ve reserved a red and a green boat

• Previous approach won’t work! Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
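The difference between the "or" and the "and" query can be seen with Python sets. The relations are cut down to the fields that matter and filled with assumed sample rows.

```python
# Assumed sample instances, reduced to the relevant fields.
sailors  = {(22, "dustin"), (31, "lubber"), (64, "horatio")}            # (sid, sname)
boats    = {(101, "blue"), (102, "red"), (103, "green"), (104, "red")}  # (bid, color)
reserves = {(22, 102), (22, 103), (31, 102), (31, 104), (64, 103)}      # (sid, bid)

def sids_reserving(color):
    """sids of sailors who have reserved at least one boat of the given color."""
    bids = {bid for (bid, c) in boats if c == color}
    return {sid for (sid, bid) in reserves if bid in bids}

# Selecting boats that are red AND green finds nothing: each boat has one color.
impossible_boats = {bid for (bid, c) in boats if c == "red" and c == "green"}

# Intersecting on sid works, because sid is a key for Sailors.
red_and_green = sids_reserving("red") & sids_reserving("green")
names = {sname for (sid, sname) in sailors if sid in red_and_green}
print(impossible_boats, names)
```

Intersecting sids rather than names matters: two different sailors could share a name, one having reserved only red boats and the other only green ones.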

• Relational Calculus

• Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC).

• Calculus has variables, constants, comparison ops, logical connectives and quantifiers.

– TRC : Variables range over (i.e., get bound to) tuples.

– DRC : Variables range over domain elements (= field values).

– Both TRC and DRC are simple subsets of first-order logic.

• Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true.

Domain Relational Calculus

• Query has the form:

$\{ \langle x_1, x_2, \ldots, x_n \rangle \mid p(\langle x_1, x_2, \ldots, x_n \rangle) \}$

Answer includes all tuples that make the formula be true.

Formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger and better formulas using the logical connectives.

DRC Formulas

• Atomic formula:

– $\langle x_1, x_2, \ldots, x_n \rangle \in Rname$, or X op Y, or X op constant

– op is one of $<, >, =, \le, \ge, \ne$

• Formula:

– an atomic formula, or

– $\neg p$, $p \land q$, $p \lor q$, where p and q are formulas, or

– $\exists X (p(X))$, where variable X is free in p(X), or

– $\forall X (p(X))$, where variable X is free in p(X)

• Free and Bound Variables

• The use of quantifiers $\exists X$ and $\forall X$ in a formula is said to bind X.

– A variable that is not bound is free.

• Let us revisit the definition of a query:

$\{ \langle x_1, x_2, \ldots, x_n \rangle \mid p(\langle x_1, x_2, \ldots, x_n \rangle) \}$

There is an important restriction: the variables x1, ..., xn that appear to the left of `|’ must be the only free variables in the formula p(...).

Find all sailors with a rating above 7

$\{ \langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land T > 7 \}$

• The condition $\langle I, N, T, A \rangle \in Sailors$ ensures that the domain variables I, N, T and A are bound to fields of the same Sailors tuple.

• The term $\langle I, N, T, A \rangle$ to the left of `|’ (which should be read as such that) says that every tuple $\langle I, N, T, A \rangle$ that satisfies T>7 is in the answer.

• Modify this query to answer:

– Find sailors who are older than 18 or have a rating under 9, and are called ‘Joe’.

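Find sailors rated > 7 who have reserved boat #103

$\{ \langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land T > 7 \land \exists Ir, Br, D \ (\langle Ir, Br, D \rangle \in Reserves \land Ir = I \land Br = 103) \}$

• We have used $\exists Ir, Br, D \ (\ldots)$ as a shorthand for $\exists Ir \ (\exists Br \ (\exists D \ (\ldots)))$

• Note the use of $\exists$ to find a tuple in Reserves that `joins with’ the Sailors tuple under consideration.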

Find sailors rated > 7 who’ve reserved a red boat

• Observe how the parentheses control the scope of each quantifier’s binding.

• This may look cumbersome, but with a good user interface, it is very intuitive. (MS Access, QBE)

Find sailors who’ve reserved all boats

• Find all sailors I such that for each 3-tuple $\langle B, BN, C \rangle$ either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor I has reserved it.
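A DRC query reads much like a set comprehension, with any() in the role of the existential quantifier and all() in the role of the universal one. The Python sketch below uses assumed sample instances. The variable names I, N, T, A, Ir, Br, D, B, BN and C mirror the domain variables used above.

```python
# Assumed sample instances.
sailors  = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
boats    = {(101, "Interlake", "blue"), (103, "Clipper", "green")}
reserves = {(31, 101, "10/11/96"), (58, 101, "10/10/96"), (58, 103, "11/12/96")}

# Sailors rated > 7 for whom some Reserves tuple has Ir = I and Br = 103.
rated_and_103 = {
    (I, N, T, A) for (I, N, T, A) in sailors
    if T > 7 and any(Ir == I and Br == 103 for (Ir, Br, D) in reserves)
}

# Sailors I such that every Boats tuple is matched by a Reserves tuple for I.
# Tuples that are not in Boats satisfy the condition trivially, so ranging
# over boats alone is enough.
reserved_all = {
    (I, N, T, A) for (I, N, T, A) in sailors
    if all(any(Ir == I and Br == B for (Ir, Br, D) in reserves)
           for (B, BN, C) in boats)
}
print(rated_and_103, reserved_all)
```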

Find sailors who’ve reserved all boats (again!)

• Simpler notation, same query. (Much clearer!)

• To find sailors who’ve reserved all red boats:

....

Unsafe Queries, Expressive Power

• It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe.

e.g., $\{ S \mid \neg (S \in Sailors) \}$

• It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC / TRC; the converse is also true.

• Relational Completeness : Query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus.

UNIT-IV

1. The Form of a Basic SQL Query

2. Query operations & NESTED Queries

3. NESTED Queries

4. Aggregate Operators

5. Null Values

6. Complex Integrity Constraints in SQL-92

7. Triggers and Active Databases

8. Designing Active Databases

• History

• IBM Sequel language developed as part of System R project at the IBM San Jose Research Laboratory

• Renamed Structured Query Language (SQL)

• ANSI and ISO standard SQL:

– SQL-86

– SQL-89

– SQL-92

– SQL:1999 (language name became Y2K compliant!)

– SQL:2003

• Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features.

– Not all examples here may work on your particular system.

• Data Definition Language

• Allows the specification of:

• The schema for each relation, including attribute types.

• Integrity constraints

• Authorization information for each relation.

• Non-standard SQL extensions also allow specification of

– The set of indices to be maintained for each relation.

– The physical storage structure of each relation on disk.

• Create Table Construct

• An SQL relation is defined using the create table command:

create table r (A1 D1, A2 D2, ..., An Dn,
(integrity-constraint1),
...,
(integrity-constraintk))

– r is the name of the relation

– each Ai is an attribute name in the schema of relation r

– Di is the data type of attribute Ai

Example:

create table branch
(branch_name char(15),
branch_city char(30),
assets integer)

• Domain Types in SQL

• char(n). Fixed length character string, with user-specified length n.

• varchar(n). Variable length character strings, with user-specified maximum length n.

• int. Integer (a finite subset of the integers that is machine-dependent).

• smallint. Small integer (a machine-dependent subset of the integer domain type).

• numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.

• real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.

• float(n). Floating point number, with user-specified precision of at least n digits.

• More are covered in Chapter 4.

• Integrity Constraints on Tables

• not null

• primary key (A1, ..., An )

• Example: Declare branch_name as the primary key for branch

create table branch
(branch_name char(15),
branch_city char(30) not null,
assets integer,
primary key (branch_name))

• primary key declaration on an attribute automatically ensures not null in SQL-92 onwards, needs to be explicitly stated in SQL-89

• Basic Insertion and Deletion of Tuples

• Newly created table is empty

• Add a new tuple to account

insert into account
values ('A-9732', 'Perryridge', 1200)

– Insertion fails if any integrity constraint is violated

• Delete all tuples from account

delete from account

Note: Will see later how to delete selected tuples

• Drop and Alter Table Constructs

• The drop table command deletes all information about the dropped relation from the database.

• The alter table command is used to add attributes to an existing relation:

alter table r add A D

where A is the name of the attribute to be added to relation r and D is the domain of A.

– All tuples in the relation are assigned null as the value for the new attribute.

• The alter table command can also be used to drop attributes of a relation:

alter table r drop A

where A is the name of an attribute of relation r

– Dropping of attributes not supported by many databases
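The create table, insert, alter table and delete statements can be exercised with Python's sqlite3 module. The rows and the added manager attribute are assumptions made for the illustration. SQLite spells the alter statement as add column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE branch (
        branch_name CHAR(15),
        branch_city CHAR(30) NOT NULL,
        assets      INTEGER,
        PRIMARY KEY (branch_name)
    )
""")
conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 1700000)")

# An insertion fails if it violates an integrity constraint.
try:
    conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Brooklyn', 900000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:      # primary key violated
    duplicate_rejected = True

try:
    conn.execute("INSERT INTO branch VALUES ('Mianus', NULL, 400000)")
    null_rejected = False
except sqlite3.IntegrityError:      # not null violated
    null_rejected = True

# Existing tuples receive null for a newly added attribute.
conn.execute("ALTER TABLE branch ADD COLUMN manager CHAR(20)")
rows = conn.execute("SELECT branch_name, manager FROM branch").fetchall()

# delete without a where clause removes every tuple but keeps the relation.
conn.execute("DELETE FROM branch")
count = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
print(duplicate_rejected, null_rejected, rows, count)
```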

• Basic Query Structure

• A typical SQL query has the form:

select A1, A2, ..., An

from r1, r2, ..., rm

where P

– Ai represents an attribute

– ri represents a relation

– P is a predicate.

– This query is equivalent to the relational algebra expression $\pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$.

– The result of an SQL query is a relation.

• The select Clause

• The select clause lists the attributes desired in the result of a query

– corresponds to the projection operation of the relational algebra

• Example: find the names of all branches in the loan relation:

select branch_name
from loan

• In the relational algebra, the query would be:

$\pi_{branch\_name}(loan)$

• NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)

– E.g. Branch_Name ≡ BRANCH_NAME ≡ branch_name

– Some people use upper case wherever we use bold font.

• SQL allows duplicates in relations as well as in query results.

• To force the elimination of duplicates, insert the keyword distinct after select.

• Find the names of all branches in the loan relation, and remove duplicates

select distinct branch_name
from loan

• The keyword all specifies that duplicates not be removed.

select all branch_name
from loan

• An asterisk in the select clause denotes “all attributes”

select *
from loan

• The select clause can contain arithmetic expressions involving the operations +, –, *, and /, and operating on constants or attributes of tuples.

• E.g.:

select loan_number, branch_name, amount * 100 from loan

• The where Clause

• The where clause specifies conditions that the result must satisfy

– Corresponds to the selection predicate of the relational algebra.

• Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200.

select loan_number
from loan
where branch_name = 'Perryridge' and amount > 1200

• Comparison results can be combined using the logical connectives and, or, and not.

The from Clause

The from clause lists the relations involved in the query

Corresponds to the Cartesian product operation of the relational algebra.

Find the Cartesian product borrower × loan

select *
from borrower, loan

• Find the name, loan number and loan amount of all customers having a loan at the Perryridge branch.

select customer_name, borrower.loan_number, amount
from borrower, loan
where borrower.loan_number = loan.loan_number and branch_name = 'Perryridge'

The Rename Operation

SQL allows renaming relations and attributes using the as clause:

old-name as new-name

E.g. Find the name, loan number and loan amount of all customers; rename the column name loan_number as loan_id.

select customer_name, borrower.loan_number as loan_id, amount
from borrower, loan
where borrower.loan_number = loan.loan_number

Tuple Variables

Tuple variables are defined in the from clause via the use of the as clause.

Find the customer names and their loan numbers and amount for all customers having a loan at some branch.

select customer_name, T.loan_number, S.amount from borrower as T, loan as S where T.loan_number = S.loan_number

• Find the names of all branches that have greater assets than some branch located in Brooklyn.

select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and S.branch_city = 'Brooklyn'

• Keyword as is optional and may be omitted: borrower as T ≡ borrower T

• Some databases such as Oracle require as to be omitted
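The Brooklyn query uses two tuple variables over the same relation. It can be run through Python's sqlite3 module as below. The branch rows are assumed sample data, and the order by is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (branch_name TEXT PRIMARY KEY, branch_city TEXT, assets INTEGER);
INSERT INTO branch VALUES
    ('Brighton',   'Brooklyn',  7100000),
    ('Downtown',   'Brooklyn',  9000000),
    ('Perryridge', 'Horseneck', 1700000),
    ('Round Hill', 'Horseneck', 8000000);
""")

# T ranges over all branches, S over the branches located in Brooklyn.
rows = conn.execute("""
    SELECT DISTINCT T.branch_name
    FROM branch AS T, branch AS S
    WHERE T.assets > S.assets AND S.branch_city = 'Brooklyn'
    ORDER BY T.branch_name
""").fetchall()
print(rows)
```

Without the tuple variables the two uses of branch could not be told apart, so T.assets > S.assets could not be written.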

• Example Instances

• We will use these instances of the Sailors and Reserves relations in our examples.

• If the key for the Reserves relation contained only the attributes sid and bid, how would the semantics differ?

• Basic SQL Query

• SELECT [DISTINCT] target-list

• FROM relation-list

• WHERE qualification

• relation-list: A list of relation names (possibly with a range-variable after each name).

• target-list: A list of attributes of relations in relation-list

• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of $<, >, =, \le, \ge, \ne$) combined using AND, OR and NOT.

• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated!

• Conceptual Evaluation Strategy

• Semantics of an SQL query defined in terms of the following conceptual evaluation strategy:

• Compute the cross-product of relation-list.

• Discard resulting tuples if they fail qualifications.

• Delete attributes that are not in target-list.

• If DISTINCT is specified, eliminate duplicate rows.

• This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

• Example of Conceptual Evaluation

• SELECT S.sname

• FROM Sailors S, Reserves R

• WHERE S.sid=R.sid AND R.bid=103
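The conceptual evaluation strategy can be followed step by step in Python for this query. The Sailors and Reserves instances below are assumed sample data.

```python
from itertools import product

# Assumed sample instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
sailors  = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
reserves = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

# 1. Compute the cross-product of relation-list.
cross = [s + r for s, r in product(sailors, reserves)]

# 2. Discard tuples that fail the qualification S.sid = R.sid AND R.bid = 103.
#    Positions 0..3 hold the Sailors fields, positions 4..6 the Reserves fields.
qualified = [t for t in cross if t[0] == t[4] and t[5] == 103]

# 3. Delete attributes that are not in target-list (only S.sname is kept).
answer = [(t[1],) for t in qualified]

# 4. Only if DISTINCT were specified: eliminate duplicate rows.
distinct_answer = list(dict.fromkeys(answer))
print(len(cross), answer)
```

The cross-product already has 3 × 2 = 6 rows for these tiny instances, which hints at why a real system would never evaluate the query this way.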

A Note on Range Variables

Really needed only if the same relation appears twice in the FROM clause. The previous query can also be written as:

SELECT S.sname

FROM Sailors S, Reserves R

WHERE S.sid=R.sid AND bid=103

OR

SELECT sname

FROM Sailors, Reserves

WHERE Sailors.sid=Reserves.sid

AND bid=103

It is good style, however, to use range variables always!

• Find sailors who’ve reserved at least one boat

• SELECT S.sid

• FROM Sailors S, Reserves R

• WHERE S.sid=R.sid

• Would adding DISTINCT to this query make a difference?

• What is the effect of replacing S.sid by S.sname in the SELECT clause? Would adding DISTINCT to this variant of the query make a difference?

• Expressions and Strings

• SELECT S.age, age1=S.age-5, 2*S.age AS age2

• FROM Sailors S

• WHERE S.sname LIKE 'B_%B'

• Illustrates use of arithmetic expressions and string pattern matching: Find triples (of ages of sailors and two fields defined by expressions) for sailors whose names begin and end with B and contain at least three characters.

• AS and = are two ways to name fields in result.

• LIKE is used for string matching. '_' stands for any one character and '%' stands for 0 or more arbitrary characters.

• String Operations

• SQL includes a string-matching operator for comparisons on character strings. The operator “like” uses patterns that are described using two special characters:

– percent (%). The % character matches any substring.

– underscore (_). The _ character matches any character.

• Find the names of all customers whose street includes the substring “Main”.

select customer_name
from customer
where customer_street like '%Main%'

• Match the name “Main%”

like 'Main\%' escape '\'

• SQL supports a variety of string operations such as

– concatenation (using “||”)

– converting from upper to lower case (and vice versa)

– finding string length, extracting substrings, etc.
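The patterns and string functions can be checked with Python's sqlite3 module. The rows are assumed sample data. SQLite's like is case-insensitive for ASCII letters by default, so the sailor names are kept in upper case to avoid surprises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sailors  (sname TEXT, age REAL);
CREATE TABLE customer (customer_name TEXT, customer_street TEXT);
INSERT INTO sailors  VALUES ('BOB', 30.0), ('BB', 40.0), ('BARB', 25.0), ('ABE', 50.0);
INSERT INTO customer VALUES ('Jones', '12 Main St'), ('Smith', 'Main%'), ('Hayes', 'North');
""")

# Begins and ends with B and has at least three characters.
begins_ends_b = conn.execute(
    "SELECT sname FROM sailors WHERE sname LIKE 'B_%B' ORDER BY sname"
).fetchall()

# Street contains the substring Main.
on_main = conn.execute(
    "SELECT customer_name FROM customer "
    "WHERE customer_street LIKE '%Main%' ORDER BY customer_name"
).fetchall()

# The escape character turns % into an ordinary character.
literal_percent = conn.execute(
    r"SELECT customer_name FROM customer WHERE customer_street LIKE 'Main\%' ESCAPE '\'"
).fetchall()

# Concatenation, case conversion, length and substring extraction.
string_ops = conn.execute(
    "SELECT 'Main' || ' St', upper('main'), length('Main'), substr('Perryridge', 1, 5)"
).fetchone()
print(begins_ends_b, on_main, literal_percent, string_ops)
```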

• Ordering the Display of Tuples

• List in alphabetic order the names of all customers having a loan in Perryridge branch

select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and branch_name = 'Perryridge'
order by customer_name

• We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.

– Example: order by customer_name desc

• Duplicates

• In relations with duplicates, SQL can define how many copies of tuples appear in the result.

• Multiset versions of some of the relational algebra operators – given multiset relations r1 and r2:

1. $\sigma_\theta(r_1)$: If there are c1 copies of tuple t1 in r1, and t1 satisfies selection $\sigma_\theta$, then there are c1 copies of t1 in $\sigma_\theta(r_1)$.

2. $\pi_A(r_1)$: For each copy of tuple t1 in r1, there is a copy of tuple $\pi_A(t_1)$ in $\pi_A(r_1)$, where $\pi_A(t_1)$ denotes the projection of the single tuple t1.

3. $r_1 \times r_2$: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the tuple t1.t2 in $r_1 \times r_2$.

• Example: Suppose multiset relations r1 (A, B) and r2 (C) are as follows:

r1 = {(1, a), (2, a)} r2 = {(2), (3), (3)}

• Then $\pi_B(r_1)$ would be {(a), (a)}, while $\pi_B(r_1) \times r_2$ would be

{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}
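The example can be checked mechanically by representing a multiset relation as a Counter that maps each tuple to its number of copies. This is a small Python sketch; the helper names project and cross are chosen here for illustration.

```python
from collections import Counter

# Multiset relations from the example: each tuple maps to its number of copies.
r1 = Counter({(1, "a"): 1, (2, "a"): 1})
r2 = Counter({(2,): 1, (3,): 2})

def project(rel, idx):
    """Multiset projection: every copy of a tuple contributes one projected copy."""
    out = Counter()
    for t, copies in rel.items():
        out[tuple(t[i] for i in idx)] += copies
    return out

def cross(ra, rb):
    """Multiset cross-product: c1 x c2 copies of each concatenated tuple."""
    return Counter({ta + tb: ca * cb
                    for ta, ca in ra.items() for tb, cb in rb.items()})

pb = project(r1, [1])        # projection on attribute B
pb_x_r2 = cross(pb, r2)
print(pb, pb_x_r2)
```

The counts agree with the listing above: two copies of (a), then two copies of (a,2) and four copies of (a,3).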

• SQL duplicate semantics:

select A1, A2, ..., An

from r1, r2, ..., rm

where P

is equivalent to the multiset version of the expression $\pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$.

• Set Operations

• The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations $\cup$, $\cap$, $-$.

• Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all.

Suppose a tuple occurs m times in r and n times in s, then, it occurs:

– m + n times in r union all s

– min(m,n) times in r intersect all s

– max(0, m – n) times in r except all s

• Set Operations

• Find all customers who have a loan, an account, or both:

(select customer_name from depositor)
union
(select customer_name from borrower)

• Find all customers who have both a loan and an account

(select customer_name from depositor)
intersect
(select customer_name from borrower)

• Find all customers who have an account but no loan

(select customer_name from depositor)
except
(select customer_name from borrower)
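The three queries can be run in SQLite through Python's sqlite3 module. The relations are reduced to a single customer_name column and filled with assumed rows, and SQLite wants the operands without parentheses. SQLite has no intersect all or except all, so the multiset counts are illustrated with Counter instead.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depositor (customer_name TEXT);
CREATE TABLE borrower  (customer_name TEXT);
INSERT INTO depositor VALUES ('Hayes'), ('Johnson'), ('Smith');
INSERT INTO borrower  VALUES ('Adams'), ('Smith'), ('Smith');
""")

def names(sql):
    return [row[0] for row in conn.execute(sql)]

d = "SELECT customer_name FROM depositor"
b = "SELECT customer_name FROM borrower"

either       = names(f"{d} UNION {b} ORDER BY customer_name")      # loan, account, or both
both         = names(f"{d} INTERSECT {b}")                         # loan and account
only_account = names(f"{d} EXCEPT {b} ORDER BY customer_name")     # account but no loan

# Multiset versions: Smith occurs m = 1 time in r and n = 2 times in s.
r, s = Counter(names(d)), Counter(names(b))
union_all     = r + s    # m + n copies
intersect_all = r & s    # min(m, n) copies
except_all    = r - s    # max(0, m - n) copies
print(either, both, only_account, union_all["Smith"])
```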

UNIT-VI

1. Transaction concept & State

2. Implementation of atomicity and durability

3. Serializability

4. Recoverability

5. Implementation of isolation

6. Lock based protocols

7. Lock based protocols

8. Timestamp based protocols

9. Validation based protocol

Transaction Concept

• A transaction is a unit of program execution that accesses and possibly updates various data items.

• E.g. transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

• Two main issues to deal with:

– Failures of various kinds, such as hardware failures and system crashes

– Concurrent execution of multiple transactions

Example of Fund Transfer

• Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

• Atomicity requirement

– if the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent database state

• Failure could be due to software or hardware

– the system should ensure that updates of a partially executed transaction are not reflected in the database

• Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.

• Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

• Consistency requirement in above example:

– the sum of A and B is unchanged by the execution of the transaction

• In general, consistency requirements include

• Explicitly specified integrity constraints such as primary keys and foreign keys

• Implicit integrity constraints

– e.g. sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand

– A transaction must see a consistent database.

– During transaction execution the database may be temporarily inconsistent.

– When the transaction completes successfully the database must be consistent

• Erroneous transaction logic can lead to inconsistency

• Isolation requirement: if between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)

• Isolation can be ensured trivially by running transactions serially

– that is, one after the other.

• However, executing multiple transactions concurrently has significant benefits, as we will see later.

ACID Properties

A transaction is a unit of program execution that accesses and possibly updates various data items. To preserve the integrity of data the database system must ensure:

• Atomicity. Either all operations of the transaction are properly reflected in the database or none are.

• Consistency. Execution of a transaction in isolation preserves the consistency of the database.

• Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.

– That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj, finished execution before Ti started, or Tj started execution after Ti finished.

• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
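Atomicity of the fund transfer can be observed with Python's sqlite3 module, where a connection used as a context manager commits on success and rolls back when an exception escapes. The account table, the starting balances and the simulated failure are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # the setup is committed as one transaction
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 200)")

def transfer(conn, src, dst, amount, crash_after_debit=False):
    """Move amount from src to dst; return True if the transaction committed."""
    try:
        with conn:  # commit on normal exit, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if crash_after_debit:
                raise RuntimeError("simulated failure between write(A) and write(B)")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(conn, "A", "B", 50, crash_after_debit=True)
after_failure = balances(conn)   # the debit of A has been undone
succeeded = transfer(conn, "A", "B", 50)
after_success = balances(conn)
print(after_failure, after_success)
```

In both outcomes the sum A + B stays at 300, which is the consistency requirement of the example. Durability is not exercised here, because an in-memory database does not survive the process.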

Transaction State

• Active – the initial state; the transaction stays in this state while it is executing

• Partially committed – after the final statement has been executed.

• Failed -- after the discovery that normal execution can no longer proceed.

• Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:

– restart the transaction

• can be done only if no internal logical error

– kill the transaction

• Committed – after successful completion.

Implementation of Atomicity and Durability

• The recovery-management component of a database system implements the support for atomicity and durability.

• E.g. the shadow-database scheme:

– all updates are made on a shadow copy of the database

• db_pointer is made to point to the updated shadow copy after

– the transaction reaches partial commit and

– all updated pages have been flushed to disk.

• db_pointer always points to the current consistent copy of the database.

– In case transaction fails, old consistent copy pointed to by db_pointer can be used, and the shadow copy can be deleted.

• The shadow-database scheme:

– Assumes that only one transaction is active at a time.

– Assumes disks do not fail

– Useful for text editors, but

• extremely inefficient for large databases (why?)

– Variant called shadow paging reduces copying of data, but is still not practical for large databases

– Does not handle concurrent transactions

• Will study better schemes in Chapter 17.

Concurrent Executions

• Multiple transactions are allowed to run concurrently in the system. Advantages are:

– increased processor and disk utilization, leading to better transaction throughput

• E.g. one transaction can be using the CPU while another is reading from or writing to the disk

– reduced average response time for transactions: short transactions need not wait behind long ones.

• Concurrency control schemes – mechanisms to achieve isolation

– that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database

Will study in Chapter 16, after studying notion of correctness of concurrent executions

Schedules

• Schedule – a sequence of instructions that specifies the chronological order in which the instructions of concurrent transactions are executed

– a schedule for a set of transactions must consist of all instructions of those transactions

– must preserve the order in which the instructions appear in each individual transaction.

• A transaction that successfully completes its execution will have a commit instruction as its last statement

– by default, a transaction is assumed to execute a commit instruction as its last step

• A transaction that fails to successfully complete its execution will have an abort instruction as the last statement

Schedule 1

• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.

• A serial schedule in which T1 is followed by T2 :

Schedule 2

• A serial schedule where T2 is followed by T1

Schedule 3

• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.

In Schedules 1, 2 and 3, the sum A + B is preserved.

Schedule 4

The following concurrent schedule does not preserve the value of (A + B ).
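The schedule figures are not reproduced in these notes, so the sketch below replays T1 and T2 under stated assumptions: starting balances A = 1000 and B = 2000 are invented for illustration, and the two interleavings are plausible stand-ins for Schedules 3 and 4 rather than copies of the original figures. It shows that the serial orders and one careful interleaving keep A + B constant, while an interleaving in which T1 overwrites T2's update of A does not.

```python
def run(schedule, db):
    """Execute (txn, op, arg) steps on a copy of db and return the final state."""
    db = dict(db)
    local = {}                              # private variables of each transaction
    for txn, op, arg in schedule:
        loc = local.setdefault(txn, {})
        if op == "read":
            loc[arg] = db[arg]              # read(Q): copy Q into the local buffer
        elif op == "write":
            db[arg] = loc[arg]              # write(Q): copy the local value back
        else:
            arg(loc)                        # "calc": change local variables only
    return db

def t1_debit(loc):
    loc["A"] -= 50

def t1_credit(loc):
    loc["B"] += 50

def t2_debit(loc):
    loc["temp"] = loc["A"] // 10            # 10% of A (integer balances assumed)
    loc["A"] -= loc["temp"]

def t2_credit(loc):
    loc["B"] += loc["temp"]

T1 = [("T1", "read", "A"), ("T1", "calc", t1_debit), ("T1", "write", "A"),
      ("T1", "read", "B"), ("T1", "calc", t1_credit), ("T1", "write", "B")]
T2 = [("T2", "read", "A"), ("T2", "calc", t2_debit), ("T2", "write", "A"),
      ("T2", "read", "B"), ("T2", "calc", t2_credit), ("T2", "write", "B")]

initial = {"A": 1000, "B": 2000}            # hypothetical starting balances
serial_t1_t2 = run(T1 + T2, initial)
serial_t2_t1 = run(T2 + T1, initial)
# Non-serial but equivalent to T1;T2: each transaction finishes with A first.
interleaved = run(T1[:3] + T2[:3] + T1[3:] + T2[3:], initial)
# T2 reads A before T1 writes it back, so one of the two updates of A is lost.
lost_update = run(T1[:2] + T2[:4] + T1[2:] + T2[4:], initial)

for name, db in [("T1;T2", serial_t1_t2), ("T2;T1", serial_t2_t1),
                 ("interleaved", interleaved), ("lost update", lost_update)]:
    print(name, db, "A+B =", db["A"] + db["B"])
```

The two serial orders end in different states, yet both preserve A + B; only the last interleaving changes the sum.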

Serializability

• Basic Assumption – Each transaction preserves database consistency.

• Thus serial execution of a set of transactions preserves database consistency.

• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:

1. conflict serializability

2. view serializability

• Simplified view of transactions

– We ignore operations other than read and write instructions

– We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes.

– Our simplified schedules consist of only read and write instructions.

Conflicting Instructions

• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don't conflict.

2. li = read(Q), lj = write(Q). They conflict.

3. li = write(Q), lj = read(Q). They conflict.

4. li = write(Q), lj = write(Q). They conflict.

• Intuitively, a conflict between li and lj forces a (logical) temporal order between them.

– If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
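The four cases above reduce to a single predicate. The sketch below is illustrative only; the `Op` record and its field names are assumptions introduced here, not notation from the notes.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str     # transaction name, e.g. "T1"
    kind: str    # "read" or "write"
    item: str    # data item, e.g. "Q"

def conflicts(a: Op, b: Op) -> bool:
    """True if a and b belong to different transactions, access the same
    item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "write" in (a.kind, b.kind)

print(conflicts(Op("T1", "read", "Q"), Op("T2", "read", "Q")))    # case 1
print(conflicts(Op("T1", "read", "Q"), Op("T2", "write", "Q")))   # case 2
```

Two consecutive instructions for which `conflicts` is false can be swapped without changing the result of the schedule.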

Conflict Serializability

• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule

View Serializability

• Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met, for each data item Q,

1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction Ti must read the initial value of Q.

2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S’ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj .

3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S’.

As can be seen, view equivalence is also based purely on reads and writes alone.

• A schedule S is view serializable if it is view equivalent to a serial schedule.

• Every conflict serializable schedule is also view serializable.

• Below is a schedule which is view-serializable but not conflict serializable.

• What serial schedule is above equivalent to?

• Every view serializable schedule that is not conflict serializable has blind writes.

Other Notions of Serializability

• The schedule below produces same outcome as the serial schedule < T1, T5 >, yet is not conflict equivalent or view equivalent to it.

Determining such equivalence requires analysis of operations other than read and write.

Recoverable Schedules

Need to address the effect of transaction failures on concurrently running transactions.

• Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti , then the commit operation of Ti appears before the commit operation of Tj.

The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read

• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, database must ensure that schedules are recoverable.

Cascading Rollbacks

Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)

• If T10 fails, T11 and T12 must also be rolled back.

• Can lead to the undoing of a significant amount of work

Cascadeless Schedules

• Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.

• Every cascadeless schedule is also recoverable

• It is desirable to restrict the schedules to those that are cascadeless
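The two definitions can be checked mechanically. This is a minimal sketch, assuming a schedule is a list of `(transaction, action, item)` tuples with actions `read`, `write` and `commit`, and that no transaction aborts inside the schedule being checked. The function names and the three sample schedules are hypothetical.

```python
def reads_from(schedule):
    """Yield (writer, reader, position) for every read of an item whose
    most recent write came from a different transaction."""
    last_writer = {}
    for pos, (txn, action, item) in enumerate(schedule):
        if action == "write":
            last_writer[item] = txn
        elif action == "read" and last_writer.get(item, txn) != txn:
            yield last_writer[item], txn, pos

def commit_positions(schedule):
    return {txn: pos for pos, (txn, action, _) in enumerate(schedule)
            if action == "commit"}

def is_recoverable(schedule):
    """If Tj read from Ti and Tj commits, Ti must have committed earlier."""
    c = commit_positions(schedule)
    return all(reader not in c or (writer in c and c[writer] < c[reader])
               for writer, reader, _ in reads_from(schedule))

def is_cascadeless(schedule):
    """Ti must have committed before Tj reads what Ti wrote."""
    c = commit_positions(schedule)
    return all(writer in c and c[writer] < pos
               for writer, _, pos in reads_from(schedule))

# T9 commits right after reading T8's uncommitted write.
early_commit = [("T8", "read", "A"), ("T8", "write", "A"),
                ("T9", "read", "A"), ("T9", "commit", None),
                ("T8", "read", "B")]
# T9 reads uncommitted data but commits only after T8.
dirty_read = [("T8", "write", "A"), ("T9", "read", "A"),
              ("T8", "commit", None), ("T9", "commit", None)]
# T9 reads only committed data.
clean_read = [("T8", "write", "A"), ("T8", "commit", None),
              ("T9", "read", "A"), ("T9", "commit", None)]

for s in (early_commit, dirty_read, clean_read):
    print(is_recoverable(s), is_cascadeless(s))
```

The middle schedule is recoverable but not cascadeless: if T8 aborted instead of committing, T9 would have to be rolled back as well.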

Concurrency Control

• A database must provide a mechanism that will ensure that all possible schedules are

– either conflict or view serializable, and

– are recoverable and preferably cascadeless

• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency

– Are serial schedules recoverable/cascadeless?

• Testing a schedule for serializability after it has executed is a little too late!

• Goal – to develop concurrency control protocols that will assure serializability.

Concurrency Control vs. Serializability Tests

• Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless.

• Concurrency control protocols generally do not examine the precedence graph as it is being created

– Instead a protocol imposes a discipline that avoids nonserializable schedules.

– We study such protocols in Chapter 16.

• Different concurrency control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead that they incur.

• Tests for serializability help us understand why a concurrency control protocol is correct.

Weak Levels of Consistency

• Some applications are willing to live with weak levels of consistency, allowing schedules that are not serializable

– E.g. a read-only transaction that wants to get an approximate total balance of all accounts

– E.g. database statistics computed for query optimization can be approximate (why?)

– Such transactions need not be serializable with respect to other transactions

• Trade off accuracy for performance

Levels of Consistency in SQL-92

• Serializable — default

• Repeatable read — only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others.

• Read committed — only committed records can be read, but successive reads of record may return different (but committed) values.

• Read uncommitted — even uncommitted records may be read.

• Lower degrees of consistency are useful for gathering approximate information about the database

• Warning: some database systems do not ensure serializable schedules by default

• E.g. Oracle and PostgreSQL by default support a level of consistency called snapshot isolation (not part of the SQL standard)

Transaction Definition in SQL

• Data manipulation language must include a construct for specifying the set of actions that comprise a transaction.

• In SQL, a transaction begins implicitly.

• A transaction in SQL ends by:

– Commit work commits current transaction and begins a new one.

– Rollback work causes current transaction to abort.

• In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully

– Implicit commit can be turned off by a database directive

• E.g. in JDBC, connection.setAutoCommit(false);
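The same commit/rollback pattern can be tried without a database server. The sketch below uses Python's standard `sqlite3` module rather than JDBC; the `account` table, its CHECK constraint and the `transfer` helper are assumptions made for illustration. In its default mode, `sqlite3` opens a transaction implicitly before a data-modifying statement, and using the connection as a context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account ("
             "name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:   # commit if the block succeeds, roll back otherwise
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False   # the credit to dst has been rolled back too

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "A", "B", 50)        # fits within A's balance
failed = transfer(conn, "A", "B", 500)   # would drive A negative
print(ok, failed, balances(conn))
```

The second transfer credits B before the debit of A fails, so the rollback is what keeps the partial update from becoming visible.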

Implementation of Isolation

• Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.

• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.

• Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur.

• Some schemes allow only conflict-serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.

Figure 15.6

Testing for Serializability

• Consider some schedule of a set of transactions T1, T2, ..., Tn

• Precedence graph – a directed graph where the vertices are the transactions (names).

• We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item on which the conflict arose earlier.

• We may label the arc by the item that was accessed.

• Example 1

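Example Schedule (Schedule A) + Precedence Graph

[Figure: Schedule A over transactions T1, T2, T3, T4 and T5, with its precedence graph. Operations in extracted reading order: read(X); read(Y); read(Z); read(V); read(W); read(W); read(Y); write(Y); write(Z); read(U); read(Y); write(Y); read(Z); write(Z); read(U); write(U). The transaction column of each operation is not recoverable from the text.]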

Test for Conflict Serializability

• A schedule is conflict serializable if and only if its precedence graph is acyclic.

• Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph.

– (Better algorithms take order n + e where e is the number of edges.)

• If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph.

– This is a linear order consistent with the partial order of the graph.

– For example, a serializability order for Schedule A would be T5 → T1 → T3 → T2 → T4

• Are there others?
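The test can be sketched in a few lines. This version assumes a schedule is a list of `(transaction, kind, item)` tuples and uses `graphlib` from the Python standard library (3.9 or later) for the topological sort. The pairwise scan is quadratic in the schedule length, which is fine for illustration but is not the most efficient construction. The two sample schedules are hypothetical and are not Schedule A.

```python
from graphlib import CycleError, TopologicalSorter

def precedence_graph(schedule):
    """Return the set of arcs (Ti, Tj): Ti accessed an item before Tj did,
    and the two accesses conflict."""
    arcs = set()
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "write" in (kind_i, kind_j):
                arcs.add((ti, tj))
    return arcs

def serial_order(schedule):
    """Return one serializability order, or None if the graph has a cycle."""
    sorter = TopologicalSorter()
    for txn, _, _ in schedule:
        sorter.add(txn)                  # make sure isolated vertices appear
    for ti, tj in precedence_graph(schedule):
        sorter.add(tj, ti)               # ti must come before tj
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

serializable = [("T1", "read", "A"), ("T1", "write", "A"),
                ("T2", "read", "A"), ("T2", "write", "A"),
                ("T1", "read", "B"), ("T1", "write", "B"),
                ("T2", "read", "B"), ("T2", "write", "B")]
not_serializable = [("T1", "read", "A"), ("T2", "write", "A"),
                    ("T1", "write", "A")]

print(serial_order(serializable))
print(serial_order(not_serializable))
```

When the graph is acyclic there may be several valid topological orders; the function returns only one of them.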

Test for View Serializability

• The precedence graph test for conflict serializability cannot be used directly to test for view serializability.

– Extension to test for view serializability has cost exponential in the size of the precedence graph.

• The problem of checking if a schedule is view serializable falls in the class of NP-complete problems.

– Thus existence of an efficient algorithm is extremely unlikely.

• However practical algorithms that just check some sufficient conditions for view serializability can still be used.

Lock-Based Protocols

• A lock is a mechanism to control concurrent access to a data item

• Data items can be locked in two modes :

1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction.

2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.

• Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.

• Lock-compatibility matrix

• A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions

• Any number of transactions can hold shared locks on an item,

– but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.

• If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
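The compatibility rule stated above (shared locks coexist, an exclusive lock excludes everything else) fits in a small lookup table. The names `COMPATIBLE` and `can_grant` are illustrative assumptions.

```python
# (mode already held by another transaction, requested mode) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_by_others):
    """A request is grantable if it is compatible with every lock that
    other transactions currently hold on the item."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

print(can_grant("S", ["S", "S"]))   # readers can share
print(can_grant("X", ["S"]))        # a writer must wait for readers
print(can_grant("X", []))           # nothing held, so the lock is free
```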

• Example of a transaction performing locking:

T2: lock-S(A);
    read(A);
    unlock(A);
    lock-S(B);
    read(B);
    unlock(B);
    display(A+B)

• Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong.

• A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.

Pitfalls of Lock-Based Protocols

• Consider the partial schedule

• Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.

• Such a situation is called a deadlock.

– To handle a deadlock one of T3 or T4 must be rolled back and its locks released.

• The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.

• Starvation is also possible if concurrency control manager is badly designed. For example:

– A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.

– The same transaction is repeatedly rolled back due to deadlocks.

• Concurrency control manager can be designed to prevent starvation.

The Two-Phase Locking Protocol

• This is a protocol which ensures conflict-serializable schedules.

• Phase 1: Growing Phase

– transaction may obtain locks

– transaction may not release locks

• Phase 2: Shrinking Phase

– transaction may release locks

– transaction may not obtain locks

• The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).

• Two-phase locking does not ensure freedom from deadlocks

• Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.

• Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.

• There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.

• However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:

Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable
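Whether a single transaction obeys the two phases can be checked from its lock and unlock sequence alone. A minimal sketch, assuming each transaction is given as a list of `(action, item)` pairs; the function name is hypothetical. The first sample is the lock sequence of the T2 example from the lock-based protocols section, which releases A before locking B.

```python
def is_two_phase(actions):
    """Return True if no lock is requested after the first unlock.

    actions: sequence of (action, item), where action is "lock-S",
    "lock-X" or "unlock"."""
    shrinking = False
    for action, _item in actions:
        if action == "unlock":
            shrinking = True        # the shrinking phase has begun
        elif shrinking:
            return False            # a lock request in the shrinking phase
    return True

# Lock sequence of the earlier T2 example: unlock(A) precedes lock-S(B).
t2_original = [("lock-S", "A"), ("unlock", "A"),
               ("lock-S", "B"), ("unlock", "B")]
# Same transaction with both locks taken before either is released.
t2_two_phase = [("lock-S", "A"), ("lock-S", "B"),
                ("unlock", "A"), ("unlock", "B")]

print(is_two_phase(t2_original), is_two_phase(t2_two_phase))
```

In the rearranged version the lock point is the `lock-S(B)` request, after which the displayed sum can no longer be disturbed by a writer.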

Lock Conversions

• Two-phase locking with lock conversions:

– First Phase:

• can acquire a lock-S on item

• can acquire a lock-X on item

• can convert a lock-S to a lock-X (upgrade)

– Second Phase:

• can release a lock-S

• can release a lock-X

• can convert a lock-X to a lock-S (downgrade)

• This protocol assures serializability, but it still relies on the programmer to insert the various locking instructions.

Automatic Acquisition of Locks

• A transaction Ti issues the standard read/write instruction, without explicit locking calls.

• The operation read(D) is processed as:

    if Ti has a lock on D
      then
        read(D)
      else begin
        if necessary wait until no other transaction has a lock-X on D;
        grant Ti a lock-S on D;
        read(D)
      end

• write(D) is processed as:

    if Ti has a lock-X on D
      then
        write(D)
      else begin
        if necessary wait until no other transaction has any lock on D;
        if Ti has a lock-S on D
          then
            upgrade lock on D to lock-X
          else
            grant Ti a lock-X on D;
        write(D)
      end;

• All locks are released after commit or abort

Implementation of Locking

• A lock manager can be implemented as a separate process to which transactions send lock and unlock requests

• The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock)

• The requesting transaction waits until its request is answered

• The lock manager maintains a data-structure called a lock table to record granted locks and pending requests

• The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked

Lock Table

• Black rectangles indicate granted locks, white ones indicate waiting requests

• Lock table also records the type of lock granted or requested

• New request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks

• Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted

• If transaction aborts, all waiting or granted requests of the transaction are deleted

– lock manager may keep a list of locks held by each transaction, to implement this efficiently
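The queue discipline described above can be sketched as an in-memory table keyed by item name. This is a simplified illustration, not a full lock manager: the class and method names are assumptions, lock upgrades and deadlock handling are left out, and a new request is compared against all earlier requests in the queue (granted or waiting) so that a waiting writer is not overtaken by later readers.

```python
from collections import defaultdict, deque

class LockTable:
    def __init__(self):
        # item name -> queue of [txn, mode, granted] in arrival order
        self.queues = defaultdict(deque)

    def request(self, txn, item, mode):
        """Append the request; grant it only if it is compatible with every
        earlier request on the item. Returns True if granted."""
        queue = self.queues[item]
        granted = all(m == "S" and mode == "S" for _, m, _ in queue)
        queue.append([txn, mode, granted])
        return granted

    def release(self, txn, item):
        """Delete txn's requests on item, then grant waiting requests in order."""
        queue = deque(r for r in self.queues[item] if r[0] != txn)
        self.queues[item] = queue
        held = [m for _, m, g in queue if g]
        for req in queue:
            if req[2]:
                continue
            if all(m == "S" and req[1] == "S" for m in held):
                req[2] = True
                held.append(req[1])
            else:
                break               # keep arrival order: no overtaking

    def abort(self, txn):
        """Delete every waiting or granted request of txn."""
        for item in list(self.queues):
            self.release(txn, item)

    def holders(self, item):
        return [t for t, _, g in self.queues[item] if g]

table = LockTable()
first = table.request("T1", "A", "X")    # granted: the queue was empty
second = table.request("T2", "A", "S")   # waits behind T1's X lock
third = table.request("T3", "A", "S")    # waits as well
table.release("T1", "A")                 # both readers are granted now
print(first, second, third, table.holders("A"))
```

A real lock manager would also keep a per-transaction list of held locks so that `abort` does not have to scan every queue, as the last bullet above suggests.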

Graph-Based Protocols

• Graph-based protocols are an alternative to two-phase locking

• Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items.

– If di → dj then any transaction accessing both di and dj must access di before accessing dj.

– Implies that the set D may now be viewed as a directed acyclic graph, called a database graph.

• The tree-protocol is a simple kind of graph protocol.

Tree Protocol

1. Only exclusive locks are allowed.

2. The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.

3. Data items may be unlocked at any time.

4. A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti

Timestamp-Based Protocols

• Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).

• The protocol manages concurrent execution such that the time-stamps determine the serializability order.

• In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:

– W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.

– R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.

• The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.

• Suppose a transaction Ti issues a read(Q)

1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten.

– Hence, the read operation is rejected, and Ti is rolled back.

2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).

• Suppose that transaction Ti issues write(Q).

1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced.

– Hence, the write operation is rejected, and Ti is rolled back.

2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.

– Hence, this write operation is rejected, and Ti is rolled back.

3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
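The read and write rules translate almost line by line into code. A minimal sketch, assuming integer timestamps, an initial R-timestamp and W-timestamp of 0 for every item, and a `Rollback` exception to signal a rejected operation; all names here are illustrative.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation."""

class TimestampOrdering:
    def __init__(self, data):
        self.data = dict(data)
        self.r_ts = {q: 0 for q in data}   # largest TS that read each item
        self.w_ts = {q: 0 for q in data}   # largest TS that wrote each item

    def read(self, ts, q):
        if ts < self.w_ts[q]:
            raise Rollback(f"read({q}): value already overwritten")
        self.r_ts[q] = max(self.r_ts[q], ts)
        return self.data[q]

    def write(self, ts, q, value):
        if ts < self.r_ts[q]:
            raise Rollback(f"write({q}): a later transaction already read {q}")
        if ts < self.w_ts[q]:
            raise Rollback(f"write({q}): obsolete value")
        self.data[q] = value
        self.w_ts[q] = ts

db = TimestampOrdering({"Q": 10})
db.write(2, "Q", 20)                       # W-timestamp(Q) becomes 2

try:
    db.read(1, "Q")                        # TS 1 < W-timestamp 2
    late_read = "executed"
except Rollback:
    late_read = "rolled back"

value_seen = db.read(3, "Q")               # R-timestamp(Q) becomes 3

try:
    db.write(2, "Q", 30)                   # TS 2 < R-timestamp 3
    late_write = "executed"
except Rollback:
    late_write = "rolled back"

print(late_read, value_seen, late_write)
```

No operation ever waits in this sketch; a transaction either proceeds or is rolled back, which is why the protocol cannot deadlock.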

Example Use of the Protocol

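A partial schedule for several data items for transactions with timestamps 1, 2, 3, 4, 5

[Figure: five-column schedule, one column per transaction. Operations in extracted reading order: read(Y), read(X); read(Z); read(Y); write(Y); write(Z); read(X); abort; read(X); write(Z); abort; write(Y); write(Z). The transaction column of each operation is not recoverable from the text.]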

Correctness of Timestamp-Ordering Protocol

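• The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form Ti → Tj with TS(Ti) < TS(Tj), that is, from the transaction with the smaller timestamp to the transaction with the larger timestamp.

Thus, there will be no cycles in the precedence graph.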

• Timestamp protocol ensures freedom from deadlock as no transaction ever waits.

• But the schedule may not be cascade-free, and may not even be recoverable.

Thomas’ Write Rule

• Modified version of the timestamp-ordering protocol in which obsolete write operations may be ignored under certain circumstances.

• When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.

– Rather than rolling back Ti as the timestamp ordering protocol would have done, this write operation can be ignored.

• Otherwise this protocol is the same as the timestamp ordering protocol.

• Thomas' Write Rule allows greater potential concurrency.

– Allows some view-serializable schedules that are not conflict-serializable.
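Only the write rule changes under Thomas' Write Rule. The sketch below is a self-contained illustration in which the function name, the dictionary layout and the returned status strings are assumptions; it returns a status instead of raising so that the three outcomes are easy to compare.

```python
def thomas_write(ts, q, value, data, r_ts, w_ts):
    """Apply write(q) by a transaction with timestamp ts.

    Returns "rollback", "ignored" or "written"."""
    if ts < r_ts[q]:
        return "rollback"        # a later transaction already read q
    if ts < w_ts[q]:
        return "ignored"         # obsolete write: skip it, do not roll back
    data[q] = value
    w_ts[q] = ts
    return "written"

data, r_ts, w_ts = {"Q": 0}, {"Q": 0}, {"Q": 0}
first = thomas_write(3, "Q", 30, data, r_ts, w_ts)    # newest writer so far
second = thomas_write(2, "Q", 20, data, r_ts, w_ts)   # older, blind write
r_ts["Q"] = 5                                         # pretend TS 5 read Q
third = thomas_write(4, "Q", 40, data, r_ts, w_ts)    # too late for that reader
print(first, second, third, data)
```

The ignored write is a blind write whose value no transaction could legitimately observe, which is why skipping it is safe.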

Validation-Based Protocol

• Execution of transaction Ti is done in three phases.

1. Read and execution phase: Transaction Ti writes only to temporary local variables

2. Validation phase: Transaction Ti performs a "validation test" to determine if local variables can be written without violating serializability.

3. Write phase: If Ti is validated, the updates are applied to the database; otherwise, Ti is rolled back.

• The three phases of concurrently executing transactions can be interleaved, but each transaction must go through the three phases in that order.

– Assume for simplicity that the validation and write phase occur together, atomically and serially

• I.e., only one transaction executes validation/write at a time.

• Also called optimistic concurrency control, since the transaction executes fully in the hope that all will go well during validation

• Each transaction Ti has 3 timestamps

– Start(Ti) : the time when Ti started its execution

– Validation(Ti): the time when Ti entered its validation phase

– Finish(Ti) : the time when Ti finished its write phase

• Serializability order is determined by timestamp given at validation time, to increase concurrency.

– Thus TS(Ti) is given the value of Validation(Ti).

• This protocol is useful and gives greater degree of concurrency if probability of conflicts is low.

– because the serializability order is not pre-decided, and

– relatively few transactions will have to be rolled back.

Validation Test for Transaction Tj

• If for all Ti with TS(Ti) < TS(Tj) one of the following conditions holds:

– finish(Ti) < start(Tj)

– start(Tj) < finish(Ti) < validation(Tj) and the set of data items written by Ti does not intersect with the set of data items read by Tj.

then validation succeeds and Tj can be committed. Otherwise, validation fails and Tj is aborted.

• Justification: Either the first condition is satisfied, and there is no overlapped execution, or the second condition is satisfied and

– the writes of Tj do not affect reads of Ti since they occur after Ti has finished its reads.

– the writes of Ti do not affect reads of Tj since Tj does not read any item written by Ti.
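The validation test can be written directly from the two conditions. A minimal sketch, assuming integer clock values and explicit read and write sets; the `Txn` record, the `validate` function and the sample timings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Txn:
    start: int
    validation: int
    finish: Optional[int] = None          # set when the write phase ends
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(tj, earlier):
    """earlier: every Ti with TS(Ti) < TS(Tj), i.e. validated before Tj."""
    for ti in earlier:
        if ti.finish is None:
            return False                  # Ti has not finished its write phase
        if ti.finish < tj.start:
            continue                      # condition 1: no overlap at all
        if (tj.start < ti.finish < tj.validation
                and not (ti.write_set & tj.read_set)):
            continue                      # condition 2: Tj read nothing Ti wrote
        return False
    return True

# A read-only transaction validates first; the updater overlaps with it.
reader = Txn(start=1, validation=5, finish=6, read_set={"A", "B"})
updater = Txn(start=2, validation=7, read_set={"A", "B"},
              write_set={"A", "B"})
# A writer of A finishes while a later transaction has already read A.
writer = Txn(start=1, validation=3, finish=4, write_set={"A"})
stale = Txn(start=2, validation=5, read_set={"A"})

print(validate(updater, [reader]), validate(stale, [writer]))
```

The first pair passes because the earlier transaction wrote nothing; the second fails because the later transaction may have read A before the earlier one overwrote it.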

Schedule Produced by Validation

• Example of schedule produced using validation

First transaction        Second transaction
read(B)
                         read(B)
                         B := B - 50
                         read(A)
                         A := A + 50
read(A)
(validate)
display(A + B)
                         (validate)
                         write(B)
                         write(A)

Multiple Granularity

• Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones

• Can be represented graphically as a tree (but don't confuse with tree-locking protocol)

• When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.

• Granularity of locking (level in tree where locking is done):

– fine granularity (lower in tree): high concurrency, high locking overhead

– coarse granularity (higher in tree): low locking overhead, low concurrency

Example of Granularity Hierarchy

The levels, starting from the coarsest (top) level are

– database

– area

– file

– record

Intention Lock Modes

• In addition to S and X lock modes, there are three additional lock modes with multiple granularity:

– intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks.

– intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks

– shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks.

• Intention locks allow a higher level node to be locked in S or X mode without having to check all descendant nodes.

Compatibility Matrix with Intention Lock Modes

• The compatibility matrix for all lock modes is:

        IS    IX    S     SIX   X
IS      ✓     ✓     ✓     ✓     ✗
IX      ✓     ✓     ✗     ✗     ✗
S       ✓     ✗     ✓     ✗     ✗
SIX     ✓     ✗     ✗     ✗     ✗
X       ✗     ✗     ✗     ✗     ✗

Multiple Granularity Locking Scheme

• Transaction Ti can lock a node Q, using the following rules:

1. The lock compatibility matrix must be observed.

2. The root of the tree must be locked first, and may be locked in any mode.

3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.

4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.

5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).

6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.

• Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.
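Rules 1, 3 and 4 can be captured with two lookup tables. This is an illustrative sketch: the table and function names are assumptions, and `plan` simply picks the weakest intention mode for the ancestors of the target node, which is one reasonable choice rather than the only one.

```python
# Mode held by another transaction -> modes that may be granted alongside it.
COMPAT = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

# Requested mode -> modes the same transaction must hold on the parent.
PARENT_MODES = {
    "IS": {"IS", "IX"}, "S": {"IS", "IX"},                         # rule 3
    "IX": {"IX", "SIX"}, "SIX": {"IX", "SIX"}, "X": {"IX", "SIX"}, # rule 4
}

def compatible(held, requested):
    return requested in COMPAT[held]

def may_lock(mode, parent_mode):
    """parent_mode is None for the root, which may be locked in any mode."""
    return parent_mode is None or parent_mode in PARENT_MODES[mode]

def plan(path, mode):
    """Lock requests, root first, needed to lock the last node of path."""
    intent = "IS" if mode in ("S", "IS") else "IX"
    return [(node, intent) for node in path[:-1]] + [(path[-1], mode)]

path = ["database", "area", "file", "record"]
write_plan = plan(path, "X")
read_plan = plan(path[:3], "S")           # share-lock a whole file
print(write_plan)
print(read_plan)
print(compatible("IX", "IS"), compatible("S", "IX"))
```

The two plans share the `database` and `area` nodes in IX and IS mode, which are compatible; whether they clash at the file level depends on whether both target the same file (IX against S is not compatible).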

UNIT-VIII

1. Data on external storage &

File organization and indexing

2. Index data structures

3. Comparison of file organizations

4. Comparison of file organizations

5. Indexes and performance tuning

6. Indexes and performance tuning

7. Intuition for tree indexes & ISAM

8. B+ tree

Data on External Storage

• Disks: Can retrieve random page at fixed cost

– But reading several consecutive pages is much cheaper than reading them in random order

• Tapes: Can only read pages in sequence

– Cheaper than disks; used for archival storage

• File organization: Method of arranging a file of records on external storage.

– Record id (rid) is sufficient to physically locate record

– Indexes are data structures that allow us to find the record ids of records with given values in index search key fields

• Architecture: Buffer manager stages pages from external storage to main memory buffer pool. File and index layers make calls to the buffer manager.

Alternative File Organizations

Many alternatives exist, each ideal for some situations, and not so good in others:

– Heap (random order) files: Suitable when typical access is a file scan retrieving all records.

– Sorted Files: Best if records must be retrieved in some order, or only a 'range' of records is needed.

– Indexes: Data structures to organize records via trees or hashing.

• Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields

• Updates are much faster than in sorted files.

Index Classification

• Primary vs. secondary: If search key contains primary key, then called primary index.

– Unique index: Search key contains a candidate key.

• Clustered vs. unclustered: If order of data records is the same as, or 'close to', order of data entries, then called clustered index.

– Alternative 1 implies clustered; in practice, clustered also implies Alternative 1 (since sorted files are rare).

– A file can be clustered on at most one search key.

– Cost of retrieving data records through index varies greatly based on whether index is clustered or not!

Clustered vs. Unclustered Index

• Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.

– To build clustered index, first sort the Heap file (with some free space on each page for future inserts).

– Overflow pages may be needed for inserts. (Thus, order of data recs is 'close to', but not identical to, the sort order.)

[Figure: CLUSTERED and UNCLUSTERED index panels. Labels: index entries direct search for data entries; data entries (index file); data records (data file).]

Indexes

• An index on a file speeds up selections on the search key fields for the index.

– Any subset of the fields of a relation can be the search key for an index on the relation.

– Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation).

• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.

– Given data entry k*, we can find record with key k in at most one disk I/O. (Details soon …)

B+ Tree Indexes

• Leaf pages contain data entries, sorted by search key, and are chained (prev & next).

• Non-leaf pages have index entries; only used to direct searches.

Example B+ Tree

Hash-Based Indexes

Alternatives for Data Entry k* in Index

• In a data entry k* we can store:

– Data record with key value k, or

– <k, rid of data record with search key value k>, or

– <k, list of rids of data records with search key k>

• Choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k.

– Examples of indexing techniques: B+ trees, hash-based structures

– Typically, index contains auxiliary information that directs searches to the desired data entries

• Alternatives 2 and 3:

– Data entries typically much smaller than data records. So, better than Alternative 1 with large data records, especially if search keys are small. (Portion of index structure used to direct search, which depends on size of data entries, is much smaller than with Alternative 1.)

– Alternative 3 more compact than Alternative 2, but leads to variable sized data entries even if search keys are of fixed length.
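The three alternatives for data entries can be sketched in a few lines of Python. This is only an illustration: the record ids, names and ages are made up, and the "index" is a plain sorted list rather than a B+ tree or hash structure.

```python
from collections import defaultdict

# Hypothetical data file: rid (page, slot) -> record (name, age).
records = {
    (1, 0): ("Ana", 30),
    (1, 1): ("Raj", 25),
    (2, 0): ("Lee", 30),
}

def search_key(record):
    """The index in this sketch is built on age."""
    return record[1]

# Alternative 1: the data entry k* is the data record itself.
alt1 = sorted(records.values(), key=search_key)

# Alternative 2: one <k, rid> pair per data record.
alt2 = sorted((search_key(rec), rid) for rid, rec in records.items())

# Alternative 3: one <k, list of rids> entry per distinct key value.
groups = defaultdict(list)
for rid, rec in sorted(records.items()):
    groups[search_key(rec)].append(rid)
alt3 = sorted(groups.items())

print(alt1)
print(alt2)
print(alt3)
```

Alternative 3 stores each key value once, which is why it is more compact, but the rid list for age 30 is longer than the one for age 25, so entries vary in size.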

Cost Model for Our Analysis

We ignore CPU costs, for simplicity:

– B: The number of data pages

– R: Number of records per page

– D: (Average) time to read or write disk page

– Measuring number of page I/O’s ignores gains of pre-fetching a sequence of pages; thus, even I/O cost is only approximated.

– Average-case analysis; based on several simplistic assumptions.

Comparing File Organizations

• Heap files (random order; insert at eof)

• Sorted files, sorted on <age, sal>

• Clustered B+ tree file, Alternative (1), search key <age, sal>

• Heap file with unclustered B+ tree index on search key <age, sal>

• Heap file with unclustered hash index on search key <age, sal>

Operations to Compare

• Scan: Fetch all records from disk

• Equality search

• Range selection

• Insert a record

• Delete a record

Assumptions in Our Analysis

• Heap Files:

– Equality selection on key; exactly one match.

• Sorted Files:

– Files compacted after deletions.

• Indexes:

– Alt (2), (3): data entry size = 10% size of record

– Hash: No overflow buckets.

• 80% page occupancy => File size = 1.25 data size

– Tree: 67% occupancy (this is typical).

• Implies file size = 1.5 data size

• Scans:

– Leaf levels of a tree-index are chained.

– Index data-entries plus actual file scanned for unclustered indexes.

• Range searches:

– We use tree indexes to restrict the set of data records fetched, but ignore hash indexes.

Cost of Operations
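As a rough sketch of how the cost model is applied, the following Python functions compute the usual textbook I/O estimates for the two simplest organizations, heap files and sorted files, in terms of B and D. The function names and the `matching_pages` parameter are assumptions made for this illustration; the index-based organizations are left out.

```python
import math

def heap_costs(B, D):
    """Estimated I/O time for a heap file with B pages and D time per page."""
    search = 0.5 * B * D                 # on average, half the file is scanned
    return {
        "scan": B * D,
        "equality": search,              # exactly one match assumed
        "range": B * D,                  # whole file must be read
        "insert": 2 * D,                 # read last page, write it back
        "delete": search + D,            # find the record, write the page back
    }

def sorted_costs(B, D, matching_pages=1):
    """Estimated I/O time for a file sorted on the search key."""
    search = D * math.log2(B)            # binary search over the pages
    return {
        "scan": B * D,
        "equality": search,
        "range": D * (math.log2(B) + matching_pages),
        "insert": search + B * D,        # shift following records
        "delete": search + B * D,        # compact after deletion
    }

heap = heap_costs(B=1024, D=1)
srt = sorted_costs(B=1024, D=1)
print(heap["equality"], srt["equality"])
```

With 1024 pages, an equality search touches about 512 pages in the heap file but only about 10 in the sorted file, while inserts show the opposite trade-off.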

Understanding the Workload

• For each query in the workload:

– Which relations does it access?

– Which attributes are retrieved?

– Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?

• For each update in the workload:

– Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?

– The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.

Choice of Indexes

• What indexes should we create?

– Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?

• For each index, what kind of an index should it be?

– Clustered? Hash/tree?

• One approach: Consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.

– Obviously, this implies that we must understand how a DBMS evaluates queries and creates query evaluation plans!

– For now, we discuss simple 1-table queries.

• Before creating an index, must also consider the impact on updates in the workload!

– Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too.

Index Selection Guidelines

• Attributes in WHERE clause are candidates for index keys.

– Exact match condition suggests hash index.

– Range query suggests tree index.

• Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.

• Multi-attribute search keys should be considered when a WHERE clause contains several conditions.

– Order of attributes is important for range queries.

– Such indexes can sometimes enable index-only strategies for important queries.

• For index-only strategies, clustering is not important!

Examples of Clustered Indexes

• B+ tree index on E.age can be used to get qualifying tuples.

– How selective is the condition?

– Is the index clustered?

• Consider the GROUP BY query.

– If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly.

– Clustered E.dno index may be better!

• Equality queries and duplicates:

– Clustering on E.hobby helps!

SELECT E.dno
FROM Emp E
WHERE E.age>40

SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno

SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'

Indexes with Composite Search Keys

• Composite Search Keys: Search on a combination of fields.

– Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:

• age=20 and sal=75

– Range query: Some field value is not a constant. E.g.:

• age=20; or age=20 and sal > 10

• Data entries in index sorted by search key to support range queries.

– Lexicographic order, or

– Spatial order.

[Figure: data entries in index sorted by <sal,age>; data entries sorted by <sal>]

Composite Search Keys

• To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.

– Choice of index key orthogonal to clustering etc.

• If condition is: 20<age<30 AND 3000<sal<5000:

– Clustered tree index on <age,sal> or <sal,age> is best.

• If condition is: age=30 AND 3000<sal<5000:

– Clustered <age,sal> index much better than <sal,age> index!

• Composite indexes are larger, updated more often.
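To see why attribute order matters for the condition age=30 AND 3000<sal<5000, here is a small Python sketch that models each composite index as a list of data entries in lexicographic order. The employee values are invented for the illustration, and a sorted list stands in for the leaf level of a tree index.

```python
import bisect

# Hypothetical (age, sal) pairs for Emp records.
emps = [(30, 2500), (30, 3500), (30, 4500), (25, 4000), (35, 4200), (30, 6000)]

by_age_sal = sorted(emps)                              # index on <age,sal>
by_sal_age = sorted((sal, age) for age, sal in emps)   # index on <sal,age>

# <age,sal>: the qualifying entries form one contiguous run.
lo = bisect.bisect_right(by_age_sal, (30, 3000))
hi = bisect.bisect_left(by_age_sal, (30, 5000))
run = by_age_sal[lo:hi]

# <sal,age>: only the sal range narrows the scan; age is checked per entry.
lo2 = bisect.bisect_right(by_sal_age, (3000, float("inf")))
hi2 = bisect.bisect_left(by_sal_age, (5000, float("-inf")))
scanned = by_sal_age[lo2:hi2]
matches = [(sal, age) for sal, age in scanned if age == 30]

print(len(run), "entries read on <age,sal>")
print(len(scanned), "entries read on <sal,age> for", len(matches), "matches")
```

On <age,sal> every entry read is a match, whereas on <sal,age> the scan also passes over entries for other ages that fall in the salary range.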

Index-Only Plans

• A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000

Index on <E.age,E.sal> or <E.sal,E.age>
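An index-only plan can be observed with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for the DBMS, the table contents and the index name emp_age_sal are made up, and the exact wording of the plan text depends on the SQLite version. SQLite calls an index that can answer a query on its own a "covering index".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (eid INTEGER PRIMARY KEY, age INTEGER, sal INTEGER, dno INTEGER)"
)
conn.executemany(
    "INSERT INTO Emp (age, sal, dno) VALUES (?, ?, ?)",
    [(25, 3500, 1), (25, 4500, 2), (25, 6000, 1), (40, 4000, 3)],
)
# Composite index on <age, sal>; the name is hypothetical.
conn.execute("CREATE INDEX emp_age_sal ON Emp (age, sal)")

query = (
    "SELECT AVG(E.sal) FROM Emp E "
    "WHERE E.age = 25 AND E.sal BETWEEN 3000 AND 5000"
)
avg_sal = conn.execute(query).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

print(avg_sal)
print(plan)
```

Both age and sal are stored in the index entries, so the average can be computed without fetching any Emp record.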

Summary

• Many alternative file organizations exist, each appropriate in some situation.

• If selection queries are frequent, sorting the file or building an index is important.

– Hash-based indexes only good for equality search.

– Sorted files and tree-based indexes best for range search; also good for equality search. (Files rarely kept sorted in practice; B+ tree index is better.)

• Index is a collection of data entries plus a way to quickly find entries with given key values.

• Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs.

– Choice orthogonal to indexing technique used to locate data entries with a given key value.

• Can have several indexes on a given file of data records, each with a different search key.

• Indexes can be classified as clustered vs. unclustered, primary vs. secondary, and dense vs. sparse. Differences have important consequences for utility/performance.

Introduction

• As for any index, 3 alternatives for data entries k*:

– Data record with key value k

– <k, rid of data record with search key value k>

– <k, list of rids of data records with search key k>

• Choice is orthogonal to the indexing technique used to locate data entries k*.

• Tree-structured indexing techniques support both range searches and equality searches.

• ISAM: static structure; B+ tree: dynamic, adjusts gracefully under inserts and deletes.

Range Searches

• 'Find all students with gpa > 3.0'

– If data is in sorted file, do binary search to find first such student, then scan to find others.

– Cost of binary search can be quite high.
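The sorted-file approach can be sketched with Python's bisect module, treating an in-memory sorted list as the file. The gpa values are invented for the example.

```python
import bisect

# Hypothetical file of student gpas, already sorted on gpa.
gpas = [2.1, 2.8, 3.0, 3.0, 3.4, 3.9]

# Binary search for the first entry strictly greater than 3.0 ...
first = bisect.bisect_right(gpas, 3.0)

# ... then scan sequentially from there to collect the others.
result = gpas[first:]

print(first, result)
```

On disk, each probe of the binary search is a page I/O, which is the cost the index file in the next step is meant to reduce.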

• Simple idea: Create an 'index' file.

ISAM

[Figure: ISAM index entry layout]

Comments on ISAM

• File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages.

• Index entries: <search key value, page id>; they 'direct' search for data entries, which are in leaf pages.

• Search: Start at root; use key comparisons to go to leaf. Cost is proportional to $\log_F N$, where F = # entries per index page and N = # leaf pages.

• Insert: Find the leaf that the data entry belongs to, and put it there.

• Delete: Find and remove from leaf; if an overflow page becomes empty, de-allocate it.

Example ISAM Tree

• Each node can hold 2 entries; no need for 'next-leaf-page' pointers. (Why?)

B+ Tree: Most Widely Used Index

• Insert/delete at $\log_F N$ cost; keep tree height-balanced. (F = fanout, N = # leaf pages)

• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.

• Supports equality and range-searches efficiently.

Example B+ Tree

• Search begins at root, and key comparisons direct it to a leaf (as in ISAM).

• Search for 5*, 15*, all data entries >= 24* ...

B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.

– average fanout = 133

• Typical capacities:

– Height 4: $133^4$ = 312,900,721 records

– Height 3: $133^3$ = 2,352,637 records

• Can often hold top levels in buffer pool:

– Level 1 = 1 page = 8 Kbytes

– Level 2 = 133 pages = 1 Mbyte

– Level 3 = 17,689 pages = 133 MBytes
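The capacity figures follow directly from the fanout. A minimal Python check, assuming the 67% fill factor is read as two-thirds of the 2d = 200 entries a node can hold:

```python
order = 100                       # d: each node holds between d and 2d entries
fanout = (2 * order * 2) // 3     # 2d entries at two-thirds full -> 133

# Number of data entries reachable through a tree of the given height.
capacity = {height: fanout ** height for height in (3, 4)}

# Number of pages on the top three levels of the tree.
pages_per_level = [fanout ** level for level in range(3)]

print(fanout, capacity, pages_per_level)
```

The page counts 1, 133 and 17,689 are what make it practical to keep the top levels in the buffer pool.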

Inserting a Data Entry into a B+ Tree

• Find correct leaf L.

• Put data entry onto L.

– If L has enough space, done!

– Else, must split L (into L and a new node L2)

• Redistribute entries evenly, copy up middle key.

• Insert index entry pointing to L2 into parent of L.

• This can happen recursively

– To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)

• Splits “grow” tree; root split increases height.

Tree growth: gets wider or one level taller at top.
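The insertion steps above can be sketched in Python as follows. This is a simplified in-memory illustration, not a disk-based implementation: the class and method names are invented, leaves hold bare keys instead of full data entries, leaf chaining is omitted, and the starting tree in the usage part is an assumed example of order d = 2.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys              # sorted keys
        self.children = children      # None for a leaf, else list of child nodes

    @property
    def is_leaf(self):
        return self.children is None

class BPlusTree:
    def __init__(self, d=2, root=None):
        self.d = d                    # order: non-root nodes hold d..2d entries
        self.root = root if root is not None else Node([])

    def find_leaf(self, key):
        node = self.root
        while not node.is_leaf:       # key comparisons direct the search
            node = node.children[bisect.bisect_right(node.keys, key)]
        return node

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:         # root split: tree grows one level taller
            separator, right = split
            self.root = Node([separator], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (separator, new right node) on a split."""
        if node.is_leaf:
            bisect.insort(node.keys, key)
            if len(node.keys) <= 2 * self.d:
                return None
            right = Node(node.keys[self.d:])
            node.keys = node.keys[:self.d]
            return right.keys[0], right           # copy up: key stays in the leaf
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
        if len(node.keys) <= 2 * self.d:
            return None
        middle = node.keys[self.d]                # push up: key leaves this node
        new_right = Node(node.keys[self.d + 1:], node.children[self.d + 1:])
        node.keys = node.keys[:self.d]
        node.children = node.children[:self.d + 1]
        return middle, new_right

# Assumed starting tree: a full root over five leaves.
leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
tree = BPlusTree(d=2, root=Node([13, 17, 24, 30], leaves))
tree.insert(8)
print(tree.root.keys)                             # [17]
print([child.keys for child in tree.root.children])
```

Inserting 8 overflows the first leaf, so 5 is copied up; that overflows the root, so 17 is pushed up into a new root and the tree becomes one level taller. Note that 5 still appears in a leaf while 17 appears only once among the index keys.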

Inserting 8* into Example B+ Tree

• Observe how minimum occupancy is guaranteed in both leaf and index pg splits.

• Note difference between copy-up and push-up; be sure you understand the reasons for this.

Example B+ Tree After Inserting 8*

Notice that root was split, leading to increase in height

In this example, we can avoid split by re-distributing entries; however, this is usually not done in practice.

Deleting a Data Entry from a B+ Tree

• Start at root, find leaf L where entry belongs.

• Remove the entry.

– If L is at least half-full, done!

– If L has only d-1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• If re-distribution fails, merge L and sibling.

• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

Merge could propagate to root, decreasing height.
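The leaf-level decision in the deletion steps (done, re-distribute, or merge) can be sketched as a small Python function. It is a simplification made for illustration: leaves are plain lists, only the right sibling is considered, and the function name and return format are invented. The key values in the usage lines are an assumed example with d = 2.

```python
def delete_from_leaf(leaf, right_sibling, key, d):
    """Remove key from leaf and repair underflow using the right sibling.

    Returns (action, leaf, right_sibling, new_separator), where new_separator
    is the key to copy up into the parent after a re-distribution.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= d:
        return "done", leaf, right_sibling, None
    if len(right_sibling) > d:
        # Sibling can spare an entry: borrow its smallest key.
        leaf = leaf + [right_sibling[0]]
        right_sibling = right_sibling[1:]
        return "redistribute", leaf, right_sibling, right_sibling[0]
    # Sibling is at minimum occupancy: merge; parent must drop one entry.
    return "merge", leaf + right_sibling, [], None

step1 = delete_from_leaf([19, 20, 22], [24, 27, 29], 19, d=2)
step2 = delete_from_leaf(step1[1], step1[2], 20, d=2)
step3 = delete_from_leaf(step2[1], step2[2], 24, d=2)
print(step1)
print(step2)
print(step3)
```

The first deletion leaves the leaf half-full, the second borrows 24 from the sibling and copies 27 up as the new separator, and the third forces a merge because the sibling has no entry to spare.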

Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...

• Deleting 19* is easy.

• Deleting 20* is done with re-distribution. Notice how middle key is copied up.

... And Then Deleting 24*

• Must merge.

• Observe 'toss' of index entry (on right), and 'pull down' of index entry (below).

ADDONS

What's a database ?

A database is a collection of data organized in a particular way. Databases can be of many types, such as Flat File Databases, Relational Databases, Distributed Databases etc.

What's SQL ?

In the early 1970s, IBM researchers created a simple non-procedural language called Structured English Query Language, or SEQUEL. This was based on Dr. Edgar F. (Ted) Codd's design of a relational model for data storage, in which he described a universal programming language for accessing databases.

In the late 80's ANSI and ISO (two organizations dealing with standards for a wide variety of things) came out with a standardized version called Structured Query Language or SQL. SQL is often pronounced 'Sequel'. There have been several versions of SQL; SQL-99 was the latest when these notes were written, though SQL-92 was the most universally adopted standard.

SQL is the language used to query relational databases. It's simple to learn and appears to do very little, but it is the heart of a successful database application. Understanding SQL and using it efficiently is essential in designing an efficient database application. The better your understanding of SQL, the more versatile you'll be in getting information out of databases.

What's an RDBMS ?

This concept was first described in 1970 by Dr. Edgar F. Codd, an IBM researcher, in a paper called "A Relational Model of Data for Large Shared Data Banks".

A relational database uses the concept of linked two-dimensional tables which consist of rows and columns. A user can draw relationships between multiple tables and present the output as a table again. A user of a relational database need not understand the representation of data in order to retrieve it. Relational programming is non-procedural.

[What's procedural and non-procedural ?

Programming languages are procedural if they use programming elements such as conditional statements (if-then-else, do-while etc.). SQL has none of these types of statements.]

In 1979, Relational Software released Oracle V.2, generally described as the first commercially available relational database.

What's a DBMS ?

MySQL and mSQL are database management systems or DBMS. These software packages are used to manipulate a database. All DBMSs use their own implementation of SQL. It may be a subset or a superset of the instructions provided by SQL 92.

MySQL, due to its simplicity, uses a subset of SQL 92 (also known as SQL2).

What's Database Normalization ?

Normalization is the process where a database is designed in a way that removes redundancies, and increases the clarity in organizing data in a database.

In easy English, it means taking similar stuff out of a collection of data and placing it into tables. Keep doing this for each new table recursively and you'll have a Normalized database. From this resultant database you should be able to recreate the data in its original state if there is a need to do so.

The important thing here is to know when to Normalize and when to be practical. That will come with experience. For now, read on...

Normalization of a database makes the design easier to modify later and helps in being prepared if a change is required in the database design. Normalization raises the efficiency of the database in terms of management, data storage and scalability.

Now Normalization of a Database is achieved by following a set of rules called 'forms' in creating the database.

These rules are 5 in number (with one extra one stuck in-between 3&4) and they are:

1st Normal Form or 1NF:

Each column holds a single (atomic) value, and there are no repeating groups of columns.

2nd Normal Form or 2NF:

The entity under consideration should already be in 1NF, and every non-key attribute within the entity should depend on the whole of the entity's unique identifier, not on just a part of a composite key.

3rd Normal Form or 3NF:

The entity should already be in 2NF, and no non-key column should depend on any other non-key column; each one depends only on the key for the table. If such a dependent column exists, move it outside into a new table.

Now if these three NFs are achieved, the database is considered normalized. But there are three more 'extended' NFs for the elitist.

These are:

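BCNF (Boyce & Codd):

The database should be in 3NF, and every determinant (a column or set of columns that other columns depend on) must be a candidate key.

4NF:

The table should be in BCNF and must not store two or more independent multi-valued facts about the same key.

5NF:

The table should be in 4NF, and any way of splitting it into smaller tables that can be rejoined without loss must follow from its candidate keys.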

Well, this is a highly simplified explanation of Database Normalization. One can study this process extensively, though. After working with databases for some time you'll automatically create Normalized databases, as it's logical and practical.
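As a small illustration of the idea behind 3NF, the following Python sketch splits a table in which one non-key column depends on another, then rejoins the pieces to show that the original data can be recreated. The table, its column names and its rows are invented for the example, and Python sets of tuples stand in for database tables.

```python
# Hypothetical unnormalized table: (order_id, customer, customer_city).
# customer_city depends on customer, not on the key order_id.
orders = {
    (1, "Asha", "Pune"),
    (2, "Ben", "Leeds"),
    (3, "Asha", "Pune"),
}

# Move the dependent column out into its own table.
order_customer = {(order_id, customer) for order_id, customer, _ in orders}
customer_city = {(customer, city) for _, customer, city in orders}

# Rejoin on customer to recreate the original rows.
rejoined = {
    (order_id, customer, city)
    for order_id, customer in order_customer
    for other_customer, city in customer_city
    if other_customer == customer
}

print(sorted(customer_city))
print(rejoined == orders)
```

The city for Asha is now stored once instead of once per order, and the join still returns exactly the original rows.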

For now, don't worry too much about Normalization. The quickest way to grasp SQL and Databases is to plunge headlong into creating tables and start messing around with SQL statements. After you go through the tutorial examples and also the example contacts database, look at the example provided in creating a normalized database near the very end of this tutorial. And then try to think how you would like to create your own database.

Much of database design depends on how YOU want to keep the data. In real life situations often you may find it more convenient to store data in tables designed in a way that does fall a bit short of keeping all the NFs happy. But that's what databases are all about. Making your life simpler.

Onto SQL

There are four basic commands which are the workhorses for SQL and figure in almost all queries to a database.

INSERT - Insert Data

DELETE - Delete Data

SELECT - Pull Data

UPDATE - Change existing Data

As you can see SQL is like English.

Let's build a real world example database using MySQL and perform some SQL operations on it.

A database that practically anyone could use would be a Contacts database.

In our example we are going to create a database with the following fields:

FirstName

LastName

BirthDate

StreetAddress

City

State

Zip

Country

TelephoneHome

TelephoneWork

Email

CompanyName

Designation

First, let's decide how we are going to store this data in the database. For illustration purposes, we are going to keep this data in multiple tables.

This will let us exercise all of the SQL commands pertaining to retrieving data from multiple tables. Also we can separate different kinds of entities into different tables. So let's say you have thousands of friends and need to send a mass email to all of them, a SELECT statement (covered later) will look at only one table.

Well, we can keep the FirstName, LastName and BirthDate in one table. Address related data in another. Company Details in another. Emails can be separated into another. Telephones can be separated into another.

Let's build the database in MySQL.

While building a database, you need to understand the concept of data types. Data types allow the user to define how data is stored in fields or cells within a database. It's a way to define how your data will actually exist: whether it's a date, a string consisting of 20 characters, an integer, etc. When we build tables within a database we also define the contents of each field in each row in the table using a data type. It's imperative that you use only the data type that fits your needs and don't use a data type that reserves more memory than the data in the field actually requires.

Let's look at various Data Types under MySQL.

Type | Size in bytes | Description
TINYINT(length) | 1 | Integer with unsigned range of 0 to 255 and a signed range from -128 to 127
SMALLINT(length) | 2 | Integer with unsigned range of 0 to 65535 and a signed range from -32768 to 32767
MEDIUMINT(length) | 3 | Integer with unsigned range of 0 to 16777215 and a signed range from -8388608 to 8388607
INT(length) | 4 | Integer with unsigned range of 0 to 4294967295 and a signed range from -2147483648 to 2147483647
BIGINT(length) | 8 | Integer with unsigned range of 0 to 18446744073709551615 and a signed range from -9223372036854775808 to 9223372036854775807
FLOAT(length, decimal) | 4 | Floating point number with max. value +/-3.402823466E38 and min. (non-zero) value +/-1.175494351E-38
DOUBLE PRECISION(length, decimal) | 8 | Floating point number with max. value +/-1.7976931348623157E308 and min. (non-zero) value +/-2.2250738585072014E-308
DECIMAL(length, decimal) | length | Floating point number with the range of the DOUBLE type that is stored as a CHAR field type.
TIMESTAMP(length) | 4 | YYYYMMDDHHMMSS or YYMMDDHHMMSS or YYYYMMDD, YYMMDD. A Timestamp value is updated each time the row changes value. A NULL value sets the field to the current time.
DATE | 3 | YYYY-MM-DD
TIME | 3 | HH:MM:SS
DATETIME | 8 | YYYY-MM-DD HH:MM:SS
YEAR | 1 | YYYY or YY
CHAR(length) | length | A fixed length text string where fields shorter than the assigned length are filled with trailing spaces.
VARCHAR(length) | length | A variable length text string (255 character max) where unused trailing spaces are removed before storing.
TINYTEXT | length+1 | A text field with max. length of 255 characters.
TINYBLOB | length+1 | A binary field with max. length of 255 characters.
TEXT | length+2 | 64Kb of text
BLOB | length+2 | 64Kb of data
MEDIUMTEXT | length+3 | 16Mb of text
MEDIUMBLOB | length+3 | 16Mb of data
LONGTEXT | length+4 | 4GB of text
LONGBLOB | length+4 | 4GB of data
ENUM | 1,2 | This field can contain one of a possible 65535 number of options. Ex: ENUM('abc','def','ghi')
SET | 1-8 | This type of field can contain any number of a set of predefined possible values.

The following examples will make things quite clear on declaring Data Types within SQL statements.

Steps in Creating the Database using MySQL

From the shell prompt (either in DOS or UNIX):

mysqladmin create contacts;

This will create an empty database called "contacts".

Now run the command line tool "mysql" and from the mysql prompt do the following:

mysql> use contacts;

(You'll get the response "Database changed")

The following commands entered into the MySQL prompt will create the tables in the database.

mysql> CREATE TABLE names (contact_id SMALLINT NOT NULL AUTO_INCREMENT PRIMARY KEY, FirstName CHAR(20), LastName CHAR(20), BirthDate DATE);

mysql> CREATE TABLE address (contact_id SMALLINT NOT NULL PRIMARY KEY, StreetAddress CHAR(50), City CHAR(20), State CHAR(20), Zip CHAR(10), Country CHAR(20));

mysql> CREATE TABLE telephones (contact_id SMALLINT NOT NULL PRIMARY KEY, TelephoneHome CHAR(20), TelephoneWork CHAR(20));

mysql> CREATE TABLE email (contact_id SMALLINT NOT NULL PRIMARY KEY, Email CHAR(20));

mysql> CREATE TABLE company_details (contact_id SMALLINT NOT NULL PRIMARY KEY, CompanyName CHAR(25), Designation CHAR(20));

Note: Here we assume that one person will have only one email address. If one person had multiple email addresses, this design would be a problem: we'd need another field to indicate to whom each email address belonged. In this particular case email data ownership is indicated by the primary key. The same is true for telephones. We are assuming that one person has only one home telephone and one work telephone number. This need not be true. Similarly, one person could work for multiple companies at the same time, holding two different designations. In all these cases an extra field will solve the issue. For now, however, let's work with this small design.

KEYS:

The relationships between columns located in different tables are usually described through the use of keys.

As you can see we have a PRIMARY KEY in each table. The Primary key uniquely identifies a row and so serves as a mechanism to refer to the other fields within that row. In this case, the Primary key is used to identify a relationship between a row under consideration and the person whose name is located inside the 'names' table. We use the AUTO_INCREMENT attribute only for the 'names' table, as we need to use the generated contact_id number in all the other tables for identification of the rows.

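Because contact_id is the primary key in every table, each row in 'names' matches at most one row in each of the other tables; this is a 'one to one' relationship. If a table allowed several rows with the same contact_id (several email addresses for one person, say), the relationship would be 'one to many'. A 'many to many' relationship is usually built with an extra linking table that holds the keys of both sides.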

Foreign Key:

A foreign key is a field in a table which is also the Primary Key in another table. The rule that every foreign key value must match an existing primary key value is commonly known as 'referential integrity'.
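Referential integrity can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for MySQL, so the column types and the AUTOINCREMENT spelling differ from the tutorial's statements, the REFERENCES clause is an addition not present in the tutorial's tables, and the email addresses are placeholders. SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")     # off by default in SQLite

conn.execute(
    "CREATE TABLE names ("
    " contact_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " FirstName TEXT, LastName TEXT)"
)
# email.contact_id is both the primary key here and a foreign key into names.
conn.execute(
    "CREATE TABLE email ("
    " contact_id INTEGER PRIMARY KEY REFERENCES names(contact_id),"
    " Email TEXT)"
)

cur = conn.execute("INSERT INTO names (FirstName, LastName) VALUES ('Yamila', 'Diaz')")
new_id = cur.lastrowid                       # the generated contact_id
conn.execute("INSERT INTO email VALUES (?, ?)", (new_id, "yamila@example.com"))

# An email row for a contact_id that does not exist in names is refused.
try:
    conn.execute("INSERT INTO email VALUES (?, ?)", (99, "nobody@example.com"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

email_count = conn.execute("SELECT COUNT(*) FROM email").fetchone()[0]
print(new_id, rejected, email_count)
```

The second insert fails because no row in names has contact_id 99, so the email table keeps only the row that points at a real contact.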

Execute the following commands to see the newly created tables and their contents.

To see the tables inside the database:

mysql> SHOW TABLES;
+-----------------------+
| Tables in contacts    |
+-----------------------+
| address               |
| company_details       |
| email                 |
| names                 |
| telephones            |
+-----------------------+
5 rows in set (0.00 sec)

To see the columns within a particular table:

mysql> SHOW COLUMNS FROM address;
+---------------+-------------+------+-----+---------+-------+---------------------------------+
| Field         | Type        | Null | Key | Default | Extra | Privileges                      |
+---------------+-------------+------+-----+---------+-------+---------------------------------+
| contact_id    | smallint(6) |      | PRI | 0       |       | select,insert,update,references |
| StreetAddress | char(50)    | YES  |     | NULL    |       | select,insert,update,references |
| City          | char(20)    | YES  |     | NULL    |       | select,insert,update,references |
| State         | char(20)    | YES  |     | NULL    |       | select,insert,update,references |
| Zip           | char(10)    | YES  |     | NULL    |       | select,insert,update,references |
| Country       | char(20)    | YES  |     | NULL    |       | select,insert,update,references |
+---------------+-------------+------+-----+---------+-------+---------------------------------+
6 rows in set (0.00 sec)

So we have the tables created and ready. Now we put in some data. Let's start with the 'names' table as it uses a unique AUTO_INCREMENT field which in turn is used in the other tables.

Inserting data, one row at a time:

mysql> INSERT INTO names (FirstName, LastName, BirthDate) VALUES ('Yamila','Diaz','1974-10-13');
Query OK, 1 row affected (0.00 sec)

Inserting multiple rows at a time:

mysql> INSERT INTO names (FirstName, LastName, BirthDate) VALUES ('Nikki','Taylor','1972-03-04'),('Tia','Carrera','1975-09-18');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

Let's see what the data looks like inside the table. We use the SELECT command for this.

mysql> SELECT * FROM names;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)

Try another handy command called 'DESCRIBE'.

mysql> DESCRIBE names;
+------------+-------------+------+-----+---------+----------------+---------------------------------+
| Field      | Type        | Null | Key | Default | Extra          | Privileges                      |
+------------+-------------+------+-----+---------+----------------+---------------------------------+
| contact_id | smallint(6) |      | PRI | NULL    | auto_increment | select,insert,update,references |
| FirstName  | char(20)    | YES  |     | NULL    |                | select,insert,update,references |
| LastName   | char(20)    | YES  |     | NULL    |                | select,insert,update,references |
| BirthDate  | date        | YES  |     | NULL    |                | select,insert,update,references |
+------------+-------------+------+-----+---------+----------------+---------------------------------+
4 rows in set (0.00 sec)

Now let's populate the other tables. Observe the syntax used.

mysql> INSERT INTO address (contact_id, StreetAddress, City, State, Zip, Country) VALUES ('1','300 Yamila Ave.','Los Angeles','CA','300012','USA'),('2','4000 Nikki St.','Boca Raton','FL','500034','USA'),('3','404 Tia Blvd.','New York','NY','10011','USA');
Query OK, 3 rows affected (0.05 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM address;
+------------+-----------------+-------------+-------+--------+---------+
| contact_id | StreetAddress   | City        | State | Zip    | Country |
+------------+-----------------+-------------+-------+--------+---------+
|          1 | 300 Yamila Ave. | Los Angeles | CA    | 300012 | USA     |
|          2 | 4000 Nikki St.  | Boca Raton  | FL    | 500034 | USA     |
|          3 | 404 Tia Blvd.   | New York    | NY    | 10011  | USA     |
+------------+-----------------+-------------+-------+--------+---------+
3 rows in set (0.00 sec)

mysql> INSERT INTO company_details (contact_id, CompanyName, Designation) VALUES ('1','Xerox','New Business Manager'),('2','Cabletron','Customer Support Eng'),('3','Apple','Sales Manager');
Query OK, 3 rows affected (0.05 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM company_details;
+------------+-------------+----------------------+
| contact_id | CompanyName | Designation          |
+------------+-------------+----------------------+
|          1 | Xerox       | New Business Manager |
|          2 | Cabletron   | Customer Support Eng |
|          3 | Apple       | Sales Manager        |
+------------+-------------+----------------------+
3 rows in set (0.06 sec)

mysql> INSERT INTO email (contact_id, Email) VALUES ('1','[email protected]'),('2','[email protected]'),('3','[email protected]');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM email;
+------------+-------------------+
| contact_id | Email             |
+------------+-------------------+
|          1 | [email protected] |
|          2 | [email protected] |
|          3 | [email protected] |
+------------+-------------------+
3 rows in set (0.06 sec)

mysql> INSERT INTO telephones (contact_id, TelephoneHome, TelephoneWork) VALUES ('1','333-50000','333-60000'),('2','444-70000','444-80000'),('3','555-30000','555-40000');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM telephones;
+------------+---------------+---------------+
| contact_id | TelephoneHome | TelephoneWork |
+------------+---------------+---------------+
|          1 | 333-50000     | 333-60000     |
|          2 | 444-70000     | 444-80000     |
|          3 | 555-30000     | 555-40000     |
+------------+---------------+---------------+
3 rows in set (0.00 sec)

Okay, so we now have all our data ready for experimentation.

Before we start experimenting with manipulating the data let's look at how MySQL stores the Data.

To do this execute the following command from the shell prompt.

mysqldump contacts > contacts.sql

Note: The reverse operation for this command is:

mysql contacts < contacts.sql

The file generated is a text file that contains all the data and SQL instructions needed to recreate the same database. As you can see, the SQL here is slightly different from what was typed in. Don't worry about this. It's all good! It should also be obvious that this is a good way to back up your stuff.

# MySQL dump 8.2
#
# Host: localhost Database: contacts
#--------------------------------------------------------
# Server version 3.22.34-shareware-debug

#
# Table structure for table 'address'
#

CREATE TABLE address (
  contact_id smallint(6) DEFAULT '0' NOT NULL,
  StreetAddress char(50),
  City char(20),
  State char(20),
  Zip char(10),
  Country char(20),
  PRIMARY KEY (contact_id)
);

#
# Dumping data for table 'address'
#

INSERT INTO address VALUES (1,'300 Yamila Ave.','Los Angeles','CA','300012','USA');
INSERT INTO address VALUES (2,'4000 Nikki St.','Boca Raton','FL','500034','USA');
INSERT INTO address VALUES (3,'404 Tia Blvd.','New York','NY','10011','USA');

#
# Table structure for table 'company_details'
#

CREATE TABLE company_details (
  contact_id smallint(6) DEFAULT '0' NOT NULL,
  CompanyName char(25),
  Designation char(20),
  PRIMARY KEY (contact_id)
);

#
# Dumping data for table 'company_details'
#

INSERT INTO company_details VALUES (1,'Xerox','New Business Manager');
INSERT INTO company_details VALUES (2,'Cabletron','Customer Support Eng');
INSERT INTO company_details VALUES (3,'Apple','Sales Manager');

#
# Table structure for table 'email'
#

CREATE TABLE email (
  contact_id smallint(6) DEFAULT '0' NOT NULL,
  Email char(20),
  PRIMARY KEY (contact_id)
);

#
# Dumping data for table 'email'
#

INSERT INTO email VALUES (1,'[email protected]');
INSERT INTO email VALUES (2,'[email protected]');
INSERT INTO email VALUES (3,'[email protected]');

#
# Table structure for table 'names'
#

CREATE TABLE names (
  contact_id smallint(6) DEFAULT '0' NOT NULL auto_increment,
  FirstName char(20),
  LastName char(20),
  BirthDate date,
  PRIMARY KEY (contact_id)
);

#
# Dumping data for table 'names'
#

INSERT INTO names VALUES (3,'Tia','Carrera','1975-09-18');
INSERT INTO names VALUES (2,'Nikki','Taylor','1972-03-04');
INSERT INTO names VALUES (1,'Yamila','Diaz','1974-10-13');

#
# Table structure for table 'telephones'
#

CREATE TABLE telephones (
  contact_id smallint(6) DEFAULT '0' NOT NULL,
  TelephoneHome char(20),
  TelephoneWork char(20),
  PRIMARY KEY (contact_id)
);

#
# Dumping data for table 'telephones'
#

INSERT INTO telephones VALUES (1,'333-50000','333-60000');
INSERT INTO telephones VALUES (2,'444-70000','444-80000');
INSERT INTO telephones VALUES (3,'555-30000','555-40000');

Let's try some SELECT statement variations:

To select all names whose corresponding contact_id is greater than 1.

mysql> SELECT * FROM names WHERE contact_id > 1;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
+------------+-----------+----------+------------+
2 rows in set (0.00 sec)

As a condition we can also use IS NOT NULL. This statement returns every row of names whose contact_id is not NULL.

mysql> SELECT * FROM names WHERE contact_id IS NOT NULL;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)

Results can be sorted with the ORDER BY clause.

mysql> SELECT * FROM names WHERE contact_id IS NOT NULL ORDER BY LastName;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)

The keywords ASC and DESC stand for ascending and descending respectively and set the sort direction. Ascending is the default.

mysql> SELECT * FROM names WHERE contact_id IS NOT NULL ORDER BY LastName desc;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
|          3 | Tia       | Carrera  | 1975-09-18 |
+------------+-----------+----------+------------+
3 rows in set (0.04 sec)

You can also use date values in conditions.

mysql> SELECT * FROM names WHERE BirthDate > '1973-03-06';
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
+------------+-----------+----------+------------+
2 rows in set (0.00 sec)

LIKE is an operator that matches field values against a pattern containing wildcards. The % wildcard matches any sequence of zero or more characters.

mysql> SELECT FirstName, LastName FROM names WHERE LastName LIKE 'C%';
+-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Tia       | Carrera  |
+-----------+----------+
1 row in set (0.06 sec)

The '_' wildcard matches exactly one character.

mysql> SELECT FirstName, LastName FROM names WHERE LastName LIKE '_iaz';
+-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Yamila    | Diaz     |
+-----------+----------+
1 row in set (0.00 sec)
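The two wildcards can be tried out without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made for portability; for this data the % and _ wildcards behave the same way). The helper name last_names_like is illustrative only.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL 'names' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [(1, "Yamila", "Diaz"), (2, "Nikki", "Taylor"), (3, "Tia", "Carrera")],
)


def last_names_like(pattern):
    """Return the last names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT LastName FROM names WHERE LastName LIKE ? ORDER BY LastName",
        (pattern,),
    )
    return [row[0] for row in rows]


print(last_names_like("C%"))    # % matches any run of characters after the C
print(last_names_like("_iaz"))  # _ matches exactly one character before 'iaz'
print(last_names_like("%a%"))   # an 'a' anywhere in the name
```

Passing the pattern as a bound parameter (the ? placeholder) keeps the wildcard characters out of the SQL text itself.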

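SQL Logical and Comparison Operators (operators of equal precedence are evaluated from left to right; comparisons bind tighter than NOT, NOT tighter than AND, and AND tighter than OR)

1. NOT or !

2. AND or &&

3. OR or ||

4. = : Equal

5. <> or != : Not Equal

6. <= : Less Than or Equal

7. >= : Greater Than or Equal

8. <, > : Less Than, Greater Than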

Here are some more variations with logical operators and the IN operator.

mysql> SELECT FirstName FROM names WHERE contact_id < 3 AND LastName LIKE 'D%';
+-----------+
| FirstName |
+-----------+
| Yamila    |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT contact_id FROM names WHERE LastName IN ('Diaz','Carrera');
+------------+
| contact_id |
+------------+
|          3 |
|          1 |
+------------+
2 rows in set (0.02 sec)
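When AND and OR appear in the same condition, AND is evaluated first unless parentheses say otherwise. The sketch below illustrates this with Python's sqlite3 module standing in for MySQL (an assumption; both follow the same AND-before-OR rule). The helper first_names is a hypothetical name used only here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [(1, "Yamila", "Diaz"), (2, "Nikki", "Taylor"), (3, "Tia", "Carrera")],
)


def first_names(where_clause):
    """Run a SELECT with the given WHERE clause and return the first names found."""
    sql = "SELECT FirstName FROM names WHERE " + where_clause + " ORDER BY contact_id"
    return [row[0] for row in conn.execute(sql)]


# Read as: LastName = 'Diaz' OR (LastName = 'Taylor' AND contact_id > 2)
unparenthesized = first_names("LastName = 'Diaz' OR LastName = 'Taylor' AND contact_id > 2")

# Parentheses force the OR to be evaluated before the AND.
parenthesized = first_names("(LastName = 'Diaz' OR LastName = 'Taylor') AND contact_id > 2")

print(unparenthesized)
print(parenthesized)
```

The two conditions look alike but select different rows, so it is usually worth writing the parentheses explicitly.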

To return the number of rows in a table

mysql> SELECT count(*) FROM names;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.02 sec)

mysql> SELECT count(FirstName) FROM names;
+------------------+
| count(FirstName) |
+------------------+
|                3 |
+------------------+
1 row in set (0.00 sec)
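The two counts above agree only because no FirstName is NULL. count(*) counts rows, while count(column) counts the rows in which that column is not NULL. A minimal sketch, using Python's sqlite3 module in place of MySQL and adding a hypothetical fourth contact with no first name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        (1, "Yamila", "Diaz"),
        (2, "Nikki", "Taylor"),
        (3, "Tia", "Carrera"),
        (4, None, "Smith"),  # hypothetical row: first name not recorded (NULL)
    ],
)

# COUNT(*) counts every row; COUNT(FirstName) skips rows where FirstName is NULL.
total_rows, rows_with_first_name = conn.execute(
    "SELECT COUNT(*), COUNT(FirstName) FROM names"
).fetchone()

print(total_rows, rows_with_first_name)
```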

Aggregate functions such as SUM perform basic arithmetic over a column.

mysql> SELECT SUM(contact_id) FROM names;
+-----------------+
| SUM(contact_id) |
+-----------------+
|               6 |
+-----------------+
1 row in set (0.00 sec)

To select the largest value in a column, use MAX. Substitute MIN to get the smallest value instead.

mysql> SELECT MAX(contact_id) FROM names;
+-----------------+
| MAX(contact_id) |
+-----------------+
|               3 |
+-----------------+
1 row in set (0.00 sec)

HAVING

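Compare the first query, which uses WHERE, with the second, which uses HAVING. Here both return the same rows. The difference is that WHERE filters rows before any grouping takes place, while HAVING filters after grouping and is normally used together with GROUP BY and aggregate functions. MySQL accepts HAVING without GROUP BY as an extension.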

mysql> SELECT * FROM names WHERE contact_id >=1;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          1 | Yamila    | Diaz     | 1974-10-13 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          3 | Tia       | Carrera  | 1975-09-18 |
+------------+-----------+----------+------------+
3 rows in set (0.03 sec)

mysql> SELECT * FROM names HAVING contact_id >=1;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.00 sec)
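HAVING becomes useful once rows are grouped. The sketch below is an illustration only: it uses Python's sqlite3 module instead of MySQL, keeps just the City and State columns of the address table, and adds a hypothetical fourth address so that one state has two entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (contact_id INTEGER PRIMARY KEY, City TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [
        (1, "Los Angeles", "CA"),
        (2, "Boca Raton", "FL"),
        (3, "New York", "NY"),
        (4, "San Diego", "CA"),  # hypothetical extra row so that CA forms a group of two
    ],
)

# WHERE removes individual rows before the groups are formed.
per_state = conn.execute(
    "SELECT State, COUNT(*) FROM address "
    "WHERE State <> 'NY' "
    "GROUP BY State "
    "ORDER BY State"
).fetchall()

# HAVING removes whole groups after the aggregate has been computed.
busy_states = conn.execute(
    "SELECT State, COUNT(*) FROM address "
    "GROUP BY State "
    "HAVING COUNT(*) > 1"
).fetchall()

print(per_state)
print(busy_states)
```

A condition on COUNT(*) cannot go in the WHERE clause, because the count does not exist until the rows have been grouped.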

Now let's work with multiple tables and see how information can be combined across them.

mysql> SELECT names.contact_id, FirstName, LastName, Email FROM names, email WHERE names.contact_id = email.contact_id;
+------------+-----------+----------+-------------------+
| contact_id | FirstName | LastName | Email             |
+------------+-----------+----------+-------------------+
|          1 | Yamila    | Diaz     | [email protected] |
|          2 | Nikki     | Taylor   | [email protected] |
|          3 | Tia       | Carrera  | [email protected] |
+------------+-----------+----------+-------------------+
3 rows in set (0.11 sec)

mysql> SELECT DISTINCT names.contact_id, FirstName, Email, TelephoneWork FROM names, email, telephones WHERE names.contact_id = email.contact_id AND names.contact_id = telephones.contact_id;
+------------+-----------+-------------------+---------------+
| contact_id | FirstName | Email             | TelephoneWork |
+------------+-----------+-------------------+---------------+
|          1 | Yamila    | [email protected] | 333-60000     |
|          2 | Nikki     | [email protected] | 444-80000     |
|          3 | Tia       | [email protected] | 555-40000     |
+------------+-----------+-------------------+---------------+
3 rows in set (0.05 sec)
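A note on joining three tables: each pair of tables has to be linked by its own condition, combined with AND. A chained comparison of the form a = b = c is evaluated left to right as (a = b) = c, so the 0-or-1 result of the first comparison is compared with the third column, and every contact ends up paired with the telephone row whose contact_id is 1. The sketch below shows the difference using Python's sqlite3 module as a stand-in for MySQL (an assumption; SQLite evaluates the chained form the same way). The example.com addresses are placeholders, not data from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE email (contact_id INTEGER PRIMARY KEY, Email TEXT);
CREATE TABLE telephones (contact_id INTEGER PRIMARY KEY, TelephoneWork TEXT);
INSERT INTO names VALUES (1, 'Yamila'), (2, 'Nikki'), (3, 'Tia');
-- placeholder addresses, invented for this sketch
INSERT INTO email VALUES (1, 'yamila@example.com'), (2, 'nikki@example.com'), (3, 'tia@example.com');
INSERT INTO telephones VALUES (1, '333-60000'), (2, '444-80000'), (3, '555-40000');
""")

SELECT_PART = (
    "SELECT names.contact_id, TelephoneWork "
    "FROM names, email, telephones WHERE "
)
ORDER_PART = " ORDER BY names.contact_id"

# Chained form: (names.contact_id = email.contact_id) = telephones.contact_id
chained = conn.execute(
    SELECT_PART
    + "names.contact_id = email.contact_id = telephones.contact_id"
    + ORDER_PART
).fetchall()

# Correct form: one condition per pair of tables, joined with AND.
with_and = conn.execute(
    SELECT_PART
    + "names.contact_id = email.contact_id AND names.contact_id = telephones.contact_id"
    + ORDER_PART
).fetchall()

print(chained)
print(with_and)
```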

So what's a JOIN?

A JOIN combines rows from multiple tables and returns the result as a table. Joins are central to how a relational database is queried.

There are several types of joins. Let's look at LEFT JOIN and RIGHT JOIN, both of which are outer joins.

Let's first check out the contents of the tables we're going to use

mysql> SELECT * FROM names;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM email;
+------------+-------------------+
| contact_id | Email             |
+------------+-------------------+
|          1 | [email protected] |
|          2 | [email protected] |
|          3 | [email protected] |
+------------+-------------------+
3 rows in set (0.00 sec)

A LEFT JOIN First:

mysql> SELECT * FROM names LEFT JOIN email USING (contact_id);
+------------+-----------+----------+------------+------------+-------------------+
| contact_id | FirstName | LastName | BirthDate  | contact_id | Email             |
+------------+-----------+----------+------------+------------+-------------------+
|          3 | Tia       | Carrera  | 1975-09-18 |          3 | [email protected] |
|          2 | Nikki     | Taylor   | 1972-03-04 |          2 | [email protected] |
|          1 | Yamila    | Diaz     | 1974-10-13 |          1 | [email protected] |
+------------+-----------+----------+------------+------------+-------------------+
3 rows in set (0.16 sec)

To find the people who have a home phone number.

mysql> SELECT names.FirstName FROM names LEFT JOIN telephones ON names.contact_id = telephones.contact_id WHERE TelephoneHome IS NOT NULL;
+-----------+
| FirstName |
+-----------+
| Tia       |
| Nikki     |
| Yamila    |
+-----------+
3 rows in set (0.02 sec)

The same query without the 'names.' qualifier on FirstName generates the same result, because FirstName exists in only one of the two tables.

mysql> SELECT FirstName FROM names LEFT JOIN telephones ON names.contact_id = telephones.contact_id WHERE TelephoneHome IS NOT NULL;
+-----------+
| FirstName |
+-----------+
| Tia       |
| Nikki     |
| Yamila    |
+-----------+
3 rows in set (0.00 sec)

And now a RIGHT JOIN:

mysql> SELECT * FROM names RIGHT JOIN email USING(contact_id);
+------------+-----------+----------+------------+------------+-------------------+
| contact_id | FirstName | LastName | BirthDate  | contact_id | Email             |
+------------+-----------+----------+------------+------------+-------------------+
|          1 | Yamila    | Diaz     | 1974-10-13 |          1 | [email protected] |
|          2 | Nikki     | Taylor   | 1972-03-04 |          2 | [email protected] |
|          3 | Tia       | Carrera  | 1975-09-18 |          3 | [email protected] |
+------------+-----------+----------+------------+------------+-------------------+
3 rows in set (0.03 sec)
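In the examples above every contact has an email address, so the outer joins return the same rows an ordinary join would. The difference shows up only when a row has no match. The sketch below assumes a hypothetical fourth contact, 'Sam', with no email row, and uses Python's sqlite3 module in place of MySQL; the example.com addresses are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE email (contact_id INTEGER PRIMARY KEY, Email TEXT);
-- contact 4 is hypothetical and deliberately has no row in email
INSERT INTO names VALUES (1, 'Yamila'), (2, 'Nikki'), (3, 'Tia'), (4, 'Sam');
INSERT INTO email VALUES (1, 'yamila@example.com'), (2, 'nikki@example.com'), (3, 'tia@example.com');
""")

# LEFT JOIN keeps every row of the left table; missing matches become NULL (None in Python).
left_rows = conn.execute(
    "SELECT FirstName, Email FROM names "
    "LEFT JOIN email ON names.contact_id = email.contact_id "
    "ORDER BY names.contact_id"
).fetchall()

# INNER JOIN keeps only the rows that have a match in both tables.
inner_rows = conn.execute(
    "SELECT FirstName, Email FROM names "
    "INNER JOIN email ON names.contact_id = email.contact_id "
    "ORDER BY names.contact_id"
).fetchall()

print(left_rows)
print(inner_rows)
```

A RIGHT JOIN is the mirror image: it keeps every row of the right-hand table instead.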

BETWEEN

This operator selects rows in which a value falls within a given range. Both endpoints of the range are included. The following example illustrates its use.

mysql> SELECT * FROM names;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate  |
+------------+-----------+----------+------------+
|          3 | Tia       | Carrera  | 1975-09-18 |
|          2 | Nikki     | Taylor   | 1972-03-04 |
|          1 | Yamila    | Diaz     | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)

mysql> SELECT FirstName, LastName FROM names WHERE contact_id BETWEEN 2 AND 3;
+-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Tia       | Carrera  |
| Nikki     | Taylor   |
+-----------+----------+
2 rows in set (0.00 sec)
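BETWEEN 2 AND 3 is shorthand for a pair of inclusive comparisons. A small sketch, with Python's sqlite3 module standing in for MySQL and a hypothetical helper named ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(1, "Yamila"), (2, "Nikki"), (3, "Tia")],
)


def ids(where_clause):
    """Return the contact_ids selected by a WHERE clause, in ascending order."""
    sql = "SELECT contact_id FROM names WHERE " + where_clause + " ORDER BY contact_id"
    return [row[0] for row in conn.execute(sql)]


between_ids = ids("contact_id BETWEEN 2 AND 3")
explicit_ids = ids("contact_id >= 2 AND contact_id <= 3")

print(between_ids, explicit_ids)
```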

ALTER

The ALTER TABLE statement is used to add a new column to an existing table or to change its existing columns.

mysql> ALTER TABLE names ADD Age SMALLINT;
Query OK, 3 rows affected (0.11 sec)
Records: 3 Duplicates: 0 Warnings: 0

Now let's take a look at the 'ALTER'ed Table.

mysql> SHOW COLUMNS FROM names;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| contact_id | smallint(6) |      | PRI | 0       | auto_increment |
| FirstName  | char(20)    | YES  |     | NULL    |                |
| LastName   | char(20)    | YES  |     | NULL    |                |
| BirthDate  | date        | YES  |     | NULL    |                |
| Age        | smallint(6) | YES  |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.06 sec)

But we don't require Age to be a SMALLINT type when a TINYINT would suffice. So we use another ALTER statement.

mysql> ALTER TABLE names CHANGE COLUMN Age Age TINYINT;
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SHOW COLUMNS FROM names;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| contact_id | smallint(6) |      | PRI | NULL    | auto_increment |
| FirstName  | char(20)    | YES  |     | NULL    |                |
| LastName   | char(20)    | YES  |     | NULL    |                |
| BirthDate  | date        | YES  |     | NULL    |                |
| Age        | tinyint(4)  | YES  |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

MODIFY

You can also use the MODIFY clause of ALTER TABLE to change a column's data type.

mysql> ALTER TABLE names MODIFY COLUMN Age SMALLINT;
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SHOW COLUMNS FROM names;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| contact_id | smallint(6) |      | PRI | NULL    | auto_increment |
| FirstName  | char(20)    | YES  |     | NULL    |                |
| LastName   | char(20)    | YES  |     | NULL    |                |
| BirthDate  | date        | YES  |     | NULL    |                |
| Age        | smallint(6) | YES  |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

To Rename a Table:

mysql> ALTER TABLE names RENAME AS mynames;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW TABLES;
+--------------------+
| Tables_in_contacts |
+--------------------+
| address            |
| company_details    |
| email              |
| mynames            |
| telephones         |
+--------------------+
5 rows in set (0.00 sec)

We rename it back to the original name.

mysql> ALTER TABLE mynames RENAME AS names;
Query OK, 0 rows affected (0.01 sec)

UPDATE

The UPDATE command is used to change the values of fields in existing rows of a table.

mysql> UPDATE names SET Age ='23' WHERE FirstName='Tia';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0

For comparison, the original table before the update:

mysql> SELECT * FROM names;
+------------+-----------+----------+------------+------+
| contact_id | FirstName | LastName | BirthDate  | Age  |
+------------+-----------+----------+------------+------+
|          3 | Tia       | Carrera  | 1975-09-18 | NULL |
|          2 | Nikki     | Taylor   | 1972-03-04 | NULL |
|          1 | Yamila    | Diaz     | 1974-10-13 | NULL |
+------------+-----------+----------+------------+------+
3 rows in set (0.05 sec)

The modified table after the update:

mysql> SELECT * FROM names;
+------------+-----------+----------+------------+------+
| contact_id | FirstName | LastName | BirthDate  | Age  |
+------------+-----------+----------+------------+------+
|          3 | Tia       | Carrera  | 1975-09-18 |   23 |
|          2 | Nikki     | Taylor   | 1972-03-04 | NULL |
|          1 | Yamila    | Diaz     | 1974-10-13 | NULL |
+------------+-----------+----------+------------+------+
3 rows in set (0.00 sec)

DELETE

mysql> DELETE FROM names WHERE Age=23;
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM names;
+------------+-----------+----------+------------+------+
| contact_id | FirstName | LastName | BirthDate  | Age  |
+------------+-----------+----------+------------+------+
|          2 | Nikki     | Taylor   | 1972-03-04 | NULL |
|          1 | Yamila    | Diaz     | 1974-10-13 | NULL |
+------------+-----------+----------+------------+------+
2 rows in set (0.00 sec)

A DEADLY MISTAKE...

Leaving out the WHERE clause deletes every row in the table.

mysql> DELETE FROM names;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM names;
Empty set (0.00 sec)
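One way to guard against an accidental mass delete is to run the DELETE inside a transaction, check how many rows it touched, and roll back if the number is larger than expected. The sketch below is one possible approach under stated assumptions, not a MySQL feature: it uses Python's sqlite3 module, the function guarded_delete and its max_rows limit are hypothetical, and the ages are made-up test data. In MySQL this only works with a storage engine that supports transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [(1, "Yamila", 30), (2, "Nikki", 30), (3, "Tia", 23)],  # hypothetical ages
)
conn.commit()  # make the starting data permanent before experimenting


def guarded_delete(age, max_rows=1):
    """Delete rows with the given Age, undoing the delete if it hits too many rows."""
    cur = conn.execute("DELETE FROM names WHERE Age = ?", (age,))
    if cur.rowcount > max_rows:
        conn.rollback()  # more rows than expected: undo the whole delete
        return 0
    conn.commit()
    return cur.rowcount


def row_count():
    """Return the number of rows currently in names."""
    return conn.execute("SELECT COUNT(*) FROM names").fetchone()[0]


refused = guarded_delete(30)       # would remove two rows, so it is rolled back
rows_after_refusal = row_count()
deleted = guarded_delete(23)       # removes exactly one row
rows_after_delete = row_count()

print(refused, rows_after_refusal, deleted, rows_after_delete)
```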

One more destructive tool...

DROP TABLE

mysql> DROP TABLE names;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW TABLES;
+--------------------+
| Tables_in_contacts |
+--------------------+
| address            |
| company_details    |
| email              |
| telephones         |
+--------------------+
4 rows in set (0.05 sec)

mysql> DROP TABLE address, company_details, email, telephones;
Query OK, 0 rows affected (0.06 sec)

mysql> SHOW TABLES;
Empty set (0.00 sec)

As you can see, the dropped tables no longer exist. MySQL does not ask for confirmation, so be careful.

FULL TEXT INDEXING and Searching

Full Text Indexing and Searching was introduced in MySQL version 3.23.23. FULLTEXT indexes can be created on CHAR, VARCHAR and TEXT columns. FULLTEXT searches are performed with the MATCH ... AGAINST construct, which matches a natural language query against a text collection and returns a relevance value for each row in the table. The resulting rows are returned in order of relevance.

Full Text searches are a very powerful way to search through text, but they are not ideal for small tables and may produce inconsistent results there. They work best with large collections of textual data.

Optimizing your Database

Databases tend to get large at some point or other, and this raises the issue of database optimization. Queries take longer and longer as the database grows, and certain things can be done to speed them up.

Clustering

The easiest method is 'clustering'. If you run a certain kind of query often, it is faster when the database contents are arranged in the same order in which the data is requested. To keep the tables in sorted order you need a clustering index. Some databases keep the data sorted automatically.

Ordered Indices

These are 'lookup' tables of sorts. For each column that may be of interest to you, you can create an ordered index. Note that these optimization techniques add system load, because the index has to be updated each time the data changes.

There are additional methods, such as B-Trees and Hashing, which you may like to read up about but which will not be discussed here.
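The effect of an index on a lookup can be observed by asking the database how it plans to run a query. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement as a stand-in (MySQL has its own EXPLAIN with a different output format). The index name idx_names_lastname is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [(1, "Yamila", "Diaz"), (2, "Nikki", "Taylor"), (3, "Tia", "Carrera")],
)

QUERY = "SELECT * FROM names WHERE LastName = 'Diaz'"


def plan_text():
    """Return SQLite's description of how it would execute QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)


plan_without_index = plan_text()  # full scan of the table

# Create an ordered index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_names_lastname ON names (LastName)")
plan_with_index = plan_text()     # lookup through the index

print(plan_without_index)
print(plan_with_index)
```

The query returns the same row either way; only the access path changes, which is why the benefit grows with the size of the table.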

Replication

Replication is the term given to the process where databases synchronize with each other. In this process one database updates its own data with respect to another, or according to update criteria specified by the programmer. Replication can be used under various circumstances. Examples include safety and backup, or providing a database location closer to certain users.

What are Transactions?

In an RDBMS, when several people access the same data or if a server dies in the middle of an update, there has to be a mechanism to protect the integrity of the data. Such a mechanism is called a transaction. A transaction groups a set of database actions into a single indivisible event that either succeeds as a whole or fails as a whole.

The properties of a transaction are summed up by the acronym 'ACID'.

(A) Atomicity: Even if an action consists of multiple steps, it is treated as one operation: either all of the steps take effect or none do.

(C) Consistency: The database exists in a valid and accurate operating state before and after a transaction.

(I) Isolation: Operations within one transaction are independent of, and cannot interfere with, those in other transactions.

(D) Durability: Changes made by a committed transaction are permanent.
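Atomicity can be seen in a funds transfer, where the debit and the credit must either both happen or both be undone. The sketch below is an illustration under stated assumptions: it uses Python's sqlite3 module rather than MySQL, and the accounts table, its CHECK constraint and the transfer function are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # overdrafts are rejected
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone as well.
        return False


def balances():
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(1, 2, 30)        # succeeds
after_ok = balances()
failed = transfer(1, 2, 500)   # account 1 cannot cover this amount
after_failed = balances()

print(ok, after_ok, failed, after_failed)
```

The credit is deliberately applied first, so the failed transfer would leave account 2 with extra money if the two updates were not rolled back together.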

To enable transactions a mechanism called 'Logging' needs to be introduced. Logging involves the DBMS writing details of the tables, columns and results of a particular transaction, both before and after, to a log file. This log file is used in the process of recovery.

To protect a database resource (e.g. a table) from being used and written to simultaneously, several techniques are used. One of them is 'Locking'; another is to put a 'time stamp' on an action. With Locking, the DBMS must acquire locks on all the resources needed before it can complete an action. The locks are released only when the transaction is completed.

Now if a large number of tables were involved in a particular action, say 50, all 50 tables would be locked until the transaction is completed.

To improve things a bit, there is another technique called Two-Phase Locking or 2PL. In this method of locking, locks are acquired only when needed but are released only when the transaction is completed (this is the strict form of 2PL).

This is done to make sure that altered data can be safely restored if the transaction fails for any reason.

This technique can also result in problems such as "deadlocks".

In a deadlock, two processes that require the same resources block each other, each preventing the other from completing its action. Options here are to abort one of them, or to let the programmer handle it.

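When these notes were written, MySQL implemented transactions by incorporating the Berkeley DB libraries into its own code, so the source version was the one you'd want for installation. Current MySQL versions support transactions through the InnoDB storage engine, which is the default. Read the MySQL manual for details.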

Beyond MySQL

What are Views?

A view gives a name to the result of a query so that it can be used like a table. The view takes the name given in your CREATE VIEW statement. MySQL has supported views since version 5.0. A sample SQL statement that creates a view looks like:

CREATE VIEW TESTVIEW AS SELECT * FROM names;
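A view stores the query, not a copy of the data, so it always reflects the current contents of the underlying table. The sketch below shows this with Python's sqlite3 module as a stand-in for MySQL. The WHERE condition in the view and the fourth contact are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [(1, "Yamila", "Diaz"), (2, "Nikki", "Taylor"), (3, "Tia", "Carrera")],
)

# The view is a named query over the base table.
conn.execute(
    "CREATE VIEW testview AS "
    "SELECT FirstName, LastName FROM names WHERE contact_id > 1"
)


def view_first_names():
    """Return the first names currently visible through the view."""
    rows = conn.execute("SELECT FirstName FROM testview ORDER BY FirstName")
    return [row[0] for row in rows]


before = view_first_names()

# A row added to the base table shows up in the view without recreating it.
conn.execute("INSERT INTO names VALUES (4, 'Sam', 'Smith')")  # hypothetical contact
after = view_first_names()

print(before)
print(after)
```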

What are Triggers?

A trigger is a pre-programmed set of actions that the database performs automatically when a specified event occurs. Triggers can be programmed to execute before or after the event. Triggers are very useful, as they increase efficiency and accuracy in performing operations on databases and also increase productivity by reducing application development time. Triggers do, however, carry a price in terms of processing overhead.
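As an illustration, the sketch below defines a trigger that records every deletion from names in a log table. It uses Python's sqlite3 module, and the trigger is written in SQLite's syntax, which differs in detail from MySQL's. The names names_log and log_name_delete are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (contact_id INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE names_log (contact_id INTEGER, action TEXT);
INSERT INTO names VALUES (1, 'Yamila'), (2, 'Nikki'), (3, 'Tia');

-- Runs once for each deleted row, after the delete has happened.
CREATE TRIGGER log_name_delete AFTER DELETE ON names
BEGIN
    INSERT INTO names_log (contact_id, action) VALUES (OLD.contact_id, 'deleted');
END;
""")

# The application only issues the DELETE; the log entry is written by the trigger.
conn.execute("DELETE FROM names WHERE contact_id = 3")

log_rows = conn.execute("SELECT contact_id, action FROM names_log").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM names").fetchone()[0]

print(log_rows, remaining)
```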

What are Procedures?

Like triggers, Procedures or 'Stored' Procedures are productivity enhancers. Suppose you needed to perform an action through a programming interface to the database in, say, Perl and ASP. If the programmed action is stored at the database level, it has to be written only once and can be called from any programming language that interacts with the database.

Procedures are executed by calling them explicitly (for example with a CALL statement), and they can also be invoked from triggers.

Beyond RDBMS

Distributed Databases (DDB)

A distributed database is a collection of several logically interrelated databases located at multiple sites of a computer network. A distributed database management system permits the management of such a database and makes its operation transparent to the user. Good examples of distributed databases are those used by banks and by multinational firms with several office locations, where each site works only with the data that is relevant to its operations. DDBs have the full functionality of any DBMS. It's also important to know that a distributed database is considered to be one database rather than a set of discrete files, and that the data within it is logically interrelated.

Object Database Management Systems or ODBMS

When the capabilities of a database are integrated with object programming language capabilities, the resulting product is an ODBMS. Database objects appear as programming objects in an ODBMS. Using an ODBMS offers several advantages. The ones that can be most readily appreciated are:

1. Efficiency
When you use an ODBMS, you're using data the way you store it. You will use less code as you're not dependent on an intermediary like SQL or ODBC. When this happens you can create highly complex data structures through your programming language.

2. Speed
When data is stored the way you'd like it to be stored (i.e. natively) there is a massive performance increase as no to-and-fro translation is required.

A Quick Tutorial on Database Normalization

Let's start off by taking some data represented in a Table.

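Table Name: College Table

+-------------+-----------+--------------------------+------------------+-----------+-------------------------------+------------------+-----------------+-----------+
| StudentName | CourseID1 | CourseTitle1             | CourseProfessor1 | CourseID2 | CourseTitle2                  | CourseProfessor2 | StudentAdvisor  | StudentID |
+-------------+-----------+--------------------------+------------------+-----------+-------------------------------+------------------+-----------------+-----------+
| Tia Carrera | CS123     | Perl Regular Expressions | Don Corleone     | CS003     | Object Oriented Programming 1 | Daffy Duck       | Fred Flintstone | 400       |
| John Wayne  | CS456     | Socket Programming       | DJ Tiesto        | CS004     | Algorithms                    | Homer Simpson    | Barney Rubble   | 401       |
| Lara Croft  | CS789     | OpenGL                   | Bill Clinton     | CS001     | Data Structures               | Papa Smurf       | Seven of Nine   | 402       |
+-------------+-----------+--------------------------+------------------+-----------+-------------------------------+------------------+-----------------+-----------+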

The First Normal Form: (Each Column Type is Unique and there are no repeating groups [types] of data)

This essentially means that you identify data that can exist as a separate table, which reduces repetition and the width of the original table.

We can see that for every student, Course Information is repeated for each course. So if a student takes three courses, you'll need to add another set of columns for CourseID, CourseTitle and CourseProfessor. Student Information and Course Information can therefore be considered two broad groups.

Table Name: Student Information

StudentID (Primary Key), StudentName, AdvisorName

Table Name: Course Information

CourseID (Primary Key), CourseTitle, CourseDescription, CourseProfessor

It's obvious that we have here a Many to Many relationship between Students and Courses.

Note: In a Many to Many relationship we need something called a relating table, which contains information exclusively on which relationships exist between two tables. In a One to Many relationship we use a foreign key.

So in this case we need another little table called: Students and Courses

Table Name: Students and Courses

SnCStudentID, SnCCourseID

The Second Normal Form: (All attributes within the entity should depend solely on the entity's unique identifier)

The AdvisorName under Student Information does not depend on the StudentID. Therefore it can be moved to its own table.

Table Name: Student Information

StudentID (Primary Key), StudentName

Table Name: Advisor Information

AdvisorID, AdvisorName

Table Name: Course Information

CourseID (Primary Key), CourseTitle, CourseDescription, CourseProfessor

Table Name: Students and Courses

SnCStudentID, SnCCourseID

Note: Relating Tables can be created as required.

The Third Normal Form: (no column entry should be dependent on any other entry (value) other than the key for the table)

In simple terms - a table should contain information about only one thing.

In Course Information, we can pull CourseProfessor information out and store it in another table.

Table Name: Student Information

StudentID (Primary Key), StudentName

Table Name: Advisor Information

AdvisorID, AdvisorName

Table Name: Course Information

CourseID (Primary Key), CourseTitle, CourseDescription

Table Name: Professor Information

ProfessorID, CourseProfessor

Table Name: Students and Courses

SnCStudentID, SnCCourseID

Note: Relating Tables can be created as required.
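To see how the normalized tables fit back together, the sketch below builds three of them and recovers one student's courses through the relating table. It is a minimal illustration using Python's sqlite3 module in place of MySQL. The snake_case table names, the composite primary key on the relating table and the helper courses_for are assumptions made for the example, and the advisor, professor and CourseDescription details are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_information (StudentID INTEGER PRIMARY KEY, StudentName TEXT);
CREATE TABLE course_information (CourseID TEXT PRIMARY KEY, CourseTitle TEXT);

-- Relating table: one row per (student, course) pair.
CREATE TABLE students_and_courses (
    SnCStudentID INTEGER REFERENCES student_information (StudentID),
    SnCCourseID TEXT REFERENCES course_information (CourseID),
    PRIMARY KEY (SnCStudentID, SnCCourseID)
);

INSERT INTO student_information VALUES
    (400, 'Tia Carrera'), (401, 'John Wayne'), (402, 'Lara Croft');
INSERT INTO course_information VALUES
    ('CS123', 'Perl Regular Expressions'), ('CS003', 'Object Oriented Programming 1'),
    ('CS456', 'Socket Programming'), ('CS004', 'Algorithms'),
    ('CS789', 'OpenGL'), ('CS001', 'Data Structures');
INSERT INTO students_and_courses VALUES
    (400, 'CS123'), (400, 'CS003'),
    (401, 'CS456'), (401, 'CS004'),
    (402, 'CS789'), (402, 'CS001');
""")


def courses_for(student_name):
    """Return the titles of the courses a student takes, sorted alphabetically."""
    rows = conn.execute(
        "SELECT c.CourseTitle "
        "FROM student_information s "
        "JOIN students_and_courses sc ON sc.SnCStudentID = s.StudentID "
        "JOIN course_information c ON c.CourseID = sc.SnCCourseID "
        "WHERE s.StudentName = ? "
        "ORDER BY c.CourseTitle",
        (student_name,),
    )
    return [row[0] for row in rows]


print(courses_for("Tia Carrera"))
print(courses_for("Lara Croft"))
```

A student who takes a third course now needs only one more row in the relating table, rather than another set of columns.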
