Introduction to Databases

By: Engineer Muhammad Suleman Memon

M.E. (Information Technology)

B.E. (Computer System)

A database is a simple, yet flexible and powerful tool for storing and retrieving data.

Every company, every website, has lots of data.

The more of your data that you keep in your database - the better.

A database is far from being a tool only useful to big businesses: even if you just want a simple guest book or page hit counter, a database is a good fit.

Whichever database you use, it will most likely be a relational database.

This is the industry standard design these days.

Relational databases use the principles of set theory.

Set theory is a field of mathematics that describes how to deal with sets of data.

Relational databases are quite intuitive and easy to understand.

All data is held in tables.

A table has columns (along the top) and rows.

You create the tables you need. You define the table names.

You define what the column names are in each table.

You define what type of data the columns are...

There are a number of different data types available which represent the different types of data you find in real life.

There are analogous types in all databases and programming languages. Each has variations, but they're all fundamentally the same.

They are:

• Numerical Types, i.e. numbers. There are fundamentally two types: integer and float. Integers are whole numbers (e.g. 1, 2, 100, 999999). Floats are numbers with decimal places (e.g. 1.1, 22.5, 3.1415927).

• String Types, i.e. text. There are two kinds here: fixed length and variable length. 'char' is the fixed-length character type in MySQL, holding up to 255 characters. 'varchar' is a variable-length field; older MySQL versions limited it to 255 characters, while MySQL 5.0.3 and later allow up to 65,535 bytes. There are also several 'text' types of varying lengths in MySQL.

• Date and Time Types. For storing dates and times.

• Binary Data. This is arbitrary data: it could be images, programs, absolutely anything.
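The four families of types above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the type names are SQLite's (INTEGER, REAL, TEXT, BLOB) and not MySQL's; the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE samples (
        whole_number   INTEGER,  -- numerical: integer
        decimal_number REAL,     -- numerical: float
        label          TEXT,     -- string
        created_on     TEXT,     -- SQLite has no dedicated date type; ISO-8601 text is a common convention
        payload        BLOB      -- binary data
    )
    """
)
conn.execute(
    "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
    (100, 22.5, "guest book", "2024-01-31", b"\x89PNG"),
)

# typeof() reports how SQLite actually stored each value.
storage_classes = conn.execute(
    "SELECT typeof(whole_number), typeof(decimal_number), typeof(label),"
    " typeof(created_on), typeof(payload) FROM samples"
).fetchone()
print(storage_classes)
```

Other database systems offer richer variants of each family (for example dedicated DATE and DATETIME types), but the broad categories are the same.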

All Relational Databases use indexes.

Similar to the index in a book, indexes provide a quick way to find the exact data item you want.

Imagine you have a database of 100,000 customers, and you want to find just one.

If you just read the 'customers' table from start to finish until you find the one you're searching for, you could end up having to read all 100,000 records.

This would be very slow.

Most relational databases use a b-tree index structure.

This is a clever data structure that lets you find a data item by reading only a handful of index nodes, because the number of levels in the tree grows logarithmically with the number of rows.

Databases commonly have millions of rows - so you can see the necessity for indexes!

Indexes are a large part of databases and their design.

Defining a column as the primary key implicitly creates an index.

If you have a primary key on a table - it has an index.

You can add a number of indexes to each table you have.

You'd use the create index command - more later...

Indexes are used automatically by the database itself when you issue a query (ask for data).

It uses the index to find the data in the table.

For example, suppose we want to get a customer's details from the example 'customers' table above.

If we submit the following SQL query, the database will use the index it created for primary key column 'customer_id', and get everything for customer 1:

select * from customers where customer_id = 1;

The database uses the index because it can use it.

The query contains the 'customer_id', so the database can look in the index and find the location of customer '1'.

If there's no index on the column in the query, the database will have to go through the whole table! This is called a full table scan.
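The difference between an index lookup and a full table scan can be observed directly. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than MySQL, the 'name' and 'city' columns and the index name are assumptions, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"  # the primary key is indexed implicitly
    " name TEXT,"
    " city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    [(i, f"customer {i}", f"city {i % 50}") for i in range(1, 1001)],
)


def query_plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Lookup on the primary key: the database can search its index.
plan_by_key = query_plan("SELECT * FROM customers WHERE customer_id = 1")

# Lookup on a column without an index: the whole table has to be scanned.
plan_by_city = query_plan("SELECT * FROM customers WHERE city = 'city 7'")

# After 'create index', the same query can use the new index instead.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
plan_by_city_indexed = query_plan("SELECT * FROM customers WHERE city = 'city 7'")

print(plan_by_key)
print(plan_by_city)
print(plan_by_city_indexed)
```

The first and third plans should describe a search, while the second describes a scan of the whole table.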

These days, when you talk about databases in the wild, you are primarily talking about two types: analytical databases and operational databases.

Analytic Databases

Analytic databases (a.k.a. OLAP, Online Analytical Processing) are primarily static, read-only databases which store archived, historical data used for analysis.

For example, a company might store sales records over the last ten years in an analytic database and use that database to analyze marketing strategies in relationship to demographics.

On the web, you will often see analytic databases in the form of inventory catalogs such as the one shown previously from Amazon.com.

An inventory catalog analytical database usually holds descriptive information about all available products in the inventory.

Web pages are generated dynamically by querying the list of available products in the inventory against some search parameters.

The dynamically-generated page will display the information about each item (such as title, author, ISBN) which is stored in the database.

Operational databases (a.k.a. OLTP, Online Transaction Processing), on the other hand, are used to manage more dynamic bits of data.

These types of databases allow you to do more than simply view archived data.

Operational databases allow you to modify that data (add, change or delete data).

These types of databases are usually used to track real-time information.

For example, a company might have an operational database used to track warehouse/stock quantities.

As customers order products from an online web store, an operational database can be used to keep track of how many items have been sold and when the company will need to reorder stock.

Besides differentiating databases according to function, databases can also be differentiated according to how they model the data.

What is a data model? Well, essentially a data model is a "description" of both a container for data and a methodology for storing and retrieving data from that container.

Actually, there isn't really a data model "thing".

Data models are abstractions, oftentimes mathematical algorithms and concepts.

You cannot really touch a data model. But nevertheless, they are very useful. The analysis and design of data models has been the cornerstone of the evolution of databases.

As models have advanced so has database efficiency.

Before the 1980s, the two most commonly used database models were the hierarchical and network models.

As its name implies, the Hierarchical Database Model defines hierarchically-arranged data.

Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside down tree of data.

In this tree, a single table acts as the "root" of the database from which other tables "branch" out.

You will be instantly familiar with this relationship because that is how all windows-based directory management systems (like Windows Explorer) work these days.

Relationships in such a system are thought of in terms of children and parents such that a child may only have one parent but a parent can have multiple children.

Parents and children are tied together by links called "pointers" (perhaps physical addresses inside the file system).

A parent will have a list of pointers to each of their children.

This child/parent rule assures that data is systematically accessible.

To get to a low-level table, you start at the root and work your way down through the tree until you reach your target.

Of course, as you might imagine, one problem with this system is that the user must know how the tree is structured in order to find anything!
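The parent/child structure and the root-to-target traversal described above can be sketched in a few lines. This is a toy illustration, not how any particular hierarchical DBMS is implemented; the class, function and node names are all made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One 'table' in the hierarchy; it holds pointers to its children."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name):
        child = Node(name)
        self.children.append(child)  # each child is attached to exactly one parent
        return child


def find_path(node, target, path=()):
    """Walk down from the root; return the names on the way to target, or None."""
    path = path + (node.name,)
    if node.name == target:
        return list(path)
    for child in node.children:
        found = find_path(child, target, path)
        if found is not None:
            return found
    return None


root = Node("school")
students = root.add_child("students")
courses = root.add_child("courses")
grades = students.add_child("grades")

path_to_grades = find_path(root, "grades")
print(path_to_grades)
```

Here the search has to explore the tree because it does not know where "grades" lives, which is exactly the drawback noted above: efficient access depends on knowing the structure.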

The hierarchical model, however, is much more efficient than the flat-file model we discussed earlier because there is not as much need for redundant data.

If a change in the data is necessary, the change might only need to be processed once. Consider the student flat-file database example from our discussion of what databases are:

Examples of hierarchical data represented as relational tables

An organization could store employee information in a table that contains attributes/columns such as employee number, first name, last name, and Department number.

The organization provides each employee with computer hardware as needed, but computer equipment may only be used by the employee to which it is assigned.

The organization could store the computer hardware information in a separate table that includes each part's serial number, type, and the employee that uses it.

In many ways, the Network Database model was designed to solve some of the more serious problems with the Hierarchical Database Model.

Specifically, the Network model solves the problem of data redundancy by representing relationships in terms of sets rather than hierarchy.

The model had its origins in the Conference on Data Systems Languages (CODASYL) which had created the Data Base Task Group to explore and design a method to replace the hierarchical model.

The network model is very similar to the hierarchical model actually.

In fact, the hierarchical model is a subset of the network model.

However, instead of using a single-parent tree hierarchy, the network model uses set theory to provide a tree-like hierarchy in which child tables are allowed to have more than one parent.

This allowed the network model to support many-to-many relationships.

Visually, a Network Database looks like a hierarchical Database in that you can see it as a type of tree.

However, in the case of a Network Database, the look is more like several trees which share branches.

Thus, children can have multiple parents and parents can have multiple children.

A relational database management system (RDBMS) is a database system based on the relational model developed by E.F. Codd.

A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints.

In such a database the data and relations between them are organised in tables. A table is a collection of records and each record in a table contains the same fields.

Properties of Relational Tables:

Values Are Atomic

Each Row is Unique

Column Values Are of the Same Kind

The Sequence of Columns is Insignificant

The Sequence of Rows is Insignificant

Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up.

Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields.

Often, but not always, the fields will have the same name in both tables.

For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs so to calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables.

This can be extended to joining multiple tables on multiple fields.
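The customer bill example can be written as a join. The sketch below uses Python's built-in sqlite3 module; the table and column names follow the "orders" and "products" example, while the product codes and prices are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (customer_id INTEGER, product_code TEXT);
    INSERT INTO products VALUES ('P1', 10.0), ('P2', 2.5), ('P3', 4.0);
    INSERT INTO orders VALUES (1, 'P1'), (1, 'P2'), (1, 'P2'), (2, 'P3');
    """
)


def customer_bill(customer_id):
    """Sum the prices of everything the customer ordered, joining on product_code."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


bill_for_customer_1 = customer_bill(1)
print(bill_for_customer_1)
```

The relationship between the two tables is not stored anywhere; it exists only in the ON clause of the query, which is the point made in the text about relationships being specified at retrieval time.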

Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems.

The RELATIONAL database model is based on the Relational Algebra.

Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems.

These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data and diverse binary media such as audio, video, images, and applets.

By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin.

Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities.

Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and interfaces.

And the leading vendors are, of course, quite well known: IBM, Informix, and Oracle.

Object DBMSs add database functionality to object programming languages.

They bring much more than persistent storage of programming language objects.

Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility.

A major benefit of this approach is the unification of the application and database development into a seamless data model and language environment.

As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain.

Object developers can write complete database applications with a modest amount of additional effort.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems.

The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs."

In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects.

This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches:

It provides higher performance management of objects, and it enables better management of the complex interrelationships between objects.

This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, world wide web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.

In the semistructured data model, the information that is normally associated with a schema is contained within the data, which is sometimes called "self-describing".

In such a database there is no clear separation between the data and the schema, and the degree to which it is structured depends on the application.

In some forms of semistructured data there is no separate schema, in others it exists but only places loose constraints on the data.

Semi-structured data is naturally modelled in terms of graphs which contain labels which give semantics to its underlying structure.

Such databases subsume the modelling power of recent extensions of flat relational databases, to nested databases which allow the nesting (or encapsulation) of entities, and to object databases which, in addition, allow cyclic references between objects.

The associative model divides the real-world things about which data is to be recorded into two sorts:

Entities are things that have discrete, independent existence.

An entity’s existence does not depend on any other thing.

Associations are things whose existence depends on one or more other things, such that if any of those things ceases to exist, then the thing itself ceases to exist or becomes meaningless.

An associative database comprises two data structures:

1. A set of items, each of which has a unique identifier, a name and a type.

2. A set of links, each of which has a unique identifier, together with the unique identifiers of three other things, that represent the source, verb and target of a fact that is recorded about the source in the database. Each of the three things identified by the source, verb and target may be either a link or an item.
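The two structures can be sketched as plain Python records. This is only an illustration of the items-and-links idea, not the storage format of any associative database product; the class names, identifiers and the sample fact are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    name: str
    type: str


@dataclass(frozen=True)
class Link:
    id: int
    source: int  # identifier of an item or of another link
    verb: int
    target: int


items = {
    1: Item(1, "Flight BA1234", "entity"),
    2: Item(2, "arrived at", "verb"),
    3: Item(3, "London Heathrow", "entity"),
    4: Item(4, "on", "verb"),
    5: Item(5, "12-Dec", "entity"),
}
links = {
    100: Link(100, source=1, verb=2, target=3),
    101: Link(101, source=100, verb=4, target=5),  # the source here is itself a link
}


def describe(identifier):
    """Render an item by name, or a link as (source verb target), recursively."""
    if identifier in items:
        return items[identifier].name
    link = links[identifier]
    return f"({describe(link.source)} {describe(link.verb)} {describe(link.target)})"


fact = describe(101)
print(fact)
```

Because a link can point at another link, a fact can be qualified by further facts without changing the shape of either structure.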

The best way to understand the rationale of entity-attribute-value (EAV) design is to understand row modeling (of which EAV is a generalized form).

Consider a supermarket database that must manage thousands of products and brands, many of which have a transitory existence.

Here, it is intuitively obvious that product names should not be hard-coded as names of columns in tables. Instead, one stores product descriptions in a Products table: purchases/sales of individual items are recorded in other tables as separate rows with a product ID referencing this table.

Conceptually an EAV design involves a single table with three columns, an entity (such as an olfactory receptor ID), an attribute (such as species, which is actually a pointer into the metadata table) and a value for the attribute (e.g., rat). In EAV design, one row stores a single fact.

In a conventional table that has one column per attribute, by contrast, one row stores a set of facts. EAV design is appropriate when the number of parameters that potentially apply to an entity is vastly more than those that actually apply to an individual entity.
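A minimal sketch of the three-column EAV layout, using Python's built-in sqlite3 module. The table names, the 'tissue' attribute and the entity IDs are assumptions made for illustration; only the 'species' and 'rat' values come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Metadata table: one row per attribute that may apply to an entity.
    CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, name TEXT UNIQUE);

    -- EAV table: one row stores a single fact.
    CREATE TABLE eav (
        entity_id    INTEGER,
        attribute_id INTEGER REFERENCES attributes (attribute_id),
        value        TEXT,
        PRIMARY KEY (entity_id, attribute_id)
    );

    INSERT INTO attributes VALUES (1, 'species'), (2, 'tissue');
    INSERT INTO eav VALUES
        (501, 1, 'rat'),
        (501, 2, 'nasal epithelium'),
        (502, 1, 'mouse');
    """
)


def entity_facts(entity_id):
    """Collect all facts recorded for one entity into a dictionary."""
    rows = conn.execute(
        """
        SELECT a.name, e.value
        FROM eav AS e
        JOIN attributes AS a ON a.attribute_id = e.attribute_id
        WHERE e.entity_id = ?
        ORDER BY a.name
        """,
        (entity_id,),
    ).fetchall()
    return dict(rows)


print(entity_facts(501))
print(entity_facts(502))
```

Entity 502 simply has no row for 'tissue'; nothing is stored for attributes that do not apply, which is why the design suits sparse data.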

The context data model combines features of all the above models.

It can be considered as a collection of object-oriented, network and semistructured models or as some kind of object database.

In other words, this is a flexible model: you can use any type of database structure depending on the task. Such a data model has been implemented in the DBMS ConteXt.

The fundamental unit of information storage of ConteXt is a CLASS.

A Class contains METHODS and describes an OBJECT.

The Object contains FIELDS and a PROPERTY. A field may be composite, in which case it contains SubFields, and so on.

The Property is a set of fields that belongs to a particular Object (similar to AVL database). In other words, fields are the permanent part of an Object, while the Property is its variable part.

The header of Class contains the definition of the internal structure of the Object, which includes the description of each field, such as their type, length, attributes and name.

Context data model has a set of predefined types as well as user defined types.

The predefined types include not only character strings, texts and digits but also pointers (references) and aggregate types (structures).

A context model comprises three main data types: REGULAR, VIRTUAL and REFERENCE.

Database design is the process of producing a detailed data model of a database.

This logical data model contains all the logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database.

A fully attributed data model contains detailed attributes for each entity.

The term database design can be used to describe many different parts of the design of an overall database system.

Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data.

In the relational model these are the tables and views.

Conceptual schema:

A conceptual schema or conceptual data model is a map of concepts and their relationships.

This describes the semantics of an organization and represents a series of assertions about its nature.

Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).

Because a conceptual schema represents the semantics of an organization, and not a database design, it may exist on various levels of abstraction.

Conceptual data models take a more abstract perspective, identifying the fundamental things, of which the things an individual deals with are just examples.

The model does allow for what is called inheritance in object oriented terms.

A data structure diagram (DSD) is a data model or diagram used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them.

Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system.

Normalisation procedures and the definition of integrity rules ensure that the stored database will be non-redundant and properly connected.

Logical data structuring is based on the identification of the entities, their attributes, and the relationships between the entities.

Entity:

Something about which an enterprise needs to keep data.

Attributes:

The properties of an entity.

Relationships:

The connections between entities.

An Entity may be physical

Example:

an Employee; a Part; a Machine

Or conceptual

Example:

a Project; an Order; a Course.

Each instance of an entity is different from all others - one or more attributes will typically form a 'primary key' attribute - unique to a particular instance.

Attributes are the properties of an entity: data which describes or is 'owned' by an entity. Attributes (data) equate to facts - specific details of interest about entities.

In the real world, objects do not exist in isolation.

Our understanding of real world objects is in terms of their relationships with other objects; for example, 'the earth circles the sun'; 'he is a carpenter' ; etc.

Any real world object which we are going to include in a data model as an entity type must have some relationship with at least one other entity within the model (even if we are not going to implement that relationship within our database system).

One-to-one:

Both tables can have only one record on either side of the relationship.

Each primary key value relates to only one (or no) record in the related table.

Most one-to-one relationships are forced by business rules and don't flow naturally from the data.

In the absence of such a rule, you can usually combine both tables into one table without breaking any normalization rules.

One-to-One Relationships Contd:

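A one-to-one relationship describes a single point in time. For example: a Factory may have many Managers during its lifetime, and a Manager might be in charge of different Factories during his career, but at any given moment one Factory is paired with one Manager.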

One-to-many:

The primary key table contains only one record that relates to none, one, or many records in the related table.

This relationship is similar to the one between you and a parent.

You have only one mother, but your mother may have several children.

One-to-many Contd:

A formal description of the relationship shown in the diagram above is:

One Factory may make zero or more Components.

One Component is made in one (and only one) Factory.

One-to-many Contd:

What this means in a database system is that:

one record in a table called Factory may be related to a number of records in a Component table;

but

a record in the Component table can only be related to one record in the Factory table.

One-to-Many Relationships summarised:

For any occurrence of A, there may be 0, 1, or many, occurrences of B.

For any occurrence of B, there can only be one occurrence of A.

From another perspective:

If an 'A' record exists there may be zero or more related 'B' records. Any 'B' record can only be related to a single 'A' record.
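The Factory and Component relationship can be sketched with a foreign key. This example uses Python's built-in sqlite3 module; the column names and sample rows are assumptions, and SQLite only enforces foreign keys once the PRAGMA shown below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(
    """
    CREATE TABLE factory (
        factory_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Each component points at exactly one factory (the 'one' side).
    CREATE TABLE component (
        component_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        factory_id   INTEGER NOT NULL REFERENCES factory (factory_id)
    );
    INSERT INTO factory VALUES (1, 'North plant'), (2, 'South plant');
    INSERT INTO component VALUES (10, 'gear', 1), (11, 'axle', 1);
    """
)

# One factory may make zero or more components.
components_per_factory = dict(
    conn.execute(
        """
        SELECT f.name, COUNT(c.component_id)
        FROM factory AS f
        LEFT JOIN component AS c ON c.factory_id = f.factory_id
        GROUP BY f.factory_id
        ORDER BY f.factory_id
        """
    ).fetchall()
)

# A component that refers to a non-existent factory is rejected.
try:
    conn.execute("INSERT INTO component VALUES (12, 'bolt', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(components_per_factory, orphan_rejected)
```

The NOT NULL foreign key expresses "one (and only one) Factory"; the LEFT JOIN shows that a factory with no components is still valid.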

Many-to-many:

Each record in both tables can relate to any number of records (or no records) in the other table.

For instance, if you have several siblings, so do your siblings (have many siblings).

Many-to-many relationships require a third table, known as an associative or linking table, because relational systems can't directly accommodate the relationship.

Many-to-many: Contd:

Minimally, a many-many relationship will require insertion of a 'link entity'.

Further analysis may show that the link entity has attributes of its own - often qualifiers in respect of quantity or time.
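A linking table can be sketched as follows, again with Python's built-in sqlite3 module. The student/course scenario, the table names and the 'enrolled_on' column are assumptions chosen for illustration; 'enrolled_on' plays the part of a link attribute that qualifies the relationship in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);

    -- The link entity: one row per (student, course) pair.
    CREATE TABLE enrolment (
        student_id  INTEGER REFERENCES student (student_id),
        course_id   TEXT REFERENCES course (course_id),
        enrolled_on TEXT,
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course VALUES ('DB101', 'Databases'), ('NW201', 'Networks');
    INSERT INTO enrolment VALUES
        (1, 'DB101', '2024-01-10'),
        (1, 'NW201', '2024-01-11'),
        (2, 'DB101', '2024-01-12');
    """
)


def courses_for(student_id):
    rows = conn.execute(
        "SELECT course_id FROM enrolment WHERE student_id = ? ORDER BY course_id",
        (student_id,),
    ).fetchall()
    return [course_id for (course_id,) in rows]


def students_in(course_id):
    rows = conn.execute(
        """
        SELECT s.name
        FROM enrolment AS e
        JOIN student AS s ON s.student_id = e.student_id
        WHERE e.course_id = ?
        ORDER BY s.name
        """,
        (course_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(courses_for(1))
print(students_in("DB101"))
```

The many-to-many relationship becomes two one-to-many relationships, each ending at the linking table.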

Physical design:

The physical design of the database specifies the physical configuration of the database on the storage media.

This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary.

It is the detailed design of a system, including its modules and the hardware and software specifications of the database.

In the case of relational databases the storage objects are tables which store data in rows and columns.

• The purpose of normalization

• Data redundancy and Update Anomalies

• Functional Dependencies

• The Process of Normalization

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.

The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.

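Relations that have redundant data may have problems called update anomalies, which are classified as:

• Insertion anomalies

• Deletion anomalies

• Modification anomalies

Examples, using the StaffBranch relation in Figure 1:

• Insertion: to insert a new member of staff with branchNo B007 into the StaffBranch relation, the branch address has to be entered again and must agree with the existing rows.

• Deletion: to delete the tuple that represents the last member of staff located at branch B007 also removes the details of that branch.

• Modification: to change the address of branch B003, every tuple for staff at that branch has to be updated.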

StaffBranch

staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     30000   B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant    9000   B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant    9000   B005      22 Deer Rd, London

Figure 1: StaffBranch relation

Staff

staffNo  sName        position    salary  branchNo
SL21     John White   Manager     30000   B005
SG37     Ann Beech    Assistant   12000   B003
SG14     David Ford   Supervisor  18000   B003
SA9      Mary Howe    Assistant    9000   B007
SG5      Susan Brand  Manager     24000   B003
SL41     Julie Lee    Assistant    9000   B005

Branch

branchNo  bAddress
B005      22 Deer Rd, London
B007      16 Argyll St, Aberdeen
B003      163 Main St, Glasgow

Figure 2: Staff and Branch relations
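The modification anomaly can be made concrete by loading the data from Figure 1 and splitting it as in Figure 2. The sketch below uses Python's built-in sqlite3 module; the new address is made up, and the table definitions are one plausible rendering of the figures rather than a prescribed schema.

```python
import sqlite3

STAFF_BRANCH = [
    ("SL21", "John White", "Manager", 30000, "B005", "22 Deer Rd, London"),
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
    ("SG5", "Susan Brand", "Manager", 24000, "B003", "163 Main St, Glasgow"),
    ("SL41", "Julie Lee", "Assistant", 9000, "B005", "22 Deer Rd, London"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StaffBranch (staffNo TEXT PRIMARY KEY, sName TEXT, position TEXT,
                              salary INTEGER, branchNo TEXT, bAddress TEXT);
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, bAddress TEXT);
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, sName TEXT, position TEXT,
                        salary INTEGER, branchNo TEXT REFERENCES Branch (branchNo));
    """
)
conn.executemany("INSERT INTO StaffBranch VALUES (?, ?, ?, ?, ?, ?)", STAFF_BRANCH)

# Decompose the single relation into the two relations of Figure 2.
conn.execute("INSERT INTO Branch SELECT DISTINCT branchNo, bAddress FROM StaffBranch")
conn.execute(
    "INSERT INTO Staff SELECT staffNo, sName, position, salary, branchNo FROM StaffBranch"
)

new_address = "1 New St, Glasgow"  # hypothetical new address for branch B003

# Same logical change, applied to each design.
rows_touched_flat = conn.execute(
    "UPDATE StaffBranch SET bAddress = ? WHERE branchNo = 'B003'", (new_address,)
).rowcount
rows_touched_split = conn.execute(
    "UPDATE Branch SET bAddress = ? WHERE branchNo = 'B003'", (new_address,)
).rowcount

# Joining Staff and Branch gives back the same information as StaffBranch.
rejoined = conn.execute(
    """
    SELECT s.staffNo, s.sName, s.position, s.salary, s.branchNo, b.bAddress
    FROM Staff AS s JOIN Branch AS b ON b.branchNo = s.branchNo
    ORDER BY s.staffNo
    """
).fetchall()
flat = conn.execute("SELECT * FROM StaffBranch ORDER BY staffNo").fetchall()

print(rows_touched_flat, rows_touched_split, rejoined == flat)
```

In the single relation the address change touches one row per member of staff at the branch; in the decomposed design it touches exactly one row, so the copies cannot drift apart.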

Functional dependency describes the relationship between attributes in a relation.

For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)

A → B: B is functionally dependent on A.

Determinant: refers to the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
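The definition can be turned into a small check over sample rows. The function name and the row representation below are assumptions made for illustration. Note that sample data can only refute a dependency: a check that passes on six rows does not prove the dependency holds for all time.

```python
COLUMNS = ("staffNo", "sName", "position", "salary", "branchNo", "bAddress")
STAFF_BRANCH = [
    ("SL21", "John White", "Manager", 30000, "B005", "22 Deer Rd, London"),
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
    ("SG5", "Susan Brand", "Manager", 24000, "B003", "163 Main St, Glasgow"),
    ("SL41", "Julie Lee", "Assistant", 9000, "B005", "22 Deer Rd, London"),
]
rows = [dict(zip(COLUMNS, values)) for values in STAFF_BRANCH]


def holds(rows, determinant, dependent):
    """True if every determinant value is paired with exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[attribute] for attribute in determinant)
        value = tuple(row[attribute] for attribute in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(rows, ["branchNo"], ["bAddress"]))   # consistent with branchNo → bAddress
print(holds(rows, ["position"], ["salary"]))     # refuted: two Managers earn different salaries
```

The determinant and the dependent are passed as lists because, as noted above, each side may consist of one or more attributes.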

A trivial functional dependency is one whose right-hand side is a subset (not necessarily a proper subset) of the left-hand side.

For example (see Figure 1):

staffNo, sName → sName

staffNo, sName → staffNo

They do not provide any additional information about possible integrity constraints on the values held by these attributes.

We are normally more interested in nontrivial dependencies because they represent integrity constraints for the relation.

Main characteristics of functional dependencies in normalization:

• have a one-to-one relationship between attribute(s) on the left- and right-hand side of a dependency;

• hold for all time;

• are nontrivial.

Identifying the primary key

Functional dependency is a property of the meaning or semantics of the attributes in a relation. When a functional dependency is present, the dependency is specified as a constraint between the attributes.

An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation using functional dependency.

Inference Rules

The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.

Armstrong's axioms

1. Reflexivity: If B is a subset of A, then A → B

2. Augmentation: If A → B, then A,C → B,C

3. Transitivity: If A → B and B → C, then A → C

4. Self-determination: A → A

5. Decomposition: If A → B,C then A → B and A → C

6. Union: If A → B and A → C, then A → B,C

7. Composition: If A → B and C → D, then A,C → B,D

Rules 1 to 3 are Armstrong's axioms proper; rules 4 to 7 can be derived from them.
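In practice the inference rules are usually applied through attribute closure: the set of all attributes determined by a given set of attributes. This is a closely related computation to X+, not the closure of the dependency set itself. The sketch below is a minimal version; the function name and the tuple-based representation of dependencies are assumptions, and the dependencies listed are those of the StaffBranch relation.

```python
def attribute_closure(attributes, dependencies):
    """Return every attribute functionally determined by 'attributes'."""
    closure = set(attributes)  # reflexivity: a set determines itself
    changed = True
    while changed:
        changed = False
        for left, right in dependencies:
            # If the closure already covers the left-hand side, transitivity
            # lets us add the right-hand side as well.
            if set(left) <= closure and not set(right) <= closure:
                closure |= set(right)
                changed = True
    return closure


STAFF_BRANCH_FDS = [
    (("staffNo",), ("sName",)),
    (("staffNo",), ("position",)),
    (("staffNo",), ("salary",)),
    (("staffNo",), ("branchNo",)),
    (("staffNo",), ("bAddress",)),
    (("branchNo",), ("bAddress",)),
    (("branchNo", "position"), ("salary",)),
    (("bAddress", "position"), ("salary",)),
]

all_attributes = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}

closure_of_staff_no = attribute_closure({"staffNo"}, STAFF_BRANCH_FDS)
closure_of_branch_no = attribute_closure({"branchNo"}, STAFF_BRANCH_FDS)

# staffNo determines every attribute, so it is a candidate key; branchNo is not.
print(closure_of_staff_no == all_attributes)
print(sorted(closure_of_branch_no))
```

A dependency A → B follows from the given set exactly when B is contained in the closure of A, which is how candidate keys are usually identified.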

Minimal Sets of Functional Dependencies

A set of functional dependencies X is minimal if it satisfies the following conditions:

• Every dependency in X has a single attribute on its right-hand side.

• We cannot replace any dependency A → B in X with dependency C → B, where C is a proper subset of A, and still have a set of dependencies that is equivalent to X.

• We cannot remove any dependency from X and still have a set of dependencies that is equivalent to X.

Example of a Minimal Set of Functional Dependencies

A set of functional dependencies for the StaffBranch relation satisfies the three conditions for producing a minimal set.

staffNo → sName

staffNo → position

staffNo → salary

staffNo → branchNo

staffNo → bAddress

branchNo → bAddress

branchNo, position → salary

bAddress, position → salary

• Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the primary key or part of it; that is, they are not functionally dependent on the primary key.

STUDENT

Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00

• Partial Dependency – when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.

CUSTOMER

Cust_ID  Name   Order_ID
101      AT&T   1234
101      AT&T   156
125      Cisco  1250

Partial dependency: Name is determined by Cust_ID alone.

• Transitive Dependency – when a non-key attribute determines another non-key attribute.

EMPLOYEE

Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg

Transitive dependency: Dept_ID determines Dept_Name.

• Normalization is often executed as a series of steps. Each step corresponds to a specific normal form that has known properties.

• As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies.

• For the relational data model, it is important to recognize that it is only first normal form (1NF) that is critical in creating relations. All the subsequent normal forms are optional.

• Unnormalized – There are multivalued attributes or repeating groups

• 1 NF – No multivalued attributes or repeating groups.

• 2 NF – 1 NF plus no partial dependencies

• 3 NF – 2 NF plus no transitive dependencies

• ISBN → Title

• ISBN → Publisher

• Publisher → Address

All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.

BOOK

ISBN  Title  Publisher  Address

The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can't be partial dependencies. Therefore, the relation is at least in 2NF.

Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.


We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.
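The reasoning above can be sketched in code. This is a simplified illustration under stated assumptions: the function name `highest_normal_form` is hypothetical, the relation is assumed to be in 1NF already, only the primary key is considered (no other candidate keys), and the tests follow the simplified definitions used in these slides (a partial dependency is a proper subset of the key determining a non-key attribute; a transitive dependency is a non-key attribute determining another non-key attribute).

```python
def highest_normal_form(primary_key, fds):
    """Classify a 1NF relation as '1NF', '2NF' or '3NF' using the simplified rules.

    primary_key: set of attribute names
    fds: list of (determinant, dependents) pairs, each a set of attribute names
    """
    key = set(primary_key)
    for lhs, rhs in fds:
        non_key_rhs = set(rhs) - key
        # Partial dependency: a proper subset of the key determines a non-key attribute.
        if non_key_rhs and set(lhs) < key:
            return "1NF"
    for lhs, rhs in fds:
        non_key_rhs = set(rhs) - key
        # Transitive dependency: non-key attributes determine another non-key attribute.
        if non_key_rhs and set(lhs).isdisjoint(key):
            return "2NF"
    return "3NF"


book_fds = [
    ({"ISBN"}, {"Title"}),
    ({"ISBN"}, {"Publisher"}),
    ({"Publisher"}, {"Address"}),
]
print(highest_normal_form({"ISBN"}, book_fds))  # prints 2NF
```

Dropping the Publisher → Address dependency (that is, moving Address to a separate relation) makes the same function report 3NF.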


• Option 2: Remove the entire repeating group from the relation. Create another relation which would contain all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group will comprise a primary key.

STUDENT

Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00


STUDENT

Stud_ID  Name
101      Lennon
125      Johnson

STUDENT_COURSE

Stud_ID  Course   Units
101      MSI 250  3
101      MSI 415  3
125      MSI 331  3

• Goal: Remove Partial Dependencies

STUDENT

Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00

Composite primary key: (Stud_ID, Course_ID)

Partial dependencies: Stud_ID → Name; Course_ID → Units

• Remove the attributes that depend on a part, but not the whole, of the primary key from the original relation. For each partial dependency, create a new relation, with the corresponding part of the primary key from the original as the primary key.


STUDENT_COURSE

Stud_ID  Course_ID
101      MSI 250
101      MSI 415
125      MSI 331

STUDENT

Stud_ID  Name
101      Lennon
125      Johnson

COURSE

Course_ID  Units
MSI 250    3.00
MSI 415    3.00
MSI 331    3.00
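The decomposition above amounts to three projections of the original STUDENT relation. A minimal Python sketch follows. The `project` helper and the variable names are illustrative assumptions. Because a relation is a set of tuples, the projection drops duplicate rows, which is why Lennon appears only once in the new STUDENT relation.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (relations are sets)."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


student = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.0},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.0},
    {"Stud_ID": 125, "Name": "Johnson", "Course_ID": "MSI 331", "Units": 3.0},
]

# One relation per partial dependency, plus one that keeps the composite key.
student_course = project(student, ["Stud_ID", "Course_ID"])
students = project(student, ["Stud_ID", "Name"])
course = project(student, ["Course_ID", "Units"])

for name, relation in [("STUDENT_COURSE", student_course),
                       ("STUDENT", students),
                       ("COURSE", course)]:
    print(name, relation)
```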

• Goal: Get rid of transitive dependencies.

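EMPLOYEE

Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg

Transitive dependency: Dept_ID → Dept_Name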

• Remove the attributes, which are dependent on a non-key attribute, from the original relation. For each transitive dependency, create a new relation with the non-key attribute which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent.





EMPLOYEE

Emp_ID F_Name L_Name Dept_ID

111 Mary Jones 1

122 Sarah Smith 2

DEPARTMENT

Dept_ID Dept_Name

1 Acct

2 Mktg
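As a rough illustration of the 3NF result, the two relations can be created in an in-memory SQLite database using Python's standard `sqlite3` module. The lower-case table and column names, the column types, and the department rename are assumptions made for this sketch. The point is that a department name is now stored once, so changing it touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    f_name  TEXT NOT NULL,
    l_name  TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Acct'), (2, 'Mktg');
INSERT INTO employee VALUES (111, 'Mary', 'Jones', 1), (122, 'Sarah', 'Smith', 2);
""")

# Hypothetical change: renaming a department updates exactly one row.
cursor = conn.execute("UPDATE department SET dept_name = 'Accounting' WHERE dept_id = 1")
updated_rows = cursor.rowcount

# The original EMPLOYEE view is recovered with a join on the determinant Dept_ID.
rows = conn.execute("""
    SELECT e.emp_id, e.f_name, e.l_name, d.dept_name
    FROM employee AS e JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(updated_rows, rows)
```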

Unnormalized form (UNF)

A table that contains one or more repeating groups.

ClientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 3 ClientRental unnormalized table

Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

First Normal Form is a relation in which the intersection of each row and column contains one and only one value.

There are two approaches to removing repeating groups from unnormalized tables:

1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.

2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.

With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.

The ClientRental relation is defined as follows:

ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

ClientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 4 1NF ClientRental relation with the first approach

With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation.

Client (clientNo, cName)

PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

Client

ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

PropertyRentalOwner

ClientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 5 1NF ClientRental relation with the second approach
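The two approaches to reaching 1NF can be sketched in Python. This is an illustration only: the nested `rentals` list stands in for the repeating group, the function names `flatten` and `split` are hypothetical, and only two of the repeating-group columns are carried along to keep the example short.

```python
unnormalized = [
    {"clientNo": "CR76", "cName": "John Kay", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Jul-00"},
        {"propertyNo": "PG16", "rentStart": "1-Sep-02"},
    ]},
    {"clientNo": "CR56", "cName": "Aline Stewart", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Sep-99"},
        {"propertyNo": "PG36", "rentStart": "10-Oct-00"},
        {"propertyNo": "PG16", "rentStart": "1-Nov-02"},
    ]},
]


def flatten(records):
    """Approach 1: repeat the client data in every row of the repeating group."""
    return [
        {"clientNo": r["clientNo"], "cName": r["cName"], **rental}
        for r in records
        for rental in r["rentals"]
    ]


def split(records):
    """Approach 2: move the repeating group to its own relation, keyed by clientNo."""
    clients = [{"clientNo": r["clientNo"], "cName": r["cName"]} for r in records]
    rentals = [
        {"clientNo": r["clientNo"], **rental}
        for r in records
        for rental in r["rentals"]
    ]
    return clients, rentals


client_rental = flatten(unnormalized)
clients, property_rentals = split(unnormalized)
print(len(client_rental), len(clients), len(property_rentals))  # prints 5 2 5
```

The first approach keeps a single relation but repeats cName on every row. The second avoids that redundancy at the cost of a join when the client name is needed.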

Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.

Second normal form (2NF) is a relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on the primary key.

The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.

The ClientRental relation has the following functional dependencies:

fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)

fd2 clientNo → cName (Partial dependency)

fd3 propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)

fd4 ownerNo → oName (Transitive Dependency)

fd5 clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)

fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
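The key labels on fd1, fd5 and fd6 can be checked by computing an attribute closure: the set of all attributes that a given set of attributes determines under the listed dependencies. The slides do not name this technique; it is brought in here as the standard way to test whether a determinant is a key. The names `closure`, `ALL` and `FDS` are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its determinant is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL = {"clientNo", "propertyNo", "cName", "pAddress", "rentStart",
       "rentFinish", "rent", "ownerNo", "oName"}

FDS = [
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),           # fd1
    ({"clientNo"}, {"cName"}),                                           # fd2
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),          # fd3
    ({"ownerNo"}, {"oName"}),                                            # fd4
    ({"clientNo", "rentStart"}, {"propertyNo", "pAddress", "rentFinish",
                                 "rent", "ownerNo", "oName"}),           # fd5
    ({"propertyNo", "rentStart"}, {"clientNo", "cName", "rentFinish"}),  # fd6
]

for key in [{"clientNo", "propertyNo"}, {"clientNo", "rentStart"}, {"propertyNo", "rentStart"}]:
    # Each line prints True: the closure of the key covers every attribute.
    print(sorted(key), closure(key, FDS) == ALL)
```

By contrast, the closure of clientNo alone is only {clientNo, cName}, which is why fd2 is a partial dependency rather than a key.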


After removing the partial dependencies, we create three new relations called Client, Rental, and PropertyOwner:

Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)

PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

Client

ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental

ClientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner

propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw

Figure 6 2NF ClientRental relation

Transitive dependency

A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Third normal form (3NF)

A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.

The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.

The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:

Client

fd2 clientNo → cName (Primary Key)

Rental

fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)

fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)

fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)

PropertyOwner

fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary Key)

fd4 ownerNo → oName (Transitive Dependency)

The resulting 3NF relations have the forms:

Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)

PropertyOwner (propertyNo, pAddress, rent, ownerNo)

Owner (ownerNo, oName)

Client

ClientNo cName

CR76 John Kay

CR56 Aline Stewart

Rental

ClientNo propertyNo rentStart rentFinish

CR76 PG4 1-Jul-00 31-Aug-01

CR76 PG16 1-Sep-02 1-Sep-02

CR56 PG4 1-Sep-99 10-Jun-00

CR56 PG36 10-Oct-00 1-Dec-01

CR56 PG16 1-Nov-02 1-Aug-03

PropertyOwner

propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93

Owner

ownerNo oName

CO40 Tina Murphy

CO93 Tony Shaw

Figure 7 3NF ClientRental relation

Boyce-Codd normal form (BCNF)

A relation is in BCNF, if and only if, every determinant is a candidate key.

The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

fd1 clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary Key)

fd2 staffNo, interviewDate, interviewTime → clientNo (Candidate key)

fd3 roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)

fd4 staffNo, interviewDate → roomNo (not a candidate key)

As a consequence, the ClientInterview relation may suffer from update anomalies. For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.

ClientInterview

ClientNo interviewDate interviewTime staffNo roomNo

CR76 13-May-02 10.30 SG5 G101

CR76 13-May-02 12.00 SG5 G101

CR74 13-May-02 12.00 SG37 G102

CR56 1-Jul-02 10.30 SG5 G102

Figure 8 ClientInterview relation

To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:

Interview (clientNo, interviewDate, interviewTime, staffNo)

StaffRoom (staffNo, interviewDate, roomNo)

Interview

ClientNo interviewDate interviewTime staffNo

CR76 13-May-02 10.30 SG5

CR76 13-May-02 12.00 SG5

CR74 13-May-02 12.00 SG37

CR56 1-Jul-02 10.30 SG5

StaffRoom

staffNo interviewDate roomNo

SG5 13-May-02 G101

SG37 13-May-02 G102

SG5 1-Jul-02 G102

Figure 9 BCNF Interview and StaffRoom relations
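The BCNF test can be sketched as a check that the determinant of every functional dependency determines all attributes of the relation. The sketch below tests for a superkey, which is enough to flag a determinant that is not a candidate key. The helper names `closure` and `bcnf_violations` are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(all_attrs)]


CLIENT_INTERVIEW = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}

FDS = [
    ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),  # fd1
    ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),            # fd2
    ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"}),  # fd3
    ({"staffNo", "interviewDate"}, {"roomNo"}),                               # fd4
]

violations = bcnf_violations(CLIENT_INTERVIEW, FDS)
print(violations)  # only fd4 is reported
```

Applied to StaffRoom (staffNo, interviewDate, roomNo) with fd4 alone, the same check reports no violations, because there the determinant is the key.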

Multi-valued dependency (MVD)

Represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if

• B is a subset of A

or

• A ∪ B = R

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
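The two triviality conditions translate directly into set operations. In the sketch below, the function name `is_trivial_mvd` and the example relation with attributes branchNo, sName and oName are hypothetical and used only for illustration.

```python
def is_trivial_mvd(a, b, r):
    """An MVD A ->> B in relation R is trivial if B is a subset of A or A union B equals R."""
    a, b, r = set(a), set(b), set(r)
    return b <= a or (a | b) == r


# Hypothetical relation R(branchNo, sName, oName) with the MVD branchNo ->> sName.
R = {"branchNo", "sName", "oName"}
print(is_trivial_mvd({"branchNo"}, {"sName"}, R))                      # prints False
# After splitting off R1(branchNo, sName), the same MVD covers all of R1.
print(is_trivial_mvd({"branchNo"}, {"sName"}, {"branchNo", "sName"}))  # prints True
```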

Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

Fifth normal form (5NF)

A relation that has no join dependency.

Lossless-join dependency

A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Join dependency

Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, a relation R satisfies a join dependency if, and only if, every legal value of R is equal to the join of its projections on A, B, …, Z.
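The lossless-join property can be tested on sample data by projecting a relation, joining the projections back together, and comparing the result with the original. The sketch below reuses the ClientInterview rows from Figure 8. The helper names are illustrative, and the second, lossy decomposition is a hypothetical counter-example added to show what spurious tuples look like.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left, right):
    """Join two relations on all attributes they have in common."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def same_relation(x, y):
    """Compare two relations as sets of tuples, ignoring row and column order."""
    def as_set(rows):
        return {tuple(sorted(row.items())) for row in rows}
    return as_set(x) == as_set(y)


COLUMNS = ["clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"]
client_interview = [dict(zip(COLUMNS, values)) for values in [
    ("CR76", "13-May-02", "10.30", "SG5", "G101"),
    ("CR76", "13-May-02", "12.00", "SG5", "G101"),
    ("CR74", "13-May-02", "12.00", "SG37", "G102"),
    ("CR56", "1-Jul-02", "10.30", "SG5", "G102"),
]]

# Decomposition from Figure 9: joins back on (staffNo, interviewDate).
interview = project(client_interview, ["clientNo", "interviewDate", "interviewTime", "staffNo"])
staff_room = project(client_interview, ["staffNo", "interviewDate", "roomNo"])
lossless = natural_join(interview, staff_room)

# Hypothetical lossy alternative: dropping interviewDate from the second relation.
staff_only_room = project(client_interview, ["staffNo", "roomNo"])
lossy = natural_join(interview, staff_only_room)

print(same_relation(lossless, client_interview))  # prints True
print(same_relation(lossy, client_interview))     # prints False
```

In the lossy case the join produces extra rows that pair SG5 with both rooms on both dates. These are the spurious tuples the definition refers to.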

Atomicity requires that database modifications must follow an "all or nothing" rule.

Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged.

To be compliant with the 'A', a system must guarantee the atomicity in each and every situation, including power failures / errors / crashes.

This guarantees that 'an incomplete transaction' cannot exist.

http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29

The consistency property ensures that any transaction the database performs will take it from one consistent state to another.

Consistency states that only consistent (valid according to all the rules defined) data will be written to the database.

Quite simply, whatever rows will be affected by the transaction will remain consistent with each and every rule that is applied to them (including but not limited to: constraints, cascades, triggers).

http://en.wikipedia.org/wiki/Consistency_%28database_systems%29

While this is extremely simple and clear, it's worth noting that this consistency requirement applies to everything changed by the transaction, without any limit (including triggers firing other triggers launching cascades that eventually fire other triggers etc.) at all.

Isolation refers to the requirement that no transaction should be able to interfere with another transaction.

In other words, it should not be possible for two transactions that affect the same rows to run concurrently, as the outcome would be unpredictable and the system would thus be unreliable.

http://en.wikipedia.org/wiki/Isolation_%28database_systems%29

In effect the only strict way to respect the isolation property is to use a serial model where no two transactions can occur on the same data at the same time and where the result is predictable (i.e. transaction B will happen after transaction A in every single possible case).

Durability means that once a transaction has been committed, it will remain so.

In other words, every committed transaction is protected against power loss/crash/errors and cannot be lost by the system and can thus be guaranteed to be completed.

In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently.

If the database crashes right after a group of SQL statements execute, it should be possible to restore the database state to the point after the last transaction committed.

http://en.wikipedia.org/wiki/Durability_%28computer_science%29

Consider, for example, an integrity constraint that requires A + B = 100. The transaction subtracts 10 from A and adds 10 to B.

If it succeeds, it would be valid, because the data continues to satisfy the constraint.

However, assume that after removing 10 from A, the transaction is unable to modify B.

If the database retains A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction complete or neither.
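The all-or-nothing behaviour can be sketched with Python's standard `sqlite3` module. The table layout, the starting balances of 60 and 40, and the `fail_midway` switch that simulates the failure are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 60), ("B", 40)])
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move amount from A to B; roll back both updates if anything fails."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            # Simulated failure between the two updates (hypothetical, for illustration).
            raise RuntimeError("could not modify B")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def balances(conn):
    """Return the current balances as a dict such as {'A': 60, 'B': 40}."""
    return dict(conn.execute("SELECT name, balance FROM account"))


try:
    transfer(conn, 10, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)  # A keeps its old value: the partial update was undone

transfer(conn, 10)
after_success = balances(conn)  # both updates applied together
print(after_failure, after_success)
```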

Consistency is a very general term that demands the data meets all validation rules.

Also, it may be implied that both A and B must be integers.

A valid range for A and B may also be implied. All validation rules must be checked to ensure consistency.

Assume that a transaction attempts to subtract 10 from A without altering B.

Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins.

If the transaction removes 10 from A successfully, atomicity will be achieved.

However, a validation check will show that A + B = 90.

That is not consistent according to the rules of the database.

The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state.
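A declarative rule is one way a database can enforce this kind of consistency. The sketch below uses a CHECK constraint in SQLite through Python's `sqlite3` module. The single-row table `pair` and its starting values are assumptions made for the illustration. Note one difference from the description above: SQLite evaluates a CHECK constraint as each statement runs rather than at the end of the transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pair (
        a INTEGER NOT NULL,
        b INTEGER NOT NULL,
        CHECK (a + b = 100)
    )
""")
conn.execute("INSERT INTO pair VALUES (60, 40)")
conn.commit()

rejected = False
try:
    # Subtracting 10 from A alone would leave A + B = 90, so the update is refused.
    conn.execute("UPDATE pair SET a = a - 10")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()
    rejected = True

after_invalid = conn.execute("SELECT a, b FROM pair").fetchone()

# Changing both values in one statement keeps A + B = 100 and is accepted.
conn.execute("UPDATE pair SET a = a - 10, b = b + 10")
conn.commit()
after_valid = conn.execute("SELECT a, b FROM pair").fetchone()
print(rejected, after_invalid, after_valid)
```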