By: Engineer Muhammad Suleman Memon M.E(Information Technology) B.E(Computer System)
Jan 27, 2015
A database is a simple, yet flexible and powerful tool for storing and retrieving data.
Every company, every website, has lots of data.
The more of your data that you keep in your database - the better.
Far from being a tool useful only to big businesses, a database is perfect even if you just want a simple guest book or page hit counter.
Whichever database you use, it will most likely be a relational database.
This is the industry standard design these days.
Relational databases use the principles of set theory.
Set theory is a field of mathematics that describes how to deal with sets of data.
Relational databases are quite intuitive and easy to understand.
All data is held in tables.
A table has columns (along the top) and rows.
You create the tables you need. You define the table names.
You define what the column names are in each table.
You define what type of data the columns are...
There are a number of different data types available which represent the different types of data you find in real life.
There are analogous types in all databases and programming languages. Each has variations, but they're all fundamentally the same.
They are:
• Numerical Types, i.e. numbers. There are fundamentally two types: integer and float. Integers are whole numbers (e.g. 1, 2, 100, 999999). Floats are numbers with decimal places (e.g. 1.1, 22.5, 3.1415927).
• String Types, i.e. text. There are two kinds here: fixed length and variable length. 'char' is the fixed-length string type in MySQL, holding up to 255 characters. 'varchar' is a variable-length type; older MySQL versions limit it to 255 characters, while MySQL 5.0.3 and later allow up to 65,535 bytes. There are also several 'text' types of varying lengths in MySQL.
• Date and Time Types. For storing dates and times.
• Binary Data. This is arbitrary data: images, programs, absolutely anything.
All Relational Databases use indexes.
Similar to the index in a book, indexes provide a quick way to find the exact data item you want.
Imagine you have a database of 100,000 customers, and you want to find just one.
If you just read the 'customers' table from start to finish until you find the one you're searching for, you could end up having to read all 100,000 records.
This would be very slow.
Most relational databases use a b-tree index structure.
This structure keeps lookups fast: the number of index reads needed to find a data item grows only logarithmically with the number of rows, so even a very large table can usually be searched in a handful of reads.
Databases commonly have millions of rows, so you can see the necessity for indexes!
Indexes are a large part of databases and their design.
Defining a column as the primary key implicitly creates an index.
If you have a primary key on a table, it has an index.
You can add a number of indexes to each table you have.
You'd use the create index command - more later...
Indexes are used automatically by the database itself when you issue a query (ask for data).
It uses the index to find the data in the table.
For example, we want to get a customer's details from the example 'customers' table above.
If we submit the following SQL query, the database will use the index it created for primary key column 'customer_id', and get everything for customer 1:
select * from customers where customer_id = 1;
The database uses the index because the query filters on 'customer_id', the indexed column, so it can look in the index and find the location of customer '1'.
If there's no index on the column in the query, the database will have to go through the whole table! This is called a full table scan.
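The difference between an index lookup and a full table scan can be observed directly. The sketch below is only an illustration: it uses SQLite (through Python's standard sqlite3 module) rather than MySQL, and the 'name' column, the sample rows and the index name 'idx_customers_name' are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database; the table layout and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 1001)],
)

def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Filter on the primary key: SQLite reports a SEARCH (an index lookup).
by_key = plan("SELECT * FROM customers WHERE customer_id = 1")

# Filter on a column without an index: SQLite reports a SCAN (full table scan).
by_name_before = plan("SELECT * FROM customers WHERE name = 'customer 1'")

# After creating an index on that column, the same query becomes a SEARCH.
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")
by_name_after = plan("SELECT * FROM customers WHERE name = 'customer 1'")

print(by_key)
print(by_name_before)
print(by_name_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only relies on the leading SEARCH or SCAN keyword.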
These days, when you talk about databases in the wild, you are primarily talking about two types: analytical databases and operational databases.
Analytic Databases
Analytic databases (a.k.a. OLAP, On Line Analytical Processing) are primarily static, read-only databases which store archived, historical data used for analysis.
For example, a company might store sales records over the last ten years in an analytic database and use that database to analyze marketing strategies in relationship to demographics.
On the web, you will often see analytic databases in the form of inventory catalogs such as the one shown previously from Amazon.com.
An inventory catalog analytical database usually holds descriptive information about all available products in the inventory.
Web pages are generated dynamically by querying the list of available products in the inventory against some search parameters.
The dynamically-generated page will display the information about each item (such as title, author, ISBN) which is stored in the database.
Operational Databases
Operational databases (a.k.a. OLTP, On Line Transaction Processing), on the other hand, are used to manage more dynamic bits of data.
These types of databases allow you to do more than simply view archived data.
Operational databases allow you to modify that data (add, change or delete data).
These types of databases are usually used to track real-time information.
For example, a company might have an operational database used to track warehouse/stock quantities.
As customers order products from an online web store, an operational database can be used to keep track of how many items have been sold and when the company will need to reorder stock.
Besides differentiating databases according to function, databases can also be differentiated according to how they model the data.
What is a data model? Well, essentially a data model is a "description" of both a container for data and a methodology for storing and retrieving data from that container.
Actually, there isn't really a data model "thing".
Data models are abstractions, oftentimes mathematical algorithms and concepts.
You cannot really touch a data model. But nevertheless, they are very useful. The analysis and design of data models has been the cornerstone of the evolution of databases.
As models have advanced so has database efficiency.
Before the 1980s, the two most commonly used Database Models were the hierarchical and network systems.
As its name implies, the Hierarchical Database Model defines hierarchically-arranged data.
Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside down tree of data.
In this tree, a single table acts as the "root" of the database from which other tables "branch" out.
You will be instantly familiar with this relationship because that is how all windows-based directory management systems (like Windows Explorer) work these days.
Relationships in such a system are thought of in terms of children and parents such that a child may only have one parent but a parent can have multiple children.
Parents and children are tied together by links called "pointers" (perhaps physical addresses inside the file system).
A parent will have a list of pointers to each of their children.
This child/parent rule assures that data is systematically accessible.
To get to a low-level table, you start at the root and work your way down through the tree until you reach your target.
Of course, as you might imagine, one problem with this system is that the user must know how the tree is structured in order to find anything!
The hierarchical model, however, is much more efficient than the flat-file model we discussed earlier because there is not as much need for redundant data.
If a change in the data is necessary, the change might only need to be processed once. Consider the student flatfile database example from our discussion of what databases are:
Examples of hierarchical data represented as relational tables
An organization could store employee information in a table that contains attributes/columns such as employee number, first name, last name, and Department number.
The organization provides each employee with computer hardware as needed, but computer equipment may only be used by the employee to which it is assigned.
The organization could store the computer hardware information in a separate table that includes each part's serial number, type, and the employee that uses it.
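As a rough sketch of how this parent/child arrangement might look as relational tables, the example below uses SQLite through Python's sqlite3 module. The table names, column names and sample rows are assumptions chosen for illustration; only the general layout (an employee table and a hardware table that records the employee each part is assigned to) comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    emp_no     INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    dept_no    INTEGER
);
-- Each hardware row points at exactly one employee (its "parent").
CREATE TABLE computer_hardware (
    serial_no TEXT PRIMARY KEY,
    type      TEXT,
    emp_no    INTEGER NOT NULL REFERENCES employee (emp_no)
);
INSERT INTO employee VALUES (100, 'Ada', 'Lopez', 10), (101, 'Sam', 'Khan', 10);
INSERT INTO computer_hardware VALUES
    ('SN-1', 'laptop', 100), ('SN-2', 'monitor', 100), ('SN-3', 'laptop', 101);
""")

# One parent, many children: all hardware assigned to employee 100.
parts = [
    serial
    for (serial,) in conn.execute(
        "SELECT serial_no FROM computer_hardware WHERE emp_no = ? ORDER BY serial_no",
        (100,),
    )
]

# A child without an existing parent is rejected by the foreign key.
try:
    conn.execute("INSERT INTO computer_hardware VALUES ('SN-9', 'laptop', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(parts, orphan_rejected)
```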
In many ways, the Network Database model was designed to solve some of the more serious problems with the Hierarchical Database Model.
Specifically, the Network model solves the problem of data redundancy by representing relationships in terms of sets rather than hierarchy.
The model had its origins in the Conference on Data Systems Languages (CODASYL) which had created the Data Base Task Group to explore and design a method to replace the hierarchical model.
The network model is very similar to the hierarchical model actually.
In fact, the hierarchical model is a subset of the network model.
However, instead of using a single-parent tree hierarchy, the network model uses set theory to provide a tree-like hierarchy with the exception that child tables are allowed to have more than one parent.
This allows the network model to support many-to-many relationships.
Visually, a Network Database looks like a hierarchical Database in that you can see it as a type of tree.
However, in the case of a Network Database, the look is more like several trees which share branches.
Thus, children can have multiple parents and parents can have multiple children.
A relational database management system (RDBMS) manages a database based on the relational model developed by E.F. Codd.
A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints.
In such a database the data and relations between them are organised in tables. A table is a collection of records and each record in a table contains the same fields.
Properties of Relational Tables:
Values Are Atomic
Each Row is Unique
Column Values Are of the Same Kind
The Sequence of Columns is Insignificant
The Sequence of Rows is Insignificant
Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up.
Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields.
Often, but not always, the fields will have the same name in both tables.
For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs so to calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables.
This can be extended to joining multiple tables on multiple fields.
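The bill calculation described above can be sketched as a single join query. This is a minimal illustration using SQLite through Python's sqlite3 module; the column names, the sample rows and the choice to store prices as whole numbers are assumptions, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_code TEXT PRIMARY KEY, price INTEGER);
CREATE TABLE orders (customer_id INTEGER, product_code TEXT);
INSERT INTO products VALUES ('P1', 250), ('P2', 1000), ('P3', 75);
INSERT INTO orders VALUES (1, 'P1'), (1, 'P2'), (1, 'P1'), (2, 'P3');
""")

def bill(customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]

print(bill(1), bill(2), bill(3))
```

The join matches rows on the product_code fields of the two tables; COALESCE turns the empty result for a customer with no orders into a bill of zero.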
Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems.
The RELATIONAL database model is based on the Relational Algebra.
Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems.
These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data and diverse binary media such as audio, video, images, and applets.
By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects.
As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin.
Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities.
Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and interfaces.
And the leading vendors are, of course, quite well known: IBM, Informix, and Oracle.
Object DBMSs add database functionality to object programming languages.
They bring much more than persistent storage of programming language objects.
Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility.
A major benefit of this approach is the unification of the application and database development into a seamless data model and language environment.
As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain.
Object developers can write complete database applications with a modest amount of additional effort.
According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems.
The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs."
In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects.
This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches:
It provides higher performance management of objects, and it enables better management of the complex interrelationships between objects.
This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, world wide web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.
In the semistructured data model, the information that is normally associated with a schema is contained within the data, which is sometimes called "self-describing".
In such a database there is no clear separation between the data and the schema, and the degree to which it is structured depends on the application.
In some forms of semistructured data there is no separate schema, in others it exists but only places loose constraints on the data.
Semi-structured data is naturally modelled in terms of graphs which contain labels which give semantics to its underlying structure.
Such databases subsume the modelling power of recent extensions of flat relational databases, to nested databases which allow the nesting (or encapsulation) of entities, and to object databases which, in addition, allow cyclic references between objects.
The associative model divides the real-world things about which data is to be recorded into two sorts:
Entities are things that have discrete, independent existence.
An entity’s existence does not depend on any other thing.
Associations are things whose existence depends on one or more other things, such that if any of those things ceases to exist, then the thing itself ceases to exist or becomes meaningless.
An associative database comprises two data structures:
1. A set of items, each of which has a unique identifier, a name and a type.
2. A set of links, each of which has a unique identifier, together with the unique identifiers of three other things, that represent the source, verb and target of a fact that is recorded about the source in the database. Each of the three things identified by the source, verb and target may be either a link or an item.
The best way to understand the rationale of entity-attribute-value (EAV) design is to understand row modeling (of which EAV is a generalized form).
Consider a supermarket database that must manage thousands of products and brands, many of which have a transitory existence.
Here, it is intuitively obvious that product names should not be hard-coded as names of columns in tables. Instead, one stores product descriptions in a Products table: purchases/sales of individual items are recorded in other tables as separate rows with a product ID referencing this table.
Conceptually an EAV design involves a single table with three columns, an entity (such as an olfactory receptor ID), an attribute (such as species, which is actually a pointer into the metadata table) and a value for the attribute (e.g., rat). In EAV design, one row stores a single fact.
In a conventional table that has one column per attribute, by contrast, one row stores a set of facts. EAV design is appropriate when the number of parameters that potentially apply to an entity is vastly more than those that actually apply to an individual entity.
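A minimal sketch of the three-column layout, using SQLite through Python's sqlite3 module, might look as follows. The table name 'eav', the receptor identifiers and the 'tissue' attribute are made up for illustration, and for brevity the attribute is stored as a plain name rather than as a pointer into a metadata table.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# One row stores one fact: (entity, attribute, value).
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("receptor-1", "species", "rat"),
        ("receptor-1", "tissue", "nasal epithelium"),
        ("receptor-2", "species", "mouse"),
    ],
)

def load_entities():
    """Regroup the single-fact rows into one dictionary per entity."""
    entities = defaultdict(dict)
    for entity, attribute, value in conn.execute(
        "SELECT entity, attribute, value FROM eav"
    ):
        entities[entity][attribute] = value
    return dict(entities)

entities = load_entities()
print(entities)
```

Note that 'receptor-2' simply has no 'tissue' row; attributes that do not apply to an entity take up no space, which is the situation in which EAV design is appropriate.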
The context data model combines features of all the above models.
It can be considered as a collection of object-oriented, network and semistructured models or as some kind of object database.
In other words, this is a flexible model: you can use any type of database structure depending on the task. Such a data model has been implemented in the DBMS ConteXt.
The fundamental unit of information storage of ConteXt is a CLASS.
A Class contains METHODS and describes an OBJECT.
The Object contains FIELDS and a PROPERTY. A field may be composite, in which case the field contains SubFields, and so on.
The Property is a set of fields that belongs to a particular Object (similar to an AVL database). In other words, fields are the permanent part of an Object but the Property is its variable part.
The header of Class contains the definition of the internal structure of the Object, which includes the description of each field, such as their type, length, attributes and name.
The context data model has a set of predefined types as well as user defined types.
The predefined types include not only character strings, texts and digits but also pointers (references) and aggregate types (structures).
A context model comprises three main data types: REGULAR, VIRTUAL and REFERENCE.
Database design is the process of producing a detailed data model of a database.
This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database.
A fully attributed data model contains detailed attributes for each entity.
The term database design can be used to describe many different parts of the design of an overall database system.
Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data.
In the relational model these are the tables and views.
Conceptual schema:
A conceptual schema or conceptual data model is a map of concepts and their relationships.
This describes the semantics of an organization and represents a series of assertions about its nature.
Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).
Because a conceptual schema represents the semantics of an organization, and not a database design, it may exist on various levels of abstraction.
Conceptual data models take a more abstract perspective, identifying the fundamental things, of which the things an individual deals with are just examples.
The model does allow for what is called inheritance in object oriented terms.
A data structure diagram (DSD) is a data model or diagram used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them.
Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system.
Normalisation procedures and the definition of integrity rules ensure that the stored database will be non-redundant and properly connected.
Logical data structuring is based on the identification of the entities, their attributes, and the relationships between the entities.
Entity:
Something about which an enterprise needs to keep data.
Attributes:
The properties of an entity.
Relationships
The connections between entities.
An Entity may be physical
Example:
an Employee; a Part; a Machine
Or conceptual
Example:
a Project; an Order; a Course.
Each instance of an entity is different from all others - one or more attributes will typically form a 'primary key' attribute - unique to a particular instance.
Attributes are the properties of an entity: data which describes or is 'owned' by an entity. Attributes (data) equate to facts, that is, specific details of interest about entities.
In the real world, objects do not exist in isolation.
Our understanding of real world objects is in terms of their relationships with other objects; for example, 'the earth circles the sun', 'he is a carpenter', etc.
Any real world object which we are going to include in a data model as an entity type must have some relationship with at least one other entity within the model (even if we are not going to implement that relationship within our database system).
One-to-one:
Both tables can have only one record on either side of the relationship.
Each primary key value relates to only one (or no) record in the related table.
Most one-to-one relationships are forced by business rules and don't flow naturally from the data.
In the absence of such a rule, you can usually combine both tables into one table without breaking any normalization rules.
One-to-One Relationships Contd:
For example: a Factory may have many Managers during its lifetime; a Manager might be in charge of different Factories during his career.
One-to-many:
The primary key table contains only one record that relates to none, one, or many records in the related table.
This relationship is similar to the one between you and a parent.
You have only one mother, but your mother may have several children.
One-to-many Contd:
A formal description of the relationship shown in the diagram above is:
One Factory may make zero or more Components.
One Component is made in one (and only one) Factory.
One-to-many Contd:
What this means in a database system is that:
one record in a table called Factory may be related to a number of records in a Component table;
but
a record in the Component table can only be related to one record in the Factory table.
One-to-Many Relationships summarised:
For any occurrence of A, there may be 0, 1, or many, occurrences of B.
For any occurrence of B, there can only be one occurrence of A.
From another perspective:
If an 'A' record exists there may be zero or more related 'B' records. Any 'B' record can only be related to a single 'A' record.
Many-to-many:
Each record in both tables can relate to any number of records (or no records) in the other table.
For instance, if you have several siblings, so do your siblings (have many siblings).
Many-to-many relationships require a third table, known as an associative or linking table, because relational systems can't directly accommodate the relationship.
Many-to-many: Contd:
Minimally, a many-many relationship will require insertion of a 'link entity'.
Further analysis may show that the link entity has attributes of its own - often qualifiers in respect of quantity or time.
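The link entity can be sketched as a third table that holds one row per pairing. The example below uses SQLite through Python's sqlite3 module; the factory and manager tables, their sample rows and the 'start_year' qualifier are all assumptions made up to illustrate the idea of a link entity with an attribute of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE factory (factory_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE manager (manager_id INTEGER PRIMARY KEY, name TEXT);
-- The link table: one row per (factory, manager) pairing, qualified by time.
CREATE TABLE factory_manager (
    factory_id INTEGER NOT NULL REFERENCES factory (factory_id),
    manager_id INTEGER NOT NULL REFERENCES manager (manager_id),
    start_year INTEGER NOT NULL,
    PRIMARY KEY (factory_id, manager_id, start_year)
);
INSERT INTO factory VALUES (1, 'North'), (2, 'South');
INSERT INTO manager VALUES (1, 'Ali'), (2, 'Bea');
INSERT INTO factory_manager VALUES (1, 1, 2001), (1, 2, 2005), (2, 1, 2005);
""")

def managers_of(factory_id):
    """All managers a factory has had, oldest assignment first."""
    rows = conn.execute(
        """
        SELECT m.name
        FROM factory_manager AS fm
        JOIN manager AS m ON m.manager_id = fm.manager_id
        WHERE fm.factory_id = ?
        ORDER BY fm.start_year
        """,
        (factory_id,),
    )
    return [name for (name,) in rows]

def factories_of(manager_id):
    """All factories a manager has run, oldest assignment first."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM factory_manager AS fm
        JOIN factory AS f ON f.factory_id = fm.factory_id
        WHERE fm.manager_id = ?
        ORDER BY fm.start_year
        """,
        (manager_id,),
    )
    return [name for (name,) in rows]

print(managers_of(1), factories_of(1))
```

Each side of the many-to-many relationship becomes a one-to-many relationship with the link table.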
The physical design of the database specifies the physical configuration of the database on the storage media.
This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary.
It is the detailed design of the system, covering its modules and the hardware and software specifications of the database.
In the case of relational databases the storage objects are tables which store data in rows and columns.
• The Purpose of Normalization
• Data redundancy and Update Anomalies
• Functional Dependencies
• The Process of Normalization
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.
Relations that have redundant data may have problems called update anomalies, which are classified as:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
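Examples of update anomalies in the StaffBranch relation (Figure 1):
• Insertion: to insert a new member of staff with branchNo B007 into the StaffBranch relation, the address of branch B007 must be entered again and must match the existing rows.
• Deletion: deleting the tuple that represents the last member of staff located at branch B007 also deletes the details of that branch.
• Modification: to change the address of branch B003, every tuple for staff located at B003 must be updated.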
StaffBranch
staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     30000   B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant    9000   B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant    9000   B005      22 Deer Rd, London
Figure 1 StaffBranch relation
Staff
staffNo  sName        position    salary  branchNo
SL21     John White   Manager     30000   B005
SG37     Ann Beech    Assistant   12000   B003
SG14     David Ford   Supervisor  18000   B003
SA9      Mary Howe    Assistant    9000   B007
SG5      Susan Brand  Manager     24000   B003
SL41     Julie Lee    Assistant    9000   B005
Branch
branchNo  bAddress
B005      22 Deer Rd, London
B007      16 Argyll St, Aberdeen
B003      163 Main St, Glasgow
Figure 2 Staff and Branch relations
Functional dependency describes the relationship between attributes in a relation.
For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
A → B: B is functionally dependent on A.
Determinant: refers to the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
Trivial functional dependency means that the right-hand side is a subset (not necessarily a proper subset) of the left-hand side.
For example (see Figure 1):
staffNo, sName → sName
staffNo, sName → staffNo
They do not provide any additional information about possible integrity constraints on the values held by these attributes.
We are normally more interested in nontrivial dependencies because they represent integrity constraints for the relation.
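Whether a candidate dependency is at least consistent with the data in a relation can be checked mechanically. The sketch below is an illustrative helper (the function name 'holds' is an assumption) applied to the rows of Figure 1. A sample of rows can only refute a dependency; it cannot prove that the dependency holds for all time, which depends on the meaning of the attributes.

```python
COLUMNS = ("staffNo", "sName", "position", "salary", "branchNo", "bAddress")
DATA = [
    ("SL21", "John White", "Manager", 30000, "B005", "22 Deer Rd, London"),
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
    ("SG5", "Susan Brand", "Manager", 24000, "B003", "163 Main St, Glasgow"),
    ("SL41", "Julie Lee", "Assistant", 9000, "B005", "22 Deer Rd, London"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]

def holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value later on means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True

print(holds(ROWS, ["branchNo"], ["bAddress"]))             # consistent with the data
print(holds(ROWS, ["position"], ["salary"]))               # two managers, two salaries
print(holds(ROWS, ["branchNo", "position"], ["salary"]))   # consistent with the data
```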
Main characteristics of functional dependencies in normalization:
• have a one-to-one relationship between attribute(s) on the left- and right-hand side of a dependency;
• hold for all time;
• are nontrivial.
Identifying the primary key
Functional dependency is a property of the meaning or semantics of the attributes in a relation. When a functional dependency is present, the dependency is specified as a constraint between the attributes.
An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation using functional dependency.
Inference Rules
The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.
Armstrong's axioms
1. Reflexivity: If B is a subset of A, then A → B
2. Augmentation: If A → B, then A,C → B,C
3. Transitivity: If A → B and B → C, then A → C
4. Self-determination: A → A
5. Decomposition: If A → B,C then A → B and A → C
6. Union: If A → B and A → C, then A → B,C
7. Composition: If A → B and C → D, then A,C → B,D
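In practice, the closure X+ is rarely enumerated in full. A common shortcut, sketched below, is to compute the closure of a set of attributes instead: a dependency A → B is implied by X exactly when B is contained in the attribute closure of A. This is the standard attribute-closure technique rather than something taken from these slides, and the function names and the small set of StaffBranch dependencies used here are chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already determined, the
            # right-hand side is determined too (transitivity and union).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

FDS = [
    (("staffNo",), ("position",)),
    (("staffNo",), ("branchNo",)),
    (("branchNo",), ("bAddress",)),
    (("branchNo", "position"), ("salary",)),
]

print(sorted(closure({"staffNo"}, FDS)))
print(implies(FDS, ("staffNo",), ("salary",)))
print(implies(FDS, ("branchNo",), ("salary",)))
```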
Minimal Sets of Functional Dependencies
A set of functional dependencies X is minimal if it satisfies the following conditions:
• Every dependency in X has a single attribute on its right-hand side.
• We cannot replace any dependency A → B in X with dependency C → B, where C is a proper subset of A, and still have a set of dependencies that is equivalent to X.
• We cannot remove any dependency from X and still have a set of dependencies that is equivalent to X.
Example of a Minimal Set of Functional Dependencies
A set of functional dependencies for the StaffBranch relation satisfies the three conditions for producing a minimal set.
staffNo → sName
staffNo → position
staffNo → salary
staffNo → branchNo
staffNo → bAddress
branchNo → bAddress
branchNo, position → salary
bAddress, position → salary
• Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the primary key or a part of it; that is, they are not functionally dependent on the primary key.
STUDENT
Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00
• Partial Dependency – when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.
CUSTOMER
Cust_ID  Name   Order_ID
101      AT&T   1234
101      AT&T   156
125      Cisco  1250
Partial dependency: Cust_ID → Name
• Transitive Dependency – when a non-key attribute determines another non-key attribute.
EMPLOYEE
Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg
Transitive dependency: Dept_ID → Dept_Name
• Normalization is often executed as a series of steps. Each step corresponds to a specific normal form that has known properties.
• As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies.
• For the relational data model, it is important to recognize that it is only first normal form (1NF) that is critical in creating relations. All the subsequent normal forms are optional.
• Unnormalized – There are multivalued attributes or repeating groups
• 1 NF – No multivalued attributes or repeating groups.
• 2 NF – 1 NF plus no partial dependencies
• 3 NF – 2 NF plus no transitive dependencies
BOOK
ISBN  Title  Publisher  Address
• ISBN → Title
• ISBN → Publisher
• Publisher → Address
All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.
There is no COMPOSITE primary key, therefore there can't be partial dependencies. Therefore, the relation is at least in 2NF.
Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.
We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.
• Option 2: Remove the entire repeating group from the relation. Create another relation which contains all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group together form the primary key.
STUDENT
Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00
STUDENT
Stud_ID  Name
101      Lennon
125      Johnson
STUDENT_COURSE
Stud_ID  Course_ID  Units
101      MSI 250    3.00
101      MSI 415    3.00
125      MSI 331    3.00
• Goal: Remove Partial Dependencies
STUDENT
Stud_ID  Name     Course_ID  Units
101      Lennon   MSI 250    3.00
101      Lennon   MSI 415    3.00
125      Johnson  MSI 331    3.00
Composite primary key: (Stud_ID, Course_ID)
Partial dependencies: Stud_ID → Name; Course_ID → Units
• Remove attributes that are dependent from the part but not the whole of the primary key from the original relation. For each partial dependency, create a new relation, with the corresponding part of the primary key from the original as the primary key. STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
STUDENT_COURSE
Stud_ID Course_ID
101 MSI 250
101 MSI 415
125 MSI 331
STUDENT
Stud_ID Name
101 Lennon
125 Johnson
COURSE
Course_ID Units
MSI 250 3.00
MSI 415 3.00
MSI 331 3.00
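The decomposition above amounts to projecting the original relation onto each set of attributes and discarding duplicate rows. A minimal Python sketch, assuming the STUDENT rows shown on the slides; the `project` helper is a hypothetical name introduced for illustration.

```python
def project(rows, columns):
    """Project rows onto `columns`, dropping duplicates and keeping first-seen order."""
    seen = []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in seen:
            seen.append(item)
    return seen


# The 1NF STUDENT relation with composite key (Stud_ID, Course_ID).
student_1nf = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.0},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.0},
    {"Stud_ID": 125, "Name": "Johnson", "Course_ID": "MSI 331", "Units": 3.0},
]

# One relation per partial dependency, plus the relation that keeps the full key.
student = project(student_1nf, ["Stud_ID", "Name"])
course = project(student_1nf, ["Course_ID", "Units"])
student_course = project(student_1nf, ["Stud_ID", "Course_ID"])

print(student)
print(course)
print(student_course)
```

Because duplicates are dropped, each student name is stored once in STUDENT no matter how many courses the student takes.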
• Goal: Get rid of transitive dependencies.
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Transitive dependency: Emp_ID → Dept_ID → Dept_Name
• Remove the attributes, which are dependent on a non-key attribute, from the original relation. For each transitive dependency, create a new relation with the non-key attribute which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent.
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID
111 Mary Jones 1
122 Sarah Smith 2
DEPARTMENT
Dept_ID Dept_Name
1 Acct
2 Mktg
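As a rough illustration, the two 3NF relations can be created in SQLite through Python's standard `sqlite3` module. The lower-case table and column names, the column types and the new department name 'Accounting' are assumptions made for this sketch; the slides do not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        f_name  TEXT NOT NULL,
        l_name  TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
""")
conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "Acct"), (2, "Mktg")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(111, "Mary", "Jones", 1), (122, "Sarah", "Smith", 2)],
)
conn.commit()

# The department name now lives in exactly one row, so renaming it is one update.
conn.execute("UPDATE department SET dept_name = 'Accounting' WHERE dept_id = 1")
conn.commit()

# A join rebuilds the original EMPLOYEE view when it is needed.
rows = conn.execute("""
    SELECT e.emp_id, e.f_name, e.l_name, e.dept_id, d.dept_name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```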
Unnormalized form (UNF)
A table that contains one or more repeating groups.
ClientNo cName propertyNo pAddress rentStart rentFinish rent ownerNo oName
CR76 John Kay PG4 6 Lawrence St, Glasgow 1-Jul-00 31-Aug-01 350 CO40 Tina Murphy
    PG16 5 Novar Dr, Glasgow 1-Sep-02 1-Sep-02 450 CO93 Tony Shaw
CR56 Aline Stewart PG4 6 Lawrence St, Glasgow 1-Sep-99 10-Jun-00 350 CO40 Tina Murphy
    PG36 2 Manor Rd, Glasgow 10-Oct-00 1-Dec-01 370 CO93 Tony Shaw
    PG16 5 Novar Dr, Glasgow 1-Nov-02 1-Aug-03 450 CO93 Tony Shaw
Figure 3 ClientRental unnormalized table
Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
First normal form (1NF) is a relation in which the intersection of each row and column contains one and only one value.
There are two approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating group by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.
The ClientRental relation is defined as follows:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
ClientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName
CR76 PG4 John Kay 6 Lawrence St, Glasgow 1-Jul-00 31-Aug-01 350 CO40 Tina Murphy
CR76 PG16 John Kay 5 Novar Dr, Glasgow 1-Sep-02 1-Sep-02 450 CO93 Tony Shaw
CR56 PG4 Aline Stewart 6 Lawrence St, Glasgow 1-Sep-99 10-Jun-00 350 CO40 Tina Murphy
CR56 PG36 Aline Stewart 2 Manor Rd, Glasgow 10-Oct-00 1-Dec-01 370 CO93 Tony Shaw
CR56 PG16 Aline Stewart 5 Novar Dr, Glasgow 1-Nov-02 1-Aug-03 450 CO93 Tony Shaw
Figure 4 1NF ClientRental relation with the first approach
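The first approach can be sketched in Python as a flattening step: every entry of the repeating group becomes its own row, and the client data is copied into it. The nested-list layout and the function name `to_1nf` are assumptions chosen for illustration; the values are those of the ClientRental table.

```python
# Unnormalized ClientRental: one entry per client with a nested repeating group.
# Each rental is (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName).
unnormalized = [
    {"clientNo": "CR76", "cName": "John Kay", "rentals": [
        ("PG4", "6 Lawrence St, Glasgow", "1-Jul-00", "31-Aug-01", 350, "CO40", "Tina Murphy"),
        ("PG16", "5 Novar Dr, Glasgow", "1-Sep-02", "1-Sep-02", 450, "CO93", "Tony Shaw"),
    ]},
    {"clientNo": "CR56", "cName": "Aline Stewart", "rentals": [
        ("PG4", "6 Lawrence St, Glasgow", "1-Sep-99", "10-Jun-00", 350, "CO40", "Tina Murphy"),
        ("PG36", "2 Manor Rd, Glasgow", "10-Oct-00", "1-Dec-01", 370, "CO93", "Tony Shaw"),
        ("PG16", "5 Novar Dr, Glasgow", "1-Nov-02", "1-Aug-03", 450, "CO93", "Tony Shaw"),
    ]},
]


def to_1nf(clients):
    """Emit one flat row per rental, repeating the client data in every row."""
    rows = []
    for client in clients:
        for property_no, *details in client["rentals"]:
            rows.append((client["clientNo"], property_no, client["cName"], *details))
    return rows


client_rental = to_1nf(unnormalized)

# (clientNo, propertyNo) identifies each flat row, so it can act as the primary key.
keys = [(row[0], row[1]) for row in client_rental]

for row in client_rental:
    print(row)
```

The flat relation holds one value per cell, at the cost of repeating client and owner data across rows.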
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation.
Client
ClientNo cName
CR76 John Kay
CR56 Aline Stewart
PropertyRentalOwner
ClientNo propertyNo pAddress rentStart rentFinish rent ownerNo oName
CR76 PG4 6 Lawrence St, Glasgow 1-Jul-00 31-Aug-01 350 CO40 Tina Murphy
CR76 PG16 5 Novar Dr, Glasgow 1-Sep-02 1-Sep-02 450 CO93 Tony Shaw
CR56 PG4 6 Lawrence St, Glasgow 1-Sep-99 10-Jun-00 350 CO40 Tina Murphy
CR56 PG36 2 Manor Rd, Glasgow 10-Oct-00 1-Dec-01 370 CO93 Tony Shaw
CR56 PG16 5 Novar Dr, Glasgow 1-Nov-02 1-Aug-03 450 CO93 Tony Shaw
Figure 5 1NF ClientRental relation with the second approach
Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.
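These two definitions can be tested against sample rows. The sketch below uses four columns of the 1NF ClientRental data; the function names `holds` and `is_full` are hypothetical. Note that sample data can only refute a dependency: whether one truly holds is a statement about the meaning of the data, not about one particular table.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if, in `rows`, equal values on `lhs` always imply equal values on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False  # part of the determinant is enough: partial dependency
    return True


columns = ("clientNo", "propertyNo", "cName", "rentStart")
rows = [dict(zip(columns, values)) for values in [
    ("CR76", "PG4", "John Kay", "1-Jul-00"),
    ("CR76", "PG16", "John Kay", "1-Sep-02"),
    ("CR56", "PG4", "Aline Stewart", "1-Sep-99"),
    ("CR56", "PG36", "Aline Stewart", "10-Oct-00"),
    ("CR56", "PG16", "Aline Stewart", "1-Nov-02"),
]]

key = ("clientNo", "propertyNo")
rent_start_is_full = is_full(rows, key, ("rentStart",))  # needs the whole key
c_name_is_full = is_full(rows, key, ("cName",))          # clientNo alone is enough
print(rent_start_is_full, c_name_is_full)
```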
Second normal form (2NF) is a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
The ClientRental relation has the following functional dependencies:
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd2 clientNo → cName (Partial dependency)
fd3 propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)
fd4 ownerNo → oName (Transitive dependency)
fd5 clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)
fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
Removing the partial dependencies results in three new relations called Client, Rental, and PropertyOwner:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
Client
ClientNo cName
CR76 John Kay
CR56 Aline Stewart
Rental
ClientNo propertyNo rentStart rentFinish
CR76 PG4 1-Jul-00 31-Aug-01
CR76 PG16 1-Sep-02 1-Sep-02
CR56 PG4 1-Sep-99 10-Jun-00
CR56 PG36 10-Oct-00 1-Dec-01
CR56 PG16 1-Nov-02 1-Aug-03
PropertyOwner
propertyNo pAddress rent ownerNo oName
PG4 6 Lawrence St, Glasgow 350 CO40 Tina Murphy
PG16 5 Novar Dr, Glasgow 450 CO93 Tony Shaw
PG36 2 Manor Rd, Glasgow 370 CO93 Tony Shaw
Figure 6 2NF ClientRental relation
Transitive dependency
A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Third normal form (3NF)
A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:
Client
fd2 clientNo → cName (Primary key)
Rental
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)
PropertyOwner
fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary key)
fd4 ownerNo → oName (Transitive dependency)
The resulting 3NF relations have the forms:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
Client
ClientNo cName
CR76 John Kay
CR56 Aline Stewart
Rental
ClientNo propertyNo rentStart rentFinish
CR76 PG4 1-Jul-00 31-Aug-01
CR76 PG16 1-Sep-02 1-Sep-02
CR56 PG4 1-Sep-99 10-Jun-00
CR56 PG36 10-Oct-00 1-Dec-01
CR56 PG16 1-Nov-02 1-Aug-03
PropertyOwner
propertyNo pAddress rent ownerNo
PG4 6 Lawrence St, Glasgow 350 CO40
PG16 5 Novar Dr, Glasgow 450 CO93
PG36 2 Manor Rd, Glasgow 370 CO93
Owner
ownerNo oName
CO40 Tina Murphy
CO93 Tony Shaw
Figure 7 3NF ClientRental relations
Boyce-Codd normal form (BCNF)
A relation is in BCNF if, and only if, every determinant is a candidate key.
The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
fd1 clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary key)
fd2 staffNo, interviewDate, interviewTime → clientNo (Candidate key)
fd3 roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)
fd4 staffNo, interviewDate → roomNo (not a candidate key)
Because the determinant of fd4 (staffNo, interviewDate) is not a candidate key, the ClientInterview relation may suffer from update anomalies. For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.
ClientInterview
ClientNo interviewDate interviewTime staffNo roomNo
CR76 13-May-02 10.30 SG5 G101
CR76 13-May-02 12.00 SG5 G101
CR74 13-May-02 12.00 SG37 G102
CR56 1-Jul-02 10.30 SG5 G102
Figure 8 ClientInterview relation
To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:
Interview (clientNo, interviewDate, interviewTime, staffNo)
StaffRoom(staffNo, interviewDate, roomNo)
Interview
ClientNo interviewDate interviewTime staffNo
CR76 13-May-02 10.30 SG5
CR76 13-May-02 12.00 SG5
CR74 13-May-02 12.00 SG37
CR56 1-Jul-02 10.30 SG5
StaffRoom
staffNo interviewDate roomNo
SG5 13-May-02 G101
SG37 13-May-02 G102
SG5 1-Jul-02 G102
Figure 9 BCNF Interview and StaffRoom relations
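The BCNF test can be sketched with an attribute-closure computation: a determinant satisfies BCNF when its closure covers the whole relation. This is a minimal Python illustration under the four dependencies listed for ClientInterview; `closure` and `bcnf_violations` are hypothetical helper names, and the check tests for superkeys, which is sufficient here because the listed candidate keys are minimal.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the dependencies whose determinant does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != relation]


CLIENT_INTERVIEW = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
FD1 = ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"})
FD2 = ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"})
FD3 = ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"})
FD4 = ({"staffNo", "interviewDate"}, {"roomNo"})

violations = bcnf_violations(CLIENT_INTERVIEW, [FD1, FD2, FD3, FD4])

# After the decomposition, fd4 is the only dependency inside StaffRoom,
# and its determinant is the key of that relation.
STAFF_ROOM = {"staffNo", "interviewDate", "roomNo"}
staff_room_violations = bcnf_violations(STAFF_ROOM, [FD4])

print(violations, staff_room_violations)
```

Only fd4 is reported for ClientInterview, which matches the dependency that the decomposition moves into StaffRoom.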
Multi-valued dependency (MVD)
Represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if
• B is a subset of A
or
• A ∪ B = R
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
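The two triviality conditions translate directly into set operations. A small Python sketch, using a generic relation R(A, B, C) as in the definition; the function name `is_trivial_mvd` is an assumption made for illustration.

```python
def is_trivial_mvd(a, b, relation):
    """An MVD A ->> B in R is trivial if B is a subset of A or A union B equals R."""
    a, b, relation = set(a), set(b), set(relation)
    return b <= a or (a | b) == relation


R = {"A", "B", "C"}

subset_case = is_trivial_mvd({"A", "B"}, {"B"}, R)      # B is a subset of the left side
cover_case = is_trivial_mvd({"A"}, {"B", "C"}, R)       # left and right sides cover R
nontrivial_case = is_trivial_mvd({"A"}, {"B"}, R)       # neither condition applies
print(subset_case, cover_case, nontrivial_case)
```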
Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
Fifth normal form (5NF)
A relation in which every nontrivial join dependency is implied by its candidate keys.
Lossless-join dependency
A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Join dependency
Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, a relation R satisfies a join dependency if, and only if, every legal value of R is equal to the join of its projections on A, B, …, Z.
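Both ideas can be checked on data by projecting a relation and joining the projections back together. The sketch below reuses the PropertyOwner rows from the 2NF example; the helpers `project`, `natural_join` and `same_rows` are hypothetical names, and the second decomposition is deliberately a poor one chosen to show spurious tuples.

```python
def project(rows, columns):
    """Project a list of dict rows onto `columns`, removing duplicates."""
    return [dict(t) for t in {tuple((c, r[c]) for c in columns) for r in rows}]


def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[c] == r_row[c] for c in l_row.keys() & r_row.keys()):
                joined.append({**l_row, **r_row})
    return joined


def same_rows(a, b):
    """Compare two lists of dict rows as sets, ignoring row and column order."""
    return {frozenset(r.items()) for r in a} == {frozenset(r.items()) for r in b}


columns = ("propertyNo", "pAddress", "rent", "ownerNo", "oName")
property_owner = [dict(zip(columns, values)) for values in [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 370, "CO93", "Tony Shaw"),
]]

# The 3NF decomposition: joining on ownerNo rebuilds the original rows exactly.
good = natural_join(
    project(property_owner, ["propertyNo", "pAddress", "rent", "ownerNo"]),
    project(property_owner, ["ownerNo", "oName"]),
)

# A poor decomposition joined on oName: Tony Shaw's two properties get mixed up.
bad = natural_join(
    project(property_owner, ["propertyNo", "rent", "oName"]),
    project(property_owner, ["oName", "pAddress", "ownerNo"]),
)

print(same_rows(good, property_owner), len(bad))
```

The second join returns five rows instead of three; the two extra rows are the spurious tuples that a lossless-join decomposition rules out.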
Atomicity requires that database modifications must follow an "all or nothing" rule.
Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged.
To be compliant with the 'A', a system must guarantee the atomicity in each and every situation, including power failures / errors / crashes.
This guarantees that 'an incomplete transaction' cannot exist.
The consistency property ensures that any transaction the database performs will take it from one consistent state to another.
Consistency states that only consistent (valid according to all the rules defined) data will be written to the database.
Quite simply, whatever rows will be affected by the transaction will remain consistent with each and every rule that is applied to them (including but not limited to: constraints, cascades, triggers).
While this is extremely simple and clear, it's worth noting that this consistency requirement applies to everything changed by the transaction, without any limit (including triggers firing other triggers launching cascades that eventually fire other triggers etc.) at all.
Isolation refers to the requirement that no transaction should be able to interfere with another transaction.
In other words, two transactions that affect the same rows should not be able to run concurrently, as the outcome would be unpredictable and the system would thus be unreliable.
In effect the only strict way to respect the isolation property is to use a serial model where no two transactions can occur on the same data at the same time and where the result is predictable (i.e. transaction B will happen after transaction A in every single possible case).
Durability means that once a transaction has been committed, it will remain so.
In other words, every committed transaction is protected against power loss/crash/errors and cannot be lost by the system and can thus be guaranteed to be completed.
In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently.
If the database crashes right after a group of SQL statements execute, it should be possible to restore the database state to the point after the last transaction committed.
Consider, for example, a table with two columns, A and B, and a constraint that A + B = 100. The transaction subtracts 10 from A and adds 10 to B.
If it succeeds, it would be valid, because the data continues to satisfy the constraint.
However, assume that after removing 10 from A, the transaction is unable to modify B.
If the database retains A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction complete or neither.
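The all-or-nothing behaviour can be tried out with Python's standard `sqlite3` module. This is a sketch under assumptions: the `accounts` table, the starting values 60 and 40, and the simulated failure are invented for illustration, since the slides only state that A + B = 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 60), ("B", 40)])
conn.commit()


def transfer(conn, amount, fail_before_credit=False):
    """Move `amount` from A to B as one transaction; undo everything if a step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
            if fail_before_credit:
                raise RuntimeError("simulated failure before B is modified")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
    except RuntimeError:
        return False
    return True


def balances(conn):
    """Return the current balances as a dictionary keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, 10, fail_before_credit=True)
after_failure = balances(conn)   # A keeps its old value because of the rollback

succeeded = transfer(conn, 10)
after_success = balances(conn)   # both updates are applied together

print(failed, after_failure, succeeded, after_success)
```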
Consistency is a very general term that demands the data meets all validation rules. In the example above, the validation rule is that A + B = 100.
Also, it may be implied that both A and B must be integers.
A valid range for A and B may also be implied. All validation rules must be checked to ensure consistency.
Assume that a transaction attempts to subtract 10 from A without altering B.
Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins.
If the transaction removes 10 from A successfully, atomicity will be achieved.
However, a validation check will show that A + B = 90.
That is not consistent according to the rules of the database.
The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state.
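The validation step can be sketched as a check that runs at the end of every transaction, just before it commits. This is one possible arrangement, not the way any particular database implements consistency; the table `t`, the starting values 60 and 40, and the names `run_checked` and `ConsistencyError` are assumptions made for illustration.

```python
import sqlite3


class ConsistencyError(Exception):
    """Raised when a transaction would leave A + B different from 100."""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER NOT NULL, b INTEGER NOT NULL)")
conn.execute("INSERT INTO t VALUES (60, 40)")
conn.commit()


def run_checked(conn, statements):
    """Run the statements as one transaction and commit only if A + B = 100 still holds."""
    try:
        with conn:  # rolls back automatically when the check raises
            for sql in statements:
                conn.execute(sql)
            a, b = conn.execute("SELECT a, b FROM t").fetchone()
            if a + b != 100:
                raise ConsistencyError(f"A + B = {a + b}, expected 100")
    except ConsistencyError:
        return False
    return True


# Subtracting 10 from A alone would give A + B = 90, so it is rolled back.
rejected = run_checked(conn, ["UPDATE t SET a = a - 10"])
state_after_rejection = conn.execute("SELECT a, b FROM t").fetchone()

# Subtracting from A and adding to B keeps the sum at 100, so it commits.
accepted = run_checked(conn, ["UPDATE t SET a = a - 10", "UPDATE t SET b = b + 10"])
state_after_commit = conn.execute("SELECT a, b FROM t").fetchone()

print(rejected, state_after_rejection, accepted, state_after_commit)
```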