Chapter VI – Logical Design and Relational Data Model
1 | P a g e Team Crescendo
Logical Database Schema
The relational model gives us a single way to represent data: as a two-dimensional table
called a relation.
1. Schema
Figure 1
The name of a relation together with the set of its attributes is called the schema for that
relation. To show the schema of a relation, write the relation name followed by a parenthesized
list of its attributes. Using Figure 1 above, we can form the schema:
Movies (Title, Year, Length, Film Type)
The attributes in a relation schema are a set, not a list. However, when we write a schema we
fix a standard order for its attributes, and this order must be followed when displaying the
relation or any of its rows.
2. Tuples
The rows of a relation, other than the header row containing the attribute names, are called
tuples. A tuple has one component for each attribute. When we want to display a tuple alone,
not as part of the relation, we separate the components with commas and surround the tuple with
parentheses. For example, the first row of the relation in Figure 1 is written:
(Star Wars, 1977, 124, color)
Because the attribute names are not displayed with the tuple, we must always list the components
in the order in which the attributes were listed in the relation schema.
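The schema and tuple notation can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the names `RELATION_NAME`, `ATTRIBUTES`, `format_schema`, `format_tuple` and `as_mapping` are hypothetical, and the attribute Film Type is spelled `FilmType` so that it forms a single identifier. The attributes are kept in an ordered tuple because that order is the standard order every displayed tuple must follow.

```python
from __future__ import annotations

# Relation schema: the relation name plus its attributes in a fixed
# standard order. "FilmType" stands in for "Film Type" (assumed spelling).
RELATION_NAME = "Movies"
ATTRIBUTES = ("Title", "Year", "Length", "FilmType")


def format_schema(name: str, attributes: tuple[str, ...]) -> str:
    """Write a schema as Name(A1, A2, ...)."""
    return f"{name}({', '.join(attributes)})"


def format_tuple(components: tuple) -> str:
    """Write a tuple on its own: components separated by commas, in parentheses."""
    return "(" + ", ".join(str(c) for c in components) + ")"


def as_mapping(attributes: tuple[str, ...], components: tuple) -> dict:
    """Pair each component with its attribute.

    This is only correct if the components are listed in the schema's
    standard order, because a bare tuple carries no attribute names.
    """
    if len(attributes) != len(components):
        raise ValueError("a tuple needs exactly one component per attribute")
    return dict(zip(attributes, components))


star_wars = ("Star Wars", 1977, 124, "color")
print(format_schema(RELATION_NAME, ATTRIBUTES))   # Movies(Title, Year, Length, FilmType)
print(format_tuple(star_wars))                    # (Star Wars, 1977, 124, color)
print(as_mapping(ATTRIBUTES, star_wars)["Year"])  # 1977
```

Since the tuple itself has no labels, `as_mapping` can only match components to attributes by position, which is exactly why the schema order must be respected.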
3. Domains
The relational data model requires that each component of each tuple be atomic, meaning that
its value is of some elementary type, such as an integer or a string, and cannot be broken into
smaller components.
Each component of any tuple of the relation must have a value that belongs to the domain of the
corresponding column. For example, tuples of the Movies relation
of Fig. 1 must have a first component that is a string, second and third components that are
integers, and a fourth component whose value is one of the constants color and blackAndWhite.
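A minimal sketch of a domain check for the Movies relation, assuming the string and integer domains can be approximated by Python's `str` and `int` types and the enumerated domain by an `Enum` named `FilmType` (a hypothetical name). `tuple_in_domains`, also hypothetical, accepts a tuple only if each component belongs to the domain of its column.

```python
from enum import Enum


class FilmType(Enum):
    """The domain of the fourth Movies attribute: exactly two constants."""
    COLOR = "color"
    BLACK_AND_WHITE = "blackAndWhite"


# Domains of the Movies attributes, in schema order (assumed encoding:
# a Python type for the str/int domains, an Enum for the enumerated one).
MOVIES_DOMAINS = (str, int, int, FilmType)


def in_domain(value, domain) -> bool:
    """True if value is an atomic value drawn from the given domain."""
    if isinstance(domain, type) and issubclass(domain, Enum):
        return value in {member.value for member in domain}
    if domain is int and isinstance(value, bool):
        return False  # bool is a subclass of int in Python; keep it out of integer domains
    return isinstance(value, domain)


def tuple_in_domains(components: tuple, domains: tuple = MOVIES_DOMAINS) -> bool:
    """Each component must belong to the domain of its column."""
    return len(components) == len(domains) and all(
        in_domain(value, domain) for value, domain in zip(components, domains)
    )


print(tuple_in_domains(("Star Wars", 1977, 124, "color")))    # True
print(tuple_in_domains(("Star Wars", "1977", 124, "color")))  # False: Year is not an integer
print(tuple_in_domains(("Star Wars", 1977, 124, "sepia")))    # False: not a FilmType constant
```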
4. Relation Instances
A relation about movies is not static; relations change over time. We expect these changes to
involve the tuples of the relation: inserting new tuples, changing components of existing tuples,
and deleting tuples.
A set of tuples for a given relation is called an instance of that relation. For example, the
first three tuples in Figure 1 form an instance of relation Movies.
Presumably, the relation Movies has changed over time and will continue to change over
time. For example, in 1980, Movies did not contain the tuples for Mighty Ducks or Wayne's World.
However, a conventional database system maintains only one version of any relation: the set of
tuples that are in the relation "now." This instance of the relation is called the current instance.
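To make the distinction between a relation and its instances concrete, the sketch below uses a hypothetical `Relation` class (not how any particular DBMS stores data) that keeps only a current instance, modelled as a `frozenset` of tuples. The "Example Film" tuple is invented for illustration. Holding on to `earlier` is something a conventional DBMS would not do; it is kept here only to show that each change produces a different set of tuples.

```python
class Relation:
    """A relation that keeps only its current instance.

    The instance is a frozenset of tuples: a set, so tuples are unordered
    and duplicates collapse; frozen, so an older instance we still hold
    cannot be modified by later changes.
    """

    def __init__(self, tuples=()):
        self.current = frozenset(tuples)

    def insert(self, t: tuple) -> None:
        self.current = self.current | {t}

    def delete(self, t: tuple) -> None:
        self.current = self.current - {t}

    def update(self, old: tuple, new: tuple) -> None:
        # Changing a component amounts to replacing the old tuple with a new one.
        if old not in self.current:
            raise KeyError(old)
        self.current = (self.current - {old}) | {new}


movies = Relation({("Star Wars", 1977, 124, "color")})
earlier = movies.current                             # an older instance
movies.insert(("Example Film", 2000, 100, "color"))  # hypothetical tuple
movies.insert(("Star Wars", 1977, 124, "color"))     # already present: no change
print(len(earlier), len(movies.current))             # 1 2
```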
Relational Data Model
Model
- a representation of ‘real world’ objects and events, and their associations.
- concentrates on the essential, inherent aspects of an organization and ignores the accidental
properties
Data Model
- an integrated collection of concepts for describing data, relationships between data, and
constraints on the data used by an organization
- attempts to represent the data requirements of the organization, or the part of the
organization that you wish to model.
- provides the basic concepts and notations that will allow database designers and end-users to
communicate their understanding of the organization's data unambiguously and accurately
- consists of three components: (1) a structural part – a set of rules that define how the
database is to be constructed; (2) a manipulative part – defining the types of
operations/transactions that are allowed on the data (including operations used for updating or
retrieving data and for changing the structure of the database); (3) a set of integrity rules –
ensuring that the data is accurate
- the purpose of a data model is to represent data and to make the data understandable
The relational data model is based on the mathematical concept of a relation, which is physically
represented as a table. E. F. Codd, a trained mathematician, used terminology taken from
mathematics, principally set theory and predicate logic.
Relation – a table with columns and rows
A relational DBMS requires only that the database be perceived by the user as tables
Attribute – a named column of a relation
In a relational model, we use relations to hold information about the objects that we
want to represent in the database.
The rows of the table correspond to individual records and the columns correspond to
the attributes
Attributes can appear in any order and the relation will still be the same relation and
convey the same meaning
Domain – the set of allowable values for one or more attributes
An important feature of the relational model is that every attribute in a relational database is
associated with a domain.
Domains may be distinct for each attribute, or two or more attributes may be associated
with the same domain.
Note that, at any given time, typically there will be values in a domain that don’t
currently appear as values in the corresponding attribute. In other words, a domain
describes possible values for an attribute.
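Domains allow us to define the meaning and source of the values that attributes can hold.
More information is available to the system, so it can (theoretically) reject operations that
don’t make sense, such as comparing a movie's Year with its Length when the two are defined on
different domains.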
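One way to picture how domains let a system reject meaningless operations is to give each domain its own type. In this sketch, `Year` and `Minutes` are hypothetical domains for the Year and Length attributes, both holding integers underneath. Python dataclasses created with `order=True` only compare instances of the same class, so comparing a `Year` with a `Minutes` raises `TypeError`.

```python
from dataclasses import dataclass


# Two hypothetical domains that are both integers underneath but mean
# different things: a calendar year and a running time in minutes.
@dataclass(frozen=True, order=True)
class Year:
    value: int


@dataclass(frozen=True, order=True)
class Minutes:
    value: int


release = Year(1977)
length = Minutes(124)

print(release < Year(1980))  # True: same domain, so the comparison is meaningful
try:
    _ = release < length     # different domains: the ordering methods refuse
except TypeError:
    print("rejected: Year and Minutes are different domains")
```

Equality is treated differently: `Year(124) == Minutes(124)` simply returns `False` rather than raising, so a system built this way would need an explicit rule for that case as well.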
Tuple – a record of a relation
The fundamental elements of a relation
Relational database – a collection of normalized tables
consists of tables that are appropriately structured
Properties of relational tables
The table has a name that is distinct from all other tables in the database
Each cell of the table contains exactly one value; tables don’t contain repeating
groups of data
A relational table that satisfies this property is said to be normalized (first normal
form)
Each column has a distinct name
The values of a column are all from the same domain
The order of columns has no significance.
Each record is distinct; there are no duplicate records
The order of records has no significance, theoretically.
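Several of the properties above can be checked mechanically. The function below is a hypothetical helper, and the sample rows are invented because no table is given here. It treats a table as a list of column names plus a list of Python tuples, reports non-distinct column names, cells holding a list or set (a repeating group), and duplicate records, and deliberately ignores column and record order. Domain checks are left out.

```python
from __future__ import annotations


def table_property_violations(columns: list[str], rows: list[tuple]) -> list[str]:
    """Report which relational-table properties a table breaks.

    Checks distinct column names, one atomic value per cell, and no
    duplicate records. Column and record order are not checked because
    they carry no meaning.
    """
    problems = []
    if len(set(columns)) != len(columns):
        problems.append("column names are not distinct")
    seen = set()
    for row in rows:
        if len(row) != len(columns):
            problems.append(f"record {row!r} does not have one value per column")
            continue
        if any(isinstance(v, (list, tuple, set, frozenset, dict)) for v in row):
            problems.append(f"record {row!r} has a repeating group (not 1NF)")
            continue  # unhashable cells cannot be checked for duplicates
        if row in seen:
            problems.append(f"record {row!r} is duplicated")
        seen.add(row)
    return problems


# Hypothetical rows.
columns = ["S_id", "S_Name", "Subjects"]
rows = [
    (401, "Adam", ["Biology", "Maths"]),  # repeating group in one cell
    (402, "Alex", "Maths"),
    (402, "Alex", "Maths"),               # exact duplicate record
]
problems = table_property_violations(columns, rows)
for problem in problems:
    print(problem)
```

A program can only spot repeating groups it can see: a cell holding the single string "Biology, Maths" would pass this check even though it packs two values, so atomicity remains a design decision.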
Relational keys
Because each record in a table must be unique, we must be able to identify a column or
combination of columns (called relational keys) that provides uniqueness.
Superkey – a column or set of columns that uniquely identifies a record within a table
Candidate key – a superkey that contains only the minimum number of columns
necessary for unique identification
– has two properties: (1) Uniqueness – in each record, the values of the key uniquely
identify that record; (2) Irreducibility – no proper subset of the candidate key has the
uniqueness property
Primary key – the candidate key that is selected to identify records uniquely within the
table
Foreign key – a column or set of columns within one table that matches the candidate
key of some table (possibly the same table)
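The definitions of superkey and candidate key translate directly into checks on an instance. This is a sketch with hypothetical function names and made-up rows (only the Star Wars row comes from Figure 1). One caution: an instance can only show that a set of columns is not a key; it cannot prove that it is one, because a key must hold for every instance the table may ever have.

```python
from __future__ import annotations

from itertools import combinations


def is_superkey(rows: list[dict], cols: tuple) -> bool:
    """True if no two records in this instance agree on all of cols."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in cols)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows: list[dict], cols: tuple) -> bool:
    """Uniqueness plus irreducibility: no proper subset is also a superkey."""
    if not is_superkey(rows, cols):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(cols))  # sizes 0 .. len(cols) - 1: proper subsets
        for subset in combinations(cols, size)
    )


def candidate_keys(rows: list[dict], columns: tuple) -> list[tuple]:
    """All column combinations that are candidate keys on this instance."""
    return [
        cols
        for size in range(1, len(columns) + 1)
        for cols in combinations(columns, size)
        if is_candidate_key(rows, cols)
    ]


# Hypothetical instance: two different films share a title.
movies = [
    {"Title": "Star Wars", "Year": 1977, "Length": 124},
    {"Title": "Film A", "Year": 1977, "Length": 95},
    {"Title": "Film A", "Year": 1990, "Length": 110},
]
print(is_superkey(movies, ("Title",)))                # False
print(is_candidate_key(movies, ("Title", "Year")))    # True
print(candidate_keys(movies, ("Title", "Year", "Length")))
# [('Length',), ('Title', 'Year')]
```

The result ('Length',) illustrates the caution: on these three rows Length happens to be unique, yet nothing prevents two films from having the same running time, so a designer would not choose it. The primary key would be picked from the genuine candidate keys, here (Title, Year).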
Representing Relational Databases
A relational database consists of one or more tables. The common convention for
representing a description of a relational database is to give the name of each table,
followed by the column names in parentheses. Normally, the primary key is underlined.
The description of the relational database for the StayHome video rental company is:
Relational Integrity
Since every column has an associated domain, there are constraints (called domain
constraints) in the form of restrictions on the set of values allowed for the columns
of tables.
There are two important integrity rules, which are constraints that apply to all
instances of the database.
1. Entity Integrity
2. Referential Integrity
Nulls
Represent a value for a column that is currently unknown or is not applicable for
this record
A way to deal with incomplete or exceptional data
A null is not the same as a zero numeric value or a text string filled with spaces; it
represents the absence of a value
Entity Integrity
- In a base table no column of a primary key can be null
- A base table is a named table whose records are physically stored in the
database.
Referential Integrity
- If a foreign key exists in a table, either the foreign key value must match a candidate
key value of some record in its home table or the foreign key must be wholly null.
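Both integrity rules can be expressed as simple checks. In this sketch a null is represented by Python's `None`, the Branch and Staff tables and their values are hypothetical, and the function names are invented for illustration. It also assumes the home table's key contains no nulls. In SQL these rules are normally declared with PRIMARY KEY and FOREIGN KEY constraints rather than checked by hand.

```python
from __future__ import annotations

NULL = None  # this sketch represents a null with Python's None


def entity_integrity_violations(rows: list[dict], primary_key: tuple) -> list[dict]:
    """Records in which some primary-key column is null."""
    return [r for r in rows if any(r[c] is NULL for c in primary_key)]


def referential_integrity_violations(
    rows: list[dict], foreign_key: tuple, home_rows: list[dict], home_key: tuple
) -> list[dict]:
    """Records whose foreign key is neither wholly null nor matched in the home table."""
    home_values = {tuple(h[c] for c in home_key) for h in home_rows}
    bad = []
    for r in rows:
        value = tuple(r[c] for c in foreign_key)
        if all(v is NULL for v in value):
            continue  # a wholly null foreign key is allowed
        if value not in home_values:
            bad.append(r)  # unmatched, or only partly null
    return bad


# Hypothetical Branch (home table) and Staff tables.
branch = [{"branchNo": "B001"}, {"branchNo": "B002"}]
staff = [
    {"staffNo": "S1", "branchNo": "B001"},
    {"staffNo": "S2", "branchNo": None},    # not yet assigned: allowed
    {"staffNo": "S3", "branchNo": "B999"},  # no such branch: violation
    {"staffNo": None, "branchNo": "B002"},  # null primary key: violation
]
print(entity_integrity_violations(staff, ("staffNo",)))
# [{'staffNo': None, 'branchNo': 'B002'}]
print(referential_integrity_violations(staff, ("branchNo",), branch, ("branchNo",)))
# [{'staffNo': 'S3', 'branchNo': 'B999'}]
```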
Advantages:
1. Ease of use: Information represented as tables consisting of rows and columns is much easier
to understand.
2. Flexibility: Different tables from which information has to be linked and extracted can
be easily manipulated by operators such as project and join to give information in the
form in which it is desired.
3. Precision: The use of relational algebra and relational calculus in manipulating the relations
between tables ensures that there is no ambiguity, which might otherwise arise in establishing
the linkages in a complicated network-type database.
4. Security: Security control and authorization can also be implemented more easily by
moving sensitive attributes in a given table into a separate relation with its own
authorization controls. If authorization requirements permit, a particular attribute could
be joined back with others to enable full information retrieval.
5. Data Independence: Data independence is achieved more easily with the normalized structure
used in a relational database than with the more complicated tree or network structure.
6. Data Manipulation Language: Responding to queries by means of a language based on relational
algebra and relational calculus, such as SQL, is easy in the relational database approach. For
data organized in other structures, the query language either becomes complex or is extremely
limited in its capabilities.
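The project and join operators mentioned under Flexibility can be sketched over tables stored as lists of dictionaries. This is an illustrative implementation, not how a DBMS executes them: `natural_join` uses a simple nested loop and assumes all rows in a table share the same attribute names, and the StarsIn table is a hypothetical example.

```python
from __future__ import annotations


def project(rows: list[dict], attrs: list[str]) -> list[dict]:
    """Projection: keep only attrs and drop duplicate results (set semantics)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left: list[dict], right: list[dict]) -> list[dict]:
    """Natural join: combine every pair of rows that agree on all shared attributes.

    Assumes every row within a table has the same attribute names.
    """
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if all(lrow[a] == rrow[a] for a in shared)
    ]


movies = [{"Title": "Star Wars", "Year": 1977, "Length": 124}]
# Hypothetical StarsIn table linking films to actors.
stars_in = [
    {"Title": "Star Wars", "Year": 1977, "Star": "Mark Hamill"},
    {"Title": "Star Wars", "Year": 1977, "Star": "Carrie Fisher"},
]
joined = natural_join(movies, stars_in)
print(len(joined))                           # 2
print(project(joined, ["Title", "Length"]))  # [{'Title': 'Star Wars', 'Length': 124}]
```

The projection collapses the two joined rows into one because, as a set, a relation cannot hold the same tuple twice.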
Disadvantages:
1. Performance: A major constraint, and therefore disadvantage, in the use of relational
database systems is machine performance. If the number of tables between which relationships
must be established is large, and the tables themselves are large, the performance in responding
to SQL queries is affected.
2. Physical Storage Consumption: With an interactive system, an operation such as a join also
depends on the physical storage. It is therefore common in relational databases to tune the
database, choosing the physical data layout so as to give good performance for the most
frequently run operations. Naturally, the less frequently run operations then tend to become
even slower.
3. Slow extraction of meaning from data: If the data is naturally organized in a hierarchical
manner and stored as such, a hierarchical approach may extract meaning from that data more
quickly.
Concept of Normalization
Normalization of Database
Normalization is a systematic approach of decomposing tables to eliminate data redundancy and
undesirable characteristics such as insertion, update and deletion anomalies. It is a multi-step
process that puts data into tabular form by removing duplicated data from the relation tables.
Uses of Normalization
1. Eliminating redundant (useless) data
2. Ensuring data dependencies make sense i.e. data is logically stored
Without normalization, it becomes difficult to handle and update the database without facing
data loss. Insertion, update and deletion anomalies are very frequent if the database is not
normalized. To understand these anomalies, let us take the example of a Student table.
Student table:
Update Anomaly: To update the address of a student who occurs two or more times in the table, we
have to update the S_Address column in all of those rows; otherwise the data will become
inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the student's id (S_id), name and
address, but the student has not opted for any subjects yet. Then we have to insert NULL in the
subject column, leading to an insertion anomaly.
Deletion Anomaly: If the student with S_id 401 has only one subject and temporarily drops it,
deleting that row deletes the entire student record along with it.
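The three anomalies can be reproduced on a small unnormalized table. The rows below are hypothetical (the original Student table is not reproduced here); the column names follow the text, and `inconsistent_students` is an invented helper that flags students recorded with more than one address.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical rows for an unnormalized Student table
# (one row per student per subject, so student details repeat).
STUDENT = [
    {"S_id": 401, "S_Name": "Adam", "S_Address": "Noida", "Subject_opted": "Bio"},
    {"S_id": 402, "S_Name": "Alex", "S_Address": "Panipat", "Subject_opted": "Maths"},
    {"S_id": 402, "S_Name": "Alex", "S_Address": "Panipat", "Subject_opted": "Physics"},
]


def inconsistent_students(rows: list[dict]) -> list[int]:
    """S_id values that appear with more than one address."""
    addresses = defaultdict(set)
    for row in rows:
        addresses[row["S_id"]].add(row["S_Address"])
    return sorted(s for s, found in addresses.items() if len(found) > 1)


# Update anomaly: Alex's address is changed in only one of his two rows.
after_update = [dict(row) for row in STUDENT]
after_update[1]["S_Address"] = "Delhi"
print(inconsistent_students(after_update))  # [402]

# Deletion anomaly: deleting 401's only subject row removes student 401 entirely.
after_delete = [r for r in STUDENT if not (r["S_id"] == 401 and r["Subject_opted"] == "Bio")]
print(any(r["S_id"] == 401 for r in after_delete))  # False

# Insertion anomaly: a new student with no subject yet needs a NULL (None) placeholder.
after_insert = STUDENT + [
    {"S_id": 405, "S_Name": "Sam", "S_Address": "Agra", "Subject_opted": None}
]
print(after_insert[-1]["Subject_opted"])  # None
```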
Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
First Normal Form (1NF)
A row of data cannot contain a repeating group of data, i.e. each column must hold a single
(atomic) value. Each row of data must have a unique identifier, i.e. a primary key. For example,
consider a table which is not in First Normal Form.
You can clearly see here that the student name Adam is used twice in the table and the subject
Math is also repeated. This violates First Normal Form. To reduce the above table to First Normal
Form, break it into two different tables.
In the Student table, a concatenated key that includes subject_id is the primary key. Now both the
Student table and the Subject table are normalized to First Normal Form.
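One common way to carry out a split into two tables is to move the repeating group into its own table. The sketch below assumes a hypothetical Student table in which the Subject column holds a list; it is not necessarily the exact pair of tables the original example used, and `split_repeating_group` is an invented helper. The new detail table has one row per (S_id, Subject) pair, so its primary key is the concatenation of those two columns.

```python
from __future__ import annotations


def split_repeating_group(rows: list[dict], key: str, group_col: str):
    """Move a multi-valued column into a separate table.

    Returns (main, detail): main keeps one row per key value without the
    repeating group; detail has one row per (key, value) pair, so its
    primary key is the concatenation of key and group_col.
    """
    main, detail = [], []
    for row in rows:
        main.append({c: v for c, v in row.items() if c != group_col})
        for value in row[group_col]:
            detail.append({key: row[key], group_col: value})
    return main, detail


# Hypothetical table that is NOT in 1NF: Subject holds a list of values.
student = [
    {"S_id": 401, "S_Name": "Adam", "Subject": ["Biology", "Maths"]},
    {"S_id": 402, "S_Name": "Alex", "Subject": ["Maths"]},
]
main, detail = split_repeating_group(student, "S_id", "Subject")
print(main)
# [{'S_id': 401, 'S_Name': 'Adam'}, {'S_id': 402, 'S_Name': 'Alex'}]
print(detail)
# [{'S_id': 401, 'Subject': 'Biology'}, {'S_id': 401, 'Subject': 'Maths'}, {'S_id': 402, 'Subject': 'Maths'}]
```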
Second Normal Form (2NF)
To be normalized to Second Normal Form, a table must meet all the requirements of First Normal
Form, and there must not be any partial dependency of any column on the primary key. This means
that for a table that has a concatenated primary key, each column in the table that is not part of
the primary key must depend upon the entire concatenated key for its existence. If any column
depends on only one part of the concatenated key, the table fails Second Normal Form. For
example, consider a table which is not in Second Normal Form.
In the Customer table, the concatenation of Customer_id and Order_id is the primary key. This
table is in First Normal Form but not in Second Normal Form because there are partial
dependencies of columns on the primary key: Customer_Name depends only on Customer_id,
Order_name depends only on Order_id, and there is no link between sale_detail and Customer_Name.
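Partial dependencies can be detected by testing functional dependencies on part of the key. This sketch uses invented rows for the Customer table and hypothetical helper names (`fd_holds`, `partial_dependencies`). As with keys, a sample instance can only disprove a dependency; one that holds on a few rows may be a coincidence, so the real dependencies should come from the business rules.

```python
from __future__ import annotations

from itertools import combinations


def fd_holds(rows: list[dict], lhs: tuple, rhs: tuple) -> bool:
    """True if the functional dependency lhs -> rhs holds in this instance:
    records that agree on lhs also agree on rhs."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def partial_dependencies(rows: list[dict], key: tuple, non_key: tuple) -> list[tuple]:
    """Non-key columns that depend on only part of a concatenated key."""
    found = []
    for size in range(1, len(key)):  # proper, non-empty parts of the key
        for part in combinations(key, size):
            for column in non_key:
                if fd_holds(rows, part, (column,)):
                    found.append((part, column))
    return found


# Hypothetical instance of the Customer table.
customer = [
    {"Customer_id": 1, "Order_id": 10, "Customer_Name": "Ann", "Order_name": "Pens", "sale_detail": "2 boxes"},
    {"Customer_id": 1, "Order_id": 11, "Customer_Name": "Ann", "Order_name": "Ink", "sale_detail": "1 bottle"},
    {"Customer_id": 2, "Order_id": 10, "Customer_Name": "Bob", "Order_name": "Pens", "sale_detail": "5 boxes"},
]
print(partial_dependencies(
    customer,
    ("Customer_id", "Order_id"),
    ("Customer_Name", "Order_name", "sale_detail"),
))
# [(('Customer_id',), 'Customer_Name'), (('Order_id',), 'Order_name')]
```

Each reported pair points to a separate table in a 2NF design: customer details keyed by Customer_id, order details keyed by Order_id, and sale_detail kept with the full concatenated key.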
To reduce the Customer table to Second Normal Form, break it into the following three different
tables.
Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized
than databases intended for online analytical processing (OLAP). OLTP applications are
characterized by a high volume of small transactions, such as updating a sales record at a
supermarket checkout counter. The expectation is that each transaction will leave the database in
a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly"
databases, for which redundant (denormalized) data can speed up queries. Denormalization is also
used to improve performance on smaller computers, as in computerized cash registers and mobile
devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may
also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made
to the data and a swift response is crucial.
Some Good Reasons Not To Normalize
That said, there are some good reasons not to normalize your database. Let’s look at a few:
1. Joins are expensive. Normalizing your database often involves creating lots of tables. In fact,
you can easily wind up with what might seem like a simple query spanning five or ten tables.
If you’ve ever tried doing a five-table join, you know that it works in principle, but it’s
painstakingly slow in practice. If you’re building a web application that relies upon
multiple-join queries against large tables, you might find yourself thinking: “If only this database wasn’t
normalized!” When you hear that thought in your head, it’s a good time to consider
denormalizing. If you can stick all of the data used by that query into a single table without
really jeopardizing your data integrity, go for it! Be a rebel and denormalize your database.
You won’t look back!
2. Normalized design is difficult. If you’re working with a complex database schema, you’ll
probably find yourself banging your head against the table over the complexity of
normalization. As a simple rule of thumb, if you’ve been banging your head against the table
for an hour or two trying to figure out how to move to the fourth normal form, you might be
taking normalization too far. Step back and ask yourself if it’s really worth continuing.
3. Quick and dirty should be quick and dirty. If you’re just developing a prototype, just do
whatever works quickly. Really. It’s OK. Rapid application development is sometimes more
important than elegant design. Just remember to go back and take a careful look at your
design once you’re ready to move beyond the prototyping phase. The price you pay for a quick
and dirty database design is that you might need to throw it away and start over when it’s time