Relational Database Systems 1 - TU Braunschweig
Post on 11-Mar-2018
Wolf-Tilo Balke
Christoph Lofi
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
Relational Database Systems 1
• Functional Dependencies
– Definition
– Functional Closure
• Normalization
– 1-NF
– 2-NF
– 3-NF
Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2
13. Normalization
• We have considered all stages during the life
cycle of a database in this lecture…
– Modeling and implementation of the model
– Querying and manipulating data
– Application programming with databases
– Setting constraints and enforcing
access control
13. Introduction
• But… how do you know whether your design is
sensible?
– Trade-off: redundancy vs. data access speed
• General design: avoid redundancy wherever possible,
because redundancies often lead to inconsistent states
• Example for an exception: materialized views – expensive
to maintain, but boost read efficiency, if often used
– Normal forms can help!
• Functional dependencies measure the amount of
redundancy in the table design
• In this lecture, you learned the basics of how to
use relational databases
– How do I model data?
– How do I query data?
– What theories are behind queries?
– How do I use a DB in my application?
• But we did not tell you how all this stuff really
works!
Relational Databases 2
• That's what we will do in Relational Databases 2
– What is the architecture of a DBMS?
– How do you store data on hard disks?
– How does an index work? Why is it so fast?
– How does the DBMS evaluate a query? How can the
evaluation be optimized?
– How are transactions and the ACID principles
enforced?
– What happens to your data if your computer
explodes?
• Data structures for indexes!
• Query Optimization!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8
• Implementing Transactions!
[Figure labels: Scheduler, Storage Manager, Transaction Manager]
• Relational Databases 2
– Coming to your lecture hall in Summer Semester
2009
• Featuring
– Learn to build a DB yourself!
– Discover all the secrets we
skipped this semester!
– See the wonders of tuning
and optimization!
• Remember lecture 5: data in relational databases is represented using the relational model
– Relation R(A1:D1,…, An:Dn)
• A1…An are attributes
• D1…Dn are domains of the attributes
• A relational database schema consists of
– A set of relation schemas
– A set of integrity constraints
• A relational database instance (or extension) is
– A set of tuples adhering to the respective schemas and respecting all integrity constraints
13.1 Functional Dependency
• For the sake of the argument, let us assume that a database is represented by a single universal relation R(A1,…, An)
• Based on this relation, we can introduce the concept of functional dependencies
– Functional dependency is crucial to formally introduce data normalization
– In short, functional dependency can be informally described as
• “If some attribute B is dependent on another attribute A, and two tuples have the same value for the attribute A, they also have the same value for attribute B”
Definition:
• X and Y are subsets of the attributes of R
• There is a functional dependency between X and Y (denoted as X → Y), iff…
– … for any two tuples t1 and t2 within any instance of
R, the following holds: if t1[X] = t2[X], then also t1[Y] = t2[Y]
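The definition can be checked mechanically against a concrete instance. The following is a minimal sketch, assuming tuples are represented as Python dictionaries; the function name `fd_holds` and the sample rows are hypothetical and not part of the lecture. Note that a single instance can only refute a dependency: X → Y must hold in every legal instance, so a passing check is merely consistent with the dependency.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each lhs value combination to its first rhs combination
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # setdefault stores y_value on first sight, otherwise returns the stored one
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical instance of a relation R(A, B, C)
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

print(fd_holds(rows, ["A"], ["B"]))  # A -> B is not violated here
print(fd_holds(rows, ["A"], ["C"]))  # A -> C is violated by the first two rows
```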
• If X → Y then
– Y is functionally dependent on X
– X is called the head of the dependency, Y the body
– The values of the attributes in Y are determined by
those in X
• Note the following:
– If X represents a candidate key (i.e. value
combinations are unique within an instance) then
• X → Y for any Y
– X → Y does not imply Y → X
• Functional dependencies are properties of the
semantics of attributes
– Semantics are given by the understanding of the
domain
– The designer is responsible for identifying those
semantics
• Functional dependencies further
restrict possible schema extensions
– All extensions respecting the functional dependencies
are called legal extensions
• Let F be a set of functional dependencies on the
relation R
• Examples:
– A relation containing students
• Semantics: matrikelnummer is unique
• {matrikelnummer} → {firstName, lastName, birthdate}
matrikelnummer firstName lastName birthdate
– A relation containing real names and aliases of heroes
• {alias} → {realName}, iff each alias is used by
exactly one hero
– A relation containing license plates and the type of the
respective car
• {areaCode, characterCode, numberCode} → {carType}
alias realName
areaCode characterCode numberCode carType
• However, often not all possible functional dependencies are explicitly modeled
• Example:
– Heroes have some unique id, a realName, and an alias
– Hero teams have a unique id and a unique team leader identified by its id; a hero is the leader of only one team
– F = {{hero_id} → {realName, alias}, {team_id} → {leader_id}, {leader_id} → {hero_id}}
– However, also the following dependencies hold
• {hero_id} → {hero_id}, {team_id} → {realName}, {team_id} → {alias}, etc.
• Those dependencies can be inferred from F
primary key dependency
foreign key dependency
• Definition: The set F+ is called the closure of F and contains all dependencies which can be
inferred from F
– A dependency X → Y is inferred from a set of dependencies
F, iff it holds for any legal extension of R
– If X → Y can be inferred from F, this is denoted as
F ⊨ X →Y
• Given a set of dependencies F, the full closure F+
can be inferred automatically by inference rules
• Reflexive Rule (R1)
– ⊨ {X} → {Y}, if Y ⊆ X
• Augmentation Rule (R2)
– {{X} → {Y}} ⊨ {X,Z} → {Y,Z}
• Transitive Rule (R3)
– {{X} → {Y}, {Y} → {Z}} ⊨ {X} → {Z}
• It was shown that the rules R1-R3 are sound and complete
– W. W. Armstrong: “Dependency Structures of Data Base Relationships”. In: IFIP Congress, 1974
It's that simple!
• Projection Rule (R4)
– {{X} → {Y, Z}} ⊨ {X} → {Y}
• Union Rule (R5)
– {{X} → {Y}, {X} → {Z}} ⊨ {X} → {Y, Z}
• Pseudo-Transitive Rule (R6)
– {{X} → {Y}, {W, Y} → {Z}} ⊨ {W, X} → {Z}
• Rules R4-R6 can be concluded from R1-R3
– R1-R3 are axioms, R4-R6 are not
– They just allow for easier and faster inference
• Usually, database designers specify all dependencies within F, which can be extracted easily from the semantics of the domain
– All other dependencies of the closure F+ can be computed automatically
• The algorithm is based on computing the attribute closure X+ of an attribute set X under F
– X+ is the maximal set of attributes which
depends on X, if all dependencies in F hold
– Use rules R1-R6 for this
– E.g., {X → X’} implies X’ ⊆ X+
• Compute X+ under F
• Perform this algorithm for all X ⊆ {Head(F)}
X+ := X;
repeat
  oldX+ := X+;
  for each dependency Y → Z in F do
    if (X+ ⊇ Y) then X+ := X+ ∪ Z;
until (X+ == oldX+)
• Example:
– F = {{hero_id} → {real_name, alias}, {team_id} → {leader_id, battle_parole}, {leader_id} → {hero_id}, {hero_id, team_id} → {join_date}}
– Attribute Closures
• {hero_id}+ = {hero_id, real_name, alias}
• {team_id}+ = {team_id, leader_id, battle_parole, hero_id, real_name, alias, join_date}
• {hero_id, team_id}+ = {hero_id, team_id, join_date, real_name, alias, leader_id, battle_parole}
• etc…
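The closure algorithm can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the lecture's reference implementation: dependencies are modeled as pairs of frozensets, and the name `attribute_closure` is hypothetical. The set F is the one from the example.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for the attribute set attrs under the dependencies fds."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # If the closure contains the head Y, the body Z can be added
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


F = [
    (frozenset({"hero_id"}), frozenset({"real_name", "alias"})),
    (frozenset({"team_id"}), frozenset({"leader_id", "battle_parole"})),
    (frozenset({"leader_id"}), frozenset({"hero_id"})),
    (frozenset({"hero_id", "team_id"}), frozenset({"join_date"})),
]

print(sorted(attribute_closure({"hero_id"}, F)))
# ['alias', 'hero_id', 'real_name']
print(len(attribute_closure({"hero_id", "team_id"}, F)))
# 7
```

Applied literally to {team_id}, the procedure also reaches join_date: leader_id yields hero_id, so both hero_id and team_id are in the closure and the last dependency in F fires.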
• To compute the dependency closure F+, create
dependencies for all subsets of the attribute closures
– ∀ X ∈ Head(F) ∀ Y ∈ ℘(X+) : create a dependency X → Y
– E.g., {hero_id, team_id}+ = {hero_id, team_id, join_date, real_name, alias, leader_id, battle_parole}
• {hero_id, team_id} → {hero_id}
• {hero_id, team_id} → {team_id}
• …
• {hero_id, team_id} → {hero_id, team_id, join_date}
• …
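To get a feeling for the size of F+, the dependencies generated by one head can be enumerated. The sketch below assumes the same frozenset representation of dependencies as an illustration; the helper names `attribute_closure` and `powerset` are hypothetical. It counts every subset of X+, including trivial dependencies and the empty body.

```python
from itertools import chain, combinations


def attribute_closure(attrs, fds):
    """Compute X+ by applying dependencies until nothing changes."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def powerset(attrs):
    """Yield all subsets of attrs as tuples, from the empty set upwards."""
    items = sorted(attrs)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


F = [
    (frozenset({"hero_id"}), frozenset({"real_name", "alias"})),
    (frozenset({"team_id"}), frozenset({"leader_id", "battle_parole"})),
    (frozenset({"leader_id"}), frozenset({"hero_id"})),
    (frozenset({"hero_id", "team_id"}), frozenset({"join_date"})),
]

X = frozenset({"hero_id", "team_id"})
derived = [(X, frozenset(Y)) for Y in powerset(attribute_closure(X, F))]
print(len(derived))  # 2**7 = 128 dependencies with head {hero_id, team_id}
```

The exponential growth is the reason why, in practice, one usually works with attribute closures directly instead of materializing F+.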
• Definition:
– A set of functional dependencies F covers a set E, iff
E ⊆ F+
• Or: ∀ d ∈ E : F ⊨ d
• Definition:
– Two sets of functional dependencies E and F are
equivalent if E + = F+
• Or: E covers F and F covers E
• Now, we can define the minimal cover
– Informally: F minimally covers E, iff…
• F covers E
• F would no longer cover E if any dependency in F were removed or weakened
– Formally: F minimally covers E, iff…
• F covers E
• Every dependency in F has a single attribute as its body
• No dependency in F with X → A can be replaced by Y → A with Y⊂ X such that the resulting dependency set is still equivalent to F
• No dependency in F can be removed such that the resulting dependency set is still equivalent to F
– Thus, F is in canonical form and without redundancies
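The three formal conditions translate into a three-step procedure: split bodies into single attributes, shrink heads, and drop redundant dependencies. The following is a sketch of this common textbook procedure, not necessarily the one used in the lecture; the function names and the small example set over attributes A, B, C are hypothetical. Minimal covers are in general not unique, so a different processing order may return a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: every dependency gets a single attribute as its body
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: remove attributes from a head if the body is still implied
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    # Step 3: remove dependencies that follow from the remaining ones
    result = list(dict.fromkeys(g))  # drop exact duplicates, keep order
    for fd in list(result):
        rest = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


# Hypothetical input: A -> BC, B -> C, A -> B, AB -> C
E = [
    (frozenset("A"), frozenset("BC")),
    (frozenset("B"), frozenset("C")),
    (frozenset("A"), frozenset("B")),
    (frozenset("AB"), frozenset("C")),
]

for lhs, rhs in minimal_cover(E):
    print(sorted(lhs), "->", sorted(rhs))
# ['A'] -> ['B']
# ['B'] -> ['C']
```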
• Since ancient times, people have dreamed of intelligent machines
– Golden robots of Hephaestus
– Archytas' wooden pigeon (400 BC)
– Leonardo da Vinci's mechanical knight (1495)
– The Turk of Wolfgang von Kempelen (1770)
– …
• In computer science, this gave birth to the field of Artificial Intelligence
Knowledge-based Systems and
Deductive DB
• In the initial phase of A.I. research, people were highly motivated and full of visions
– Large amounts of research money were available,
mainly from the military (DARPA)
• In the mid-seventies, the great visions died…
– A long series of failures took
its toll
– The A.I. winter – funding stopped
• Change of research direction
– Do not imitate the full human brain, but find intelligent
algorithms for solving particular difficult problems
– Today the basic ideas are part of the Semantic Web efforts
• Main critique – Hubert Dreyfus (UC Berkeley, USA)
– Expertise cannot readily be extracted
from human experts
– Much knowledge is not explicit, but
somehow embodied
• The brain is not simply hardware running a
program based on discrete symbolic
calculations
• In the 1980s, A.I. focused on well-defined problem domains, building the first commercially successful systems
– Knowledge-based systems or ‘expert systems’
• Idea: Create a system which can draw conclusions and thus support people in difficult decisions
– Simulate a human expert
– Main idea: extract knowledge of experts and just cheaply copy it to all places you might need it
• Expert Systems were supposed to be
especially useful in
– Medical diagnosis
• Great failure up to now
– Production and machine failure diagnosis
• Works quite well
– Financial services
• Widely used and successful
• Usually this is based on inference rules and specific problem data
– Rule: All frogs are green
– Fact: Hektor is a frog
– Implies new fact: Hektor is green
• Also, uncertainty can be supported
– Rule: Almost all birds can fly except ostriches, chickens
and penguins
– Fact: Tweety is a bird
– Query: Can Tweety fly?
• Only few species are ostriches, chickens or penguins
• Tweety can fly with high probability
• MYCIN
– Developed 1970 at Stanford University, USA
– Medical expert system for treating infections
• Diagnosis of infection types and recommended antibiotics
(antibiotics names usually end with ~mycin)
– Around 600 rules (also supporting uncertainty)
– MYCIN was treated as a success by the project team
• Experiments showed good results, especially with rare infections
– … but was never used in practice
• Too clumsy
• Technological constraints
• NASA Shine
– Spacecraft Health Inference Engine
– Development started in the mid 70s by NASA and JPL (Jet Propulsion Lab) for the Deep Space Network
• Commercially used by ViaSpace
– Multi-purpose inference system
– Detects system failures within complex mission critical machineries
– Designed to run in real-time in embedded and distributed systems
• Knowledge-based Systems and Deductive Databases
– Coming to your lecture hall in Summer Semester 2009
• Featuring
– Fun with logics
– Really clever systems
– Databases which can cure infections, repair spacecraft and drill for oil
– And of course the Semantic Web
• Functional dependencies may be used to further
specify semantic properties of a relational
schema
• We assume that a schema is given by
– Some relations, their attributes and domains
– For each relation, there is a primary key
– For each relation, there is a set
of functional dependencies
13.2 Normal Forms
• Schemas can be classified to adhere to a certain
normal form
• Part of a schema design process is to choose a
desired normal form and convert the schema into
that form
– There are 5 normal forms (1-NF to 5-NF)
• The higher the number, the stricter the properties
– Schemas which do not follow any of the normal forms
may show very anomalous behavior
• The normalization process was first introduced
by E. Codd in 1972
– Tests whether a relational schema satisfies the
conditions of a given normal form
– If not, the schema can be modified such that the
condition is fulfilled
– Normalization increases the quality and stability of
the schema design
• Normal forms remove redundancy
• What problems may arise if you do not normalize?
– Example Scenario: A single table for storing heroes and
super teams
• Attention: This schema is not good!
hero_id team_id hero_name team_name join_year
1 1 Thor The Avengers 1963
2 2 Mister Fantastic Fantastic Four 1961
3 1 Iron Man The Avengers 1963
4 1 Hulk The Avengers 1963
5 1 Captain America The Avengers 1964
6 2 Invisible Girl Fantastic Four 1961
• A normalized (good) schema would look like this:
hero_id team_id join_year
1 1 1963
2 2 1961
3 1 1963
4 1 1963
5 1 1964
6 2 1961
hero_id hero_name
1 Thor
2 Mister Fantastic
3 Iron Man
4 Hulk
5 Captain America
6 Invisible Girl
team_id team_name
1 The Avengers
2 Fantastic Four
• Each entity type/ relationship type has its own relation
• In each relation, there are only dependencies from the key to non-key attributes
• In case of badly designed, non-normalized
schemas, several problems occur due to data
redundancy and superfluous dependencies
– Insertion Anomalies
– Deletion Anomalies
– Modification Anomalies
– Superfluous NULL-values
– Spurious Tuples
• Insertion Anomalies
– Imagine you want to add a new hero without a
team
• In schema A (the single non-normalized table), this is not easily possible as you have to make
up a key value for the team id and fill all team-related
attributes with NULL
• Within schema B (the normalized tables), this is no problem at all
hero_id team_id hero_name team_name join_year
7 -1 Spiderman NULL NULL
hero_id hero_name
7 Spiderman
– You want to add a new hero to the “Fantastic Four”
• In schema A, you need to replicate the team name
attribute to avoid consistency problems
• In schema B, no replication is needed. Just add a new hero
and a new hero-team-assignment
hero_id team_id hero_name team_name join_year
8 2 The Thing Fantastic Four 1961
hero_id hero_name
8 The Thing
hero_id team_id join_year
8 2 1961
• Deletion Anomalies
– Similar to insert anomalies
– What happens to heroes if you delete a team?
What happens if you delete the last hero of a team?
• Remove them too?
• Introduce NULL values and fake primary keys?
• Modification Anomalies
– During modification, consistency has to be ensured
• e.g. if you change a team name, multiple tuples have to be
changed
• Spurious Tuples
– Spurious Tuples are the result of particularly poor
design
– Characteristics: Two tables have intersecting
attributes such that when applying a natural join,
invalid tuples are generated
• i.e. there are matching attributes which are not in a
foreign key - primary key combination
• Usually, a result of carelessly decomposing
larger relations
• Consider the following (inapt) relations
hero_id team_id real_name team_name
1 1 Thor Odinson The Avengers
2 2 Reed Richards Fantastic Four
3 1 Tony Stark The Avengers
4 1 Bruce Banner The Avengers
5 1 Steve Rogers The Avengers
6 2 Susan Storm Fantastic Four
alias team_name
Iron Man The Avengers
Invisible Girl Fantastic Four
Thor The Avengers
Hulk The Avengers
Captain America The Avengers
Mister Fantastic Fantastic Four
HeroToTeamNames
Same attribute but no foreign key
• Performing a natural join results in
hero_id team_id real_name team_name alias
1 1 Thor Odinson The Avengers Iron Man
2 2 Reed Richards Fantastic Four Invisible Girl
3 1 Tony Stark The Avengers Iron Man
4 1 Bruce Banner The Avengers Iron Man
… … … .. ..
Spurious Tuples
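The effect can be reproduced with a few lines of Python. The sketch below models both relations as lists of tuples (the variable names are illustrative) and joins them on the shared attribute team_name, which is what a natural join would do here.

```python
# (hero_id, team_id, real_name, team_name)
heroes = [
    (1, 1, "Thor Odinson", "The Avengers"),
    (2, 2, "Reed Richards", "Fantastic Four"),
    (3, 1, "Tony Stark", "The Avengers"),
    (4, 1, "Bruce Banner", "The Avengers"),
    (5, 1, "Steve Rogers", "The Avengers"),
    (6, 2, "Susan Storm", "Fantastic Four"),
]

# (alias, team_name)
aliases = [
    ("Iron Man", "The Avengers"),
    ("Invisible Girl", "Fantastic Four"),
    ("Thor", "The Avengers"),
    ("Hulk", "The Avengers"),
    ("Captain America", "The Avengers"),
    ("Mister Fantastic", "Fantastic Four"),
]

# Natural join: the only common attribute is team_name
joined = [
    hero + (alias,)
    for hero in heroes
    for alias, team_name in aliases
    if hero[3] == team_name
]

print(len(heroes), len(joined))  # 6 20
```

Every hero is paired with every alias of the same team (4 × 4 + 2 × 2 = 20 tuples), so at most 6 of the 20 result tuples can describe a real hero-alias pair; the rest are spurious.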
• These observations can be summarized in some
informal design guidelines
• Guideline 1:
– Design a schema such that it is easy to explain. If
possible, a relation should only represent one entity
type or relationship type
• Guideline 2:
– Design the schema such that there are no update,
delete, or insert anomalies
• Guideline 3:
– Avoid attributes in base relations which regularly have
NULL values
• Guideline 4:
– Design relations in such a way that a natural join can
be applied without creating spurious tuples
• These informal guidelines can also be formalized
– This is what normalization does!
• Database technology has eased the handling of relational data and provides efficient querying
– Typical queries
• List the names of all bookstores with more than ten thousand titles
• List the names of the customers with the highest sales in the year 2007
• But what about queries with a spatial dimension?
– List all bookstores within ten
miles of Hannover
– List the average amounts for purchases of customers who live in Braunschweig and its adjoining area
GIS
• A Geographical Information System (GIS) is
any information system capable of providing
geographically referenced information
– This includes integrating, editing, analyzing, sharing, and
displaying information
• For storing and querying the information a
specialized spatial database is used
– Highly optimized to store and query data related to
objects in space, including points, lines and polygons
• In 1854, John Snow depicted a cholera outbreak in London
– Points on a map represented the locations of individual cases
– The study of the distribution of cholera led to the source of the disease, a contaminated water pump in the middle of the cholera outbreak
• Geo-Information Systems
– Coming to your lecture hall in Summer Semester
2009
• Featuring
– The art of creating and storing
a map
– Finding exotic stuff in places
you don't know
– Build a GPS for your car…
• To characterize the normal forms, we will need
the following concepts
– Functional dependencies
– Superkeys (attribute sets which identify tuples uniquely, not necessarily minimal),
candidate keys (minimal superkeys), primary keys, secondary keys
– Prime attributes
• Prime attributes are all those attributes which are within
any candidate key
– Nonprime attributes
• All attributes which are not prime attributes
• The earliest proposed normal forms are 1-NF to 3-NF
– 1972 by Codd
– They are hierarchical
• A schema in 3-NF is also in 2-NF, a schema
in 2-NF is also in 1-NF
• This is just by convention, not due to their properties
– 1-NF
• Removes multi-valued attributes
– 2-NF
• Enforces full functional dependency
– 3-NF
• Removes transitive dependencies
13.3 1-NF to 3-NF
• First Normal Form (1-NF)
– Remember lecture 5
– Restricts relations to being “flat”
• Only atomic attributes are allowed
– Multi-valued attributes must be normalized, e.g., by
A) Introducing a new relation for the multi-valued attribute
B) Replicating the tuple for each multi-value
C) If the maximum-number is known, introducing an own
attribute for each multi-value
• The first solution is usually considered the best
13.3 1-NF
• Introducing a new relation for the multi-valued
attribute
– Uses old key and multi-attribute as composite key
hero_id hero_name powers
1 Storm weather control, flight
2 Wolverine extreme cellular regeneration
3 Phoenix omnipotence, indestructibility, limitless energy manipulation
hero_id power
1 weather control
1 flight
2 extreme cellular regeneration
3 omnipotence
3 indestructibility
3 limitless energy manipulation
hero_id hero_name
1 Storm
2 Wolverine
3 Phoenix
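The first solution can be sketched as a small transformation. The code below assumes, purely for illustration, that the multi-valued attribute is stored as a comma-separated string; the variable names are hypothetical.

```python
# Non-1-NF relation: (hero_id, hero_name, powers) with a multi-valued powers attribute
heroes_raw = [
    (1, "Storm", "weather control, flight"),
    (2, "Wolverine", "extreme cellular regeneration"),
    (3, "Phoenix", "omnipotence, indestructibility, limitless energy manipulation"),
]

# Relation 1: the remaining atomic attributes, key is hero_id
heroes = [(hero_id, name) for hero_id, name, _ in heroes_raw]

# Relation 2: one tuple per power, composite key (hero_id, power)
hero_powers = [
    (hero_id, power.strip())
    for hero_id, _, powers in heroes_raw
    for power in powers.split(",")
]

print(heroes)
print(len(hero_powers))  # 6
```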
• Replicating the tuple for each multi-value
– Uses old key and multi-attribute as composite key
hero_id hero_name powers
1 Storm weather control, flight
2 Wolverine extreme cellular regeneration
3 Phoenix omnipotence, indestructibility, limitless energy manipulation
hero_id hero_name powers
1 Storm weather control
1 Storm flight
2 Wolverine extreme cellular regeneration
3 Phoenix omnipotence
3 Phoenix indestructibility
3 Phoenix limitless energy manipulation
• If the maximum-number is known, introducing an
own attribute for each multi-value
hero_id hero_name powers
1 Storm weather control, flight
2 Wolverine extreme cellular regeneration
3 Phoenix omnipotence, indestructibility, limitless energy manipulation
hero_id hero_name power1 power2 power3
1 Storm weather control flight NULL
2 Wolverine extreme cellular regeneration NULL NULL
3 Phoenix omnipotence indestructibility limitless energy manipulation
• The Second Normal Form (2-NF)
– The second normal form is based on the concept of full
functional dependencies
– A dependency X→Y is full functional iff
• ∀ A ∈ X : (X \ {A}) ↛ Y
• i.e. there is no attribute in X such that the dependency still
holds after removing it
• A dependency which is not full functional is called a partial
dependency
13.3 2-NF
• Definition: A relation schema is in 2-NF if it is in 1-NF and every non-prime attribute is fully functionally dependent on the primary key
– Non-prime attributes are those which are not part of any candidate key
• If the relation has a non-composite primary key and is in 1-NF, it is always also in 2-NF
– 2-NF is violated if there is a composite key and some non-prime attribute depends on only a component of the primary key
• Normalization into 2-NF is achieved by breaking the relation into sub-relations
– Sub-relations consist of the components of the
primary key and all their fully functionally dependent attributes
hero_id team_id hero_name team_name join_year
1 1 Thor The Avengers 1963
2 2 Mister Fantastic Fantastic Four 1961
3 1 Iron Man The Avengers 1963
hero_id team_id join_year
1 1 1963
2 2 1961
3 1 1963
hero_id hero_name
1 Thor
2 Mister Fantastic
3 Iron Man
team_id team_name
1 The Avengers
2 Fantastic Four
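Partial dependencies can be detected with the attribute closure: a non-key attribute that already lies in the closure of a proper subset of the primary key violates 2-NF. The sketch below is illustrative only. The dependency set is an assumption read off the example table (it is not stated explicitly on the slide), the function names are hypothetical, and the primary key is assumed to be the only candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(primary_key, attributes, fds):
    """Return (key part, attribute) pairs that violate 2-NF."""
    non_key = set(attributes) - set(primary_key)
    found = set()
    # Try every non-empty proper subset of the primary key
    for size in range(1, len(primary_key)):
        for part in combinations(sorted(primary_key), size):
            for attr in non_key & closure(part, fds):
                found.add((frozenset(part), attr))
    return found


attributes = {"hero_id", "team_id", "hero_name", "team_name", "join_year"}
primary_key = {"hero_id", "team_id"}
fds = [  # assumed semantics of the example table
    (frozenset({"hero_id"}), frozenset({"hero_name"})),
    (frozenset({"team_id"}), frozenset({"team_name"})),
    (frozenset({"hero_id", "team_id"}), frozenset({"join_year"})),
]

for part, attr in sorted(partial_dependencies(primary_key, attributes, fds), key=str):
    print(sorted(part), "->", attr)
```

The two reported violations, hero_name depending on hero_id alone and team_name depending on team_id alone, correspond to the two sub-relations split off in the decomposition; join_year stays with the full composite key.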
• The Third Normal Form (3-NF)
– The third normal form removes all transitive
dependencies
• A dependency X→Y is transitive if there is a set Z such that Z is
neither a candidate key nor a subset of any key and X→Z and Z→Y
– Definition: A relation schema is in 3-NF, if it is in 2-NF and no nonprime attribute is transitively dependent on the primary key
– Alternative Definition: A relation schema R is in 3-NF if, whenever there is a non-trivial functional dependency X→A, then X is a superkey of R or A is a prime attribute of R
13.3 3-NF
• Normalization works by breaking the relation at
the transitive dependency
hero_id hero_name home_city_id home_city home_city_flag_symbol
11 Professor X 563 New York Sea, Ships & Sun
12 Wolverine 782 Alberta Crops & Mountains
13 Cyclops 112 Anchorage Anchor
14 Phoenix 563 New York Sea, Ships & Sun
hero_id hero_name home_city_id
11 Professor X 563
12 Wolverine 782
13 Cyclops 112
14 Phoenix 563
home_city_id home_city home_city_flag_symbol
563 New York Sea, Ships & Sun
782 Alberta Crops & Mountains
112 Anchorage Anchor
• Also, there is a stricter version of the 3-NF
– Boyce-Codd Normal Form (BCNF) (1974)
• which was actually invented by Ian Heath 3 years before
Boyce-Codd…
– All BCNF schemas are also in 3-NF, and most 3-NF
schemas are also in BCNF
• There are some rare exceptions
– Definition:
A relation schema R is in BCNF if, whenever there is
a non-trivial functional dependency X→A,
then X is a superkey of R
13.3 BCNF
– Thus, the definition looks very similar to the
definition of 3-NF
• Difference: BCNF drops the exception for prime attributes, i.e. X must be a superkey even if A is a prime attribute
– A schema in 3-NF may fail to be in BCNF when the following
conditions hold
• All candidate keys in the relation are composite keys
(that is, they are not single attributes)
• There is more than one candidate key in the relation
• The keys are not disjoint, that is, some attributes in the
keys are common
• Example: Given a table with students, a topic, and the respective advisor
– Let's assume the following dependencies hold:
• {student, topic} → {advisor}
• {advisor} → {topic}
• i.e. for each topic, a student has a specific advisor. Each advisor is responsible for a single specific topic.
Student Topic Advisor
100 Math Gauss
100 Physics Einstein
101 Math Leibniz
102 Math Gauss
– Thus, we have the following candidate keys
• (student, topic), (student, advisor)
– The relation is in 3-NF, because there are no
transitive dependencies of non-prime attributes
(every attribute belongs to some candidate key)
• However, {advisor} → {topic} makes the prime attribute topic
transitively dependent on the key (student, advisor), and advisor
is not a superkey: the relation is not in BCNF
• The table still has deletion anomalies
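The claims about this example can be checked mechanically. The following is a minimal sketch, assuming the two dependencies listed above are the only ones; the helper names (`closure`, `candidate_keys`, `violations`) are made up for illustration and are not part of the lecture material.

```python
from itertools import combinations

# Functional dependencies from the example: (left side, right side).
FDS = [
    (frozenset({"student", "topic"}), frozenset({"advisor"})),
    (frozenset({"advisor"}), frozenset({"topic"})),
]
ATTRS = frozenset({"student", "topic", "advisor"})


def closure(attrs, fds):
    """Return all attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            # Supersets of an already found key are not minimal.
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def violations(attrs, fds, keys, form):
    """List the given FDs X -> A that break 3-NF or BCNF."""
    prime = frozenset().union(*keys)
    bad = []
    for lhs, rhs in fds:
        for a in rhs - lhs:  # skip the trivial part of the dependency
            if closure(lhs, fds) == attrs:
                continue  # X is a superkey: allowed in both forms
            if form == "3NF" and a in prime:
                continue  # 3-NF exception for prime attributes
            bad.append((set(lhs), a))
    return bad


keys = candidate_keys(ATTRS, FDS)
print(sorted(sorted(k) for k in keys))
print("3-NF violations:", violations(ATTRS, FDS, keys, "3NF"))
print("BCNF violations:", violations(ATTRS, FDS, keys, "BCNF"))
```

The sketch finds both candidate keys, no 3-NF violation, and {advisor} → topic as the single BCNF violation. Enumerating all attribute subsets is exponential, which is acceptable only for small schemas like this one.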
Student Topic Advisor
100 Math Gauss
100 Physics Einstein
101 Math Leibniz
102 Math Gauss
If you delete the row (101, Math, Leibniz), all information about Leibniz doing math is lost
• Normalized solution: Decompose tables
Student Advisor
100 Gauss
100 Einstein
101 Leibniz
102 Gauss
Advisor Topic
Gauss Math
Einstein Physics
Leibniz Math
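The effect of the decomposition on the deletion anomaly can be tried out directly. This is a small sketch using Python's built-in `sqlite3` module; the table names `supervision`, `student_advisor` and `advisor_topic` are assumptions chosen for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Original single-table design (3-NF, but not BCNF).
    CREATE TABLE supervision (student INT, topic TEXT, advisor TEXT);
    INSERT INTO supervision VALUES
        (100, 'Math', 'Gauss'), (100, 'Physics', 'Einstein'),
        (101, 'Math', 'Leibniz'), (102, 'Math', 'Gauss');

    -- Decomposed design as shown in the two tables above.
    CREATE TABLE student_advisor (student INT, advisor TEXT);
    CREATE TABLE advisor_topic (advisor TEXT PRIMARY KEY, topic TEXT);
    INSERT INTO student_advisor
        SELECT student, advisor FROM supervision;
    INSERT INTO advisor_topic
        SELECT DISTINCT advisor, topic FROM supervision;
""")

# Remove student 101 from both designs.
con.execute("DELETE FROM supervision WHERE student = 101")
con.execute("DELETE FROM student_advisor WHERE student = 101")

# What do we still know about Leibniz's topic?
lost = con.execute(
    "SELECT topic FROM supervision WHERE advisor = 'Leibniz'").fetchall()
kept = con.execute(
    "SELECT topic FROM advisor_topic WHERE advisor = 'Leibniz'").fetchall()
print("single table:", lost)
print("decomposed:  ", kept)
```

In the single table the fact that Leibniz advises Math disappears together with student 101, while the separate advisor/topic table keeps it.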
• Summary 1-NF to BCNF
– 1-NF
• Test: There must be no non-atomic attribute values
• Normalization: Create a new relation for the attribute OR use replication OR introduce new attributes
– 2-NF
• Test: For composite primary keys, there must be no attributes depending on only a part of the key
• Normalization: Decompose the relation and create a new one for each partial key and its dependent attributes
– 3-NF
• Test: There must be no transitive dependency between a non-key attribute and the key
• Normalization: Decompose and set up a relation that includes those non-key attributes which depend on other non-key attributes
– BCNF
• Test: There must also be no transitive dependencies among attributes of different candidate keys
• Normalization: Further decompose the relation
13.3 1-NF to BCNF
• The fourth NF prohibits non-trivial multivalued
dependencies, unless the left side of the multivalued
dependency is a superkey
– There are no two (or more) independent attributes in a 1:n
relationship with the key
13.3 Higher Normal Forms
Student Advisor Fav. Color
100 Gauss green
100 Einstein green
101 Leibniz red
102 Gauss blue
102 Gauss red
Decomposition into 4-NF:
Student Advisor
100 Gauss
100 Einstein
101 Leibniz
102 Gauss
Student Fav. Color
100 green
101 red
102 blue
102 red
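Whether the multivalued dependency student ↠ advisor really holds in the sample rows, and whether the split is lossless, can be verified with a few lines. This is only a sketch over the example data; the function name `mvd_holds` is hypothetical.

```python
from collections import defaultdict

# Rows of the original table: (student, advisor, fav_color).
ROWS = {
    (100, "Gauss", "green"),
    (100, "Einstein", "green"),
    (101, "Leibniz", "red"),
    (102, "Gauss", "blue"),
    (102, "Gauss", "red"),
}


def mvd_holds(rows):
    """Check student ->> advisor: for every student, the rows must be the
    full cross product of that student's advisors and favourite colours."""
    advisors, colors = defaultdict(set), defaultdict(set)
    for student, advisor, color in rows:
        advisors[student].add(advisor)
        colors[student].add(color)
    expected = {
        (s, a, c) for s in advisors for a in advisors[s] for c in colors[s]
    }
    return expected == set(rows)


# The two projections of the 4-NF decomposition.
student_advisor = {(s, a) for s, a, _ in ROWS}
student_color = {(s, c) for s, _, c in ROWS}

# The natural join on student restores exactly the original rows.
rejoined = {
    (s, a, c)
    for s, a in student_advisor
    for s2, c in student_color
    if s == s2
}
print(mvd_holds(ROWS), rejoined == ROWS)
```

Advisors and favourite colours are independent facts about a student, which is why storing them in one table forces every combination to be listed.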
• The fifth NF decomposes relations as far as possible such that the
original relation can still be restored from the projections
using joins
– Every additional split of a relation would lose
information
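Original relation:
Student Advisor Course
100 Gauss Math
100 Einstein Physics
101 Gauss Physics
Projections after one additional split:
Student Advisor
100 Gauss
100 Einstein
101 Gauss
Advisor Course
Gauss Math
Gauss Physics
Einstein Physics
Join of the two projections (two rows were not in the original, so the split loses information):
Student Advisor Course
100 Gauss Math
100 Gauss Physics
100 Einstein Physics
101 Gauss Math
101 Gauss Physics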
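The tables above can be reproduced in a few lines: project the original relation, join the projections again, and compare. A minimal sketch over the sample rows only:

```python
# Original relation: (student, advisor, course).
ORIGINAL = {
    (100, "Gauss", "Math"),
    (100, "Einstein", "Physics"),
    (101, "Gauss", "Physics"),
}

# Split into two projections.
student_advisor = {(s, a) for s, a, _ in ORIGINAL}
advisor_course = {(a, c) for _, a, c in ORIGINAL}

# Natural join of the projections on advisor.
rejoined = {
    (s, a, c)
    for s, a in student_advisor
    for a2, c in advisor_course
    if a == a2
}

# Rows that the join produces but that were never in the original.
spurious = rejoined - ORIGINAL
print(len(rejoined), sorted(spurious))
```

The join returns five rows instead of three. The two extra rows cannot be told apart from real ones, which is exactly what "losing information" means here.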
• Usually, a schema in a higher normal form is
better than one in a lower normal form
– However, sometimes it is a good idea to deliberately
create schemas in a lower normal form, e.g. to increase read
performance
• This is called denormalization
• Denormalization usually increases query speed and
decreases update efficiency due to the introduction of
redundancy
13.3 Denormalization
• Often, denormalization is facilitated with materialized views
– See lecture 9
– Example: Students and their average exam results are regularly needed – create a materialized view!
• Join and aggregation are expensive operations that now can be omitted
• But for every update, the materialized view may also require an update
matNr firstName lastName sex
1005 Clark Kent m
2832 Louise Lane f
matNr crsNr result
1005 100 3.7
2832 102 2.0
1005 101 4.0
2832 100 1.3
student avg result
Louise Lane 1.65
Clark Kent 3.85
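The trade-off can be sketched with the example data. SQLite has no materialized views, so the sketch below emulates one with an ordinary table that is rebuilt on demand; the table name `student_avg` and both helper functions are assumptions made for illustration. A DBMS with real materialized views would take care of the refresh itself, depending on its configuration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE students (matNr INT PRIMARY KEY, firstName TEXT,
                           lastName TEXT, sex TEXT);
    CREATE TABLE results (matNr INT, crsNr INT, result REAL);
    INSERT INTO students VALUES (1005, 'Clark', 'Kent', 'm'),
                                (2832, 'Louise', 'Lane', 'f');
    INSERT INTO results VALUES (1005, 100, 3.7), (2832, 102, 2.0),
                               (1005, 101, 4.0), (2832, 100, 1.3);
""")


def refresh_student_avg(con):
    """Rebuild the stored join + aggregation (the emulated view)."""
    con.executescript("""
        DROP TABLE IF EXISTS student_avg;
        CREATE TABLE student_avg AS
            SELECT s.firstName || ' ' || s.lastName AS student,
                   AVG(r.result) AS avg_result
            FROM students s JOIN results r ON r.matNr = s.matNr
            GROUP BY s.matNr, s.firstName, s.lastName;
    """)


def read_student_avg(con):
    """Cheap read: no join or aggregation at query time."""
    return dict(con.execute("SELECT student, avg_result FROM student_avg"))


refresh_student_avg(con)
fresh = read_student_avg(con)

# A new exam result makes the stored copy stale until it is refreshed.
con.execute("INSERT INTO results VALUES (1005, 102, 1.0)")
stale = read_student_avg(con)
refresh_student_avg(con)
updated = read_student_avg(con)
print(fresh, stale, updated)
```

Reads become a plain table scan, but every change to `results` has to be followed by a refresh, which is the update cost mentioned above.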
• For business-oriented data, relational systems are a good foundation, but…
– There is a huge flood of updates in production databases
– For data analysis purposes, data often has to be transformed and aggregated
– Reports have to be generated quickly to support important decisions
• Should such stress be put on top of the operational database systems' workloads?
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 75
Data Warehousing
• Basic idea: Don't put stress on your crucial DBMSs, but use a second independent system
– Data Warehouses
• provide a unified view of business data and
• provide retrieval of data without slowing down the operational systems
• facilitate decision support system applications such as trend reports or market analysis
• A data warehouse provides a common data model for all data of interest regardless of the data's source
– Data is usually scattered over several systems in companies: sales invoices, order receipts, production data, etc.
– For reporting and analysis the data would have to be retrieved from each respective source, transformed into a common model and then aggregated
• Before loading into the warehouse all data can be cleaned
– Inconsistencies can be identified and resolved
• Global enterprise data models are optimized for the efficient
retrieval needed for timely decision support
– The simplest schema is the star model
– The model consists of one (or a few) central fact tables that are connected to
multiple dimensions
– All dimensions are denormalized, with each dimension being represented
by a single table
• If dimensions are normalized into several related tables with minimized redundancy, the
snowflake model results
• Queries on data warehouses follow the paradigm
of online analytical processing (OLAP)
– Exploiting the multidimensional data model of
the warehouse allows for complex analytical and
ad-hoc queries with a rapid execution time
– The heart of any OLAP system is
an OLAP cube consisting of
numeric facts called measures
that are categorized by dimensions
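A tiny star schema shows how measures and dimensions fit together. Everything in this sketch (the tables `fact_sales`, `dim_product`, `dim_store` and all rows) is invented for illustration and uses SQLite through Python; a real warehouse would be far larger and would use a dedicated OLAP engine.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Denormalized dimensions: one table per dimension.
    CREATE TABLE dim_product (product_id INT PRIMARY KEY,
                              name TEXT, category TEXT);
    CREATE TABLE dim_store (store_id INT PRIMARY KEY,
                            city TEXT, country TEXT);
    -- Central fact table: foreign keys plus the numeric measure.
    CREATE TABLE fact_sales (product_id INT REFERENCES dim_product,
                             store_id INT REFERENCES dim_store,
                             revenue REAL);

    INSERT INTO dim_product VALUES (1, 'Cape', 'clothing'),
                                   (2, 'Mask', 'clothing'),
                                   (3, 'Comic', 'books');
    INSERT INTO dim_store VALUES (10, 'Braunschweig', 'Germany'),
                                 (20, 'New York', 'USA');
    INSERT INTO fact_sales VALUES (1, 10, 50.0), (2, 10, 20.0),
                                  (3, 10, 5.0), (1, 20, 60.0),
                                  (3, 20, 7.5), (3, 20, 2.5);
""")

# One cell of the cube per (category, country): the measure is summed.
cube = con.execute("""
    SELECT p.category, s.country, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p USING (product_id)
    JOIN dim_store s USING (store_id)
    GROUP BY p.category, s.country
    ORDER BY p.category, s.country
""").fetchall()
for row in cube:
    print(row)
```

Grouping by other dimension attributes (for example `city` instead of `country`) corresponds to drilling down in the cube without touching the fact table's structure.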
• Data Warehousing
– Coming to your lecture hall in Summer Semester
2009
• Featuring
– New and exciting ways to store
your data!
– State enormous queries!
– Mine your Data in seconds!
– Learn stuff which is important
for industry!
• A) What is the difference between a UDF, a trigger,
and a procedure? (3 P)
– Triggers are called automatically on an event
– UDFs may be used within an SQL statement
– Stored procedures represent full SQL statements or
external programs
Exercise 1 - A
• B) Could a trigger be replaced by a constraint or
vice versa? (4 P)
– A constraint can be replaced by a trigger, but
not all triggers can be replaced by a constraint
– Constraints can only reject actions if a condition is
not met, while triggers can perform almost
any kind of action when an event occurs
Exercise 1 - B
• C) What types of constraints are there? (4 P)
– Static integrity constraints
• Bound to a correct DB state (e.g., data types, key
constraints, value domains)
– Dynamic integrity constraints
• Transitional integrity constraints are bound to a
change in the DB state (e.g., update, insert, delete)
• Temporal integrity constraints are bound to a
sequence of DB states (e.g., transactions, periodical checks)
Exercise 1 - C
• D) Briefly explain Event-Condition-Action. (3 P)
– In case of a specific event (e.g. DELETE on table
heroes), you check whether a specific condition is
met (e.g. number of entries < 5). If the condition is met, you
perform an action (e.g. reject the statement and return an
error)
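The three parts of Event-Condition-Action map directly onto a trigger definition. The sketch below uses SQLite trigger syntax through Python's `sqlite3` module, which differs from DB2 syntax; the trigger name `keep_min_heroes`, the helper `try_delete` and the sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE heroes (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO heroes (name) VALUES
        ('Professor X'), ('Wolverine'), ('Cyclops'), ('Phoenix'), ('Storm');

    CREATE TRIGGER keep_min_heroes
    BEFORE DELETE ON heroes                       -- event
    WHEN (SELECT COUNT(*) FROM heroes) < 5        -- condition
    BEGIN
        SELECT RAISE(ABORT, 'too few heroes');    -- action
    END;
""")


def try_delete(name):
    """Attempt a delete and report whether the trigger rejected it."""
    try:
        con.execute("DELETE FROM heroes WHERE name = ?", (name,))
        return "deleted"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"


first = try_delete("Storm")     # 5 entries: condition not met
second = try_delete("Phoenix")  # 4 entries: condition met, action fires
remaining = con.execute("SELECT COUNT(*) FROM heroes").fetchone()[0]
print(first, "|", second, "|", remaining)
```

The first delete goes through because the table still holds five rows when the condition is evaluated; the second one is aborted and the row stays in place.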
Exercise 1 - D
• E) What is the basic idea of SQL injection? Very briefly name the techniques that prevent such attacks. (5 P)
– The basic idea is to smuggle SQL statements into the DBMS disguised as “user input”, so that those statements are executed.
– To prevent those attacks, you should:
• Sanitize the input, i.e. restrict the input to values you expect
• Quote and escape the input
• Use strong types (cast every input to its intended type)
• Use prepared statements (user input is passed as a parameter and is never parsed as SQL)
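The difference between string concatenation and a prepared statement is easy to demonstrate. A minimal sketch with Python's `sqlite3` module; the table contents and the malicious input string are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE students (matNr INT, lastName TEXT);
    INSERT INTO students VALUES (1005, 'Kent'), (2832, 'Lane');
""")

# "User input" that closes the string literal and adds its own condition.
user_input = "x' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text itself.
unsafe_sql = ("SELECT matNr FROM students WHERE lastName = '"
              + user_input + "'")
unsafe_rows = con.execute(unsafe_sql).fetchall()

# Safe: the statement is fixed, the input is bound as a plain value.
safe_rows = con.execute(
    "SELECT matNr FROM students WHERE lastName = ?", (user_input,)
).fetchall()

print("concatenated:", unsafe_rows)
print("parameterized:", safe_rows)
```

The concatenated query returns every student because the injected `OR '1'='1'` is always true, whereas the parameterized query simply looks for a student with that odd last name and finds none.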
Exercise 1 - E
• Provide a constraint such that it is not possible
that a given student participates in an exam of
one lecture more often than three times.
Exercise 2 - A
• In standard SQL, the solution would look like this:
ALTER TABLE Results ADD CONSTRAINT noMoreThan3 CHECK (
  (SELECT max(counts) FROM
    (SELECT count(*) AS counts FROM Results GROUP BY matNr, crsNr) AS attempts
  ) < 4 )
– Unfortunately, this query is too complex for many DBMS
(like DB2, which simply forbids subqueries in check clauses)
– In that case, a trigger must be used to solve the problem…
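One possible shape of such a trigger is sketched below. It uses SQLite syntax through Python's `sqlite3` module, so it is not the DB2 solution; it assumes that every attempt is stored as its own row in `Results`, and the helper `add_result` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Results (matNr INT, crsNr INT, result REAL);

    -- Reject the insert if three attempts for this student and
    -- course are already stored.
    CREATE TRIGGER noMoreThan3
    BEFORE INSERT ON Results
    WHEN (SELECT COUNT(*) FROM Results
          WHERE matNr = NEW.matNr AND crsNr = NEW.crsNr) >= 3
    BEGIN
        SELECT RAISE(ABORT, 'more than three attempts');
    END;
""")


def add_result(mat_nr, crs_nr, result):
    """Insert one exam result; return False if the trigger rejects it."""
    try:
        con.execute("INSERT INTO Results VALUES (?, ?, ?)",
                    (mat_nr, crs_nr, result))
        return True
    except sqlite3.DatabaseError:
        return False


# Three failed attempts are accepted, the fourth one is rejected.
outcomes = [add_result(1005, 100, grade) for grade in (5.0, 5.0, 5.0, 3.7)]
other_course = add_result(1005, 101, 2.0)
print(outcomes, other_course)
```

This only guards inserts. An UPDATE that changes `matNr` or `crsNr` could still create a fourth attempt, so a complete solution would need a second trigger for updates.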
• Write a table and some triggers which maintain an
audit trail on Results.
– i.e. log all changes to the data (updates, inserts, deletes)
Exercise 2 - B
-- type: 0 = update, 1 = insert, 2 = delete
CREATE TABLE ResultAudit (
  date timestamp, type int,
  oldMat int, oldCrs int, oldRes double,
  newMat int, newCrs int, newRes double)
• Updates
CREATE TRIGGER updateResults
AFTER UPDATE ON Results
REFERENCING NEW AS new OLD AS old
FOR EACH ROW
INSERT INTO ResultAudit VALUES
(current timestamp, 0, old.matNr, old.crsNr, old.result, new.matNr, new.crsNr, new.result)
• Inserts
CREATE TRIGGER insertResults
AFTER INSERT ON Results
REFERENCING NEW AS new
FOR EACH ROW
INSERT INTO ResultAudit VALUES
(current timestamp, 1, null, null, null, new.matNr, new.crsNr, new.result)
• Deletes
CREATE TRIGGER deleteResults
AFTER DELETE ON Results
REFERENCING OLD AS old
FOR EACH ROW
INSERT INTO ResultAudit VALUES
(current timestamp, 2, old.matNr, old.crsNr, old.result, null, null, null)
• Write a GRANT statement providing full access
rights, including the right to grant access to others, to a
user called 'Magneto' for the table 'students'.
Exercise 2 - C
GRANT ALL ON TABLE students TO USER Magneto WITH GRANT OPTION