Top Banner
Review of INFO 605 • In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail
58

Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Dec 14, 2015

Download

Documents

Jakayla Tony
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Review of INFO 605

• In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail

Page 2: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Review of INFO 605

• Database design using ER model

• Relational model theory

• Database manipulation using SQL in Oracle.

• Database application design and development

Page 3: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Why ER Model?

• ER modeling is relatively easy to learn and use• ERD shows a concise representation of the real-

world in terms of entities and relationships• We know how to translate ERDs into relational

schema• Most CASE tools support one or more variation of

ERDs.

Page 4: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

ER Model (Peter Chen 1976)

• Representation

• Constraints

• Operations

Page 5: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Representation• Entity and Relationship.• The three main concepts of ER modeling at a lower

level: Entity, Relationships, and Attributes. - Types of Entity: .. (Regular) entity: Has its own identifier .. Weak entity: Its identifier is the concatenation

of the identifier of owner entity and its partial key. - Types of Relationships: . By degree: ... Recursive relationship ... Binary relationships ... Ternary relationships

Page 6: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Constraints- Constraints of ER Models:

(1) Cardinality constraints: 1:1. 1:N, and M:N

(2) Participation constraints: TOTAL

(Mandatory) and PARTIAL (Optional)

- Entity constraints

-- The identifier of an entity cannot be null

-- Weak entity constraints

-- The concatenated identifier of the weak entity

-- Existence dependency

Page 7: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Operations

- What kinds of operations does the ER model inherently support?

- There have been many research proposals that automatically navigate the ERD to process queries.

- However, since we use the ERD as a high level design tool and translate the ERD into RDB, they are not important to our discussion.

Page 8: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Relational Theory (E.F. Codd in 1970)

(1) Representation - A single main concept in RDB is relation, that can be viewed

as a table having columns (attributes) and rows (tuples). - A high level logical structure of a database is called a

(database) schema. - A single table structure is called a scheme.(2) Constraints - The three basic constraints of RDB. (1) Primary Key constraints- Uniqueness and Minimality (2) Entity constraints: - The PK of a relation cannot be null. (3) Referential integrity constraints - The value of a Foreign

Key, if not null, must exist in the original relation.

•   

Page 9: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont’- Note that these basic integrity constraints can be automatically

enforced at DDL level by commercial database systems.- Some application-dependent constraints may or may not be

enforced at DDL level.(3) Operations - The operations of RDB are formalized by relational algebra,

and implemented in SQL. - The three basic operations that support the query processing

in RDB are: SELECT, PROJECT, and JOIN. - Other set operations used for RDB query processing are:

UNION, INTERSECT, and MINUS. - Other variations include (LEFT, RIGHT, FULL) OUTER

JOIN and RECURSIVE JOIN OPERATION.

Page 10: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Characteristics of Next Generation Databases

In this part, we will briefly look at recent trends in database technology. Database systems which will come in next decade is referred to as Next Generation database systems.

Page 11: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Characteristics of Next Generation Databases

• Rich data model-- which means the new data models will have more data modeling components than ER or relational data model - Object-oriented

- Multimedia data - Choices for structures• Highly distributed - Heterogeneous environment, WWW - Self-installing, self-managed, highly

robust and automatic coordination

Page 12: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont.• Large storage and memory - Will have more RAM - Many commercial DB systems will have tera/pera bytes or more• Component DBMS, DB applications may be built by

buying components as we buy HW components - Support portable DBMSs - Need to have public interfaces•  High-level environment - High level query languages and supporting tools•  Intelligent processing

Page 13: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Technologies for Next Generation Databases

• Traditional Database Technology: Extended RDBMS (Relational Technology, Semantic Data models, 4GL, CASE)

• Object-oriented Technology: OODBMS, ORDBMS, OOA&D, OOP (Rich data model, Natural representation, SW development, Integration, Productivity, Reusability, Component-based)

•  Knowledge Based Techniques: Expert DBMS (Inference, AI Technology, Tools for User Interface, Data Mining, and Knowledge Acquisition, etc.)

•  Hypermedia: Multimedia DBMS (User Interface, Multimedia data, GIS, Imaging DB, VOD (Video on demand))

•  Online Information Retrieval: (Text database, Information Retrieval, Intelligent Retrieval)

Page 14: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont’• Internet, Networking & Distributed systems (WWW interface,

internet/intranet, Heterogeneous, Resource-sharing, Robust and automatic coordination, Legacy systems, Client/Server)

•  Mass Storage (Optical disks, Scanning, Electronic publishing, Digital library, DIS)

•  Other trends - Standards (SQL3, OMG, ODMG, CORBA, DCOM...) - High-level environment (HLQL, supporting tools) - Component databases (public interface, interoperable, portable) - Larger memory - Parallelism - New applications (Data Warehousing, Electronic Commerce, Health-care systems, EOSDIS, ... )

 

Page 15: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Normalization- Basic concepts of normalization- Functional Dependency (FD) -- Definitions and Semantics -- FD as integrity constraints -- Armstrong's axioms -- Minimal cover- Lossless join and spurious tuples- Normal forms (1, 2, 3, 4, 5) -- Multi-valued dependencies & 4NF -- Join dependencies & 5NF- Practical ways to use normalization- Denormaliztion techniques

Page 16: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

What is the normalization?

A process to design a highly desirable relational schemas using relational theory

Page 17: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Why Normalization?

- Normal forms are guidelines for relational database design

-- Minimize redundancy

-- Avoid potential inconsistency

- Can predict the behavior (problems) of database systems

- Avoid update anomalies discussed below

Page 18: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

What if we don't normalize our DB schema?

• Your DB will have the following update anomalies.

• Insertion problem

• Deletion problem

• Update problem

Page 19: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

19

Two DB Schema

Page 20: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Hierarchy of normal forms

• The normal forms from less strict to more strict: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF

• We can directly decompose into BCNF or 3NF without going through 2NF.

• Note that BCNF (Boyce-Codd normal form) is a variation of 3NF. In most cases, 3NF and BCNF are the same and we will not discuss it in this course.

Page 21: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

FD (Functional Dependency)

• FD is a way of representing relationships among attributes in a relation.

•   Notation: X --> Y, where both X and Y can be a group of

attributes, X: LHS, Y: RHS

We say that 1. X uniquely determines Y, 2. For a given value of X, there is at most one value

of Y associated with X at a time.

Page 22: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

Suppose we have R(A, B, C, D, E, F) and data instances as follows:

A B C D E F

a1 b1 c1 d1 e1 f1

a1 b2 c1 d1 e2 f2

a2 b2 c2 d1 e2 f2

a3 b3 c1 d2 e3 f1

Page 23: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Which one is valid?

Based on relation above, which of the following FDs are valid?

(a) A -> C (b) C -> A

(c) B -> E (d) C -> D

(e) B -> F (f) BD -> E

(g) CD -> E (h) F -> B

Page 24: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

SELECTIVE ANSWERS:(a) A --> C: This is True since for each a1 value, we have the

same c1. Note that it doesn't matter that both a1 and a3 ends up with the same c1. This is similar to the fact that two different employees may have the same age.

(b) C --> A: This is false since c1 maps to both a1 and a3. The examples of (a) and (b) show that FDs are not symmetric. That is, the fact that A-->C is true doesn't mean C--> A is true.

(c) B --> E is True(d) C --> D is False.(f) BD--> E is True. Since we have two attributes in LHS, we

have to consider the pair of value together as a single of the LHS.

(g) CD -> E is False.

Page 25: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

FD

The FDs in a given relation are determined by semantics of the relation, not by data instances.

Page 26: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

TEACH (Teacher, Course, Text)  

Teacher Course Text

Smith DS Bartram

Smith DBMS Al-nour

Hall Compilers Hoffman

Brown DS Augen

Page 27: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example- TEACH looks to satisfy TEXT --> COURSE since each text

ends up with different course.

- However, it don't semantically make sense to determine the course by the text book since two different courses could use the same book. SO, TEXT --> COURSE is False.

- However, instances can be used to disprove a FD

TEACHER -\-> COURSE

since two teacher (Smith) teaches two different courses.

- The correct FD of this relation is TEACHER+COURSE --> TEXT.

- What else can be disproved from the above data instances?

Page 28: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

FD

FD as an integrity constraint

Example WORK (EMP#, DEPT, LOC)

FDs of WORK are:

EMP# --> DEPT

DEPT --> LOC

Page 29: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

Suppose WORK table has the following three instances:

EMP# DEPT LOC

E1 D1 Market

E2 D2 Walnut

E3 D1 Market

Page 30: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

Which of the following are valid or invalid? and why? (Hints: check whether or not your insertion or update would violate any existing FD!)

INSERT UPDATE(1) E1 D1 Walnut(2) E1 D2 Walnut(3) E5 D3 Market

Page 31: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

SOLUTION for INSERTION(1) The first insertion is invalid since doing so

would violate the FD DEPT--> LOC.

(2) The second insertion is also invalid since doing so would violate the FD EMP#--> DEPT

(3) The third insertion is allowed since doing so does not violate any FD.

Page 32: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

SOLUTION FOR UPDATE

(1) This means changing Market to Walnut. This update would violate FD DEPT --> LOC.

(2) This means changing E2 of the 2nd tuple to E1. This update would violate FD EMP# --> DEPT

(3) The third update is INSERTION and valid

Page 33: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

BORROW (Loan#, Bname, Cname, Amount) and FDs:

Loan# --> Amount Loan# --> BnameWhat is the semantic difference between the

following two FDs? (1) Loan# --> Cname (2) Loan# -\-> Cname

Page 34: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

(1) means there is only one customer for each loan, which means a loan cannot be checked out by the husband and wife together, for example.

(2) means for each loan, they may be more than one customers

Page 35: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

How to find FDs?

- List only most direct FDs, not indirect FDs (e.g., SSN --> DLOC is an indirect FD)

- List only non-trivial FDs (e.g., SSN --> SSN is a trivial FD)

- Do not include redundant attributes in an FD in either LHS or RHS (e.g., SSN, ENAME --> ENAME, BDATE, ADDRESS has a redundant attribute in LHS (ENAME))

Page 36: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example from Book:EMP_DEPT (ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN, DLOC)

The valid FDs in this relation are:

(1) SSN --> ENAME, BDATE, ADDRESS, DNUMBER

(2) DNUMBER --> DNAME, DMGRSSN, DLOC

Page 37: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Transitive dependency (TD)

If A --> B and B --> C, then A --> C is called a TD.

Page 38: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Find a TD in the above EMP_DEPT

One TD is: SSN --> DNAME since SSN --> DNUMBER and DNUMBER --> DNAME.

Two other TDs are SSN --> DMGRSSN and SSN --> DLOC

Page 39: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Armstrong's axioms

• Theorem: Armstrong's axioms are sound and complete.

• We will not discuss the details of Armstrong's 6 axioms. The 6 axioms are used in manipulating FDs to remove any redundant FDs, redundant attributes in each FD, and finding candidate keys. Elmasri's book discuses them in detail in theoretical manners. Just note the meanings and the importance of the theorem summarized below.

•  

Page 40: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont.• By SOUND, we mean that any result derived by

applying the Armstrong's axiom is always correct.• By COMPLETE, we mean that Armstrong's axiom

can derive all the FDs that are necessary for computation of normalization.

For example, - We can find all candidate keys by using

Armstrong's axiom. - We can compute the minimal cover of relations

using Armstrong's axiom

Page 41: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Minimal cover F of the relation R

- A set of FDs F such that:

(a) There is no redundant attributes in any FDs

(b) There is no redundant FD in F

- The importance of the minimal cover is that we need to compute the minimal cover before we apply any normalization algorithms.

- Thus, our normalized schema will not have any redundant attributes and will not have any relations that come from redundant FDs.

Page 42: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Candidate key (CK) and FDs

The CK can determine all other attributes of the R(A, B, C, D, E, F).

Suppose we have two CKs, CK = {A, BD}

Then, A --> B, C, D, E. F

BD --> A, C, E, F

Page 43: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Algorithm for Finding a Key Once we find a minimal cover, we can find a key

using the following algorithm.• (1) Find attributes not appearing in the RHS of

any FDs. Then, these are part of any candidate keys.

• (2) Check whether they can determine all other attributes by using FDs.

• (3) If not, what other attributes do I need to add to determine all other attributes?

Page 44: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Examples

STORE (SNAME, ADDR, ZIP, ITEM, PRICE)

FDs: SNAME --> ADDR

ADDR --> ZIP

SNAME, ITEM --> PRICE

Page 45: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Finding a key:(1) SNAME does not appear in RHS, so SNAME must

be a part of the key.(2) since SNAME --> ADDR --> ZIP, we know

SNAME --> ADDR, ZIP(3) But SNAME alone cannot determine any more.

How can we determine ITEM and PRICE ? If we have ITEM, then we can determine PRICE So,

SNAME, ITEM --> SNAME, ADDR, ZIP, ITEM, PRICE

so it satisfies the definition of the key. 

Page 46: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Examples of Finding a Key for relation R (A, B, C, D)FDs Key

(a) A--> C B --> D C --> D (b) A -->B B --> C A --> D D --> A(c) A --> D D --> A C --> B

Page 47: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

ANSWER

• (a) {AB}

• (b) {A, D} Note that A and D are in 1:1 relationship since A --> D and D --> A.

• (c) {CA, CD} Note that A-->D and D --> A.

Page 48: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Lossless-Decomposition and Spurious Tuples

- Decomposition means dividing a table into multiple tables.

- Decomposition is lossless if it is possible to reconstruct R from decomposed relations using JOINs.

• Condition for Lossless Join when R was decomposed into R1, R2, ...., Rn

• R = R1¥ R2 ¥ R3 ¥ .... ¥ Rn, where ¥ means JOIN operation.

•  

Page 49: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont.

• Why need it ?To maintain the accurate database 

• What if not ?Cause wrong answers for queries

•  How to check ?It is sufficient if any Ri contains a candidate key of R

• when we used the normalization algorithms for 3NF/BCNF

Page 50: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont.

This means that if any of the decomposed relation contains a CK (or PK) of the original relation, then the decomposition is called lossless. This means by joining all the decomposed relations, we can reconstruct the original relation

Page 51: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example

LOAN_ACC (L#, AMT, ACC#, BAL) L# --> AMT ACC# --> BAL Key ? L# + ACC#Possible decomposition: R1(L#, AMT) R2 (ACC#, BAL)The decomposition is not loss-less, since R1 or R2

does not have a candidate key. (Note that we cannot correlate L# and ACC#)

Page 52: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example)WORK (EMP#, DEPT, LOC)

EMP# --> DEPTDEPT --> LOC

Key ?EMP#, since EMP# --> DEPT, LOC

DecompositionR1 (EMP#, DEPT) R2(DEPT, LOC)

The decomposition is lossless, since R1 contains a candidate key.

Page 53: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Spurious Tuples

• Spurious Tuples are those that appear in the result of lossy decomposition, but that do not exist in the original relation R.

•  

Page 54: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Example)

A B C

a1 b1 c1

a2 b2 c2

a3 b1 c1

a3 b2 c2

Page 55: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont

• Lossy decomposition Loss-less decomposition

R1 R2 R3 R4

A B A C A B B C

a1 b1 a1 c1 a1 b1 b1 c1

a2 b2 a2 c2 a2 b2 b2 c2

a3 b1 a3 c1 a3 b1

a3 b2 a3 c2 a3 b2

Page 56: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Perform the join between R1(A,B) ¥ R2(A,C):

A B C

a1 b1 c1

a2 b2 c2

a3 b1 c1

a3 b1 c2*

a3 b2 c1*

a3 b2 c2

Page 57: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Cont

The two tuples with * are spurious tuples that do not exist in the original relation R.

- Perform the join between R3(A,B)¥ R4(B,C): The result should be the same as the original R.

Page 58: Review of INFO 605 In the lecture we will summarize the main concepts that were covered in DB1. Refer to the lecture notes of DB1 for more detail.

Questions: Why does R1 JOIN R2 cause lossy decomposition and result in spurious tuples?

• Because the decomposition of R into R1 and R2 didn't follow the FDs.

• The FDs in R are: A --> B B --> C

• The decomposition that follows the FDs are lossless as shown in R3(A,B) and R4(B,C).

•  This means:

- When we normalize we decompose based on FDs, not randomly.

- After decomposition, one of decomposed relation Ri must contain a CK to be lossless.