Relational Database Design
Bill Woolfolk
Public Health Sciences
University of Virginia
woolfolk@virginia.edu
Dec 22, 2015
Objectives
Understand definition of modern relational database
Understand and be able to apply a practical method for designing databases
Recognize and avoid common pitfalls of database design
What’s a database?
A collection of logically related information stored in a consistent fashion
◦ Phone book
◦ Bank records (checking statements, etc.)
◦ Library card catalog
◦ Soccer team roster
The storage format typically appears to users as some kind of tabular list (table, spreadsheet)
What Does a Database Do?
Stores information in a highly organized manner
Manipulates information in various ways, some of which are not available in other applications or are easier to accomplish with a database
Models some real-world process or activity through electronic means
◦ Often called modeling a business process
◦ Often replicates the process only in appearance or end result
Databases and the Systems Which Manage Them
Modern electronic databases are created and managed by means of an RDBMS: Relational DataBase Management System
An individual data storage structure created with an RDBMS is typically called a “database”
A database and its attendant views, reports, and procedures are called an “application”
Database Applications
Database (the actual DB with its attendant storage structure)
SQL Engine: interprets between the database and the interface/application
Interface or application: the part the user gets to see and use
Relational Database Management Systems
Low-end, proprietary, specific purpose
◦ Email: Outlook, Eudora, Mulberry
◦ Bibliographic: Reference Manager, EndNote, ProCite
Mid-level
◦ Microsoft Access, Lotus Approach, Borland’s Paradox
◦ More or less total control of design allows custom builds
High-end
◦ Oracle, Microsoft SQL Server, Sybase, IBM DB2
◦ Professional-level DBs: banks, e-commerce, secure
◦ Amazon.com, Ebay.com, Yahoo.com
Problems with Bad Design
Early computers were slow and had limited storage capacity
Redundant or repeating data slowed operations and took up too much precious storage space
Poor design increased the chance of data errors and of lost or orphaned information
Benefits of Good Design
Computers today are faster and possess much larger storage devices
Rigid structure of modern relational databases helped codify problems and solutions
Design problems are still possible, because the DBMS software won’t protect you from poor practices
Good design still increases efficiency of data processes, reduces waste of storage, and helps eliminate data entry errors
Codd’s Rules
Edgar F. Codd
◦ Mathematician and researcher at IBM
◦ Devised the relational data model in 1970
◦ Published 12 rules in 1985 defining the ideal relational database, added 6 more in 1990
Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks." CACM 13(6): 377-387. (http://www.acm.org/classics/nov95/toc.html)
Codd, E. F. (1985). "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" ComputerWorld, October 14 and October 21.
Modification Anomalies
A search for “General Tool Co.” would miss “General Tool” and “General Toll”. A case-sensitive search for “Totally Toys” would miss “TOTALLY TOYS”.

Customers_Orders_Inventory
Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17” monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer
Insertion Anomalies
How would you enter a new item into your inventory if no one had ordered it yet?

Customers_Orders_Inventory
Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17” monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer
Deletion Anomalies
If you wanted to stop selling “dot matrix printer” and remove it from your inventory, you would have to delete the order and customer info for “XYZ Inc.”

Customers_Orders_Inventory
Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17” monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer
The Fix

Order_Items
OrderNum  ItemNum
06755     2246
07456     2246
08134     3145
08622     3145
08622     3967
09010     0446

Orders
CustomerNum  OrderNum
7822         09010
8755         06755
8755         08134
9123         07456
9123         08622

Customers
CustomerNum  Customer
7822         XYZ Inc.
8755         Totally Toys
9123         General Tool Co.

Products
ItemNum  Item
0446     Dot Matrix Printer
2246     Pentium Computer
3145     Hewlett-Packard printer
3967     17” monitor
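
The four-table design can be tried out with Python's built-in sqlite3 module. This is only a sketch, not part of the original slides: the column types, the in-memory database, and the extra item 5001 "Inkjet Printer" are assumptions made for illustration, while the table and field names follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Customers (CustomerNum TEXT PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Products  (ItemNum TEXT PRIMARY KEY, Item TEXT NOT NULL);
CREATE TABLE Orders (
    OrderNum    TEXT PRIMARY KEY,
    CustomerNum TEXT NOT NULL REFERENCES Customers(CustomerNum)
);
CREATE TABLE Order_Items (
    OrderNum TEXT NOT NULL REFERENCES Orders(OrderNum),
    ItemNum  TEXT NOT NULL REFERENCES Products(ItemNum),
    PRIMARY KEY (OrderNum, ItemNum)
);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    ("7822", "XYZ Inc."), ("8755", "Totally Toys"), ("9123", "General Tool Co.")])
conn.executemany("INSERT INTO Products VALUES (?, ?)", [
    ("0446", "Dot Matrix Printer"), ("2246", "Pentium Computer"),
    ("3145", "Hewlett-Packard printer"), ("3967", "17in monitor")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [
    ("09010", "7822"), ("06755", "8755"), ("08134", "8755"),
    ("07456", "9123"), ("08622", "9123")])
conn.executemany("INSERT INTO Order_Items VALUES (?, ?)", [
    ("06755", "2246"), ("07456", "2246"), ("08134", "3145"),
    ("08622", "3145"), ("08622", "3967"), ("09010", "0446")])

# Modification anomaly: the customer name is stored once, so one search finds every order.
general_tool_orders = [row[0] for row in conn.execute("""
    SELECT o.OrderNum FROM Orders o
    JOIN Customers c ON c.CustomerNum = o.CustomerNum
    WHERE c.Customer = 'General Tool Co.' ORDER BY o.OrderNum""")]

# Insertion anomaly: a product can exist before anyone orders it (hypothetical item).
conn.execute("INSERT INTO Products VALUES ('5001', 'Inkjet Printer')")
product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

# Deletion anomaly: dropping a product no longer removes the customer who ordered it.
conn.execute("DELETE FROM Order_Items WHERE ItemNum = '0446'")
conn.execute("DELETE FROM Products WHERE ItemNum = '0446'")
xyz_still_listed = conn.execute(
    "SELECT COUNT(*) FROM Customers WHERE Customer = 'XYZ Inc.'").fetchone()[0] == 1
```

Each of the three anomalies from the single-table version maps to one of the three checks at the end of the sketch.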
The Design Process
1) Identify the purpose of the database
2) Review existing data
3) Make a preliminary list of fields
4) Make a preliminary list of tables and enter fields
5) Identify the key fields
6) Draft the table relationships
7) Enter sample data and normalize the data/tables
8) Review and finalize the design
Database Modeling
Refers to various, more-or-less formal methods for designing a database
Some provide precise steps and tools
◦ Ex.: Entity-Relationship (E-R) Modeling
Widely used, especially by high-end database designers who can’t afford to miss things
Fairly complex process
Extremely precise
1. Identify Purpose of the DB
Clients can tell you what information they want but have no idea what data they need.
◦ “We need to keep track of inventory”
◦ “We need an order entry system”
◦ “I need monthly sales reports”
◦ “We need to provide our product catalog on the Web”
Be sure to limit the scope of the database.
2. Review Existing Data
Electronic
◦ Legacy database(s)
◦ Spreadsheets
◦ Web forms
Manual
◦ Paper forms
◦ Receipts and other printed output
3. Make Preliminary Field List
Make sure fields exist to support needs
◦ Ex. If client wants monthly sales reports, you need a date field for orders.
◦ Ex. To group employees by division, you need a division identifier.
Make sure values are atomic
◦ Ex. First and Last names stored separately
◦ Ex. Addresses broken down to Street, City, State, etc.
Do not store values that can be calculated from other values
◦ Ex. “Age” can be calculated from “Date of Birth”
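
As a small illustration of the last rule, age can be derived from a stored date of birth whenever it is needed. The function name `age_on` and the sample dates are assumptions for this sketch, not something from the slides.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Return the age in whole years on the given day."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

example_age = age_on(date(1990, 6, 15), date(2015, 12, 22))
```

Storing only the date of birth means the value never goes stale; a stored "Age" field would be wrong after the next birthday.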
4. Make Preliminary Tables (and insert the fields into them)
Each table holds info about one subject
Don’t worry about the quantity of tables
Look for logical groupings of information
Use a consistent naming convention
Naming Conventions
Rules of thumb
◦ Table names must be unique in DB; should be plural
◦ Field names must be unique in the table(s)
◦ Clearly identify table subject or field data
◦ Be as brief as possible
◦ Avoid abbreviations and acronyms
◦ Use fewer than 30 characters
◦ Use letters, numbers, underscores (_)
◦ Do not use spaces or other special characters
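
The mechanical rules in this list (length and allowed characters) can be checked automatically. The helper below is a hypothetical sketch: the name `is_valid_name` and the decision to treat "fewer than 30" as a hard limit of 29 are assumptions, and rules such as "be brief" or "avoid abbreviations" still need human judgment.

```python
import re

# Letters, numbers, and underscores only; no spaces or other special characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_valid_name(name: str, max_length: int = 29) -> bool:
    """Check a table or field name against the length and character rules."""
    return len(name) <= max_length and NAME_PATTERN.fullmatch(name) is not None

checks = {name: is_valid_name(name) for name in ("Order_Items", "Order Items", "Cust#")}
```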
Naming Conventions (cont’d)
Leszynski Naming Convention (LNC)
◦ Example: tblEmployees, qryPartNum
◦ tbl, qry = tag
◦ Employees, PartNum = basename
LNC at Microsoft Developer Network
5. Identify the Key Fields
Primary Key(s)
◦ Can never be Null; must hold unique values
◦ Automatically indexed in most RDBMSs
◦ Values rarely (if ever) change
◦ Try to include as few fields as possible
Multi-field Primary Key
◦ Combination of two or more fields that uniquely identifies an individual record
Candidate Key
◦ Field or fields that qualify as a primary key
◦ Important in Third and Boyce-Codd Normal Forms
6. Identify Table Relationships
Based on business rules being modeled
Examples:
◦ “each customer can place many orders”
◦ “all employees belong to a department”
◦ “each TA is assigned to one course”
Relationship Terminology
Relationship Type
◦ One-to-One: expressed as 1:1
◦ One-to-Many: expressed as 1:N or 1:M or 1:∞
◦ Many-to-Many: expressed as N:N or M:M
Primary or Parent Table
◦ Table on the left side of 1:N relationship
Related or Child Table
◦ Table on the right side of 1:N relationship
Relational Schema
◦ Diagram of table relationships in database
Relationship Terminology (cont’d)
Join
◦ Definition of how related records are returned
Join Line
◦ Visual relationship indicators in schema
Key fields
◦ Primary Key: the linking field on the one side of a 1:N relationship
◦ Foreign Key: the primary key from one table that is added to another table so the records can be related
◦ Non-Key Fields: any field that is not part of a primary key, multi-field primary key, or foreign key
One-to-One (1:1)
Each record in Table A relates to one, and only one, record in Table B, and vice versa.
Either table can be considered the Primary, or Parent, Table
The two tables can usually be combined into one, although that may not be the most efficient design
One-to-Many (1:N)
Each record in Table A may relate to zero, one or many records in Table B, but each record in Table B relates to only one record in Table A.
The potential relationship is what’s important: there might be no related records, or only one, but there could be many.
The table on the One (or left) side of a 1:N relationship is considered the Primary Table.
Many-to-Many (N:N)
A record in Table A can relate to many records in Table B, and a record in Table B can relate to many records in Table A.
Most RDBMSs do not support N:N relationships directly. They require a linking (or intersection or bridge) table that breaks the N:N relationship down into two 1:N relationships, with the linking table on the Many side of both new relationships.
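
A linking table can be sketched with Python's built-in sqlite3 module. The tables Students, Courses and Enrollments, their columns, and the sample rows are hypothetical names chosen for this illustration; they are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Courses  (CourseNum TEXT PRIMARY KEY);
-- Linking table: sits on the Many side of two 1:N relationships.
CREATE TABLE Enrollments (
    StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
    CourseNum TEXT    NOT NULL REFERENCES Courses(CourseNum),
    PRIMARY KEY (StudentID, CourseNum)
);
INSERT INTO Students VALUES (1, 'Jones'), (2, 'Grayson');
INSERT INTO Courses VALUES ('ENG101'), ('MAT350');
INSERT INTO Enrollments VALUES (1, 'ENG101'), (1, 'MAT350'), (2, 'ENG101');
""")

# One student, many courses.
jones_courses = [row[0] for row in conn.execute("""
    SELECT e.CourseNum FROM Enrollments e
    JOIN Students s ON s.StudentID = e.StudentID
    WHERE s.Name = 'Jones' ORDER BY e.CourseNum""")]

# One course, many students.
eng101_students = [row[0] for row in conn.execute("""
    SELECT s.Name FROM Students s
    JOIN Enrollments e ON e.StudentID = s.StudentID
    WHERE e.CourseNum = 'ENG101' ORDER BY s.Name""")]
```

The multi-field primary key on the linking table also prevents the same student from being enrolled in the same course twice.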
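Relational Schema
[Schema diagram: Table 1 (Field1_1, Field1_2, Field1_3, Field1_4) joined to Table 2 (Field2_1, Field1_1, Field2_2, Field2_3) on Field1_1 in a 1:N relationship, with Table 1 on the 1 side and Table 2 on the N side]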
7. Normalization
Normal Forms (NF): design standards based on database design theory
Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage.
Each successive NF applies an increasingly stringent set of rules
First Normal Form (1NF)
A table is in first normal form if there are no repeating groups.
Repeating Groups: a set of logically related fields or values that occur multiple times in one record
◦ 1: non-atomic value, or multiple values, stored in a field
◦ 2: multiple fields in the same table that hold logically similar values
Sample 1NF Violation - 1

Employee_Projects_Time
EmployeeID  Name            Project                          Time
EN1-26      Sean O’Brien    30-452-T3, 30-457-T3, 32-244-T3  0.25, 0.40, 0.30
EN1-33      Amy Guya        30-452-T3, 30-328-TC, 32-244-T3  0.05, 0.35, 0.60
EN1-35      Steven Baranco  30-452-T3, 31-238-TC             0.15, 0.80
Sample 1NF Violation - 2

Employee_Projects_Time
EmpID   Last Name  First Name  Proj1      Time1  Proj2      Time2
EN1-26  O’Brien    Sean        30-452-T3  0.25   30-457-T3  0.40
EN1-33  Guya       Amy         30-452-T3  0.05   30-328-TC  0.35
Tables in 1NF

Employees
*EmployeeID  LastName  FirstName
EN1-26       O’Brien   Sean
EN1-33       Guya      Amy
EN1-35       Baranco   Steven

Employees_Projects
*ProjNum   *EmployeeID  Time
30-328-TC  EN1-33       0.35
30-452-T3  EN1-26       0.25
30-452-T3  EN1-33       0.05
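
The move from the first violation (several values packed into one field) to 1NF rows can be sketched in a few lines of Python. The variable names are made up for this example, and splitting a full name on the first space is a simplifying assumption that would not hold for every real name.

```python
# Rows shaped like the first 1NF violation: several projects and times per field.
raw_rows = [
    ("EN1-26", "Sean O'Brien", "30-452-T3, 30-457-T3, 32-244-T3", "0.25, 0.40, 0.30"),
    ("EN1-33", "Amy Guya", "30-452-T3, 30-328-TC, 32-244-T3", "0.05, 0.35, 0.60"),
    ("EN1-35", "Steven Baranco", "30-452-T3, 31-238-TC", "0.15, 0.80"),
]

employees = []           # (EmployeeID, LastName, FirstName)
employees_projects = []  # (ProjNum, EmployeeID, Time)

for employee_id, full_name, projects, times in raw_rows:
    first_name, last_name = full_name.split(" ", 1)  # assumption: "First Last" format
    employees.append((employee_id, last_name, first_name))
    # Pair each project with its time and emit one atomic row per pair.
    for project, time_share in zip(projects.split(","), times.split(",")):
        employees_projects.append((project.strip(), employee_id, float(time_share)))
```

Every field now holds a single value, and a new project assignment is a new row rather than a longer string or an extra column.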
Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and each non-key field is functionally dependent on the entire primary key.
Functional dependency: a relationship between fields such that the value in one field determines the one value that can be contained in the other field.
Determinant: a field in which the value determines the value in another field.
Example: Airport – City
Dulles – Washington, DC
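
A functional dependency can be tested against sample data: a field is a determinant if no two rows share its value while disagreeing on the dependent field. The helper `determines` and the second airport row are assumptions added for illustration. Passing the test on sample rows only shows the data is consistent with the dependency; the business rules decide whether it really holds.

```python
def determines(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        # setdefault stores the first value seen and returns it on later lookups.
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True

airports = [
    {"Airport": "Dulles", "City": "Washington, DC"},
    {"Airport": "Reagan National", "City": "Washington, DC"},  # row added for illustration
]

airport_determines_city = determines(airports, "Airport", "City")
city_determines_airport = determines(airports, "City", "Airport")
```

Here Airport determines City, but City does not determine Airport, because one city can be served by more than one airport.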
Sample 2NF Violation

Employees_Projects
*EmpID  Lname    Fname  *ProjNum   ProjTitle
EN1-26  O’Brien  Sean   30-452-T3  STAR Manual
EN1-26  O’Brien  Sean   30-457-T3  ISO Procedures
EN1-26  O’Brien  Sean   31-124-T3  Employee Handbook
EN1-33  Guya     Amy    30-452-T3  STAR Manual
EN1-33  Guya     Amy    30-482-TC  Web site
Tables in 2NF

Employees
*EmployeeID  LastName  FirstName
EN1-26       O’Brien   Sean
EN1-33       Guya      Amy

Employees_Projects
*EmployeeID  *ProjNum
EN1-26       30-452-T3
EN1-33       30-452-T3

Projects
*ProjNum   Title
30-452-T3  STAR Manual
30-457-T3  ISO Procedures
Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and there are no transitive dependencies.
Transitive Dependency: a type of functional dependency in which the value of a non-key field is determined by the value in another non-key field and that field is not a candidate key.
Sample 3NF Violation

Projects_Managers
*ProjNum   ProjTitle       ProjMgr   Phone
30-452-T3  STAR Manual     Garrison  2756
30-457-T3  ISO Procedures  Jacanda   2954
30-482-TC  Web Site        Friedman  2846
31-124-T3  STAR prototype  Garrison  2756
35-272-TC  Order System    Jacanda   2954
Tables in 3NF

Projects
*ProjNum   ProjTitle       Manager
30-452-T3  STAR Manual     Garrison
30-457-T3  ISO Procedures  Jacanda

Project_Managers
*Manager  Phone
Garrison  2756
Jacanda   2954
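
The 3NF split can be derived directly from the rows of the sample violation: Phone depends on the manager, not on the project, so it moves to its own table. This is an illustrative sketch with made-up variable names, using the five rows from the Projects_Managers sample.

```python
projects_managers = [
    ("30-452-T3", "STAR Manual", "Garrison", "2756"),
    ("30-457-T3", "ISO Procedures", "Jacanda", "2954"),
    ("30-482-TC", "Web Site", "Friedman", "2846"),
    ("31-124-T3", "STAR prototype", "Garrison", "2756"),
    ("35-272-TC", "Order System", "Jacanda", "2954"),
]

# Projects keeps only fields that depend on the key ProjNum.
projects = [(num, title, manager) for num, title, manager, _ in projects_managers]

# Each manager's phone is stored once; the set removes the repeated pairs.
project_managers = sorted({(manager, phone) for _, _, manager, phone in projects_managers})
```

Five copies of a phone number shrink to three, and changing a manager's phone becomes a single-row update.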
Boyce-Codd Normal Form (BCNF)
A table is in BCNF when it is in 3NF and all determinants are candidate keys.
Developed to cover situations that 3NF did not address.
Applies to situations where you have overlapping candidate keys.
Sample Business Rules
Business Rules:
◦ Each course can have many students
◦ Each student can take many courses
◦ Each course can have multiple teaching assistants (TAs)
◦ Each TA is associated with only one course
◦ For each course, each student has one TA
Sample BCNF Violation

Course_Students_TAs
CourseNum  Student  TA
ENG101     Jones    Clark
ENG101     Grayson  Chen
ENG101     Samara   Chen
MAT350     Grayson  Powers
MAT350     Jones    O’Shea
MAT350     Berg     Powers
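
The violation can be checked against the sample rows: TA determines CourseNum (each TA has one course), yet TA cannot be a candidate key because the same TA appears in several rows. The variable names below are assumptions for this sketch.

```python
course_students_tas = [
    ("ENG101", "Jones", "Clark"),
    ("ENG101", "Grayson", "Chen"),
    ("ENG101", "Samara", "Chen"),
    ("MAT350", "Grayson", "Powers"),
    ("MAT350", "Jones", "O'Shea"),
    ("MAT350", "Berg", "Powers"),
]

# Is TA a determinant of CourseNum? True if no TA is paired with two courses.
course_by_ta = {}
ta_determines_course = all(
    course_by_ta.setdefault(ta, course) == course
    for course, _, ta in course_students_tas
)

# Could TA alone identify a row? Only if every TA value were unique.
distinct_tas = {ta for _, _, ta in course_students_tas}
ta_is_unique = len(distinct_tas) == len(course_students_tas)

# A determinant that is not a candidate key is exactly what BCNF forbids.
violates_bcnf = ta_determines_course and not ta_is_unique
```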
Tables in BCNF

Students
*Student  *TA
Jones     Clark
Grayson   Chen

TAs
*CourseNum  *TA
ENG101      Clark
ENG101      Chen

Courses
*CourseNum  *Student
ENG101      Jones
MAT350      Grayson
Fourth Normal Form (4NF)
A table is in 4NF when it is in BCNF and there are no multi-valued dependencies.
Multi-valued Dependency: occurs when, for each value in field A, there is a set of values for field B and a set of values for field C, but B and C are not related.
Occurs when the table contains fields that are not logically related.
Sample 4NF Violation - 1

Movies
*Movie            *Star            *Producer
Once Upon a Time  Judy Garland     Alfred Brown
Once Upon a Time  Mickey Rooney    Alfred Brown
Once Upon a Time  Judy Garland     Muriel Hemingway
Once Upon a Time  Mickey Rooney    Muriel Hemingway
Moonlight         Humphrey Bogart  Alfred Brown
Moonlight         Judy Garland     Alfred Brown
Tables in 4NF - 1

Stars
*Movie            *Star
Once Upon a Time  Judy Garland
Once Upon a Time  Mickey Rooney
Moonlight         Humphrey Bogart
Moonlight         Judy Garland

Producers
*Movie            *Producer
Once Upon a Time  Alfred Brown
Once Upon a Time  Muriel Hemingway
Moonlight         Alfred Brown
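
One way to confirm that the split loses nothing is to join the two smaller tables back together on Movie and compare the result with the original rows. The sketch below does this with plain Python sets; the variable names are illustrative only.

```python
movies = {
    ("Once Upon a Time", "Judy Garland", "Alfred Brown"),
    ("Once Upon a Time", "Mickey Rooney", "Alfred Brown"),
    ("Once Upon a Time", "Judy Garland", "Muriel Hemingway"),
    ("Once Upon a Time", "Mickey Rooney", "Muriel Hemingway"),
    ("Moonlight", "Humphrey Bogart", "Alfred Brown"),
    ("Moonlight", "Judy Garland", "Alfred Brown"),
}

# Project the original table onto the two independent facts.
stars = {(movie, star) for movie, star, _ in movies}
producers = {(movie, producer) for movie, _, producer in movies}

# Rejoin on Movie: every star of a movie pairs with every producer of that movie.
rejoined = {
    (movie, star, producer)
    for movie, star in stars
    for other_movie, producer in producers
    if movie == other_movie
}

split_is_lossless = rejoined == movies
```

Stars and producers are independent facts about a movie, which is why the original table had to list every star-producer combination.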
Sample 4NF Violation - 2

Projects_Equipment
DeptCode  ProjNum    ProjMgrID  Equip               PropID
IS        36-272-TC  EN1-15     CD-ROM              657
IS                              VGA monitor         305
AC        36-152-TC  EN1-15
AC                              Dot matrix printer  358
AC                              Calculator w/tape   239
TW        30-452-T3  EN1-10     486 PC              275
TW        30-457-T3  EN1-15
TW        31-124-T3  EN1-15     Laser Printer       109
Tables in 4NF - 2

Equipment
*PropID  Equip               DeptCode
657      CD-ROM              IS
305      VGA monitor         IS
358      Dot matrix printer  AC

Projects
*ProjNum   ProjMgrID  DeptCode
30-452-T3  EN1-10     TW
30-457-T3  EN1-15     TW
36-152-TC  EN1-15     AC
Fifth Normal Form (5NF)
A table is in 5NF when it is in 4NF and there are no cyclic dependencies.
Cyclic Dependency: occurs when there is a multi-field primary key with three or more fields (ex. A, B, C) and those fields are related in pairs AB, BC and AC.
Can occur only with a multi-field primary key of three or more fields
Sample 5NF Violation

BUYING
*Buyer  *Product  *Company
Chris   Jeans     Levi
Chris   Jeans     Wrangler
Chris   Shirts    Levi
Lori    Jeans     Levi
Do the Math
Our sample is two buyers, two products and two companies, so…
2 x 2 x 2 = 8 possible records
But, what if our store has 20 buyers, 50 products and 100 companies?
20 x 50 x 100 = 100,000 possible records
A Tempting Solution

Buyers
*Buyer  *Product
Chris   Jeans
Chris   Shirts
Lori    Jeans

Products
*Product  *Company
Jeans     Wrangler
Jeans     Levi
Shirts    Levi
The Correct Solution

Buyers
*Buyer  *Product
Chris   Jeans
Chris   Shirts
Lori    Jeans

Products
*Product  *Company
Jeans     Wrangler
Jeans     Levi
Shirts    Levi

Companies
*Buyer  *Company
Chris   Wrangler
Chris   Levi
Lori    Levi
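
Why two tables are tempting but wrong, and why the third is needed, can be seen by rejoining the pieces. The sketch below uses plain Python sets built from the BUYING sample; the variable names are illustrative.

```python
buying = {
    ("Chris", "Jeans", "Levi"),
    ("Chris", "Jeans", "Wrangler"),
    ("Chris", "Shirts", "Levi"),
    ("Lori", "Jeans", "Levi"),
}

# The three pairwise tables from the correct solution.
buyer_product = {(buyer, product) for buyer, product, _ in buying}
product_company = {(product, company) for _, product, company in buying}
buyer_company = {(buyer, company) for buyer, _, company in buying}

# Tempting solution: join only Buyers and Products on Product.
two_table_join = {
    (buyer, product, company)
    for buyer, product in buyer_product
    for other_product, company in product_company
    if product == other_product
}
spurious_rows = two_table_join - buying  # facts the original table never stated

# Correct solution: keep only rows that the Buyer-Company table also confirms.
three_table_join = {
    row for row in two_table_join if (row[0], row[2]) in buyer_company
}
```

The two-table join invents a row saying Lori buys Wrangler jeans; the third table filters it out and restores exactly the original four records.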
Check the Math, Again
What if our store has 20 buyers, 50 products and 100 companies?
Buyers = 20 x 50 = 1,000
Products = 50 x 100 = 5,000
Companies = 20 x 100 = 2,000
8,000 total records instead of 100,000!
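
The same arithmetic as a short calculation, under the slide's worst-case assumption that every possible combination actually occurs (the variable names are illustrative):

```python
buyers, products, companies = 20, 50, 100

# Single three-field table: every buyer-product-company combination is one record.
single_table_max = buyers * products * companies

# Three pairwise tables: each holds at most every combination of its two fields.
three_table_max = buyers * products + products * companies + buyers * companies

records_saved = single_table_max - three_table_max
```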
8. Finalizing the Design
Double-check to ensure good, principle-based design
Evaluate design in light of business model and determine desired deviations from design principles
◦ Process efficiency
◦ Security concerns
That’s it for Table Design
Watch for repeating values and fields
Check against the Normal Forms
Make new tables when necessary
Re-check all tables against the NFs
Remember the business rules
Use common sense, but check anyway!
Ensuring Data Integrity
Placing constraints on how and when and where data can be entered
Done after or along with table design
Part of design process because many constraints are established at the database and table levels
Referential Integrity
True relational databases support Referential Integrity: every non-null foreign key value must match an existing primary key value.
In other words, every record in a related table that has a foreign key value must have a matching record in the primary table.
Preserves the validity of foreign key values.
Enforced at database level.
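
Referential integrity can be seen in action with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the two tables and their rows are simplified for illustration, keys are stored as integers, and note that SQLite in particular enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Customers (CustomerNum INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Orders (
    OrderNum    INTEGER PRIMARY KEY,
    CustomerNum INTEGER REFERENCES Customers(CustomerNum)
);
INSERT INTO Customers VALUES (7822, 'XYZ Inc.');
""")

conn.execute("INSERT INTO Orders VALUES (9010, 7822)")  # matches an existing customer
conn.execute("INSERT INTO Orders VALUES (9011, NULL)")  # null foreign key is allowed

try:
    conn.execute("INSERT INTO Orders VALUES (9012, 1234)")  # no customer 1234
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

The database itself refuses the orphan order, so the rule holds no matter which application or interface writes the data.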
Cascading Updates
When a primary key value changes, Cascade Update changes the corresponding values in the related records, so no records get orphaned.
Usually only one level deep
◦ Foreign key is not usually primary key of related table (except in 1:1 relationships), hence no other tables are usually related to it
Cascade Deletes
When a primary table record is deleted, all matching records in any related table are also deleted
Can propagate through multiple tables if Cascade Delete is turned on in all relationships between those tables
Another protection against orphan records, only this time by eradicating them instead!
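
Both cascade behaviors can be sketched in SQL through Python's built-in sqlite3 module, where they are declared with `ON UPDATE CASCADE` and `ON DELETE CASCADE` on the foreign key. The tables, integer keys, and the new customer number 8756 are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades to fire
conn.executescript("""
CREATE TABLE Customers (CustomerNum INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Orders (
    OrderNum    INTEGER PRIMARY KEY,
    CustomerNum INTEGER NOT NULL
        REFERENCES Customers(CustomerNum) ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO Customers VALUES (8755, 'Totally Toys'), (7822, 'XYZ Inc.');
INSERT INTO Orders VALUES (6755, 8755), (8134, 8755), (9010, 7822);
""")

# Cascade Update: changing the primary key rewrites the matching foreign keys.
conn.execute("UPDATE Customers SET CustomerNum = 8756 WHERE CustomerNum = 8755")
orders_after_update = [row[0] for row in conn.execute(
    "SELECT OrderNum FROM Orders WHERE CustomerNum = 8756 ORDER BY OrderNum")]

# Cascade Delete: removing the customer removes that customer's orders too.
conn.execute("DELETE FROM Customers WHERE CustomerNum = 7822")
orders_after_delete = [row[0] for row in conn.execute(
    "SELECT OrderNum FROM Orders ORDER BY OrderNum")]
```

Cascade Delete is convenient but destructive: order 9010 disappears without any explicit statement naming it, so it is worth enabling deliberately rather than by default.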
Levels of Enforcement
Referential Integrity is enforced at the database level because it affects the relationship between two tables.
Many other business rules are enforced at the field and table level to ensure data integrity.
Business rule implementation should be documented: how and where it is enforced in the design.
Some rules can’t be enforced at the table or field level; they must be enforced at the application level.
Testing of Business Rules
Always test business rule implementation
◦ What happens when rule is met?
◦ What happens when rule is violated?
A data entry constraint is not much good if it doesn’t constrain properly
Good application or interface design will provide feedback when user violates a constraint or rule
Field Level Integrity
Constraining by use of field properties
◦ Data type: text, number, Yes/No, Date/Time
◦ Field size
◦ Formats
Entry and editing constraints
◦ Required
◦ Indexed, with or without duplicates
◦ Input masks
◦ Default value
◦ Validation Rule
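
In SQL-based systems, several of these field properties correspond roughly to column constraints. The sketch below uses Python's built-in sqlite3 module; the Employees columns, the six-character size limit, and the date pattern standing in for an input mask are all assumptions for illustration, and desktop tools such as Access expose the same ideas through property sheets instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employees (
    EmployeeID TEXT PRIMARY KEY CHECK (length(EmployeeID) <= 6),  -- field size
    LastName   TEXT NOT NULL,                                     -- required
    Email      TEXT UNIQUE,                                       -- indexed, no duplicates
    Active     INTEGER NOT NULL DEFAULT 1 CHECK (Active IN (0, 1)),  -- Yes/No with default
    HireDate   TEXT CHECK (HireDate GLOB
        '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')             -- crude input mask
)""")

def rejected(sql, params=()):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO Employees (EmployeeID, LastName) VALUES (?, ?)",
             ("EN1-26", "O'Brien"))
default_active = conn.execute(
    "SELECT Active FROM Employees WHERE EmployeeID = 'EN1-26'").fetchone()[0]

missing_required = rejected(
    "INSERT INTO Employees (EmployeeID) VALUES (?)", ("EN1-33",))
bad_date_format = rejected(
    "INSERT INTO Employees (EmployeeID, LastName, HireDate) VALUES (?, ?, ?)",
    ("EN1-35", "Baranco", "12/22/2015"))
```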
Table Level Integrity
Field Comparisons
◦ Compare value in one field to value in another
◦ Comparison performed before record is saved
◦ Violations could display an error message or force constraint of available values
Validation or Lookup Tables
◦ Store generally static set of values
◦ Stored values used to populate new records to ensure accuracy of data entry
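
Both ideas can be sketched with Python's built-in sqlite3 module: a table-level CHECK compares two fields before the record is saved, and a lookup table restricts a field to a fixed set of values. The States lookup table and the StateCode, StartDate and EndDate columns are hypothetical, added only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Lookup table: a generally static list of allowed values.
CREATE TABLE States (StateCode TEXT PRIMARY KEY);
INSERT INTO States VALUES ('VA'), ('MD'), ('DC');

CREATE TABLE Projects (
    ProjNum   TEXT PRIMARY KEY,
    StateCode TEXT NOT NULL REFERENCES States(StateCode),
    StartDate TEXT NOT NULL,  -- ISO dates (YYYY-MM-DD) compare correctly as text
    EndDate   TEXT NOT NULL,
    CHECK (EndDate >= StartDate)  -- field comparison, evaluated before the row is saved
);
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Projects VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

valid_row_saved = try_insert(("30-452-T3", "VA", "2015-01-05", "2015-06-30"))
end_before_start_saved = try_insert(("30-457-T3", "VA", "2015-06-30", "2015-01-05"))
unknown_state_saved = try_insert(("30-482-TC", "ZZ", "2015-01-05", "2015-06-30"))
```

In an application, the same lookup table would typically also feed a drop-down list, so users pick from the stored values instead of typing them.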
Documentation
A good design deserves good documentation
Data Dictionary for database/table design
◦ Table and field names
◦ Table and field properties
◦ Relationships, including primary and foreign keys
◦ Indexes
Provide reasons for design features, especially if they intentionally violate normal design principles