Overview of Database Design
By Nazife Dimililer
Dec 19, 2015
Database Management System
A DBMS is a data storage and retrieval system which permits data to be stored non-redundantly while making it appear to the user as if the data is well-integrated.
A DBMS manages data resources much as an operating system manages hardware resources.
[Figure: Application #1, Application #2 and Application #3 all access, through the DBMS, a database containing centralized shared data]
Advantages of Database Approach
• Program-Data Independence
  – Metadata is stored in the DBMS, so applications don't need to worry about data formats
  – Data queries/updates are managed by the DBMS, so programs don't need to process data access routines
  – Results in increased application development and maintenance productivity
• Minimal Data Redundancy
  – Leads to increased data integrity/consistency
• Improved Data Sharing
  – Different users get different views of the data
• Enforcement of Standards
  – All data access is done in the same way
• Improved Data Quality
  – Constraints, data validation rules
• Better Data Accessibility/Responsiveness
  – Use of a standard data query language (SQL)
• Security, Backup/Recovery, Concurrency
  – Disaster recovery is easier
Costs and Risks of the Database Approach
• Up-front costs
  – Installation management cost and complexity
  – Conversion costs
• Ongoing costs
  – Requires new, specialized personnel
  – Need for explicit backup and recovery
• Organizational conflict
  – Old habits die hard
The Range of Database Applications
• Personal Database – standalone desktop database
• Workgroup Database – local area network (<25 users)
• Department Database – local area network (25-100 users)
• Enterprise Database – wide-area network (hundreds or thousands of users)
Evolution of DB Systems
• Flat files – 1960s - 1980s
• Hierarchical – 1970s - 1990s
• Network – 1970s - 1990s
• Relational – 1980s - present
• Object-oriented – 1990s - present
• Object-relational – 1990s - present
• Data warehousing – 1980s - present
• Web-enabled – 1990s - present
Database Design Phases
• Conceptual Design
  Model the data for each user view without any physical considerations.
• Logical Design
  Choose the data model that will be used and modify the conceptual data model to fit it, still without any physical considerations. Validate the model using normalization and transaction requirements.
• Physical Design
  Choose the actual DBMS and implement the data model efficiently. Performance, security and reliability are key issues.
Physical Database Design
• Purpose – translate the logical description of data into the technical specifications for storing and retrieving data
• Goal – create a design for storing data that will provide adequate performance and ensure database integrity, security and recoverability
Physical Design Process
Inputs:
• Normalized relations
• Volume estimates
• Attribute definitions
• Response time expectations
• Data security needs
• Backup/recovery needs
• Integrity expectations
• DBMS technology used
These lead to decisions on:
• Attribute data types
• Physical record descriptions (don't always match the logical design)
• File organizations
• Indexes and database architectures
• Query optimization
Designing Fields
• Field: smallest unit of data in a database
• Field design
  – Choosing the data type
  – Coding, compression, encryption
  – Controlling data integrity
Field Data Integrity
• Default value – value assumed when no explicit value is given
• Range control – allowable value limitations (constraints or validation rules)
• Null value control – allowing or prohibiting empty fields
• Referential integrity – range control (and null value allowances) for foreign-key to primary-key match-ups
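Each of the four controls corresponds to a standard SQL column constraint. The sketch below illustrates this with Python's built-in `sqlite3` module; the `department` and `instructor` tables, their columns and the `'Lecturer'` default are hypothetical examples, not part of the original design.

```python
import sqlite3

# Hypothetical tables illustrating the four field-integrity controls.
SCHEMA = """
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE instructor (
    instructor_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,                        -- null value control
    title     TEXT NOT NULL DEFAULT 'Lecturer',     -- default value
    salary    INTEGER CHECK (salary > 0),           -- range control
    dept_code TEXT REFERENCES department (dept_code) -- referential integrity
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO department VALUES ('CMPE', 'Computer Engineering')")
    return conn


def accepted(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    insert = "INSERT INTO instructor (name, salary, dept_code) VALUES (?, ?, ?)"
    print(accepted(db, insert, ("Ali", 5000, "CMPE")))  # True: valid row
    print(accepted(db, insert, (None, 5000, "CMPE")))   # False: NOT NULL violated
    print(accepted(db, insert, ("Ayse", -10, "CMPE")))  # False: CHECK violated
    print(accepted(db, insert, ("Can", 4000, "XXXX")))  # False: no such department
    print(db.execute("SELECT title FROM instructor").fetchone()[0])  # Lecturer
```

Note that the foreign key column is nullable here, so an instructor with no department is still accepted; adding `NOT NULL` to `dept_code` would combine referential integrity with null value control.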
Denormalization
• Transforming normalized relations into unnormalized physical record specifications
• Benefits:
  – Can improve performance (speed) by reducing the number of table lookups (i.e. reducing the number of necessary join queries)
• Costs (due to data duplication):
  – Wasted storage space
  – Data integrity/consistency threats
• Common denormalization opportunities:
  – One-to-one relationship
  – Many-to-many relationship with attributes
  – Reference data (1:N relationship where the 1-side has data not used in any other relationship)
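The trade-off for the reference-data case can be sketched with `sqlite3`. The `storage_category`/`item` tables and their values are hypothetical: the normalized lookup needs a join, the denormalized copy does not, but updating only one duplicated copy leaves the data inconsistent.

```python
import sqlite3

# Hypothetical reference-data example: many items share one storage category.
SETUP = """
CREATE TABLE storage_category (          -- 1-side: reference data
    code         TEXT PRIMARY KEY,
    instructions TEXT NOT NULL
);
CREATE TABLE item (                      -- N-side
    item_id  INTEGER PRIMARY KEY,
    descr    TEXT NOT NULL,
    category TEXT NOT NULL REFERENCES storage_category (code)
);
INSERT INTO storage_category VALUES ('COLD', 'Keep below 5 C'), ('DRY', 'Keep dry');
INSERT INTO item VALUES (1, 'Milk', 'COLD'), (2, 'Cheese', 'COLD'), (3, 'Rice', 'DRY');

-- Denormalized copy: the instructions are duplicated into every item row.
CREATE TABLE item_denorm AS
    SELECT i.item_id, i.descr, i.category, s.instructions
    FROM item AS i JOIN storage_category AS s ON s.code = i.category;
"""


def instructions_normalized(conn, item_id):
    # Normalized design: a join is needed to reach the reference data.
    return conn.execute(
        "SELECT s.instructions FROM item AS i "
        "JOIN storage_category AS s ON s.code = i.category WHERE i.item_id = ?",
        (item_id,),
    ).fetchone()[0]


def instructions_denormalized(conn, item_id):
    # Denormalized design: a single-table lookup (the performance benefit).
    return conn.execute(
        "SELECT instructions FROM item_denorm WHERE item_id = ?", (item_id,)
    ).fetchone()[0]


def inconsistent_categories(conn):
    # Categories whose duplicated instructions no longer agree (the integrity cost).
    rows = conn.execute(
        "SELECT category FROM item_denorm GROUP BY category "
        "HAVING COUNT(DISTINCT instructions) > 1 ORDER BY category"
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    print(instructions_normalized(conn, 1), "|", instructions_denormalized(conn, 1))
    # Updating only one of the duplicated copies creates an inconsistency.
    conn.execute("UPDATE item_denorm SET instructions = 'Keep below 4 C' WHERE item_id = 1")
    print(inconsistent_categories(conn))  # ['COLD']
```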
Systems Development Life Cycle
[Figure: the SDLC drawn as a cycle of phases: Project Identification and Selection, Project Initiation and Planning, Analysis, Logical Design, Physical Design, Implementation, Maintenance]
Project Identification and Selection
• Purpose – preliminary understanding
• Deliverable – request for project
• Database activity – enterprise modeling
  – First step in database development
  – Specifies scope and general content
  – Overall picture of organizational data, not a specific design
  – Entity-relationship diagram
  – Descriptions of entity types
  – Relationships between entities
  – Business rules
Project Initiation and Planning
• Purpose – state business situation and solution
• Deliverable – request for analysis
• Database activity – conceptual data modeling
Analysis
• Purpose – thorough analysis
• Deliverable – functional system specifications
• Database activity – conceptual data modeling
Logical Design
• Purpose – information requirements structure
• Deliverable – detailed design specifications
• Database activity – logical database design
Physical Design
• Purpose – develop technology specs
• Deliverable – program/data structures, technology purchases, organization redesigns
• Database activity – physical database design
Implementation
• Purpose – programming, testing, training, installation, documenting
• Deliverable – operational programs, documentation, training materials
• Database activity – database implementation
Maintenance
• Purpose – monitor, repair, enhance
• Deliverable – periodic audits
• Database activity – database maintenance
Simplified Database Development Procedure
Start → Draw ERD → Convert to Relational Schema → Validate using Normalization → Validate against user transactions → Stop
Documentation: Entity Document
Entity Name | Description | Aliases | Occurrence
Name of entity | A short description of the entity | Other names the users use to refer to this entity | A common situation where this entity can be found
Instructor | Employees teaching courses | Lecturer, professor | Instructors work in departments
Documentation: Relationship Document
Entity Type | Relationship Type | Entity Type | Cardinality | Participation (Optionality)
Name of participating entity: Entity A | Name of relationship | Name of participating entity: Entity B | Cardinality from Entity A to Entity B (1:1, 1:M, M:1) | Participation constraints on the relationship from Entity A to Entity B (optionalities)
Instructor | workFor | Department | M:1 | P:F
Full (F): mandatory entity (min > 0)
Partial (P): optional entity (min = 0)
Documentation: Attribute Document
Entity | Names of Attributes | Description | Data type and length | Constraint
Name of entity | List of all attributes of the entity | Description of each attribute | Data type of each attribute (domain names described in the domain document may be used) | Primary, unique and secondary keys (secondary keys are used to search for the entity)
Student | Student Id | Uniquely identifies a student | 6 fixed characters | Primary key
Student | Name | Full name of student | 50 variable characters | Secondary index
Student | Gender | Gender of student | 1 fixed character |
Documentation: Attribute Document (continued)
Names of Attributes | Default Value | Alias | Null Value? (Yes or No) | Derived?
List of all attributes of the entity | Default value for attributes | Other names the users use for the attribute | Yes: null values are allowed; No: null values are not allowed | Yes: it is derived; No: it is not derived
Student Id | | | No |
Name | | | No |
Gender | 'F' | Sex | Yes |
cgpa | | Cumulative grade | Yes | Yes
Documentation: Attribute Domain Document
Domain Name | Domain Characteristics | Examples of allowed values
Name of domain for attributes | Description of domain | Illustrative examples
Cgpa domain | 3-digit floating point number between 0.00 and 4.00 | 3.33, 4.00
Gender | 1-character string ('F' or 'M') | M, F
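The Student documents above contain enough information to write the table definition. Below is one possible mapping to SQLite DDL, driven from Python's `sqlite3`; the snake_case column names and the `add_student` helper are assumptions. SQLite accepts `CHAR(n)`/`VARCHAR(n)` but does not enforce the lengths, so the documented lengths and domains become `CHECK` constraints. Because cgpa is documented as derived, a full design might compute it from grade records instead of storing it; it is stored here only to show the domain check.

```python
import sqlite3

# One possible mapping of the Student documents to SQLite DDL (a sketch).
STUDENT_DDL = """
CREATE TABLE student (
    student_id CHAR(6) NOT NULL PRIMARY KEY
        CHECK (length(student_id) = 6),        -- 6 fixed characters
    name VARCHAR(50) NOT NULL
        CHECK (length(name) <= 50),            -- 50 variable characters
    gender CHAR(1) DEFAULT 'F'
        CHECK (gender IN ('F', 'M')),          -- Gender domain; null allowed
    cgpa NUMERIC(3, 2)
        CHECK (cgpa BETWEEN 0.00 AND 4.00)     -- Cgpa domain; derived, null allowed
);
-- Name is documented as a secondary key used to search for students.
CREATE INDEX idx_student_name ON student (name);
"""


def add_student(conn, student_id, name, **optional):
    """Insert a student; return False if a documented constraint is violated.

    Only 'gender' and 'cgpa' may be passed; omitting gender lets the
    documented default ('F') apply.
    """
    unknown = set(optional) - {"gender", "cgpa"}
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    cols = ["student_id", "name", *optional]
    sql = "INSERT INTO student ({}) VALUES ({})".format(
        ", ".join(cols), ", ".join("?" * len(cols)))
    try:
        conn.execute(sql, (student_id, name, *optional.values()))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(STUDENT_DDL)
    print(add_student(conn, "123456", "Ayse Yilmaz"))            # True, gender defaults to 'F'
    print(add_student(conn, "654321", "Ali Kaya", gender="X"))   # False, outside Gender domain
    print(add_student(conn, "654321", "Ali Kaya", cgpa=4.5))     # False, outside Cgpa domain
```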
Some helpful pointers
• Use consistent naming rules for all entities, relationships and attributes
• Choose primary keys intelligently.
Primary keys should NOT change over time.
• Choose appropriate data types for attributes
Introduction
• There are endless possibilities for a designer to make a bad or wrong choice.
• You must try to understand how the customer manipulates data and how the ERD will produce the data structures required to sustain the same data manipulation
• Errors may be corrected in the conceptual or logical database design phase. In fact, you must check for errors at every phase!
• Here we discuss how to fix some common problems at the conceptual database design phase.
Problem: Unnormalized Attributes
• Does an attribute name contain data?
  – Multiple attributes: A1, A2, A3, …, An (e.g. First_Inspection, Second_Inspection, …)
  – Enumerations: X-Approval, Y-Approval, Z-Approval
• The number of such attributes is difficult to predict, and any change requires changing the attributes themselves
[Figure: Restaurant (Id, Name, First_inspection, Second_inspection, Third_inspection); TextBook_Request (FormNo, FormDate, Coordinator_Approval, Director_Approval, Rector_Approval)]
Solution: Unnormalized Attributes – Fixing Repeating Attributes
• Split the repeating attributes into their own group
  – Split into a repeating group based on an index, e.g. (A, n), here (InspectionResult, OrderNo)
[Figure: before: Restaurant (Id, Name, First_inspection, Second_inspection, Third_inspection); after: Restaurant (Id, Name) with a multivalued composite attribute inspection (OrderNo, Result)]
Solution: Unnormalized Attributes – Fixing Repeating Attributes (continued)
• Alternatively, the following solution may be used:
[Figure: Restaurant (Id, Name) and a separate Inspection entity (OrderNo, Result) linked by the belongsTo relationship; Inspection is also linked by performedBy to an Employee entity (Id, Name)]
If we need to store information on the employees who performed the inspections, it can easily be added through the performedBy relationship.
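A relational sketch of the Inspection entity, using `sqlite3`. The table and column names are assumptions, as is the composite key (restaurant_id, order_no). The point is that a fourth inspection becomes a new row, not a new Fourth_inspection column, and the optional performedBy link is a nullable foreign key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE restaurant (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per inspection replaces the First_/Second_/Third_inspection columns.
CREATE TABLE inspection (
    restaurant_id INTEGER NOT NULL REFERENCES restaurant (id),  -- belongsTo
    order_no      INTEGER NOT NULL CHECK (order_no >= 1),
    result        TEXT,
    performed_by  INTEGER REFERENCES employee (id),             -- performedBy (optional)
    PRIMARY KEY (restaurant_id, order_no)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO employee VALUES (1, 'Ali')")
    conn.execute("INSERT INTO restaurant VALUES (10, 'Harbour Cafe')")
    # A fourth inspection is simply another row.
    conn.executemany(
        "INSERT INTO inspection VALUES (10, ?, ?, ?)",
        [(1, "Pass", 1), (2, "Fail", 1), (3, "Pass", None), (4, "Pass", 1)],
    )
    return conn


def inspection_history(conn, restaurant_id):
    """Return (order_no, result, inspector name or None) in inspection order."""
    return conn.execute(
        "SELECT i.order_no, i.result, e.name FROM inspection AS i "
        "LEFT JOIN employee AS e ON e.id = i.performed_by "
        "WHERE i.restaurant_id = ? ORDER BY i.order_no",
        (restaurant_id,),
    ).fetchall()


if __name__ == "__main__":
    print(inspection_history(build(), 10))
```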
Solution: Unnormalized Attributes – Fixing Enumerations
• Enumerations
  – Split the enumeration into a code and a domain value (Code, Approval)
[Figure: before: TextBook_Request (FormNo, FormDate, Coordinator_Approval, Director_Approval, Rector_Approval); after: TextBook_Request (FormNo, FormDate) with a multivalued composite attribute Approval (Code, Status, EntryDate)]
Solution: Unnormalized Attributes – Fixing Enumerations (continued)
• Alternatively: make Approval a separate entity (Code, Status, EntryDate) linked to TextBookRequest (FormNo, FormDate) through a Has relationship.
• Better yet: link each Approval (Status, EntryDate) to the Employee (Id, Name) who made it; or model Approval as a relationship between Employee and TextBookRequest with the attributes Status and EntryDate.
• Either way, we can find out exactly who approved or rejected a textbook request.
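A minimal relational version of the "better yet" design, using `sqlite3`: one `approval` row per decision, each tied to the employee who made it. Table and column names, the `'Approved'/'Rejected'` status values, and keeping the role code alongside the employee are illustrative assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE textbook_request (
    form_no   INTEGER PRIMARY KEY,
    form_date TEXT NOT NULL
);
-- One row per approval step replaces the *_Approval columns.
CREATE TABLE approval (
    form_no     INTEGER NOT NULL REFERENCES textbook_request (form_no),
    code        TEXT    NOT NULL,   -- e.g. 'Coordinator', 'Director', 'Rector'
    status      TEXT    NOT NULL CHECK (status IN ('Approved', 'Rejected')),
    entry_date  TEXT    NOT NULL,   -- ISO 'YYYY-MM-DD'
    employee_id INTEGER NOT NULL REFERENCES employee (id),  -- who decided
    PRIMARY KEY (form_no, code)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ali"), (2, "Ayse")])
    conn.execute("INSERT INTO textbook_request VALUES (100, '2015-11-30')")
    conn.executemany(
        "INSERT INTO approval VALUES (?, ?, ?, ?, ?)",
        [(100, "Coordinator", "Approved", "2015-12-01", 1),
         (100, "Director", "Rejected", "2015-12-05", 2)],
    )
    return conn


def decisions(conn, form_no):
    """Who approved or rejected a request, in the order the decisions were entered."""
    return conn.execute(
        "SELECT a.code, e.name, a.status FROM approval AS a "
        "JOIN employee AS e ON e.id = a.employee_id "
        "WHERE a.form_no = ? ORDER BY a.entry_date",
        (form_no,),
    ).fetchall()


if __name__ == "__main__":
    print(decisions(build(), 100))
```

Adding a new approval step (for example a fourth signatory) only requires a new `code` value, not a schema change.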
Problem: Enumerations (Lists)
• Does an entity have any attributes that are enumerations but are not foreign keys?
• Create special code entities to hold the list of enumerated values and descriptions
  – also known as lookup tables, reference tables or cross-reference entities
• This is different from the unnormalized-attribute enumerations: here the attribute name does not contain data!
• Ex: if country is a simple attribute, then its value must be chosen from a list.
Solution: Use Validation Entities (lookup tables)
[Figure: before: Student (id, name, country) and Employee (id, name, country); after: Student (id, name) and Employee (id, name), each linked by an isFrom relationship to a Country entity (Code, name)]
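A sketch of the Country validation entity with `sqlite3`: both `student.country` and `employee.country` become foreign keys into one lookup table, so only listed codes are accepted and a country's name is stored once. The country codes and names, and the helper functions, are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE country (            -- validation (lookup) entity
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE student (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT REFERENCES country (code)   -- isFrom
);
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT REFERENCES country (code)   -- isFrom
);
INSERT INTO country VALUES ('CY', 'Cyprus'), ('DE', 'Germany');
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for the lookup to be enforced
    conn.executescript(SCHEMA)
    return conn


def add_student(conn, sid, name, country):
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (sid, name, country))
        return True
    except sqlite3.IntegrityError:
        return False  # country code is not in the lookup table


def country_of_student(conn, sid):
    return conn.execute(
        "SELECT c.name FROM student AS s JOIN country AS c ON c.code = s.country "
        "WHERE s.id = ?", (sid,)).fetchone()[0]


if __name__ == "__main__":
    conn = connect()
    print(add_student(conn, 1, "Ayse", "CY"))  # True
    print(add_student(conn, 2, "Ali", "XX"))   # False: not a valid country code
    print(country_of_student(conn, 1))         # Cyprus
```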
Problem: Single-valued attributes changing over time
• Even though an attribute may have only one value at any given time, do you need to know its previous values?
  – Do you need to keep track of changes to an attribute?
[Figure: Instructor (Id, Name, Title)]
At any given time an instructor has only one title: Assist. Prof., Assoc. Prof. or Prof. But the title is expected to change!
Solution: Add History
[Figure: Instructor (Id, Name) linked by a change relationship to InstructorTitle (ChangeDate, Title)]
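A sketch of the history table with `sqlite3`: each title change is a row keyed by (instructor, change date), and the title in effect on any date is the most recent change on or before it. Names are assumptions; dates are stored as ISO `YYYY-MM-DD` text so that text order equals date order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE instructor (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE instructor_title (
    instructor_id INTEGER NOT NULL REFERENCES instructor (id),
    change_date   TEXT    NOT NULL,   -- ISO 'YYYY-MM-DD', so text order is date order
    title         TEXT    NOT NULL,
    PRIMARY KEY (instructor_id, change_date)
);
"""


def title_on(conn, instructor_id, date):
    """Title in effect on `date`: the latest change on or before it (None if none)."""
    row = conn.execute(
        "SELECT title FROM instructor_title "
        "WHERE instructor_id = ? AND change_date <= ? "
        "ORDER BY change_date DESC LIMIT 1",
        (instructor_id, date),
    ).fetchone()
    return row[0] if row else None


def current_title(conn, instructor_id):
    # The current title is simply the title in effect on a far-future date.
    return title_on(conn, instructor_id, "9999-12-31")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO instructor VALUES (1, 'Ayse')")
    conn.executemany(
        "INSERT INTO instructor_title VALUES (1, ?, ?)",
        [("2005-09-01", "Assist. Prof."), ("2010-02-15", "Assoc. Prof."),
         ("2015-06-30", "Prof.")],
    )
    print(title_on(conn, 1, "2012-01-01"))  # Assoc. Prof.
    print(current_title(conn, 1))           # Prof.
```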
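Problem: Use of “complex” attributes
• Does an attribute represent a real-life object or concept?
[Figure: before: ServiceRecord with attributes EquipmentId, ServiceDate, Employee and Description, where Employee stands for a real-life object]
Solution: Create a separate entity for the “complex” attribute
[Figure: after: ServiceRecord (EquipmentId, ServiceDate, Description) and a separate Employee entity (Id, Name, Salary, Hiredate) linked by the performs relationship]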
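Once Employee is its own entity, facts about the employee are stored once and can be queried. The `sqlite3` sketch below assumes illustrative columns (salary, hiredate) and a (equipment_id, service_date) key for service records; these are not taken from a definitive design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    salary   INTEGER,
    hiredate TEXT                    -- ISO 'YYYY-MM-DD'
);
CREATE TABLE service_record (
    equipment_id INTEGER NOT NULL,
    service_date TEXT    NOT NULL,
    description  TEXT,
    employee_id  INTEGER NOT NULL REFERENCES employee (id),  -- performs
    PRIMARY KEY (equipment_id, service_date)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                     [(1, "Ali", 3000, "2008-03-01"), (2, "Ayse", 3500, "2014-09-15")])
    conn.executemany("INSERT INTO service_record VALUES (?, ?, ?, ?)",
                     [(7, "2015-01-10", "Replaced fan", 1),
                      (7, "2015-06-02", "Cleaned filters", 2),
                      (9, "2015-03-20", "Firmware update", 1)])
    return conn


def services_by_staff_hired_before(conn, date):
    """Employee facts are now queryable, e.g. service work by staff hired before `date`."""
    return conn.execute(
        "SELECT s.equipment_id, s.service_date, e.name FROM service_record AS s "
        "JOIN employee AS e ON e.id = s.employee_id "
        "WHERE e.hiredate < ? ORDER BY s.service_date",
        (date,),
    ).fetchall()


if __name__ == "__main__":
    print(services_by_staff_hired_before(build(), "2010-01-01"))
```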
Problem: Representing compound attributes as simple attributes
• Is a simple attribute composed of more than one field?
[Figure: before: Customer (Id, name, address); after: Customer (Id, name (firstname, lastname), address (street, city, country))]
Solution: Use composite attributes
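The composite attributes can be mirrored in application code. This sketch uses Python `dataclasses`; the class names and the `to_row` flattening helper are illustrative assumptions. It shows why the split helps: filtering by city needs no parsing of a free-text address, and each component maps to its own simple column in a relational table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    first: str
    last: str


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    country: str


@dataclass(frozen=True)
class Customer:
    id: int
    name: Name        # composite: firstname, lastname
    address: Address  # composite: street, city, country


def customers_in_city(customers, city):
    """Filtering on a component needs no string parsing of a free-text address."""
    return [c.id for c in customers if c.address.city == city]


def to_row(c):
    """Flatten the composite attributes into simple columns for a relational table."""
    return (c.id, c.name.first, c.name.last,
            c.address.street, c.address.city, c.address.country)


if __name__ == "__main__":
    customers = [
        Customer(1, Name("Ayse", "Yilmaz"), Address("1 Main St", "Nicosia", "Cyprus")),
        Customer(2, Name("Ali", "Kaya"), Address("5 Sea Rd", "Famagusta", "Cyprus")),
    ]
    print(customers_in_city(customers, "Famagusta"))  # [2]
    print(to_row(customers[0]))
```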
Problem: Fan Traps
• Result of hierarchical relationships that split semantic relationships resulting in the loss of information
• Commonly expressed by traversals from weak entity to related weak entity through parent which results in loss of information
• Fixed by reordering hierarchy
Example of Fan Trap
Issue: Who uses which computer?
[Figure: Office (officeno, floor) contains Computer (code, speed, ramcapacity); Employee (id, name) worksin Office]
Fixing a Fan Trap
[Figure: Employee (id, name) uses Computer (cid, speed, ramcapacity); Employee worksin Office (officeno, floor)]
This rearrangement fixes the fan trap problem, but if a computer can be in an office without being assigned to any employee, it has another problem: the office of such a computer can no longer be recorded.
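The fan trap and its fix can be reproduced with `sqlite3`. In the trap schema, going from a computer to its office and then to the employees returns everyone in the office, so the question "who uses C1?" has no single answer. In the fixed schema the `uses` link answers it directly, but an unassigned computer loses its office. Table names, column names and sample data are illustrative assumptions.

```python
import sqlite3

TRAP = """
CREATE TABLE office   (officeno TEXT PRIMARY KEY, floor INTEGER);
CREATE TABLE computer (code TEXT PRIMARY KEY, speed TEXT, ramcapacity INTEGER,
                       officeno TEXT REFERENCES office (officeno));  -- contains
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                       officeno TEXT REFERENCES office (officeno));  -- worksin
INSERT INTO office VALUES ('A101', 1);
INSERT INTO computer VALUES ('C1', '3GHz', 8, 'A101'), ('C2', '2GHz', 4, 'A101');
INSERT INTO employee VALUES (1, 'Ali', 'A101'), (2, 'Ayse', 'A101');
"""

FIXED = """
CREATE TABLE office   (officeno TEXT PRIMARY KEY, floor INTEGER);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                       officeno TEXT REFERENCES office (officeno));  -- worksin
CREATE TABLE computer (cid TEXT PRIMARY KEY, speed TEXT, ramcapacity INTEGER,
                       used_by INTEGER REFERENCES employee (id));    -- uses
INSERT INTO office VALUES ('A101', 1);
INSERT INTO employee VALUES (1, 'Ali', 'A101'), (2, 'Ayse', 'A101');
INSERT INTO computer VALUES ('C1', '3GHz', 8, 1), ('C2', '2GHz', 4, 2),
                            ('C3', '2GHz', 4, NULL);                 -- unassigned
"""


def build(script):
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


def candidate_users_trap(conn, code):
    # computer -> office -> employee returns everyone in the office.
    return [r[0] for r in conn.execute(
        "SELECT e.name FROM computer AS c JOIN employee AS e ON e.officeno = c.officeno "
        "WHERE c.code = ? ORDER BY e.name", (code,))]


def user_fixed(conn, cid):
    row = conn.execute(
        "SELECT e.name FROM computer AS c JOIN employee AS e ON e.id = c.used_by "
        "WHERE c.cid = ?", (cid,)).fetchone()
    return row[0] if row else None


def office_fixed(conn, cid):
    # The office is reached through the user, so an unassigned computer has none.
    row = conn.execute(
        "SELECT e.officeno FROM computer AS c JOIN employee AS e ON e.id = c.used_by "
        "WHERE c.cid = ?", (cid,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(candidate_users_trap(build(TRAP), "C1"))  # ['Ali', 'Ayse']: ambiguous
    fixed = build(FIXED)
    print(user_fixed(fixed, "C1"), office_fixed(fixed, "C3"))  # Ali None
```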
Problem: Chasms
• Result of hierarchical relationships that split semantic relationships resulting in the loss of business rules
• Commonly expressed by creating artificial intermediate entity values for the sole purpose of providing a link
• Fixed by rebalancing hierarchy and adding appropriate relationships
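Example of a Chasm
Issue: What if a customer is not assigned to an employee?
[Figure: Branch (code, name) and employee (eid, name) linked by worksFor, cardinalities (0,M) and (1,1): each employee works for exactly one branch. Employee and customer (cid, name) linked by represents, cardinalities (0,M) and (0,1): a customer is represented by at most one employee]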
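Fixing a Chasm
[Figure: the same Branch, employee and customer entities with worksFor and represents as before, plus a new dealsWith relationship between Branch and customer, cardinalities (0,M) and (1,1): each customer deals with exactly one branch]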
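The chasm can be demonstrated with `sqlite3`: when a customer has no representative, the path customer → employee → branch returns nothing, while the added dealsWith foreign key still reaches the branch. Names and sample data are illustrative assumptions. This sketch does not enforce that a represented customer's branch matches its representative's branch; that business rule would need an extra check or trigger.

```python
import sqlite3

SCHEMA = """
CREATE TABLE branch   (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE employee (eid INTEGER PRIMARY KEY, name TEXT,
                       branch_code TEXT NOT NULL REFERENCES branch (code));  -- worksFor (1,1)
CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT,
                       eid INTEGER REFERENCES employee (eid),                -- represents (0,1)
                       branch_code TEXT NOT NULL REFERENCES branch (code));  -- dealsWith (1,1)
INSERT INTO branch VALUES ('B1', 'Nicosia');
INSERT INTO employee VALUES (1, 'Ali', 'B1');
INSERT INTO customer VALUES (100, 'Can', 1, 'B1'), (101, 'Deniz', NULL, 'B1');
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def branch_via_employee(conn, cid):
    # Original path: customer -> employee -> branch (breaks when eid is NULL).
    row = conn.execute(
        "SELECT e.branch_code FROM customer AS c JOIN employee AS e ON e.eid = c.eid "
        "WHERE c.cid = ?", (cid,)).fetchone()
    return row[0] if row else None


def branch_via_deals_with(conn, cid):
    # Added dealsWith relationship: every customer reaches its branch directly.
    return conn.execute(
        "SELECT branch_code FROM customer WHERE cid = ?", (cid,)).fetchone()[0]


if __name__ == "__main__":
    conn = build()
    print(branch_via_employee(conn, 101), branch_via_deals_with(conn, 101))  # None B1
```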
More Design Problems
• Misplaced relationships
• Incorrect cardinalities
• Missing relationships
• Overuse of specialized data modeling tools (e.g. inheritance, multiway relationships)
• Redundant relationships
Use of Intelligent vs Surrogate Keys
• A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key, also known as an intelligent key.
• A surrogate key may also be called a "system-generated key", "database sequence number", "synthetic key", "technical key" or "arbitrary unique identifier".
• Primary keys are hard to change. Intelligent keys suffer from this problem because not only are they used as primary and foreign keys, but they also have business meaning associated with them.
• The biggest advantage of intelligent keys is that users understand what they mean, whereas surrogate keys don't make any business sense.
Data models that use surrogate keys usually have more normalization errors.
Surrogate vs. Intelligent Keys
Natural keys:
• are more logical
• can sometimes mean fewer joins
• help to encourage good modeling
• are traditional/user friendly
• make snooping around in the data easier
Surrogate keys:
• are shorter
• are easier to join
• take less storage
• enable natural key fields to be easily changed
• are what object-oriented (and object-relational) databases use
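The claim that surrogate keys let natural key fields change easily can be checked with a small `sqlite3` sketch. The `student`/`enrollment` tables, the student number format and the course codes are hypothetical. Foreign keys reference the system-generated `id`, and the natural `student_no` is kept `UNIQUE`, so renumbering a student updates one row and leaves every referencing row untouched.

```python
import sqlite3

SCHEMA = """
CREATE TABLE student (
    id         INTEGER PRIMARY KEY,   -- surrogate key: system-generated, no business meaning
    student_no TEXT NOT NULL UNIQUE,  -- natural (intelligent) key, still kept unique
    name       TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (id),  -- FK uses the surrogate
    course     TEXT    NOT NULL,
    PRIMARY KEY (student_id, course)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    sid = conn.execute(
        "INSERT INTO student (student_no, name) VALUES ('150001', 'Ayse')").lastrowid
    conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                     [(sid, "DB101"), (sid, "DB102")])
    return conn


def change_student_no(conn, old_no, new_no):
    """Changing the natural key touches one row; foreign keys are unaffected."""
    cur = conn.execute(
        "UPDATE student SET student_no = ? WHERE student_no = ?", (new_no, old_no))
    return cur.rowcount


def enrollments(conn):
    return conn.execute(
        "SELECT s.student_no, e.course FROM enrollment AS e "
        "JOIN student AS s ON s.id = e.student_id ORDER BY e.course").fetchall()


if __name__ == "__main__":
    conn = build()
    print(change_student_no(conn, "150001", "160001"))  # 1 row changed
    print(enrollments(conn))
```

With the natural key as primary key instead, the same change would have to be propagated to every `enrollment` row (for example with `ON UPDATE CASCADE`).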
Goals of Database Development
• Develop a common vocabulary
• Define the meaning of data
• Ensure data quality
• Find an efficient implementation