Introduction to Database Design: Entity Relationship Model
Dec 21, 2015
Design of a Database
Design phases:
Requirement Analysis
• Talk to people and figure out what they want
Conceptual Database Design
• Do the design
• Many tools/modeling techniques: ER, UML, Rumbaugh, Booch, Yourdon
Logical Database Design
• Actual database tables in the relational model, the OO model, or the XML model
• Here: only the relational model.
Overview of Database Design
Conceptual design (the ER Model is used at this stage):
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints or business rules that hold?
A database "schema" in the ER Model can be represented pictorially (ER diagrams).
An ER diagram can be mapped into a relational schema.
Entity-Relationship Model
• Entity Sets
• Relationship Sets
• Mapping Constraints
• Keys
• E-R Diagram
• Extended E-R Features
• Design Issues
• Design of an E-R Database Schema
• Reduction of an E-R Schema to Tables
Entity Sets
A database can be modeled as:
• a collection of entities,
• relationships among entities.
An entity is an object that exists and is distinguishable from other objects.
• Example: specific person, company, event, plant
Entities are described using attributes.
• Example: people have names and addresses
An entity set is a set of entities of the same type that share the same properties.
• Example: set of all persons, companies, trees, holidays
Entity Sets customer and loan
[Figure: sample instances of the customer entity set (customer-id, customer-name, customer-street, customer-city) and the loan entity set (loan-number, amount)]
Attributes
An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
Example:
customer = (customer-id, customer-name, customer-street, customer-city)
loan = (loan-number, amount)
Domain: the set of permitted values for each attribute
Key: a minimal set of attributes whose values uniquely identify an entity in the set
• Candidate keys: all sets of attributes that can potentially be a key.
• Primary key: one of the candidate keys, chosen to be the "primary" key.
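As a rough illustration of entity sets, attributes, and domains, the customer and loan schemas above can be sketched in Python. The sample values and the domain rule (non-negative loan amounts) are assumptions made for this example, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Attributes of the customer entity set, as listed on the slide
    customer_id: str
    customer_name: str
    customer_street: str
    customer_city: str


@dataclass(frozen=True)
class Loan:
    loan_number: str
    amount: int

    def __post_init__(self):
        # Assumed domain rule for this sketch: amounts are non-negative
        if self.amount < 0:
            raise ValueError("amount is outside its domain")


# An entity set is a set of entities of the same type (hypothetical data)
customers = {
    Customer("C-1", "Hayes", "Main", "Harrison"),
    Customer("C-2", "Jones", "North", "Rye"),
}
loans = {Loan("L-15", 1500), Loan("L-23", 2000)}
```

Frozen dataclasses are hashable, so entities can be stored in ordinary Python sets, which mirrors the "set of entities" definition.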
Relationship Sets
A relationship is an association among several entities.
Example:
Hayes (customer entity), depositor (relationship set), A-102 (account entity)
A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets:
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where $(e_1, e_2, \ldots, e_n)$ is a relationship.
• Example: (Hayes, A-102) $\in$ depositor
There can be multiple relationship sets between the same two entity sets.
A relationship must be uniquely identified by the participating entities.
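The set definition above can be checked mechanically. The following is a minimal sketch, assuming entities are represented by simple string identifiers; the helper name `is_relationship_set` and the sample data other than (Hayes, A-102) are hypothetical.

```python
# Entity sets, represented here by identifiers only (hypothetical data)
customer = {"Hayes", "Jones"}
account = {"A-102", "A-201"}

# A relationship set: each tuple is one relationship
depositor = {("Hayes", "A-102"), ("Jones", "A-201")}


def is_relationship_set(rel, *entity_sets):
    """True if every tuple takes its i-th component from the i-th entity set."""
    return all(
        len(t) == len(entity_sets)
        and all(e in s for e, s in zip(t, entity_sets))
        for t in rel
    )


ok = is_relationship_set(depositor, customer, account)
bad = is_relationship_set({("Smith", "A-102")}, customer, account)
```

Because `depositor` is a Python set, a given (customer, account) pair can appear only once, which matches the requirement that a relationship be uniquely identified by its participating entities.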
Relationship Set borrower
[Figure: relationship set borrower]
Descriptive Attributes
Descriptive attributes are used to record information about the relationship.
• Example: when was the last time that the customer accessed his/her account?
E-R Diagrams
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes
Underline indicates primary key attributes (coming up)
Ternary Relationships
Ternary relationships are used to record associations between three entity sets.
• Example: each branch has several jobs that can be worked on by employees.
• For this we need to record the association between employees, branches and jobs.
Roles/Self-Referential Relationships
Entity sets of a relationship need not be distinct.
The labels "manager" and "worker" are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify the semantics of the relationship.
Constraints in ER
• Key Constraints
• Cardinality Constraints
• Participation Constraints
• Overlapping Constraints (ISA)
• Coverage Constraints (ISA)
Key Constraints
Consider the depositor relationship: a customer can deposit into many accounts; an account can have many depositors.
Compare with: each department has at most one manager.
Contrast with: each customer can be the borrower on at most one loan. However, each loan can have many borrowers. The restriction that each customer can be the borrower on at most one loan is a key constraint.
Key Constraint II
A relationship set like borrower is sometimes said to be one-to-many.
The relationship set between customers and accounts is many-to-many.
Key Constraint III
Additional restriction: a loan may be borrowed by only one customer, which makes borrower one-to-one.
Textbook clarification: the arrow is shown going from customer to borrower. It means the same thing: the customer entity participates in the borrower relationship set only once.
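A key constraint says that an entity takes part in a relationship set at most once. The sketch below lists the entities that break that rule in a given instance; the function name `key_constraint_violations` and the sample data are assumptions for illustration. Note that an instance can only reveal a violation; it cannot prove that the constraint holds for the schema in general.

```python
from collections import Counter


def key_constraint_violations(pairs, position):
    """Return entities at `position` that appear in more than one relationship."""
    counts = Counter(p[position] for p in set(pairs))
    return sorted(e for e, c in counts.items() if c > 1)


# borrower: each customer is on at most one loan; a loan may have many borrowers
borrower = {("Hayes", "L-15"), ("Jones", "L-15"), ("Smith", "L-23")}

# depositor: many-to-many, so no key constraint is expected on either side
depositor = {("Hayes", "A-102"), ("Hayes", "A-201"), ("Jones", "A-102")}

customer_side = key_constraint_violations(borrower, 0)   # customers on >1 loan
loan_side = key_constraint_violations(borrower, 1)       # loans with >1 borrower
depositor_side = key_constraint_violations(depositor, 0)
```

Here the customer side of `borrower` has no violations, so the key constraint on customer is satisfied, while the loan side legitimately contains a loan with several borrowers.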
Key Constraints for Ternary Relationships
Key constraints in binary relationships can be easily extended to ternary relationships.
Alternative Notation for Cardinality Limits
Cardinality limits can also express participation constraints
Participation Constraints
Total participation (indicated by a double/thick line): every entity in the entity set participates in at least one relationship in the relationship set.
• E.g., participation of loan in borrower is total: every loan must have a customer associated with it via borrower.
Partial participation: some entities may not participate in any relationship in the relationship set.
• E.g., participation of customer in borrower is partial: not every customer has a loan.
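Total versus partial participation can be tested on an instance by looking for entities that appear in no relationship. This is a small sketch with hypothetical data and a hypothetical helper name, `non_participants`.

```python
def non_participants(entity_set, relationship, position):
    """Entities that do not appear at `position` in any relationship tuple."""
    participating = {t[position] for t in relationship}
    return set(entity_set) - participating


customers = {"Hayes", "Jones", "Smith"}
loans = {"L-15", "L-23"}
borrower = {("Hayes", "L-15"), ("Jones", "L-23")}

# Total participation of loan in borrower: no loan is left out
loans_without_borrower = non_participants(loans, borrower, 1)

# Partial participation of customer in borrower: some customers have no loan
customers_without_loan = non_participants(customers, borrower, 0)
```

An empty result is what total participation requires; a non-empty result is allowed only when participation is partial.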
Keys
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key.
• customer-id is a candidate key of customer
• account-number is a candidate key of account
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
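The difference between a super key and a candidate key (a minimal super key) can be sketched as a brute-force search over attribute subsets. The rows and the helper names below are hypothetical. Keys are really a property of the schema, so this check only reports which attribute sets are consistent with being a key in one particular instance.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys of this instance, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_smaller_key and is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


rows = [
    {"customer_id": "C-1", "customer_name": "Hayes", "customer_city": "Harrison"},
    {"customer_id": "C-2", "customer_name": "Jones", "customer_city": "Harrison"},
    {"customer_id": "C-3", "customer_name": "Hayes", "customer_city": "Rye"},
]
keys = candidate_keys(rows, ("customer_id", "customer_name", "customer_city"))
```

In this instance the search also reports (customer_name, customer_city), which happens to be unique in three rows but would be a poor choice of key in practice. That is why the designer, not the data, decides which candidate key becomes the primary key.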
Weak Entity Sets
Assumption so far: the attributes associated with an entity contain a key (to uniquely identify the entities).
Not always the case! Example:
• Employees can purchase policies to cover their dependents.
• We need to record information about policies: who is covered, who owns the policy.
• We don't really care about the dependents beyond that.
• If an employee quits, the policy is deleted and coverage for the dependents stops.
This is modeled via a weak entity set.
An entity set that does not have a primary key is referred to as a weak entity set.
A weak entity is uniquely identified by a combination of some of its attributes and the primary key of another entity set, the identifying entity set.
Weak Entity Sets
Restrictions:
• It must relate to the identifying entity set via a one-to-many relationship set from the identifying to the weak entity set.
• It must have total participation in the identifying relationship set.
Weak Entity Sets (Cont.)
We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
• payment-number: discriminator of the payment entity set
• Primary key for payment: (loan-number, payment-number)
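One common way to realize a weak entity set in a relational database is a table whose primary key combines the owner's key with the discriminator. The sketch below uses Python's built-in `sqlite3` module. The `payment_amount` column, the sample rows, and the choice of `ON DELETE CASCADE` to model "the weak entity disappears with its owner" are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE loan (
    loan_number TEXT PRIMARY KEY,
    amount      INTEGER NOT NULL
);
-- Weak entity set: identified by the owner's key plus the discriminator
CREATE TABLE payment (
    loan_number    TEXT NOT NULL
                   REFERENCES loan(loan_number) ON DELETE CASCADE,
    payment_number INTEGER NOT NULL,
    payment_amount INTEGER,
    PRIMARY KEY (loan_number, payment_number)
);
""")

conn.execute("INSERT INTO loan VALUES ('L-15', 1500)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [("L-15", 1, 100), ("L-15", 2, 100)],
)
payments_before = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]

# Deleting the identifying entity removes its dependent payments as well
conn.execute("DELETE FROM loan WHERE loan_number = 'L-15'")
payments_after = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```

The `NOT NULL` foreign key reflects total participation of payment in the identifying relationship, and the composite primary key reflects that payment-number alone is only a discriminator.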
Conceptual Design Using the ER Model
Design choices:
• Should a concept be modeled as an entity or an attribute?
• Should a concept be modeled as an entity or a relationship?
• Identifying relationships: binary or ternary? Aggregation?
Constraints in the ER Model:
• A lot of data semantics can (and should) be captured.
• But some constraints cannot be captured in ER diagrams.
• Example: constraints on individual attributes of an entity, such as "Employee entities must have age > 24".
Entity vs. Attribute
Remember: attribute values are atomic (cannot be broken down further).
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use of address information, and the semantics of the data:
• If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
• If address is to be shared by many employees, address should be an entity.
• If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic).
Entity vs. Attribute (Contd.)
Works_In2 does not allow an employee to work in a department for two or more periods.
Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Works_In2 with descriptive attributes from and to]
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Works_In3, which also involves Duration (from, to)]
Entity vs. Relationship
The first ER diagram is OK if a manager gets a separate discretionary budget for each dept.
What if a manager gets a discretionary budget that covers all managed depts?
• Redundancy of dbudget, which is stored for each dept managed by the manager.
• Misleading: suggests dbudget is tied to the managed dept.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages2 with descriptive attributes since and dbudget]
[Figure: Employees (ssn, name, lot), Departments (did, dname, budget), Mgr_Appts (apptnum, dbudget), and the relationship set Manages3 (since)]
Binary vs. Ternary Relationships
[Figure: two designs. First: Employees (ssn, name, lot), Policies (policyid, cost), and Dependents (pname, age) connected by the ternary relationship set Covers. Second: Employees connected to Policies by Purchaser, and Dependents connected to Policies by Beneficiary]
Consider Figure 1. What does it depict?
Additional constraints:
• A policy cannot be owned jointly by two employees.
• Every policy must be owned by some employee.
• Dependents is a weak entity set, uniquely identified by policyid.
Binary vs. Ternary
Constraint 1: add a key constraint on Policies with respect to Covers.
• Side effect: a policy can cover only one dependent.
Constraint 2: total participation constraint on Policies.
• OK if each policy covers at least one dependent.
Constraint 3: introduce an identifying relationship set.
Better Solution
[Figure: the ternary Covers design (Employees, Policies, Dependents) alongside the design in which Employees (ssn, name, lot) is connected to Policies (policyid, cost) by Purchaser and Dependents (pname, age) is connected to Policies by Beneficiary]
Are you awake?
ER Group Exercise
Class (ISA) Hierarchies
As in C++ or Java, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity.
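The analogy with class inheritance can be made concrete with a short Python sketch. `Teller ISA Employee` is used as the example hierarchy; the attribute `station_number` and the sample values are hypothetical and serve only to show a subclass-specific attribute.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    ssn: str
    name: str


@dataclass
class Teller(Employee):
    # Hypothetical attribute that applies only to tellers
    station_number: int


t = Teller(ssn="123-22-3666", name="Hayes", station_number=4)

# Every Teller entity is also considered an Employee entity
teller_is_employee = isinstance(t, Employee)
# Inherited attributes are available on the subclass
inherited_name = t.name
```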
ISA Hierarchy Constraints
Overlap constraints: can Joe be both an employee and a customer? (Allowed/Disallowed)
Coverage constraints: does every employee entity also have to be an officer or teller or secretary entity? (Yes/No)
Reasons for using ISA:
• To add attributes specific to a subclass
• To identify entities that participate in a relationship
Aggregation
Used when we have to model a relationship involving (entity sets and) a relationship set.
• Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
Aggregation vs. ternary relationship:
• Monitors is a distinct relationship, with a descriptive attribute.
• Also, we can say that each sponsorship is monitored by at most one employee.
[Figure: Projects (pid, started_on, pbudget) and Departments (did, dname, budget) connected by Sponsors (since); Employees (ssn, name, lot) connected to the Sponsors aggregate by Monitors (until)]
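A rough way to picture aggregation is to let each Sponsors relationship act as a key that Monitors can refer to. In the sketch below, a sponsorship is a (pid, did) pair, and a dictionary keyed by sponsorship enforces "at most one employee per sponsorship". The identifiers, dates, and the helper `assign_monitor` are hypothetical.

```python
# Sponsors: relationship set between Projects and Departments, with `since`
sponsors = {
    ("P-1", "D-10"): {"since": "2015-01-01"},
    ("P-2", "D-10"): {"since": "2015-06-01"},
}

# Monitors: relates an employee to a sponsorship, with descriptive attribute `until`
monitors = {}


def assign_monitor(sponsorship, ssn, until):
    """Record that employee `ssn` monitors the given sponsorship."""
    if sponsorship not in sponsors:
        # Monitors may only refer to an existing Sponsors relationship
        raise KeyError(f"unknown sponsorship: {sponsorship}")
    if sponsorship in monitors:
        # Key constraint: each sponsorship has at most one monitor
        raise ValueError(f"sponsorship already monitored: {sponsorship}")
    monitors[sponsorship] = {"ssn": ssn, "until": until}


assign_monitor(("P-1", "D-10"), "123-22-3666", "2016-12-31")
```

A plain ternary relationship among Projects, Departments, and Employees could not express this as directly, because `until` belongs to Monitors while `since` belongs to Sponsors.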
Case Study (from Text Book)
See handout.
Addition to earlier exercise.
Summary of Conceptual Design
Conceptual design follows requirements analysis.
• Yields a high-level description of data to be stored
The ER model is popular for conceptual design.
• Constructs are expressive, close to the way people think about their applications.
Basic constructs: entities, relationships, and attributes (of entities and relationships).
Some additional constructs: weak entities, ISA hierarchies, and aggregation.
Note: there are many variations on the ER model.
Summary of ER (Contd.)
Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
• Some constraints (notably, functional dependencies) cannot be expressed in the ER model.
• Constraints play an important role in determining the best database design for an enterprise.
Summary of ER (Contd.)
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
• Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.
Ensuring good database design: the resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.
Summary of Symbols Used in E-R Notation
Summary of Symbols (Cont.)
Alternative E-R Notations
UML
UML: Unified Modeling Language
UML has many components to graphically model different aspects of an entire software system.
UML class diagrams correspond to E-R diagrams, but with several differences.
Summary of UML Class Diagram Notation
UML Class Diagrams (Contd.)
Entity sets are shown as boxes, and attributes are shown within the box, rather than as separate ellipses as in E-R diagrams.
Binary relationship sets are represented in UML by just drawing a line connecting the entity sets. The relationship set name is written adjacent to the line.
The role played by an entity set in a relationship set may also be specified by writing the role name on the line, adjacent to the entity set.
The relationship set name may alternatively be written in a box, along with attributes of the relationship set, and the box is connected, using a dotted line, to the line depicting the relationship set.
Non-binary relationships cannot be directly represented in UML; they have to be converted to binary relationships.
UML Class Diagram Notation (Cont.)
*Note reversal of position in cardinality constraint depiction
UML Class Diagrams (Contd.)
Cardinality constraints are specified in the form l..h, where l denotes the minimum and h the maximum number of relationships an entity can participate in.
Beware: the positioning of the constraints is exactly the reverse of the positioning of constraints in E-R diagrams.
The constraint 0..* on the E2 side and 0..1 on the E1 side means that each E2 entity can participate in at most one relationship, whereas each E1 entity can participate in many relationships; in other words, the relationship is many-to-one from E2 to E1.
Single values, such as 1 or *, may be written on edges; the single value 1 on an edge is treated as equivalent to 1..1, while * is equivalent to 0..*.
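An l..h limit can be checked against a relationship instance by counting how often each entity participates. In this sketch the bounds are passed for the entity set they constrain, regardless of which end of the line they are drawn on, and `high=None` stands for `*`. The function name and sample data are hypothetical.

```python
from collections import Counter


def satisfies_cardinality(entity_set, relationship, position, low, high=None):
    """True if every entity participates between `low` and `high` times."""
    counts = Counter(t[position] for t in set(relationship))
    return all(
        counts[e] >= low and (high is None or counts[e] <= high)
        for e in entity_set
    )


e1 = {"a", "b"}
e2 = {"x", "y", "z"}
rel = {("a", "x"), ("a", "y")}  # tuples are (E1 entity, E2 entity)

# Many-to-one from E2 to E1: each E2 entity participates at most once
e2_at_most_once = satisfies_cardinality(e2, rel, 1, low=0, high=1)
# Each E1 entity may participate any number of times (0..*)
e1_unbounded = satisfies_cardinality(e1, rel, 0, low=0)
# Tighter limits fail on this instance
e1_at_most_once = satisfies_cardinality(e1, rel, 0, low=0, high=1)
e2_exactly_once = satisfies_cardinality(e2, rel, 1, low=1, high=1)
```

A lower bound of 1 corresponds to total participation, which is how cardinality limits can also express participation constraints.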