Database Design
Post on 01-Jan-2016
Database
• A database is a collection of information that is organized so that it can easily be accessed, managed, and updated
• Databases typically contain aggregations of data records or files, such as chemical analyses, experiments, investigators, and samples
• Typically, a database manager provides users the capabilities of controlling read/write access, specifying report generation, and analyzing usage
Database structure
• Databases are organized by columns, records, and tables
• A column (field) holds a single piece of information; a record is one complete set of column values (shown in a row); and a table is a collection of records
• For example, a telephone book is analogous to a table. It contains a list of records, each of which consists of three fields: name, address, and telephone number
• To access information from a database, you need a Database Management System (DBMS), e.g., MS Access
• A DBMS is a collection of programs that enables you to enter, organize, and select data in a database
Relational Database
• A relational database is a collection of data items organized as a set of tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables
• The relational model specifies that the rows of a table have no specific order and that the rows, in turn, impose no order on the attributes
• Applications access data by specifying queries, which use operations such as select to identify rows, project to identify attributes, and join to combine relations (tables)
• Relations (Tables) can be modified using the insert, delete, and update operators.
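The select, project, and join operations named above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for the DBMS; the cut-down Investigator and Sample tables and all row values are hypothetical and only serve the illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption made for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Investigator (id INTEGER PRIMARY KEY, name TEXT, affiliation TEXT);
    CREATE TABLE Sample (sample_id INTEGER PRIMARY KEY, lithology TEXT, investigator_id INTEGER);
    INSERT INTO Investigator VALUES (1, 'Ada', 'Survey A'), (2, 'Ben', 'Survey B');
    INSERT INTO Sample VALUES (10, 'granite', 1), (11, 'basalt', 1), (12, 'shale', 2);
""")

# select: identify the rows that satisfy a condition
selected = conn.execute(
    "SELECT * FROM Sample WHERE lithology = 'granite'").fetchall()

# project: identify the attributes (columns) of interest
projected = conn.execute(
    "SELECT name FROM Investigator ORDER BY id").fetchall()

# join: combine two relations on a shared value
joined = conn.execute("""
    SELECT Investigator.name, Sample.lithology
    FROM Sample JOIN Investigator ON Sample.investigator_id = Investigator.id
    ORDER BY Sample.sample_id
""").fetchall()

print(selected, projected, joined)
```

Neither table had to be reorganized to answer any of the three queries, which is the flexibility the relational model is meant to provide.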
Relational Database …
• A relational database is a set of tables containing data organized into predefined categories
• Each table (which is sometimes called a relation) contains one or more data categories in columns (fields)
• Each row contains a unique instance of data for the categories defined by the columns
• For example, a typical Investigator database would include a table (Investigator) that describes an investigator with columns (fields) for name, address, phone number, affiliation, degree, etc.
• Another table (e.g., Sample) would describe fields such as sample_id, number, type, lithology, date taken, etc.
Relational Model
• The relational model was developed by E.F. Codd in 1970, and is the basis for all relational databases (RDBs)
• It uses set theory for database modeling, based on relational algebra (RA), which is used to manipulate objects in an RDB
• It includes required information that may not appear in the UML diagram but is needed for the database
• SQL is a declarative language used to manipulate RDBs by applying relational algebra
Table (Relation)
• A table is defined as a set of rows that have the same attributes (fields under columns)
– That is, a table is organized into rows and columns
• A row usually represents one object and information (record) about that object
• Objects are typically physical objects or concepts
View of the Database
• A user of the database can obtain a view of the database that fits the user's needs
• For example, the head of a Geological Survey might like a view or report on:
– all geological quadrangle maps that were completed by a certain date
• A geologist in the same survey could, from the same tables, obtain a report on:
– collected samples which need chemical analysis
…
• In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend
• After the original database creation, a new data category (e.g., table) can be added without requiring that all existing applications be modified
Create a model of the real world
• Design of a database involves modeling the domain (here meaning the universe of discourse)
– Model: simplified abstraction of the real world
• The database should contain information needed by the people who use it
• UML (Unified Modeling Language) is perhaps the best modeling tool, along with ERD (entity relationship diagram)
– We will model our database in MySQL Workbench or Microsoft Access
UML Class Diagram
• UML class diagrams are used to define classes (equivalent to entities in ER diagrams)
– Each class represents a type for all physical and conceptual things or an event
– For example, Ocean, Aquifer, Mineral, Folding
• First step: Define the class in English, e.g.,
– Aquifer: “A rock unit that can store and transmit an economically significant amount of water”
• Each class (singular form) has several attributes, which define the relevant descriptive (natural) properties or characteristics (those other than ID, keys) of each member of the class
[Class diagram: Aquifer, with attributes name, type, porosity, thickness, transmissivity, hydraulicCond]
Relation (Table) Schema
• The UML class is translated into a relation schema identified by the class name (Aquifer) with all the attributes of the class, e.g.:
• Aquifer schema:
name | type | porosity | permeability
• Assignment rule: Associates each attribute with a set of valid values (i.e., defines the domain)
Data Integrity
• When creating a relational database, we define the domain of possible values in a data column (i.e., for each attribute) and further constraints that may apply to that data value
• This is done for data integrity so that invalid values will not be entered into the database
• Domain: Set of valid values that can be assigned to an attribute
– This can be done by assigning a data type (e.g., DATETIME, STRING, CHAR, INT), character length, and format, e.g., of email and URL addresses, zip codes, state names, etc.
– Many of these should be handled with the user interface (in forms to collect user data) and enumerated lists
Constraints
• Constraints allow further restriction of the domain of an attribute
– You can place constraints to limit the type of data that is stored in a table
– For example, a constraint can restrict a given integer attribute to values between 1 and 5
Unique constraint
• A UNIQUE constraint ensures that all values in a column are distinct (i.e., not duplicated)
CREATE TABLE Customer (
  customer_id INTEGER UNIQUE,
  last_name VARCHAR(100),
  first_name VARCHAR(100)
);
• In this example, the customer_id column has a unique constraint and cannot include duplicate values
customer_id  last_name  first_name
1            Johnson    Gloria
2            Smith      Gina
3            Spalding   David
Unique Constraint …
• If the previous table already has a customer_id with a value of 3, executing the following SQL statement will result in an error because 3 already exists in the customer_id column:
INSERT INTO Customer VALUES (3, 'Jones', 'Kaila');
• Trying to insert another row with that value violates the UNIQUE constraint
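The UNIQUE violation described above can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module as the database engine; other DBMSs report the same kind of error with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute("""
    CREATE TABLE Customer (
        customer_id INTEGER UNIQUE,
        last_name VARCHAR(100),
        first_name VARCHAR(100)
    )
""")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Johnson", "Gloria"), (2, "Smith", "Gina"), (3, "Spalding", "David")],
)

# A second row with customer_id 3 violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Customer VALUES (3, 'Jones', 'Kaila')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(duplicate_rejected, row_count)
```

The rejected statement leaves the table unchanged, so the three original rows are still the only ones present.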
Default Constraint
• A DEFAULT constraint provides a default value for a column when the INSERT INTO statement does not provide a specific value
CREATE TABLE Student (
  student_id INTEGER UNIQUE NOT NULL,
  last_name VARCHAR(100),
  first_name VARCHAR(100),
  score INTEGER DEFAULT 75
);
Default value …
INSERT INTO Student (student_id, last_name, first_name) VALUES (1, 'James', 'Alex');
• The populated Student table will look like the following after execution:
student_id  last_name  first_name  score
1           James      Alex        75
• Even though we didn't specify a value for the “score” column in the INSERT INTO statement, it is assigned the default value of 75, since we had already set 75 as the default value for this column
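A short sketch of the DEFAULT behaviour, again assuming Python's sqlite3 module as the engine. The second student row is hypothetical and is there only to show that an explicit value overrides the default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute("""
    CREATE TABLE Student (
        student_id INTEGER UNIQUE NOT NULL,
        last_name VARCHAR(100),
        first_name VARCHAR(100),
        score INTEGER DEFAULT 75
    )
""")

# No score supplied: the column falls back to its default.
conn.execute(
    "INSERT INTO Student (student_id, last_name, first_name) "
    "VALUES (1, 'James', 'Alex')"
)
# Hypothetical second row with an explicit score: the default is not used.
conn.execute("INSERT INTO Student VALUES (2, 'Lee', 'Sam', 90)")

scores = conn.execute(
    "SELECT student_id, score FROM Student ORDER BY student_id").fetchall()
print(scores)
```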
Check Constraint
• A CHECK constraint ensures that all values in a column satisfy certain conditions. Once defined, the database will only insert a new row or update an existing row if the new value satisfies the CHECK constraint
• The CHECK constraint is used to ensure data quality
• For example, in the following CREATE TABLE statement,
CREATE TABLE Customer (
  customer_id INTEGER CHECK (customer_id > 0),
  last_name VARCHAR(30),
  first_name VARCHAR(30)
);
• Column “customer_id” has a constraint: its value must be an integer greater than 0. So, attempting to execute the following statement will result in an error, because the value for customer_id must be greater than 0
INSERT INTO Customer VALUES (-5, 'Smith', 'Lynn');  -- leads to error
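The CHECK constraint can be exercised on both INSERT and UPDATE with the sketch below. It assumes Python's sqlite3 module; the valid row with customer_id 1 is added only so that there is something to update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute("""
    CREATE TABLE Customer (
        customer_id INTEGER CHECK (customer_id > 0),
        last_name VARCHAR(30),
        first_name VARCHAR(30)
    )
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Smith', 'Lynn')")  # satisfies the check

def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

insert_rejected = is_rejected("INSERT INTO Customer VALUES (-5, 'Smith', 'Lynn')")
update_rejected = is_rejected("UPDATE Customer SET customer_id = 0 WHERE customer_id = 1")
ids = [row[0] for row in conn.execute("SELECT customer_id FROM Customer")]
print(insert_rejected, update_rejected, ids)
```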
Set notation
• A schema is represented in set notation
Aquifer = {name, type, porosity, permeability, …}
– Order of the attributes does not matter
– Attributes cannot be duplicated
– Rules can define domains for each element of the set
• Can also limit the domain of an attribute to a specific range of values
– We can define subsets of the set (e.g., views)
– All can be manipulated with set operators (e.g., union, to combine two relations or tables)
Relation = Table
• The structure of each relation schema defines a table
• Use SQL syntax to create a table:
CREATE TABLE Aquifer (
  id INT NOT NULL,               -- NOT NULL: a value must be provided (not optional)
  name VARCHAR(20) NOT NULL,     -- VARCHAR: variable-length character string
  type VARCHAR(30) NOT NULL,
  porosity VARCHAR(10) NULL,
  permeability VARCHAR(20) NULL,
  PRIMARY KEY (id)
);
CREATE TABLE Sample
(
sample_id INT NOT NULL,
investigator_id VARCHAR(20) NOT NULL,
sample_num VARCHAR(20) NOT NULL,
lithology VARCHAR(30) NULL,
type VARCHAR(20) NULL,
latitude VARCHAR(30) NULL,
longitude VARCHAR(30) NULL,
PRIMARY KEY (sample_id, investigator_id)
);
CREATE TABLE Investigator
(
id INT NOT NULL,
First_name VARCHAR(20) NULL,
Last_name VARCHAR(20) NULL,
gender_code VARCHAR(1) NULL,
degree VARCHAR(5) NULL,
affiliation VARCHAR(50) NULL,
PRIMARY KEY (id)
);
Rows represent individual objects
• While each UML class becomes a table, the values for each individual object (i.e., each real-world member of a class) become a row in the table
– Each row is a record
– Each row (tuple) is a function that assigns a constant value to each attribute of the schema
– e.g., function of name, type, porosity, …
INSERT INTO Aquifer (name, type, porosity, location)
VALUES ('Floridan', 'confined', '', 'Florida');
name      type      porosity  location
Floridan  confined            Florida
• The column list (name, type, porosity, location) can be omitted if a value is supplied for every column, in the order the table defines them
Rows can be updated
After a table is populated, the values in each row can be changed using the UPDATE statement:
UPDATE Aquifer
SET name = 'High Plains'
WHERE name = 'Ogallala';
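A minimal sketch of the UPDATE statement in action, assuming Python's sqlite3 module and a simplified four-column Aquifer table; the starting row values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute(
    "CREATE TABLE Aquifer (name VARCHAR(20), type VARCHAR(30), "
    "porosity VARCHAR(10), location VARCHAR(30))"
)
conn.executemany(
    "INSERT INTO Aquifer VALUES (?, ?, ?, ?)",
    [("Floridan", "confined", "", "Florida"),
     ("Ogallala", "unconfined", "10%", "South Dakota")],
)

# Only rows matching the WHERE clause are changed.
cursor = conn.execute(
    "UPDATE Aquifer SET name = 'High Plains' WHERE name = 'Ogallala'")
rows_changed = cursor.rowcount

names = [row[0] for row in conn.execute("SELECT name FROM Aquifer ORDER BY name")]
print(rows_changed, names)
```

Without the WHERE clause the statement would rename every row in the table, so the condition is the important part.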
Table is a set of rows
• Table is a set of rows with common attributes
• Each row is unique
• Rows are unordered
• Can get a subset of the rows with SQL
• Can use union, intersection, or difference of the rows in two or more tables if they have the same schema
name         type        porosity  location
Floridan     confined              Florida
High Plains  unconfined  10%       South Dakota
(Rows are tuples; columns are fields or attributes)
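The set operators can be sketched as follows, assuming Python's sqlite3 module. The two tables AquiferA and AquiferB and the row 'Aquifer X' are hypothetical; they simply share one schema so that UNION, INTERSECT, and EXCEPT (difference) are applicable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.executescript("""
    CREATE TABLE AquiferA (name TEXT, type TEXT);
    CREATE TABLE AquiferB (name TEXT, type TEXT);
    INSERT INTO AquiferA VALUES ('Floridan', 'confined'), ('High Plains', 'unconfined');
    INSERT INTO AquiferB VALUES ('High Plains', 'unconfined'), ('Aquifer X', 'confined');
""")

def rows(operator):
    """Combine the two tables with the given set operator, sorted by name."""
    sql = (f"SELECT name, type FROM AquiferA {operator} "
           "SELECT name, type FROM AquiferB ORDER BY name")
    return conn.execute(sql).fetchall()

union = rows("UNION")             # rows in either table, duplicates removed
intersection = rows("INTERSECT")  # rows present in both tables
difference = rows("EXCEPT")       # rows in AquiferA that are not in AquiferB
print(union, intersection, difference)
```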
Primary Key (PK)
• It is a set of attributes that guarantees the uniqueness of each row in a table
• A primary key can consist of one or more fields in a table. When multiple fields are used as a primary key, they are called a composite key
– (used in junction tables, which resolve M:M relationships into 1:M relationships)
• Each primary key value must be unique within the table
Table CUSTOMER
CREATE TABLE Customer (
  customer_id INTEGER PRIMARY KEY,
  last_name VARCHAR(100),
  first_name VARCHAR(100)
);
Columns: customer_id (PK), last_name, first_name
Foreign Key (FK)
• A foreign key is a field (or fields) that points to (i.e., references) the primary key of another table
• The purpose of the foreign key is to ensure referential integrity of the data
• Meaning: only values that are supposed to appear in the database are permitted
CREATE TABLE Analysis (
  analysis_id INTEGER PRIMARY KEY,
  analysisDate DATETIME,
  analyzer_id INTEGER REFERENCES Analyzer (analyzer_id)
);
-- The analyzer_id in the Analysis table references the analyzer_id attribute of the Analyzer table.
• Each analyzer may perform many analyses
• Each analysis is done by only one analyzer
[Diagram: Analysis (PK analysis_id; FK analyzer_id; analysisDate, …) references Analyzer (PK analyzer_id, …)]
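Referential integrity can be observed directly with the sketch below. It assumes Python's sqlite3 module, where foreign key enforcement is off by default and has to be switched on with a PRAGMA; the Analyzer name column and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # in-memory database (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks

conn.execute("CREATE TABLE Analyzer (analyzer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE Analysis (
        analysis_id INTEGER PRIMARY KEY,
        analysisDate DATETIME,
        analyzer_id INTEGER REFERENCES Analyzer (analyzer_id)
    )
""")
conn.execute("INSERT INTO Analyzer VALUES (1, 'Analyzer One')")

# Valid: analyzer 1 exists in the parent table.
conn.execute("INSERT INTO Analysis VALUES (100, '2016-01-01', 1)")

# Invalid: there is no analyzer 99, so the row would be an orphan.
try:
    conn.execute("INSERT INTO Analysis VALUES (101, '2016-01-02', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

analysis_ids = [r[0] for r in conn.execute("SELECT analysis_id FROM Analysis")]
print(orphan_rejected, analysis_ids)
```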
Index
• An index provides capability for quick access to data
– That is, it speeds up the retrieval of data
• Creating the proper index can drastically increase the performance of an application
• A database index is used in much the same way a reader uses a book index
• When a database has no index to use for searching, the result is similar to a reader who looks at every page in a book to find a word (or looks through a whole library to find a book):
– The database engine needs to visit every row in a table
– In database terminology this is called a table scan
Index …
• An index can be created on any combination of attributes of a table
CREATE TABLE Subscriber (
  subscriber_id INT PRIMARY KEY,
  email_address VARCHAR(255),
  first_name VARCHAR(255),
  last_name VARCHAR(255)
);
• If you want to quickly find an email address, create an index on the email_address field:
CREATE INDEX Subscriber_email ON Subscriber (email_address);
– Note: Subscriber_email is the name of the index
• With the index in place, this SQL statement can quickly find an email address:
SELECT first_name, last_name FROM Subscriber WHERE email_address = 'email@domain.com';
[Diagram: Subscriber (PK subscriber_id; indexed email_address; first_name, last_name)]
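Whether a query falls back to a table scan or uses the index can be checked by asking the engine for its query plan. The sketch below assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; other DBMSs expose the same information through their own EXPLAIN variants, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute("""
    CREATE TABLE Subscriber (
        subscriber_id INT PRIMARY KEY,
        email_address VARCHAR(255),
        first_name VARCHAR(255),
        last_name VARCHAR(255)
    )
""")

query = ("SELECT first_name, last_name FROM Subscriber "
         "WHERE email_address = 'email@domain.com'")

def plan(sql):
    """Return the human-readable plan text (fourth column of each plan row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no usable index yet: every row must be visited
conn.execute("CREATE INDEX Subscriber_email ON Subscriber (email_address)")
plan_after = plan(query)   # the engine can now search through the index

print(plan_before)
print(plan_after)
```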
Creating an Index
CREATE TABLE Employee (
  employee_id INT PRIMARY KEY,
  emp_name VARCHAR(200)
);
• When you first create a new table, no index is created on non-key columns such as emp_name (many DBMSs do index the primary key automatically)
• Use the following statement to create an index for this table
CREATE INDEX employee_name ON Employee (emp_name);
– employee_name is an arbitrary name, given to the index for easy access (it may be different from the attribute in the table)
[Diagram: Employee (PK employee_id; indexed emp_name)]
UML Associations
• Indicate the way tables are functionally related to each other (in ER they are called relationships)
• Example: the relationship between the Well class and the Aquifer class
• The association will help us to find out which wells penetrate a given aquifer
– We start by describing the association in natural language, focusing on how many individuals of one class participate in, or are connected to, a single individual of the other class, and vice versa (i.e., in both directions)
– Each aquifer is drilled by zero or more wells (0..*)
– Each well drills into only one aquifer
• These statements help to identify the multiplicity (or cardinality in ER) of the association
[Diagram: Aquifer (PK aquifer_id; aquifer_name, aquifer_type, …) 1 to 0..* Well (PK well_id; location, depth, FK aquifer_id, …)]
Multiplicity or cardinality
• Each well drills into one and only one (1..1) aquifer
• Each aquifer is drilled by zero or more wells (0..*)
• NOTE: The association name is the verb that describes the action (e.g., ‘drills into’ or ‘is drilled by’)
• The maximum multiplicity indicates that this is a one-to-many association
[Diagram: Aquifer (aquifer_id, name, type, porosity, thickness, transmissivity, hydraulicCond) 1..1 to 0..* Well (well_id, number, location, type, isCased, depth, aquifer_id); association names: ‘is drilled by’, ‘drills into’]
The Foreign Key represents UML association
• For the 1..* association, we store the PK attribute of the ‘one’ side (Aquifer) in the table of the ‘many’ side (Well)
– See previous slide
• The copied PK becomes an FK (foreign key)
• The ‘one’ side of the association is the ‘parent’ and the ‘many’ side is the ‘child’ of the association
– Note: cannot have a child (Well) without the parent (Aquifer)
• Parent provides the PK; child receives it as FK
• In other words: ‘one parent PK; many children FK’!
• The FK in the ‘many’ side may be used as the PK, for example, if we want to uniquely associate a well with an aquifer; or the table may have its own PK in addition to the FK!
Aquifer-Well Association
[Diagram: Aquifer (parent; Primary Key aquifer_id; name, location, type, porosity, thickness, transmissivity, hydraulicCond) 1..1 to 0..* Well (child; Primary Key well_id; Foreign Key aquifer_id; number, location, type, isCased, depth)]
Referential Integrity
• The FK added to the child table must have the same type and size as the PK in the parent table
ALTER TABLE Well
  ADD CONSTRAINT well_id PRIMARY KEY (number, location);  -- primary key is a composite key
-- Define the FK, and specify where it is found as a PK
ALTER TABLE Well
  ADD CONSTRAINT aquifer_id FOREIGN KEY (aquifer_name, aquifer_location)  -- arbitrary names
  REFERENCES Aquifer (name, location);  -- real names in Aquifer
• The FK constraint ensures that each well contains a valid aquifer name and location
• This maintains the referential integrity of the database
Updated Well Table
• Here is the populated Well table, with values from the Aquifer table copied into the foreign key columns
aquifer_name  aquifer_location  number  location       type        isCased  depth
Floridan      Florida           F0001   Gainesville    Confined    Yes      20 m
High Plains   Texas             H0002   Amarillo       unconfined  No       12 m
Floridan      Florida           F0012   Daytona Beach  Confined    No       40 m
Floridan      Florida           F0120   Daytona Beach  Confined    Yes      100 m
(The first two columns hold Aquifer data; the remaining columns hold Well data)
NOTES:
• Some aquifers (e.g., Floridan) are repeated in the Well table
• The Aquifer data are copied into the foreign key fields (i.e., the first two columns)
One-to-One Association
• There are rare cases where the relationship between two types of data is one-to-one (1..1)
– Each item has a relationship to only one item of the other type
– e.g., the state name (Georgia) and its abbreviation (GA)
– The mineral name and its formula (if it is constant)
– Fold_Limb and Fold (composition)
– Brain and Body (composition; will be covered later)
• In such a case, the two items are included as two columns in one table
– However, if in the future the relationship changes or the table is too big, it is better to break them into two tables
[Diagram: Entity 1:1 Entity]
One-to-Many Association
• One row in one table links to many rows in another, e.g.:
– One geoscientist having published many papers
– One person taking several samples
– One sample having multiple analyses
• The two entities in the one-to-many relationship (1..*) must be put into two tables
• Note: A many-to-many (M:M) relationship is broken into two 1..* relationships (see next slides)
[Diagram: Entity 1:M Entity]
Many-to-many Association
• The relationship between some tables may be many-to-many, when many rows in one table are linked to many rows in another table
– Each geologist may study many samples
– Each sample may be studied by many geologists
[Diagram: Geologist (geologist_id, name, specialty, affiliation) 0..* to 0..* Sample (sample_id, number, type); association names: ‘studies’, ‘isStudiedBy’]
[Diagram: Entity M:M Entity]
M:M …
• Since the two tables cannot be children of each other, we cannot show M:M associations directly
– There is no place to put the FK
– So, we create a junction (association, intermediary) table, where the foreign keys of the two tables go (in addition to other attributes, if any). This creates two 1:M associations
– Connect the junction table with a dotted line to the association line
[Diagram: Geoscientist (geosPK, name, specialty, affiliation) 1..1 to 0..* Study (studyPK, type, result, FKs geosPK and samplePK) 0..* to 1..1 Sample (samplePK, number, type); association name: ‘studies’]
• The Study table has many records of one geoscientist and many records of one sample!
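The junction-table design can be sketched as follows, assuming Python's sqlite3 module. The table and key names follow the diagram; the column subset and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE Geoscientist (geosPK INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Sample (samplePK INTEGER PRIMARY KEY, number TEXT);
    -- Junction table: one row per (geoscientist, sample) pairing.
    CREATE TABLE Study (
        studyPK INTEGER PRIMARY KEY,
        type TEXT,
        result TEXT,
        geosPK INTEGER NOT NULL REFERENCES Geoscientist (geosPK),
        samplePK INTEGER NOT NULL REFERENCES Sample (samplePK)
    );
    INSERT INTO Geoscientist VALUES (1, 'G1'), (2, 'G2');
    INSERT INTO Sample VALUES (10, 'S-10'), (11, 'S-11');
    INSERT INTO Study (geosPK, samplePK) VALUES (1, 10), (1, 11), (2, 10);
""")

# One geoscientist, many samples (first 1:M association).
samples_of_g1 = [r[0] for r in conn.execute("""
    SELECT Sample.number FROM Study
    JOIN Sample ON Sample.samplePK = Study.samplePK
    WHERE Study.geosPK = 1 ORDER BY Sample.samplePK""")]

# One sample, many geoscientists (second 1:M association).
geoscientists_of_s10 = [r[0] for r in conn.execute("""
    SELECT Geoscientist.name FROM Study
    JOIN Geoscientist ON Geoscientist.geosPK = Study.geosPK
    WHERE Study.samplePK = 10 ORDER BY Geoscientist.geosPK""")]

print(samples_of_g1, geoscientists_of_s10)
```

Reading the Study table from either side recovers the many-to-many association as two one-to-many lookups.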
M:M …
[Diagram: Student (PK studentID; name) 1..1 to 0..* Section (SectionID; FKs studentID and courseID) 0..* to 1..1 Course (PK courseID); association name: ‘takes’]
Generalization/Specialization
• In many cases the attributes of a class (e.g., Fault) may be inherited by several subclasses (NormalFault, ReverseFault, StrikeSlipFault)
• While the superclass generalizes the subclasses by providing the common attributes, the subclasses are said to specialize the superclass by adding unique attributes and behavior
• The hollow arrow represents the ‘is-a’ relationship
[Diagram: Fault (1..1) with hollow arrows from the sibling subclasses NormalFault (0..1) and ReverseFault (0..1)]
• Each NormalFault or ReverseFault is-a Fault (one-to-one taxonomic relationship)
Generalization/Specialization …
• In this case the relationships are one-to-one
• The primary key of the superclass becomes the foreign key in the subclass, where it also serves as the primary key of the subclass (subclasses inherit the PK as an FK)
[Diagram: Fault table (PK fault_id; location, length, orientation, name, age, netslip) 1..1 to 0..1 NormalFault table (FK fault_id, uniqueAttribute1) and 0..1 ReverseFault table (FK fault_id, uniqueAttribute)]
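One way to realize this mapping is sketched below, assuming Python's sqlite3 module. The subclass table reuses fault_id as both its primary key and its foreign key, which enforces the 1..1 to 0..1 multiplicity; the column subset and the row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # in-memory database (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.execute("CREATE TABLE Fault (fault_id INTEGER PRIMARY KEY, name TEXT, length REAL)")
conn.execute("""
    CREATE TABLE NormalFault (
        fault_id INTEGER PRIMARY KEY REFERENCES Fault (fault_id),  -- PK and FK at once
        uniqueAttribute1 TEXT
    )
""")
conn.execute("INSERT INTO Fault VALUES (1, 'Fault A', 12.5)")
conn.execute("INSERT INTO NormalFault VALUES (1, 'value 1')")

def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A fault can have at most one NormalFault row (primary key uniqueness).
second_subclass_row_rejected = is_rejected("INSERT INTO NormalFault VALUES (1, 'value 2')")
# A NormalFault cannot exist without its Fault (foreign key).
orphan_subclass_rejected = is_rejected("INSERT INTO NormalFault VALUES (2, 'value 3')")

# The full record of a normal fault is the join of superclass and subclass rows.
full_record = conn.execute("""
    SELECT Fault.fault_id, Fault.name, NormalFault.uniqueAttribute1
    FROM NormalFault JOIN Fault ON Fault.fault_id = NormalFault.fault_id
""").fetchall()
print(second_subclass_row_rejected, orphan_subclass_rejected, full_record)
```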
Aggregation
• A system is an aggregate of several intercommunicating components (parts)
– In aggregation, the components can exist on their own without being part of the system, e.g., a shear zone may be aggregated from a set of rocks, foliations, lineations, mylonites, cataclasites, veins, etc.
– These can exist on their own even after the fault dies
– Tires, wheels, batteries, and the engine of a car are other examples (they can be sold or recycled after the car is totaled)
– Human head, hand, and other parts are NOT examples of aggregation!
Aggregation …
• Aggregation is represented with an open diamond at the end of the association line next to the parent (aggregated) class
• Multiplicity is 0..1 at the parent side (implied; may not be shown), and 1..* at the component side if a component is required, otherwise 0..* (needs to be explicitly shown)
• (1 means the wheel is necessary; there may be no car, but if there is, it should have one or more wheels)
– the aggregated parent hasPart one or more (1..*) components (wheels)
– each component (wheel) is partOf zero or one (0..1) system
– i.e., the component (wheel) may not be part of the system (car)
– It can stand on its own!
[Diagram: Car 0..1 hasPart 1..* Wheel]
Aggregation …
• Components may or may not (0..*) be part of an (0..1) aggregate
[Diagram: Car (PK CarID; CarModel, CarType) 0..1 to 0..* each of Wheel, Engine, Wiper, and Seat (each with FK CarID, type, …)]
• If an engine is necessary for a car, then the multiplicity should be 1..* for Engine
• If a wiper is not necessary for a car, then the multiplicity should be 0..* for Wiper
Aggregation …
• Components may or may not (0..*) be part of an (0..1) aggregate
[Diagram: ShearZone (PK shearZoneName; thickness, location) 0..1 to 0..* each of Mylonite, Foliation, Lineation, and Cataclasite (each with FK shearZoneName, type, …)]
Composition
• In composition, in contrast to aggregation, the parts (i.e., components) cannot exist on their own without a parent
– e.g., our head cannot exist without our body!
– Meanders cannot exist without a river
– The parts are created with the parent (e.g., body and brain; fold and limb and fold axis; river and meander), and disappear when the parent disappears
• Composition is symbolized with a filled diamond on the parent side
– Multiplicity on the diamond side is 1..1
Composition
[Diagram: Fold (PK fold_id; type, class, location) 1..1 (implied) to 0..* each of FoldLimb, FoldAxis, FoldAxialPlane, and AxialPlaneFoliation (each with FK fold_id, orientation, …)]
• Note: None of the components makes sense without the fold!
• They may not exist, but if they do, they must be with a composed parent
Recursive Association
• Connects a class to itself, when objects of the same class have different roles, e.g.,
– A mineral may alter into another mineral
– A drainage (tributary) may merge with another drainage (tributary)
– A mineral may recrystallize into another mineral
• Instead of making two Mineral tables (which is bad design, below), we make one Mineral table that refers back to itself (next slide)
[Diagram, bad design (duplicates information): Mineral (mineral_id, composition, …) 0..* ‘alters to’ 1..* Mineral (mineral_id, composition, …)]
Recursive association
[Diagram, not correct (duplicates information): Mineral (mineral_id, composition, …) 0..* ‘altersTo’ 1..* Mineral (mineral_id, composition, …)]
[Diagram, correct: a single Mineral class (PK mineral_id; FK altered_mineral_id; composition, …) associated with itself; ‘altersInto’ 0..*, ‘isAlteredFrom’ 1..*]
• Each mineral may alter into zero or more minerals (may not alter)
• Each altered mineral (if it exists) is altered from one or more minerals
• This is an M:M relationship!
Create an Alteration Class
• Instead of the previous bad solution for an M:M relationship, we can create an alteration class so that we can keep track of the alteration process:
[Diagram: Mineral (PK mineral_id; composition, …) 1..* to 0..* Alteration (PK, FK mineral_id; altered_mineral_id; composition, …); association names: ‘goesThrough’, ‘involves’]
• Each mineral goes through zero or more alterations
• Each alteration involves one or more minerals
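A possible table layout for the Alteration class is sketched below, assuming Python's sqlite3 module. Treating both mineral_id and altered_mineral_id as foreign keys into Mineral, and using the pair as a composite primary key, is an assumption made here for illustration; the compositions are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE Mineral (mineral_id INTEGER PRIMARY KEY, composition TEXT);
    -- Each row records that one mineral alters into another mineral.
    CREATE TABLE Alteration (
        mineral_id INTEGER NOT NULL REFERENCES Mineral (mineral_id),
        altered_mineral_id INTEGER NOT NULL REFERENCES Mineral (mineral_id),
        PRIMARY KEY (mineral_id, altered_mineral_id)
    );
    INSERT INTO Mineral VALUES (1, 'composition A'), (2, 'composition B'), (3, 'composition C');
    INSERT INTO Alteration VALUES (1, 2), (1, 3), (2, 3);
""")

# Which minerals does mineral 1 alter into?
products_of_1 = [r[0] for r in conn.execute("""
    SELECT product.composition FROM Alteration
    JOIN Mineral AS product ON product.mineral_id = Alteration.altered_mineral_id
    WHERE Alteration.mineral_id = 1 ORDER BY product.mineral_id""")]

# Which minerals is mineral 3 altered from?
sources_of_3 = [r[0] for r in conn.execute("""
    SELECT source.composition FROM Alteration
    JOIN Mineral AS source ON source.mineral_id = Alteration.mineral_id
    WHERE Alteration.altered_mineral_id = 3 ORDER BY source.mineral_id""")]

print(products_of_1, sources_of_3)
```

Mineral data is stored once, and the Alteration rows carry the many-to-many association in both directions.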