
Database Design

Database

• A database is a collection of information that is organized so that it can easily be accessed, managed, and updated

• Databases typically contain aggregations of data records or files, such as chemical analyses, experiments, investigators, and samples

• Typically, a database manager provides users with the capabilities of controlling read/write access, specifying report generation, and analyzing usage

Database structure

• Databases are organized by columns, records, and tables

• A column is a single piece of information; a record is one complete set of columns (shown in a row); and a table is a collection of records

• For example, a telephone book is analogous to a table. It contains a list of records, each of which consists of three fields: name, address, and telephone number

• To access information from a database, you need a Database Management System (DBMS), e.g., MS Access

• This is a collection of programs that enables you to enter, organize, and select data in a database

Relational Database

• A relational database is a collection of data items organized as a set of tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables

• The relational model specifies that the rows of a table have no specific order and that the rows, in turn, impose no order on the attributes

• Applications access data by specifying queries, which use operations such as select to identify rows, project to identify attributes, and join to combine relations (tables)

• Relations (Tables) can be modified using the insert, delete, and update operators.
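To make select, project, join, insert, update, and delete concrete, here is a minimal sketch that runs each operation against an in-memory database. It assumes SQLite through Python's built-in sqlite3 module (chosen only because it needs no server; MySQL or Access syntax can differ slightly), and the Investigator and Sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; tables and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Investigator (id INTEGER PRIMARY KEY, name TEXT, affiliation TEXT);
CREATE TABLE Sample (sample_id INTEGER PRIMARY KEY, investigator_id INTEGER, lithology TEXT);
INSERT INTO Investigator VALUES (1, 'Lee', 'USGS'), (2, 'Diaz', 'GSU');
INSERT INTO Sample VALUES (10, 1, 'basalt'), (11, 1, 'granite'), (12, 2, 'basalt');
""")

# select (keep matching rows) + project (keep only the sample_id attribute).
# ORDER BY is needed because rows in a relation have no inherent order.
basalts = conn.execute(
    "SELECT sample_id FROM Sample WHERE lithology = 'basalt' ORDER BY sample_id"
).fetchall()

# join: combine two relations on a matching attribute
joined = conn.execute("""
    SELECT s.sample_id, i.name
    FROM Sample AS s JOIN Investigator AS i ON s.investigator_id = i.id
    ORDER BY s.sample_id
""").fetchall()

# insert, update, and delete modify a relation
conn.execute("INSERT INTO Sample VALUES (13, 2, 'shale')")
conn.execute("UPDATE Sample SET lithology = 'mudstone' WHERE sample_id = 13")
conn.execute("DELETE FROM Sample WHERE sample_id = 11")
remaining = conn.execute("SELECT COUNT(*) FROM Sample").fetchone()[0]

print(basalts)    # [(10,), (12,)]
print(joined)     # [(10, 'Lee'), (11, 'Lee'), (12, 'Diaz')]
print(remaining)  # 3
```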

• A relational database is a set of tables containing data organized into predefined categories

• Each table (which is sometimes called a relation) contains one or more data categories in columns (fields)

• Each row contains a unique instance of data for the categories defined by the columns

• For example, a typical Investigator database would include a table (Investigator) that describes an investigator with columns (fields) for name, address, phone number, affiliation, degree, etc.

• Another table (e.g., Sample) would describe fields such as sample_id, number, type, lithology, date taken, etc.

Relational Model

• The relational model was developed by E. F. Codd in 1970 and is the basis for all relational databases (RDBs)

• It uses set theory for database modeling and is based on relational algebra (RA), which is used to manipulate relations (tables) in an RDB

• It captures information that the database requires but that may not appear in the UML diagram (e.g., primary and foreign keys)

• SQL is a declarative language used to manipulate RDBs; its query operations are based on relational algebra

Table (Relation)

• A table is defined as a set of rows that have the same attributes (fields under columns)

– That is, a table is organized into rows and columns

• A row usually represents one object and information (record) about that object

• Objects are typically physical objects or concepts

View of the Database

• A user of the database can obtain a view of the database that fits the user's needs

• For example, the head of a Geological Survey might want a view or report on:
– all geological quadrangle maps that were completed by a certain date

• A geologist in the same survey could, from the same tables, obtain a report on:
– collected samples that need chemical analysis

• In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend

• After the original database creation, a new data category (e.g., table) can be added without requiring that all existing applications be modified
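A per-user view like the geologist's report can be sketched with CREATE VIEW. This minimal example assumes SQLite via Python's sqlite3 module; the needs_analysis flag column and the sample rows are hypothetical and not part of the Sample table described earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Sample table with a flag marking samples that await chemical analysis.
conn.executescript("""
CREATE TABLE Sample (
    sample_id INTEGER PRIMARY KEY,
    lithology TEXT,
    needs_analysis INTEGER NOT NULL DEFAULT 0  -- 1 = awaiting chemical analysis
);
INSERT INTO Sample VALUES (1, 'basalt', 1), (2, 'granite', 0), (3, 'shale', 1);

-- A view is a stored query: it holds no data of its own.
CREATE VIEW SamplesNeedingAnalysis AS
    SELECT sample_id, lithology FROM Sample WHERE needs_analysis = 1;
""")

pending = conn.execute(
    "SELECT sample_id FROM SamplesNeedingAnalysis ORDER BY sample_id"
).fetchall()
print(pending)  # [(1,), (3,)]

# New rows in the base table appear in the view automatically.
conn.execute("INSERT INTO Sample VALUES (4, 'gneiss', 1)")
pending_after = conn.execute("SELECT COUNT(*) FROM SamplesNeedingAnalysis").fetchone()[0]
print(pending_after)  # 3
```

Because the view is only a saved query over the same tables, different users can get different views without any data being copied.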

Create a model of the real world

• Design of a database involves modeling the domain (here, the universe of discourse)
– Model: a simplified abstraction of the real world

• The database should contain information needed by the people who use it

• UML (Unified Modeling Language) is perhaps the best modeling tool, along with ERD (entity relationship diagram)
– We will model our database in MySQL Workbench or Microsoft Access

UML Class Diagram

• UML class diagrams are used to define classes (equivalent to entities in ER diagrams)
– Each class represents a type of physical or conceptual thing, or an event
• For example, Ocean, Aquifer, Mineral, Folding

• First step: Define the class in English, e.g.,
– Aquifer: “A rock unit that can store and transmit economically significant amounts of water”

• Each class (named in singular form) has several attributes, which define the relevant descriptive (natural) properties or characteristics (i.e., those other than IDs and keys) of each member of the class

Diagram: Aquifer class with attributes name, type, porosity, thickness, transmissivity, hydraulicCond

Relation (Table) Schema

• The UML class is translated into a relation schema identified by the class name (Aquifer) with all the attributes of the class, e.g.:

• Aquifer schema: Aquifer (name, type, porosity, permeability)

• Assignment rule: associates each attribute with a set of valid values (i.e., defines the domain)

Data Integrity

• When creating a relational database, we define the domain of possible values in a data column (i.e., for each attribute) and further constraints that may apply to that data value

• This is done for data integrity so that invalid values will not be entered into the database

– e.g., data types such as DATETIME, string types (CHAR, VARCHAR), INT

• Domain: the set of valid values that can be assigned to an attribute
– This can be done by assigning character length, data type, and format (e.g., of email and URL addresses, zip codes, state names, etc.)
– Many of these should be handled with the user interface (in forms that collect user data) and with enumerated lists

Constraints

• Constraints allow further restriction of the domain of an attribute
– You can place constraints to limit the type of data that is stored in a table

– For example, a constraint can restrict a given integer attribute to values between 1 and 5
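The 1-to-5 restriction can be expressed as a CHECK constraint. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the Outcrop table and its quality column are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: 'quality' is an integer restricted to the domain 1..5.
conn.execute("""
    CREATE TABLE Outcrop (
        outcrop_id INTEGER PRIMARY KEY,
        quality INTEGER CHECK (quality BETWEEN 1 AND 5)
    )
""")

conn.execute("INSERT INTO Outcrop VALUES (1, 4)")  # inside the domain: accepted

def try_insert(outcrop_id, quality):
    """Return True if the row was stored, False if the CHECK constraint rejected it."""
    try:
        conn.execute("INSERT INTO Outcrop VALUES (?, ?)", (outcrop_id, quality))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(2, 5))  # True  (BETWEEN includes both bounds)
print(try_insert(3, 7))  # False (outside 1..5)
print(try_insert(4, 0))  # False
```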

Unique Constraint

• A UNIQUE constraint ensures that all values in a column are distinct (i.e., not duplicated)

CREATE TABLE Customer (customer_id INTEGER UNIQUE, last_name VARCHAR(100), first_name VARCHAR(100));

In this example, the customer_id column has a unique constraint and cannot include duplicate values

customer_id | last_name | first_name
1 | Johnson | Gloria
2 | Smith | Gina
3 | Spalding | David

Unique Constraint …

• Because the table above already has a customer_id with the value 3, executing the following SQL statement results in an error:

INSERT INTO Customer VALUES (3, 'Jones', 'Kaila');

• Trying to insert another row with that value violates the UNIQUE constraint
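The UNIQUE violation can be reproduced with a short sketch. It assumes SQLite via Python's sqlite3 module, where a constraint violation surfaces as sqlite3.IntegrityError; the rows are the ones from the Customer table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        customer_id INTEGER UNIQUE,
        last_name VARCHAR(100),
        first_name VARCHAR(100)
    )
""")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Johnson", "Gloria"), (2, "Smith", "Gina"), (3, "Spalding", "David")],
)

try:
    conn.execute("INSERT INTO Customer VALUES (3, 'Jones', 'Kaila')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)  # SQLite's message names the violated column

customer_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(customer_count)  # 3
```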

Default Constraint

• A DEFAULT constraint provides a default value for a column when the INSERT INTO statement does not provide a specific value

CREATE TABLE Student (student_id INTEGER UNIQUE NOT NULL, last_name VARCHAR(100), first_name VARCHAR(100), score INTEGER DEFAULT 75);

Default Value …

INSERT INTO Student (student_id, last_name, first_name) VALUES (1, 'James', 'Alex');

• The populated Student table will look like the following after execution:

• Even though we didn't specify a value for the score column in the INSERT INTO statement, it is assigned the default value of 75, because 75 was set as the default value for this column

student_id last_name first_name score

1 James Alex 75
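A minimal sketch of the same behavior, assuming SQLite via Python's sqlite3 module; the second student row is hypothetical and shows that an explicit value overrides the default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        student_id INTEGER UNIQUE NOT NULL,
        last_name VARCHAR(100),
        first_name VARCHAR(100),
        score INTEGER DEFAULT 75
    )
""")

# 'score' is omitted from the column list, so the DEFAULT value is used.
conn.execute("INSERT INTO Student (student_id, last_name, first_name) VALUES (1, 'James', 'Alex')")
# Hypothetical second row: an explicit value overrides the default.
conn.execute("INSERT INTO Student VALUES (2, 'Ortiz', 'Ana', 90)")

rows = conn.execute("SELECT student_id, score FROM Student ORDER BY student_id").fetchall()
print(rows)  # [(1, 75), (2, 90)]
```

Note that passing NULL explicitly for score stores NULL; the default applies only when the column is omitted from the INSERT.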

Check Constraint

• A CHECK constraint ensures that all values in a column satisfy certain conditions. Once defined, the database will only insert a new row or update an existing row if the new value satisfies the CHECK constraint

• The CHECK constraint is used to ensure data quality

• For example, consider the following CREATE TABLE statement:

CREATE TABLE Customer (customer_id INTEGER CHECK (customer_id > 0), last_name VARCHAR(30), first_name VARCHAR(30));

• Column customer_id has a constraint: its value must be an integer greater than 0. So, attempting to execute the following statement results in an error, because the value for customer_id must be greater than 0 (note: MySQL versions before 8.0.16 accept CHECK syntax but do not enforce it):

INSERT INTO Customer VALUES (-5, 'Smith', 'Lynn'); -- leads to an error
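Since a CHECK constraint applies to updates as well as inserts, a short sketch can show both being rejected. It assumes SQLite via Python's sqlite3 module, which enforces CHECK constraints; the valid row with customer_id 5 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        customer_id INTEGER CHECK (customer_id > 0),
        last_name VARCHAR(30),
        first_name VARCHAR(30)
    )
""")
conn.execute("INSERT INTO Customer VALUES (5, 'Smith', 'Lynn')")  # satisfies the CHECK

errors = []
for sql in (
    "INSERT INTO Customer VALUES (-5, 'Smith', 'Lynn')",          # new row violates CHECK
    "UPDATE Customer SET customer_id = 0 WHERE customer_id = 5",  # update violates CHECK
):
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as err:
        errors.append(str(err))

print(len(errors))  # 2
print(conn.execute("SELECT customer_id FROM Customer").fetchall())  # [(5,)]
```

The failed UPDATE is undone as a whole, so the stored row keeps its original valid value.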

Set notation

• A schema can be represented in set notation:

Aquifer = {name, type, porosity, permeability, …}

– Order of the attributes does not matter
– Attributes cannot be duplicated
– Rules can define domains for each element of the set

• Can also limit the domain of an attribute to a specific range of values

– We can define a subset of the set (e.g., views)
– All can be manipulated with set operators (e.g., union, to combine two relations or tables)
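The set operators can be tried on two relations that share a schema. This minimal sketch assumes SQLite via Python's sqlite3 module (which supports UNION, INTERSECT, and EXCEPT); the SurveyA and SurveyB tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical relations with the same schema (name, type).
conn.executescript("""
CREATE TABLE SurveyA (name TEXT, type TEXT);
CREATE TABLE SurveyB (name TEXT, type TEXT);
INSERT INTO SurveyA VALUES ('Floridan', 'confined'), ('Ogallala', 'unconfined');
INSERT INTO SurveyB VALUES ('Ogallala', 'unconfined'), ('Edwards', 'confined');
""")

def q(sql):
    return conn.execute(sql).fetchall()

union = q("SELECT name FROM SurveyA UNION SELECT name FROM SurveyB ORDER BY name")
intersect = q("SELECT name FROM SurveyA INTERSECT SELECT name FROM SurveyB")
difference = q("SELECT name FROM SurveyA EXCEPT SELECT name FROM SurveyB")

print(union)       # [('Edwards',), ('Floridan',), ('Ogallala',)]
print(intersect)   # [('Ogallala',)]
print(difference)  # [('Floridan',)]
```

UNION removes duplicate rows, matching set semantics; UNION ALL would keep them.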

Relation = Table

• The structure of each relation schema defines a table

• Use SQL syntax to create a table:

CREATE TABLE Aquifer (
  id INT NOT NULL,
  name VARCHAR(20) NOT NULL,
  type VARCHAR(30) NOT NULL,
  porosity VARCHAR(10) NULL,
  permeability VARCHAR(20) NULL,
  PRIMARY KEY (id)
);

– VARCHAR: variable-length character string
– NOT NULL: a value must be provided (not optional)

CREATE TABLE Sample (
  sample_id INT NOT NULL,
  investigator_id VARCHAR(20) NOT NULL,
  sample_num VARCHAR(20) NOT NULL,
  lithology VARCHAR(30) NULL,
  type VARCHAR(20) NULL,
  latitude VARCHAR(30) NULL,
  longitude VARCHAR(30) NULL,
  PRIMARY KEY (sample_id, investigator_id)
);

CREATE TABLE Investigator (
  id INT NOT NULL,
  First_name VARCHAR(20) NULL,
  Last_name VARCHAR(20) NULL,
  gender_code VARCHAR(1) NULL,
  degree VARCHAR(5) NULL,
  affiliation VARCHAR(50) NULL,
  PRIMARY KEY (id)
);

Rows represent individual objects

• While each UML class becomes a table, the values for each individual object (i.e., each real-world member of the class) become a row in the table
– Each row is a record
– Each row (tuple) is a function that assigns a constant value to each attribute of the schema, e.g., to name, type, porosity, …

INSERT INTO Aquifer (name, type, porosity, location) VALUES ('Floridan', 'confined', NULL, 'Florida');

name | type | porosity | location
Floridan | confined | | Florida

• The column list (name, type, porosity, location) can be omitted if values are entered for every column, in the table's column order

Rows can be updated

After a table is populated, the values in each row can be changed using the UPDATE statement:

UPDATE Aquifer SET name = 'High Plains' WHERE name = 'Ogallala';
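Inserting and then updating rows can be sketched end to end. This assumes SQLite via Python's sqlite3 module; the simplified Aquifer columns mirror the INSERT example above, and the Ogallala row is hypothetical data added so that the UPDATE has something to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Aquifer (
        name VARCHAR(20) NOT NULL,
        type VARCHAR(30) NOT NULL,
        porosity VARCHAR(10),
        location VARCHAR(30)
    )
""")

# Named column list; porosity is unknown, so NULL (Python None) is stored.
conn.execute(
    "INSERT INTO Aquifer (name, type, porosity, location) VALUES (?, ?, ?, ?)",
    ("Floridan", "confined", None, "Florida"),
)
# No column list: a value must be given for every column, in table order.
conn.execute("INSERT INTO Aquifer VALUES ('Ogallala', 'unconfined', '10%', 'South Dakota')")

# UPDATE changes the rows that match the WHERE clause; rowcount reports how many.
changed = conn.execute("UPDATE Aquifer SET name = 'High Plains' WHERE name = 'Ogallala'").rowcount
names = [r[0] for r in conn.execute("SELECT name FROM Aquifer ORDER BY name")]
print(changed, names)  # 1 ['Floridan', 'High Plains']
```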

Table is a set of rows

• A table is a set of rows with common attributes
• Each row is unique
• Rows are unordered
• We can get a subset of the rows with SQL
• We can use the union, intersection, or difference of the rows in two or more tables if they have the same schema

Example: the rows are tuples; the columns are fields (attributes)

name | type | porosity | location
Floridan | confined | | Florida
High Plains | unconfined | 10% | South Dakota

Primary Key (PK)

• A primary key is a set of attributes that guarantees the uniqueness of each row in a table

• A primary key can consist of one or more fields in a table. When multiple fields are used as a primary key, they are called a composite key
– (used, e.g., in junction tables, which break an M:M relationship into two 1:M relationships)

• Each primary key value must be unique within the table

Table Customer

CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, last_name VARCHAR(100), first_name VARCHAR(100));

Diagram: Customer table with columns customer_id (PK), last_name, first_name
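A composite key such as the one on the Sample table, PRIMARY KEY (sample_id, investigator_id), makes only the combination of values unique. This minimal sketch assumes SQLite via Python's sqlite3 module, trims Sample to three columns, and uses hypothetical investigator codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed version of the Sample table with its composite primary key.
conn.execute("""
    CREATE TABLE Sample (
        sample_id INT NOT NULL,
        investigator_id VARCHAR(20) NOT NULL,
        lithology VARCHAR(30),
        PRIMARY KEY (sample_id, investigator_id)
    )
""")

def add(sample_id, investigator_id, lithology):
    """Return True if stored, False if the composite key already exists."""
    try:
        conn.execute("INSERT INTO Sample VALUES (?, ?, ?)", (sample_id, investigator_id, lithology))
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical investigator codes.
print(add(1, "INV-A", "basalt"))   # True
print(add(1, "INV-B", "basalt"))   # True: same sample_id, different investigator
print(add(1, "INV-A", "granite"))  # False: the (sample_id, investigator_id) pair repeats
```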

Foreign Key (FK)

• A foreign key is a field (or fields) that points to (i.e., references) the primary key of another table

• The purpose of the foreign key is to ensure referential integrity of the data

• That is, only values that exist in the referenced table are permitted

CREATE TABLE Analysis (analysis_id INTEGER PRIMARY KEY, analysisDate DATETIME, analyzer_id INTEGER REFERENCES Analyzer (analyzer_id));

• The analyzer_id in the Analysis table references the analyzer_id attribute of the Analyzer table

• Each analyzer may perform many analyses; each analysis is done by only one analyzer

Diagram: Analyzer (analyzer_id PK, …) one-to-many Analysis (analysis_id, analyzer_id, analysisDate)
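Referential integrity can be checked with the Analysis/Analyzer pair. This sketch assumes SQLite via Python's sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection. The analyzer name and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE Analyzer (analyzer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Analysis (
    analysis_id INTEGER PRIMARY KEY,
    analysisDate DATETIME,
    analyzer_id INTEGER REFERENCES Analyzer (analyzer_id)
);
INSERT INTO Analyzer VALUES (1, 'XRF lab');  -- hypothetical analyzer
""")

conn.execute("INSERT INTO Analysis VALUES (100, '2020-05-01', 1)")  # parent exists: accepted
try:
    conn.execute("INSERT INTO Analysis VALUES (101, '2020-05-02', 99)")  # no analyzer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```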

Index

• An index provides the capability for quick access to data
– That is, it speeds up the retrieval of data

• Creating the proper index can drastically increase the performance of an application

• A database index is used in much the same way a reader uses a book index

• When a database has no index to use for searching, the result is similar to a reader who looks at every page in a book to find a word (or searches an entire library to find a book):
– The database engine needs to visit every row in a table
– In database terminology this is called a table scan

Index …

• An index can be created on any combination of attributes in a table

CREATE TABLE Subscriber (subscriber_id INT PRIMARY KEY, email_address VARCHAR(255), first_name VARCHAR(255), last_name VARCHAR(255));

• If you want to quickly find an email address, create an index on the email_address field:

CREATE INDEX Subscriber_email ON Subscriber (email_address);
– Note: Subscriber_email is the name of the index

• With this index in place, the following SQL statement can find an email address quickly:

SELECT first_name, last_name FROM Subscriber WHERE email_address = 'email@domain.com';

Diagram: Subscriber table with subscriber_id (PK) and email_address (index), plus first_name and last_name
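The difference between a table scan and an index lookup can be observed in the query plan. This sketch assumes SQLite via Python's sqlite3 module; EXPLAIN QUERY PLAN is SQLite-specific (MySQL uses EXPLAIN with a different output format), and the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Subscriber (
        subscriber_id INT PRIMARY KEY,
        email_address VARCHAR(255),
        first_name VARCHAR(255),
        last_name VARCHAR(255)
    )
""")
query = "SELECT first_name, last_name FROM Subscriber WHERE email_address = 'email@domain.com'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # expected to report a SCAN of Subscriber (a table scan)
conn.execute("CREATE INDEX Subscriber_email ON Subscriber (email_address)")
after = plan(query)   # expected to report a SEARCH using index Subscriber_email

print(before)
print(after)
```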

Creating an Index

CREATE TABLE Employee (
  employee_id INT PRIMARY KEY,
  emp_name VARCHAR(200)
);

• When you first create a new table, no index is created on ordinary columns such as emp_name (most DBMSs, including MySQL, do automatically index the primary key)

• Use the following statement to create an index for this table:

CREATE INDEX employee_name ON Employee (emp_name);

– employee_name is an arbitrary name given to the index for easy reference (it may differ from the attribute name in the table)

Diagram: Employee table with employee_id (PK) and emp_name (index)

UML Associations

• Associations indicate the way tables are functionally related to each other (in ER diagrams they are called relationships)

• Example: the relationship between the Well class and the Aquifer class
– The association will help us find out which wells penetrate a given aquifer
– We start by describing the association in natural language, focusing on how many individuals of one class participate in, or are connected to, a single individual of the other class, and vice versa (i.e., in both directions):
• Each aquifer is drilled by zero or more wells (0..*)
• Each well drills into only one aquifer

• These statements help to identify the multiplicity (or cardinality in ER) of the association

Diagram: Aquifer (aquifer_id, aquifer_name, aquifer_type) 1..1 to 0..* Well (well_id, location, depth, aquifer_id), with association names ‘is drilled by’ (Aquifer to Well) and ‘drills into’ (Well to Aquifer); the numbers at each end give the multiplicity or cardinality

• Each well drills into one and only one (1..1) aquifer

• Each aquifer is drilled by zero or more wells (0..*)

• NOTE: The association name is the verb that describes the action (e.g., ‘drills into’ or ‘is drilled by’)

• The maximum multiplicity (1 on the Aquifer side, * on the Well side) indicates that this is a one-to-many association

Diagram (full attributes): Aquifer (aquifer_id, name, type, porosity, thickness, transmissivity, hydraulicCond) and Well (well_id, number, location, type, isCased, depth, aquifer_id)

The Foreign Key represents the UML association

• For a 1..* association, we store the PK attribute of the ‘one’ side (Aquifer) in the table of the ‘many’ side (Well)

• The copied PK becomes an FK (foreign key)

• The ‘one’ side of the association is the ‘parent’ and the ‘many’ side is the ‘child’ of the association
– Note: we cannot have a child (well) without the parent (aquifer)

• The parent provides the PK; the child receives it as an FK
• In other words: ‘one parent PK; many children FK!’

• The FK in the ‘many’ side may be used as part of the PK, for example, if we want to uniquely associate a well with an aquifer, or the table may have its own PK in addition to the FK

Aquifer-Well Association

Diagram: Aquifer (parent, 1..1; Primary Key (aquifer_id); attributes name, location, type, porosity, thickness, transmissivity, hydraulicCond) associated with Well (child; Primary Key (well_id), Foreign Key (aquifer_id); attributes number, location, type, isCased, depth)
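The parent PK / child FK layout answers the question posed earlier: which wells penetrate a given aquifer? This minimal sketch assumes SQLite via Python's sqlite3 module; the columns are trimmed, the well numbers and towns come from the Well table in this section, and the integer IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Aquifer (
    aquifer_id INTEGER PRIMARY KEY,          -- parent PK
    name VARCHAR(30) NOT NULL,
    type VARCHAR(20)
);
CREATE TABLE Well (
    well_id INTEGER PRIMARY KEY,             -- child's own PK
    number VARCHAR(10),
    location VARCHAR(30),
    aquifer_id INTEGER NOT NULL              -- FK: copy of the parent PK (1..1 side)
        REFERENCES Aquifer (aquifer_id)
);
INSERT INTO Aquifer VALUES (1, 'Floridan', 'confined'), (2, 'High Plains', 'unconfined');
INSERT INTO Well VALUES
    (10, 'F0001', 'Gainesville', 1),
    (11, 'H0002', 'Amarillo', 2),
    (12, 'F0012', 'Daytona Beach', 1);
""")

# Which wells penetrate a given aquifer? Follow the FK from child to parent.
floridan_wells = conn.execute("""
    SELECT w.number FROM Well AS w
    JOIN Aquifer AS a ON w.aquifer_id = a.aquifer_id
    WHERE a.name = 'Floridan' ORDER BY w.number
""").fetchall()

# Each aquifer is drilled by zero or more wells: LEFT JOIN keeps aquifers with none.
per_aquifer = conn.execute("""
    SELECT a.name, COUNT(w.well_id) FROM Aquifer AS a
    LEFT JOIN Well AS w ON w.aquifer_id = a.aquifer_id
    GROUP BY a.aquifer_id ORDER BY a.name
""").fetchall()

print(floridan_wells)  # [('F0001',), ('F0012',)]
print(per_aquifer)     # [('Floridan', 2), ('High Plains', 1)]
```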

Referential Integrity

• The FK added to the child table must have the same type and size as the PK in the parent table

ALTER TABLE Well
  ADD CONSTRAINT well_id PRIMARY KEY (number, location); -- primary key is a composite key

-- Define the FK and specify where it is found as a PK (Aquifer must have (name, location) as its PK or as a UNIQUE key)
ALTER TABLE Well
  ADD CONSTRAINT aquifer_id FOREIGN KEY (aquifer_name, aquifer_location) -- arbitrary column names in Well
  REFERENCES Aquifer (name, location); -- real column names in Aquifer

• The FK constraint ensures that each well contains a valid aquifer name and location

• This maintains the referential integrity of the database

Updated Well Table

Here is the populated Well table, with values from the Aquifer table copied into the foreign key (aquifer_id, i.e., the aquifer_name and aquifer_location columns):

aquifer_name | aquifer_location | number | location | type | isCased | depth
Floridan | Florida | F0001 | Gainesville | confined | Yes | 20 m
High Plains | Texas | H0002 | Amarillo | unconfined | No | 12 m
Floridan | Florida | F0012 | Daytona Beach | confined | No | 40 m
Floridan | Florida | F0120 | Daytona Beach | confined | Yes | 100 m

(The first two columns hold Aquifer data; the remaining columns hold Well data.)

NOTES:
• Some aquifers (e.g., Floridan) are repeated in the Well table
• The Aquifer data are copied into the foreign key fields (i.e., the first two columns)

One-to-One Association

• There are rare cases where the relationship between two types of data is one-to-one (1..1)
– Each item has a relationship to only one item of the other type
• e.g., the state name (Georgia) and its abbreviation (GA)
• The mineral name and its formula (if it is constant)
• Fold_Limb and Fold (composition)
• Brain and Body (composition; will be covered later)
• In such a case, the two items are included as two columns in one table
– However, if in the future the relationship changes or the table becomes too big, it is better to break them into two tables

Diagram: Entity 1:1 Entity

One-to-Many Association

• One row in one table links to many rows in another, e.g.:
– One geoscientist having published many papers
– One person taking several samples
– One sample having multiple analyses

• The two entities in the one-to-many relationship (1..*) must be put into two tables

• Note: A many-to-many (M:M) relationship is broken into two 1..* relationships (see below)

Diagram: Entity 1:M Entity

Many-to-many Association

• The relationship between some tables may be many-to-many, when many rows in one table are linked to many rows in another table
– Each geologist may study many samples
– Each sample may be studied by many geologists

Diagram: Geologist (geologist_id, name, specialty, affiliation) 0..* studies / isStudiedBy 0..* Sample (sample_id, number, type); Entity M:M Entity

M:M …

• Since the two tables cannot be children of each other, we cannot show M:M associations directly
– There is no place to put the FK
– So, we create a junction (association, intermediary) table, where the foreign keys of the two tables go (in addition to other attributes, if any). This creates two 1:M associations
– Connect the junction table with a dotted line to the association line

Diagram: Geoscientist (geosPK, name, specialty, affiliation) 1..1 to 0..* Study (studyPK, type, result, geosPK, samplePK) 0..* to 1..1 Sample (samplePK, number, type); association name: studies

• The Study table can hold many records for one geoscientist and many records for one sample!
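The Study junction table can be sketched directly: it carries one FK per parent, so each parent is on the ‘one’ side of its own 1:M association. This assumes SQLite via Python's sqlite3 module; all names, sample numbers, and results are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Geoscientist (geosPK INTEGER PRIMARY KEY, name TEXT, specialty TEXT);
CREATE TABLE Sample (samplePK INTEGER PRIMARY KEY, number TEXT, type TEXT);
-- Junction table: one FK per parent, plus the study's own attributes.
CREATE TABLE Study (
    studyPK INTEGER PRIMARY KEY,
    type TEXT,
    result TEXT,
    geosPK INTEGER NOT NULL REFERENCES Geoscientist (geosPK),
    samplePK INTEGER NOT NULL REFERENCES Sample (samplePK)
);
-- Hypothetical data
INSERT INTO Geoscientist VALUES (1, 'Kim', 'petrology'), (2, 'Okafor', 'geochemistry');
INSERT INTO Sample VALUES (1, 'S-01', 'core'), (2, 'S-02', 'grab');
INSERT INTO Study (type, result, geosPK, samplePK) VALUES
    ('thin section', 'ok', 1, 1),
    ('XRF', 'ok', 2, 1),
    ('XRF', 'pending', 2, 2);
""")

# Samples studied by geoscientist 2 (many samples per geoscientist) ...
samples_of_2 = conn.execute(
    "SELECT samplePK FROM Study WHERE geosPK = 2 ORDER BY samplePK").fetchall()
# ... and geoscientists who studied sample 1 (many geoscientists per sample).
geos_of_s1 = conn.execute("""
    SELECT g.name FROM Study AS st JOIN Geoscientist AS g ON st.geosPK = g.geosPK
    WHERE st.samplePK = 1 ORDER BY g.name
""").fetchall()

print(samples_of_2)  # [(1,), (2,)]
print(geos_of_s1)    # [('Kim',), ('Okafor',)]
```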

M:M …

Diagram: Student (studentID, name) 1..1 to 0..* Section (SectionID, studentID, courseID) 0..* to 1..1 Course (courseID)

Generalization/Specialization

• In many cases the attributes of a class (e.g., Fault) may be inherited by several subclasses (NormalFault, ReverseFault, StrikeSlipFault)

• While the superclass generalizes the subclasses by providing the common attributes, the subclasses are said to specialize the superclass by adding additional unique attributes and behavior

• The hollow arrow represents the ‘is-a’ relationship

Diagram: Fault superclass with sibling subclasses NormalFault and ReverseFault, each with multiplicity 0..1

• Each NormalFault or ReverseFault is-a Fault (one-to-one taxonomic relationship)

Generalization/Specialization …

• In this case the relationships are one-to-one
• The primary key of the superclass will become the foreign key in the subclass, which will also be the primary key for the subclass (the subclasses inherit the PK as an FK)

Fault table: PK (fault_id); columns fault_id, location, length, orientation, name, age, netslip
ReverseFault table: FK (fault_id); uniqueAttribute
NormalFault table: FK (fault_id); uniqueAttribute1
(Multiplicity on each subclass side: 0..1)
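The superclass/subclass tables can be sketched with fault_id serving as both PK and FK in each subclass table. This assumes SQLite via Python's sqlite3 module; the attribute names follow the diagram's placeholders, and the fault names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Fault (
    fault_id INTEGER PRIMARY KEY,
    name TEXT, location TEXT, length REAL, orientation TEXT, age TEXT, netslip REAL
);
-- Each subclass reuses the superclass PK as both its PK and its FK (one-to-one).
CREATE TABLE NormalFault (
    fault_id INTEGER PRIMARY KEY REFERENCES Fault (fault_id),
    uniqueAttribute1 TEXT
);
CREATE TABLE ReverseFault (
    fault_id INTEGER PRIMARY KEY REFERENCES Fault (fault_id),
    uniqueAttribute TEXT
);
-- Hypothetical data
INSERT INTO Fault (fault_id, name) VALUES (1, 'F-North'), (2, 'F-South');
INSERT INTO NormalFault VALUES (1, 'listric');
INSERT INTO ReverseFault VALUES (2, 'thrust');
""")

# Common attributes come from Fault; specialized ones come from the subclass table.
normal = conn.execute("""
    SELECT f.name, n.uniqueAttribute1
    FROM Fault AS f JOIN NormalFault AS n ON n.fault_id = f.fault_id
""").fetchall()
print(normal)  # [('F-North', 'listric')]

try:  # a second NormalFault row for fault 1 would break the one-to-one rule
    conn.execute("INSERT INTO NormalFault VALUES (1, 'planar')")
    duplicate_ok = True
except sqlite3.IntegrityError:
    duplicate_ok = False
print(duplicate_ok)  # False
```

This layout does not by itself stop the same fault_id from appearing in both subclass tables; keeping subclasses disjoint would need extra logic, such as a type column or triggers.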

Aggregation

• A system is an aggregate of several intercommunicating components (parts)
– In aggregation, the components can exist on their own without being part of the system, e.g., a shear zone may be aggregated from a set of rocks, foliation, lineation, mylonite, cataclasite, veins, etc.
– These can exist on their own even after the fault dies
– Tires, wheels, batteries, and the engine of a car are other examples (they can be sold or recycled after the car is totaled)
– The human head, hands, and other body parts are NOT examples of aggregation!

Aggregation …

• Aggregation is represented with an open diamond at the end of the association line next to the parent (aggregate) class
• Multiplicity at the parent side is 0..1 (implied; may not be shown), and at the component side it is 1..* if the component is required (otherwise 0..*); the component-side multiplicity needs to be shown explicitly

• For example, for Car and Wheel, 1 means the wheel is necessary (there may be no car, but if there is one, it must have one or more wheels):
– the aggregate parent (car) hasPart one or more (1..*) components (wheels)
– each component (wheel) is partOf zero or one (0..1) system, i.e., the component (wheel) may not be part of the system (car)
– It can stand on its own!

Diagram: Car (open diamond, 0..1) hasPart Wheel (1..*)

Aggregation …

• Components may or may not (0..*) be part of an (0..1) aggregate

Diagram: Car (CarID, CarModel, CarType) aggregates several component classes (e.g., Engine), each with attributes CarID, type, …; the diagram shows 0..* on each component side

• If an engine is necessary for a car, then the multiplicity should be 1..* for Engine
• If a wiper is not necessary for a car, then the multiplicity should be 0..* for Wiper

Aggregation …

• Components may or may not (0..*) be part of an (0..1) aggregate

Diagram: ShearZone (shearZoneName, thickness, location) aggregates Mylonite (shearZoneName, type, …), Foliation, Lineation, and Cataclasite, each with multiplicity 0..* on the component side
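One common way to mirror aggregation in tables (a design choice, not the only one) is a nullable FK on the component, so a part can exist with no parent, plus ON DELETE SET NULL so deleting the aggregate detaches its parts instead of destroying them. This sketch assumes SQLite via Python's sqlite3 module; the Wheel table's wheel_id column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Car (CarID INTEGER PRIMARY KEY, CarModel TEXT, CarType TEXT);
-- Aggregation: a wheel is partOf zero or one (0..1) car, so the FK is nullable,
-- and deleting the car detaches its wheels instead of deleting them.
CREATE TABLE Wheel (
    wheel_id INTEGER PRIMARY KEY,
    type TEXT,
    CarID INTEGER REFERENCES Car (CarID) ON DELETE SET NULL
);
-- Hypothetical data; wheel 3 belongs to no car
INSERT INTO Car VALUES (1, 'Sedan', 'compact');
INSERT INTO Wheel VALUES (1, 'alloy', 1), (2, 'alloy', 1), (3, 'steel', NULL);
""")

conn.execute("DELETE FROM Car WHERE CarID = 1")  # the car is 'totaled'
wheels = conn.execute("SELECT wheel_id, CarID FROM Wheel ORDER BY wheel_id").fetchall()
print(wheels)  # [(1, None), (2, None), (3, None)]: the parts survive
```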

Composition

• In composition, in contrast to aggregation, the parts (i.e., components) cannot exist on their own without a parent
– e.g., our head cannot exist without our body!
– Meanders cannot exist without a river
– The parts are created with the parent (e.g., body and brain; fold, limb, and fold axis; river and meander) and disappear when the parent disappears

• Composition is symbolized with a filled diamond on the parent side
– Multiplicity on the diamond side is 1..1

Composition

Diagram: Fold (fold_id, type, class, location) composed of FoldLimb (fold_id, orientation, …), FoldAxis, FoldAxialPlane, and AxialPlaneFoliation, each with multiplicity 0..* on the component side (the 1..1 multiplicity on the Fold side is implied)

• Note: None of the components makes sense without the fold! They may not exist, but if they do, they must belong to a composed parent
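By contrast with aggregation, composition can be mirrored (again as one possible design) with a NOT NULL FK, so a part cannot exist without its parent, and ON DELETE CASCADE, so the parts disappear with the parent. This sketch assumes SQLite via Python's sqlite3 module; limb_id and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Fold (fold_id INTEGER PRIMARY KEY, type TEXT, class TEXT, location TEXT);
-- Composition: a limb must belong to exactly one fold (NOT NULL FK),
-- and it disappears with its fold (ON DELETE CASCADE).
CREATE TABLE FoldLimb (
    limb_id INTEGER PRIMARY KEY,
    fold_id INTEGER NOT NULL REFERENCES Fold (fold_id) ON DELETE CASCADE,
    orientation TEXT
);
-- Hypothetical data
INSERT INTO Fold VALUES (1, 'anticline', 'open', 'Ridge A');
INSERT INTO FoldLimb VALUES (1, 1, '030/45 SE'), (2, 1, '210/40 NW');
""")

try:  # a limb with no fold is rejected by NOT NULL
    conn.execute("INSERT INTO FoldLimb (limb_id, orientation) VALUES (3, '000/10 E')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

conn.execute("DELETE FROM Fold WHERE fold_id = 1")  # removes the fold and, by cascade, its limbs
limbs_left = conn.execute("SELECT COUNT(*) FROM FoldLimb").fetchone()[0]
print(orphan_allowed, limbs_left)  # False 0
```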

Recursive Association

• A recursive association connects a class to itself when objects of the same class have different roles, e.g.,
– A mineral may alter into another mineral
– A drainage (tributary) may merge with another drainage (tributary)
– A mineral may recrystallize into another mineral

• Instead of making two Mineral tables (a bad design, shown below), we make one Mineral table that references itself

Bad design (duplicates information): Mineral (mineral_id, composition, …) 0..* alters to 1..* Mineral (mineral_id, composition, …)

Recursive association

Not correct (duplicates information): Mineral 0..* altersTo 1..* Mineral, drawn as two separate classes

Correct: a single Mineral class (mineral_id, altered_mineral_id, composition, …) with a recursive association whose two ends are named altersInto and isAlteredFrom

• Each mineral may alter into zero or more minerals (it may not alter)
• Each altered mineral (if it exists) is altered from one or more minerals
• This is an M:M relationship!

Create an Alteration Class

• Instead of the previous (poor) solution for an M:M relationship, we can create an Alteration class so that we can keep track of the alteration process:

Diagram: Mineral goesThrough Alteration; Alteration involves Mineral; Alteration (mineral_id, altered_mineral_id, composition, …), with its keys marked PK and FK

• Each mineral goes through zero or more alterations
• Each alteration involves one or more minerals
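The Alteration class becomes a junction table whose two FKs both point back to Mineral, resolving the recursive M:M association. This sketch assumes SQLite via Python's sqlite3 module; the process column and the composite key (allowing at most one recorded alteration per mineral pair) are simplifying assumptions, and the mineral rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Mineral (mineral_id INTEGER PRIMARY KEY, name TEXT, composition TEXT);
-- Alteration resolves the recursive M:M association: both FKs point to Mineral.
CREATE TABLE Alteration (
    mineral_id INTEGER NOT NULL REFERENCES Mineral (mineral_id),          -- original mineral
    altered_mineral_id INTEGER NOT NULL REFERENCES Mineral (mineral_id),  -- product
    process TEXT,                                                         -- hypothetical attribute
    PRIMARY KEY (mineral_id, altered_mineral_id)
);
INSERT INTO Mineral VALUES
    (1, 'olivine', '(Mg,Fe)2SiO4'),
    (2, 'serpentine', '(Mg,Fe)3Si2O5(OH)4'),
    (3, 'iddingsite', NULL),
    (4, 'orthoclase', 'KAlSi3O8'),
    (5, 'kaolinite', 'Al2Si2O5(OH)4');
INSERT INTO Alteration VALUES
    (1, 2, 'serpentinization'),
    (1, 3, 'weathering'),
    (4, 5, 'weathering');
""")

# What does olivine alter into? Join Mineral to itself through Alteration.
products = conn.execute("""
    SELECT p.name FROM Alteration AS a
    JOIN Mineral AS src ON a.mineral_id = src.mineral_id
    JOIN Mineral AS p ON a.altered_mineral_id = p.mineral_id
    WHERE src.name = 'olivine' ORDER BY p.name
""").fetchall()
print(products)  # [('iddingsite',), ('serpentine',)]
```

Only one Mineral table is needed; the two aliases (src and p) play the two roles of the recursive association.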
