IS 257 - Fall 2002 2002.10.08- SLIDE 1 Database Design: From Conceptual Design to Physical Relational Implementation University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management

Dec 22, 2015


IS 257 - Fall 2002 2002.10.08- SLIDE 1

Database Design: From Conceptual Design to Physical Relational Implementation

University of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management

IS 257 - Fall 2002 2002.10.08- SLIDE 2

Lecture Outline

• Review
– Access Methods
– Indexes and what to index
– Parallel storage systems (RAID)
– Integrity constraints

• Design to Relational Implementation

IS 257 - Fall 2002 2002.10.08- SLIDE 3

Internal Model Access Methods

• Many types of access methods:
– Physical Sequential
– Indexed Sequential
– Indexed Random
– Inverted
– Direct
– Hashed

• Differences in
– Access Efficiency
– Storage Efficiency

IS 257 - Fall 2002 2002.10.08- SLIDE 4

Indexed Sequential: Two Levels

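Top-level index (Key Value -> Address):
  385 -> 7
  678 -> 8
  805 -> 9

Second-level index blocks (Key Value -> Address):
  Block 7: 150 -> 1, 385 -> 2
  Block 8: 536 -> 3, 678 -> 4
  Block 9: 785 -> 5, 805 -> 6

Data blocks (keys stored in sequence):
  Block 1: 001, 003, ..., 150
  Block 2: 251, ..., 385
  Block 3: 455, 480, ..., 536
  Block 4: 605, 610, ..., 678
  Block 5: 705, 710, ..., 785
  Block 6: 791, ..., 805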
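A lookup through the two index levels in this figure can be sketched in Python as follows. This is only an illustration: the block contents are the key values shown on the slide (the elided keys are left out), and the names `pick` and `lookup` are made up for the sketch.

```python
from bisect import bisect_left

# Data blocks at addresses 1-6; keys elided on the slide (".") are omitted.
DATA_BLOCKS = {
    1: [1, 3, 150],
    2: [251, 385],
    3: [455, 480, 536],
    4: [605, 610, 678],
    5: [705, 710, 785],
    6: [791, 805],
}
# Second-level index blocks at addresses 7-9:
# (highest key in the data block, data block address).
INDEX_BLOCKS = {
    7: [(150, 1), (385, 2)],
    8: [(536, 3), (678, 4)],
    9: [(785, 5), (805, 6)],
}
# Top-level index: (highest key reachable, index block address).
TOP_INDEX = [(385, 7), (678, 8), (805, 9)]


def pick(entries, key):
    """Return the address of the first entry whose key value is >= key."""
    pos = bisect_left([high for high, _ in entries], key)
    return entries[pos][1] if pos < len(entries) else None


def lookup(key):
    """Follow top index -> index block -> data block.

    Returns (addresses visited, whether the key was found).
    """
    index_addr = pick(TOP_INDEX, key)
    if index_addr is None:          # key is larger than every indexed key
        return [], False
    data_addr = pick(INDEX_BLOCKS[index_addr], key)
    return [index_addr, data_addr], key in DATA_BLOCKS[data_addr]
```

For example, `lookup(480)` visits index block 8 and then data block 3, so only one data block is read instead of a scan of the whole file.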

IS 257 - Fall 2002 2002.10.08- SLIDE 5

Indexed Random

Index (Actual Value -> Address Block Number):
  Adams    -> 2
  Becker   -> 1
  Dumpling -> 3
  Getta    -> 2
  Harty    -> 1

Data blocks:
  Block 1: Becker, Harty
  Block 2: Adams, Getta
  Block 3: Dumpling

IS 257 - Fall 2002 2002.10.08- SLIDE 6

Btree

Root node:    F | P | Z

Child nodes:  B | D | F     H | L | P     R | S | Z

Leaf records: Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles

IS 257 - Fall 2002 2002.10.08- SLIDE 7

Inverted

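Index (Actual Value -> Address Block Number):
  CH 145 -> 1
  CS 201 -> 2
  CS 623 -> 3
  PH 345 -> (no block shown)

Address blocks (course: student record numbers):
  Block 1: CH 145: 101, 103, 104
  Block 2: CS 201: 102
  Block 3: CS 623: 105, 106

Student file:
  Student name   Course number
  Adams          CH145
  Becker         CS201
  Dumpling       CH145
  Getta          CH145
  Harty          CS623
  Mobile         CS623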
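The inverted file in this figure can be approximated in Python as a mapping from each course number to the list of student records that carry it. The sketch assumes that record numbers 101 to 106 follow the row order of the student file, which is consistent with the lists shown in the address blocks; the function names are illustrative.

```python
from collections import defaultdict

# Student file rows; record ids 101-106 are ASSUMED to follow row order.
STUDENTS = {
    101: ("Adams", "CH145"),
    102: ("Becker", "CS201"),
    103: ("Dumpling", "CH145"),
    104: ("Getta", "CH145"),
    105: ("Harty", "CS623"),
    106: ("Mobile", "CS623"),
}


def build_inverted_index(rows):
    """Map each course number to the sorted list of record ids that hold it."""
    index = defaultdict(list)
    for record_id in sorted(rows):
        _, course = rows[record_id]
        index[course.upper()].append(record_id)
    return dict(index)


def students_in(index, rows, course):
    """Answer 'who takes this course?' from the index, without scanning rows."""
    return [rows[record_id][0] for record_id in index.get(course.upper(), [])]


INDEX = build_inverted_index(STUDENTS)
```

Here `students_in(INDEX, STUDENTS, "ch145")` reads only the three records listed for CH145, and a course with no entry (such as PH345) returns an empty list.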

IS 257 - Fall 2002 2002.10.08- SLIDE 8

When to Use Indexes

• Rules of thumb
– Indexes are most useful on larger tables
– Specify a unique index for the primary key of each table
– Indexes are most useful for attributes used as search criteria or for joining tables
– Indexes are useful if sorting is often done on the attribute
– Most useful when there are many different values for an attribute
– Some DBMS limit the number of indexes and the size of the index key values
– Some indexes will not retrieve NULL values
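Two of these rules of thumb can be tried out with Python's built-in `sqlite3` module, used here only as a convenient stand-in for whatever DBMS is in use. The table and index names are illustrative, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibfile (accno INTEGER, title TEXT, price REAL)")
# Rule: a unique index on the primary key attribute.
conn.execute("CREATE UNIQUE INDEX idx_bibfile_accno ON bibfile (accno)")
# Rule: index attributes that are used as search criteria.
conn.execute("CREATE INDEX idx_bibfile_title ON bibfile (title)")


def plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Searching on the indexed attribute can use the index ...
title_plan = plan("SELECT accno FROM bibfile WHERE title = ?", ("x",))
# ... while searching on an unindexed attribute scans the whole table.
price_plan = plan("SELECT accno FROM bibfile WHERE price > ?", (10,))
```

Printing `title_plan` and `price_plan` shows an index search for the first query and a table scan for the second.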

IS 257 - Fall 2002 2002.10.08- SLIDE 9

RAID

• Provides parallel disks (and software) so that multiple pages can be retrieved simultaneously

• RAID stands for “Redundant Arrays of Inexpensive Disks”
– invented by Randy Katz and Dave Patterson here at Berkeley

• Some manufacturers have renamed the “inexpensive” part (commonly to “independent”)

IS 257 - Fall 2002 2002.10.08- SLIDE 10

RAID Technology

One logical disk drive; parallel writes and parallel reads

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe      1       2       3       4
Stripe      5       6       7       8
Stripe      9      10      11      12
            *       *       *       *

IS 257 - Fall 2002 2002.10.08- SLIDE 11

RAID-0

One logical disk drive; parallel writes and parallel reads

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe      1       2       3       4
Stripe      5       6       7       8
Stripe      9      10      11      12
            *       *       *       *
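The block placement in this striping figure follows a simple round-robin rule, sketched below in Python. The four-disk count comes from the figure; the function names are illustrative.

```python
NUM_DISKS = 4  # as drawn in the figure


def locate(block, num_disks=NUM_DISKS):
    """Map a 1-based logical block number to (disk, stripe), both 1-based."""
    if block < 1:
        raise ValueError("block numbers start at 1")
    return (block - 1) % num_disks + 1, (block - 1) // num_disks + 1


def layout(num_blocks, num_disks=NUM_DISKS):
    """Return the stripes as rows, one column per disk, as in the figure."""
    return [
        list(range(start, min(start + num_disks, num_blocks + 1)))
        for start in range(1, num_blocks + 1, num_disks)
    ]
```

Because consecutive blocks land on different disks, a request for blocks 5 to 8 can be served by all four disks at once.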

IS 257 - Fall 2002 2002.10.08- SLIDE 12

RAID-1

Parallel writes and parallel reads

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe      1       1       2       2
Stripe      3       3       4       4
Stripe      5       5       6       6
            *       *       *       *

IS 257 - Fall 2002 2002.10.08- SLIDE 13

RAID-2

Writes span all drives; reads span all drives

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe     1a      1b      ecc     ecc
Stripe     2a      2b      ecc     ecc
Stripe     3a      3b      ecc     ecc
            *       *       *       *

IS 257 - Fall 2002 2002.10.08- SLIDE 14

RAID-3

Writes span all drives; reads span all drives

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe     1a      1b      1c      ecc
Stripe     2a      2b      2c      ecc
Stripe     3a      3b      3c      ecc
            *       *       *       *

IS 257 - Fall 2002 2002.10.08- SLIDE 15

RAID-4

Parallel writes and parallel reads

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe      1       2       3      ecc
Stripe      4       5       6      ecc
Stripe      7       8       9      ecc
            *       *       *       *
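The slides label the redundant blocks only as "ecc". A common way to compute such a block in parity-based RAID levels is a byte-wise XOR across the data blocks of a stripe. The sketch below assumes that scheme, which the slides do not state, and uses made-up block contents.

```python
from functools import reduce


def parity(blocks):
    """Byte-wise XOR of equally sized blocks (ASSUMED ecc scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover the single missing data block of a stripe.

    XOR-ing the parity with every surviving block cancels them out and
    leaves the contents of the lost block.
    """
    return parity(list(surviving_blocks) + [parity_block])


stripe = [b"blk1", b"blk2", b"blk3"]              # data on disks 1-3
ecc = parity(stripe)                              # stored on disk 4
recovered = rebuild([stripe[0], stripe[2]], ecc)  # disk 2 has failed
```

With one parity block per stripe, any single failed disk can be rebuilt, but two simultaneous failures cannot.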

IS 257 - Fall 2002 2002.10.08- SLIDE 16

RAID-5

Parallel writes and parallel reads

          Disk 1  Disk 2  Disk 3  Disk 4
Stripe      1       2       3       4
Stripe      5       6       7       8
Stripe      9      10      11      12
           ecc     ecc     ecc     ecc
            *       *       *       *

IS 257 - Fall 2002 2002.10.08- SLIDE 17

Integrity Constraints

• The constraints we wish to impose in order to protect the database from becoming inconsistent.

• Five types
– Required data
– Attribute domain constraints
– Entity integrity
– Referential integrity
– Enterprise constraints
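The five constraint types can be illustrated with Python's `sqlite3` module, used here as a stand-in DBMS. The table and column names are borrowed from the Cookie database; the data types, the choice of keys, and the 99-copy limit are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE libfile (
    libid   INTEGER PRIMARY KEY,                         -- entity integrity
    library TEXT NOT NULL                                -- required data
);
CREATE TABLE callfile (
    accno  INTEGER NOT NULL,
    libid  INTEGER NOT NULL REFERENCES libfile (libid),  -- referential integrity
    callno TEXT NOT NULL,
    copies INTEGER NOT NULL CHECK (copies >= 0),         -- attribute domain
    CHECK (copies <= 99),          -- enterprise rule (hypothetical limit)
    PRIMARY KEY (accno, libid)
);
""")


def violates(sql, params=()):
    """Run a statement; report True if the DBMS rejects it as inconsistent."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute("INSERT INTO libfile VALUES (1, 'Moffitt')")
```

With this schema, `violates("INSERT INTO callfile VALUES (10, 99, 'X', 1)")` returns `True` because no library 99 exists, while the same row with `libid` 1 is accepted.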

IS 257 - Fall 2002 2002.10.08- SLIDE 18

Review

• Database Design Process

• Normalization

IS 257 - Fall 2002 2002.10.08- SLIDE 19

Database Design Process

[Figure: database design process. Boxes shown: Applications 1-4, each with its own Conceptual requirements and its own External Model; Conceptual Model; Logical Model; Internal Model.]

IS 257 - Fall 2002 2002.10.08- SLIDE 20

Today: New Design

• Today we will build the COOKIE database from needs (rough) through the conceptual model, logical model and finally physical implementation in Access.

IS 257 - Fall 2002 2002.10.08- SLIDE 21

Cookie Requirements

• Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries.

• Need to record which books are held by which libraries

• Need to search on bibliographic information
– Author, title, subject, call number for a given library, etc.

• Need to know who publishes the books for ordering, etc.

IS 257 - Fall 2002 2002.10.08- SLIDE 22

Cookie Database

• There are currently 6 main types of entities in the database
– Authors (Authors)
– Books (bibfile)
– Local Call numbers (callfile)
– Libraries (libfile)
– Publishers (pubfile)
– Subject headings (subfile)
– Additional entities
  • Links between subject and books (indxfile)
  • Links between authors and books (AU_BIB)

IS 257 - Fall 2002 2002.10.08- SLIDE 23

AUTHORS

• author -- The author’s name (We do not distinguish between Personal and Corporate authors)

• Au_id – a unique id for the author

IS 257 - Fall 2002 2002.10.08- SLIDE 24

AUTHORS

[ER diagram: entity Authors with attributes Author and AU ID]

IS 257 - Fall 2002 2002.10.08- SLIDE 25

BIBFILE

• Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:
– accno -- an “accession” or serial number
– title -- The title of the book
– loc -- Location of publication (where published)
– date -- Date of publication
– price -- Price of the book
– pagination -- Number of pages
– ill -- What type of illustrations (maps, etc.) if any
– height -- Height of the book in centimeters

IS 257 - Fall 2002 2002.10.08- SLIDE 26

Books/BIBFILE

[ER diagram: entity Books with attributes accno, Title, Loc, Date, Price, Pagination, Ill, Height]

IS 257 - Fall 2002 2002.10.08- SLIDE 27

CALLFILE

• CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:
– accno -- the book accession number
– libid -- the id of the holding library
– callno -- the call number of the book in the particular library
– copies -- the number of copies held by the particular library

IS 257 - Fall 2002 2002.10.08- SLIDE 28

LocalInfo/CALLFILE

[ER diagram: entity CALLFILE with attributes accno, libid, Callno, Copies]

IS 257 - Fall 2002 2002.10.08- SLIDE 29

LIBFILE

• LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:
– libid -- Library id number
– library -- Name of the library
– laddress -- Street address for the library
– lcity -- City name
– lstate -- State code (postal abbreviation)
– lzip -- zip code
– lphone -- Phone number
– mop - suncl -- Library opening and closing times for each day of the week.

IS 257 - Fall 2002 2002.10.08- SLIDE 30

Libraries/LIBFILE

[ER diagram: entity LIBFILE with attributes Libid, Library, laddress, lcity, lstate, lzip, lphone, MOp, Mcl, TuOp, TuCl, WOp, WCl, ThOp, ThCl, FOp, FCl, SatOp, SatCl, SunOp, Suncl]

IS 257 - Fall 2002 2002.10.08- SLIDE 31

PUBFILE

• PUBFILE contains information about the publishers of books. Its attributes include:
– pubid -- The publisher’s id number
– publisher -- Publisher name
– paddress -- Publisher street address
– pcity -- Publisher city
– pstate -- Publisher state
– pzip -- Publisher zip code
– pphone -- Publisher phone number
– ship -- Standard shipping time in days

IS 257 - Fall 2002 2002.10.08- SLIDE 32

Publisher/PUBFILE

[ER diagram: entity PUBFILE with attributes pubid, Publisher, paddress, pcity, pstate, pzip, pphone, Ship]

IS 257 - Fall 2002 2002.10.08- SLIDE 33

SUBFILE

• SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are:
– subcode -- Subject identification number
– subject -- the subject heading/description

IS 257 - Fall 2002 2002.10.08- SLIDE 34

Subjects/SUBFILE

[ER diagram: entity SUBFILE with attributes Subject and subid]

IS 257 - Fall 2002 2002.10.08- SLIDE 35

INDXFILE

• INDXFILE provides a way to allow many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables:
– subcode -- link to subject id
– accno -- link to book accession number
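The many-to-many mapping through INDXFILE can be sketched with Python's `sqlite3` module. The table and column names follow the slides; the keys, data types, placeholder titles, and sample links are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile  (accno   INTEGER PRIMARY KEY, title   TEXT);
CREATE TABLE subfile  (subcode INTEGER PRIMARY KEY, subject TEXT);
-- The linking table holds nothing but the two foreign keys.
CREATE TABLE indxfile (subcode INTEGER REFERENCES subfile (subcode),
                       accno   INTEGER REFERENCES bibfile (accno),
                       PRIMARY KEY (subcode, accno));
-- Placeholder rows, invented for the sketch.
INSERT INTO bibfile  VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO subfile  VALUES (10, 'Higher education'), (20, 'Essays');
INSERT INTO indxfile VALUES (10, 1), (10, 2), (20, 1);
""")


def titles_for(subject):
    """One subject -> many books."""
    rows = conn.execute(
        """SELECT b.title FROM subfile AS s
           JOIN indxfile AS i ON i.subcode = s.subcode
           JOIN bibfile  AS b ON b.accno = i.accno
           WHERE s.subject = ? ORDER BY b.title""", (subject,))
    return [row[0] for row in rows]


def subjects_for(accno):
    """One book -> many subjects."""
    rows = conn.execute(
        """SELECT s.subject FROM indxfile AS i
           JOIN subfile AS s ON s.subcode = i.subcode
           WHERE i.accno = ? ORDER BY s.subject""", (accno,))
    return [row[0] for row in rows]
```

The same linking table answers the question in both directions, which is why neither BIBFILE nor SUBFILE needs a repeating subject or book field.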

IS 257 - Fall 2002 2002.10.08- SLIDE 36

Linking Subjects and Books

[ER diagram: entity INDXFILE with attributes accno and subid]

IS 257 - Fall 2002 2002.10.08- SLIDE 37

AU_BIB

• AU_BIB provides a way to allow many-to-many mapping between books and authors. It also consists only of links to other tables:
– AU_ID – link to the AUTHORS table
– ACCNO – link to the BIBFILE table

IS 257 - Fall 2002 2002.10.08- SLIDE 38

Linking Authors and Books

[ER diagram: entity AU_BIB with attributes accno and AU ID]

IS 257 - Fall 2002 2002.10.08- SLIDE 39

Some examples of Cookie Searches

• Who wrote Microcosmographia Academica?
• How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays?
• Which branches in Berkeley’s public library system are open on Sunday?
• What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German?
• What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?
• Print a list of the Mechanics Library holdings, in descending order by height.
• What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?
• Which library closes earliest on Friday night?
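Two of these searches can be sketched as queries using Python's `sqlite3` module. The tables carry only the columns the queries need, and the prices, illustration notes, call numbers, and copy counts are invented sample values, not data from the actual Cookie database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors  (au_id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE bibfile  (accno INTEGER PRIMARY KEY, title TEXT, price REAL, ill TEXT);
CREATE TABLE au_bib   (au_id INTEGER, accno INTEGER);
CREATE TABLE callfile (accno INTEGER, libid INTEGER, callno TEXT, copies INTEGER);
-- Sample rows: prices, ill values, call numbers and copies are invented.
INSERT INTO authors VALUES (1, 'Cornford, F. M.'), (2, 'Whitehead, Alfred');
INSERT INTO bibfile VALUES
    (100, 'Microcosmographia Academica', 10.0, NULL),
    (101, 'The Aims of Education and Other Essays', 20.0, 'graphs');
INSERT INTO au_bib VALUES (1, 100), (2, 101);
INSERT INTO callfile VALUES (100, 1, 'X1', 2), (101, 1, 'X2', 1), (101, 2, 'X3', 3);
""")


def who_wrote(title):
    """'Who wrote ...?' -- join AUTHORS to BIBFILE through AU_BIB."""
    rows = conn.execute(
        """SELECT a.author FROM authors AS a
           JOIN au_bib  AS ab ON ab.au_id = a.au_id
           JOIN bibfile AS b  ON b.accno = ab.accno
           WHERE b.title = ? ORDER BY a.author""", (title,))
    return [row[0] for row in rows]


def replacement_cost_of_illustrated():
    """Cost of replacing every copy of each book with a non-empty ill field."""
    row = conn.execute(
        """SELECT COALESCE(SUM(b.price * c.copies), 0)
           FROM bibfile AS b JOIN callfile AS c ON c.accno = b.accno
           WHERE b.ill IS NOT NULL""").fetchone()
    return row[0]
```

The second query has to reach into CALLFILE because the number of copies is recorded per holding library, not per book.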

IS 257 - Fall 2002 2002.10.08- SLIDE 40

Cookie ER Diagram

[ER diagram. Entities and the attributes shown: AUTHORS (AU_ID, Author); AU_BIB (AU_ID, accno); BIBFILE (accno, pubid); PUBFILE (pubid); INDXFILE (subcode, accno); SUBFILE (subcode); CALLFILE (accno, libid); LIBFILE (libid).]

Note: diagram contains only attributes used for linking

IS 257 - Fall 2002 2002.10.08- SLIDE 41

What Problems?

• What sorts of problems and missing features arise given the previous ER diagram?

IS 257 - Fall 2002 2002.10.08- SLIDE 42

Problems Identified

• Subtitles, parallel titles?
• Edition information
• Series information
• Lending status
• Material type designation
• Genre, class information
• Better codes (ISBN?)
• Missing information (ISBN)
• Authority control for authors
• Missing/incomplete data
• Data entry problems
• Ordering information
• Illustrations
• Subfield separation (such as last_name, first_name)
• Separate personal and corporate authors

IS 257 - Fall 2002 2002.10.08- SLIDE 43

Problems (Cont.)

• Location field inconsistent
• No notes field
• No language field
• Zipcode doesn’t support plus-4
• No publisher shipping addresses
• No (indexable) keyword search capability
• No support for multivolume works
• No support for URLs
– to online version
– to libraries
– to publishers

IS 257 - Fall 2002 2002.10.08- SLIDE 44

Original Cookie ER Diagram

[ER diagram. Entities and the attributes shown: AUTHORS (AU_ID, Author); AU_BIB (AU_ID, accno); BIBFILE (accno, pubid); PUBFILE (pubid); INDXFILE (subcode, accno); SUBFILE (subcode); CALLFILE (accno, libid); LIBFILE (libid).]

Note: diagram contains only attributes used for linking

IS 257 - Fall 2002 2002.10.08- SLIDE 45

Cookie 2: Separate Name Authorities

[ER diagram. Entities shown: BIBFILE, PUBFILE, LIBFILE, CALLFILE, INDXFILE, SUBFILE, AUTHFILE, AUTHBIB. Attributes shown: accno, pubid, libid, subcode, nameid, name, authtype. AUTHFILE and AUTHBIB take the place of AUTHORS and AU_BIB.]

IS 257 - Fall 2002 2002.10.08- SLIDE 46

Cookie 3: Keywords

[ER diagram. Entities shown: BIBFILE, PUBFILE, LIBFILE, CALLFILE, INDXFILE, SUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS. Attributes shown: accno, pubid, libid, subcode, nameid, name, authtype, termid. New in this version: KEYMAP (accno, termid) and TERMS (termid).]

IS 257 - Fall 2002 2002.10.08- SLIDE 47

Cookie 4: Series

[ER diagram. Entities shown: BIBFILE, PUBFILE, LIBFILE, CALLFILE, INDXFILE, SUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS, SERIES. Attributes shown: accno, pubid, libid, subcode, nameid, name, authtype, termid, seriesid, ser_title. New in this version: SERIES, with seriesid as the linking attribute and ser_title.]

IS 257 - Fall 2002 2002.10.08- SLIDE 48

Cookie 5: Circulation

[ER diagram. Entities shown: BIBFILE, PUBFILE, LIBFILE, CALLFILE, INDXFILE, SUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS, SERIES, CIRC, PATRON. Attributes shown: accno, pubid, libid, subcode, nameid, name, authtype, termid, seriesid, ser_title, circid, copynum, patronid. New in this version: CIRC and PATRON, with circid, copynum, and patronid.]

IS 257 - Fall 2002 2002.10.08- SLIDE 49

Logical Model: Mapping to Relations

• Take each entity
– BIBFILE
– LIBFILE
– CALLFILE
– SUBFILE
– PUBFILE
– INDXFILE

• And make it a table...
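The entity-to-table step can be sketched in Python with the built-in `sqlite3` module, used here in place of Access purely so the example is runnable. The attribute lists follow the earlier slides (the LIBFILE opening-hours columns are left out for brevity, and `pubid` in BIBFILE comes from the ER diagram); the primary keys are assumptions, and column types are omitted because the slides do not give them.

```python
import sqlite3

# Entity -> (attributes, ASSUMED primary key).
ENTITIES = {
    "BIBFILE": (["accno", "title", "loc", "date", "price", "pagination",
                 "ill", "height", "pubid"], ["accno"]),
    "LIBFILE": (["libid", "library", "laddress", "lcity", "lstate",
                 "lzip", "lphone"], ["libid"]),
    "CALLFILE": (["accno", "libid", "callno", "copies"], ["accno", "libid"]),
    "SUBFILE": (["subcode", "subject"], ["subcode"]),
    "PUBFILE": (["pubid", "publisher", "paddress", "pcity", "pstate",
                 "pzip", "pphone", "ship"], ["pubid"]),
    "INDXFILE": (["subcode", "accno"], ["subcode", "accno"]),
}


def create_table_sql(name, attributes, key):
    """Turn one entity into one CREATE TABLE statement."""
    columns = ", ".join(attributes)
    return f"CREATE TABLE {name} ({columns}, PRIMARY KEY ({', '.join(key)}))"


conn = sqlite3.connect(":memory:")
for name, (attributes, key) in ENTITIES.items():
    conn.execute(create_table_sql(name, attributes, key))

# List the tables that now exist in the database.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

The linking entities (CALLFILE, INDXFILE) become tables in exactly the same way as the others; only their composite keys set them apart.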

IS 257 - Fall 2002 2002.10.08- SLIDE 50

Implementing the Physical Database...

• For each of the entities, we will build a table…

• Start up Access…

• Use “New” in Tables…

• Loading data

• Entering data

• Data entry forms

IS 257 - Fall 2002 2002.10.08- SLIDE 51

Next Time

• Relational Operations

• Relational Algebra

• Relational Calculus

• Introduction to SQL