9/9/1999 Information Organization and Retrieval Database Design: From Conceptual Design to Physical Implementation University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval

Dec 28, 2015


Lorin Caldwell
Transcript
Page 1:

9/9/1999 Information Organization and Retrieval

Database Design: From Conceptual Design to Physical Implementation

University of California, Berkeley

School of Information Management and Systems

SIMS 202: Information Organization and Retrieval

Page 2:

Review

• Database Design Process

• Normalization

Page 3:

Database Design Process

(Figure labels: Conceptual requirements for Application 1, Application 2, Application 3 and Application 4; Conceptual Model; Logical Model; Internal Model; an External Model for each of Applications 1 to 4.)

Page 4:

Normalization

• Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data

• Normalization is a multi-step process beginning with an “unnormalized” relation

– Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.
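To make the first bullet concrete, here is a minimal Python sketch (not part of the original slides) of an update anomaly. It assumes a single wide relation in which a patient's address is repeated on every surgery row; the field names, the helper functions and the replacement address are all hypothetical.

```python
# Two rows of one wide relation: the patient's address is stored once per surgery.
surgeries = [
    {"patient_no": 1111, "surgeon_no": 145, "patient_addr": "15 New St. New York, NY"},
    {"patient_no": 1111, "surgeon_no": 311, "patient_addr": "15 New St. New York, NY"},
]

def update_first_match(rows, patient_no, new_addr):
    """A careless update that changes only the first matching row."""
    for row in rows:
        if row["patient_no"] == patient_no:
            row["patient_addr"] = new_addr
            return

def addresses_for(rows, patient_no):
    """Return the set of distinct addresses recorded for one patient."""
    return {r["patient_addr"] for r in rows if r["patient_no"] == patient_no}

# The new address is invented for illustration.
update_first_match(surgeries, 1111, "99 Hypothetical Ave. New York, NY")
inconsistent = len(addresses_for(surgeries, 1111)) > 1
print(inconsistent)  # True: the same patient now has two different addresses
```

Storing the address once, in a table keyed by patient number, would make this kind of inconsistency impossible.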

Page 5:

Normal Forms

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

• Fourth Normal Form (4NF)

• Fifth Normal Form (5NF)

Page 6:

Normalization

(Figure: the requirement added at each successive normal form)

– First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only

– Second Normal Form: full functional dependency of nonkey attributes on the primary key

– Third Normal Form: no transitive dependency between nonkey attributes

– Boyce-Codd and Higher: all determinants are candidate keys; single multivalued dependency
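The dependencies named here can be probed on sample data. The following is a rough Python sketch, not something from the slides: `fd_holds` is a hypothetical helper, and the field names are assumed. Note that a check like this can only refute a functional dependency for the rows it sees; it cannot prove that the dependency holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, val) != val:
            return False
    return True

# A few rows in the style of the hospital example.
rows = [
    {"patient_no": 1111, "surgeon_no": 145, "patient_name": "John White",
     "surgery": "Gallstones removal"},
    {"patient_no": 1111, "surgeon_no": 311, "patient_name": "John White",
     "surgery": "Kidney stones removal"},
    {"patient_no": 1234, "surgeon_no": 243, "patient_name": "Mary Jones",
     "surgery": "Eye Cataract removal"},
]

# The name depends on only part of a composite key (a partial dependency).
partial = fd_holds(rows, ["patient_no"], ["patient_name"])
# The surgery is not determined by the patient number alone.
not_fd = fd_holds(rows, ["patient_no"], ["surgery"])
```

Here `partial` comes out true and `not_fd` false, which is the pattern that motivates splitting patient details into their own relation.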

Page 7:

Unnormalized Relations

• First step in normalization is to convert the data into a two-dimensional table

• In unnormalized relations data can repeat within a column

Page 8:

Unnormalized Relation

Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever
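The move from an unnormalized record to flat rows can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (parallel lists for the repeating fields), the field names and the helper `to_first_normal_form` are hypothetical, and only some columns of the hospital example are used.

```python
def to_first_normal_form(record, repeating):
    """Expand one record whose `repeating` fields hold parallel lists into flat rows."""
    n = len(record[repeating[0]])
    assert all(len(record[f]) == n for f in repeating), "repeating groups must align"
    rows = []
    for i in range(n):
        # Copy the single-valued fields, then take the i-th value of each repeating field.
        row = {k: v for k, v in record.items() if k not in repeating}
        for f in repeating:
            row[f] = record[f][i]
        rows.append(row)
    return rows

unnormalized = {
    "patient_no": 1111,
    "patient_name": "John White",
    "surgeon_no": [145, 311],
    "surgery_date": ["Jan 1, 1995", "June 12, 1995"],
    "surgery": ["Gallstones removal", "Kidney stones removal"],
}
flat = to_first_normal_form(unnormalized, ["surgeon_no", "surgery_date", "surgery"])
```

Each resulting row holds atomic values only, at the cost of repeating the patient's name on every row.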

Page 9:

First Normal Form

Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none

Page 10:

Second Normal Form

Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester,
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

Page 11:

Second Normal Form

Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold

Page 12:

Second Normal Form

Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever
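The second normal form decomposition amounts to projecting the 1NF relation onto smaller attribute sets and dropping duplicate tuples. A minimal Python sketch follows, assuming hypothetical field names and a hypothetical `project` helper, and using three rows from the hospital example:

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

first_nf = [
    {"patient_no": 1111, "surgeon_no": 145, "surgery_date": "01-Jan-95",
     "patient_name": "John White", "surgeon_name": "Beth Little",
     "surgery": "Gallstones removal"},
    {"patient_no": 1111, "surgeon_no": 311, "surgery_date": "12-Jun-95",
     "patient_name": "John White", "surgeon_name": "Michael Diamond",
     "surgery": "Kidney stones removal"},
    {"patient_no": 5123, "surgeon_no": 145, "surgery_date": "10-May-95",
     "patient_name": "Paul Kosher", "surgeon_name": "Beth Little",
     "surgery": "Gallstones Removal"},
]

# Attributes that depend on only part of the key move to their own relations.
patients = project(first_nf, ["patient_no", "patient_name"])
surgeons = project(first_nf, ["surgeon_no", "surgeon_name"])
# What remains depends on the whole key.
surgeries = project(first_nf, ["patient_no", "surgeon_no", "surgery_date", "surgery"])
```

With these three input rows, the patient and surgeon projections each collapse to two rows, while the surgery relation keeps all three.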

Page 13:

Third Normal Form

Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline

Page 14:

Third Normal Form

Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever
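Splitting off the drug table removes the transitive dependency (surgery key to drug to side effect) without losing information, since a join recovers the original rows. The sketch below uses SQLite through Python's `sqlite3` module purely for illustration; the table names, column names and the choice of (patient, surgeon, date) as the key are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug (drug_admin TEXT PRIMARY KEY, side_effects TEXT);
CREATE TABLE surgery (
    patient_no INTEGER, surgeon_no INTEGER, surgery_date TEXT,
    surgery TEXT, drug_admin TEXT REFERENCES drug(drug_admin),
    PRIMARY KEY (patient_no, surgeon_no, surgery_date)  -- assumed key
);
""")
conn.executemany("INSERT INTO drug VALUES (?, ?)",
                 [("Penicillin", "rash"), ("Tetracycline", "Fever"), ("none", "none")])
conn.executemany("INSERT INTO surgery VALUES (?, ?, ?, ?, ?)", [
    (1111, 145, "01-Jan-95", "Gallstones removal", "Penicillin"),
    (1234, 243, "05-Apr-94", "Eye Cataract removal", "Tetracycline"),
    (6845, 243, "05-Apr-94", "Eye Cornea Replacement", "Tetracycline"),
])

# The side effect is stored once per drug and recovered with a join.
rows = conn.execute("""
    SELECT s.patient_no, s.drug_admin, d.side_effects
    FROM surgery AS s JOIN drug AS d ON d.drug_admin = s.drug_admin
    ORDER BY s.patient_no
""").fetchall()
```

Changing the recorded side effect of a drug now touches exactly one row of `drug`, however many surgeries used it.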

Page 15:

Most 3NF Relations are also BCNF

Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester,
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

Page 16:

ER Diagram Symbols

• Entity: Rectangles are used to indicate entities (that is, the representatives or records describing persons, things, or events in the database).

• Attribute: Ovals are used to indicate the attributes associated with an entity or relationship (that is, the pieces of information recorded in the database about the entity or relationship).

• Primary key: An underlined name indicates that the attribute is a primary key (that is, it can uniquely identify the entity).

• Relationship: Diamonds are used to indicate relationships between entities (that is, some association between the data records of different entities).

Page 17:

Today: New Design

• Today we will build the COOKIE database from needs (rough) through the conceptual model, logical model and finally physical implementation in Access.

Page 18:

Cookie Requirements

• Cookie is a bibliographic database that contains information about a hypothetical union catalog of several libraries.

• Need to record which books are held by which libraries

• Need to search on bibliographic information

– Author, title, subject, call number for a given library, etc.

• Need to know who publishes the books for ordering, etc.

Page 19:

Cookie Database

• There are currently 5 main types of entities in the database, plus one file that links two of them

– Books (bibfile)

– Local Call numbers (callfile)

– Libraries (libfile)

– Publishers (pubfile)

– Subject headings (subfile)

– Links between subject and books (indxfile)

Page 20:

BIBFILE

• Books (BIBFILE) contains information about particular books. It includes one record for each book. The attributes are:

– accno -- an “accession” or serial number

– author -- The author’s name (not realistic -- one author per book)

– title -- The title of the book

– loc -- Location of publication (where published)

– date -- Date of publication

– price -- Price of the book

– pagination -- Number of pages

– ill -- What type of illustrations (maps, etc) if any

– height -- Height of the book in centimeters

Page 21:

Books/BIBFILE

(Figure: Books entity with attributes accno, Author, Title, Loc, Date, Price, Pagination, Ill, Height.)

Page 22:

CALLFILE

• CALLFILE contains call numbers and holdings information linking particular books with particular libraries. Its attributes are:

– accno -- the book accession number

– libid -- the id of the holding library

– callno -- the call number of the book in the particular library

– copies -- the number of copies held by the particular library

Page 23:

LocalInfo/CALLFILE

(Figure: CALLFILE entity with attributes accno, libid, Callno, Copies.)

Page 24:

LIBFILE

• LIBFILE contains information about the libraries participating in this union catalog. Its attributes include:

– libid -- Library id number

– library -- Name of the library

– laddress -- Street address for the library

– lcity -- City name

– lstate -- State code (postal abbreviation)

– lzip -- zip code

– lphone -- Phone number

– mop through suncl -- Library opening and closing times for each day of the week.

Page 25:

Libraries/LIBFILE

(Figure: LIBFILE entity with attributes Libid, Library, laddress, lcity, lstate, lzip, lphone, MOp, Mcl, TuOp, TuCl, WOp, WCl, ThOp, ThCl, FOp, FCl, SatOp, SatCl, SunOp, Suncl.)

Page 26:

PUBFILE

• PUBFILE contains information about the publishers of books. Its attributes include:

– pubid -- The publisher’s id number

– publisher -- Publisher name

– paddress -- Publisher street address

– pcity -- Publisher city

– pstate -- Publisher state

– pzip -- Publisher zip code

– pphone -- Publisher phone number

– ship -- standard shipping time in days

Page 27:

Publisher/PUBFILE

(Figure: PUBFILE entity with attributes pubid, Publisher, paddress, pcity, pstate, pzip, pphone, Ship.)

Page 28:

SUBFILE

• SUBFILE contains each unique subject heading that can be assigned to books. Its attributes are:

– subcode -- Subject identification number

– subject -- the subject heading/description

Page 29:

Subjects/SUBFILE

(Figure: SUBFILE entity with attributes subid, Subject.)

Page 30:

INDXFILE

• INDXFILE provides a way to allow many-to-many mapping of subject headings to books. Its attributes consist entirely of links to other tables:

– subcode -- link to subject id

– accno -- link to book accession number
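A linking table of this kind is usually queried with a pair of joins. The following is a small sketch using SQLite through Python's `sqlite3` module rather than the course's own tools; the column types, the helper `titles_for_subject` and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
""")
# Invented sample rows: Book B carries two subjects, and one subject covers two books.
conn.executemany("INSERT INTO bibfile VALUES (?, ?)", [(1, "Book A"), (2, "Book B")])
conn.executemany("INSERT INTO subfile VALUES (?, ?)",
                 [(10, "Higher education"), (20, "Essays")])
conn.executemany("INSERT INTO indxfile VALUES (?, ?)", [(10, 1), (10, 2), (20, 2)])

def titles_for_subject(subject):
    """Return the titles of all books linked to the given subject heading."""
    cur = conn.execute("""
        SELECT b.title FROM bibfile AS b
        JOIN indxfile AS i ON i.accno = b.accno
        JOIN subfile AS s ON s.subcode = i.subcode
        WHERE s.subject = ? ORDER BY b.title
    """, (subject,))
    return [row[0] for row in cur]
```

Because each (subcode, accno) pair is its own row, a book can have any number of subjects and a subject any number of books.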

Page 31:

Linking Subjects and Books

(Figure: INDXFILE with attributes accno, subid.)

Page 32:

Some examples of Cookie Searches

• Who wrote Microcosmographia Academica?

• How many pages long is Alfred Whitehead’s The Aims of Education and Other Essays?

• Which branches in Berkeley’s public library system are open on Sunday?

• What is the call number of Moffitt Library’s copy of Abraham Flexner’s book Universities: American, English, German?

• What books on the subject of higher education are among the holdings of Berkeley (both UC and City) libraries?

• Print a list of the Mechanics Library holdings, in descending order by height.

• What would it cost to replace every copy of each book that contains illustrations (including graphs, maps, portraits, etc.)?

• Which library closes earliest on Friday night?
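Two of these searches can be sketched as SQL queries. The example below uses SQLite through Python's `sqlite3` module as a stand-in; it assumes that a missing `ill` value means "no illustrations" and that closing times are stored as zero-padded HH:MM text in a column named `fcl`. The sample rows and library names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT, price REAL, ill TEXT);
CREATE TABLE libfile (libid INTEGER PRIMARY KEY, library TEXT, fcl TEXT);
CREATE TABLE callfile (accno INTEGER, libid INTEGER, callno TEXT, copies INTEGER,
                       PRIMARY KEY (accno, libid));
""")
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?, ?)", [
    (1, "Book A", 10.0, "maps"),
    (2, "Book B", 20.0, None),  # no illustrations
])
conn.executemany("INSERT INTO libfile VALUES (?, ?, ?)",
                 [(1, "Library X", "21:00"), (2, "Library Y", "17:00")])
conn.executemany("INSERT INTO callfile VALUES (?, ?, ?, ?)",
                 [(1, 1, "X 1", 2), (1, 2, "Y 1", 1), (2, 1, "X 2", 5)])

# Replacement cost of every copy of each illustrated book.
cost = conn.execute("""
    SELECT SUM(b.price * c.copies) FROM bibfile AS b
    JOIN callfile AS c ON c.accno = b.accno
    WHERE b.ill IS NOT NULL
""").fetchone()[0]

# Library that closes earliest on Friday (zero-padded HH:MM sorts correctly as text).
earliest = conn.execute(
    "SELECT library FROM libfile ORDER BY fcl LIMIT 1").fetchone()[0]
```

The cost query needs CALLFILE because the number of copies is recorded per library, not per book.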

Page 33:

Cookie ER diagram

(Figure: ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, PUBFILE. Relationships: Has call, Has copy, Has index, Has subject, publishes. Linking attributes: accno, libid, subcode, pubid.)

Note: diagram contains only attributes used for linking

Page 34:

What Problems?

• What sorts of problems and missing features arise given the previous ER diagram?

Page 35:

Problems Identified

• Field sizes inappropriate

• Author doesn’t allow multiple authors (editors, etc).

• Subtitles, parallel titles

• Edition information

• Series information

• lending status

• material type designation

• Genre, class information

• Better codes (ISBN?)

• Missing information (ISBN)

• Authority control for authors

• Missing/incomplete data

• Data entry problems

• Ordering information

• Illustrations

• Subfield separation (such as last_name, first_name)

• Separate personal and corporate authors

Page 36:

Problems (Cont.)

• Location field inconsistent

• No notes field

• No language field

• Zipcode doesn’t support plus-4

• No publisher shipping addresses

• No (indexable) keyword search capability

• No support for multivolume works

• No support for URLs

– to online version

– to libraries

– to publishers

Page 37:

Original Cookie ER diagram

(Figure: ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, PUBFILE. Relationships: Has call, Has copy, Has index, Has subject, publishes. Attributes shown: accno, Libid, Library, Address, etc, Callno, subid, subject, pubid.)

Page 38:

Cookie2: Separate Name Authorities

(Figure: ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, PUBFILE, AUTHFILE, AUTHBIB. Attributes shown: accno, libid, pubid, subcode, nameid, name, authtype.)
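One way to read this design is that AUTHFILE holds each name once and AUTHBIB links names to books, which lifts the one-author-per-book limit. The sketch below uses SQLite through Python's `sqlite3` module; placing `authtype` on the link table, the `authtype` values and all sample rows are assumptions made for illustration, since the extracted diagram does not settle them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibfile (accno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authfile (nameid INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE authbib (
    accno INTEGER REFERENCES bibfile(accno),
    nameid INTEGER REFERENCES authfile(nameid),
    authtype TEXT,  -- assumed to describe the role of this name for this book
    PRIMARY KEY (accno, nameid, authtype)
);
""")
# Invented sample rows: one book with an author and an editor.
conn.execute("INSERT INTO bibfile VALUES (1, 'Book A')")
conn.executemany("INSERT INTO authfile VALUES (?, ?)",
                 [(1, "Author, First"), (2, "Editor, Second")])
conn.executemany("INSERT INTO authbib VALUES (?, ?, ?)",
                 [(1, 1, "author"), (1, 2, "editor")])

names = conn.execute("""
    SELECT a.name, ab.authtype FROM authbib AS ab
    JOIN authfile AS a ON a.nameid = ab.nameid
    WHERE ab.accno = 1 ORDER BY a.nameid
""").fetchall()
```

Keeping each name in a single AUTHFILE row is also what makes authority control possible: a correction to the name is made in one place.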

Page 39:

Cookie3: Keywords

(Figure: ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, PUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS. Attributes shown: accno, libid, pubid, subcode, nameid, name, authtype, termid.)
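A keyword index of this shape can be built by splitting titles into words. The Python sketch below is an assumption about how TERMS and KEYMAP might be populated, not the method used in the course: the stopword list, the accession numbers and the helper names are hypothetical, and TERMS is modelled as a dict while KEYMAP is a set of (accno, termid) pairs.

```python
import re

def build_keyword_index(titles, stopwords=frozenset({"the", "of", "and", "a"})):
    """Return (terms, keymap): terms maps word -> termid, keymap holds (accno, termid)."""
    terms, keymap = {}, set()
    for accno, title in titles.items():
        for word in re.findall(r"[a-z]+", title.lower()):
            if word in stopwords:
                continue
            # Assign the next termid the first time a word is seen.
            termid = terms.setdefault(word, len(terms) + 1)
            keymap.add((accno, termid))
    return terms, keymap

# Titles taken from the example searches; the accession numbers are invented.
titles = {
    1: "Microcosmographia Academica",
    2: "The Aims of Education and Other Essays",
}
terms, keymap = build_keyword_index(titles)

def search(word):
    """Return sorted accession numbers of books whose title contains the keyword."""
    termid = terms.get(word.lower())
    return sorted(a for a, t in keymap if t == termid)
```

For example, `search("education")` finds accession number 2, while a stopword such as "the" finds nothing because it was never indexed.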

Page 40:

Cookie 4: Series

(Figure: ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, PUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS, SERIES. Attributes shown: accno, libid, pubid, subcode, nameid, name, authtype, termid, seriesid, ser_title.)

Page 41:

Cookie 5: Circulation

(Figure: ER diagram. Entities: BIBFILE, CALLFILE, LIBFILE, INDXFILE, SUBFILE, PUBFILE, AUTHFILE, AUTHBIB, KEYMAP, TERMS, SERIES, CIRC, PATRON. Attributes shown: accno, libid, pubid, subcode, nameid, name, authtype, termid, seriesid, ser_title, circid, copynum, patronid.)

Page 42:

Mapping to Relations

• Take each entity

– BIBFILE

– LIBFILE

– CALLFILE

– SUBFILE

– PUBFILE

– INDXFILE

• And make it a table...
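As a rough illustration of this mapping, the six entities can be written as table definitions. The lecture builds the tables in Access; this sketch swaps in SQLite through Python's `sqlite3` module so that it is runnable anywhere. Column names follow the attribute lists on the slides, but the column types and the choice of primary keys are assumptions.

```python
import sqlite3

DDL = """
CREATE TABLE pubfile (
    pubid INTEGER PRIMARY KEY, publisher TEXT, paddress TEXT, pcity TEXT,
    pstate TEXT, pzip TEXT, pphone TEXT, ship INTEGER
);
CREATE TABLE bibfile (
    accno INTEGER PRIMARY KEY, author TEXT, title TEXT, loc TEXT, date TEXT,
    price REAL, pagination TEXT, ill TEXT, height REAL,
    pubid INTEGER REFERENCES pubfile(pubid)  -- link shown in the ER diagram
);
CREATE TABLE libfile (
    libid INTEGER PRIMARY KEY, library TEXT, laddress TEXT, lcity TEXT,
    lstate TEXT, lzip TEXT, lphone TEXT,
    mop TEXT, mcl TEXT, tuop TEXT, tucl TEXT, wop TEXT, wcl TEXT, thop TEXT,
    thcl TEXT, fop TEXT, fcl TEXT, satop TEXT, satcl TEXT, sunop TEXT, suncl TEXT
);
CREATE TABLE callfile (
    accno INTEGER REFERENCES bibfile(accno),
    libid INTEGER REFERENCES libfile(libid),
    callno TEXT, copies INTEGER,
    PRIMARY KEY (accno, libid)
);
CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE indxfile (
    subcode INTEGER REFERENCES subfile(subcode),
    accno INTEGER REFERENCES bibfile(accno),
    PRIMARY KEY (subcode, accno)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
# List the tables that were created, in alphabetical order.
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

The two linking tables, `callfile` and `indxfile`, take composite primary keys made of the identifiers they connect.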

Page 43:

Implementing the Physical Database...

• For each of the entities, we will build a table…

• Start up Access…

• Use “New” in Tables…

• Loading data

• Entering data

• Data entry forms
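Outside Access, the "loading data" step often means reading a delimited file and inserting its rows. Below is a minimal sketch using Python's `csv` and `sqlite3` modules as a stand-in for the Access import; the CSV text, its column headers and the sample rows are invented for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subfile (subcode INTEGER PRIMARY KEY, subject TEXT)")

# Stand-in for a file on disk; the rows are invented sample data.
csv_text = "subcode,subject\n1,Higher education\n2,Essays\n"

reader = csv.DictReader(io.StringIO(csv_text))
with conn:  # commits on success, rolls back if an insert fails
    conn.executemany(
        "INSERT INTO subfile (subcode, subject) VALUES (:subcode, :subject)",
        reader,
    )

count = conn.execute("SELECT COUNT(*) FROM subfile").fetchone()[0]
```

Loading inside a single transaction means a bad row leaves the table unchanged rather than half filled.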