1 Information Retrieval and Use Data Analysis & Data Modeling, Relational Data Analysis and Logical Data Modeling Geoff Leese September 2009
Relational Data Analysis

- Captures detailed knowledge of the meaning of the data.
- Ensures that the data is logically easy to maintain and extend.
- Ensures that data inter-dependencies have been identified and ambiguities resolved.
- Eliminates unnecessary duplication of data.
- Forms the data into optimum groups.
- Validates the Logical Data Model (LDM).

Logical Data Modelling

Basic rules for converting 3NF relations to an LDM:

1. Create an entity type for each data relation.
2. Mark qualifying foreign keys.
3. Check compound-key relations.
4. Make the foreign/primary key relationships.
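
The four rules can be sketched mechanically. The Python below is a rough illustration only: the relations, attribute names and the `derive_ldm` function are hypothetical, a relationship is proposed wherever one relation contains the whole primary key of another, and recursive (self-referencing) keys are deliberately ignored. Naming the relationships and deciding optionality remain the analyst's job.

```python
# Hypothetical 3NF relations: name -> (primary key, all attributes)
RELATIONS = {
    "Organisation": (["Org-number"], ["Org-number", "Org-name"]),
    "Contact": (["Contact-number"], ["Contact-number", "Contact-name", "Org-number"]),
    "Student": (["Student-number"], ["Student-number", "Student-name"]),
    "Module": (["Module-number"], ["Module-number", "Module-name"]),
    "StudMod": (["Student-number", "Module-number"], ["Student-number", "Module-number"]),
}


def derive_ldm(relations):
    """Return (entity types, relationships); each relationship is
    (master, detail, foreign key attributes)."""
    # Rule 1: one entity type per data relation
    entities = sorted(relations)
    relationships = []
    for detail, (_, detail_attrs) in relations.items():
        for master, (master_key, _) in relations.items():
            if master == detail:
                continue  # recursive relationships are not handled in this sketch
            # Rules 2 and 3: the master's WHOLE key must appear in the detail,
            # either as a plain foreign key or as part of a compound key
            if all(attr in detail_attrs for attr in master_key):
                # Rule 4: record the foreign/primary key (one-to-many) relationship
                relationships.append((master, detail, tuple(master_key)))
    return entities, relationships


entities, relationships = derive_ldm(RELATIONS)
for master, detail, fk in relationships:
    print(f"{master} (1) --< {detail} (m) via {', '.join(fk)}")
```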

Guidelines for logical modelling

Entity type names are singular nouns: descriptive, concise and organisation-specific.

Attribute names are unique, descriptive nouns in a standard format.

Relationship names are descriptive, precise verb phrases.

Simple Master-Detail relationships

A simple master-detail relationship exists where a single foreign key of one relation corresponds to the primary key of another relation.

See the next slide for an example.

Simple Master-Detail relationships

Shows a SINGLE primary key at the MASTER entity (Organisation) connected to a SINGLE foreign key at the DETAIL entity (Contact people).
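
In SQL terms, the master-detail pair might look like the following minimal sketch, run through Python's built-in `sqlite3` module. The table and column names (`Organisation`, `ContactPerson`, `org_number` and so on) and the sample rows are invented for illustration. SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Organisation (
    org_number INTEGER PRIMARY KEY,                    -- single primary key at the MASTER
    org_name   TEXT NOT NULL
);
CREATE TABLE ContactPerson (
    contact_number INTEGER PRIMARY KEY,
    contact_name   TEXT NOT NULL,
    org_number     INTEGER NOT NULL
                   REFERENCES Organisation(org_number) -- single foreign key at the DETAIL
);
""")
conn.execute("INSERT INTO Organisation VALUES (1, 'Acme Ltd')")
conn.executemany("INSERT INTO ContactPerson VALUES (?, ?, ?)",
                 [(10, "Smith", 1), (11, "Patel", 1)])

# One master occurrence, many detail occurrences
rows = conn.execute("""
    SELECT o.org_name, c.contact_name
    FROM Organisation o JOIN ContactPerson c ON c.org_number = o.org_number
    ORDER BY c.contact_number
""").fetchall()
print(rows)

# A detail row whose foreign key matches no master row is rejected
try:
    conn.execute("INSERT INTO ContactPerson VALUES (12, 'Jones', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```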

Multiple-Level Master-Detail Relationships

[Diagram: example with five entities]

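Identifying Recursive (Unary) Relationships

A recursive relationship is one in which a foreign key of a relation references the same relation.

Example: Employee (Employee-number, Employee-name, Employee-manager-number), where Employee-manager-number refers to another occurrence of Employee.

[Diagram: the Employee entity with a relationship to itself]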
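
A minimal sketch of the Employee example in SQLite via Python's `sqlite3` module; the column names follow the slide, while the sample employees are invented. The manager column is a foreign key into the same table, and a self-join reads the relationship back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE Employee (
    employee_number         INTEGER PRIMARY KEY,
    employee_name           TEXT NOT NULL,
    -- foreign key that references the SAME relation (NULL at the top of the hierarchy)
    employee_manager_number INTEGER REFERENCES Employee(employee_number)
)""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, "Ahmed", None),
    (2, "Brown", 1),
    (3, "Chen", 1),
    (4, "Davies", 2),
])

# Self-join: the relation is joined to itself to pair each employee with a manager
pairs = conn.execute("""
    SELECT e.employee_name, m.employee_name
    FROM Employee e
    LEFT JOIN Employee m ON e.employee_manager_number = m.employee_number
    ORDER BY e.employee_number
""").fetchall()
for employee, manager in pairs:
    print(employee, "reports to", manager or "nobody")
```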

Relationships: Student/Module

At this point we need to identify the data items that describe or identify each entity. Entity attributes are also known as data items.

What are the data items associated with the following Logical Data Structure (LDS) diagram?

[Diagram: Student "takes" Module; Module "is taken by" Student]

The Student

Entity type: Student

Attribute name | Attribute value
Student Name   | Jones
Street Address | Leek Road
Town           | Stoke-on-Trent
Post Code      | ST4 2DE
Telephone      | 294303

[Diagram: Student "takes" Module; Module "is taken by" Student]

The Module

Entity type: Module

Attribute name | Attribute value
Module Number  | CM5111-1
Module Name    | SSAT
Module Leader  | A Lecturer
Level          | 1
CATS Points    | 10

[Diagram: Student "takes" Module; Module "is taken by" Student]

The Data Items

[Diagram: Student "takes" Module; Module "is taken by" Student]

Student: Student Name, Street Address, Town, Post Code, Telephone

Module: Module Number, Module Name, Module Leader, Level, CATS Points

Identifying occurrences of entities

Each occurrence of an entity must be uniquely identified in some way.

Imagine a British Gas database that used only surnames to identify account holders:

- there could be something like 100,000 account holders called Jones in this country;
- even if we used given names as well, there would still be considerable duplication;
- it would be impossible to find the right account by name alone.

Adding a Primary Key

[Diagram: Student "takes" Module; Module "is taken by" Student]

Student: Student Number, Student Name, Street Address, Town, Post Code, Telephone

Module: Module Number, Module Name, Module Leader, Level, CATS Points

Student Number has been added as the primary key of Student.
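
The effect of the primary key can be seen in a small SQLite sketch (Python `sqlite3`). The student numbers such as `S001` and the second Jones's address are hypothetical sample data. A name search returns every Jones, whereas the key identifies exactly one occurrence, and the DBMS refuses a second row with the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Student (
    student_number TEXT NOT NULL PRIMARY KEY,  -- the added primary key
    student_name   TEXT NOT NULL,
    street_address TEXT,
    town           TEXT,
    post_code      TEXT,
    telephone      TEXT
)""")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?)", [
    ("S001", "Jones", "Leek Road", "Stoke-on-Trent", "ST4 2DE", "294303"),
    ("S002", "Jones", "College Road", "Stoke-on-Trent", None, None),
])

# Searching by name is ambiguous...
by_name = conn.execute(
    "SELECT student_number FROM Student WHERE student_name = 'Jones' "
    "ORDER BY student_number").fetchall()
# ...but the primary key picks out exactly one occurrence
by_key = conn.execute(
    "SELECT student_name, street_address FROM Student WHERE student_number = 'S002'"
).fetchall()

# A second row with an existing key value is rejected
try:
    conn.execute("INSERT INTO Student (student_number, student_name) VALUES ('S001', 'Smith')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(by_name, by_key, duplicate_rejected)
```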

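Relationships: Getting it right

[Diagram: Student "takes" Module; Module "is taken by" Student]

Is this right?

The real situation is surely many-to-many: a student takes many modules, and a module is taken by many students.

[Diagram: the same relationship redrawn as many-to-many]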

Putting it right: Intersection entity

Student: Student Number, Student Name, Street Address, Town, Post Code, Telephone

Stud/Mod: Student Number, Module Number

Module: Module Number, Module Name, Module Leader, Level, CATS Points

[Diagram: Student and Module, each linked to the Stud/Mod entity]

We need a link entity (Stud/Mod) between Student and Module; it reduces ambiguity.
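
One way to implement the intersection entity, sketched in SQLite through Python's `sqlite3` module: the link table's compound primary key is made up of the two foreign keys. Module CM5112-1, the second student and all SQL identifiers are hypothetical. Each Student and each Module then has a simple one-to-many relationship with Stud/Mod.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (
    student_number TEXT NOT NULL PRIMARY KEY,
    student_name   TEXT NOT NULL
);
CREATE TABLE Module (
    module_number TEXT NOT NULL PRIMARY KEY,
    module_name   TEXT NOT NULL
);
-- Intersection (link) entity: its compound key is made of two foreign keys
CREATE TABLE StudMod (
    student_number TEXT NOT NULL REFERENCES Student(student_number),
    module_number  TEXT NOT NULL REFERENCES Module(module_number),
    PRIMARY KEY (student_number, module_number)
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [("S001", "Jones"), ("S002", "Patel")])
conn.executemany("INSERT INTO Module VALUES (?, ?)",
                 [("CM5111-1", "SSAT"), ("CM5112-1", "Databases")])
conn.executemany("INSERT INTO StudMod VALUES (?, ?)", [
    ("S001", "CM5111-1"), ("S001", "CM5112-1"),  # one student, many modules
    ("S002", "CM5111-1"),                        # one module, many students
])


def modules_taken_by(student):
    return [row[0] for row in conn.execute(
        "SELECT module_number FROM StudMod WHERE student_number = ? "
        "ORDER BY module_number", (student,))]


def students_taking(module):
    return [row[0] for row in conn.execute(
        "SELECT student_number FROM StudMod WHERE module_number = ? "
        "ORDER BY student_number", (module,))]


print(modules_taken_by("S001"), students_taking("CM5111-1"))
```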

Normalisation - steps

Start with a set of un-normalised tables (an entity/attribute list), then:

- Step 1 - remove ambiguity and repeating data
- Step 2 - remove shared data

Normalisation - step 1

Break down ALL attributes into their smallest meaningful parts, e.g. student name becomes student surname, student firstname and student title.

Remove REPEATED information to form a new table, e.g. a course may be composed of MANY modules (but assume that each module is on only one course!), so form a MODULE table.
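
Both parts of step 1 can be sketched in plain Python. The course record, course code and second module are hypothetical, and `split_name` assumes every name has exactly three space-separated parts, which real data will not always satisfy.

```python
def split_name(full_name):
    """Break a name into its smallest meaningful parts.
    Assumes exactly 'Title Firstname Surname' (a deliberate simplification)."""
    title, firstname, surname = full_name.split()
    return {"student_title": title,
            "student_firstname": firstname,
            "student_surname": surname}


# Hypothetical un-normalised course record with a repeating group of modules
course = {
    "course_code": "C01",
    "course_name": "Computing",
    "modules": [("CM5111-1", "SSAT"), ("CM5112-1", "Databases")],
}


def remove_repeating_group(record):
    """Move the repeating module group into a MODULE table. Each module row keeps
    the course key, relying on the assumption that a module is on only one course."""
    course_row = {k: v for k, v in record.items() if k != "modules"}
    module_rows = [{"module_number": number,
                    "module_name": name,
                    "course_code": record["course_code"]}
                   for number, name in record["modules"]]
    return course_row, module_rows


course_table, module_table = remove_repeating_group(course)
print(split_name("Mr Tom Jones"))
print(course_table)
print(module_table)
```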

Normalisation - step 2

Remove SHARED data to form new tables, e.g. modules may share tutors, so form a TUTORS table.
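
Step 2 as a small Python sketch, using hypothetical module and tutor data. The helper `extract_shared` (an invented name) moves the shared tutor attributes into their own table and leaves `tutor_id` behind as a foreign key. It assumes each tutor's details are recorded consistently; if two rows disagreed, the last one would win.

```python
# Hypothetical MODULE rows in which tutor details are repeated (shared)
modules = [
    {"module_number": "CM5111-1", "module_name": "SSAT", "tutor_id": "T1", "tutor_name": "A Lecturer"},
    {"module_number": "CM5112-1", "module_name": "Databases", "tutor_id": "T1", "tutor_name": "A Lecturer"},
    {"module_number": "CM5113-1", "module_name": "Networks", "tutor_id": "T2", "tutor_name": "B Lecturer"},
]


def extract_shared(rows, key, shared_attrs):
    """Move attributes shared between rows into their own table keyed on `key`,
    keeping `key` in the original rows as a foreign key."""
    new_table = {}
    remaining = []
    for row in rows:
        # Later rows overwrite earlier ones: assumes the shared data is consistent
        new_table[row[key]] = {attr: row[attr] for attr in shared_attrs}
        remaining.append({attr: value for attr, value in row.items()
                          if attr not in shared_attrs})
    return remaining, new_table


module_table, tutors_table = extract_shared(modules, "tutor_id", ["tutor_name"])
print(module_table)
print(tutors_table)
```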

Normalisation

FIRST NORMAL FORM - a relation (table) is in 1NF if it contains only atomic values and all repeating groups have been removed.

Normalisation

SECOND NORMAL FORM - a relation (table) is in 2NF if it is in 1NF and every non-key attribute is fully dependent on the whole primary key, not on just part of a compound key.
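
A partial dependency, where a non-key attribute depends on only part of a compound key, is what 2NF rules out. The Python sketch below tests candidate dependencies against sample rows; the table, its `grade` attribute and the helper names are hypothetical. Sample data can only disprove a dependency, never prove one, so in practice the dependencies come from the business rules.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal values of `lhs` always give equal values of `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[attr] for attr in lhs)
        right = tuple(row[attr] for attr in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """Non-key attributes that appear to depend on a proper subset of the compound key."""
    found = []
    for attr in non_key:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical 1NF table keyed on (student_number, module_number)
enrolments = [
    {"student_number": "S001", "module_number": "CM5111-1", "module_name": "SSAT", "grade": "B"},
    {"student_number": "S002", "module_number": "CM5111-1", "module_name": "SSAT", "grade": "A"},
    {"student_number": "S001", "module_number": "CM5112-1", "module_name": "Databases", "grade": "A"},
]

# module_name depends on module_number alone, so this table is not in 2NF
print(partial_dependencies(enrolments, ["student_number", "module_number"],
                           ["module_name", "grade"]))
```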

Normalisation

THIRD NORMAL FORM - a relation (table) is in 3NF if it is in 2NF and no non-key attribute is dependent on any other non-key attribute.
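
To reach 3NF, an attribute that depends on another non-key attribute is moved into a new relation keyed on that attribute. In this hypothetical Python sketch, `leader_room` (an invented attribute) depends on `module_leader`; the decomposition is checked by joining the two parts and confirming that the original rows come back.

```python
# Hypothetical 2NF MODULE table: module_leader determines leader_room,
# so leader_room depends on a non-key attribute (a transitive dependency)
module = [
    {"module_number": "CM5111-1", "module_leader": "A Lecturer", "leader_room": "K101"},
    {"module_number": "CM5112-1", "module_leader": "A Lecturer", "leader_room": "K101"},
    {"module_number": "CM5113-1", "module_leader": "B Lecturer", "leader_room": "K205"},
]


def project(rows, attrs):
    """Relational projection: keep only `attrs`, dropping duplicate rows."""
    out = []
    for row in rows:
        t = {attr: row[attr] for attr in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Join two lists of rows on every attribute name they share."""
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[attr] == r[attr] for attr in common)]


# Move the dependent attribute into a relation keyed on its non-key determinant
module_3nf = project(module, ["module_number", "module_leader"])
leader_3nf = project(module, ["module_leader", "leader_room"])

# Joining the parts gives back exactly the original rows (a lossless decomposition)
rejoined = natural_join(module_3nf, leader_3nf)
print(leader_3nf)
print(rejoined == module)
```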

Relational Data Analysis Form

Validates the LDM against the relations. Consists of:

- Unnormalised Form (UNF): attributes
- First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF): relations and attributes

RDA Form

Name: ________    Date: ________

Columns: UNF | 1NF | 2NF | 3NF | Result relation (each normal-form column lists the attributes at that stage)

Data Dictionary

Lists, for every field in every table:

- Table name
- Field name
- Field type
- Field size (if variable)
- Decimal places (if applicable)
- Description (if required)
- Other significant field properties
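
Much of the table, field and type information can be read from a DBMS's own catalogue rather than typed by hand. A minimal sketch using SQLite's `PRAGMA table_info` through Python's `sqlite3` module; the `Students` table here is hypothetical and only loosely follows the example. SQLite records declared types such as `VARCHAR(20)` but does not enforce the length, and descriptions would still have to be written by the analyst.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table loosely based on the Students example
conn.execute("""
CREATE TABLE Students (
    student_id        INTEGER PRIMARY KEY,
    student_firstname VARCHAR(20) NOT NULL,
    student_surname   VARCHAR(25) NOT NULL,
    fee_paid          DECIMAL(8,2)
)""")


def data_dictionary(conn):
    """List (table, field, declared type, not null?, primary key?) for every
    field in every table, read from SQLite's catalogue."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, ftype, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append((table, name, ftype, bool(notnull), bool(pk)))
    return entries


for entry in data_dictionary(conn):
    print(entry)
```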

Data Dictionary example

Tablename | Fieldname              | Fieldtype         | Length | DecPlaces | Description
Students  | Student ID             | Counter           | N/A    | N/A       |
Students  | Student firstname      | Text              | 20     | N/A       | Full first name (capitalised)
Students  | Student other initials | Text              | 5      | N/A       | Other initials, capitals, space separated
Students  | Student Surname        | Text              | 25     | N/A       | Surname, capitalised
Students  | Fee paid               | Number (currency) | N/A    | 2         | Fee paid
Students  | Date of Birth          | Date/Time         | N/A    | N/A       | Input mask Short Date, format Medium Date
Students  | Full Time?             | Yes/No            | N/A    | N/A       |
Etc.

The Domain

A domain is the “set” of items, together with its definition, to which an attribute belongs. Defining a domain once saves time when defining the attributes that belong to it. For example, Date of Birth, Course Start Date and Enrolment Date all belong to the DATE domain: data type date/time, format dd/mm/yyyy, non-unique, non-null.
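
A sketch of defining the DATE domain once and reusing it, in Python. The `Domain` class, the `validate` function and the snake_case attribute names are hypothetical; only the domain's properties (date, dd/mm/yyyy, non-unique, non-null) come from the slide. Uniqueness is recorded but not checked, since that needs the whole column rather than a single value.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """A domain defined once and shared by every attribute that belongs to it."""
    name: str
    fmt: str        # strptime format that values must follow
    nullable: bool
    unique: bool    # recorded only; enforcing it needs the whole column, not one value

    def parse(self, text: Optional[str]) -> Optional[date]:
        if text is None or text == "":
            if not self.nullable:
                raise ValueError(f"{self.name}: a value is required")
            return None
        # strptime raises ValueError for text that does not fit the format
        return datetime.strptime(text, self.fmt).date()


# The DATE domain from the slide: date/time, dd/mm/yyyy, non-unique, non-null
DATE = Domain(name="DATE", fmt="%d/%m/%Y", nullable=False, unique=False)

# Hypothetical attribute names, each simply pointing at the shared domain
ATTRIBUTE_DOMAINS = {
    "date_of_birth": DATE,
    "course_start_date": DATE,
    "enrolment_date": DATE,
}


def validate(record):
    """Check every attribute of a record against its domain."""
    return {attr: ATTRIBUTE_DOMAINS[attr].parse(value) for attr, value in record.items()}


print(validate({"date_of_birth": "02/01/1990", "enrolment_date": "21/09/2009"}))
```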

Further reading

- Rolland, chapters 3 and 4
- Hoffer, chapters 10 and 12
- Kendall & Kendall, chapter 17