Term 2, 2004, Lecture 5, Physical Design Marian Ursu, Department of Computing, Goldsmiths College

Physical Design

3

Introduction

Database Design Methodology

• requirements specification
• ER/EER modelling
• validation of ER/EER models; aggregation of different views
• transformation of ER/EER model into relational model
• normalisation
• physical design
• monitor and tune operational system

Outline

• overview of physical design
• base relations and enterprise constraints
  • known from before
• transactions analysis
• file organisation and indexes

Physical Design – Rationale

all the documentation produced until physical design represents a detailed specification of what we intend to build, of what is required

informally, physical design is the process that transforms these specifications into a good working system, using the functionality provided by the chosen DBMS

Overview of Physical Design

• express logical model using the DDL of the chosen/target DBMS
• design optimal storage
• design user views (in next lecture)
• design security mechanisms (in next lecture)

Express Logical Model in Target DBMS

• essentially covered in the first term
• design and implement base relations
• analyse and document derived data
• design enterprise constraints

Design Base Relations

• are domains supported?
• are attribute constraints supported?
  • NOT NULL, UNIQUE
• are keys supported?
  • candidate
    • primary
    • alternate (UNIQUE + NOT NULL)
  • foreign
    • referential integrity
    • FK rules
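
The checklist above can be tried out against a concrete DBMS. The sketch below is only an illustration: it uses SQLite (through Python's `sqlite3` module) as a stand-in for the target DBMS, and the `tutor` and `booking` tables, their columns and the weekday list are hypothetical. SQLite has no user-defined domains, so the domain is emulated with a CHECK constraint.

```python
import sqlite3

# Hypothetical schema, used only for illustration; SQLite stands in for the target DBMS.
DDL = """
CREATE TABLE tutor (
    tutor_id INTEGER PRIMARY KEY,                 -- primary key
    email    TEXT NOT NULL UNIQUE,                -- alternate key: UNIQUE + NOT NULL
    office   TEXT NOT NULL                        -- attribute constraint
);
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    tutor_id   INTEGER NOT NULL
               REFERENCES tutor(tutor_id)         -- referential integrity
               ON DELETE CASCADE,                 -- one possible FK rule
    day        TEXT NOT NULL
               CHECK (day IN ('Mon', 'Tue', 'Wed', 'Thu', 'Fri'))  -- domain emulated with CHECK
);
"""

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
    conn.executescript(DDL)
    return conn

def rejected(conn, sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn = build()
conn.execute("INSERT INTO tutor VALUES (1, 'mu@example.org', 'Room 1')")
```

With this schema, a second tutor with the same email, a booking for a non-existent tutor, and a booking on 'Sun' are all refused by the DBMS itself rather than by application code.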

Analyse and Document Derived Data

• derived attributes
  • not present in the relational model, but present in the EER model
  • it is possible that not all derived attributes are documented thus far
  • then, this issue can be given further consideration at this point
• trade-off
  • calculate a derived attribute each time it is being used
    • may be too time-expensive
  • store it in the database
    • redundancy, therefore more space required (space-expensive) and possibility of inconsistencies
    • to maintain consistency – integrity constraints or active rules; now this may become time-expensive
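
A minimal sketch of the two sides of this trade-off, assuming a hypothetical derived attribute "number of bookings per tutor" and using SQLite triggers as the active rules. The table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tutor (tutor_id INTEGER PRIMARY KEY, n_bookings INTEGER NOT NULL DEFAULT 0);
CREATE TABLE booking (booking_id INTEGER PRIMARY KEY, tutor_id INTEGER NOT NULL);

-- Active rules keeping the stored copy consistent; every insert/delete now costs extra work.
CREATE TRIGGER booking_added AFTER INSERT ON booking
BEGIN
    UPDATE tutor SET n_bookings = n_bookings + 1 WHERE tutor_id = NEW.tutor_id;
END;
CREATE TRIGGER booking_removed AFTER DELETE ON booking
BEGIN
    UPDATE tutor SET n_bookings = n_bookings - 1 WHERE tutor_id = OLD.tutor_id;
END;
""")

def computed_count(tutor_id):
    """Option 1: derive the value each time it is needed (no redundancy, more work per read)."""
    row = conn.execute("SELECT COUNT(*) FROM booking WHERE tutor_id = ?", (tutor_id,)).fetchone()
    return row[0]

def stored_count(tutor_id):
    """Option 2: read the redundant stored value (cheap read, maintained by the triggers)."""
    row = conn.execute("SELECT n_bookings FROM tutor WHERE tutor_id = ?", (tutor_id,)).fetchone()
    return row[0]

conn.execute("INSERT INTO tutor (tutor_id) VALUES (1)")
conn.executemany("INSERT INTO booking (tutor_id) VALUES (?)", [(1,), (1,), (1,)])
conn.execute("DELETE FROM booking WHERE booking_id = 1")
```

Both functions return the same value as long as the triggers are in place; dropping one of them is exactly the kind of inconsistency the slide warns about.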

Design Enterprise Constraints

• support for integrity constraints, e.g., SQL’s
  • CONSTRAINT … CHECK … NOT EXISTS …
• support for active rules or triggers, e.g., Postgres’:
  • CREATE RULE … ON UPDATE TO … WHERE … DO …
• example?
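
One possible example, as a sketch only: it uses an SQLite trigger rather than a Postgres rule, and the enterprise constraint itself ("a tutor may have at most 2 bookings on any one day") is invented for illustration, as are the table and trigger names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    tutor_id   INTEGER NOT NULL,
    day        TEXT NOT NULL
);

-- Hypothetical enterprise constraint: at most 2 bookings per tutor per day.
CREATE TRIGGER max_bookings_per_day
BEFORE INSERT ON booking
WHEN (SELECT COUNT(*) FROM booking
      WHERE tutor_id = NEW.tutor_id AND day = NEW.day) >= 2
BEGIN
    SELECT RAISE(ABORT, 'tutor already has 2 bookings on that day');
END;
""")

def book(tutor_id, day):
    """Try to insert a booking; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO booking (tutor_id, day) VALUES (?, ?)", (tutor_id, day))
        return True
    except sqlite3.DatabaseError:
        return False
```

The third call to `book(1, 'Mon')` is refused, while bookings for another day or another tutor still succeed.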

Design Optimal Storage

• criteria
  • maximise transaction throughput
  • minimise response time
  • minimise storage space
  • (they determine and are determined by the system resources)
• issues
  • transactions analysis
  • file organisation and indexes

Analyse Transactions

• identify critical transactions to the functionality of the information system
  • e.g., transactions that should never fail
• identify transactions that put a significant load on the DBMS
  • run frequently
  • processing time is high
• identify the periods when the database system is heavily used (down to individual transactions)
• identify type of users by whom or locations from where the database is going to be heavily used

Map Transactions [Paths] to Relations

• identifies the most used relations
• matrix: transactions × relations
• elements of the matrix:
  • Y/N (i.e. used, not used)
  • number of accesses (per hour, day …)

         T1        T2        T3        T4
         I R U D   I R U D   I R U D   I R U D
rel 1
rel 2
rel 3
rel 4
rel 5
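
The matrix above can be built mechanically once each transaction's accesses are listed. The following is a small sketch; the transactions T1..T4, the relation names and the access lists are all invented for illustration.

```python
# Hypothetical transactions and relations; the access lists are invented for illustration.
OPS = ("I", "R", "U", "D")  # insert, read, update, delete

accesses = {
    "T1": [("booking", "R"), ("tutor", "R")],
    "T2": [("booking", "I")],
    "T3": [("booking", "R"), ("student", "R")],
    "T4": [("booking", "D"), ("booking", "R")],
}

def cross_reference(accesses):
    """Return {relation: {transaction: set of operations}}."""
    matrix = {}
    for txn, pairs in accesses.items():
        for relation, op in pairs:
            matrix.setdefault(relation, {}).setdefault(txn, set()).add(op)
    return matrix

def render(matrix, transactions):
    """One row per relation; each cell lists the operations in I R U D order, '-' if unused."""
    lines = []
    for relation in sorted(matrix):
        cells = []
        for txn in transactions:
            ops = matrix[relation].get(txn, set())
            cells.append("".join(op if op in ops else "-" for op in OPS))
        lines.append(f"{relation:<8} " + " ".join(cells))
    return lines

def most_used(matrix):
    """Relations ordered by how many transactions touch them."""
    return sorted(matrix, key=lambda r: (-len(matrix[r]), r))

matrix = cross_reference(accesses)
```

Calling `most_used(matrix)` puts `booking` first here, since all four hypothetical transactions touch it. Replacing the Y/N cells with access counts per hour would give the second form of the matrix mentioned on the slide.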

Determine Frequency Information

• determine average, maximum and minimum number of times a transaction runs per hour, per day, …
  • transaction usage map
• determine peak periods
• determine demanding transactions that have in common some of the resources they access
  • problems due to locking
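
As a rough sketch of how such frequency figures might be derived, the code below computes the average, maximum and minimum runs per hour and the peak hour from a log of transaction runs. The log format and its contents are assumptions made for the example; in practice the figures usually come from estimates or from monitoring an existing system.

```python
from collections import Counter

# Hypothetical log: (transaction name, hour of day in which it ran).
log = [("A", 9), ("A", 9), ("A", 10), ("B", 9), ("A", 14), ("B", 14), ("B", 14), ("B", 14)]

def runs_per_hour(log, hours):
    """Return {transaction: {"avg": ..., "max": ..., "min": ...}} over the given hours."""
    counts = Counter(log)  # (transaction, hour) -> number of runs
    stats = {}
    for txn in sorted({t for t, _ in log}):
        per_hour = [counts[(txn, h)] for h in hours]  # hours with no runs count as 0
        stats[txn] = {
            "avg": sum(per_hour) / len(per_hour),
            "max": max(per_hour),
            "min": min(per_hour),
        }
    return stats

def peak_hour(log):
    """Hour with the most transaction runs overall (earliest hour wins a tie)."""
    totals = Counter(hour for _, hour in log)
    return max(sorted(totals), key=lambda h: totals[h])

stats = runs_per_hour(log, hours=range(9, 17))
```

Transactions whose peaks fall in the same hour and that touch the same relations are the ones to examine for locking problems.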

Bookings for Personal Tutor

(A) check availability
• see if tutor is available at a specific time

(B) see my appointments
• list all my appointments for given period

(C) see details of my appointments
• list all my appointments, including the tutor’s name and office

[Figure: ER diagram with the transaction paths (A), (B) and (C) marked on it]
• Tutor: name, office, email
• Student: name, programme, email
• Booking: day, time, topic
• relationships: With, For

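Transaction Usage Maps

frequency: per hour

[Figure: usage map drawn over the Tutor / Booking / Student ER diagram. Figures shown in the entities: Tutor 20, Student 300, Booking 1500. Relationships With and For, with multiplicities 1 and 0..* on each. Transaction paths (A), (B), (C) carry the annotations avg: 20, max: 200; avg: 40, max: 150; avg: 20, max: 100.]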

Analyse Data Usage

• more detailed analysis
• (relations) attributes and the type of access
  • updated attributes – not candidates for access structures
• attributes used in predicates
  • are candidates for access structures
• attributes involved in joins
  • candidates for access structures
• attributes affecting performance of critical transactions
  • higher priority for access structures
• transaction analysis form
  • refer to Connolly, p.490

Analyse Transactions - Conclusion

essentially, the transaction analysis identifies the critical aspects related to the usage of the database (e.g., relations used frequently, attributes involved in “expensive predicates”, …)

on its basis, file organisations and indexes can be chosen

File Organisation

• data has to be stored in an efficient way
  • all 4 operations (insert, retrieve, update and delete) require efficiency
• efficient has different meanings in different contexts
• different storage structures or file organisations represent efficient ways for different contexts; e.g.
  • “heap” structure is suitable for bulk-loading and bulk-retrieval (all student names, all programmes, …)
  • “hash” structure is suitable for “exact match” queries (student name = ‘Joe Bloggs’)
• note that a structure that is efficient in one context may not be efficient in a different context

Index

a structure that allows the DBMS to locate records in a table/file more quickly

the decision as to which attributes should be indexed, and which type of index (which type of file organisation) should be used, is determined by the results of the transaction analysis
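
To see the effect of an index on how the DBMS locates records, one can compare the query plan before and after creating it. The sketch below does this in SQLite; the `student` table, its contents and the index name are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, programme TEXT)")
conn.executemany(
    "INSERT INTO student (name, programme) VALUES (?, ?)",
    [(f"Student {i}", "Computing") for i in range(1000)],
)

QUERY = "SELECT programme FROM student WHERE name = ?"

def plan(sql):
    """Return the plan text SQLite reports for the query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("Student 42",)).fetchall()
    return " | ".join(row[-1] for row in rows)

before = plan(QUERY)  # no access structure on name: the whole table is scanned
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after = plan(QUERY)   # the plan now goes through idx_student_name
```

Printing `before` and `after` shows a table scan turning into an index search for the same exact-match predicate.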

File Organisation vs Index

• file organisation – method of storing data on disk, with or without the use of indexes
• index – data structure used to access records more quickly
  • primary and clustering index – part of the storage of the actual records; the records themselves are physically ordered according to the index
  • secondary index – auxiliary data structure; the records may be (usually are) unordered according to the index
• index is sometimes used to mean secondary index
• the issue of indexes is subsumed by the issue of file organisation

File Structures/Types

• Heap / unordered
• Indexed Sequential Access Method (ISAM)
• Hash
• B+ tree

Heap / Unordered

• records are written in the file in the order in which they are inserted, at the end of the file
  • insertion is efficient, in particular for bulk-loading
• there is no ordering
  • retrieval is very inefficient if it involves predicates/conditions
  • similarly update and deletion
• the space freed by deleted records is not automatically reused
  • administrator has to run routines
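
These properties can be made concrete with a toy in-memory model. This is only a sketch: a Python list stands in for the file, and the class and method names are invented for illustration.

```python
class HeapFile:
    """Unordered file: records are appended; deleted slots are only marked, not reused."""

    def __init__(self):
        self.slots = []  # None marks a deleted record

    def insert(self, record):
        self.slots.append(record)  # always at the end of the file, no search needed

    def find(self, predicate):
        # No ordering, so every slot has to be examined.
        return [r for r in self.slots if r is not None and predicate(r)]

    def delete(self, predicate):
        for i, r in enumerate(self.slots):
            if r is not None and predicate(r):
                self.slots[i] = None  # space is freed but not reused

    def compact(self):
        """The kind of housekeeping routine an administrator would run to reclaim space."""
        self.slots = [r for r in self.slots if r is not None]


heap = HeapFile()
for name in ["Ann", "Bob", "Cy"]:
    heap.insert({"name": name})
heap.delete(lambda r: r["name"] == "Bob")
heap.insert({"name": "Di"})  # goes to the end; Bob's old slot stays empty
```

After these operations the file occupies four slots for three live records, until `compact()` is run.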

Indexed Sequential Access (ISAM) / Ordered

• records are ordered on the basis of some attributes – the index field
• primary index – ordering attributes are a key
  • one index value to a tuple
• clustering index – ordering attributes are not a key
  • one index value to a group of tuples
• sequentially ordered secondary indexes can also be created
• what is the difference between a clustering index and a secondary sequential index?

Use of ISAM Files

• recommended
  • exact matching (based on the index field)
  • range of values (based on the index field)
• drawback
  • ISAM indexes are static (created when the file is created)
• not recommended
  • updates to index field
    • the access key sequence deteriorates
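
A simplified model of these points, assuming integer keys, a non-empty initial load and one unsorted overflow chain per page (real ISAM implementations differ in detail). The index is built once and never rebuilt, so later inserts land in overflow chains, which is one way the key sequence deteriorates. All names are hypothetical.

```python
import bisect

class IsamFile:
    """Sorted pages plus a static index of each page's first key, built once at load time."""

    def __init__(self, keys, page_size=3):
        ordered = sorted(keys)  # assumes at least one key
        self.pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
        self.index = [page[0] for page in self.pages]  # never rebuilt afterwards
        self.overflow = [[] for _ in self.pages]       # unsorted chains for later inserts

    def _page_for(self, key):
        # Last page whose first key is <= key (page 0 for keys below the first one).
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def contains(self, key):
        """Exact match: binary search in the index, then one page and its overflow chain."""
        p = self._page_for(key)
        return key in self.pages[p] or key in self.overflow[p]

    def between(self, low, high):
        """Range query: all keys with low <= key <= high, in order."""
        hits = []
        for p in range(self._page_for(low), self._page_for(high) + 1):
            hits += [k for k in self.pages[p] + self.overflow[p] if low <= k <= high]
        return sorted(hits)

    def insert(self, key):
        # The index is static, so new records go to the overflow chain of their page.
        self.overflow[self._page_for(key)].append(key)


f = IsamFile([10, 20, 30, 40, 50, 60, 70])
f.insert(45)
```

Exact-match and range lookups only visit the pages the index points to, but every insert after loading lengthens an overflow chain that has to be searched sequentially.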

Hash (“random” or “direct” access)

• a hash function calculates the address where each record is to be stored
  • the calculation is performed on the basis of some fields
  • records appear to be randomly distributed across the file space
  • the function should be chosen such that it leads to as good a distribution as possible of the records in the available space
• problem: most hashing functions do not guarantee a unique address, because the file space is much smaller than the number of possible values of the hash field
  • address generated by hash function: BUCKET (with SLOTS)
  • COLLISION (SYNONYMS): same bucket, different slots
• hash attributes – secondary index
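
The bucket-and-slot idea can be sketched as follows. The assumptions are mine: an integer hash field, division-remainder hashing, and a single shared overflow area for records that do not fit in a full bucket (the slide does not say how full buckets are handled). Class and method names are hypothetical.

```python
class HashFile:
    """Fixed number of buckets, each with a fixed number of slots; extra records overflow."""

    def __init__(self, n_buckets=4, slots_per_bucket=2):
        self.buckets = [[] for _ in range(n_buckets)]
        self.slots_per_bucket = slots_per_bucket
        self.overflow = []  # assumed overflow area for full buckets

    def address(self, key):
        # Division-remainder hashing on an integer hash field.
        return key % len(self.buckets)

    def insert(self, key, record):
        bucket = self.buckets[self.address(key)]
        if len(bucket) < self.slots_per_bucket:
            bucket.append((key, record))  # synonyms share the bucket, in different slots
        else:
            self.overflow.append((key, record))

    def find(self, key):
        """Exact-match lookup: one bucket, plus the overflow area if needed."""
        for k, record in self.buckets[self.address(key)]:
            if k == key:
                return record
        for k, record in self.overflow:
            if k == key:
                return record
        return None


h = HashFile()
h.insert(1, "a")
h.insert(5, "b")  # collides with key 1: same bucket, second slot
h.insert(9, "c")  # bucket is full, so this record goes to the overflow area
```

An exact-match lookup touches a single bucket in the common case, whereas a range query would have to visit every bucket, because neighbouring key values are scattered across the file.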

Use of Hash Files

• recommended
  • for retrievals based on exact matches, in particular when the access order (the order in which queries arise) is random
• not recommended
  • retrieval based on pattern match
  • retrieval based on ranges of values
  • retrieval based on fields other than the hash field
  • when the hash field is frequently updated

B+ Tree

• B – Balanced tree
• more versatile than the hash structure
• details – optional issue
• recommended
  • exact match (index field)
  • pattern matching (index field)
  • range of values (index field)
  • part index field specification
• advantage
  • B+ tree is dynamic – it grows as the relation grows
  • performance does not deteriorate with updates
• B+ tree – secondary index

Choosing Indexes

• consider results from transactions analysis
• primary/clustering index
  • attribute(s) used in joins
  • attribute(s) most often used to access the relation
• secondary indexes
  • trade-off: maintenance of an index vs. efficiency of queries
    • work in class on maintenance operations
• choosing secondary indexes
  • index primary key (if not already a primary index)
  • index attributes often involved in joins and selection criteria
  • do not index attributes which are frequently updated
  • …

File Organisation - Conclusion

pre-defined file structures exist that provide better efficiency for certain database operations in certain contexts

Conclusions

• physical design
  • what it consists of
• transaction analysis
  • identifies “hot-spots” of the database
• file organisation and indexes
  • make work with the “hot-spots” more efficient