
Data Structures, Algorithms and Database Programming

Semester 2/ Weeks 13-24

Database Programming

Nick Rossiter/Emma-Jane Phillips-Tait

Semester 2/ Week 13

Introduction

• Database programming
  – A program is defined simply as a sequence of instructions that a computer can interpret and execute
  – So SQL (Structured Query Language), the ISO standard language for relational databases, is a programming language

SQL - Classification

• SQL is the basis of all database programming
• As a language SQL is:
  – Non-procedural: specify the target, not the mechanism (what, not how)
  – Safe: negations are limited by context
  – Set-oriented: all operations are on entire sets
  – Relationally complete: has the power of the relational algebra
  – Functionally incomplete: does not have the power of a programming language like Java

SQL – Program Constructions

SELECT id, name, address
FROM student
WHERE name = 'Mary Brown';

• The FROM clause specifies the tables to be queried (source/range)
• The WHERE clause specifies a restriction on the values to be processed (predicate)
• The SELECT clause specifies what is to be retrieved (target)

Some properties of SQL re-visited

• Non-procedural
  – No loops or tests for end of file
• Set-oriented
  – The operation is automatically applied to all the rows in STUDENT
• Relationally complete
  – Restrict is shown here (all other relational operations are available)
• Functionally incomplete
  – Does not matter here if we just want information displayed

SQL Program - Example of Natural Join

SELECT student.id, name, address, year
FROM student, module_choice
WHERE name = 'Mary Brown'
AND module = 'CM503'
AND student.id = module_choice.id;

• The last line does the primary key – foreign key match
• “Give details of when a student called Mary Brown took module CM503”

Student (Id*, Name, Address)
Module_choice (Module*, Id*, Year)

• Module_choice.Id is a foreign key to Student.Id; it represents a path along which joins are made
• * indicates a component of the primary key

Data names in SQL are case insensitive; SQL values are case sensitive.

Student
Id* | Name       | Address
127 | Mary Brown | Hexham
296 | John Brown | Morpeth
654 | Mary Brown | Newcastle

Module_choice
Module* | Id* | Year
CM503   | 127 | 2003
CM503   | 654 | 2001
cm503   | 127 | 2002

For the above data values, what’s the answer?

Id  | Name       | Address   | Year
127 | Mary Brown | Hexham    | 2003
654 | Mary Brown | Newcastle | 2001

• The Name column contains only one value, 'Mary Brown' (as would a Module column, with 'CM503', if it were selected)
• Why only 2 rows? Why is one '127' match missing?
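The expected answer can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, assuming (as holds for SQLite's default BINARY collation) that = compares text case-sensitively; the column types are assumptions, while the names and data follow the tables above.

```python
import sqlite3

# In-memory stand-in for the Student and Module_choice tables above
# (column types are assumptions; Oracle would use char/number types).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE module_choice (
        module TEXT,
        id     INTEGER REFERENCES student(id),
        year   INTEGER,
        PRIMARY KEY (module, id)
    );
    INSERT INTO student VALUES (127, 'Mary Brown', 'Hexham'),
                               (296, 'John Brown', 'Morpeth'),
                               (654, 'Mary Brown', 'Newcastle');
    INSERT INTO module_choice VALUES ('CM503', 127, 2003),
                                     ('CM503', 654, 2001),
                                     ('cm503', 127, 2002);
""")

# The join from the slide. '=' compares text case-sensitively here,
# so the 'cm503' row never matches 'CM503'.
rows = conn.execute("""
    SELECT student.id, name, address, year
    FROM student, module_choice
    WHERE name = 'Mary Brown'
      AND module = 'CM503'
      AND student.id = module_choice.id
    ORDER BY student.id
""").fetchall()

for row in rows:
    print(row)
```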

Rewriting Joins as Intersections
• SQL is not necessarily run in the way you enter it
• You (or the system) could rewrite the join earlier as:

SELECT id, name, address
FROM student
WHERE name = 'Mary Brown'
AND id IN
  (SELECT id FROM module_choice
   WHERE module = 'CM503');

• There’s one difference. Why?

SQL controls the filing cabinet

• Defines data structures (CREATE TABLE, CREATE VIEW, …)

• Handles updates (INSERT, DELETE, COMMIT, ROLLBACK, …)

• Provides retrieval (SELECT)

• But it is not functionally complete in its interactive form

Functional Incompleteness in SQL

• No control statements such as:
  – Case, Repeat, If, While
• No substitution at run time:
  – e.g. … WHERE id = :idread
  – where idread is a program variable
• You don’t see travel agents typing in SQL statements to search for holiday vacancies
  – although they will be searching a relational database
• There is SQL underneath
  – but its functionality is increased through additional features
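Embedded SQL supplies the missing run-time substitution. A minimal sketch, using Python's sqlite3 module as the host language (an assumption; the lecture's own examples are Oracle-based), whose named-placeholder style happens to match the :idread notation above; find_student is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(127, "Mary Brown", "Hexham"), (296, "John Brown", "Morpeth")],
)

def find_student(idread):
    """Hypothetical helper: look up one student by id.

    :idread is a named placeholder. Its value is bound from the Python
    variable at run time, so the SQL text itself never changes.
    """
    return conn.execute(
        "SELECT id, name, address FROM student WHERE id = :idread",
        {"idread": idread},
    ).fetchone()

print(find_student(296))  # a matching row
print(find_student(999))  # no match: None
```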

Getting More out of Basic SQL

• To overcome functional incompleteness:
  – Pre-defined functions
  – Procedures (e.g. PL/SQL)
  – User-defined functions
  – Embedded SQL
  – Web-based servers (e.g. Microsoft/ASP, Oracle/JSP, Oracle/JDBC, MySQL/PHP)

Pre-defined Functions in SQL

• An SQL function:
  – is a method applied to a particular type
  – returns a single value
• There are many pre-defined functions:
  – can be used without any knowledge of how they are implemented
  – all can be used in the target (SELECT) and some in the predicate (WHERE)
  – used in areas such as string handling, simple statistics, date manipulation, type casting
• Use pre-defined functions where available to avoid writing your own code

Example of Predefined Function

SELECT student.id, name, address, year
FROM student, module_choice
WHERE name = 'Mary Brown'
AND upper(module) = 'CM503'
AND student.id = module_choice.id;

• UPPER(char) takes a value of type char and returns it converted to upper case
• In the example above it does not update module values in the module_choice table
• So how many rows are retrieved by this join?
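A sketch to check the count, again with Python's sqlite3 standing in for Oracle (an assumption). It reuses the sample data from the Student and Module_choice tables and also confirms that the stored module values are left unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE module_choice (module TEXT, id INTEGER, year INTEGER,
                                PRIMARY KEY (module, id));
    INSERT INTO student VALUES (127, 'Mary Brown', 'Hexham'),
                               (296, 'John Brown', 'Morpeth'),
                               (654, 'Mary Brown', 'Newcastle');
    INSERT INTO module_choice VALUES ('CM503', 127, 2003),
                                     ('CM503', 654, 2001),
                                     ('cm503', 127, 2002);
""")

# upper() is applied to a copy of each module value while the predicate
# is evaluated, so 'cm503' now matches 'CM503'.
rows = conn.execute("""
    SELECT student.id, name, address, year
    FROM student, module_choice
    WHERE name = 'Mary Brown'
      AND upper(module) = 'CM503'
      AND student.id = module_choice.id
    ORDER BY student.id, year
""").fetchall()
print(len(rows), rows)

# The stored values are unchanged: 'cm503' is still in lower case.
stored = conn.execute("SELECT module FROM module_choice ORDER BY year").fetchall()
print(stored)
```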

Other String-Handling Functions

• Include:
  – CONCAT(arg1, arg2)
    • concatenation of the two arguments arg1, arg2
  – TRIM(arg1)
    • removes leading and trailing blanks from arg1
  – LOWER(arg1)
    • translates arg1 to lower case
  – UPPER(arg1)
    • translates arg1 to upper case
  – SUBSTR(arg1, n, m)
    • returns the m characters of arg1 starting at position n, i.e. positions n…(n+m-1)
• arg1, arg2 are char types; n, m are integer types
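The behaviour of these functions can be tried out quickly in a sketch using Python's sqlite3 as a stand-in (an assumption; dialects differ, and CONCAT is missing from older SQLite versions, so the || operator, which Oracle also accepts, is used instead). The SUBSTR call shows that m is a length, not an end position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite (like Oracle) also writes concatenation as the || operator,
# used here instead of CONCAT(arg1, arg2).
row = conn.execute("""
    SELECT 'Mary' || ' Brown',
           TRIM('  Hexham  '),
           LOWER('CM503'),
           UPPER('cm503'),
           SUBSTR('Newcastle', 4, 3)   -- 3 characters from position 4
""").fetchone()
print(row)
```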

Predefined Aggregation Functions

• Functions operating on collections include:
  – AVG(setN)
    • returns average of setN
  – SUM(setN)
    • returns sum of setN
  – COUNT(setR)
    • returns count of setR
  – MAX(setT)
    • returns maximum of setT
• setN is a set of type number, setR is a set of type row, setT is a set of any type
• Sets may be formed as columns of values via:
  – a SELECT command on a table, or
  – a GROUP BY on a table
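A sketch of both ways of forming sets, using Python's sqlite3 (an assumption) and the Module_choice sample data with assumed column types. GROUP BY treats 'CM503' and 'cm503' as different groups because values are case sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE module_choice (module TEXT, id INTEGER, year INTEGER,
                                PRIMARY KEY (module, id));
    INSERT INTO module_choice VALUES ('CM503', 127, 2003),
                                     ('CM503', 654, 2001),
                                     ('cm503', 127, 2002);
""")

# Whole table as one set: a single row of results.
total = conn.execute("SELECT COUNT(*), MAX(year) FROM module_choice").fetchone()

# GROUP BY forms one set per distinct module value (case-sensitive here).
per_module = conn.execute("""
    SELECT module, COUNT(*), SUM(year), AVG(year), MAX(year)
    FROM module_choice
    GROUP BY module
    ORDER BY module
""").fetchall()

print(total)
for row in per_module:
    print(row)
```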

Predefined Date Functions

• Include:
  – SYSDATE
    • no arguments, returns current date (and time)
  – MONTH(arg3)
    • returns month component of arg3
  – YEAR(arg3)
    • returns year component of arg3
  – MONTHS_BETWEEN(arg3, arg4)
    • the number of months between arg3 and arg4
  – MONTH and YEAR are DB2 names; in Oracle SQL use EXTRACT(MONTH FROM arg3) and EXTRACT(YEAR FROM arg3), or through some interfaces the escape syntax e.g. {fn MONTH(arg3)}
• arg3, arg4 are date types
• (arg3-arg4) gives the number of days between the two dates
• Time handling is also available within the date type

Semester 2/ Week 14

Programming with SQL*Plus

• Predefined functions play an increasing role
  – sometimes termed built-in functions
• They are rather unstable to some extent:
  – new functions in Oracle 9i
  – some redefinitions of Oracle 8i functions
  – not always upwards compatible
  – what works in one system may not work in another without some tweaking

Reference Material

• So programmers must always consult reference material

• Text books and lecture notes are not reference manuals

• All database manuals are available on-line, e.g.:
  – DB2 notes cited in exercises for week 13 (should be very similar to Oracle as same standard)
  – Oracle 9i notes available from (prefix id by unn/):
    http://cgweb1.unn.ac.uk/SubjectAreaResources/database/oracle/doc/

• Sound advice is: Read the Manual!

Reading Variable Values into SQL Programs

• Makes programs dynamic
• Number of methods:
  – Substitution variables
    • &variable in programs
  – Accept statements
    • user-defined prompt and type for a variable value
  – Script parameters
    • variables assigned values on executing a script file
• Such reads make programs versatile
  – can run with values specified by the user at run-time

Substitution Variables

select *
from patient
where pid = '&pno';

• When run, prompts the user for a value for pno
• Quotes indicate a char value

select *
from patient
where pid <= '&&pno'
and pid >= '&&pno';

• Double && means only one prompt is made
  – even if the same variable occurs more than once in the program

• In all input operations quotes are also used for dates but not for numbers.

Accept Statement

• ACCEPT sp_variable type PROMPT 'string';
  – where upper case is literal
  – sp_variable is the SQL*Plus variable being assigned
  – type is the type of the SQL*Plus variable
  – string is the prompt
• Example:

accept pat_no char prompt 'Enter patient id:';
select *
from patient
where pid = '&pat_no';

• The value entered for pat_no is available for the rest of the session

Script Parameters
• Can run a script file called S.sql (the file type sql is assumed by default) by:
  @S
• Say S contains:

select *
from patient
where pname = '&1' AND pid = '&2';

• Then can run the script by:
  @S 'Fred' '1';
• 'Fred' is parameter 1 and '1' is parameter 2
• May need to use File/Open in SQL*Plus to set the directory

• In more serious programming with SQL
  – often need to know whether a variable has been initialised or not
• An un-initialised variable has a null value
  – unless a default has been supplied
• Cannot search for nulls by comparing with = against an empty string such as ''

select * from patient where pname IS NULL;

• finds rows where patient name is null

select * from patient where pname IS NOT NULL;

• finds rows where patient name is not null
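A sketch of why IS NULL is needed, using Python's sqlite3 (an assumption; patient is reduced to two columns). One dialect difference to bear in mind: Oracle treats an empty string '' as NULL, whereas SQLite keeps '' as a distinct value, so the sketch only compares = NULL with IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (pid TEXT PRIMARY KEY, pname TEXT)")
# Python's None is stored as SQL NULL.
conn.executemany("INSERT INTO patient VALUES (?, ?)", [("1", "Jones"), ("2", None)])

# A comparison with NULL using '=' is unknown, never true: no rows.
eq_null = conn.execute("SELECT pid FROM patient WHERE pname = NULL").fetchall()
is_null = conn.execute("SELECT pid FROM patient WHERE pname IS NULL").fetchall()
not_null = conn.execute("SELECT pid FROM patient WHERE pname IS NOT NULL").fetchall()

print(eq_null, is_null, not_null)
```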

Spooling

• Useful to record a whole session

• From the File menu in SQL*Plus:
  – can set a (text) file as the recipient of all output, including commands
  – need SET ECHO ON to have a record of everything

Intermediate Results
• Building up results in stages is often a good idea:
  – can check intermediate results for correctness
  – may be able to re-use intermediate results in more than one query
• Views may be used for this purpose:
  – no data storage costs for a view
  – updated automatically as data changes (in effect)
    • reflects the latest data position
• Tables are less satisfactory:
  – duplicate data storage
  – out of date, as a snapshot of the data is held

Examples of Views

create view pv as
(select patient.*, visits.did, visits.vdate
 from patient, visits
 where patient.pid = visits.pid);

• Natural join of patient and visits, contains:
  – pid, pname, address, dobirth, date_reg, did, vdate

create view dv as
(select doctor.*, visits.pid, visits.vdate
 from doctor, visits
 where doctor.did = visits.did);

• Natural join of doctor and visits, contains:
  – did, dname, date_start, pid, vdate
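A sketch showing that a view holds no data of its own and so reflects the latest base-table contents, using Python's sqlite3 as a stand-in (an assumption). The patient and visits tables are cut down to a few columns and the dates are stored as ISO text; both are simplifications, not the lecture's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (pid TEXT PRIMARY KEY, pname TEXT, address TEXT);
    CREATE TABLE visits (pid TEXT, did TEXT, vdate TEXT,
                         PRIMARY KEY (pid, vdate));
    CREATE VIEW pv AS
        SELECT patient.*, visits.did, visits.vdate
        FROM patient, visits
        WHERE patient.pid = visits.pid;
    INSERT INTO patient VALUES ('5', 'Smith', 'Heaton');
    INSERT INTO visits VALUES ('5', 'D1', '2002-04-10');
""")

before = conn.execute("SELECT COUNT(*) FROM pv WHERE pid = '5'").fetchone()[0]

# Only the base table is changed; the view stores no data of its own,
# so the next query against pv already sees the new visit.
conn.execute("INSERT INTO visits VALUES ('5', 'D2', '2002-05-01')")
after = conn.execute("SELECT COUNT(*) FROM pv WHERE pid = '5'").fetchone()[0]

print(before, after)
```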

Operations on Views 1

• Can select and search as if they were tables
• Updates may cause problems (not considered here)
• Examples:

a) select * from pv;
b) select * from pv where pid = '5';
c) create view pvv as
   (select pv.*, action, vaccinated
    from pv, vaccinations vc
    where pv.pid = vc.pid
    and pv.vdate = vc.vdate);

• vc is an alias for vaccinations
• Does a natural join of pv and vaccinations over pid and vdate
• Natural join pvv contains:
  – pid, pname, address, dobirth, date_reg, did, vdate, action, vaccinated

Operations on Views 2

• View pvv is the natural join of patient, visits and vaccinations

• It can be presented to users as a structure:
  – for ease of searching (just a where clause)
  – in which no knowledge of joins is required

select distinct pid, pname
from pvv
where upper(vaccinated) = 'TYPHOID'
and address = 'Heaton'
and vdate < '25-apr-2002';

• Searches for values in all three base tables with the joins (that is, logical connections) already made in pvv

Setting the SQL*Plus Environment

• Over 50 variables control the environment in which SQL*Plus runs.

• Can see all their current settings through:
  show all;
• They cover appearance of prompts, formats on screen, transaction settings, recovery, escape, compatibility, …
• A potential pitfall for imported applications if a different environment is assumed

SHOW var;

• shows the current setting for a particular variable

Examples of Environment Variables 1

• Autocommit
  – if on, updates are committed after each update command (insert, delete, update)
  – if off, updates are not committed after each update command
• Time
  – if on, all prompts are preceded by the time, giving time stamping
  – if off, time is not displayed with the prompt

Examples of Environment Variables 2

• Linesize
  – set to an integer giving the width of line display
• SQLPrompt
  – can vary the default prompt from SQL>
• Feedback
  – if on, report on number of rows found
  – if off, give no report
• Echo
  – if on, echo input on screen
  – if off, do not echo input

Setting Environment Variables

• SET variable value;
  – variable is the environment variable
  – value is the new value
• Examples:
  – set autocommit on;
  – set sqlprompt input>;
  – set linesize 100;

Semester 2/ Week 15

SQL*Plus Scripting 1
• Plus points:
  – Same SQL language as in interactive mode
    • can test programs interactively first
    • includes predefined (built-in) functions
  – Fast development possible
    • rapid prototyping
    • get results and feedback quickly
    • no cumbersome environment
  – Variable inputs
    • parameters, substitution, accept

SQL*Plus Scripting 2

• Plus points (continued):
  – Can have multiple script files
    • each file created by a simple text editor
  – Can have a master script file
    • calling others in sequence
  – Or can nest script files more generally
    • scripts can call other scripts (default file type sql), e.g.
      @S6
      start S6
• Problems with scripts:
  – interpreted each time they are run
    • not verified and compiled
    • optimisation of SQL code done at each run time
    • poor performance
  – no control environment
    • procedural actions lacking (case, if, while, for, repeat)
    • no error handling, resulting in outright failures or ignoring of messages
• Problems with scripts (continued):
  – Lack of control by business (via DBA -- DataBase Administrator)
  – How do we permit scripts for usage by particular people?
    • Can anyone write a script to do anything they like?
  – If people write scripts themselves to handle business rules
    • how do we know they’ve implemented the rules in the same way?

Example: Different Representation of Same Rule

update patient set dobirth = '20-feb-1932' where pid = '3';
select round(months_between(sysdate,dobirth)/12) as age_mb from patient;
select trunc((sysdate-dobirth)/365.25) as age_dd from patient;
select trunc((sysdate-dobirth)/365) as age_ddnl from patient;
select (extract(year from sysdate)-extract(year from dobirth)) as age_yr from patient;

• Above:
  – alter dobirth for patient '3'
  – run four queries, each one, according to the user, calculating the age

Results 14-Feb-2004

pid      | 1  | 2  | 3  | 4
Age_mb   | 34 | 20 | 72 | 22
Age_dd   | 33 | 20 | 71 | 22
Age_ddnl | 33 | 20 | 72 | 22
Age_yr   | 34 | 21 | 72 | 22

Comments on Table
• Minor differences such as these
  – are often more of a problem than major differences
• If there are big differences
  – these rapidly become obvious
  – e.g. paying an interest rate ten times more than others
• Age differences here could play havoc with:
  – social security benefits
• Later we will create a function to calculate the age precisely
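The figures for patient 3 can be recomputed outside the database. This Python sketch assumes SYSDATE = 14-Feb-2004 with no time part; months_between is a hypothetical helper approximating Oracle's MONTHS_BETWEEN, and exact_age only illustrates a precise calculation. It is not the function developed later in the module.

```python
from datetime import date
import math

def months_between(later, earlier):
    """Hypothetical approximation of Oracle's MONTHS_BETWEEN for dates
    without a time part: whole months, plus the day difference over a
    31-day month when the days of the month differ (the end-of-month
    special case is ignored)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day != earlier.day:
        months += (later.day - earlier.day) / 31
    return months

def exact_age(today, birth):
    """Completed years: one less if this year's birthday is still to come."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - (1 if before_birthday else 0)

today, dob = date(2004, 2, 14), date(1932, 2, 20)   # patient '3'
days = (today - dob).days

ages = {
    "age_mb": round(months_between(today, dob) / 12),
    "age_dd": math.trunc(days / 365.25),
    "age_ddnl": math.trunc(days / 365),
    "age_yr": today.year - dob.year,
    "exact": exact_age(today, dob),
}
print(days, ages)
```

The four query-style results match column 3 of the table, and only the 365.25-day rule agrees with the exact age here, because patient 3's birthday falls six days after the query date.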

Production Environment

• Encourages:
  – business rules in one place
    • application of rules then controlled by DBA
  – users need permission to apply rules
    • permission is granted/revoked by DBA
• Discourages:
  – duplicated, potentially inconsistent, rules
  – access by users to anything they like

SQL Procedure

• An important technique
• Part of PL/SQL in Oracle
  – Procedural Language/Structured Query Language
• Stored procedures are part of the SQL standard (SQL/PSM)
  – approximate portability from one system to another
• Techniques are available for:
  – procedural control (case, if, while, …)
  – parameterised input/output
  – security

Oracle PL/SQL

• Not available in Oracle 8i Lite
• Available in Oracle 9i at Northumbria
• Available in Oracle 9i Personal Edition for Windows (XP/NT/2000/98) and Linux
  – http://otn.oracle.com/software/products/oracle9i/index.html
  – c. 1.4 GB download (needs broadband); 3 CDs
• Useful guide to PL/SQL:
  – http://www-db.stanford.edu/~ullman/fcdb/oracle/or-plsql.html
  – Using Oracle PL/SQL, Jeffrey Ullman, Stanford University

Procedures are First-class Database Objects

• Procedures are held in database tables
  – under the control of the database system
  – in the data dictionary:

select object_type, object_name
from user_objects
where object_type = 'PROCEDURE';

• user_objects is a data dictionary view maintained by Oracle
• object_type is an attribute of user_objects holding the value 'PROCEDURE' (upper case) for procedures
  – other values for object_type include 'TABLE', 'VIEW'
• object_name is the user-assigned name for the object, e.g. 'PATIENT'

Procedures aid Security

• Privileges on Tables:
  – Select: query the table with a select statement
  – Insert: add new rows to the table with an insert statement
  – Update: update rows in the table with an update statement
  – Delete: delete rows from the table with a delete statement
  – References: create a constraint that refers to the table
  – Alter: change the table definition with the alter table statement
  – Index: create an index on the table with the create index statement

Privileges on Tables
• SQL statement, issued by DBA:
  – GRANT select, insert, update, delete ON patient TO cgnr2;
  – no grants to cgnr3 for table access
• Allows user cgnr2 to issue SQL commands:
  – beginning with SELECT, INSERT, UPDATE, DELETE on table patient
• But this user cannot use the REFERENCES, ALTER, INDEX privileges on table patient
  – e.g. cannot create a constraint referring to patient, alter table patient or create an index on it
• User cgnr3 does not know that table patient exists

Privileges on Procedures
• The SQL statement

– GRANT execute ON add_patient TO cgnr3;

• allows user cgnr3 to execute the procedure called add_patient

• So user cgnr3 can add patients

– presumably the task of add_patient

• but cannot do any other activity on the patient table

– including SELECT

• So procedures give security based on tasks
  – a powerful task-based security system

Semester 2/ Week 16

SQL Procedure Construction
• Simple example, SQL*Plus window:

SQL> create or replace procedure add_patient as
  2  begin
  3  insert into patient values('99','Smith','Newcastle','12-mar-1980');
  4  end;
  5  /

Warning: Procedure created with compilation errors.

SQL> show errors
Errors for PROCEDURE ADD_PATIENT:

LINE/COL ERROR
-------- -----------------------------------------
3/1      PL/SQL: SQL Statement ignored
3/13     PL/SQL: ORA-00947: not enough values

Technique

• Have procedure code in a text file managed by a simple editor
  – e.g. Notepad

create or replace procedure add_patient as
begin
insert into patient values('99','Smith','Newcastle','12-mar-1980');
end;
/

• Copy and paste code from the text file into the SQL*Plus window
• Oracle does keep a copy in its data dictionary
• Many users work from text files

Features of Procedure

• CREATE OR REPLACE PROCEDURE add_patient AS
  – either add or over-write the procedure called add_patient
    • needs care
    • could over-write an existing procedure
• IS is an alternative for AS
• BEGIN and END
  – start and finish the block
• INSERT is a standard SQL statement
• / on a line of its own sends the code to Oracle, which compiles and stores the procedure

Error Tracking

• 'with compilation errors'
  – problem(s) encountered in compilation
• Look at these through the SQL*Plus command
  – SHOW ERRORS (abbreviation SHO ERR)
• Diagnostics:
  – statement at line 3 ignored
    • as there are not enough values (line 3, column 13) for patient
    • five columns in patient, four in the insert statement
  – so in compilation, tables are checked for compatibility with the procedure's operations
• ORA-00947 is an Oracle return code for 'not enough values'
• Only execute procedures compiled without errors

Try again

SQL> create or replace procedure add_patient (reg in char) as
  2  begin
  3  insert into patient values('99','Smith','Newcastle','12-mar-1980',reg);
  4  end;
  5  /

Procedure created.

Parameters

• Have added a 5th value to the values list
• Also added a parameter:
  – reg
    • type char (as in SQL types) and in (input, read-only)
  – other types at this level are number, date
• Message 'Procedure created' means:
  – no errors found
  – procedure can be executed
  – procedure is held in Oracle’s data dictionary

Data Dictionary entry for procedure

SQL> select object_type, object_name
     from user_objects
     where object_type = 'PROCEDURE';

OBJECT_TYPE
------------------
OBJECT_NAME
--------------------------------------------------
PROCEDURE
ADD_PATIENT

Executing Procedure

SQL> execute add_patient('14-feb-2002');

PL/SQL procedure successfully completed.

SQL> select * from patient where pid = '99';

PID    PNAME
------ --------------------
ADDRESS
-----------------------------------------------
DOBIRTH   DATE_REG
--------- ---------
99     Smith
Newcastle
12-MAR-80 14-FEB-02

Features of Execution

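• '14-feb-2002' is the value for parameter reg; it is passed as a char string and converted implicitly (using the default date format) when inserted into the date column date_reg
• Other values are hard-wired in the procedure
• Message '… successfully completed'
  – no errors during run
• Subsequent SELECT confirms
  – new data entered for patient with pid = '99'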

Now run procedure again

SQL> execute add_patient('14-feb-2002');

BEGIN add_patient('14-feb-2002'); END;

ERROR at line 1:

ORA-00001: unique constraint (CGNR1.PKP) violated

ORA-06512: at "CGNR1.ADD_PATIENT", line 3

ORA-06512: at line 1

Error – why?

• Attempt to add a row with the same primary key as the last run ('99')
• So violation, at line 3 of the procedure, of primary key constraint CGNR1.PKP
  – CGNR1 is the user id (the schema owning the table)
  – PKP is the constraint name from CREATE TABLE:

create table patient (
  pid char(6) constraint pkp primary key, …

• 'ORA-00001: unique constraint violated'
  – Oracle return code and associated message
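The same failure can be reproduced in a sketch with Python's sqlite3 standing in for Oracle (an assumption). add_patient here is a hypothetical Python analogue of the procedure, with dates held as ISO text; SQLite reports the violation as an IntegrityError with its own message wording rather than ORA-00001.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        pid      TEXT CONSTRAINT pkp PRIMARY KEY,
        pname    TEXT,
        address  TEXT,
        dobirth  TEXT,
        date_reg TEXT
    )""")

def add_patient(reg):
    """Hypothetical analogue of the week-16 add_patient(reg): every value
    except the registration date is hard-wired, so pid is always '99'."""
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute(
            "INSERT INTO patient VALUES ('99', 'Smith', 'Newcastle', '1980-03-12', ?)",
            (reg,),
        )

add_patient("2002-02-14")            # first run: row inserted

error_message = None
try:
    add_patient("2002-02-14")        # second run: same primary key '99'
except sqlite3.IntegrityError as err:
    error_message = str(err)

print(error_message)
count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(count)
```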

All values from parameters

SQL> CREATE OR REPLACE PROCEDURE add_patient (pid in char, pname in char,
       address in char, dobirth in date, regdate in date)
     AS
     BEGIN
       insert into patient values(pid,pname,address,dobirth,regdate);
       DBMS_OUTPUT.PUT_LINE ('Insert attempted');
     END;
     /

Procedure created.

No data hard-wired/output strings

• Usually meaningless to have hard-wired data values
  – need dynamic input at run-time
  – note two types: char, date
  – values may be captured through SQL Forms
• Output strings
  – vary from system to system
  – in Oracle:
    • use DBMS_OUTPUT.PUT_LINE
    • needs the earlier SQL*Plus command: set serveroutput on
  – note the output here is unconditional and vague

Execution with all values as parameters

SQL> execute add_patient('124','Smith','Edinburgh','13-nov-1980','27-dec-2002');
Insert attempted

PL/SQL procedure successfully completed.

SQL> select * from patient where pid = '124';

PID    PNAME
------ --------------------
ADDRESS
-----------------------------------------------
DOBIRTH   DATE_REG
--------- ---------
124    Smith
Edinburgh
13-NOV-80 27-DEC-02

Make columns explicit

SQL> CREATE OR REPLACE PROCEDURE add_patient (pat_id in char, pat_name in char,
       pat_address in char, pat_dobirth in date, pat_regdate in date)
     AS
     BEGIN
       insert into patient(pid,pname,address,dobirth,date_reg)
         values(pat_id,pat_name,pat_address,pat_dobirth,pat_regdate);
       DBMS_OUTPUT.PUT_LINE ('Insert attempted');
     END;
     /

Procedure created.

• Specifying the columns for patient makes the procedure immune to any later changes in the order of columns in patient

Semester 2/ Week 17

Transactions -- Rationale

• Consider two clients booking airline tickets
• There are 2 seats left on a flight
• Client A wants 2 seats:
  – 12:02 makes initial request
  – 12:06 confirms purchase through booking form
  – 12:08 authorises credit card payment
• Client B wants 2 seats:
  – 12:03 makes initial request
  – 12:05 confirms purchase through booking form
  – 12:09 authorises credit card payment
• Situation needs careful control

Some Possibilities
• Clients A and B are both told 2 seats are free in initial enquiries
• B confirms purchase before A
  – but A may still proceed
• A attempts credit card debit first
  – if successful, A secures tickets at 12:08
• B then attempts credit card debit
  – if successful, B secures tickets at 12:09
    • potentially over-writing A’s tickets
    • A has paid for tickets no longer his/hers

Requirements 1
• When client A beats B in the initial enquiry:
  – they should form a queue (serialisability)
  – B must wait for A to finish
• Different kinds of finish for A:
  – successful
    • completes booking form
    • makes credit card debit
    • stores results (commit): number of seats available is now zero
    • writes transaction log and finishes
    • B cannot proceed with purchase as no tickets left

Requirements 2

  – unsuccessful
    • may not complete booking form
    • may not have funds on credit card
    • undoes any database changes (rollback) and finishes: number of seats available is still 2
    • B can now proceed to attempt to purchase the 2 tickets left
• The techniques required emulate business practice
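The successful and unsuccessful finishes can be sketched as a single transaction, using Python's sqlite3 as a stand-in (an assumption). The flight table, the book function and the payment_ok flag are hypothetical; the sketch shows commit and rollback for one client at a time, not the queueing of concurrent clients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight (flight_no TEXT PRIMARY KEY, seats_free INTEGER)")
conn.execute("INSERT INTO flight VALUES ('NCL100', 2)")
conn.commit()

def book(flight_no, seats, payment_ok):
    """Hypothetical booking transaction: reserve seats, then take payment.
    Either both happen (commit) or neither does (rollback)."""
    try:
        cur = conn.execute(
            "UPDATE flight SET seats_free = seats_free - ? "
            "WHERE flight_no = ? AND seats_free >= ?",
            (seats, flight_no, seats),
        )
        if cur.rowcount == 0:
            raise ValueError("not enough seats")
        if not payment_ok:            # stands in for a failed card debit
            raise ValueError("card declined")
        conn.commit()                 # successful finish
        return True
    except ValueError:
        conn.rollback()               # unsuccessful finish: undo the update
        return False

def seats_free():
    return conn.execute("SELECT seats_free FROM flight").fetchone()[0]

print(book("NCL100", 2, payment_ok=False), seats_free())  # A fails: False 2
print(book("NCL100", 2, payment_ok=True), seats_free())   # B succeeds: True 0
print(book("NCL100", 2, payment_ok=True), seats_free())   # too late: False 0
```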

Transactions -- ACID
• A transaction is a unit of operation on a database
  – typically comprises a collection of individual actions
    • e.g. in SQL INSERT, UPDATE, DELETE, SELECT
• Satisfies ACID requirements:
  – Atomicity
    • collection of operations is viewed as a single data process
  – Consistency
    • data integrity is preserved
  – Isolation
    • no interaction between one transaction and another
    • intermediate results not viewable by others
  – Durability
    • once completed, effect of transaction is guaranteed to persist

Transactions in SQL
• Logical units of work
• A group of related operations that
  – must be performed successfully
    • before any changes to the database are finalised
• Variable size:
  – entire run on SQL*Plus
    • e.g. spend 2 hours inserting data
  – single command in SQL*Plus
    • e.g. one insert command
  – one execution of a procedure
    • e.g. one run of add_patient (week 16)

SQL approach may be informal

• No explicit
  – BEGIN transaction, END transaction
• With autocommit OFF
  – SET AUTOCOMMIT OFF
• Implicit BEGIN transaction by:
  – start of SQL*Plus session (or the first statement after a COMMIT or ROLLBACK)
• Implicit END transaction by:
  – end of SQL*Plus session (or an explicit COMMIT or ROLLBACK)

SQL Transaction commands

• Commit;
  – saves current database state
  – releases resources held
  – roughly equivalent to Save in MS Word (the session carries on)
• Rollback;
  – returns database state to that at start of transaction
  – releases resources held
  – equivalent to dismiss/do not save changes in MS Word

Use of Commands
• Commit/Rollback
  – explicitly entered in:
    • SQL*Plus window interactively
    • PL/SQL code including procedures
  – implicitly entered:
    • on normal EXIT from Oracle (commit in 9i, rollback in 8i Lite)
    • on abnormal exit from Oracle, e.g. dismiss (rollback)
    • after each update command in SQL*Plus (commit), when autocommit is ON (or IMMEDIATE)
    • after a change to data definition, e.g. alter table (commit), whatever the autocommit setting

ACID in Oracle 1

• Atomicity
  – all commands in a transaction form a single logical group
• Consistency
  – integrity checks within transaction
• Isolation
  – data modified by a transaction not visible to others until the end of the transaction

ACID in Oracle 2

• Durability
  – on Commit
    • the changes are first recorded in the transaction (redo) log file
      – this log file may be held in several locations
    • confirmation of log file writes ends the transaction
    • the changed data blocks are written to the database files later
  – if crash (e.g. of disk) after commit
    • restore last save of database file
    • run transaction log on database forward from:
      – save point to last transaction that committed

Partial Rollbacks

• Savepoints can be declared in the SQL*Plus window or PL/SQL:
  SAVEPOINT label;   (label is a name chosen by the programmer)
• The command
  ROLLBACK TO label;
  undoes changes back to the label in the program or window
• Many different savepoints can be declared
• Resources are held by locks
• In SQL, lock management is done:
  – automatically, with locks released on COMMIT and ROLLBACK
• Users and programmers can rely on defaults
• However, some knowledge is useful for:
  – tuning in production systems, using the LOCK command for efficiency
  – understanding problems in running concurrent transactions
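A sketch of a partial rollback, using Python's sqlite3 as a stand-in (an assumption); SQLite supports SAVEPOINT and ROLLBACK TO with the same meaning. The connection is opened with isolation_level=None so that the transaction statements are sent exactly as written.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN or
# COMMIT, so the transaction statements below run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE patient (pid TEXT PRIMARY KEY, pname TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO patient VALUES ('1', 'Jones')")
conn.execute("SAVEPOINT after_jones")
conn.execute("INSERT INTO patient VALUES ('2', 'Smith')")
conn.execute("ROLLBACK TO after_jones")   # undoes only the Smith insert
conn.execute("COMMIT")                     # keeps the Jones insert

names = [row[0] for row in conn.execute("SELECT pname FROM patient ORDER BY pid")]
print(names)
```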

Example Lock Table

Task    | Table   | Row | Lock type
CGNR1-1 | Patient | 8   | W
CGNR1-2 | Patient | 1   | R
CGSA1-1 | Patient | 7   | W
CGSA1-2 | Patient | 1   | R

Locking Modes
• R (read) or shared
  – any number of tasks can read the same data items concurrently
  – CGSA1-2 and CGNR1-2 both read patient 1
• W (write) or exclusive
  – when writing data, need exclusive access
  – otherwise values can change while in use by others
  – so CGNR1-1 is the only task that can access Patient 8
  – and CGSA1-1 is the only task that can access Patient 7
• None -- no entry in table

Locking Granularity
• Can lock at level of:
  – table
  – page (unit of disk storage)
  – row
• Coarse locks, for instance a whole table:
  – give small lock tables (not that many tables)
  – much contention for resources (many users queue for table access)
• Fine locks, for instance a single row:
  – give large lock tables (many rows included)
  – less contention for resources (few users queue for row access)
• Oracle defaults give fine (row-level) locking

Semester 2/ Week 18

Transactions in Procedures

• On the surface, very easy
• If everything goes well (at end):
  – COMMIT;
• If things go badly (at end):
  – ROLLBACK;
• Problem is controlling bad outcomes:
  – handling exceptions
  – giving useful feedback

Week 16 Example – with Commit

SQL> CREATE OR REPLACE PROCEDURE add_patient (pat_id in char, pat_name in char,
       pat_address in char, pat_dobirth in date, pat_regdate in date)
     AS
     BEGIN
       insert into patient(pid,pname,address,dobirth,date_reg)
         values(pat_id,pat_name,pat_address,pat_dobirth,pat_regdate);
       DBMS_OUTPUT.PUT_LINE ('Insert attempted');
       COMMIT;
     END;
     /

Procedure created.

Review of Assignment Procedure

• Asked to add a procedure to add vaccination data

• Generate:
  – one successful run
  – three unsuccessful runs
• Here we review closely the results of each run

ADD_VACC procedure

SQL> CREATE OR REPLACE PROCEDURE add_vacc (pat_id in char, vis_vdate in date, vis_act in number, vac_vacc in char)
  2  AS
  3  BEGIN
  4  insert into vaccinations(pid,vdate,action,vaccinated)
     values(pat_id,vis_vdate,vis_act,vac_vacc);
  5  DBMS_OUTPUT.PUT_LINE ('Insert attempted');
  6  END;
  7  /

Procedure created.

Successful Run 1a

SQL> execute add_vacc('2','16-dec-1999',3,'cholera');

SQL> select * from vaccinations
     where pid = '2' and action = 3;

PID    VDATE         ACTION VACCINATED
------ --------- ---------- --------------------
2      06-AUG-91          3 polio
2      16-DEC-99          3 cholera

SQL> commit;

Commit complete.

Successful Run 1b

• No error messages
• Message 'PL/SQL procedure successfully completed' is significant. It means:
  – any exception raised during the run has been properly handled
  – it does not necessarily mean data has been added successfully
• COMMIT should have been the last line in the procedure
• Here the user has made the decision to commit

Unsuccessful Run 1a

SQL> execute add_vacc('2','16-dec-1999',1,'cholera');

BEGIN add_vacc('2','16-dec-1999',1,'cholera'); END;

ERROR at line 1:

ORA-00001: unique constraint (CGNR1.PKVAC) violated

ORA-06512: at "CGNR1.ADD_VACC", line 4

Unsuccessful Run 1b

• Error message returned:
  – ORA-00001 indicates a non-unique primary key value
  – text message 'unique constraint violated' spells out the nature of the problem
  – CGNR1.PKVAC is the name of the constraint in the CREATE TABLE definition for vaccinations:
    • constraint pkvac primary key (pid,vdate,action)
• Note there is no message about successful completion
  – does not necessarily mean unsuccessful addition
  – means that the exception raised in the INSERT operation has not been handled within the procedure

Unsuccessful Run 2a

SQL> execute add_vacc('2','17-dec-1999',1,'cholera');

BEGIN add_vacc('2','17-dec-1999',1,'cholera'); END;

ERROR at line 1:

ORA-02291: integrity constraint (CGNR1.SYS_C0080698) violated - parent key not found

ORA-06512: at "CGNR1.ADD_VACC", line 4

Unsuccessful Run 2b
• Error message returned:
  – ORA-02291 indicates the foreign key entered does not match a primary key value (in visits)
  – text message 'parent key not found' spells out the nature of the problem
  – constraint from CREATE TABLE: foreign key(pid,vdate) REFERENCES visits(pid,vdate);
  – CGNR1.SYS_C0080698 is the system-generated name of this unnamed constraint
    • named constraints give more information
• Again no message about successful completion
  – as exception not handled
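Both failures can be reproduced, and handled rather than left to stop the run, in a sketch using Python's sqlite3 as a stand-in (an assumption). The table definitions mirror the constraints quoted above, with dates as ISO text; add_vacc here is a hypothetical Python analogue. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and its messages differ from ORA-00001 and ORA-02291.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")     # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE visits (
        pid TEXT, vdate TEXT, did TEXT,
        PRIMARY KEY (pid, vdate)
    );
    CREATE TABLE vaccinations (
        pid TEXT, vdate TEXT, action INTEGER, vaccinated TEXT,
        CONSTRAINT pkvac PRIMARY KEY (pid, vdate, action),
        CONSTRAINT fkvac FOREIGN KEY (pid, vdate) REFERENCES visits (pid, vdate)
    );
    INSERT INTO visits VALUES ('2', '1999-12-16', 'D1');
""")

def add_vacc(pat_id, vis_vdate, vis_act, vac_vacc):
    """Hypothetical analogue of add_vacc, with the error handled, not fatal."""
    try:
        with conn:
            conn.execute("INSERT INTO vaccinations VALUES (?, ?, ?, ?)",
                         (pat_id, vis_vdate, vis_act, vac_vacc))
        return "inserted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

results = [
    add_vacc("2", "1999-12-16", 3, "cholera"),  # parent visit exists
    add_vacc("2", "1999-12-16", 3, "cholera"),  # same primary key again
    add_vacc("2", "1999-12-17", 1, "cholera"),  # no visit on that date
]
for result in results:
    print(result)
```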

Attempted unsuccessful run

SQL> execute add_vacc('2','16-dec-1999','4','cholera');

SQL> select * from vaccinations
     where pid = '2' and action = 4;

PID    VDATE         ACTION VACCINATED
------ --------- ---------- --------------------
2      16-DEC-99          4 cholera

• Worked, as the char value '4' entered for the numeric parameter vis_act (column action) was type cast automatically to a number
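SQLite's type affinity gives a similar automatic conversion, shown in this sketch with Python's sqlite3 (an assumption; only two columns are kept). The dialects differ for non-numeric text: Oracle would reject 'four' for a number with ORA-01722 (invalid number), whereas SQLite stores it unchanged as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vaccinations (pid TEXT, action INTEGER)")
# Both action values are passed as Python strings, i.e. as text.
conn.execute("INSERT INTO vaccinations VALUES (?, ?)", ("2", "4"))
conn.execute("INSERT INTO vaccinations VALUES (?, ?)", ("2", "four"))

# '4' looks like an integer, so INTEGER affinity converts it; 'four' does not.
rows = conn.execute(
    "SELECT action, typeof(action) FROM vaccinations ORDER BY rowid"
).fetchall()
print(rows)
```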

Unsuccessful Run 3a

SQL> execute add_vacc('2','16-dec-1999','4',cholera);

BEGIN add_vacc('2','16-dec-1999','4',cholera); END;

ERROR at line 1:

ORA-06550: line 1, column 38:

PLS-00201: identifier 'CHOLERA' must be declared

ORA-06550: line 1, column 7:

PL/SQL: Statement ignored

Unsuccessful Run 3b

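• Error message returned:
  – ORA-06550 reports a PL/SQL compilation error at line 1, column 38 of the BEGIN … END block wrapping the call
  – PLS-00201 gives the cause: the parameter value cholera is not in quotes
  – therefore it is taken as an identifier (variable)
  – which has not been declared to the system
• Again no message about successful completion
  – the call itself failed to compile, so add_vacc was never run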

Exception Handling PL/SQL

• Essential part of any program
• Particularly needed for updates
  – open-ended nature of user inputs
• But also needed for searches
  – e.g. may not find any matching data
• An exception is raised when an operation:
  – fails to perform normally
• A non-handled exception leads to program failure

Exceptions Raised

• With input particularly:
– Cannot specify all Oracle error codes in advance
– Too many codes to specify
– Some rule exceptions though can be emphasised
• Need specific exceptions
• And general (catch-all) exceptions

Complete PL/SQL procedure

• CREATE OR REPLACE PROCEDURE proc_name (parameters) AS
• [DECLARE] local_vars
• BEGIN
• executable_code
• EXCEPTION exception_code
• END;
• /

Explanation
• Upper case: literal (as is)
• Lower case: to be substituted
• [DECLARE] is omitted in procedures but is part of the full definition for PL/SQL blocks
• executable_code:
– SQL commands, assignments, condition checking, text output, transactions
• exception_code:
– event handling, transactions
• proc_name is the procedure name
• local_vars are variables declared for use within the procedure (standard SQL types + Boolean types)

Example Procedure - part 1
CREATE OR REPLACE PROCEDURE add_patient (pat_id in char, pat_name in char,
  pat_address in char, pat_dobirth in date, pat_regdate in date) AS
  pid_too_high exception;
  PRAGMA EXCEPTION_INIT(pid_too_high,-20000);
BEGIN
  insert into patient(pid,pname,address,dobirth,date_reg)
  values(pat_id,pat_name,pat_address,pat_dobirth,pat_regdate);
  DBMS_OUTPUT.PUT_LINE ('Insert attempted');
  IF TO_NUMBER(pat_id) > 500 THEN
    RAISE pid_too_high;
  END IF;
  COMMIT;

Example Procedure - part 2

EXCEPTION
  WHEN pid_too_high THEN
    DBMS_OUTPUT.PUT_LINE ('pid too high');
    ROLLBACK;
END;
/

Explanation 1
• pid_too_high exception;
– declares pid_too_high as a user-defined exception (an exception is raised and handled; it does not hold a value)
• PRAGMA EXCEPTION_INIT(pid_too_high,-20000);
– instruction to the compiler: associates the exception pid_too_high with error number -20000
• IF TO_NUMBER(pat_id) > 500 THEN RAISE pid_too_high; END IF;
– IF … THEN … END IF construction: enforces a business rule that pid <= 500 by raising exception pid_too_high when this state occurs
– TO_NUMBER is needed because pat_id is char, and a character comparison would treat '77' as greater than '500'

Explanation 2

• EXCEPTION
– opens the exception handling part of the procedure
• WHEN … THEN …;
– defines actions when a particular exception occurs

Flow of Action 1
• If no exception raised
– insert is performed
– commit takes place
– procedure terminates with ‘successful’ message
• If specific exception for business rule raised
– insert is performed
– exception pid_too_high is raised in IF code
– execution of main code immediately finishes
– code in EXCEPTION section after WHEN pid_too_high is executed, including rollback
– procedure terminates with ‘successful’ message

Flow of Action 2

• If another exception raised (on insert, e.g. primary key violation)
– insert is not performed
– exception is raised in procedure
– execution of main code immediately finishes
– as no further exception handlers are declared
• procedure terminates with:
– error reports
– no ‘successful’ message
• Need catch-all exception handlers (e.g. WHEN OTHERS)

Semester 2/ Week 19

Selects in Procedures 1

SQL> CREATE OR REPLACE PROCEDURE sel_patient AS
  2  BEGIN
  3  select * from patient;
  4  END;
  5  /
Warning: Procedure created with compilation errors.
SQL> sho err
……
3/1 PLS-00428: an INTO clause is expected in this SELECT statement

• A procedure is no substitute for scripting here
– Cannot put in simple selects
• SELECT is used in procedures to:
– Fetch one row at a time
• An exception may be generated when we fetch:
– no rows
– multiple rows
• A cursor construction is used to handle multiple rows in an orderly fashion
• SELECT attribute_list INTO variable_list
– is the basic format
• Lists can be singular or multiple
• Multiple entries are comma delimited
• Attribute 1 goes into variable 1, 2 into 2, …
– Variables are declared in the Declare section
• Must be of a type compatible with that in CREATE TABLE …
• Single row retrieval is guaranteed
– When the WHERE clause searches only on the primary key or an alternate key
– No need for cursor here

SELECT – Single Row –Partial Attributes

SQL> CREATE OR REPLACE PROCEDURE sel_patient
  (pat_id in char)
AS
  pname_var char(20);
BEGIN
  select pname into pname_var
  from patient
  where pid = pat_id;
  DBMS_OUTPUT.PUT_LINE (pname_var);
END;
/

Procedure created.

Explanation

• Input is pid
• Local variable pname_var is declared of the same type
– as in CREATE TABLE for patient
• Attribute value pname is passed into:
– Variable pname_var
– For the one row where pid = pat_id (e.g. ‘1’)
• The value of pname_var is then displayed

Run of Single Record Procedure

SQL> set serveroutput on

SQL> execute sel_patient('1');

• Above gives patient name ‘Fred’ for patient with id ‘1’

Automatic variable typing

• In declarations

pname_var patient.pname%TYPE;

• Gives pname_var same type as pname in patient

• Good practice:
• ensures types of table attributes and procedure variables are exactly the same

Exceptions – no rows found

• Exception which needs handling is:
– when no rows are found
• PL/SQL provides a pre-defined exception:
– NO_DATA_FOUND
– Can test with WHEN clause in EXCEPTION part of procedure
• To avoid procedure error at run time:
– Include this exception handler
– Or use an equivalent technique (cursor attributes)

Example – Single Row Retrieval with Exception

SQL> CREATE OR REPLACE PROCEDURE sel_patient
  (pat_id in char)
AS
  pname_var patient.pname%TYPE;
BEGIN
  select pname into pname_var
  from patient
  where pid = pat_id;
  DBMS_OUTPUT.PUT_LINE (pname_var);
EXCEPTION
  WHEN no_data_found THEN
    DBMS_OUTPUT.PUT_LINE ('pid does not exist');
END;
/

Procedure created.

Run with exception

SQL> execute sel_patient('77');
pid does not exist
PL/SQL procedure successfully completed.

• Error message comes from exception
• Exception handled, so successful completion
– Even though nothing useful achieved (no pid ‘77’)

Retrieval of Complete Row

• Declare variable (instead of pname_var):

pat_row patient%ROWTYPE;
• pat_row is a rowtype
– Holds one row of patient data
– Types as in patient table
– Refer to columns by pat_row.column
• e.g. pat_row.pname addresses:
– Column pname in row pat_row
• Use the concatenation operator || to join multiple items in output

Revised Procedure with Rowtype

SQL> CREATE OR REPLACE PROCEDURE sel_patient2
  (pat_id in char) AS
  pat_row patient%ROWTYPE;
BEGIN
  select * into pat_row from patient
  where pid = pat_id;
  DBMS_OUTPUT.PUT_LINE ('Name is:' || pat_row.pname
    || 'Address is:' || pat_row.address);
EXCEPTION
  WHEN no_data_found THEN
    DBMS_OUTPUT.PUT_LINE ('pid does not exist');
END;
/

Execution of Procedure

SQL> execute sel_patient2('1');

Name is:Fred Address is:Newcastle

Selections of Multiple Rows

• PL/SQL deals with one row at a time

• If SELECT potentially retrieves more than one row:
– Procedure still compiles
– Will work with retrieval of 0 or 1 row
– Will fail with more than 1 row

• Consider retrieval on patient name

Procedure for Retrieval on Name

SQL> CREATE OR REPLACE PROCEDURE sel_patient3
  (pat_name in char) AS
  pat_row patient%ROWTYPE;
BEGIN
  select * into pat_row from patient
  where pname = pat_name;
  DBMS_OUTPUT.PUT_LINE ('Id is:' || pat_row.pid
    || 'Address is:' || pat_row.address);
EXCEPTION
  WHEN no_data_found THEN
    DBMS_OUTPUT.PUT_LINE ('pname does not exist');
END;
/

Execution of Procedure

SQL> execute sel_patient3('Fred');
Id is:1 Address is:Newcastle
PL/SQL procedure successfully completed.
*********************************
SQL> execute sel_patient3('smith');
BEGIN sel_patient3('smith'); END;
*
ERROR at line 1:
ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at "CGNR1.SEL_PATIENT3", line 5
ORA-06512: at line 1

‘Fred’ appears once in patient; ‘smith’ appears twice (in my current data). Can use the predefined exception too_many_rows to avoid the error.

Cursors

• Cannot rely on luck with searches which may retrieve multiple rows

• Declare cursor (before BEGIN) as the select statement

• Have in executable part:
– Open cursor
– Process set, row by row, until exit
– Close cursor

• Can have multiple cursors

Cursor Declaration

CURSOR p IS

select * from patient

where pname = pat_name;

• The cursor p addresses
– the set defined by the SELECT statement

• No INTO clause is needed here

Cursor Executable

OPEN p;
LOOP
  FETCH p INTO pat_row;
  EXIT WHEN p%NOTFOUND;
  DBMS_OUTPUT.PUT_LINE ('Id is:' || pat_row.pid
    || 'Address is:' || pat_row.address);
END LOOP;
CLOSE p;

Explanation
• OPEN p
– Retrieves set of rows satisfying select statement
– Sets pointer to 1st row
• LOOP
– Start of instructions for processing a row
• FETCH
– Transfers data from current row to variables
– Sets pointer to next row
• EXIT WHEN p%NOTFOUND
– Exits loop when no row was transferred in last fetch
• END LOOP
– Ends processing of current row; returns to LOOP
• CLOSE p
– Closes cursor and releases resources

Processing of Data

• Within FETCH and END LOOP
– Can do any processing required for application
• Statistical calculations
• Re-packaging of data
• Complex reports
• Transfers to other tables
• Integrity checks
• Amalgamations of data from other cursors
• Running out of rows (or finding none) is detected through the cursor attribute %NOTFOUND
– not by an exception from the SELECT statement

Complete Procedure
CREATE OR REPLACE PROCEDURE sel_patient4 (pat_name in char) AS
  pat_row patient%ROWTYPE;
  CURSOR p IS select * from patient
    where pname = pat_name;
BEGIN
  OPEN p;
  LOOP
    FETCH p INTO pat_row;
    EXIT WHEN p%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE ('Id is:' || pat_row.pid || 'Address is:' || pat_row.address);
  END LOOP;
  CLOSE p;
END;
/

Execution

SQL> execute sel_patient4('smith');
Id is:42 Address is:grimsby
Id is:43 Address is:grimsby

SQL> execute sel_patient4('Fred');
Id is:1 Address is:Newcastle

SQL> execute sel_patient4('Nigel');
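For comparison outside Oracle, the same OPEN / FETCH / EXIT WHEN %NOTFOUND / CLOSE pattern can be sketched with Python's built-in sqlite3 module. This is an analogue, not PL/SQL. fetchone() returning None plays the role of p%NOTFOUND. The in-memory patient table and its rows are assumed sample data modelled on the runs above.

```python
import sqlite3

def sel_patient4(conn: sqlite3.Connection, pat_name: str) -> list[str]:
    """Cursor-style loop over all patients with a given name (sketch)."""
    lines = []
    p = conn.cursor()
    # OPEN p: run the query that defines the set of rows
    p.execute("select pid, pname, address from patient where pname = ?", (pat_name,))
    while True:                      # LOOP
        pat_row = p.fetchone()       # FETCH p INTO pat_row
        if pat_row is None:          # EXIT WHEN p%NOTFOUND
            break
        lines.append(f"Id is:{pat_row[0]} Address is:{pat_row[2]}")
    p.close()                        # CLOSE p
    return lines

# Sample data (assumed), modelled on the executions shown above
conn = sqlite3.connect(":memory:")
conn.execute("create table patient (pid text primary key, pname text, address text)")
conn.executemany("insert into patient values (?, ?, ?)",
                 [("1", "Fred", "Newcastle"),
                  ("42", "smith", "grimsby"),
                  ("43", "smith", "grimsby")])
for name in ("smith", "Fred", "Nigel"):
    print(name, sel_patient4(conn, name))
```

As with sel_patient4, a name with no matching rows ('Nigel') produces no output lines rather than an exception, because the loop simply exits on the first unsuccessful fetch.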

Useful Reference Book

• Oracle 9i: PL/SQL Programming– Develop Powerful PL/SQL Applications

• by Scott Urman

• Oracle Press

• McGraw-Hill (2002)

Semester 2/ Week 20

PL/SQL Review

• From perspective of assignment 5

• Plus previous exercises

Exercise week 19 - Declare Section

• CREATE OR REPLACE PROCEDURE ...
– age number;
– mondiff number;
– daydiff number;
– action_too_high exception;
– vdate_too_early exception;
– pragma ...
• Local variables for use within procedure
– age, mondiff, daydiff for age calculations
• Exceptions to be RAISEd during run if business rule broken
• Pragma for compiler instructions (not important to logic)

Executable - Age Calculation
• select * INTO pat_row from patient where pname = pat_name;
– places a row from the table into the PL/SQL variable pat_row
– if no row is found, the system raises the exception NO_DATA_FOUND (TOO_MANY_ROWS if more than one row matches)
age:=extract(year from sysdate) - extract(year from pat_row.dobirth);
mondiff:=extract(month from sysdate) - extract(month from pat_row.dobirth);
daydiff:=extract(day from sysdate) - extract(day from pat_row.dobirth);
IF mondiff < 0 THEN age:= age - 1; END IF;
IF mondiff = 0 AND daydiff < 0 THEN age:= age - 1; END IF;

• Note use of the SQL built-in function EXTRACT
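The two IF adjustments can be checked outside the database. The following is a minimal Python sketch. It assumes datetime.date values stand in for sysdate and pat_row.dobirth. The function name age_on is hypothetical.

```python
from datetime import date

def age_on(dobirth: date, today: date) -> int:
    """Age in whole years, using the same steps as the PL/SQL code above."""
    age = today.year - dobirth.year
    mondiff = today.month - dobirth.month
    daydiff = today.day - dobirth.day
    if mondiff < 0:                    # birthday month not reached yet this year
        age -= 1
    if mondiff == 0 and daydiff < 0:   # in the birthday month, birthday not reached yet
        age -= 1
    return age

print(age_on(date(1954, 3, 19), date(2004, 3, 19)))  # 50: birthday is today
print(age_on(date(1954, 3, 20), date(2004, 3, 19)))  # 49: birthday is tomorrow
```

The month test handles birthdays later in the year, and the day test only matters when the months are equal.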

Executable - Transfer of Data
IF age >= 50 THEN
  DBMS_OUTPUT.PUT_LINE ('Inserting pid: ' || pat_row.pid);
  INSERT INTO patover50(pid,pname,address,age)
  VALUES(pat_row.pid, pat_row.pname, pat_row.address, age);
END IF;

• If PL/SQL variable age is 50 or over then:
– output message
– insert into table
– inserted values are
• pat_row variables from patient
• PL/SQL variable: age

Assignment 5

• Assignment 5: CM503/CM517
• Set: end week 19 (Friday, 19th March 2004)
• Assessed: in seminars during week 21 (Monday, 29th March - Thursday, 1st April 2004)
• The assignment extends work done in weeks 18 and 19. The solutions for these exercises are on Blackboard.

• The client now wishes to revise the add vaccination procedure (call it, say, add_vacc2) so that it does the following in total:

Business Rules
• To raise exceptions when rules broken:
1. No more than two vaccinations ... per day.
2. If over 75 in age, no more than one vaccination … per day.
3. The vaccination date >= 1st January 2003.
4. The combination of cholera and typhoid on same day ... is not permitted.
5. A vaccination which is within safe period ... shall not be given.

Exceptions

• Exception:
– “We have a problem”
– Fatal error for procedure
• Need to get out
• Give useful info to user
– In PL/SQL never attempt to recover position
• Rollback
– undo any changes made
– release resources

Strategy
• Insert data first
• Then look at consistency (against rules)
• So add vaccination data
– Then look at business rules
• For instance:
– cannot assess number of actions
– or see whether both cholera and typhoid given
– until new data is added
• If new data breaks rules, then raise exception and undo changes (rollback)

Types of Exception
• Business Rules
– Typically rules not specified in CREATE TABLE
• may vary slightly from procedure to procedure
– Determined by inspecting data when provisionally added
– RAISE exception by code in procedure when rule broken
– Give error message to user
– Rollback (undo changes)
• Table (General) Constraints
– Specified in CREATE TABLE
– Exception is raised by system automatically
– Can handle with WHEN OTHERS …
– Display error code (SQLCODE) and Rollback
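The insert-first strategy and the two kinds of exception can be tried without Oracle. The sketch below uses Python's built-in sqlite3 module as a stand-in. RuleBroken plays the part of a user-defined exception, and except sqlite3.Error plays the part of WHEN OTHERS. Only business rule 1 is checked. The table layout, the function name add_vacc2_sketch and the sample vaccines are all assumptions for illustration.

```python
import sqlite3

class RuleBroken(Exception):
    """Hypothetical stand-in for a PL/SQL user-defined exception."""

def add_vacc2_sketch(conn, pid, vdate, action, vaccinated):
    try:
        # Insert first (sqlite3 implicitly opens a transaction before the INSERT)
        conn.execute("insert into vaccinations values (?, ?, ?, ?)",
                     (pid, vdate, action, vaccinated))
        # Then check rule 1 against the data as it now stands
        (count,) = conn.execute(
            "select count(*) from vaccinations where pid = ? and vdate = ?",
            (pid, vdate)).fetchone()
        if count > 2:
            raise RuleBroken("no more than two vaccinations per day")
        conn.commit()
        return "inserted"
    except RuleBroken as e:        # business rule exception raised by our own code
        conn.rollback()            # undo the provisional insert
        return f"rule broken: {e}"
    except sqlite3.Error as e:     # catch-all, like WHEN OTHERS (e.g. key violation)
        conn.rollback()
        return f"error: {e}"

conn = sqlite3.connect(":memory:")
conn.execute("create table vaccinations (pid text, vdate text, action integer, "
             "vaccinated text, primary key (pid, vdate, action))")
print(add_vacc2_sketch(conn, "2", "2003-06-01", 1, "tetanus"))  # inserted
print(add_vacc2_sketch(conn, "2", "2003-06-01", 2, "polio"))    # inserted
print(add_vacc2_sketch(conn, "2", "2003-06-01", 3, "rabies"))   # rule broken: no more than two vaccinations per day
print(add_vacc2_sketch(conn, "2", "2003-06-01", 1, "tetanus"))  # reports the primary key violation
```

After the third call is rolled back, only two rows remain for that patient and day. The fourth call never gets past the INSERT, so it is reported by the catch-all handler.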

Parameters

• “Values in/out to/from the outside world”
• Typed as:
– IN (input), OUT (output), IN OUT (both)

– char, number, date (broad-brush types)

• Input as for add_vacc: – patient id, visit date, action number, vaccine given.

• So procedure always runs with 4 input values

Data Transfers
• Common to write verified values to a variety of tables (logs, checks, safety)
• For each vaccination given (i.e. each validated treatment):
a. update the vaccinations table
b. insert into the table VACC_RECORD (which you should create) the following:
pid, pname, address, age, vdate, action, vaccinated, expiry_date
– 1st, 5th, 6th, 7th given as input parameters

Remaining Values?
• 2nd, 3rd from look up in Patient table
• 4th by calculation on dobirth in Patient table
• 8th by calculation on valid_for
– could use SQL function Add_Months(date,lasting_year*12)
• Need to retrieve patient data for supplied pid
– Calculate age for pid

• Need to retrieve information on vaccine in valid_for table

Implementing Business Rules

• Need to write SQL statement to determine whether rule holds or fails

• Often declared as CURSOR (e.g. c, d, e, or descriptive name)
– Could have one cursor per rule

• but may have multiple rules on one cursor

– Then OPEN a cursor (after INSERT of new data made):

• FETCH data

• Look at cursor attributes

Cursor Attributes

• c%FOUND
– means last FETCH from cursor c successful
• c%NOTFOUND
– means last FETCH from cursor c unsuccessful
• c%ROWCOUNT
– gives running total of number of rows retrieved in FETCHes so far

Testing of Cursor Attributes
• EXIT WHEN c%NOTFOUND;
– terminates current loop when rows finished
• IF d%FOUND THEN RAISE vacc_already; END IF;
– if a vaccination still within its safe period is found, raise exception
• IF e%ROWCOUNT > 12 THEN RAISE too_many_modules; END IF;
– on 13th fetch of module row, raise exception
– immediately go to EXCEPTION part; too_many_modules will be of exception type, have pragma and an exception handler

Basic Procedure Structure

• CREATE … name procedure, parameters .. AS
• [DECLARE] … PL/SQL variables, exception variables, pragmas, cursors
• BEGIN … general messages, INSERTs, deriving data, testing of cursors against rules, commit (if data satisfies rules)
• EXCEPTION … handle business rule problems and general constraint violations, rollback
• END;

Oracle’s PL/SQL Approach
• Fairly typical for relational databases
• Previous slide shows general structure
• Can use experience here in:
– other SQL systems (procedure is standard)
– scripting systems (e.g. PHP/Oracle or PHP/MySQL)
• Useful for placements and final-year projects
• In JDBC and other Java-based embedded approaches:
– the Java type system takes the place of the PL/SQL type system

Semester 2/ Week 21

Database Design

• After producing logical design with elegant maintainable structures:
• Need to do physical design to make it run fast.
• Performance is often more important in database applications than in more general information system design:
– Emphasis on number of transactions per second (throughput)

Database Design Methodologies

• Produce default storage schema
• May be adequate for small applications
• For large applications, much further tuning required
• Physical design is the technique
• Concepts: memory (main/disk), target disk architecture, blocks, access methods, indexing, clustering.

Aims of Physical Design

• Fast retrieval – usually taken as <= 5 disk accesses
– Since a disk access is very slow compared to other access times, the number of disk accesses is often used as an indicator of performance
• Fast placement – within 5 disk accesses
– Insertion of data, may be in middle of file not at end
– Deleting data, actual removal or tombstone
– Updating data, including primary key and other data

Retrieval/Placement

• Distinguish between actions involving primary and secondary keys

• Primary key is that determined by normalisation
– May be single or multiple attributes
– Only one per table
• Secondary keys
– Again may be single or multiple attributes
– Many per table
– Include attributes other than the primary key

• Complications such as candidate keys are omitted in this part of the course

Access Method

• An access method is the software responsible for storage and retrieval of data on disk

• Handles page I/O between disk and main memory
– A page is a unit of storage on disk
– Pages may be blocked so that many are retrieved in a single disk access
• Many different access methods exist
– Each has a particular technique for relating a primary key value to a page number

Processing of Data

• All processing by software is done in main memory

• Blocks of pages are moved
– from disk to main memory for retrieval by user
– from main memory to disk for storage by user

• Access method drives the retrieval/storage process

Cost Model

• Identify cost of each retrieval/storage operation
• Access time to disk to read/write a page:
$D = \text{seek time} + \text{rotational delay} + \text{transfer time}$
– seek time: time to move head to required cylinder
– rotational delay: time for the required data to rotate under the head, once the head is over the required track
– transfer time: time to transfer data from disk to main memory
• Typical value for D = 15 milliseconds (msecs), or $15 \times 10^{-3}$ secs

Other times:

• C = average time to process a record in main memory = 100 nanoseconds (nsecs) = $100 \times 10^{-9} = 10^{-7}$ secs.
• R = number of records/page
• B = number of pages in file
• Note that D > C by roughly $10^{5}$ times

Access Method I: the Heap

• Records (tuples) are held on file:
– in no particular order
– with no indexing
– that is in a ‘heap’ – unix default file type
• Strategy:
– Insertions usually at end
– Deletions are marked by tombstones
– Searching is exhaustive from start to finish until required record found

The Heap – Cost Model 1

• Cost of complete scan: $B(D + RC)$
– For each page, one disk access D and the processing of R records taking C each.
– If R = 1000, B = 1000 (file contains 1,000,000 records)
– Then cost $= 1000(0.015 + 1000 \times 10^{-7}) = 1000(0.0150 + 0.0001) = 1000(0.0151) = 15.1$ secs
• Cost for finding a particular record: $B(D + RC)/2$ (scan half the file on average) = 7.55 secs
• Cost for finding a range of records, e.g. student names beginning with sm: $B(D + RC)$ (must search whole file) = 15.1 secs
• Insertion: $2D + C$
– Fetch last page (D), process record (C), write last page back again (D).
– Assumes:
• all insertions at end
• system can fetch last page in one disk access
• Cost $= (2 \times 0.015) + 10^{-7} \approx 0.030$ secs
• Deletions: $B(D + RC)/2 + C + D$
– Find particular record (scan half file, $B(D + RC)/2$), process record on page (C), write page back (D).
– Record will be flagged as deleted (tombstone)
– Record will still occupy space
– Reclaiming the space would potentially need many more pages to be read/written
• Cost $= 7.550 + 10^{-7} + 0.015 \approx 7.565$ secs
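The heap figures above can be reproduced with a few lines of Python. This sketch covers only the cost model; no file is read. D and C take the typical values given earlier, and the name heap_costs is hypothetical.

```python
D = 0.015   # secs per disk access (read or write one page)
C = 1e-7    # secs to process one record in main memory

def heap_costs(B: int, R: int) -> dict[str, float]:
    """Cost model for a heap file of B pages holding R records per page."""
    scan = B * (D + R * C)              # read every page, process every record
    return {
        "scan": scan,
        "find_one": scan / 2,           # on average half the file is scanned
        "range": scan,                  # no order, so the whole file is scanned
        "insert": 2 * D + C,            # fetch last page, add record, write page back
        "delete": scan / 2 + C + D,     # find record, mark tombstone, write page back
    }

for op, secs in heap_costs(B=1000, R=1000).items():
    print(f"{op:<8} {secs:.3f} secs")
```

This prints 15.100, 7.550, 15.100, 0.030 and 7.565 secs. The scan term dominates every operation except insertion, which is why heaps suit whole-file processing.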

Pros and Cons of Heaps

• Pros:
– Cost effective where many records processed in a single scan (can process 1,000,000 records in 15.1 secs)
– Simple access method to write and maintain
• Cons:
– Very expensive for searching for single records in large files (1 record could take 15.1 secs to find)
– Expensive for operations like sorting as no inherent order

Usage of Heaps

• Where much of the file is to be processed:
– Batch mode (collect search and update requests into batches)
– Reports
– Statistical summaries
– Program source files
• Files which occupy a small number of pages

Semester 2/ Week 22

Hashing

• One of the big two Access Methods
• Very fast potentially

– One disk access only in ideal situation

• Used in many database and more general information systems:– where speed is vital

• Many variants to cope with certain problems

Meaning of Hash

• Definition 3: to cut into many small pieces; mince (often followed by up).
• Example: He chopped up some garlic.
• Synonyms: dice, mince, hash
• Similar words: chip, cut up, carve, crumble, cube, divide
– From http://www.wordsmyth.net/live/home.php?content=wotw/2001.0521/wotw_esl

Hash Function

• Takes key value of record to be stored

• Applies some function (often including a chop) delivering an integer

• This integer is a page number on the disk. So:
– input is key
– output is a page number

Simple Example

• Have:
– B = 10 (ten pages for disk file)
– R = 2,000 (2,000 records/page)
– Keys {S12, S27, S30, S42}
• Apply function ‘chop’ to keys giving:
– {12, 27, 30, 42} so that initial letter is discarded

Simple Example

• Then take remainder of dividing chopped key by 10.

• Why divide?
– Gives integer remainder
• Why 10?
– Output numbers from 0 … 9
– 10 possible outputs correspond with 10 pages for storage

• In this case, numbers returned are:– {2, 7, 0, 2}

[Figure: disk file as hash table, records shown by key only; page 2 holds S12 and S42]

Retrieval

• Say user looks for record with key S42
• Apply hash function to key:
– Discard initial letter, divide by 10, take remainder
– Gives 2
• Transfer page 2 to buffer in main memory
• Search buffer looking for record with key S42
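The chop-and-divide hash function, the placement of the four keys and a retrieval can be sketched in Python. This assumes keys are always one letter followed by digits. It holds the "disk file" as a list of in-memory pages. hash_page and retrieve are illustrative names.

```python
from typing import Optional

B = 10  # number of pages in the hash file

def hash_page(key: str) -> int:
    """Chop the initial letter, then take the remainder on division by B."""
    return int(key[1:]) % B

# The disk file: one list of keys per page (records shown by key only)
pages: list[list[str]] = [[] for _ in range(B)]
for key in ["S12", "S27", "S30", "S42"]:
    pages[hash_page(key)].append(key)

def retrieve(key: str) -> Optional[str]:
    buffer = pages[hash_page(key)]          # one disk access: fetch page into buffer
    return key if key in buffer else None   # search the buffer in main memory

print([hash_page(k) for k in ["S12", "S27", "S30", "S42"]])  # [2, 7, 0, 2]
print(pages[2])                                              # ['S12', 'S42']
print(retrieve("S42"))                                       # S42
```

Only the one page named by the hash is examined, whatever the size of the file. That is where the single-disk-access retrieval comes from.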

Cost Model

• Retrieval of a particular record: $D + 0.5RC$ (one disk access + time taken to search half a page for the required record)
$= 0.015 + (0.5 \times 2000 \times 10^{-7}) = 0.0151$ secs (very fast)
• Insertion of a record: fetch page (D) + modify in main memory (C) + write back to disk (D)
$= 0.015 + 10^{-7} + 0.015 \approx 0.0300$ secs

• Deletions same as insertions

Effectiveness

• Looks very good:
– Searches in one disk access
– Insertions and deletions in two disk accesses
– So:

• Searching faster than heap and sorted

• Insertions and deletions similar to heaped, much faster than sorted

Minor Problem

• Complete scan:
– Normally do not fill pages, to leave space for new records to be inserted
– 80% loading initially
– So records occupy 1.25 times the number of pages they would need if densely packed
$1.25B(D + RC) = 1.25 \times 10(0.015 + 2000 \times 10^{-7}) = 0.19$ secs (against 0.152 if packed densely)
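The hashed-file figures on the last two slides follow from the same D and C. This is a sketch of the arithmetic in Python, with B = 10 and R = 2,000 as in the example. The 1.25 factor is 1/0.8 for 80% initial loading.

```python
D = 0.015   # secs per disk access
C = 1e-7    # secs to process one record in main memory
B, R = 10, 2000

retrieve = D + 0.5 * R * C        # read one page, search half of it
insert = D + C + D                # read page, modify in memory, write back
scan_dense = B * (D + R * C)      # every page packed full
scan_loaded = scan_dense / 0.8    # pages loaded to 80%: 1.25 times the pages

print(f"retrieve {retrieve:.4f}, insert {insert:.4f}, "
      f"scan {scan_dense:.3f} (dense) vs {scan_loaded:.3f} (80% loaded)")
```

This prints retrieve 0.0151, insert 0.0300, and scan 0.152 (dense) vs 0.190 (80% loaded). The free space that keeps inserts cheap makes a full scan proportionally slower.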

Larger Problems

• Scan for groups of records say S31-S37 will be very slow

• Each record will be in different page, not in same page.

• So 7 disk accesses instead of 1 with sorted file (once page located holding S31-S37).

Larger Problems

• What happens if page becomes full?

• This could happen if
– Hash function poorly chosen, e.g. all keys end in 0 and the hash function is the remainder on dividing the number by 10
• All records go in same page
– Simply too many records for initial space allocated to file

Overflow Area

• Have extra pages in an overflow area to hold records
– Mark overflowed pages in main disk area

• Retrieval now may take 2 disk accesses to search expected page and overflow page.

• If have overflow on overflow page, may take 3+ disk accesses for retrieval.

• Insertions may also be slow – collisions on already full pages.

• Performance degrades
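A small simulation shows how overflow pages add disk accesses. This sketch rests on several assumptions. Each page holds only capacity keys (2 here, far below the R = 2,000 of the example, so that overflow happens quickly). A full page gets a linked overflow page. Every page read along a chain counts as one disk access. The keys all end in 0, which is the poorly chosen case from the previous slide. The class name is illustrative.

```python
class StaticHashFile:
    """Static hashing with chained overflow pages (illustrative sketch)."""

    def __init__(self, num_pages: int, capacity: int):
        self.capacity = capacity
        # chains[n] = [primary page n, first overflow page, second overflow page, ...]
        self.chains: list[list[list[str]]] = [[[]] for _ in range(num_pages)]

    def _page_no(self, key: str) -> int:
        return int(key[1:]) % len(self.chains)   # chop initial letter, take remainder

    def insert(self, key: str) -> None:
        chain = self.chains[self._page_no(key)]
        if len(chain[-1]) == self.capacity:      # last page in the chain is full
            chain.append([])                     # link a new overflow page
        chain[-1].append(key)

    def retrieve(self, key: str) -> tuple[bool, int]:
        """Return (found, number of disk accesses used)."""
        accesses = 0
        for page in self.chains[self._page_no(key)]:
            accesses += 1                        # each page read is one disk access
            if key in page:
                return True, accesses
        return False, accesses

f = StaticHashFile(num_pages=10, capacity=2)
for key in ["S10", "S20", "S30", "S40", "S50"]:  # all hash to page 0
    f.insert(key)
print(f.retrieve("S10"), f.retrieve("S30"), f.retrieve("S50"))
# (True, 1) (True, 2) (True, 3)
```

An unsuccessful search is the worst case, because it must read the whole chain before giving up.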

At Intervals in Static Hashing

• The Data Base Administrator’s lost weekend

• He/she comes in

• Closes system down to external use

• Runs a utility expanding number of pages and re-hashing all records into the new space

Alternatives to Static Hashing

• Automatic adjustment – Dynamic Hashing

• Extendible Hashing
– Have index (directory) to pages which adjusts to number of records in the system
• Linear Hashing
– Have family of hash functions

Pros and Cons of Hashing

• Pros:
– Very fast for searches on search key (may be 1 disk access)
– Very fast for insertions and deletions (often 2 disk accesses)
– No indexes to search/maintain (in most variants)
• Cons:
– Slightly slower than sorted files for scan of complete file
– Requires periodic off-line maintenance in static hashing as pages become full and collisions occur
– Poor for range searches

Usage of Hashing

• Applications involving:
– Searching (on key) in files of any size for single records – very fast
– Insertions and deletions of single records

• So typical in On-line Transaction Processing (OLTP)

Semester 2/ Week 23

B-trees

• What does the B stand for?
• Balanced, Bushy or Bayer (apparently not clear)
• Balanced means distance from root to leaf node is the same for all of the tree
• Bushy is a gardening term meaning all strands of similar length
• Bayer is a person who wrote (with McCreight) a seminal paper on B-trees

B+-Trees

• There are some variants on B-trees.
• We deal here with B+-trees, where all data entries are held in leaf nodes of the tree
• The two terms are interchangeable for our purposes.
• B+-trees are dynamic index structures
• The tree automatically adjusts as the data changes

B+-tree

• A B+-tree is a:
– Multiway tree (fan-out or branching factor >= 2; a binary tree has fan-out 2) with
– Efficient self-balancing operations

• Minimum node occupancy is 50%

• Typical node occupancy can be greater than this but initially keep it around 60%

B+-tree Diagram: index + sequence set

[Figure: index set (internal structure as in root) above the sequence set (the data entries)]

Structure

• Index set (tree, other than leaf pages) is sparse
– Contains key-pointer pairs
– Not all keys included
• Sequence set (leaf pages) is dense
– All data entries included
– Data held is key + non-key data
– Pages are connected by a doubly linked list
– Can navigate in either direction
– Pages are usually in sequence of primary key
– No child (index) pointers are held at this level

Parameters

• d = 2 says that the order of the tree is 2
• Each non-terminal node contains between d and 2d index entries (except root node: 1…2d entries)
• Each leaf node contains between d and 2d data entries
• So tree shown can hold 2, 3 or 4 index values in each node
• How many index pointers?
• Format is always one more pointer than value. So 3, 4 or 5 pointers per node.

Capacity of Tree

• d = 2
– One root node – can hold 2*d = 4 records
– Next level – 5 pointers from root, each node holds maximum 4 records = 20 records in 5 nodes
– Next level – 5 pointers from each of the 5 nodes above, each node maximum 4 records = 100 records in leaf nodes
• In practice will not pack as closely
• But d = 2, 3 levels – potentially addresses 100 records
• If all held on disk, 3 disk accesses
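The level-by-level count can be written as a short loop. The following Python sketch assumes every node is packed to the maximum of 2d entries, and therefore has 2d + 1 pointers. capacity_by_level is an illustrative name.

```python
def capacity_by_level(d: int, levels: int) -> list[int]:
    """Max records addressable at each level of a B+-tree of order d."""
    max_entries = 2 * d          # at most 2d entries per node
    fan_out = max_entries + 1    # always one more pointer than entries
    result = []
    nodes = 1                    # level 1 is the root
    for _ in range(levels):
        result.append(nodes * max_entries)
        nodes *= fan_out         # each node points to fan_out nodes below it
    return result

print(capacity_by_level(2, 3))  # [4, 20, 100]
```

The last entry is the leaf-level figure quoted above. Each extra level multiplies it by the fan-out.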

Capacity of Tree

• High branching factors (fan-out) increase capacity dramatically (see seminar).

• So tree structure with high branching factor can be kept to a small number of levels

• Bushy trees (heavily pruned!) mean fewer disk accesses

• Root node at least will normally be held in main memory

[Figure: Example of a B+-tree, order (d) = 2: index multi-levels above the data entries]

Search Times - Single Record

• Always go to leaf node (data entries) to retrieve actual data.

• Root node in main memory
• Cost $= (T - 1)(D + 0.5RC)$ for single record
– T = height of tree
– (T - 1) as root node is already in main memory
• If d = 2, then R = 4 (max); with T = 3, cost $= (3 - 1)(0.015 + 0.5 \times 4 \times 10^{-7}) = 2 \times 0.0150002 = 0.0300004$ secs

Search Times - Whole File/Ranges

• Lowest cost is $B(D + RC)$
– B is number of pages assuming data is packed to 100% capacity, D is time for disk access, R is number of records/page, C is cost for processing each record in memory
• If Sequence Set is packed at 60%, then cost is: $(100/60) \times B(D + RC)$

• Ranges are held in proximity in Sequence Set -- search for these is fast
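Both formulas can be wrapped as functions for experimenting with other tree heights and fill factors. The following is a Python sketch using the typical D and C values. The function names are illustrative. full_scan_cost follows the slide's $(100/60) \times B(D + RC)$ approximation literally.

```python
D = 0.015   # secs per disk access
C = 1e-7    # secs to process one record in main memory

def single_record_cost(T: int, R: int) -> float:
    """(T - 1) disk accesses (root already in main memory),
    searching half of each node's R entries on average."""
    return (T - 1) * (D + 0.5 * R * C)

def full_scan_cost(B: int, R: int, fill: float = 1.0) -> float:
    """Scan of the sequence set: B pages at 100% packing, scaled by 1/fill
    as on the slide (e.g. fill = 0.6 gives (100/60) * B(D + RC))."""
    return (B / fill) * (D + R * C)

print(f"{single_record_cost(T=3, R=4):.7f}")   # 0.0300004
```

Because the root is kept in memory, a three-level tree costs two disk accesses per search. Lowering the fill factor slows only scans, not single-record searches.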

Insertions

• First add into sequence set
– Search for node
– If space, add to node
• Leave index untouched
– If no space
• Try re-distributing records in sibling nodes
– Sibling node is adjacent node with same parent
• Or split node and share entries amongst the two nodes
• Will involve alteration to index
• Insertions tend to push index entries up the tree
• If splits go all the way up and the root node overflows, then the height of tree T increases by one.
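The split step for a leaf can be sketched in Python under four assumptions. First, a leaf is a sorted list of keys for a tree of order d. Second, redistribution with siblings is not attempted. Third, as usual for B+-trees, the first key of the new right-hand leaf is copied up to the parent. Fourth, every data entry stays in the sequence set. insert_into_leaf is an illustrative name.

```python
from bisect import insort
from typing import Optional

def insert_into_leaf(leaf: list[int], key: int, d: int
                     ) -> tuple[list[int], Optional[list[int]], Optional[int]]:
    """Insert key into a leaf of a B+-tree of order d.

    Returns (left, right, separator). right and separator are None
    when the leaf still has room, in which case the index is untouched.
    """
    entries = list(leaf)
    insort(entries, key)                  # keep the leaf in key order
    if len(entries) <= 2 * d:
        return entries, None, None        # fits: no change to the index
    mid = len(entries) // 2               # 2d + 1 entries: d go left, d + 1 go right
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]          # separator key is copied up to the parent

print(insert_into_leaf([1, 2, 4], 3, d=2))     # ([1, 2, 3, 4], None, None)
print(insert_into_leaf([1, 2, 3, 4], 7, d=2))  # ([1, 2], [3, 4, 7], 3)
```

Both halves keep at least d entries, so the 50% minimum occupancy still holds. If the parent then overflows, the same kind of split is applied one level up.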

Deletions
• First delete from sequence set
– Search for node
– Delete record from node
– If node still has at least d entries
• Leave index untouched
– If node has fewer than d entries
• Try re-distributing records in sibling nodes
– Sibling node is adjacent node with same parent
• Or merge sibling nodes
• Will involve alteration to index
• Deletions tend to push index entries down the tree
• If merges go all the way up and the root node underflows, then T decreases by one.

Usage of B+-trees
• General Purpose
• Single-record Searching
– not as fast as hashing but acceptable, with usually five or fewer disk accesses
• Processing ranges of key values
– faster than hashing as records held in order of key in Sequence Set
• Robust while data changes
– algorithms for inserts/deletes give automatic self-balancing of tree (no re-organisations)

Semester 2/ Week 24

Revision

• First, remarks on B+-tree Exercises.

• Can think of B+-tree as generalised binary search tree

• Binary search trees are:
– good memory structures

• fast tree traversal

– poor disk structures

• if every pointer access involves a disk access

Binary Search Tree compared with B+-tree

• d = 1
• then root node has:
– 1…2 data entries
– 2…3 pointers
• Leaf node has 1…2 data entries, no pointers
• Intermediate node has 1…2 data entries, 2…3 pointers
• So binary search tree is a special case of B+-tree with d = 1 and every node at the lower bounds above (1 data entry, 2 pointers).

Maximum Capacity (records) of B+-trees

Order (d) | Fan-out | Max records/node | Capacity at 5 levels
1 | 2…3 | 2 | 2*3*3*3*3 = 162
2 | 3…5 | 4 | 4*5*5*5*5 = 2500
10 | 11…21 | 20 | 20*21*21*21*21 $\approx 4 \times 10^{6}$
50 | 51…101 | 100 | 100*101*101*101*101 $\approx 10^{10}$
200 | 201…401 | 400 | 400*401*401*401*401 $\approx 10^{13}$
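The last column can be computed exactly, which shows how close the rounded figures are. The following is a Python sketch. The function name is illustrative. It assumes every node is at its maximum of 2d entries and 2d + 1 pointers.

```python
def capacity(d: int, levels: int = 5) -> int:
    """Max records: (2d + 1)^(levels - 1) leaves, each holding up to 2d records."""
    return 2 * d * (2 * d + 1) ** (levels - 1)

for d in (1, 2, 10, 50, 200):
    print(f"d = {d:>3}: fan-out {d + 1}..{2 * d + 1}, capacity {capacity(d):,}")
```

The exact values are 162, 2,500, 3,889,620, 10,406,040,100 and 10,342,784,640,400. A realistic order such as d = 200 therefore addresses about ten trillion records in five levels.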

Cost of Searching B+-trees (order = d)

• Number of disk accesses is number of levels (T) minus 1:
– all data held in leaves of tree
– top level (root) is in main memory
• Assume search half of each node on average to find a particular record (0.5RC)
• Cost $= (T - 1)(D + 0.5RC)$ where D = 0.015 secs, $C = 10^{-7}$ secs, R = a number from d…2d

Insertions

• Put record into sequence in leaf nodes
• If inserted node has <= 2*d records, OK
• If inserted node has > 2*d records:
– first try to redistribute records between inserted node, its parent and immediate sibling
– otherwise split inserted node into two nodes and pass the middle key up one level in tree

Balance of Course - 1

• SQL Scripting
– bridging level 1 and level 2
– reinforcing SQL knowledge
– use of variables
– pre-defined functions
– Important area for:
• prototyping applications
• some production environments (web-based)
• Database Fundamentals
– Transactions
– Concurrency
– Security
– Procedures
– Important for:
• production multi-user systems
• SQL Procedures (PL/SQL)
– Differences from scripting
– Business rules
– Constructions
– Parameterised SQL statements
– Exception handling
– Important for:
• handling business rules across an enterprise
• Access Methods
– physical side – placement and retrieval of data
– choice of algorithms
– two main types
• B+-trees
• Hashing
– Important for:
• efficient access to data

Exam Paper

• Database paper counts 40%
– database assignments 10%
– java paper 35%
– java assignments 15%
• Database paper is independent of 1st semester work
– exam will be on 2nd semester material

• No previous database paper in this area

Type of exam

• Closed book
– no materials can be carried in

• Two hours duration

• Four questions on paper

• Three questions should be attempted

Type of question

• 20 marks in total on each question
– 8-10 marks typically for testing basic knowledge of a subject (definitions, simple derivations, material from lecture notes)
– 10-12 marks typically for problem solving, i.e. taking a scenario and providing a solution in terms of code or algorithm.

Recommended Strategy

• Be familiar with the lecture notes
• Be familiar with existing exercises and their solutions (including assignments) on Blackboard
• You should assume that the exam will test your understanding of the lecture notes and exercises
• Problems in the exam environment will generally be simpler than those in the assignment environment.

Problem Solutions

• Under desk-bound exam conditions it is not possible to test program code exhaustively.

• Also, the student does not have feedback from the compiler or run-time system.

• Ideal expectation is that code will:
– provide basis of implementation
– with feedback from live system, be readily modified to provide an acceptable deliverable.
