8/9/2019 2.Dbms Systems
MS Systems http://www.lightenna.com/book/export/s
3 03/03/20
2.1.02b. QBE visual example
Record advancing
Query designing
  Finite domain attributes
  Web search parallel

2.1.03. QBE text-format example
P. print, I. insert, D. delete, U. update
_VARNAME, copy field value into variable

2.1.05. SQL
Structured Query Language (SQL or SEQUEL)
Wikipedia reference
Success of relational databases
Developed for System R at IBM
ANSI standardised
  SQL-1986 (SQL1), ongoing extension
  SQL-1992 (SQL2), current version (Oracle 9i)
  SQL-1999 (SQL3), regular expression matching, recursive queries
  SQL-2003, XML features, auto-generated columns

2.1.05b. SQL command syntax
What follows here is a brief summary
Oracle syntax
  Similar but not identical to MySQL/MSSQL
General familiarity
  Query writing best learnt by doing it
  Lecture live-example
  Coursework 1 will be SQL
  Oracle (9i) SQL reference
  MySQL (5.0) SQL reference

2.1.05c. SQL in application
Keyword oriented language
Keywords not congruous with Relational model
Lots of different ways to write SQL
  Analogous to C/Java formatting
  if (b==2) { a=1; } else { a=0; }
Recommend using case to differentiate attributes and keywords
  SELECT colour, size, shape FROM fruit WHERE weight>22;
Oracle user accounts on Teaching database
Namespace references, e.g. shared.cars

2.1.06. SQL Create schema
Data definition commands
CREATE SCHEMA <schema name> AUTHORIZATION <user>
  Schema or workspace
Beware of names
  Name collisions produce odd behaviours
SQL schema embraces tables (relations), constraints, views, domains, authorizations

2.1.07. SQL Create table
CREATE TABLE <schema name>.<table name> ( <column definitions> )
CREATE TABLE example (Oracle)
Tables can (and should) be indexed by user
  e.g. <username>.<table name>
  Normal login implies username
  Non-local table access

2.1.08. Data types and domains (Oracle)
Numeric
  ENUM
  NUMBER, NUMBER(i), NUMBER(i,j)
    Formatted numbers, i precision, j scale
    (number of digits total, after decimal point)
Character-string
  CHAR(n) - n is length
  VARCHAR2(n) - n is max
DESCRIBE output example
Multi-database comparison of Datatypes
Database legacy: limited storage necessitated efficient storage
Does it need to be efficient anymore?

You might consider all SQL types as being conceptually similar to attribute types in the relational model, although in reality the implementation of these types in a DBMS only approximates the mathematical purity of unordered domain sets etc.

2.1.08b. Data types and domains (MySQL)
Numeric
  TINYINT, INT, INT UNSIGNED
  FLOAT, DOUBLE, DECIMAL
  ENUM
Character-string
  CHAR(n) - n is length
  VARCHAR(n) - n is max
  TINYTEXT, TEXT
  Beware different default/maximum lengths to Oracle
BLOB
Multi-database comparison of Datatypes

2.1.09. Time-based data types
Date and Time
  DATE
    Ten positions, components YYYY-MM-DD
  TIME
    Eight positions, components HH:MM:SS
  TIME(i)
    Time fractional seconds precision
    Adds i+1 positions
  TIMESTAMP
    optionally WITH TIME ZONE
Very sensitive to syntactical ambiguities
day/month/year/hour/minute separators

2.1.10. DROPing
DROP <object>
DROP SCHEMA <schema name> CASCADE
  drops all workspace tables, domains
DROP TABLE <table name> RESTRICT
  only drops table if not referenced in any constraints/views
Notion of cascading
  Table links

2.1.11. ALTERing
Schema evolution
Design side
ALTER TABLE <schema name>.<table name> ADD <column definition>;
Example
  ALTER TABLE uni.student ADD hall VARCHAR(32);
Upper and lower case syntax
Naming conventions

2.1.12. Queries
Helper interfaces
  HeidiSQL/phpMyAdmin/Sword/SQLplus
  Design/perform a lot of routine queries for you
  Important to learn SQL, reinforcement
  Designing select queries is more difficult
  Visual interfaces still lacking in this area
Select queries in SQL
  Basic singlets
  Renaming
  Queries with Joins
  Nested queries

2.1.13. SQL Queries
SELECT statement
Similar to relational data model SELECT then PROJECT
  SELECT <attribute list>
  FROM <table list>
  WHERE <condition>;

2.1.14. SQL Queries
SELECT <attribute list>
FROM R,S,T
WHERE DNO = 10
  equivalent to $\pi_{\text{attribute list}}(\sigma_{\text{DNO}=10}(R \times S \times T))$
True-false evaluation tuple by tuple
WHERE clause as compound logical statement

2.1.15. SQL Queries
Produces a relation/set of tuples
Can be used to extract a single tuple
  e.g. SELECT bday, age
  FROM student
  WHERE fname='Tim' AND lname='Smith'
  Result = (13-05-80, 20)
Argument quoting (')
  SQL poisoning
  Not null
  Not numeric values
MySQL attribute quoting (`)
  Hypothetical attribute `all`, all, and ALL

SQL poisoning (more commonly called SQL injection) is a vulnerability exposed by inadequate escaping of the arguments/variables used to compose SQL queries.
E.g. Tim in the previous example could be supplied as Tim'; DELETE FROM student;' SELECT * FROM student WHERE 1
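
The difference between a composed query and a bound argument can be sketched in a few lines. This is a minimal illustration, assuming Python's standard sqlite3 module and an in-memory SQLite database rather than the Oracle/MySQL servers used in the course. The table contents, the helper names find_unsafe and find_safe, and the hostile string are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (fname TEXT, lname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Tim", "Smith", 20), ("Ann", "Jones", 21)],
)

def find_unsafe(fname):
    # Vulnerable: the argument is pasted straight into the SQL text,
    # so a quote inside it can terminate the string literal early.
    query = "SELECT fname, lname FROM student WHERE fname = '" + fname + "'"
    return conn.execute(query).fetchall()

def find_safe(fname):
    # Safer: the driver binds the value, so quotes inside it stay data.
    return conn.execute(
        "SELECT fname, lname FROM student WHERE fname = ?", (fname,)
    ).fetchall()

hostile = "x' OR '1'='1"             # hypothetical attacker-supplied name
unsafe_rows = find_unsafe(hostile)   # WHERE condition becomes always true
safe_rows = find_safe(hostile)       # no student has this literal name
```

With the composed query every student row is returned, while the bound version returns nothing. The multi-statement DELETE variant is not used here because sqlite3's execute() accepts only one statement per call.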

2.1.16. Renaming and referencing
AS keyword
(Partial) Attribute renaming in projection list
  SELECT fname AS firstName, minit, lname AS surname...
Role names for relations
  SELECT S.FNAME, F.FNAME, S.LNAME
  FROM STUDENT AS S, STUDENT AS F
  WHERE S.LNAME=F.LNAME
(Total) Attribute renaming in FROM
  SELECT s.firstName, s.surname
  FROM student AS s(firstName,surname,DOB,NINO,tutor)
Wildcards (SELECT s.* FROM...)

2.1.17. SQL Tables
Relations are bags, not sets
  e.g. projection of non-key attributes
  Set cannot contain duplicate item/repetition
  Duplicates exist in bags and can be:
    SELECT DISTINCT (eliminated)
    SELECT ALL (ignored/kept)
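
A small sketch of the bag behaviour, assuming Python's sqlite3 module and a made-up employee table: projecting the non-key attribute dno keeps duplicates under SELECT ALL and removes them under SELECT DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("111", 5), ("222", 5), ("333", 4)],  # illustrative rows only
)

# Projection of a non-key attribute: the result is a bag.
all_dnos = [row[0] for row in conn.execute(
    "SELECT ALL dno FROM employee ORDER BY dno")]

# DISTINCT turns the bag back into a set.
distinct_dnos = [row[0] for row in conn.execute(
    "SELECT DISTINCT dno FROM employee ORDER BY dno")]
```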

2.1.18. Queries and Joins
Relational database allows inter-related data
SQL select FROM gives Cartesian product
WHERE clause defines join condition
  SELECT proj.pnum, mgr.ssn
  FROM project AS proj, employee AS mgr
  WHERE proj.mgrssn = mgr.ssn;
Alternatively, explicitly define join (note type)
  SELECT project.pnum, employee.ssn
  FROM project INNER JOIN employee
  ON project.mgrssn = employee.ssn;

2.1.18b. Outer joins
Outer joins are crucial in the real world
Databases often contain NULLs (3VL)
Analysis of where the crucial data is across a relationship
Previous example, only get project data for managed projects
  SELECT project.*, employee.*
  FROM project INNER JOIN employee
  ON project.mgrssn = employee.ssn;

2.1.18c. Outer joins (cont)
Scale of loss isn't always instantly obvious
NULLs often used unpredictably
May want project information, even if no employee attached as manager
  SELECT project.*, employee.*
  FROM project LEFT OUTER JOIN employee
  ON project.mgrssn = employee.ssn;
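
The loss of unmanaged projects under an inner join can be seen on a two-row example. This is a sketch assuming Python's sqlite3 module; the project and employee rows are invented, with project 2 deliberately given a NULL manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT);
    CREATE TABLE project (pnum INTEGER PRIMARY KEY, mgrssn TEXT);
    INSERT INTO employee VALUES ('111', 'Smith');
    INSERT INTO project VALUES (1, '111');
    INSERT INTO project VALUES (2, NULL);  -- project with no manager
""")

# Inner join: rows without a matching manager silently disappear.
inner_rows = conn.execute("""
    SELECT project.pnum, employee.lname
    FROM project INNER JOIN employee ON project.mgrssn = employee.ssn
    ORDER BY project.pnum
""").fetchall()

# Left outer join: every project is kept, padded with NULL (None).
outer_rows = conn.execute("""
    SELECT project.pnum, employee.lname
    FROM project LEFT OUTER JOIN employee ON project.mgrssn = employee.ssn
    ORDER BY project.pnum
""").fetchall()
```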

2.1.19. 2y and 3y joins
Queries can encapsulate any number of relations
  Even one relation many times (in different roles)
Relationship chain
  Across many relations
  Tuples as Entities OR Relationships
  e.g. Employee -> Works_on -> Project -> Department -> Manager

2.1.20. Recursive closure
Can't be done in SQL2
Recursive relationships
Unknown number of steps
SQL2 can't generalise in single query
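
The recursive queries introduced with SQL-1999 lift this restriction. The sketch below assumes Python's sqlite3 module, whose SQLite engine supports WITH RECURSIVE, and a hypothetical employee table with a superssn (supervisor) column. It collects everyone who reports to employee 'A', directly or through any number of steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, superssn TEXT);
    INSERT INTO employee VALUES ('A', NULL);
    INSERT INTO employee VALUES ('B', 'A');
    INSERT INTO employee VALUES ('C', 'B');
    INSERT INTO employee VALUES ('D', 'C');
    INSERT INTO employee VALUES ('E', NULL);
""")

# The CTE name "reports" is illustrative. The first SELECT seeds the
# result with direct reports; the second keeps following the chain.
subordinates = [row[0] for row in conn.execute("""
    WITH RECURSIVE reports(ssn) AS (
        SELECT ssn FROM employee WHERE superssn = ?
        UNION
        SELECT e.ssn
        FROM employee AS e JOIN reports AS r ON e.superssn = r.ssn
    )
    SELECT ssn FROM reports ORDER BY ssn
""", ("A",))]
```

The chain A -> B -> C -> D is three steps deep here, but the same query works for any depth, which is exactly what a fixed number of self-joins cannot express.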

2.1.21. Nested queries
Essentially one or more (inner) queries within an (outer) query
Inner and outer query
  Not to be confused with inner and outer joins
Inner query can go in three places
  SELECT clause (projection list)
    Must return a single value, then aliased as attribute in outer result
  FROM clause
    Inner query result used as standard table in FROM cross product
  WHERE clause

2.1.21b. Nested query example
Use of query result as comparator for other (outer) query
  SELECT DISTINCT course
  FROM dept WHERE course IN (
    SELECT d.course
    FROM dept AS d, faculty AS f, student AS s
    WHERE d.ownfac=f.id AND s.owndept=d.id
    AND f.name='Eng' AND s.year='3'
  ) OR course IN (
    SELECT course
    FROM dept
    WHERE code LIKE 'COMS3%'
  );

2.1.22. Bridging SQL across 3 tiers
Three tier database design
Changing role of DBMS
Indices
Aggregate functions (conceptual)
  Over bags and sub-bags
Creating and updating views (ext)
SQL embedding

In this subsection we look at the different roles SQL plays across the three tiers of database design. We discuss the areas in which SQL is lacking and how those deficiencies can be complemented by embedding SQL in other languages.

2.1.25. Indices
Low/Internal level
Index by one attribute
For queries selecting by that attribute:
  Faster tuple access (ordered tuples)
  Reduces database memory load
    Small cross product relation, only crosses requisites
  Accelerates query resolution time
CREATE INDEX Index_Name ON RELATION(Attribute);
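
One way to watch an index change how a query is resolved is to ask the DBMS for its plan before and after CREATE INDEX. This sketch assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement, whose wording is SQLite-specific and varies between versions. The table, the data and the index name idx_employee_dno are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (ssn INTEGER PRIMARY KEY, lname TEXT, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, "name%d" % i, i % 10) for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT lname FROM employee WHERE dno = 5"
plan_before = plan(query)   # expected to describe a full table scan
conn.execute("CREATE INDEX idx_employee_dno ON employee(dno)")
plan_after = plan(query)    # expected to mention idx_employee_dno
```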

2.1.26. Aggregate functions
Run over groups of tuples
Takes a projected attribute list as an argument
Produce relation with single tuple
SUM, MAX, MIN, AVG, COUNT
e.g. AggFunc over all tuples
  SELECT SUM(SALARY), MAX(SALARY), MIN(SALARY)
  FROM EMPLOYEE;
Single attribute lists (distinct values)
Multi-attribute lists (granularity of distinct values by pairing)

2.1.27. Aggregates over sub-bags
Can run over subsets of tuples
GROUP BY keyword
Specifies the grouping attributes
Need to also appear in projected attr_list
Show result alongside value for group attr
e.g. AggFunc over subgroups
  SELECT dno, COUNT(*)
  FROM employee
  GROUP BY dno
Quick SQL check: do all non-aggregated attributes in the SELECT projection list also appear in the GROUP BY list?
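
The two granularities, one tuple for the whole bag versus one tuple per group, can be compared side by side. This is a sketch assuming Python's sqlite3 module; the salaries and department numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT, dno INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("1", 4, 25000), ("2", 5, 30000), ("3", 5, 40000), ("4", 5, 38000)],
)

# Aggregate over all tuples: the result relation has a single tuple.
whole_bag = conn.execute(
    "SELECT SUM(salary), MAX(salary), MIN(salary) FROM employee"
).fetchone()

# Aggregate over sub-bags: one tuple per distinct dno. The grouping
# attribute appears in both the projection list and GROUP BY.
per_dept = conn.execute(
    "SELECT dno, COUNT(*), AVG(salary) FROM employee GROUP BY dno ORDER BY dno"
).fetchall()
```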

2.1.28. Creating views
Views are partial projections
Virtual relations, or views of live relations
Update synchronised
  CREATE VIEW <view name> AS <query>
Real relation could be a query result
Clever bit is the change propagation
UPDATEs made to the view dataset are flooded back to relations
  INSERT and DELETE behaviour needs to be defined
  Non-trivial as INSERT into view (virtual relation) may leave holes in real relation
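
A view stays synchronised with its base relation because it is re-evaluated from live data. The sketch below assumes Python's sqlite3 module and a hypothetical view named dept5. It only shows propagation from the base table into the view, since SQLite views are read-only and updating through the view, as described above, would need a DBMS that supports updatable views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT, dno INTEGER);
    INSERT INTO employee VALUES ('111', 'Smith', 5);
    INSERT INTO employee VALUES ('222', 'Wong', 4);
    -- Virtual relation: a stored query, not a stored copy of the data.
    CREATE VIEW dept5 AS SELECT ssn, lname FROM employee WHERE dno = 5;
""")

view_before = conn.execute("SELECT lname FROM dept5 ORDER BY lname").fetchall()

# Changing the real relation is immediately visible through the view.
conn.execute("UPDATE employee SET dno = 5 WHERE ssn = '222'")
view_after = conn.execute("SELECT lname FROM dept5 ORDER BY lname").fetchall()
```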

2.1.29. Embedding SQL
SQL (alone) can do lots of clever things in one expression
But can only execute a single expression
Can structure SQL commands into proper programming languages
Java Database Connectivity (JDBC)
  javac, then java VM
COBOL, C or PASCAL
  precompiled with PRO*COBOL or PRO*C
Procedural Language (PL/SQL)
  Oracle/MySQL procedural language
  Stored procedures can take parameters

2.2. Internals
In this lecture we look at...
[Section notes PDF 180Kb]

2.2.01. Introduction
Database internals (base tier)
  RAID technology
  Reliability and performance improvement
  Record and field basics
  Headers to hashing
  Index structures

2.2.01b. Machine architecture (by distance)
Distance from chip determines minimum latency
Speed of light is a constant
Impact of bus frequencies
  IDE (66, 100, 133 MHz)
  PCI, PCI-X (66, 100, 133 MHz)
  PCI Express (1 GHz to 12 GHz)
Impact of bus bandwidths
  PCI (32/64 bit/cycle, 133MB/s)
  PCI Express (x16 8.0GB/s)
Here's a link from Intel showing a machine architecture with signal bandwidths: Intel diagram

2.2.01c. Machine architecture (by capacity)
Capacity increased with distance
Staged architecture as compromise
Speed, time/distance
Also cost, heat, usage scale

2.2.02. Database internals
Stored as files of records (of data values)
  Auxiliary data structures/indices
1y and 2y storage
  memory hierarchy (pyramid diagram)
  volatility
Online and offline devices
Primary file organisation, records on disk
  Heap - unordered
  Sorted - ordered, sequential by sort key
  Hashed - ordered by hash key
  B-trees - more complex

2.2.03. Disk fundamentals
DBMS task
  linked to backup
1y, 2y and 3y
  e.g. DLT tape
Changing face of current technology
  Impact of inexpensive harddisks
  Flash memory devices (CF, USB)
Random versus sequential access
Latency (rotational delay) and
Bandwidth (data transfer rate)

2.2.04. RAID technology
Redundant Array of Independent Disks
Data striping
  Blocks (512 bytes), bits and transparency
Reliability (1/n)
  Mirroring/shadowing
  Error correction codes/parity
Performance (n)
  Mirroring (2x read access)
  Multiple parallel access

2.2.05. RAID levels
0 No redundant data
1 Disk mirrors (performance gain)
2 Hamming codes (also detect)
3 Single parity disk
4 Block level striping
5 and parity/data distribution
6 Reed-Solomon codes
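
The single-parity idea behind levels 3 to 5 can be sketched with byte-wise XOR: the parity block is the XOR of the data blocks, so any one lost block is the XOR of the survivors and the parity. The block contents and the helper name parity below are illustrative, and real arrays work on whole disk blocks rather than four-byte strings.

```python
from functools import reduce

def parity(blocks):
    # Byte-wise XOR across equally sized blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"DBMS", b"RAID", b"disk"]   # three hypothetical data disks
parity_block = parity(data)          # stored on the parity disk

# Simulate losing the second disk and rebuilding it from the rest.
rebuilt = parity([data[0], data[2], parity_block])
```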

2.2.06. Records and fields
DBMS specific, generally
Records (tuples) comprise fields (attributes)
File is a sequence of records
Variable length records
  Variable length fields
  Multi-valued attributes/repeating fields
  Optional fields
  Mixed file of different record types

2.2.07. Fields
records -> files -> disks
Fixed length for efficient access
Networking issues
Delimit variable length fields (max)
Explicit record/field lengths
Separators (,;,:,$,?,%)
Record headers and footers
Spanning
  block boundaries and redundancy
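
As one possible reading of "explicit record/field lengths", the sketch below stores a record as a small header (the field count) followed by length-prefixed fields, so variable-length and empty optional fields need no separator characters. The layout and the function names are assumptions for illustration, not the format of any particular DBMS.

```python
import struct

def pack_record(fields):
    # Header: number of fields as a 2-byte little-endian integer.
    parts = [struct.pack("<H", len(fields))]
    for value in fields:
        raw = value.encode("utf-8")
        parts.append(struct.pack("<H", len(raw)))  # explicit field length
        parts.append(raw)                          # field bytes
    return b"".join(parts)

def unpack_record(buf):
    (count,) = struct.unpack_from("<H", buf, 0)
    offset = 2
    fields = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        fields.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return fields

# The empty string stands in for an optional field with no value.
record = pack_record(["Tim", "", "Smith"])
```

Fixed-length fields would allow direct offset arithmetic instead of this sequential walk, which is the efficiency trade-off the notes point at.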

2.2.08. Primary organisation
Bias data manipulation to 1y memory
  Load record to 1y, write back
  Cache theorem
  Data storage investment, rapidity of access
  optimisations based on frequent algorithmic use
Ordering, ordering field/key field
Hashing

2.2.09. Indexes/indices
Auxiliary structures/secondary access path
Single level indexes (Key, Pointer)
File of records
Ordering key field
Primary, Secondary and Clustering

2.2.09b. Primary index example
Primary index on simple table
Ordering key field (primary key) is Integer
Pointers as addresses
Sparse, not dense

2.2.10. Primary Index file (as pairs list)
Two fields: ordering key field and pointer to block
Second example, indexing candidate key Surname
  K(1)="Barnes", P(1) -> block 1
    Barnes record is first/anchor entry in block 1
  K(2)="Smith", P(2) -> block 6
  K(3)="Zeta", P(3) -> block 8
Dense (K(i) for every record), or Sparse
Enforce key constraint
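
A sparse primary index lookup can be sketched with a binary search over the anchor keys. The index entries are the three pairs listed above; the other surnames and the contents of each block are invented for the example, and the function name lookup is hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: block number -> records sorted by surname.
blocks = {
    1: ["Barnes", "Bates", "Jones"],
    6: ["Smith", "Taylor"],
    8: ["Zeta"],
}

# Sparse index: one (K(i), P(i)) pair per block, keyed on the anchor record.
index_keys = ["Barnes", "Smith", "Zeta"]
index_ptrs = [1, 6, 8]

def lookup(surname):
    # Find the last anchor key that is <= the search key.
    i = bisect_right(index_keys, surname) - 1
    if i < 0:
        return None                    # sorts before the first anchor
    block_no = index_ptrs[i]
    # Only this one block has to be read and scanned.
    return block_no if surname in blocks[block_no] else None
```

Because the index is sparse, a key such as "Taylor" has no index entry of its own; it is found by reading the block whose anchor precedes it.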
2.2.10b. Clustering index example
Clustering index
Ordering key field (OKF) is non-key
Each entry points to multiple records

2.2.11. Clustering Index (as pairs list)
Two fields: ordering non-key field and pointer to block
  Internal structure e.g. linked list of records
Each block may contain multiple records
  K(1)="Barnes", P(1) -> block 1
  K(2)="Bates", P(2) -> block 2
  K(3)="Zeta", P(3) -> block 3
K(i) not required to have
  a distinct value for each record
  non-dense, sparse

2.2.11b. Secondary Index example
Independent of primary ordering
Can't use block anchors
Needs to be dense

2.2.12. More indices
Single level index
  ordered index file
  limited by binary search
Multi level indices
  based on tree data structures (B+/B-trees)
  faster reduction of search space ($\log_{fo} b_i$)
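
The gain from a multi-level index can be estimated by counting block accesses, taking fo as the fan-out (index entries per block) and b_i as the number of first-level index blocks. The figures below (1000 index blocks, fan-out 100) are invented for illustration, and the function names are hypothetical.

```python
def binary_search_accesses(index_blocks):
    # ceil(log2(n)) computed with integers to avoid floating-point rounding.
    return max(1, (index_blocks - 1).bit_length())

def multilevel_accesses(first_level_blocks, fan_out):
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    levels = 1                       # the first-level index itself
    blocks = first_level_blocks
    while blocks > 1:
        # Each higher level holds one entry per block of the level below.
        blocks = (blocks + fan_out - 1) // fan_out
        levels += 1
    return levels                    # one block read per level

single_level = binary_search_accesses(1000)
multi_level = multilevel_accesses(1000, 100)
```

With these numbers a binary search over the single-level index reads about ten blocks, while the multi-level index reads three, one per level.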

2.2.13. Indices
Database architecture
  Intension/extension
Indexes separated from data file
  Created/discarded dynamically
  Typically 2y to avoid reordering records on disk

2.2.14. Query optimisation
Faster query resolution
  improved performance
  lower load
  hardware cost:performance ratio
    Moore's law
Query process chain
Query optimisation

2.2.15. Query processing
Compile-track familiarity
  Scanner/tokeniser - break into tokens
  Parser - semantic understanding, grammar
  Validated - check attribute names
Query tree
  Execution strategy, heuristic
Query optimisation
  In (extended relational) canonical algebra form

2.2.16. Query optimisation
SQL query
  SELECT lname, fname
  FROM employee
  WHERE salary > (
    SELECT MAX(salary)
    FROM employee
    WHERE dno=5);
Worst-case
  Process inner for each outer
Best-case
  Canonical algebraic form

2.2.16b. Query optimisation implementation
Indexing accelerates query resolution
Closed comparison (intra-tuple)
  all variables/attributes within single tuple
  e.g. x < 100
Open comparison (inter-tuple)
  variables span multiple tuples
Essentially a sorting problem
  Internal sorting covered (pre-requisites)
  Need external sort for non-cached lists

2.2.17. Query optimisation
External sorting
  Stems from large disk (2y), small memory (1y)
Sort-merge strategy
  Sort runs (small sets of total data file)
  Then merge runs back together
Used in
  SELECT, to accelerate selection (by index)
  PROJECT, to eliminate duplicates
  JOIN, UNION and INTERSECTION
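
The sort-merge strategy can be sketched in two phases: sort runs small enough for memory and write each to disk, then merge the sorted runs. In this simplified illustration the input is a Python list standing in for a data file, run_size stands in for the memory limit, and the function name external_sort is hypothetical.

```python
import heapq
import os
import tempfile

def external_sort(values, run_size):
    run_paths = []
    # Phase 1: sort each run in memory and write it to its own file.
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size])
        with tempfile.NamedTemporaryFile(
                "w", delete=False, suffix=".run") as run_file:
            run_file.writelines("%d\n" % v for v in run)
            run_paths.append(run_file.name)

    # Phase 2: merge the runs, reading one line at a time from each file.
    files = [open(path) for path in run_paths]
    try:
        merged = [int(line) for line in heapq.merge(*files, key=int)]
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)
    return merged

sorted_values = external_sort([9, 3, 7, 1, 8, 2, 5], run_size=3)
```

Only one line per run is held in memory during the merge, which is what makes the approach suitable when the data file is larger than primary memory.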

2.3. B-trees
In this lecture we look at...
[Section notes PDF 159Kb]

2.3.01. Hash tables
Used to implement Indices
O(1) access (on average)
Ordering Key Field (K) as argument to Hash function H()
Address H(K) maps to pointer
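
A hash index can be sketched as an array of buckets addressed by H(K), each holding (key, pointer) pairs. The class name HashIndex, the toy hash function and the bucket count are assumptions for illustration; the surnames and block numbers reuse the earlier index example.

```python
class HashIndex:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _h(self, key):
        # Toy deterministic hash H(K): sum of the key's bytes modulo
        # the number of buckets. Real systems use better functions.
        return sum(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key, pointer):
        self.buckets[self._h(key)].append((key, pointer))

    def lookup(self, key):
        # Only the single bucket at address H(K) is inspected.
        for stored_key, pointer in self.buckets[self._h(key)]:
            if stored_key == key:
                return pointer
        return None

index = HashIndex()
index.insert("Barnes", 1)
index.insert("Smith", 6)
index.insert("Zeta", 8)
```

Lookup cost depends on the length of one bucket rather than on the number of keys, provided the hash function spreads keys evenly.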

2.3.02. Tree structure
Tree revision
Node based
Branching nodes/leaf nodes
Parent/child nodes
Root node
Cardinality

2.3.03. Multi-level indices
Multi-level indices
One index indexes another
Implemented by multiple hash-table pairs (data far right)

2.3.04. Index zipping
Collapsing a single index
Two columns become one, pairs sequentially stored
Common in Elmasri

2.3.05. B-tree
Partitioning structure
Each node contains keys & pointers
Pointers can be:
  Node pointers - to child nodes
  Data pointers - to records in heap
Number of keys = Number of pointers - 1
Every node in the tree is identical
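
Searching a B-tree built this way can be sketched as follows: at each node a binary search either finds the key, and returns its data pointer, or selects the node pointer for the partition the key falls in. The Node class, the keys and the record labels are invented for the example, and insertion and rebalancing are left out.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, data, children=None):
        self.keys = keys                 # sorted keys
        self.data = data                 # one data pointer per key
        self.children = children or []   # len(keys) + 1 node pointers, or none

def search(node, key):
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.data[i]          # key found in this node
        # Otherwise descend into the partition that could hold the key.
        node = node.children[i] if node.children else None
    return None

leaf_low = Node([5, 10], ["r5", "r10"])
leaf_mid = Node([20, 25], ["r20", "r25"])
leaf_high = Node([40], ["r40"])
root = Node([15, 30], ["r15", "r30"], [leaf_low, leaf_mid, leaf_high])
```

The root holds two keys and three node pointers, matching the rule that the number of keys is one less than the number of node pointers.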

2.3.06. B+ trees
Similar to B-trees
Different types of nodes
  Branching nodes
  Leaf nodes
Each branching node has:
  At most U children (max U)
  At least L children (min L)
  U = 2L, or U = 2L-1

2.3.07. Properties of B+ trees
Balanced
  All leaf nodes at same level
  Record search takes same time for every record
Partitioning needs to be comprehensive
  B-tree: $a_1 < x < a_2$
  B+tree: $a_1$