
2.Dbms Systems

May 29, 2018

  • 8/9/2019 2.Dbms Systems


    MS Systems http://www.lightenna.com/book/export/s

    3 03/03/20

    2.1.02b. QBE visual example
        Record advancing
        Query designing
        Finite domain attributes
        Web search parallel

    2.1.03. QBE text-format example
        P. print, I. insert, D. delete, U. update
        _VARNAME, copy field value into variable

    2.1.05. SQL
        Structured Query Language (SQL or SEQUEL)
        Wikipedia reference
        Success of relational databases
        Developed for System R at IBM
        ANSI standardised
            SQL-1986 (SQL1), ongoing extension
            SQL-1992 (SQL2), current version (Oracle 9i)
            SQL-1999 (SQL3), regular expression matching, recursive queries
            SQL-2003, XML features, auto-generated columns

    2.1.05b. SQL command syntax
        What follows here is a brief summary


        Oracle syntax
            Similar but not identical to MySQL/MSSQL
        General familiarity
        Query writing best learnt by doing it
        Lecture live-example
        Coursework 1 will be SQL
        Oracle (9i) SQL reference
        MySQL (5.0) SQL reference

    2.1.05c. SQL in application
        Keyword oriented language
        Keywords not congruous with Relational model
        Lots of different ways to write SQL
            Analogous to C/Java formatting
            if (b==2) { a=1; } else { a=0; }
        Recommend using case to differentiate attributes and keywords
            SELECT colour, size, shape FROM fruit WHERE weight>22;
        Oracle user accounts on Teaching database
        Namespace references, e.g. shared.cars

    2.1.06. SQL Create schema
        Data definition commands
        CREATE SCHEMA <schema name> AUTHORIZATION <user name>
            or workspace
        Beware of names
            Name collisions produce odd behaviours
        SQL Schema embraces tables (relations), constraints, views, domains, authorizations

    2.1.07. SQL Create table
        CREATE TABLE <schema name>.<table name> (
            <column definitions>
        )
        CREATE TABLE example (Oracle)
        Tables can (and should) be indexed by user
            e.g. <user name>.<table name>
        Normal login implies username
        Non-local table access


    2.1.08. Data types and domains (Oracle)
        Numeric
            ENUM
            NUMBER, NUMBER(i), NUMBER(i,j)
                Formatted numbers, i precision, j scale
                (number of digits in total, number after the decimal point)
        Character-string
            CHAR(n) - n is length
            VARCHAR2(n) - n is max
        DESCRIBE output example
        Multi-database comparison of Datatypes
        Database legacy: limited storage necessitated efficient storage
            Does it need to be efficient anymore?

    You might consider all SQL types as being conceptually similar to attribute types in the relational model, although in reality the implementation of these types in a DBMS only approximates the mathematical purity of unordered domain sets etc.

    2.1.08b. Data types and domains (MySQL)
        Numeric
            TINYINT, INT, INT UNSIGNED
            FLOAT, DOUBLE, DECIMAL
            ENUM
        Character-string
            CHAR(n) - n is length
            VARCHAR(n) - n is max
            TINYTEXT, TEXT
            Beware different default/maximum lengths to Oracle
        BLOB
        Multi-database comparison of Datatypes

    2.1.09. Time-based data types
        Date and Time
            DATE
                Ten positions, components YYYY-MM-DD
            TIME
                Eight positions, components HH:MM:SS
            TIME(i)
                Time fractional seconds precision
                Adds i+1 positions
            TIMESTAMP
                optionally WITH TIME ZONE
        Very sensitive to syntactical ambiguities


            day/month/year/hour/minute separators

    2.1.10. DROPing
        DROP
            DROP SCHEMA <schema name> CASCADE
                drops all workspace tables, domains
            DROP TABLE <table name> RESTRICT
                only drops table if not referenced in any constraints/views
        Notion of cascading
        Table links

    2.1.11. ALTERing
        Schema evolution
        Design side
        ALTER TABLE <schema name>.<table name> ADD <column name> <type>;
        Example
            ALTER TABLE uni.student ADD hall VARCHAR(32);
        Upper and lower case syntax
        Naming conventions

    2.1.12. Queries
        Helper interfaces
            HeidiSQL/phpMyAdmin/Sword/SQLplus
            Design/perform a lot of routine queries for you
            Important to learn SQL, reinforcement
            Designing select queries is more difficult
            Visual interfaces still lacking in this area
        Select queries in SQL
            Basic singlets
            Renaming
            Queries with Joins
            Nested queries

    2.1.13. SQL Queries
        SELECT statement
        Similar to relational data model SELECT then PROJECT
            SELECT <attribute list>
            FROM <table list>
            WHERE <condition>;

    2.1.14. SQL Queries
        SELECT <attribute list>
        FROM R, S, T
        WHERE DNO = 10
            equivalent to π_<attribute list>(σ_DNO=10(R × S × T))
        True-false evaluation tuple by tuple
        WHERE clause as compound logical statement

    2.1.15. SQL Queries
        Produces a relation/set of tuples
        Can be used to extract a single tuple
            e.g. SELECT bday, age
                 FROM student
                 WHERE fname='Tim' AND lname='Smith'
            Result = (13-05-80, 20)
        Argument quoting (')
            SQL poisoning
            Not null
            Not numeric values
        MySQL Attribute quoting (`)
            Hypothetical attribute `all`, all, and ALL


    SQL poisoning (more commonly called SQL injection) is a vulnerability exposed by inadequate escaping of arguments/variables used to compose SQL queries.

    E.g. Tim in the previous example could be: Tim'; DELETE FROM student;' SELECT * FROM student WHERE 1
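As a minimal sketch of the escaping problem described above, the following uses Python's built-in sqlite3 module rather than the Oracle/MySQL setup the lecture assumes. The table contents, the helper names `find_unsafe` and `find_safe`, and the hostile string are all hypothetical. Because sqlite3's `execute()` accepts only one statement, the sketch uses an always-true `OR` clause instead of the stacked `DELETE` shown in the slide.

```python
import sqlite3

# Hypothetical in-memory table standing in for the slide's student relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (fname TEXT, lname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Tim", "Smith", 20), ("Ann", "Jones", 21)],
)

def find_unsafe(fname):
    # Vulnerable: the argument is pasted straight into the SQL text.
    sql = "SELECT fname, age FROM student WHERE fname = '" + fname + "'"
    return conn.execute(sql).fetchall()

def find_safe(fname):
    # Safer: the driver passes the argument as data via a ? placeholder.
    return conn.execute(
        "SELECT fname, age FROM student WHERE fname = ?", (fname,)
    ).fetchall()

hostile = "x' OR '1'='1"
leaked = find_unsafe(hostile)   # the WHERE clause is now always true
blocked = find_safe(hostile)    # no student is literally named that string
```

With the unsafe version every row leaks; with the placeholder version the hostile string is compared as an ordinary value and matches nothing.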

    2.1.16. Renaming and referencing
        AS keyword
        (Partial) Attribute renaming in projection list
            SELECT fname AS firstName, minit, lname AS surname...
        Role names for relations
            SELECT S.FNAME, F.FNAME, S.LNAME
            FROM STUDENT AS S, STUDENT AS F
            WHERE S.LNAME=F.LNAME
        (Total) Attribute renaming in FROM
            SELECT s.firstName, s.surname
            FROM student AS s(firstName,surname,DOB,NINO,tutor)
        Wildcards (SELECT s.* FROM...)

    2.1.17. SQL Tables
        Relations are bags, not sets
            e.g. projection of non-key attributes
        Set cannot contain duplicate item/repetition
        Duplicates exist in bags and can be:
            SELECT DISTINCT (eliminated)
            SELECT ALL (ignored/kept)
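The bag-versus-set behaviour can be checked with a small sketch, here run through Python's sqlite3 module. The `employee` table and its three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, dno INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, 5), (2, 5), (3, 4)])

# Projecting the non-key attribute dno yields a bag: 5 appears twice.
bag = conn.execute("SELECT ALL dno FROM employee ORDER BY dno").fetchall()

# DISTINCT turns the bag into a set by eliminating the duplicate.
as_set = conn.execute("SELECT DISTINCT dno FROM employee ORDER BY dno").fetchall()
```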

    2.1.18. Queries and Joins
        Relational database allows inter-related data
        SQL select FROM gives Cartesian product
        WHERE clause defines join condition
            SELECT proj.pnum, mgr.ssn
            FROM project AS proj, employee AS mgr
            WHERE proj.mgrssn = mgr.ssn;
        Alternatively, explicitly define join (note type)
            SELECT project.pnum, employee.ssn
            FROM project INNER JOIN employee
            ON project.mgrssn = employee.ssn;

    2.1.18b. Outer joins


        Outer joins are crucial in the real world
        Databases often contain NULLs (3VL)
        Analysis of where the crucial data is across a relationship
        Previous example, only get project data for managed projects
            SELECT project.*, employee.*
            FROM project INNER JOIN employee
            ON project.mgrssn = employee.ssn;

    2.1.18c. Outer joins (cont)
        Scale of loss isn't always instantly obvious
        NULLs often used unpredictably
        May want project information, even if no employee attached as manager
            SELECT project.*, employee.*
            FROM project LEFT OUTER JOIN employee
            ON project.mgrssn = employee.ssn;
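To make the loss concrete, here is a small sketch using Python's sqlite3 module. The two tables follow the slide's names, but the rows (one managed project, one with a NULL manager) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, lname TEXT);
    CREATE TABLE project  (pnum INTEGER PRIMARY KEY, mgrssn INTEGER);
    INSERT INTO employee VALUES (100, 'Smith');
    INSERT INTO project  VALUES (1, 100);   -- managed project
    INSERT INTO project  VALUES (2, NULL);  -- no manager attached
""")

# Inner join: project 2 silently disappears because NULL matches nothing.
inner = conn.execute(
    "SELECT project.pnum, employee.ssn "
    "FROM project INNER JOIN employee ON project.mgrssn = employee.ssn "
    "ORDER BY project.pnum"
).fetchall()

# Left outer join: every project is kept, padded with NULL where unmanaged.
outer = conn.execute(
    "SELECT project.pnum, employee.ssn "
    "FROM project LEFT OUTER JOIN employee ON project.mgrssn = employee.ssn "
    "ORDER BY project.pnum"
).fetchall()
```

The inner join returns one row and the outer join two, the second carrying `None` (NULL) in the employee column.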


    2.1.19. 2y and 3y joins
        Queries can encapsulate any number of relations
            Even one relation many times (in different roles)
        Relationship chain
            Across many relations
        Tuples as Entities OR Relationships
            e.g. Employee -> Works_on -> Project -> Department -> Manager

    2.1.20. Recursive closure
        Can't be done in SQL2
        Recursive relationships
        Unknown number of steps
        SQL2 can't generalise in single query
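One way to see why a fixed SQL2 query cannot express this: each self-join adds exactly one step, and the number of steps needed depends on the data. A host language can simply repeat the join until nothing new appears. The sketch below does that in plain Python; the `supervises` pairs and the function name are hypothetical.

```python
def transitive_closure(pairs):
    """Repeat a self-join until no new (ancestor, descendant) pairs appear."""
    closure = set(pairs)
    while True:
        # One extra join step: (a, b) and (b, c) give (a, c).
        step = {(a, c) for (a, b) in closure for (b2, c) in closure if b == b2}
        if step <= closure:
            return closure
        closure |= step

# A chain three levels deep; the loop does not need to know the depth.
supervises = {("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan")}
reachable = transitive_closure(supervises)
```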

    2.1.21. Nested queries
        Essentially one or more (inner) queries within an (outer) query
        Inner and outer query
            Not to be confused with inner and outer joins
        Inner query can go in three places
            SELECT clause (projection list)
                Must return a single value, then aliased as attribute in outer result
            FROM clause
                Inner query result used as standard table in FROM cross product
            WHERE clause

    2.1.21b. Nested query example
        Use of query result as comparator for other (outer) query
            SELECT DISTINCT course
            FROM dept
            WHERE course IN (
                SELECT d.course
                FROM dept AS d, faculty AS f, student AS s
                WHERE d.ownfac=f.id AND s.owndept=d.id
                AND f.name='Eng' AND s.year='3'
            ) OR course IN (
                SELECT course
                FROM dept
                WHERE code LIKE 'COMS3%'
            );


    2.1.22. Bridging SQL across 3 tiers
        Three tier database design
        Changing role of DBMS
        Indices
        Aggregate functions (conceptual)
            Over bags and sub-bags
        Creating and updating views (ext)
        SQL embedding

    In this subsection we look at the different roles SQL plays across the three tiers of database design. We discuss the areas in which SQL is lacking and how those deficiencies can be complemented by embedding SQL in other languages.

    2.1.25. Indices
        Low/Internal level
        Index by one attribute
        For queries selecting by that attribute:
            Faster tuple access (ordered tuples)
            Reduces database memory load
                Small cross product relation, only crosses requisites
            Accelerates query resolution time
        CREATE INDEX Index_Name ON RELATION(Attribute);
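The effect of an index on query resolution can be observed, at least in SQLite, by asking the engine for its plan before and after `CREATE INDEX`. This is a sketch under assumptions: `EXPLAIN QUERY PLAN` is SQLite-specific (Oracle and MySQL have their own equivalents), and the table, row count and index name `idx_employee_dno` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INTEGER, lname TEXT, dno INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, "name%d" % i, i % 10) for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a text description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM employee WHERE dno = 5"
before = plan(query)   # no index yet: the engine must scan the whole table
conn.execute("CREATE INDEX idx_employee_dno ON employee(dno)")
after = plan(query)    # the plan now names the index on dno
```

The exact plan wording varies between SQLite versions, so the sketch only relies on the index name appearing in the second plan.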

    2.1.26. Aggregate functions
        Run over groups of tuples
        Takes a projected attribute list as an argument
        Produce relation with single tuple
        SUM, MAX, MIN, AVG, COUNT
        e.g. AggFunc over all tuples
            SELECT SUM(SALARY), MAX(SALARY), MIN(SALARY)
            FROM EMPLOYEE;
        Single attribute lists (distinct values)
        Multi-attribute lists (granularity of distinct values by pairing)

    2.1.27. Aggregates over sub-bags
        Can run over subsets of tuples
        GROUP BY keyword
            Specifies the grouping attributes
            Need to also appear in projected attr_list
            Show result alongside value for group attr
        e.g. AggFunc over subgroups
            SELECT dno, COUNT(*)
            FROM employee
            GROUP BY dno

    Quick SQL check: do all non-aggregated attributes in the SELECT projection list appear in the GROUP BY list?
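The difference between aggregating over the whole bag and over sub-bags can be sketched with Python's sqlite3 module. The three employee rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INTEGER, dno INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, 4, 100), (2, 5, 200), (3, 5, 300)],
)

# Aggregate over all tuples: the result relation has a single tuple.
overall = conn.execute(
    "SELECT SUM(salary), MAX(salary), MIN(salary) FROM employee"
).fetchone()

# Aggregate over sub-bags: one tuple per distinct value of the grouping attribute.
per_dept = conn.execute(
    "SELECT dno, COUNT(*) FROM employee GROUP BY dno ORDER BY dno"
).fetchall()
```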

    2.1.28. Creating views
        Views are partial projections
        Virtual relations, or views of live relations
        Update synchronised
            CREATE VIEW <view name> AS <query>
        Real relation could be a query result
        Clever bit is the change propagation
            UPDATEs made to the view dataset are flooded back to relations
        INSERT and DELETE behaviour needs to be defined
            Non-trivial as INSERT into view (virtual relation) may leave holes in real relation

    2.1.29. Embedding SQL
        SQL (alone) can do lots of clever things in one expression
        But can only execute a single expression
        Can structure SQL commands into proper programming languages
        Java Database Connectivity (JDBC)
            javac, then java VM
        COBOL, C or PASCAL
            precompiled with PRO*COBOL or PRO*C
        Procedural Language (PL/SQL)
            Oracle/MySQL procedural language
            Stored procedures can take parameters

    2.2. Internals


    In this lecture we look at... [Section notes PDF 180Kb]

    2.2.01. Introduction
        Database internals (base tier)
        RAID technology
            Reliability and performance improvement
        Record and field basics
        Headers to hashing
        Index structures

    2.2.01b. Machine architecture (by distance)
        Distance from chip determines minimum latency
            Speed of light is a constant
        Impact of bus frequencies
            IDE (66, 100, 133 MHz)
            PCI, PCI-X (66, 100, 133 MHz)
            PCI Express (1 GHz to 12 GHz)
        Impact of bus bandwidths
            PCI (32/64 bit/cycle, 133MB/s)
            PCI Express (x16 8.0GB/s)

    Here's a link from Intel showing a machine architecture with signal bandwidths: Intel diagram

    2.2.01c. Machine architecture (by capacity)
        Capacity increased with distance
        Staged architecture as compromise
            Speed, time/distance
            Also cost, heat, usage scale

    2.2.02. Database internals
        Stored as files of records (of data values)
            Auxiliary data structures/indices
        1y and 2y storage
            memory hierarchy (pyramid diagram)
            volatility
        Online and offline devices
        Primary file organisation, records on disk
            Heap - unordered
            Sorted - ordered, sequential by sort key
            Hashed - ordered by hash key
            B-trees - more complex

    2.2.03. Disk fundamentals
        DBMS task
            linked to backup
        1y, 2y and 3y
            e.g. DLT tape
        Changing face of current technology
            Impact of inexpensive hard disks
            Flash memory devices (CF, USB)
        Random versus sequential access
        Latency (rotational delay) and
        Bandwidth (data transfer rate)

    2.2.04. RAID technology
        Redundant Array of Independent Disks
        Data striping
            Blocks (512 bytes), bits and transparency
        Reliability (1/n)
            Mirroring/shadowing
            Error correction codes/parity
        Performance (n)
            Mirroring (2x read access)
            Multiple parallel access

    2.2.05. RAID levels
        0 No redundant data
        1 Disk mirrors (performance gain)
        2 Hamming codes (also detect)
        3 Single parity disk
        4 Block level striping
        5 and parity/data distribution
        6 Reed-Solomon codes
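The parity idea behind the single-parity levels can be sketched in a few lines: the parity block is the byte-wise XOR of the data blocks, so any one lost block is the XOR of the survivors and the parity. The block contents and function name below are hypothetical, and real arrays work on whole disk blocks rather than four-byte strings.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks byte by byte to form a parity block."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Three data "disks" holding one small block each.
data = [b"DBMS", b"RAID", b"DISK"]
p = parity(data)

# Lose disk 1: XOR of the surviving blocks and the parity block rebuilds it.
rebuilt = parity([data[0], data[2], p])
```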

    2.2.06. Records and fields
        DBMS specific, generally
        Records (tuples) comprise fields (attributes)
        File is a sequence of records
        Variable length records
            Variable length fields
            Multi-valued attributes/repeating fields
            Optional fields
            Mixed file of different record types

    2.2.07. Fields
        records -> files -> disks
        Fixed length for efficient access
        Networking issues
        Delimit variable length fields (max)
        Explicit record/field lengths
        Separators (,;,:,$,?,%)
        Record headers and footers
        Spanning
            block boundaries and redundancy
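Why fixed-length records give efficient access can be sketched with Python's struct module: when every record has the same size, record n sits at byte offset n times that size and can be read without scanning its predecessors. The record layout (integer id, 12-byte name, 2-byte age) and the helper names are assumptions made for illustration.

```python
import struct

# Hypothetical fixed-length layout: 4-byte integer id, 12-byte name, 2-byte age.
# "=" selects standard sizes with no padding between fields.
RECORD = struct.Struct("=i12sH")

def pack(student_id, name, age):
    # The 12s field is padded with NUL bytes, so every record is 18 bytes.
    return RECORD.pack(student_id, name.encode("ascii"), age)

def read_record(file_bytes, n):
    # Fixed length means record n starts at a directly computable offset.
    student_id, name, age = RECORD.unpack_from(file_bytes, n * RECORD.size)
    return student_id, name.rstrip(b"\0").decode("ascii"), age

# A "file" as a sequence of records.
file_bytes = pack(1, "Barnes", 20) + pack(2, "Smith", 21) + pack(3, "Zeta", 19)
second = read_record(file_bytes, 1)
```

The cost of this convenience is wasted space for short names, which is the trade-off the variable-length and delimiter options above address.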

    2.2.08. Primary organisation
        Bias data manipulation to 1y memory
            Load record to 1y, write back
        Cache theorem
            Data storage investment, rapidity of access
            optimisations based on frequent algorithmic use
        Ordering, ordering field/key field
        Hashing

    2.2.09. Indexes/indices
        Auxiliary structures/secondary access path
        Single level indexes (Key, Pointer)
        File of records
        Ordering key field
        Primary, Secondary and Clustering

    2.2.09b. Primary index example
        Primary index on simple table
        Ordering key field (primary key) is Integer
        Pointers as addresses
        Sparse, not dense

    2.2.10. Primary Index file (as pairs list)
        Two fields
            Ordering key field and pointer to block
        Second example, indexing candidate key Surname
            K(1)="Barnes", P(1) -> block 1
                Barnes record is first/anchor entry in block 1
            K(2)="Smith", P(2) -> block 6
            K(3)="Zeta", P(3) -> block 8
        Dense (K(i) for every record), or Sparse
        Enforce key constraint
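A sparse primary index lookup can be sketched as a binary search over the anchor keys: find the last anchor that is not greater than the search key, then follow its pointer and scan that block. The sketch below treats the three K(i), P(i) pairs listed above as if they were the whole index, which is an assumption; the function name is hypothetical.

```python
import bisect

# Sparse index: one (anchor key, block number) pair per indexed block.
index_keys = ["Barnes", "Smith", "Zeta"]
index_ptrs = [1, 6, 8]

def block_for(surname):
    """Return the block whose anchor is the largest key <= surname."""
    i = bisect.bisect_right(index_keys, surname) - 1
    if i < 0:
        return None  # sorts before the first anchor, so it is not in the file
    return index_ptrs[i]

jones_block = block_for("Jones")   # falls between the Barnes and Smith anchors
```

Only the block is located by the index; the record itself is then found by searching within that block.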

    2.2.10b. Clustering index example


        Clustering index
        Ordering key field (OKF) is non-key
        Each entry points to multiple records

    2.2.11. Clustering Index (as pairs list)
        Two fields
            Ordering non-key field and pointer to block
        Internal structure e.g. linked list of records
        Each block may contain multiple records
            K(1)="Barnes", P(1) -> block 1
            K(2)="Bates", P(2) -> block 2
            K(3)="Zeta", P(3) -> block 3
        K(i) not required to have a distinct value for each record
            non-dense, sparse

    2.2.11b. Secondary Index example
        Independent of primary ordering
        Can't use block anchors
        Needs to be dense

    2.2.12. More indices


        Single level index
            ordered index file
            limited by binary search
        Multi level indices
            based on tree data structures (B+/B-trees)
            faster reduction of search space (about log_fo(b_i) block accesses, where fo is the index fan-out and b_i the number of index blocks)
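The gain from a multi-level index can be estimated with a short calculation: each level divides the remaining search space by the fan-out instead of by 2. The figures below (30000 index entries, a fan-out of 68) and the function names are illustrative assumptions, not values from the lecture.

```python
import math

def index_levels(entries, fan_out):
    """Count index levels until a single top-level block remains."""
    levels = 0
    remaining = entries
    while True:
        remaining = math.ceil(remaining / fan_out)  # blocks needed at this level
        levels += 1
        if remaining <= 1:
            return levels

def binary_search_accesses(blocks):
    """Block reads for a binary search over a single-level index."""
    return math.ceil(math.log2(blocks))

first_level_blocks = math.ceil(30000 / 68)
multi = index_levels(30000, 68)
single = binary_search_accesses(first_level_blocks)
```

Under these assumptions the multi-level index needs one block read per level, noticeably fewer than the binary search over the single-level index file.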

    2.2.13. Indices
        Database architecture
            Intension/extension
        Indexes separated from data file
            Created/discarded dynamically
            Typically 2y to avoid reordering records on disk

    2.2.14. Query optimisation
        Faster query resolution
            improved performance
            lower load
            hardware cost:performance ratio
        Moore's law
        Query process chain
        Query optimisation

    2.2.15. Query processing
        Compile-track familiarity
            Scanner/tokeniser - break into tokens
            Parser - semantic understanding, grammar
            Validated - check attribute names
        Query tree
            Execution strategy, heuristic
        Query optimisation
            In (extended relational) canonical algebra form

    2.2.16. Query optimisation
        SQL query
            SELECT lname, fname
            FROM employee
            WHERE salary > (
                SELECT MAX(salary)
                FROM employee
                WHERE dno=5
            );
        Worst-case
            Process inner for each outer
        Best-case
            Canonical algebraic form

    2.2.16b. Query optimisation implementation
        Indexing accelerates query resolution
        Closed comparison (intra-tuple)
            all variables/attributes within single tuple
            e.g. x < 100
        Open comparison (inter-tuple)
            variables span multiple tuples
        Essentially a sorting problem
            Internal sorting covered (pre-requisites)
            Need external sort for non-cached lists

    2.2.17. Query optimisation
        External sorting
            Stems from large disk (2y), small memory (1y)
        Sort-merge strategy
            Sort runs (small sets of total data file)
            Then merge runs back together
        Used in
            SELECT, to accelerate selection (by index)
            PROJECT, to eliminate duplicates
            JOIN, UNION and INTERSECTION
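The sort-merge strategy can be sketched in two phases. In this toy version Python lists stand in for runs written to disk, and `memory_size` is a hypothetical limit on how many records fit in primary memory at once.

```python
import heapq

def external_sort(records, memory_size):
    # Phase 1: cut the file into runs that each fit in "memory" and sort them.
    runs = [
        sorted(records[i:i + memory_size])
        for i in range(0, len(records), memory_size)
    ]
    # Phase 2: merge the sorted runs; heapq.merge consumes them lazily,
    # holding only the current head of each run at any time.
    return list(heapq.merge(*runs))

result = external_sort([5, 3, 8, 1, 9, 2, 7], memory_size=3)
```

A real implementation would stream each run to and from secondary storage and might need several merge passes when there are more runs than buffers.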

    2.3. B-trees
    In this lecture we look at... [Section notes PDF 159Kb]

    2.3.01. Hash tables
        Used to implement indices
        O(1) access (on average)
        Ordering Key Field (K) as argument to Hash function H()
        Address H(K) maps to pointer
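A hash-based index can be sketched as follows: the key K is passed through a hash function to give a bucket address, and the bucket holds (key, pointer) pairs. The hash function `h`, the bucket count, and the keys and block numbers are all toy assumptions; collisions are handled by chaining within the bucket.

```python
NUM_BUCKETS = 8

def h(key):
    # Toy hash function: sum of character codes modulo the table size.
    return sum(ord(c) for c in key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, pointer):
    buckets[h(key)].append((key, pointer))

def lookup(key):
    # Only one bucket is examined, whatever the size of the index.
    for k, pointer in buckets[h(key)]:
        if k == key:
            return pointer
    return None

insert("Barnes", 1)
insert("Smith", 6)
insert("Zeta", 8)
```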


    2.3.02. Tree structure
        Tree revision
        Node based
        Branching nodes/leaf nodes
        Parent/child nodes
        Root node
        Cardinality

    2.3.03. Multi-level indices
        Multi-level indices
        One index indexes another
        Implemented by multiple hash-table pairs (data far right)

    2.3.04. Index zipping
        Collapsing a single index
        Two columns become one
        pairs sequentially stored
        Common in Elmasri


    2.3.05. B-tree
        Partitioning structure
        Each node contains keys & pointers
        Pointers can be:
            Node pointers - to child nodes
            Data pointers - to records in heap
        Number of keys = Number of pointers - 1
        Every node in the tree is identical

    2.3.06. B+ trees
        Similar to B-trees
        Different types of nodes
            Branching nodes
            Leaf nodes
        Each branching node has:
            At most U children (max U)
            At least L children (min L)
            U = 2L, or U = 2L-1

    2.3.07. Properties of B+ trees
        Balanced
            All leaf nodes at same level
            Record search takes same time for every record
        Partitioning needs to be comprehensive
            B-tree: a1 < x < a2
            B+tree: a1
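A search through a small B+ tree can be sketched as below. The sketch assumes that a key equal to a separator belongs to the subtree on its right, which is one common convention and not necessarily the one the lecture uses; the `Node` class, the keys and the record labels are hypothetical. Branching nodes hold only keys and child pointers, and data pointers live in the leaves, so every search descends the same number of levels.

```python
import bisect

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # sorted keys
        self.children = children    # len(keys) + 1 child nodes, or None for a leaf
        self.records = records      # leaf only: one data pointer per key

def search(node, key):
    # Descend through branching nodes until a leaf is reached.
    while node.children is not None:
        # Assumed convention: keys >= a separator live to its right.
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None

leaves = [
    Node([10, 20], records=["r10", "r20"]),
    Node([30, 40], records=["r30", "r40"]),
    Node([50, 60], records=["r50", "r60"]),
]
# Two keys, three pointers: number of keys = number of pointers - 1.
root = Node([30, 50], children=leaves)
```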
