
Performance Analysis of Temporal Queries (Information Sciences #49, 1989) by Ilsoo Ahn, AT&T Bell Laboratories, Columbus, Ohio and Richard Snodgrass Dept.

Performance Analysis of Temporal Queries

(Information Sciences #49, 1989)

by Ilsoo Ahn, AT&T Bell Laboratories, Columbus, Ohio

and Richard Snodgrass, Dept. of Computer Science, University of Arizona

Communicated by Ahmed Elmagarmid

~ * ~

Presented by Barry Klein for CS-599, 10/26/2000

Abstract

Temporal databases that maintain history data add historical queries and rollback operations to conventional databases. This paper proposes a model for analyzing the performance of temporal queries over a range of access methods.

Abstract, continued

Model: 4 transformations through a series of formal expressions common to all phases of query processing.

Input: Temporal query + DB schema
Output: Estimated I/O cost for the query.
Validation: Compare the estimated cost from the model with the actual cost from a prototype.

Introduction

Factors affecting performance of a temporal DBMS:

- Access methods available
- Query-processing strategies
- Size and composition of the data

Introduction, continued

Methods for describing TDBMS effectiveness:

- Empirical approach – actual performance is measured. Advantage: results are reliable.
- Analytical approach – develop a mathematical model of the performance, which can predict performance in a controlled context. Advantage: less effort, but results are questionable.

Introduction, continued

Three orthogonal types of time: valid time, transaction time, user-defined time.

4 categories of DBs, defined in terms of support for valid/transaction time:

- Snapshot – conventional, no temporal support.
- Rollback – supports transaction time.
- Historical – supports valid time (real-world history).
- Temporal – supports both valid and transaction time.
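
The four categories follow directly from which of the two time dimensions a database supports. A minimal Python sketch of that taxonomy; the names `DatabaseKind` and `classify` are illustrative and do not come from the paper:

```python
from enum import Enum


class DatabaseKind(Enum):
    SNAPSHOT = "snapshot"
    ROLLBACK = "rollback"
    HISTORICAL = "historical"
    TEMPORAL = "temporal"


def classify(valid_time: bool, transaction_time: bool) -> DatabaseKind:
    """Map support for the two time dimensions to a database category."""
    if valid_time and transaction_time:
        return DatabaseKind.TEMPORAL
    if valid_time:
        return DatabaseKind.HISTORICAL
    if transaction_time:
        return DatabaseKind.ROLLBACK
    return DatabaseKind.SNAPSHOT


if __name__ == "__main__":
    for vt in (False, True):
        for tt in (False, True):
            print(vt, tt, classify(vt, tt).value)
```

User-defined time is left out of the sketch because the slide defines the categories only in terms of valid and transaction time.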

Introduction, continued

TQuel, a non-procedural language based on tuple calculus, is chosen here to express historical queries and rollback operations:

- Augments the retrieve statement with a when predicate, which specifies temporal relations among tuples.
- The valid clause specifies how implicit time attributes are computed for result tuples.
- Rollback operations are implemented with the as of clause (in either rollback or temporal DBs).

Introduction, continued

Temporal relations are expressed with the added constructs:

- Precede
- Overlap
- Extend
- Begin of
- End of

TQuel also augments these statements with valid and when clauses:

- Append
- Delete
- Replace

The Create statement is supported for temporal relations.
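
To make the temporal constructs concrete, here is a rough Python sketch over half-open intervals. The `Interval` type, the integer time units, and the exact boundary conventions are assumptions made for illustration; TQuel's own definitions may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    start: int  # inclusive (assumed convention)
    stop: int   # exclusive (assumed convention)


def begin_of(a: Interval) -> int:
    return a.start


def end_of(a: Interval) -> int:
    return a.stop


def precede(a: Interval, b: Interval) -> bool:
    """True when a ends no later than b begins."""
    return a.stop <= b.start


def overlap(a: Interval, b: Interval) -> bool:
    """True when the two intervals share at least one instant."""
    return a.start < b.stop and b.start < a.stop


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """The shared part of two intervals, or None if they do not overlap."""
    if not overlap(a, b):
        return None
    return Interval(max(a.start, b.start), min(a.stop, b.stop))


def extend(a: Interval, b: Interval) -> Interval:
    """The smallest interval covering both a and b."""
    return Interval(min(a.start, b.start), max(a.stop, b.stop))
```

For example, with `a = Interval(0, 10)` and `b = Interval(5, 20)`, `overlap(a, b)` holds and `extend(a, b)` spans 0 to 20.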

The New Model

Performance analysis is based on these givens:

- A set of temporal queries
- Some query-processing/optimization strategy
- File structure(s) to implement the TDB
- A set of parameters characterizing the storage devices.

The New Model, continued

Assumptions and decisions for this model:

- Disk I/O traffic is used as the key measurement: it is approximately proportional to performance;
- Inputs must be flexible;
- The resulting estimate must be accurate.

The New Model, continued

The 4 transformations of the model use:

- Algebraic expressions;
- File-primitive expressions;
- Access-path expressions.

The Algebraic Expression

Since TQuel is non-procedural, the algebraic expression is defined first.

Algebraic operators:

- Conventional: select, project, join, union, difference
- Temporal: when, as of
- Auxiliary: temporary, sort, reformat

Conventional Algebraic Operators

- Select – takes a relation and a predicate specifying the constraint that result tuples must satisfy.
- Project – parameters are a relation and a set of attributes to be extracted from the relation.
- Join – performs a theta-join of the 2 relations given as the first 2 parameters; the 3rd parameter is the join method, and the 4th is the combining predicate.
- Union – set addition on 2 relations.
- Difference – set subtraction on 2 relations.

Temporal Algebraic Operators

- When – performs temporal selection on a relation according to a temporal predicate on the values of valid time attributes.
- AsOf – similar, but compares 2 time constants with transaction-time attribute values.
- Valid – performs temporal projection on the values of the valid time attributes. (It might perform similarly to project.)

Auxiliary Algebraic Operators

Operations that don't change the query result but affect the query cost:

- Temporary – creates or accesses a temporary relation for the result of its parameter's operation.
- Sort – sorts the tuples in the relation given by the 1st parameter, with the remaining parameters as key sort attributes.
- Reformat – changes the structure of the relation given by the 1st parameter to the form given by the 2nd parameter, with the remaining parameters as key sort attributes.

TQuel Algebraic Transformations: Example 1

range of h is relation_h
retrieve (h.id, h.seq) where h.id = 500

is mapped to:

{ L1: Select (h, h.id = 500);
  Project (L1, h.id, h.seq) }

This selects id = 500 from relation_h, then extracts attributes id and seq from L1, the result of the previous operation. The ";" forces sequential execution.
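
The Select-then-Project sequence can be imitated on an in-memory relation. The following is only an illustrative Python sketch: the sample tuples and the helper names `select` and `project` are made up here, and a real system would operate on stored files rather than lists.

```python
# A toy stand-in for relation_h; attribute values are invented for illustration.
relation_h = [
    {"id": 500, "seq": 1, "amount": 10},
    {"id": 501, "seq": 1, "amount": 20},
    {"id": 500, "seq": 2, "amount": 30},
]


def select(relation, predicate):
    """Keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, *attributes):
    """Extract the named attributes from every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


# L1: Select (h, h.id = 500); then Project (L1, h.id, h.seq)
L1 = select(relation_h, lambda t: t["id"] == 500)
result = project(L1, "id", "seq")

if __name__ == "__main__":
    print(result)
```

Here `L1` plays the role of the labeled intermediate result that the second operator consumes.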

Example 1, continued

The same expression can be mapped instead to:

{[ L1: Select (h, h.id = 500);
   Project (L1, h.id, h.seq) ]}

The "[]" eliminates the need for a temporary file for intermediate results.

TQuel Algebraic Transformations: Example 2

range of h is relation_h
range of i is relation_i
retrieve (h.id, i.id, i.amount)
where h.id = i.amount
when h overlap i and i overlap "now"

is mapped to 2 different algebraic expressions. The first:

{ L1: Join (h, i, TS, h.id = i.amount and h overlap i);
  L2: When (L1, i overlap "now");
  Project (L2, h.id, i.id, i.amount) }

This specifies a Join using tuple substitution (TS) of relations h and i.

Example 2, continued

The original expression:

range of h is relation_h
range of i is relation_i
retrieve (h.id, i.id, i.amount)
where h.id = i.amount
when h overlap i and i overlap "now"

is also mapped to:

{[ L1: When (i, i overlap "now");
   L2: Project (L1, i.id, i.amount, i.valid_from, i.valid_to) ];
 L3: Temporary (L2);
 [ L4: Join (h, L3, TS, h.id = i.amount and h overlap i);
   Project (L4, h.id, i.amount) ]}

Equivalent to the previous expression, but much more efficient.
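
Why the second mapping is cheaper can be seen with a toy nested-loop join. The Python sketch below counts pair tests as a stand-in for cost; the paper's model measures disk I/O, so this is only an analogy, and the sample data, the `NOW` value, and all helper names are assumptions.

```python
NOW = 50  # assumed current time, in arbitrary units


def overlaps(a, b):
    """Half-open valid-time intervals [vf, vt) share at least one instant."""
    return a["vf"] < b["vt"] and b["vf"] < a["vt"]


def is_current(t):
    """The tuple's valid time contains NOW."""
    return t["vf"] <= NOW < t["vt"]


def join_ts(outer, inner, predicate):
    """Nested-loop join; returns matching pairs and the number of pair tests."""
    pairs, tests = [], 0
    for o in outer:
        for n in inner:
            tests += 1
            if predicate(o, n):
                pairs.append((o, n))
    return pairs, tests


def match(ht, it):
    return ht["id"] == it["amount"] and overlaps(ht, it)


h = [{"id": k, "vf": 0, "vt": 100} for k in range(1, 6)]
i = [{"id": 100 + k, "amount": k, "vf": 0, "vt": 40} for k in range(1, 6)]
i += [{"id": 200 + k, "amount": k, "vf": 40, "vt": 100} for k in (1, 2)]

# First mapping: join everything, then apply the "now" filter.
joined_a, tests_a = join_ts(h, i, match)
result_a = [(ht["id"], it["id"], it["amount"])
            for ht, it in joined_a if is_current(it)]

# Second mapping: filter i down to current tuples first, then join.
current_i = [t for t in i if is_current(t)]
joined_b, tests_b = join_ts(h, current_i, match)
result_b = [(ht["id"], it["id"], it["amount"]) for ht, it in joined_b]

if __name__ == "__main__":
    print(result_a == result_b, tests_a, tests_b)
```

Both plans return the same tuples, but filtering first shrinks the inner relation from 7 tuples to 2, so the join performs 10 pair tests instead of 35.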

Transformation to File Primitive Expression

The 2 primitives, Read and Write, take parameters:

- Access method: Heap, Hash, Isam or Btree;
- File size;
- Length of overflow chain.

A file primitive expression (FPE) combines primitives, repeated or executed together, to perform an algebraic operation.

The simple example FPE-1: Read (Hash, 0) specifies one hashed access with no overflow records.

File Primitive Expression, example 2

FPE-2:

Read (Heap, 128) +
( Read (Heap, 19) * 2 - 1 +
  Write (Heap, 19) * 3 - 1 ) +
Read (Heap, 19) +
Read (Hash, 0) * 1024

This indicates one Read from the 128-block heap, 2 Reads from the 19-block heap, 3 Writes to the 19-block heap, and a hashed access on a file with no overflow records, iterated 1024 times.
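
As a rough illustration of how such an expression turns into block counts, the Python sketch below tallies the operations listed in the prose description above. It does not reproduce the exact FPE-2 expression (the "- 1" terms are not modeled), and the cost rules are assumptions: a heap access is taken to touch every block of the file, and a hashed access one block plus its overflow chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    op: str          # "Read" or "Write"
    method: str      # "Heap" or "Hash"
    param: int       # Heap: file size in blocks; Hash: overflow chain length
    repeat: int = 1  # how many times the primitive is iterated


def block_accesses(p: Primitive) -> int:
    """Block accesses for one primitive under the assumed cost rules."""
    if p.method == "Heap":
        per_call = p.param        # assumed: scan the whole heap
    elif p.method == "Hash":
        per_call = 1 + p.param    # assumed: bucket plus overflow chain
    else:
        raise ValueError(f"unsupported access method: {p.method}")
    return per_call * p.repeat


def total_accesses(fpe):
    """Sum block accesses separately for reads and writes."""
    totals = {"Read": 0, "Write": 0}
    for p in fpe:
        totals[p.op] += block_accesses(p)
    return totals


# The operations named in the prose description of FPE-2.
described = [
    Primitive("Read", "Heap", 128),
    Primitive("Read", "Heap", 19, repeat=2),
    Primitive("Write", "Heap", 19, repeat=3),
    Primitive("Read", "Hash", 0, repeat=1024),
]

if __name__ == "__main__":
    print(total_accesses(described))
```

Under these assumed rules the tally is 1190 block reads and 57 block writes; the paper's own cost rules may give different figures.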

Characteristics of DB Relations

Transforming algebraic expressions to FPEs requires:

- Relation names
- Temporal type
- Storage structures
- Attribute counts, names, formats, lengths
- Key attributes
- Tuple lengths and counts
- Selectivity and distribution of attribute values
- Data volatility
- Update count (particularly for TDB)

Steps of Transformation

- For each algebraic operator, substitute file primitive(s) with the particular DB parameters.
- Omit any algebraic operation that can be performed simultaneously with another operation.
- Identify basic constructs in temporal queries.
- Transform the subset of algebraic expressions (composed of these constructs) to FPEs.

Access Path Expression

- APE: the path through the storage structure that satisfies an FPE access request.
- Node: physically contiguous record(s) involved in the access.
- Access (read or write) of a tuple: traverses node(s).
- Access path: a set of nodes connected directly or indirectly; also, a set of chains.
- Chain: a group of nodes.

Access Path Expression Modes

Guided, if there's a random-access location mechanism:

- H: address is computed by a hash function;
- P: there's a pointer to the address;
- A: component follows adjacently;
- S: component shares its starting address with its parent;
- M: the component is in main memory.

Searched, otherwise:

- O: file is ordered, enabling logarithmic search;
- U: unordered, requiring sequential search.

APE Subcomponent Parameters

- f = number of records in a file
- b = number of records in a block
- r = number of bytes in a record
- n = number of records to be accessed.
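
These parameters are enough for simple textbook estimates of how many blocks a searched access touches. The Python sketch below uses standard estimates (average-case linear scan, worst-case binary search over blocks), not formulas taken from the paper; the function names are illustrative.

```python
import math


def blocks_in_file(f: int, b: int) -> int:
    """Number of blocks needed for f records at b records per block."""
    return math.ceil(f / b)


def block_bytes(b: int, r: int) -> int:
    """Bytes of record data held in one block."""
    return b * r


def unordered_search_blocks(f: int, b: int) -> float:
    """Mode U: expected blocks read to find one existing record by scanning."""
    return (blocks_in_file(f, b) + 1) / 2


def ordered_search_blocks(f: int, b: int) -> int:
    """Mode O: worst-case blocks read by binary search over the blocks."""
    # For m >= 1 blocks, m.bit_length() equals floor(log2(m)) + 1.
    return blocks_in_file(f, b).bit_length()


if __name__ == "__main__":
    f, b, r = 1024, 8, 100  # hypothetical file: 1024 records, 8 per block
    print(blocks_in_file(f, b), block_bytes(b, r))
    print(unordered_search_blocks(f, b), ordered_search_blocks(f, b))
```

For the hypothetical 128-block file, a sequential search reads 64.5 blocks on average, while a binary search reads at most 8.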

Inverted & Multi-List File structures

APE for Inverted Files

Read (Inverted, 3):

(P 3 (P 1 (S 1)) (P 1 (S 1)) (P 1 (S 1)))

The head of the path is located by a pointer; it contains a key value and 3 chains, each of which is also located by a pointer; each chain has one node, which shares the same address with the chain and contains one record. The expression abbreviates to:

(P 3 (P 1 (S 1)))

APE for Multilist Files

Read (Multilist, 3):

(P 1 (P 3 (S 1) (P 1) (P 1)))

The head of the path is located by a pointer; it contains 1 chain, which is also located by a pointer and has 3 nodes, each of which contains one record. The first node has the same address as the chain, and the next nodes are reached via pointers. Since the 2nd and 3rd nodes are identical, the expression abbreviates to:

(P 1 (P 3 (S 1) (P 1)))
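
The parenthesized notation is regular enough to parse mechanically. The Python sketch below reads an access path expression into nested (mode, count, children) tuples; it handles syntax only and assigns no cost, and the tuple layout and function names are choices made for this illustration rather than anything defined in the paper.

```python
GUIDED_MODES = {"H", "P", "A", "S", "M"}
SEARCHED_MODES = {"O", "U"}


def tokenize(text):
    """Split an APE string into parentheses, mode letters, and counts."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def _parse(tokens, pos):
    """Parse one component starting at tokens[pos]; return (tree, next_pos)."""
    if pos + 2 >= len(tokens) or tokens[pos] != "(":
        raise ValueError("expected '(' followed by a mode and a count")
    mode = tokens[pos + 1]
    if mode not in GUIDED_MODES and mode not in SEARCHED_MODES:
        raise ValueError(f"unknown location mode: {mode}")
    count = int(tokens[pos + 2])
    pos += 3
    children = []
    while pos < len(tokens) and tokens[pos] == "(":
        child, pos = _parse(tokens, pos)
        children.append(child)
    if pos >= len(tokens) or tokens[pos] != ")":
        raise ValueError("missing ')'")
    return (mode, count, children), pos + 1


def parse_ape(text):
    """Parse a complete APE string into a nested tuple structure."""
    tokens = tokenize(text)
    tree, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected text after the expression")
    return tree


if __name__ == "__main__":
    print(parse_ape("(P 1 (P 3 (S 1) (P 1)))"))
```

The abbreviated multilist expression parses into a head component with one chain child, which in turn has two node components.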

Transform FPE to Access Cost

- Parse the APE and determine the access cost in terms of the random and the sequential access counts.
- The average access count for each component is estimated from its component-location mode (see above).
- The total access count for an APE = the sum over all its components, each multiplied by the corresponding value of count.

Ex: the APE (H 1 (P 28 (S 1) (P 1))) has a random access count of 1 + 28 (0+1) = 29.

Access-Time Calculations

Computing the time elapsed to access disk blocks requires modeling the characteristics of storage devices. Some of the criteria are:

- Type of media
- Fixed or moving heads
- R/w or write-once
- Seek time and transfer rate
- Number and size of cylinders, tracks and sectors
- Block size of DBMS vs page size of operating system.
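
A very simple device model shows how random and sequential access counts could be turned into elapsed time. The Python sketch below is a common textbook approximation, not the paper's device model; the parameter names and the sample values are hypothetical.

```python
def access_time_ms(random_accesses, sequential_accesses,
                   seek_ms, rotation_ms, transfer_ms):
    """Estimate elapsed I/O time in milliseconds.

    Assumptions (not from the paper): a random block access pays an average
    seek, half a rotation of latency, and one block transfer; a sequential
    block access pays only the block transfer.
    """
    random_cost = seek_ms + rotation_ms / 2 + transfer_ms
    return random_accesses * random_cost + sequential_accesses * transfer_ms


if __name__ == "__main__":
    # Hypothetical device: 10 ms seek, 8 ms per rotation, 1 ms per block.
    print(access_time_ms(29, 0, seek_ms=10, rotation_ms=8, transfer_ms=1))
```

With the hypothetical figures above, 29 random accesses cost 29 x 15 ms = 435 ms. The criteria in the list, such as fixed versus moving heads, would change which of these terms apply.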

Performance Analysis Summary

The steps are:

1. Examine the TQuel query to decide the processing strategy
2. Transform it into an algebraic expression
3. Break it down in terms of characteristics of the DB/relations
4. Transform into FPE, and then into APE
5. Analyze for characteristics of storage devices
6. Compute I/O costs
7. Select and execute a validation method