Temporal Databases S. Srinivasa Rao April 12, 2007 [Part 1 based on Ch23 of C.J. Date (slides by Prof. Ghafoor, EE 562)] [Part 2 based on slides by Prof. Arge, I/O- algorithms]

Dec 14, 2015


Todd Varcoe

Temporal Databases

S. Srinivasa Rao

April 12, 2007

[Part 1 based on Ch23 of C.J. Date (slides by Prof. Ghafoor, EE 562)]

[Part 2 based on slides by Prof. Arge, I/O-algorithms]


Outline

• Part 1: Introduction to temporal databases

• Part 2: Temporal index: Persistent B-tree and its applications


Introduction

• Temporal database: a database that contains historical data as well as current data.

– Note: ‘historical’ is a misleading term – temporal databases may contain data regarding the future as well as the past.

• Extreme case: data is only inserted into a temporal database and never deleted (e.g. vehicle position data in the ‘project’).

• So far, we have studied the other extreme - i.e. ‘snapshot’ databases.

• Distinguishing feature: the element of time.


Introduction

• Temporal data: encoded representation of timestamped facts.

– Each tuple must include at least one timestamp.

– Problem: What about queries whose results are not temporal, i.e. the result of the query lies outside the domain of a (purely temporal) database?

– e.g. Get the names of all people who have supplied something in the past.

• Redefine temporal database: database that includes, but is not limited to, temporal data.


Motivation

• Queries on time-varying data are difficult to express in SQL.

• Temporal databases provide built-in support for recording and querying such information.

• It is possible to use SQL to evaluate these queries, but performance is poor.


Motivation

• Most applications manage temporal data.

• If a temporal database is used for such data:

– Schemas, including integrity constraints, are simpler.

– Queries are simpler

• Application code is less complex

– easier to understand

– easier to produce

– easier to maintain


Applications

Most applications of database technology are temporal in nature:

• Financial apps.: portfolio management, accounting & banking, stock market analysis, audit analysis

• Record-keeping apps.: personnel, medical records, inventory management, legal records (commercial laws change frequently)

• Data Warehousing: historical trends for analysis

• Scheduling apps.: airline, car, hotel reservations and project management

• Scientific apps.: weather monitoring, chemical process monitoring


Intervals

• An interval [s,e] is a set of times from time s to time e.

– Does interval [s,e] represent an infinite set?

– Assumption: Timeline is a finite sequence of discrete, indivisible time quanta.

• Time quantum: the smallest unit of time the system can represent.

• Time point: a unit of time considered indivisible for our purposes.

• An interval is treated as a value of a single type, not as a pair of separate values.

• An interval can be open or closed at its start point and at its end point.

– e.g. [d04,d10], [d04,d11), (d03,d10] and (d03,d11) all represent the sequence of days from day 4 to day 10 inclusive.
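The equivalence of the four notations can be checked with a minimal sketch. It assumes days are modelled as integers (d04 as 4) and that one day is one time quantum; the helper name `to_closed` is illustrative and does not come from the slides.

```python
def to_closed(text):
    """Convert an interval such as '[4,11)' or '(3,10]' to a closed (start, end) pair."""
    left, right = text[0], text[-1]
    s, e = (int(part) for part in text[1:-1].split(","))
    if left == "(":   # open start: the first included point is the next quantum
        s += 1
    if right == ")":  # open end: the last included point is the previous quantum
        e -= 1
    return (s, e)

forms = ["[4,10]", "[4,11)", "(3,10]", "(3,11)"]
closed = {to_closed(f) for f in forms}
print(closed)
```

All four spellings normalise to the same closed pair, which is why the later sketches work with closed integer intervals only.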


Operators on Intervals

• Temporal predicate operators:

i1 = [s1,e1]; i2 = [s2,e2]

– i1 BEFORE i2

(e1<s2)

– i1 MEETS i2

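(s2 = e1 + 1 OR s1 = e2 + 1, i.e. one interval starts at the time point immediately after the other ends; with closed discrete intervals this adjacency is what the COLLAPSE examples rely on)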

– i1 EQUALS i2

(s1 = s2 AND e1 = e2)

– i1 OVERLAPS i2

(s1 ≤ e2 AND s2 ≤ e1, i.e. the two intervals share at least one time point)


Operators on Intervals

– i1 DURING i2

(s2 < s1 AND e2 > e1 )

– i1 STARTS i2

(s1 = s2 AND e1 < e2)

– i1 FINISHES i2

(e1 = e2 AND s1 > s2)

• Additional operators:

– i1 MERGES i2: (i1 MEETS i2 OR i1 OVERLAPS i2)

– i1 CONTAINS i2: (i2 DURING i1)
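The predicates can be written down directly for closed intervals over integer time points. This is a sketch under two labelled assumptions: MEETS is taken as adjacency (s2 = e1 + 1 or s1 = e2 + 1) and OVERLAPS as "share at least one point", which is what the COLLAPSE examples need; the `Interval` type and function names are illustrative.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    s: int  # start point (inclusive)
    e: int  # end point (inclusive)

def before(i1, i2):
    return i1.e < i2.s

def meets(i1, i2):
    # Assumption: adjacency in either order, for closed discrete intervals.
    return i2.s == i1.e + 1 or i1.s == i2.e + 1

def equals(i1, i2):
    return i1.s == i2.s and i1.e == i2.e

def overlaps(i1, i2):
    # Assumption: the intervals share at least one time point.
    return i1.s <= i2.e and i2.s <= i1.e

def during(i1, i2):
    return i2.s < i1.s and i2.e > i1.e

def starts(i1, i2):
    return i1.s == i2.s and i1.e < i2.e

def finishes(i1, i2):
    return i1.e == i2.e and i1.s > i2.s

def merges(i1, i2):
    return meets(i1, i2) or overlaps(i1, i2)

def contains(i1, i2):
    return during(i2, i1)

a, b = Interval(3, 4), Interval(5, 5)
print(meets(a, b), overlaps(a, b), merges(a, b))
```

Modelling an interval as one value (here a named tuple) rather than two loose columns follows the point made earlier that an interval is a single type.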


Scalar and Relational Operators

• DURATION(i): returns the number of time points in i

– e.g. DURATION([d03,d07]) returns 5

• i1 UNION i2

– returns [MIN(s1,s2), MAX(e1,e2)] if (i1 MERGES i2), otherwise undefined

• i1 INTERSECT i2

– returns [MAX(s1,s2), MIN(e1,e2)] if (i1 OVERLAPS i2), otherwise undefined
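A minimal sketch of these three operators on closed integer intervals, represented as `(s, e)` tuples. Returning `None` for the "undefined" case is a choice made here, and MERGES is assumed to mean "overlapping or adjacent"; neither detail is specified on the slide.

```python
def duration(i):
    """Number of time points in the closed interval i = (s, e)."""
    s, e = i
    return e - s + 1

def merges(i1, i2):
    # Assumption: true when the intervals overlap or are adjacent.
    return i1[0] <= i2[1] + 1 and i2[0] <= i1[1] + 1

def union(i1, i2):
    """[MIN(s1,s2), MAX(e1,e2)] if i1 MERGES i2, otherwise None (undefined)."""
    if not merges(i1, i2):
        return None
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def intersect(i1, i2):
    """[MAX(s1,s2), MIN(e1,e2)] if the intervals share a point, otherwise None."""
    lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (lo, hi) if lo <= hi else None

print(duration((3, 7)))
print(union((3, 4), (5, 6)), union((1, 1), (3, 5)))
print(intersect((3, 5), (4, 6)), intersect((3, 4), (5, 6)))
```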


Aggregate Operators

• EXPAND(X):

X is a set of intervals; the output is also a set of intervals.

Used to generate time-quantum (unit) intervals.

– The expanded form of X is the set of all intervals of the form [p,p] where p is a time point in some interval in X.

• e.g.:

– X1 = { [d01,d01],[d03,d05],[d04,d06] }

– X2 = { [d01,d01],[d03,d04],[d05,d05],[d05,d06] }

– X3 = { [d01,d01],[d03,d03],[d04,d04],[d05,d05],[d06,d06] }

– Then EXPAND(X1) = EXPAND(X2) = X3
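The example can be reproduced with a short sketch, assuming days are integers and intervals are closed `(s, e)` tuples; the function name `expand` simply mirrors the operator.

```python
def expand(intervals):
    """Return the set of unit intervals (p, p) for every point p covered by the input."""
    return {(p, p) for s, e in intervals for p in range(s, e + 1)}

X1 = {(1, 1), (3, 5), (4, 6)}
X2 = {(1, 1), (3, 4), (5, 5), (5, 6)}
X3 = {(1, 1), (3, 3), (4, 4), (5, 5), (6, 6)}

print(expand(X1) == expand(X2) == X3)
```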


Aggregate Operators

• COLLAPSE(X):

The collapsed form of X is the set Y of intervals of the same type such that

– (a) X and Y have the same expanded (unfolded) form.

– (b) no two distinct members i1 and i2 of Y are such that (i1 MERGES i2) is true.

• e.g.:

– X1 = { [d01,d01],[d03,d05],[d04,d06] }

– X2 = { [d01,d01],[d03,d04],[d05,d05],[d05,d06] }

– X3 = { [d01,d01],[d03,d06] }

– Then COLLAPSE (X1) = COLLAPSE (X2) = X3
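One way to compute the collapsed form is a single pass over the intervals sorted by start point, merging whenever the next interval overlaps or is adjacent to the last one kept. This is a sketch for closed integer intervals, not necessarily how a temporal DBMS implements it.

```python
def collapse(intervals):
    """Merge overlapping or adjacent closed integer intervals into maximal ones."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            # Overlaps or is adjacent to the last interval kept: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return set(merged)

X1 = {(1, 1), (3, 5), (4, 6)}
X2 = {(1, 1), (3, 4), (5, 5), (5, 6)}
X3 = {(1, 1), (3, 6)}

print(collapse(X1) == collapse(X2) == X3)
```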


Relation Operators Involving Intervals

• PACK r ON A: groups the relation r by all its attributes apart from A and collapses the A values within each group.

This is equivalent to

WITH ( r GROUP {A} AS X ) AS R1,
( EXTEND R1 ADD COLLAPSE (X) AS Y ) {ALL BUT X} AS R2 :
R2 UNGROUP Y

• UNPACK r ON A: replace COLLAPSE with EXPAND in the definition of PACK.
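The group, collapse (or expand), ungroup pattern can be sketched in Python. The representation is an assumption made for illustration: a relation is a set of tuples whose last component is the closed integer interval being packed, and `regroup` is a hypothetical helper standing in for GROUP / EXTEND / UNGROUP.

```python
from collections import defaultdict

def collapse(intervals):
    """Merge overlapping or adjacent closed integer intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def expand(intervals):
    """Return the unit intervals (p, p) covered by the given intervals."""
    return sorted({(p, p) for s, e in intervals for p in range(s, e + 1)})

def regroup(rel, transform):
    """Group rows by every attribute except the trailing interval, transform, ungroup."""
    groups = defaultdict(list)
    for *key, interval in rel:
        groups[tuple(key)].append(interval)
    return {(*key, iv) for key, ivs in groups.items() for iv in transform(ivs)}

def pack(rel):
    return regroup(rel, collapse)

def unpack(rel):
    return regroup(rel, expand)

r = {("S2", (2, 4)), ("S2", (3, 3)), ("S2", (8, 10)), ("S3", (8, 9))}
print(sorted(pack(r)))
print(sorted(unpack({("S3", (8, 9))})))
```

Packing an unpacked relation gives the same result as packing the relation directly, which is the property the USING shorthand relies on.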


Example

Given two temporal relations:

S: Supplier S# was under contract during the interval During

SP: Supplier S# was able to supply part P# during the interval During

SP

S# P# During
S1 P1 [d04,d10]
S1 P7 [d05,d10]
S1 P3 [d09,d10]
S1 P5 [d06,d10]
S2 P1 [d02,d04]
S2 P9 [d03,d03]
S2 P1 [d08,d10]
S2 P5 [d09,d10]
S3 P1 [d08,d10]
S4 P2 [d06,d09]
S4 P5 [d04,d08]
S4 P7 [d05,d10]

S

S# During
S1 [d04,d10]
S2 [d02,d04]
S2 [d07,d10]
S3 [d03,d10]
S4 [d04,d10]
S5 [d02,d10]


Example 1

• Active supplier intervals: Get S#-DURING pairs for suppliers who have been able to supply at least one part during at least one interval of time, where DURING designates such an interval.

• PACK SP {S#,DURING} ON DURING

RESULT (computed from the relation SP given in the Example)

S# During
S1 [d04,d10]
S2 [d02,d04]
S2 [d08,d10]
S3 [d08,d10]
S4 [d04,d10]


Example 2

• Inactive (passive) supplier intervals: Get S#-DURING pairs for suppliers who have been unable to supply any parts at all during at least one interval of time, where DURING designates such an interval.

• PACK

( ( UNPACK S {S#,DURING} ON DURING )

MINUS

( UNPACK SP {S#,DURING} ON DURING ) )

ON DURING

• Shorthand: U_MINUS

RESULT

S# During
S2 [d07,d07]
S3 [d03,d07]
S5 [d02,d10]
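Both example queries can be reproduced with a small in-memory sketch. It assumes days are integers (d04 as 4) and stores rows as plain tuples; `unpack` and `pack` here are simplified stand-ins for the operators, specialised to (S#, During) rows.

```python
from collections import defaultdict

S = [("S1", 4, 10), ("S2", 2, 4), ("S2", 7, 10),
     ("S3", 3, 10), ("S4", 4, 10), ("S5", 2, 10)]
SP = [("S1", "P1", 4, 10), ("S1", "P7", 5, 10), ("S1", "P3", 9, 10),
      ("S1", "P5", 6, 10), ("S2", "P1", 2, 4), ("S2", "P9", 3, 3),
      ("S2", "P1", 8, 10), ("S2", "P5", 9, 10), ("S3", "P1", 8, 10),
      ("S4", "P2", 6, 9), ("S4", "P5", 4, 8), ("S4", "P7", 5, 10)]

def unpack(rows):
    """Turn (supplier, start, end) rows into a set of (supplier, day) points."""
    return {(s, day) for s, lo, hi in rows for day in range(lo, hi + 1)}

def pack(points):
    """Turn (supplier, day) points back into maximal (supplier, start, end) rows."""
    days_by_supplier = defaultdict(list)
    for s, day in points:
        days_by_supplier[s].append(day)
    rows = []
    for s in sorted(days_by_supplier):
        days = sorted(days_by_supplier[s])
        start = prev = days[0]
        for day in days[1:]:
            if day != prev + 1:          # gap: close the current run
                rows.append((s, start, prev))
                start = day
            prev = day
        rows.append((s, start, prev))
    return rows

sp_projected = [(s, lo, hi) for s, _part, lo, hi in SP]   # SP {S#, DURING}
active = pack(unpack(sp_projected))                       # Example 1
inactive = pack(unpack(S) - unpack(sp_projected))         # Example 2 (U_MINUS)
print(active)
print(inactive)
```

Working on individual days makes MINUS an ordinary set difference; the final pack step restores maximal intervals.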


More Relational Operators

• USING ( AList ) ◄ r1 op r2 ► is shorthand for:

PACK
( ( UNPACK r1 ON (AList) ) op ( UNPACK r2 ON (AList) ) )
ON (AList)

where op is UNION, INTERSECT, MINUS or JOIN.

• Various comparison operators on relations are defined similarly:

USING ( AList ) ◄ r1 rel-op r2 ► is equivalent to

( ( UNPACK r1 ON (AList) ) rel-op ( UNPACK r2 ON (AList) ) )


Part 2

Persistent B-trees and applications


Persistent B-tree

• In some applications we are interested in being able to access previous versions of data structure

– Databases

– Geometric data structures

• Partial persistence:

– Update the current version (getting a new version)

– Query all versions

• We would like to have partial persistent B-tree with

– $O(N/B)$ space, where $N$ is the number of updates performed

– $O(\log_B N)$ update

– $O(\log_B N + T/B)$ query in any version


Persistent B-tree

• Easy way to make B-tree partial persistent

– Copy structure at each operation

– Maintain “version-access” structure (B-tree)

• Good $O(\log_B N + T/B)$ query in any version, but

– $O(N/B)$ I/O update

– $O(N^2/B)$ space

[Figure: versions i, i+1 and i+2 indexed by the version-access structure; an update adds a full copy as version i+3.]
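The copy-per-update idea can be illustrated in memory. This is only an analogue, not an external-memory structure: a sorted tuple stands in for each B-tree copy, a sorted list of update times stands in for the version-access structure, and update times are assumed to be strictly increasing. The class name is hypothetical.

```python
import bisect

class CopyPersistentSet:
    """Keeps one full sorted snapshot per update (the 'copy at each operation' idea)."""

    def __init__(self):
        self.times = []      # update times, assumed strictly increasing
        self.snapshots = []  # snapshots[i] is the sorted key tuple valid from times[i]

    def _update(self, t, keys):
        self.times.append(t)
        self.snapshots.append(tuple(sorted(keys)))  # full copy: linear cost per update

    def insert(self, t, key):
        current = set(self.snapshots[-1]) if self.snapshots else set()
        self._update(t, current | {key})

    def delete(self, t, key):
        current = set(self.snapshots[-1]) if self.snapshots else set()
        self._update(t, current - {key})

    def range_query(self, t, lo, hi):
        """Report the keys in [lo, hi] as of time t."""
        i = bisect.bisect_right(self.times, t) - 1  # version-access lookup
        if i < 0:
            return []
        snap = self.snapshots[i]
        return list(snap[bisect.bisect_left(snap, lo):bisect.bisect_right(snap, hi)])

ps = CopyPersistentSet()
ps.insert(1, 5)
ps.insert(2, 8)
ps.delete(3, 5)
print(ps.range_query(2, 0, 10), ps.range_query(3, 0, 10))
print(sum(len(s) for s in ps.snapshots))  # total stored keys grows quadratically in general
```

Queries on any version stay cheap, but every update copies the whole set, which is the quadratic space problem the existence-interval design avoids.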


Persistent B-tree

• Idea: Elements augmented with “existence interval” and stored in one structure

• Persistent B-tree with parameter b:

– Directed graph

* Nodes contain elements augmented with existence interval

* At any time t, nodes with elements alive at time t form B-tree with leaf and branching parameter b (i.e., each node/leaf has at least b/4 and at most b children/keys in them)

– B-tree with leaf and branching parameter b on indegree 0 nodes

• If b = B: query at any time t in $O(\log_B N + T/B)$ I/Os


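Persistent B-tree: Updates

• Updates performed as in B-tree

• To obtain linear space we maintain new-node invariant:

– New node contains between $\frac{3}{8}B$ and $\frac{7}{8}B$ alive elements and no dead elements

[Figure: number line marking $\frac{1}{4}B$, $\frac{3}{8}B$, $\frac{7}{8}B$ and $B$; the gaps between consecutive marks are $\frac{1}{8}B$, $\frac{1}{2}B$ and $\frac{1}{8}B$.]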


Persistent B-tree Insert

• Search for relevant leaf u and insert new element

• If u contains B+1 elements: Block overflow

– Version split: Mark u dead and create new node u’ with x alive elements

– If $x > \frac{7}{8}B$: Strong overflow

– If $x < \frac{3}{8}B$: Strong underflow

– If $\frac{3}{8}B \le x \le \frac{7}{8}B$ then recursively update parent(u): Delete (persistently) reference to u and insert reference to u’


Persistent B-tree Insert

• Strong overflow ($x > \frac{7}{8}B$)

– Split u into u’ and u’’ with $\frac{x}{2}$ elements each ($\frac{3}{8}B < \frac{x}{2} \le \frac{1}{2}B$)

– Recursively update parent(u): Delete reference to u and insert references to u’ and u’’

• Strong underflow ($x < \frac{3}{8}B$)

– Merge x elements with y live elements obtained by version split on sibling ($\frac{1}{2}B \le x + y \le \frac{11}{8}B$)

– If $x + y > \frac{7}{8}B$ then (strong overflow) perform split into nodes with (x+y)/2 elements each ($\frac{7}{16}B < \frac{x+y}{2} \le \frac{11}{16}B$)

– Recursively update parent(u): Delete two references, insert one or two references


Persistent B-tree Delete

• Search for relevant leaf u and mark element dead

• If u contains $x < \frac{1}{4}B$ alive elements: Block underflow

– Version split: Mark u dead and create new node u’ with x alive elements

– Strong underflow ($x < \frac{3}{8}B$): Merge (version split) and possibly split (strong overflow)

– Recursively update parent(u): Delete two references, insert one or two references


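Persistent B-tree

[Figure: flowchart of the update cases. Insert: either done, or block overflow followed by a version split, which ends in done, strong overflow (split) or strong underflow (merge, possibly followed by strong overflow and a split). Delete: either done, or block underflow followed by a version split and strong underflow (merge, possibly followed by a split). The edge labels 0,0; -1,+1; -1,+2; -2,+1; -2,+2 appear to give the number of references deleted from and inserted into the parent in each case.]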


Persistent B-tree Analysis

• Update: $O(\log_B N)$

– Search and “rebalance” on one root-leaf path

• Space: $O(N/B)$

– At least $\frac{1}{8}B$ updates in leaf in existence interval

– When leaf u dies

* At most two other nodes are created

* At most one block over/underflow one level up (in parent(u))

– During N updates we create:

* $O(N/B)$ leaves

* $O(N/B^i)$ nodes i levels up

⇒ $\sum_i O(N/B^i) = O(N/B)$ blocks


Summary/Conclusion: Persistent B-tree

• Persistent B-tree

– Update current version

– Query all versions

• Efficient implementation obtained using existence intervals

– Standard technique

• During N operations

– $O(N/B)$ space

– $O(\log_B N)$ update

– $O(\log_B N + T/B)$ query


Interval Management

• Problem:

– Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently

• As in (one-dimensional) B-tree case we are interested in

– $O(N/B)$ space

– $O(\log_B N)$ update

– $O(\log_B N + T/B)$ query


Interval Management: Static Solution

• Sweep from left to right maintaining persistent B-tree

– Insert interval when left endpoint is reached

– Delete interval when right endpoint is reached

• Query x answered by reporting all intervals in B-tree at “time” x

– $O(N/B)$ space

– $O(\log_B N + T/B)$ query

– $O(\frac{N}{B} \log_B N)$ construction using buffer technique

• Dynamic with $O(\log_B^2 N)$ insert bound using logarithmic method
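The sweep semantics can be illustrated with a small in-memory sketch. It replays the insert/delete events up to "time" x instead of querying a persistent B-tree, so it has none of the I/O bounds listed above; it only shows which intervals are alive at x. Intervals are assumed to be closed pairs of numbers, and the function names are illustrative.

```python
INSERT, DELETE = 0, 1  # at equal times, inserts are replayed before deletes

def sweep_events(intervals):
    """One insert event per left endpoint and one delete event per right endpoint."""
    events = []
    for left, right in intervals:
        events.append((left, INSERT, (left, right)))
        events.append((right, DELETE, (left, right)))
    return sorted(events)

def stab(events, x):
    """Intervals alive at 'time' x, found by replaying the sweep up to x."""
    alive = set()
    for t, kind, interval in events:
        if t > x:
            break
        if kind == INSERT:
            alive.add(interval)
        elif t < x:  # a closed interval ending exactly at x still contains x
            alive.discard(interval)
    return alive

intervals = [(1, 5), (2, 9), (6, 8), (10, 12)]
events = sweep_events(intervals)
print(stab(events, 7))
```

A persistent B-tree built over the same event sequence answers the query by looking at the version for time x directly rather than replaying the events.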


Internal Memory Logarithmic Method Idea

• Given (semi-dynamic) structure D on set V

– $O(\log N)$ query, $O(\log N)$ delete, $O(N \log N)$ construction

• Logarithmic method:

– Partition V into subsets $V_0, V_1, \dots, V_{\log N}$, with $|V_i| = 2^i$ or $|V_i| = 0$

– Build $D_i$ on $V_i$

* Delete: $O(\log N)$

* Query: Query each $D_i$ ⇒ $O(\log^2 N)$

* Insert: Find first empty $D_i$ and construct $D_i$ out of the $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements in $V_0, V_1, \dots, V_{i-1}$ (plus the new element)

– $O(2^i \log 2^i)$ construction ⇒ $O(\log N)$ per moved element

– Element moved $O(\log N)$ times ⇒ $O(\log^2 N)$ amortized

[Figure: subsets of sizes $2^0, 2^1, 2^2, \dots, 2^{\log N}$.]
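The insert rule behaves like incrementing a binary counter. The sketch below uses a sorted Python list as the static structure $D_i$ and membership testing as the query; both are stand-ins chosen for illustration, and the class name is hypothetical. Deletion is left out.

```python
import bisect

class LogarithmicSet:
    """Insert-only membership structure built from static sorted arrays."""

    def __init__(self):
        self.levels = []  # levels[i] is empty or holds exactly 2**i sorted keys

    def insert(self, key):
        carry = [key]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            if not self.levels[i]:
                # First empty level: rebuild it from the new key plus levels 0..i-1.
                self.levels[i] = sorted(carry)
                return
            carry += self.levels[i]
            self.levels[i] = []
            i += 1

    def contains(self, key):
        # Query every non-empty level with a binary search.
        for level in self.levels:
            j = bisect.bisect_left(level, key)
            if j < len(level) and level[j] == key:
                return True
        return False

ls = LogarithmicSet()
for k in range(13):
    ls.insert(k)
print([len(level) for level in ls.levels])
print(ls.contains(7), ls.contains(13))
```

After n inserts the non-empty levels correspond to the 1 bits in the binary representation of n, which is why each element is rebuilt only a logarithmic number of times.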


External Logarithmic Method Idea

• Decrease number of subsets $V_i$ to $\log_B N$ to get $O(\log_B^2 N)$ query

• Problem: Since $1 + \sum_{j=0}^{i-1} B^j < B^i$ there are not enough elements in $V_0, V_1, \dots, V_{i-1}$ to build $V_i$

• Solution: We allow $V_i$ to contain any number of elements $\le B^i$

– Insert: Find first $D_i$ such that $\sum_{j=0}^{i} |V_j| < B^i$ and construct new $D_i$ from elements in $V_0, V_1, \dots, V_i$

* We move $\sum_{j=0}^{i-1} |V_j| \ge B^{i-1}$ elements

* If $D_i$ constructed in $O((|V_i|/B) \log_B |V_i|) = O(B^{i-1} \log_B N)$ I/Os every moved element charged $O(\log_B N)$ I/Os

* Element moved $O(\log_B N)$ times ⇒ $O(\log_B^2 N)$ amortized

[Figure: subsets with capacities $B^0, B^1, B^2, \dots, B^{\log_B N}$.]


External Logarithmic Method Idea

• Given (semi-dynamic) linear space external data structure with

– $O(\log_B N + T/B)$ I/O query

– $O(\frac{N}{B} \log_B N)$ I/O construction

(– $O(\log_B N)$ I/O delete)

• Linear space dynamic data structure with

– $O(\log_B^2 N + T/B)$ I/O query

– $O(\log_B^2 N)$ I/O insert amortized

(– $O(\log_B N)$ I/O delete)

• Dynamic interval management

– $O(\log_B^2 N + T/B)$ I/O query

– $O(\log_B^2 N)$ I/O insert amortized


Planar Point Location

• Static problem:

– Store planar subdivision with N segments on disk such that region containing query point q can be found I/O-efficiently

• We concentrate on vertical ray shooting query

– Each segment can store the regions it bounds

– Segments do not have to form a subdivision

• Dynamic problem:

– Insert/delete segments (we will not discuss this)


Static Solution

• Vertical line imposes above-below order on intersected segments

• Sweep from left to right maintaining persistent B-tree on above-below order

– Left endpoint: Insert segment

– Right endpoint: Delete segment

• Query q answered by successor query on B-tree at time $q_x$

– $O(N/B)$ space

– $O(\log_B N + T/B)$ query


Static Solution

• Note: Not all segments comparable!

– Have to be careful about what we compare

• Problem: Routing elements in internal nodes of leaf oriented B-trees

– Luckily we can modify persistent B-tree to use regular (live) elements as routing elements

• However, buffer technique construction cannot be used

• Only $O(N \log_B N)$ I/O construction algorithm

• Cannot be made dynamic using logarithmic method


References

• External Memory Geometric Data Structures

Lecture notes by Lars Arge.

– Section 1-4

• I/O-efficient Point Location using Persistent B-trees

– Lars Arge, Andrew Danner and Sha-Mayn Teh