1 Incremental Validation of XML Databases Yannis Papakonstantinou Victor Vianu Computer Science & Eng, UCSD

Mar 31, 2015


Marisa Vine
Page 1:

Incremental Validation of XML Databases

Yannis Papakonstantinou

Victor Vianu

Computer Science & Eng, UCSD

Page 2:

Incremental Validation of XML Databases:

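[Figure: an XML database of n nodes, typed by a Document Type Definition (DTD) or by an XML Schema / XQuery type, receives updates. The system revalidates an update in $O(\log n)$ for DTDs and in $O(\log^2 n)$ for XML Schema / XQuery types.]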

Page 3:

XML As Labeled Ordered Trees

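[Figure: the example document drawn as a labeled ordered tree. The root cars has children used and new; each of these holds car nodes, whose children are year and model nodes (one car has no year). Leaf values shown: 92, Civic, 96, Acura, Civic, Maxima, 03.]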

Page 4:

Document Type Definitions (DTDs): Abstraction & Example

[Figure: the cars tree, shown next to the DTD below.]

root : cars
cars → used new
used → car*
new → car*
car → (year | ε) model
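As a rough illustration of what it means for the tree to satisfy this DTD, the sketch below checks every node's sequence of child labels against the content model of its label. The tuple encoding of trees, the names `DTD`, `satisfies`, and `valid`, and the choice to ignore text values are assumptions made for this example; they are not from the slides.

```python
import re

# Content models from the slide, written as regular expressions over
# space-terminated child labels. year and model are treated as leaves
# (their text values are ignored); that is an assumption of this sketch.
DTD = {
    "cars": r"used new ",
    "used": r"(car )*",
    "new": r"(car )*",
    "car": r"(year )?model ",
    "year": r"",
    "model": r"",
}
ROOT = "cars"

def satisfies(tree, dtd=DTD):
    """tree is (label, [subtrees]); check each node against its content model."""
    label, children = tree
    if label not in dtd:
        return False
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(dtd[label], word) is None:
        return False
    return all(satisfies(child, dtd) for child in children)

def valid(tree, dtd=DTD, root=ROOT):
    """The root must carry the root label and every node must match."""
    return tree[0] == root and satisfies(tree, dtd)

car_full = ("car", [("year", []), ("model", [])])
car_new = ("car", [("model", [])])
doc = ("cars", [("used", [car_full, car_full]), ("new", [car_new, car_full])])
```

This revalidates the whole tree from scratch, which is the baseline that incremental validation aims to avoid.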

Page 5:

Tree Satisfying DTD, General Case

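[Figure: a node with children 1, 2, …, i-1, i, i+1, …, k-1, k (the label symbols were lost in extraction). The tree satisfies the DTD when the root carries the root label and, at every node, the sequence of child labels matches the regular expression r that the DTD assigns to the node's label.]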

Page 6:

XML Schemas/XQuery Types as Specialized DTDs

[Figure: the cars tree with each node annotated by its types: carsT, usedT, newT, carU and/or carN, yearT, modelT.]

root : carsT
carsT → usedT newT
usedT → carU*
newT → carN*
carU → yearT modelT
carN → (yearT | ε) modelT

LABEL: TYPES
car: {carU, carN}
cars: {carsT}
used: {usedT}
…
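To make the role of types concrete, here is a minimal bottom-up sketch that computes, for each node, the set of types it can take under the rules above. The tuple encoding, the names `TYPES`, `types`, and `valid`, and the brute-force enumeration of child type choices are assumptions of this example; the enumeration is exponential in the worst case and is not the automaton-based method the slides develop.

```python
import re
from itertools import product

# type -> (label it specializes, content model over space-terminated type names)
# yearT and modelT are treated as leaf types; that is an assumption here.
TYPES = {
    "carsT": ("cars", r"usedT newT "),
    "usedT": ("used", r"(carU )*"),
    "newT": ("new", r"(carN )*"),
    "carU": ("car", r"yearT modelT "),
    "carN": ("car", r"(yearT )?modelT "),
    "yearT": ("year", r""),
    "modelT": ("model", r""),
}
ROOT_TYPE = "carsT"

def types(tree):
    """tree is (label, [subtrees]); return the set of types the node can take."""
    label, children = tree
    child_types = [types(c) for c in children]
    result = set()
    for t, (lab, model) in TYPES.items():
        if lab != label:
            continue
        # Try every way of picking one type per child.
        for choice in product(*child_types):
            word = "".join(c + " " for c in choice)
            if re.fullmatch(model, word):
                result.add(t)
                break
    return result

def valid(tree):
    return ROOT_TYPE in types(tree)

car_full = ("car", [("year", []), ("model", [])])
car_new = ("car", [("model", [])])
```

A car with both year and model can be typed carU or carN, while a car with only a model can only be carN, so it is accepted under new but not under used.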

Page 7:

Tree Automata Specialized DTDs

[Figure: the cars tree with each node annotated by the set of types it can take: {carU, carN} or {carN} for car nodes, and carsT, usedT, newT, yearT, modelT for the remaining nodes.]

Page 8:

Incremental Validation Problem Statement

For each valid tree T, maintain an auxiliary structure A(T) so that, given a series of update commands, we can

• efficiently decide whether the updated tree T' is valid

• efficiently update A(T) and T

Page 9:

Types of Updates: Node Renaming u(v, σ)

[Figure: node v, the i-th of the k children of its parent, is given the new label σ; its siblings and its own children stay in place.]

Page 10:

Types of Updates: Deletion d(v)

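[Figure: deletion d(v) of the i-th child v of a node; the remaining children 1, …, i-1, i+1, …, k must still match the parent's regular expression r.]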

Page 11:

Types of Updates: Insertion

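[Figure: insert_after(v_{i-1}, σ_i) adds a new node as the i-th child, between v_{i-1} and v_{i+1}; the resulting sequence of children must match the parent's regular expression r.]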

Page 12:

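Validating a Renaming u(i, σ) on a Regular String of N : Take One

[Figure: a string of positions 1, 2, …, n run through the automaton N. Pre(i-1) is the set of states reachable from q0 by reading positions 1, …, i-1. Post(i+1) is the set of states from which qF is reachable by reading positions i+1, …, n.]

Validation of one update in O(1) given precomputed Pre and Post

u(i, σ) requires recomputation of Pre(i), Pre(i+1), … and of Post(i), Post(i-1), …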
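The "take one" idea can be sketched as follows, assuming the NFA is given as a dictionary `delta` from (state, symbol) to a set of states; that representation and the function names are assumptions of this example. Checking a renaming touches only Pre(i-1) and Post(i+1), so its cost does not depend on n, but the sets after position i (and before it, for Post) would then have to be rebuilt.

```python
def precompute(delta, q0, finals, word):
    """pre[i]: states reachable from q0 on positions 1..i.
    post[i]: states from which a final state is reachable on positions i..n.
    Positions are 1-based; pre[0] and post[n + 1] are the boundary cases."""
    n = len(word)
    pre = [set() for _ in range(n + 2)]
    post = [set() for _ in range(n + 2)]
    pre[0] = {q0}
    for i in range(1, n + 1):
        sym = word[i - 1]
        pre[i] = {q for p in pre[i - 1] for q in delta.get((p, sym), ())}
    post[n + 1] = set(finals)
    for i in range(n, 0, -1):
        sym = word[i - 1]
        post[i] = {p for (p, a), targets in delta.items()
                   if a == sym and targets & post[i + 1]}
    return pre, post

def renaming_is_valid(delta, pre, post, i, sym):
    """Would the string still be accepted if position i were relabeled sym?"""
    return any(delta.get((p, sym), set()) & post[i + 1] for p in pre[i - 1])

# Hypothetical NFA for a (b | c)*: state 0 is initial, state 1 is final.
DELTA = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {1}}
pre, post = precompute(DELTA, 0, {1}, "abb")
```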

Page 13:

Transition Relation Definition

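[Figure: the string of positions 1, 2, …, n with a segment i … j split at position m into i … m and m+1 … j.]

$T_{i,j} = \{\, (q, q') \mid q \xrightarrow{\sigma_i \sigma_{i+1} \cdots \sigma_j} q' \,\}$

$T_{i,j} = T_{i,m} \circ T_{m+1,j}$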
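A small sketch of the definition and of the composition rule, with relations stored as sets of state pairs. The `delta` dictionary representation of the NFA and the function names are assumptions of this example.

```python
def relation(delta, sym):
    """T_{i,i} for a position holding sym: all (q, q') with q' in delta(q, sym)."""
    return frozenset((p, q) for (p, a), targets in delta.items()
                     if a == sym for q in targets)

def compose(r1, r2):
    """Relational composition: (p, q) whenever r1 takes p to some state
    that r2 takes to q."""
    return frozenset((p, q) for (p, m1) in r1 for (m2, q) in r2 if m1 == m2)

def span(delta, word, i, j):
    """T_{i,j} computed by splitting at a midpoint m (1-based, inclusive)."""
    if i == j:
        return relation(delta, word[i - 1])
    m = (i + j) // 2
    return compose(span(delta, word, i, m), span(delta, word, m + 1, j))

# Hypothetical NFA for (a b)*: state 0 is both initial and final.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
```

Because composition is associative, any split point m yields the same relation, which is what allows the relations to be arranged in a balanced tree.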

Page 14:

Transition Relation Trees

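[Figure: a balanced binary tree of transition relations over string positions 1 … 8.]

Root: $T_{1,8}$
Level 2: $T_{1,4}$, $T_{5,8}$
Level 3: $T_{1,2}$, $T_{3,4}$, $T_{5,6}$, $T_{7,8}$
Leaves: $T_{1,1}$, $T_{2,2}$, $T_{3,3}$, $T_{4,4}$, $T_{5,5}$, $T_{6,6}$, $T_{7,7}$, $T_{8,8}$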

Page 15:

Maintenance of the Structure and Validation in O(log n)

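[Figure: the transition relation tree over positions 1 … 8, with the path from leaf 6 to the root highlighted.]

u(6, σ) recomputes only the relations on the path from the leaf to the root: $T_{6,6}$, $T_{5,6}$, $T_{5,8}$, $T_{1,8}$

If $(q_0, q_F) \in T_{1,8}$ then valid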
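The maintenance step can be sketched with an array-backed balanced tree in which a renaming recomputes only the leaf-to-root path, so the number of compositions is logarithmic in n. The class name `TransitionTree`, the array layout, and the `delta` dictionary representation of the NFA are assumptions of this example; insertions and deletions are not handled here.

```python
def relation(delta, sym):
    """Single-position transition relation for sym."""
    return frozenset((p, q) for (p, a), targets in delta.items()
                     if a == sym for q in targets)

def compose(r1, r2):
    """Relational composition; None marks an unused padding slot."""
    if r1 is None:
        return r2
    if r2 is None:
        return r1
    return frozenset((p, q) for (p, m1) in r1 for (m2, q) in r2 if m1 == m2)

class TransitionTree:
    def __init__(self, delta, word):
        self.delta = delta
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        # node[1] is the root; node[size + i] is the leaf of position i + 1.
        self.node = [None] * (2 * self.size)
        for i, sym in enumerate(word):
            self.node[self.size + i] = relation(delta, sym)
        for k in range(self.size - 1, 0, -1):
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])

    def rename(self, i, sym):
        """Relabel 1-based position i and refresh its ancestors only."""
        k = self.size + i - 1
        self.node[k] = relation(self.delta, sym)
        k //= 2
        while k >= 1:
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])
            k //= 2

    def accepts(self, q0, finals):
        root = self.node[1]
        if root is None:          # empty string
            return q0 in finals
        return any((q0, f) in root for f in finals)

# Hypothetical NFA for (a b)*: state 0 is both initial and final.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
tree = TransitionTree(DELTA, "abababab")
```

In use, a caller would apply `rename`, test `accepts`, and undo the renaming if the result is invalid.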

Page 16:

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

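[Figure: a 2-3 tree whose leaves are the string positions 1, 2, 3, 5, 6, 7, 9, carrying the transition relations T1, T2, T3, T5, T6, T7, T9 and grouped under the internal nodes Ta, Tb, Tc.]

$T_a = T_1 \circ T_2$

If $(q_0, q_F) \in T_a \circ T_b \circ T_c$ then valid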

Page 17:

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

[Figure: position 8 is inserted. The leaves are now T1, T2, T3, T5, T6, T7, T8, T9, still grouped under the internal nodes Ta, Tb, Tc.]

Page 18:

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

[Figure: position 4 is about to be inserted. Ta holds T1, T2; Tb holds T3, T5, T6; Tc holds T7, T8, T9.]

Page 19:

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

[Figure: after inserting position 4, the middle node holds four leaves T3, T4, T5, T6 and overflows; Ta still holds T1, T2 and Tc holds T7, T8, T9.]

Page 20:

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

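[Figure: the overflowing node over T3, T4, T5, T6 is split into Td and Te, giving the sequence Ta, Td, Te, Tc at that level; these are in turn grouped under two new nodes Tf and Tg.]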

Page 21:

Auxiliary Structures for Incremental DTD Validation

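[Figure: each node keeps a transition relation tree over the labels of its children, built for the regular expression r of its own label. The renaming u(v_i, σ) changes position i in the string of v_i's parent and changes the regular expression that the children of v_i must match.]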

Page 22:

Specialized DTD Incremental Validation: Take One

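[Figure: a node with children labeled a_1, …, a_{i-1}, a_i, a_{i+1}, …, a_k, where the child v_i has children b_1, …, b_{k-1}, b_k. Each node v stores types(v), the set of types it can take, e.g. types(v_i) = {τ_{i,1}, …, τ_{i,n}}. After the renaming u(v_i, σ), types(·) is recomputed at v_i and at the nodes above it.]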

Page 23:

Inefficient for Deep Trees: Apply Divide-And-Conquer in Vertical Direction

Turn the specialized DTD into an NFA that validates a vertical line

“Fuse” the vertical and horizontal directions using a binary tree, and split the work in both

Page 24:

Tree Satisfying Specialized DTD transformed into Binary Tree Accepted By Tree Automaton

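[Figure: an unranked tree with the eleven nodes a through k (left) and its binary-tree encoding (right), in which every missing child is filled with a # leaf, twelve in total.]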
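The slide does not spell out the encoding. A common choice, assumed here, is the first-child / next-sibling encoding: the left child of a node is its first child, the right child is its next sibling, and # stands for an absent one. It is consistent with the figure's counts, since n nodes always yield n + 1 # leaves. The function names and tuple layout are hypothetical.

```python
def encode_forest(trees):
    """Encode a list of sibling trees (label, [subtrees]) as one binary tree
    (label, left, right); '#' marks a missing child or sibling."""
    if not trees:
        return "#"
    (label, children), rest = trees[0], trees[1:]
    return (label, encode_forest(children), encode_forest(rest))

def encode(tree):
    return encode_forest([tree])

def count(binary, what):
    """Count '#' leaves (what == '#') or labeled nodes (what == 'node')."""
    if binary == "#":
        return 1 if what == "#" else 0
    here = 1 if what == "node" else 0
    return here + count(binary[1], what) + count(binary[2], what)

example = ("a", [("b", []), ("c", [("d", [])])])
```

For `example`, the root a gets b as its left child, b gets c as its right child (its next sibling), and c gets d as its left child.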

Page 25:

Designate Lines in Binary Trees

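[Figure: a binary tree partitioned into lines, with subtree sizes compared at the points where one line branches off another (the subtree arguments were drawn in the figure and are lost here).]

Size( ) > 2 Size( )

Size( ) > 2 Size( )

Size( ) > 4 Size( )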
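One way to obtain lines with a size-halving property of this kind is to let every line continue into the larger of the two subtrees, as in a heavy-path decomposition. This is an assumption made for illustration and may differ from the construction in the talk. Binary trees are tuples (label, left, right) with "#" for missing children; recomputing `size` at every node keeps the sketch short at the price of quadratic time in the worst case.

```python
import math

def size(t):
    """Number of labeled nodes; '#' leaves do not count."""
    return 0 if t == "#" else 1 + size(t[1]) + size(t[2])

def split(t):
    """Return (heavy, light) children: the line follows the larger subtree."""
    left, right = t[1], t[2]
    return (left, right) if size(left) >= size(right) else (right, left)

def lines(t):
    """Partition the labeled nodes of t into lines (lists of labels)."""
    result = []
    def walk(node, current):
        if node == "#":
            return
        current.append(node[0])
        heavy, light = split(node)
        walk(heavy, current)      # same line
        if light != "#":
            fresh = []            # the smaller subtree starts a new line
            result.append(fresh)
            walk(light, fresh)
    first = []
    result.append(first)
    walk(t, first)
    return result

def max_lines_on_path(t, count=1):
    """Largest number of distinct lines met on any root-to-node path."""
    if t == "#":
        return 0
    heavy, light = split(t)
    return max(count, max_lines_on_path(heavy, count),
               max_lines_on_path(light, count + 1))

def leaf(x):
    return (x, "#", "#")

complete = (1, (2, leaf(4), leaf(5)), (3, leaf(6), leaf(7)))
```

Each step into a smaller subtree leaves fewer than half of the nodes, so a path meets fewer than 1 + log2(size) lines.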

Page 26:

Example Line Structure

[Figure: the binary encoding of the example tree with nodes a through k and # leaves (left), and the same tree redrawn as a set of lines (right).]

Page 27:

From Tree Automaton to Validating Lines with NFA

[Figure: the lines of the example tree over the nodes a through k, each line read as a string of node labels.]

Page 28:

From Tree Automaton to Validating Lines with NFA

[Figure: the same lines, where some nodes now carry a transition relation alongside their label: (b, Tc), (d, Tj), (f, Tg).]

Page 29:

Incremental Validation of the Line Structure in $O(\log^2 |T|)$

[Figure: the line structure with the annotated nodes (b, Tc), (d, Tj), (f, Tg) and the newly inserted node m.]

Insert m after k

Number of updated lines < 1 + log |T|

Cost of a line update: O(log |T|)

Page 30:

Validating Insertions and Deletions: the Non-Line-Preserving Case

[Figure: Insertion]

Page 31:

Key Complexity Results

Given m updates on a tree of size n, incremental validation takes:

• DTDs: $O(m \log n)$
  Given alphabet $\Sigma$ and maximum regular expression size $d$: $O(m\,|\Sigma|\,d^2 \log d \,\log n)$
  Data structure of size $O(d^2 n)$

• Specialized DTDs: $O(m \log^2 n)$
  Given set of types $\Sigma'$: $O(m\,|\Sigma'|^2 d^2 (\log d + \log |\Sigma'|) \log^2 n)$
  Data structure of size $O(|\Sigma'|^2 d^2 \log^2 n)$

• Lower complexity for 1-unambiguous regular expressions

Page 32:

Ongoing and Future Work (with Andrey Balmin)

• Incorporate Transition Relation Trees in B-Tree Structure

• Exploit “locality”
  • Experimental evaluation on a set of 65 DTDs: in 96% of type definitions an update may only affect transition relations of length < 4
  • Common case much more efficient than worst case
  • Detect the property and employ algorithms that do not build transition relation trees in such cases

• Optimization over multiple updates

• More complex updates & edit operations