Incremental Validation of XML Databases. Yannis Papakonstantinou, Victor Vianu. Computer Science & Eng, UCSD
Mar 31, 2015
Incremental Validation of XML Databases:
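• An XML database with n nodes receives a stream of updates
• Validation against a Document Type Definition (DTD): O(log n) per update
• Validation against an XML Schema / XQuery type system: O(log² n) per update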
XML As Labeled Ordered Trees
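[Figure: the example document drawn as a labeled ordered tree. The root cars has children used and new, each with car children; car nodes have year and model children. Leaf values shown: years 92, 96, 03 and models Civic, Acura, Civic, Maxima.]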
Document Type Definitions (DTDs): Abstraction & Example
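root : cars
cars → used new
used → car*
new → car*
car → (year | ε) model
[Figure: the example tree shown next to the DTD: cars at the root, used and new below it, car nodes with year and model children, and leaf values 92 Civic, 96 Acura, Civic, Maxima, 03.]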
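The DTD check on this slide can be sketched in a few lines: a tree is valid if, at every node, the sequence of child labels matches the regular expression of the node's label. The sketch below is non-incremental and rests on assumptions that are not in the slides: trees are `(label, children)` pairs, each child label is encoded as a space-terminated token so that Python's `re` can match the content model, text values are omitted, and the placement of cars under `used` and `new` is made up for illustration.

```python
import re

# Content models of the example DTD, as regexes over space-terminated labels.
# The token encoding is an assumption of this sketch, not part of the slides.
DTD = {
    "cars": r"used new ",
    "used": r"(car )*",
    "new": r"(car )*",
    "car": r"(year )?model ",
    "year": r"",   # text content is ignored in this sketch
    "model": r"",
}
ROOT = "cars"


def valid(node):
    """Check the content model at this node, then recurse into the children."""
    label, children = node
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(DTD[label], word) is None:
        return False
    return all(valid(child) for child in children)


def validate_document(tree):
    return tree[0] == ROOT and valid(tree)


def car(*labels):
    """Hypothetical helper: a car node whose children are leaves."""
    return ("car", [(label, []) for label in labels])


good = ("cars", [
    ("used", [car("year", "model"), car("year", "model")]),
    ("new", [car("model"), car("year", "model")]),
])
bad = ("cars", [
    ("used", [car("model", "year")]),  # year must come before model
    ("new", []),
])

print(validate_document(good), validate_document(bad))
```

This full check visits every node on every call, which is the cost that incremental validation is meant to avoid.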
Tree Satisfying DTD, General Case
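[Figure: a node whose DTD rule has regular expression r, with children 1, 2, …, k; the word formed by the labels of the children must belong to the language of r. The same condition applies at every node of the tree (labels a, b, c shown).]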
XML Schemas/XQuery Types as Specialized DTDs
root : carsT
carsT → usedT newT
usedT → carU*
newT → carN*
carU → yearT modelT
carN → (yearT | ε) modelT
LABEL TYPES
car: {carU, carN}
cars: {carsT}
used: {usedT}
…
[Figure: the example tree with each node annotated by its types: carsT at cars, usedT at used, newT at new, carU and carN at the car nodes, yearT and modelT at the year and model nodes.]
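To make the role of types concrete, the following sketch computes, bottom-up, the set of types each node can take under the specialized DTD above. It is a brute-force illustration under stated assumptions, not the method of the talk: trees are `(label, children)` pairs, type names are encoded as space-terminated tokens for Python's `re`, and every combination of child types is tried (exponential in the number of children, which is fine only for tiny examples).

```python
import re
from itertools import product

# type -> (label, content model over space-terminated type names)
RULES = {
    "carsT": ("cars", r"usedT newT "),
    "usedT": ("used", r"(carU )*"),
    "newT": ("new", r"(carN )*"),
    "carU": ("car", r"yearT modelT "),
    "carN": ("car", r"(yearT )?modelT "),
    "yearT": ("year", r""),
    "modelT": ("model", r""),
}
ROOT_TYPE = "carsT"


def types(node):
    """Return the set of types that can be assigned to this node."""
    label, children = node
    child_types = [types(child) for child in children]
    result = set()
    for type_name, (type_label, regex) in RULES.items():
        if type_label != label:
            continue
        # Brute force: try every way of picking one type per child.
        for choice in product(*child_types):
            word = "".join(t + " " for t in choice)
            if re.fullmatch(regex, word):
                result.add(type_name)
                break
    return result


def car(*labels):
    """Hypothetical helper: a car node whose children are leaves."""
    return ("car", [(label, []) for label in labels])


tree = ("cars", [
    ("used", [car("year", "model")]),
    ("new", [car("model"), car("year", "model")]),
])
print(types(car("year", "model")), types(car("model")), ROOT_TYPE in types(tree))
```

A car with both a year and a model can be typed as carU or carN, while a car without a year can only be carN, so it is accepted under new but not under used.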
Tree Automata Specialized DTDs
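[Figure: the example tree with the set of candidate types at each node, as a tree automaton would assign them: carsT at cars, usedT at used, newT at new, {carU, carN} at three of the car nodes and carN at the fourth, yearT and modelT at the year and model nodes.]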
Incremental Validation Problem Statement
For each valid tree T, maintain an auxiliary structure A(T) so that, given a series of update commands, one can
• efficiently decide whether the updated tree T' is valid
• efficiently update A(T) and T
Types of Updates: Node Renaming u(v, )
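[Figure: renaming changes the label of node v, one of the children 1, …, k of a node with regular expression r. The word formed by the labels of v and its siblings must still match r, and the labels of v's own children (a, b, c, …) must match the rule of the new label.]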
Types of Updates: Deletion d(v)
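[Figure: deletion d(v) removes node v, the i-th child of a node with regular expression r; the remaining children 1, …, i-1, i+1, …, k must still match r.]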
Types of Updates: Insertion
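[Figure: insert_after(vi-1, i) adds a new node as the i-th child of a node with regular expression r, between vi-1 and vi+1; the extended sequence of children must match r.]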
Validating a Renaming u(i, ) on a Regular String of N : Take One
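• Validation of one update in O(1) given precomputed Pre and Post: the new symbol at position i must connect Pre(i-1) to Post(i+1)
• But u(i, ) requires recomputation of Pre(i), Pre(i+1), … and of Post(i), Post(i-1), …
[Figure: the string at positions 1, 2, …, n read by the automaton from q0 to qF. Pre(i-1) summarizes the run from q0 over positions 1, …, i-1; Post(i+1) summarizes the run over positions i+1, …, n that ends in qF.]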
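A minimal sketch of this first attempt follows. The slide does not define Pre and Post, so the sketch assumes that Pre(i) is the set of states reachable from q0 after reading positions 1..i and that Post(i) is the set of states from which reading positions i..n can end in qF. The automaton (for the language (ab)*, with c allowed in place of a) and all function names are hypothetical.

```python
# Hypothetical NFA: accepts (ab)* where 'c' may stand in for 'a'.
DELTA = {("q0", "a"): {"q1"}, ("q0", "c"): {"q1"}, ("q1", "b"): {"q0"}}
Q0, QF = "q0", "q0"


def step(states, symbol):
    """States reachable from `states` by reading `symbol`."""
    return {q2 for q in states for q2 in DELTA.get((q, symbol), ())}


def back(states, symbol):
    """States that can reach `states` by reading `symbol`."""
    return {q for (q, a), targets in DELTA.items() if a == symbol and targets & states}


def precompute(word):
    """pre[i] is Pre(i) for i = 0..n; post[i] is Post(i+1), so post[n] = {QF}."""
    n = len(word)
    pre = [{Q0}]
    for symbol in word:
        pre.append(step(pre[-1], symbol))
    post = [set() for _ in range(n)] + [{QF}]
    for i in range(n - 1, -1, -1):
        post[i] = back(post[i + 1], word[i])
    return pre, post


def renaming_is_valid(pre, post, i, new_symbol):
    """Check a renaming at 1-based position i using only Pre(i-1) and Post(i+1)."""
    return bool(step(pre[i - 1], new_symbol) & post[i])


pre, post = precompute("abab")
print(renaming_is_valid(pre, post, 1, "c"))  # "cbab"
print(renaming_is_valid(pre, post, 2, "a"))  # "aaab"
```

The check itself does not depend on n, but once a renaming is applied, the Pre entries to its right and the Post entries to its left are stale and need to be rebuilt, which is linear in n in the worst case.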
Transition Relation Definition
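$T_{i,j} = \{ (q, q') \mid \text{the automaton can move from state } q \text{ to state } q' \text{ by reading the symbols at positions } i, \ldots, j \}$
$T_{i,j} = T_{i,m} \circ T_{m+1,j}$ for any $m$ with $i \le m < j$
[Figure: a string with positions 1, 2, …, n and marked positions i, m, j; a run starts in q before position i and ends in q' after position j, and splits at m into a run over i, …, m followed by a run over m+1, …, j.]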
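The definition and the composition rule can be checked on a small example. The sketch below represents a transition relation as a Python set of state pairs; the automaton for (ab)* and the function names are assumptions made for illustration.

```python
# Hypothetical example automaton for the language (ab)*; not from the slides.
DELTA = {("q0", "a"): {"q1"}, ("q1", "b"): {"q0"}}


def leaf_relation(symbol):
    """T_{i,i}: all pairs (q, q') such that reading `symbol` can move q to q'."""
    return {(q, q2) for (q, a), targets in DELTA.items() if a == symbol for q2 in targets}


def compose(left, right):
    """Relational composition: (p, r) is kept if some q links p to r."""
    return {(p, r) for (p, q) in left for (q2, r) in right if q == q2}


def relation(word, i, j):
    """T_{i,j} for 1-based positions i <= j, computed left to right."""
    result = leaf_relation(word[i - 1])
    for symbol in word[i:j]:
        result = compose(result, leaf_relation(symbol))
    return result


word = "abab"
# The split point m does not matter: T_{1,4} = T_{1,m} o T_{m+1,4}.
split_ok = all(
    relation(word, 1, 4) == compose(relation(word, 1, m), relation(word, m + 1, 4))
    for m in range(1, 4)
)
print(relation(word, 1, 4), split_ok)
```

Because composition is associative, relations over adjacent segments can be combined in any grouping, which is what allows them to be arranged in a balanced tree.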
Transition Relation Trees
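T1,8
T1,4 T5,8
T1,2 T3,4 T5,6 T7,8
T1,1 T2,2 T3,3 T4,4 T5,5 T6,6 T7,7 T8,8
1 2 3 4 5 6 7 8
(levels listed from the root down to the string positions; each inner node is the composition of its two children)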
Maintenance of the Structure and Validation in O(log n)
T1,8
T1,4 T5,8
T1,2 T3,4 T5,6 T7,8
T1,1 T2,2 T3,3 T4,4 T5,5 T6,6 T7,7 T8,8
1 2 3 4 5 6 7 8
• u(6, ) recomputes only T6,6, T5,6, T5,8, T1,8: the path from leaf 6 to the root
• If (q0, qF) ∈ T1,8 then valid
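The maintenance step can be sketched as a balanced binary tree stored in an array, where a renaming recomputes only the relations on the path from the changed leaf to the root. This is an illustrative sketch under assumptions, not the authors' implementation: the automaton accepts (ab)* with c allowed in place of a, the string length is padded to a power of two with identity relations, and the class and method names are hypothetical.

```python
# Hypothetical NFA: accepts (ab)* where 'c' may stand in for 'a'.
STATES = {"q0", "q1"}
DELTA = {("q0", "a"): {"q1"}, ("q0", "c"): {"q1"}, ("q1", "b"): {"q0"}}
Q0, QF = "q0", "q0"
IDENTITY = {(q, q) for q in STATES}  # relation of the empty string, used as padding


def leaf_relation(symbol):
    return {(q, q2) for (q, a), targets in DELTA.items() if a == symbol for q2 in targets}


def compose(left, right):
    return {(p, r) for (p, q) in left for (q2, r) in right if q == q2}


class TransitionRelationTree:
    def __init__(self, word):
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        # node[1] is the root; leaves occupy node[size .. 2*size - 1].
        self.node = [IDENTITY] * (2 * self.size)
        for k, symbol in enumerate(word):
            self.node[self.size + k] = leaf_relation(symbol)
        for k in range(self.size - 1, 0, -1):
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])

    def rename(self, i, symbol):
        """Rename 1-based position i, then refresh the leaf-to-root path."""
        k = self.size + i - 1
        self.node[k] = leaf_relation(symbol)
        k //= 2
        while k >= 1:
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])
            k //= 2

    def valid(self):
        return (Q0, QF) in self.node[1]


tree = TransitionRelationTree("abababab")
before = tree.valid()
tree.rename(6, "a")   # string becomes "abababaa"
after_bad = tree.valid()
tree.rename(6, "b")   # back to "abababab"
tree.rename(5, "c")   # string becomes "ababcbab"
after_good = tree.valid()
print(before, after_bad, after_good)
```

For the eight-symbol string, `rename(6, ...)` rewrites four entries, matching T6,6, T5,6, T5,8 and T1,8 on the slide. Each composition costs time that depends only on the number of automaton states, so one renaming costs O(log n).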
Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
• Start: leaves 1 2 3 5 6 7 9 carry T1 T2 T3 T5 T6 T7 T9; internal nodes carry Ta Tb Tc, for example Ta = T1 ∘ T2
• If (q0, qF) ∈ Ta ∘ Tb ∘ Tc then valid
• Insert 8: leaves carry T1 T2 T3 T5 T6 T7 T8 T9; internal nodes remain Ta Tb Tc
• Insert 4: the node over T3 T5 T6 receives T4 and becomes T3 T4 T5 T6, one child too many for a 2-3 tree
• Split: the overfull node is replaced by Td and Te, giving Ta Td Te Tc at that level; these are in turn grouped under two new nodes Tf Tg
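The insertion sequence above can be sketched with a small 2-3 tree whose nodes cache the composition of their children's relations. This is a sketch under assumptions rather than the structure used in the talk: nodes are rebuilt along the insertion path instead of being modified in place, deletion is left out, the automaton accepts (ab)*, and all names are hypothetical.

```python
from functools import reduce

# Hypothetical NFA for the language (ab)*.
DELTA = {("q0", "a"): {"q1"}, ("q1", "b"): {"q0"}}
Q0, QF = "q0", "q0"


def leaf_relation(symbol):
    return {(q, q2) for (q, a), targets in DELTA.items() if a == symbol for q2 in targets}


def compose(left, right):
    return {(p, r) for (p, q) in left for (q2, r) in right if q == q2}


class Node:
    """Leaf (one symbol) or internal node (2 or 3 children) with a cached relation."""

    def __init__(self, symbol=None, children=()):
        self.symbol = symbol
        self.children = list(children)
        if self.children:
            self.size = sum(c.size for c in self.children)
            self.rel = reduce(compose, (c.rel for c in self.children))
        else:
            self.size = 1
            self.rel = leaf_relation(symbol)


def insert(node, pos, symbol):
    """Insert so the new symbol lands at 0-based index pos; return 1 or 2 nodes."""
    if not node.children:
        new = Node(symbol=symbol)
        return [new, node] if pos == 0 else [node, new]
    for idx, child in enumerate(node.children):
        if pos <= child.size:
            break
        pos -= child.size
    kids = node.children[:idx] + insert(child, pos, symbol) + node.children[idx + 1:]
    if len(kids) <= 3:
        return [Node(children=kids)]
    return [Node(children=kids[:2]), Node(children=kids[2:])]  # split an overfull node


def insert_root(root, pos, symbol):
    parts = insert(root, pos, symbol)
    return parts[0] if len(parts) == 1 else Node(children=parts)


def text(node):
    return node.symbol if not node.children else "".join(text(c) for c in node.children)


def valid(root):
    return (Q0, QF) in root.rel


root = Node(symbol="a")
root = insert_root(root, 1, "b")   # "ab"
root = insert_root(root, 2, "a")   # "aba"
root = insert_root(root, 3, "b")   # "abab", forces a split and a new root
print(text(root), valid(root))
```

Only the nodes on the path from the new leaf to the root are rebuilt, and each rebuild composes at most three relations, so an insertion recomputes O(log n) relations.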
Auxiliary Structures for Incremental DTD Validation
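[Figure: every node with regular expression r keeps an auxiliary structure over the labels of its children 1, …, k. An update u(vi, ) touches the structure of vi's parent, where the word of sibling labels changes, and the structure of vi itself, whose children must match the rule of the new label.]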
Specialized DTD Incremental Validation: Take One
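[Figure: a node with regular expression r and children a1, …, ak; the child vi has children b1, …, bk. Each node v stores types(v), its set of possible types. After the renaming u(vi, ), types(vi) is recomputed and the change is propagated through the types() sets of the ancestors of vi.]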
Inefficient for Deep Trees: Apply Divide-And-Conquer in Vertical Direction
• Turn the specialized DTD into an NFA that validates a vertical line
• "Fuse" vertical and horizontal directions using a binary tree and split work in both
Tree Satisfying Specialized DTD transformed into Binary Tree Accepted By Tree Automaton
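[Figure: an unranked tree with the eleven nodes a, b, c, d, e, f, g, h, i, j, k, and next to it a binary tree over the same nodes in which twelve # leaves mark the absent children.]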
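The slide does not spell out the encoding, so the sketch below assumes the common first-child/next-sibling encoding: the left child of a binary node is its first child in the original tree, the right child is its next sibling, and `#` stands for an absent child. This assumption is consistent with the figure's counts, since a tree with n nodes produces n + 1 `#` leaves. The tuple representation and function names are hypothetical.

```python
HASH = "#"


def encode(siblings):
    """Encode a list of sibling subtrees (label, children) as a binary tree.

    The result is either HASH or a triple (label, left, right), where left
    encodes the node's children and right encodes its following siblings.
    """
    if not siblings:
        return HASH
    (label, children), rest = siblings[0], siblings[1:]
    return (label, encode(children), encode(rest))


def count(binary):
    """Return (number of labeled nodes, number of # leaves)."""
    if binary == HASH:
        return (0, 1)
    _, left, right = binary
    left_nodes, left_hashes = count(left)
    right_nodes, right_hashes = count(right)
    return (1 + left_nodes + right_nodes, left_hashes + right_hashes)


tree = ("a", [("b", []), ("c", [("d", [])])])
binary = encode([tree])
print(binary)
print(count(binary))
```

A deep chain of children in the original tree becomes a left spine in the binary tree and a long list of siblings becomes a right spine, so both directions are handled by the same kind of path.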
Designate Lines in Binary Trees
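[Figure: lines are designated by comparing subtree sizes at each node. Three conditions are shown, of the form Size(·) > 2 Size(·), Size(·) > 2 Size(·) and Size(·) > 4 Size(·); the subtrees being compared are given pictorially and are not recoverable here.]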
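The exact size conditions of the talk are not recoverable from this slide, so the sketch below swaps in a standard, closely related technique, heavy-path decomposition: each line continues into the child with the larger subtree, and the other child starts a new line. This is a substitute rule chosen for illustration, not the rule from the slide. Binary trees are triples `(label, left, right)` with `"#"` for an absent child; the names are hypothetical, and sizes are recomputed naively to keep the sketch short.

```python
HASH = "#"


def size(tree):
    """Number of labeled nodes in a binary tree."""
    return 0 if tree == HASH else 1 + size(tree[1]) + size(tree[2])


def lines(tree):
    """Partition the labeled nodes into lines, following the larger child."""
    result = []

    def walk(node, line):
        if node == HASH:
            return
        label, left, right = node
        line.append(label)
        heavy, light = (left, right) if size(left) >= size(right) else (right, left)
        walk(heavy, line)          # the current line continues into the larger subtree
        if light != HASH:
            new_line = []          # the smaller subtree starts a line of its own
            result.append(new_line)
            walk(light, new_line)

    first = []
    result.append(first)
    walk(tree, first)
    return result


example = ("a", ("b", HASH, HASH), ("c", ("d", HASH, HASH), HASH))
print(lines(example))
```

Under this rule, each step from a line into a line hanging off it at least halves the subtree size, so the path from any node to the root crosses at most about log |T| lines. That is the kind of bound that keeps the number of lines affected by one update logarithmic.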
Example Line Structure
[Figure: the binary tree from the previous example (nodes a to k with # leaves), shown once as a plain binary tree and once redrawn with its nodes grouped into lines.]
From Tree Automaton to Validating Lines with NFA
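[Figure: the lines of the example binary tree (nodes a to k). Nodes where another line branches off are annotated with the transition relation of that line: b with Tc, d with Tj, f with Tg. The NFA that validates a line reads these annotated symbols.]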
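Incremental Validation of the Line Structure in O(log² |T|)
• Example update: insert m after k
• Number of updated lines < 1 + log |T|
• Cost of one line update: O(log |T|)
[Figure: the annotated lines of the example (b, Tc; d, Tj; f, Tg) with the new node m inserted after k.]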
Validating Insertions and Deletions: the Non-Line-Preserving Case
[Figure: an insertion that changes the line structure.]
Key Complexity Results
Given m updates on a tree of size n, incrementally validate:
• DTDs in O(m log n). Given alphabet Σ and size d of the maximum regular expression: O(m |Σ| d² log d log n), with a data structure of size O(d² n)
• Specialized DTDs in O(m log² n). Given set of types Σ′: O(m |Σ′|² d² (log d + log |Σ′|) log² n), with a data structure of size O(|Σ′|² d² log² n)
• Lower complexity for 1-unambiguous regular expressions
Ongoing and Future Work (with Andrey Balmin)
• Incorporate Transition Relation Trees in B-Tree Structure
• Exploit "locality"
  • Experimental evaluation on a set of 65 DTDs: in 96% of type definitions an update may only affect transition relations of length < 4
  • Common case much more efficient than worst case
  • Detect the property and employ algorithms that do not build transition relation trees in such cases
• Optimization over multiple updates
• More complex updates & edit operations