ADT 2008
Lecture 5
XQuery Updates in MonetDB/XQuery
Stefan Manegold
[email protected]
http://www.cwi.nl/~manegold/
[email protected] Lecture 5: XQuery Updates ADT 2008
Staircase Join [VLDB03]
Generate a duplicate-free result in document order:
• pruning: reduce the context set a priori
• partitioning: single sequential pass over the document
• skipping: avoid touching node ranges that cannot contain results
[Figure: document and list of context nodes; access pattern: seek, scan, skip, seek, scan, skip, ...]
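The three rules can be sketched for the descendant axis as follows. This is a minimal illustration, not the algorithm from [VLDB03]: it assumes the document is given as a `size` array indexed by pre rank (the pre/size encoding from the storage slides of this lecture), and the names `SIZE` and `staircase_descendant` are hypothetical.

```python
# Subtree sizes indexed by pre rank for the example tree
# a(b(c(d, e)), f(g, h(i, j))); a has pre rank 0, j has pre rank 9.
SIZE = [9, 3, 2, 0, 0, 4, 0, 2, 0, 0]

def staircase_descendant(size, context):
    """Return the pre ranks of all descendants of the context nodes,
    duplicate-free and in document order."""
    result = []
    covered_until = -1  # last pre rank covered by an earlier context node
    for c in sorted(set(context)):
        if c <= covered_until:
            # pruning: c lies inside the subtree of an earlier context node,
            # so it cannot contribute any new result
            continue
        # skipping + partitioning: seek directly to c's subtree and emit its
        # contiguous pre range exactly once
        result.extend(range(c + 1, c + size[c] + 1))
        covered_until = c + size[c]
    return result

print(staircase_descendant(SIZE, [2, 1, 7]))  # context nodes c, b, h
```

With context nodes c, b and h, node c is pruned because it lies inside the subtree of b, and the ranges between the subtrees of b and h are never touched.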
Loop-lifted XPath Steps
Many algorithms have been proposed & studied for XPath evaluation:
• Dataguide based
• Structural Join
• Staircase Join
• Holistic Twig Join
IN: sequence of context nodes (in doc order)
OUT: sequence of document nodes (unique, in doc order)
Loop-lifted XPath Steps
In XQuery, expressions generally occur inside FLWR blocks, i.e. inside a for-loop:
for $x in doc()//employee return $x/ancestor::department
Choice:
• call the XPath algorithm N times, accessing document and index structures N times
• use a loop-lifted algorithm:
IN: for each iteration, a sequence of context nodes
OUT: for each iteration, a sequence of document nodes (unique per iteration, in doc order)
Staircase join
[Figure: document; list of context nodes]
Loop-lifted staircase join
[Figure: document with a single list of context nodes vs. document with multiple lists of context nodes and an active stack]
Adapt the pruning, partitioning and skipping rules to correctly deal with multiple context sets.
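A loop-lifted descendant step with an active stack might look like the sketch below. It is an illustration under assumptions, not the MonetDB/XQuery implementation: the document is a `size` array indexed by pre rank, the context is a list of hypothetical (iteration, pre) pairs, and skipping is left out so that the single pass stays easy to follow.

```python
from collections import defaultdict

# Subtree sizes indexed by pre rank for the tree a(b(c(d, e)), f(g, h(i, j))).
SIZE = [9, 3, 2, 0, 0, 4, 0, 2, 0, 0]

def loop_lifted_descendant(size, ctx_pairs):
    """ctx_pairs: (iteration, pre) pairs. Returns {iteration: [pre, ...]},
    each list duplicate-free and in document order."""
    by_pre = defaultdict(set)            # context node -> iterations using it
    for it, pre in ctx_pairs:
        by_pre[pre].add(it)

    result = defaultdict(list)
    active = []                          # stack of (last pre in subtree, iterations)
    for pre in range(len(size)):         # one sequential pass over the document
        while active and active[-1][0] < pre:
            active.pop()                 # that context node's subtree has ended
        owners = set()
        for _, iters in active:
            owners |= iters              # iterations with an active ancestor
        for it in sorted(owners):
            result[it].append(pre)       # once per iteration, in doc order
        if pre in by_pre:
            active.append((pre + size[pre], by_pre[pre]))
    return dict(result)

# Iteration 1 has context nodes b and c; iteration 2 has h and b.
print(loop_lifted_descendant(SIZE, [(1, 1), (1, 2), (2, 7), (2, 1)]))
```

Because subtrees nest, the stack entries end in non-increasing order from bottom to top, so popping from the top is enough to retire finished context nodes.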
Loop-lifted staircase join
Results on the 20 XMark queries:
Schedule
• 15.09.2008:
  • RDBMS back-end support for XML/XQuery (1/2):
    • Document Representation (XPath Accelerator, Pre/Post plane)
    • XPath navigation (Staircase Join)
• 22.09.2008:
  • XQuery to Relational Algebra Compiler:
    • Item- & Sequence-Representation
    • Efficient FLWoR Evaluation (Loop-Lifting)
    • Optimization
• 29.09.2008:
  • RDBMS back-end support for XML/XQuery (2/2):
    • Updateable Document Representation
    • Other (DB-) approaches to XML/XQuery processing
What is MonetDB?
• Main-memory based DBMS backend/kernel
• Developed at CWI since 1992
• “Query-intensive” applications
• Data mining
• Data warehousing / decision support
• Multi-media information retrieval (text, images, audio, video, XML, ...)
• XML databases
• GIS
• part of Data Distilleries' products
• CWI spin-off company
• (>100GB) databases at ABN Amro, Postbank, Ohra, Spaarbeleg, FBTO, Centerparcs, Vodafone
• Nowadays: part of SPSS
MonetDB: Motivation (1/2)
• Relational DBMS dominate the scene
• Oracle, SQLserver, DB2
• databases a solved problem? No!
Problems:
• performance
• new ‘query intensive’ applications (data mining, et al.)
• extensibility
• new applications (GIS, text, image, audio, video, XML)
MonetDB: Motivation (2/2)
• are relational DBMS fit for the job?
• developed in the late 1970s and early 1980s
• hardware has changed
• CPUs get faster but more vulnerable
• capacity and bandwidth follow Moore’s law
• latency becomes a bottleneck (I/O and RAM)
• applications have changed
• RDBMS tuned for transaction processing
• not query-intensive
• only business domain
How is MonetDB Different
• full vertical fragmentation: always!
• everything in binary (2-column) tables
• saves you from table scan hell in OLAP and Data Mining
• the RISC approach to databases
• simple data model, simple query language
• don’t need (to pay for) a buffer manager => manage virtual memory
• explicit transaction management => DIY approach to ACID
• CPU and memory cache optimized
• programming team experienced in main memory DBMS techniques
• use of scientific programming optimizations (loop unrolling)
• cache conscious data structures and algorithms
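Full vertical fragmentation can be illustrated with a few lines of Python. This is only a sketch of the idea, not MonetDB code: the relation, its column names, and the helper names `decompose` and `reconstruct` are made up for the example.

```python
def decompose(rows, columns):
    """Split an n-ary relation into one binary (oid, value) table per column."""
    return {
        col: [(oid, row[i]) for oid, row in enumerate(rows)]
        for i, col in enumerate(columns)
    }

def reconstruct(tables, columns, oid):
    """Rebuild one tuple by fetching position `oid` from every binary table."""
    return tuple(tables[col][oid][1] for col in columns)

# Hypothetical example relation.
COLUMNS = ["name", "age"]
ROWS = [("Alice", 31), ("Bob", 45), ("Carol", 27)]

tables = decompose(ROWS, COLUMNS)
# A query that only needs `age` scans one narrow table, not the full rows.
print(sum(age for _, age in tables["age"]))
print(reconstruct(tables, COLUMNS, 1))
```

A query touching one attribute reads only that attribute's table, which is the point of the "table scan hell" bullet above.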
MonetDB: Shopping List
• A quantum leap in performance requires a quantum leap in technology (and risk)
• Better support for non-administrative applications, using:
• Multi-model database kernel support
• Extensible data types, operators, accelerators
• Database hot-set is memory resident (but scale to TB)
• Use simple data structures
• Index management should be automatic
• Algebraic language as the computational model
• Query optimization = strategic + tactic + operational optimization
• Dynamic optimization, parallelism, JIT-compile-link-run
• Cooperative (application) transaction management
• Do not replicate the operating system
Storing Relations in MonetDB
Object-Oriented Mapping
BAT Data Structure
BAT: binary association table
BUN: binary unit
BUN heap:
- consecutive memory block (array)
- memory-mapped file
Hash tables, T-trees, R-trees, ...
BAT Storage Optimizations
Dense ascending sequence
BAT Property Management
type - (physical) type number
enum - enumerated type flag
dense - dense ascending range
sorted - ascending head sorting
constant - all equal values
align - unique sequence id
key - no duplicates on column
set - no duplicates in BAT
hash - accelerator flag
Ttree - accelerator flag
mirrored - head=tail value
count - cardinality
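What some of these properties mean can be made concrete by computing them for a single column. The function below is only an illustration with a hypothetical name, `column_properties`; the slide suggests the kernel keeps such properties as flags on the BAT, and how it maintains them is not shown here.

```python
def column_properties(values):
    """Compute a few of the listed properties for one column of integers."""
    neighbours = list(zip(values, values[1:]))
    return {
        "count": len(values),                                # cardinality
        "sorted": all(a <= b for a, b in neighbours),        # ascending order
        "key": len(set(values)) == len(values),              # no duplicates
        "constant": len(set(values)) <= 1,                   # all equal values
        "dense": all(b == a + 1 for a, b in neighbours),     # dense ascending range
    }

print(column_properties([3, 4, 5, 6]))
print(column_properties([7, 7, 7]))
```

A dense column implies both sorted and key, which is what makes positional access possible without any index.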
XML/XQuery Updates
XQuery Update Facility 1.0, W3C Candidate Recommendation
http://www.w3.org/TR/xquery-update-10/
• Categorize updates into
• Value updates
• Structural updates
(MonetDB/XQuery does not yet support the latest syntax changes made by W3C; for details see
http://monetdb.cwi.nl/XQuery/Documentation/XQuery-Updates.html)
Value Updates
do replace value of fn:doc("bib.xml")/books/book[1]/price
with fn:doc("bib.xml")/books/book[1]/price * 1.1

do replace value of fn:doc("bib.xml")/books/book[2]/@isbn
with "9061965179"

do rename fn:doc("bib.xml")/books/book[3]/author[1]
into "primaryauthor"

do rename fn:doc("bib.xml")/journals/journal[9]/@isbn
into "issn"

=> map directly to simple value updates in relational storage
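Why value updates are the easy case can be sketched with a toy column store. The layout below (separate `size`, `level`, `name` and `value` columns indexed by pre rank) and the helpers `rename` and `replace_value` are assumptions made for illustration, not the actual MonetDB/XQuery schema.

```python
# Toy storage for <book><author>Kermit</author><price>10</price></book>.
nodes = {
    "size":  [2, 0, 0],                     # structural columns
    "level": [0, 1, 1],
    "name":  ["book", "author", "price"],   # value columns
    "value": [None, "Kermit", "10"],
}

def rename(nodes, pre, new_name):
    """'do rename ... into ...' becomes a single in-place cell update."""
    nodes["name"][pre] = new_name

def replace_value(nodes, pre, new_value):
    """'do replace value of ... with ...' becomes a single cell update too."""
    nodes["value"][pre] = new_value

rename(nodes, 1, "primaryauthor")
replace_value(nodes, 2, "11")
print(nodes["name"], nodes["value"])
```

Neither operation touches the structural columns, so no pre rank, size or level has to be recomputed.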
Structural Updates
do insert attribute isbn {"906196517"}
into fn:doc("bib.xml")/books/book[17]

do delete fn:doc("bib.xml")/books/book[2]/@wrong

do insert <author>Stefan Manegold</author>
after fn:doc("bib.xml")/books/book[33]/author[last()]

do replace fn:doc("bib.xml")/books/book[44]/author[1]
with fn:doc("bib.xml")/books/book[33]/author[last()]

do delete fn:doc("bib.xml")/books/book[author = "Kermit"]

=> How to implement on pre-/post-encoding?
XML/XQuery Updates
do insert <k><l/><m/></k> as first into /a/f/g
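The cost of this insert on a dense pre/size/level table can be sketched as follows. The code is a naive illustration under assumptions: the document is a list of (size, level) rows indexed by pre rank for the example tree a(b(c(d, e)), f(g, h(i, j))), and `naive_insert_first` is a hypothetical helper, not part of MonetDB/XQuery.

```python
# (size, level) indexed by pre rank; g is the node at pre rank 6.
TABLE = [(9, 0), (3, 1), (2, 2), (0, 3), (0, 3),
         (4, 1), (0, 2), (2, 2), (0, 3), (0, 3)]

# <k><l/><m/></k> as (size, level relative to the subtree root).
SUBTREE = [(2, 0), (0, 1), (0, 1)]

def naive_insert_first(table, parent_pre, subtree):
    """Insert `subtree` as first child of the node at `parent_pre`.
    Returns the new table and the number of existing rows whose pre changes."""
    k = len(subtree)
    parent_level = table[parent_pre][1]
    new_rows = [(s, parent_level + 1 + l) for s, l in subtree]
    grown = []
    for pre, (size, level) in enumerate(table):
        if pre <= parent_pre <= pre + size:
            size += k                    # ancestor-or-self of the target grows
        grown.append((size, level))
    pos = parent_pre + 1
    shifted = len(table) - pos           # every following row gets a new pre
    return grown[:pos] + new_rows + grown[pos:], shifted

new_table, shifted = naive_insert_first(TABLE, 6, SUBTREE)
print(new_table[6], new_table[7], shifted)
```

Every node after the insertion point receives a new pre rank and every ancestor a new size, which is why an update-friendly representation is needed.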
XML/XQuery Updates
Staircase Join
XML Storage Revisited
post = pre + size - level

node  pre  post
a     0    9
b     1    3
c     2    2
d     3    0
e     4    1
f     5    8
g     6    4
h     7    7
i     8    5
j     9    6

pre  size  level
0    9     0
1    3     1
2    2     2
3    0     3
4    0     3
5    4     1
6    0     2
7    2     2
8    0     3
9    0     3

Allow holes; define logical pages.

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9
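The relation post = pre + size - level can be checked against the hole-free tables of this slide. The snippet is a small worked check; the tuple layout and the name `post_rank` are chosen for the example.

```python
# (node, pre, size, level, post) as read from the slide's hole-free tables.
ROWS = [
    ("a", 0, 9, 0, 9), ("b", 1, 3, 1, 3), ("c", 2, 2, 2, 2),
    ("d", 3, 0, 3, 0), ("e", 4, 0, 3, 1), ("f", 5, 4, 1, 8),
    ("g", 6, 0, 2, 4), ("h", 7, 2, 2, 7), ("i", 8, 0, 3, 5),
    ("j", 9, 0, 3, 6),
]

def post_rank(pre, size, level):
    """Derive the post rank from the pre/size/level encoding."""
    return pre + size - level

for node, pre, size, level, post in ROWS:
    assert post_rank(pre, size, level) == post, node

print([post_rank(pre, size, level) for _, pre, size, level, _ in ROWS])
```

For example, node f has pre 5, size 4 and level 1, so its post rank is 5 + 4 - 1 = 8.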
XML Storage Revisited
Allow holes; define logical pages.
post = pre + size - level
rid = pre.swizzle()

The pre/post and pre/size/level tables are the same as on the previous slide; the physical rid table and the page map are:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5

page  map
0     0
1     2
2     1
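The swizzle step can be sketched as a page-table lookup. The page size of 4 tuples is inferred from the 12-row example and is an assumption, as are the names `PAGE_SIZE`, `PAGE_MAP` and `swizzle`; the real system may organise this differently.

```python
PAGE_SIZE = 4            # tuples per logical page (assumed from the example)
PAGE_MAP = [0, 2, 1]     # logical page -> physical page, as in the page map

# nid column of the physical rid table on this slide, indexed by rid.
NID = ["N0", "N1", None, None, "N6", "N7", "N8", "N9", "N2", "N3", "N4", "N5"]

def swizzle(pre):
    """Translate a logical pre rank into a physical rid."""
    page, offset = divmod(pre, PAGE_SIZE)
    return PAGE_MAP[page] * PAGE_SIZE + offset

# Reading the physical table in pre order yields document order again.
print([NID[swizzle(pre)] for pre in range(len(NID))])
```

Only the small page map changes when a page is inserted in the middle of the document; the tuples themselves stay where they are.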
XML Storage Revisited
Update-friendly:
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column
MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join
Opportunity currently not exploited by other RDBMS.
Occurs widely in our XQuery translation.
[Table: rid | size | level | nid, as on the previous slide]
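A virtual, autoincrement rid and the positional lookup it enables can be sketched like this. The class `VoidColumn` and the function `positional_join` are hypothetical names for illustration and do not reproduce MonetDB's internal API.

```python
class VoidColumn:
    """Append-only column whose key (rid) is never stored: it is the
    position in the array plus a fixed base."""

    def __init__(self, values, seqbase=0):
        self.values = list(values)
        self.seqbase = seqbase

    def append(self, value):
        """Append a tuple and return its implicitly assigned rid."""
        self.values.append(value)
        return self.seqbase + len(self.values) - 1

    def lookup(self, rid):
        """Positional lookup: plain array indexing, no search and no index."""
        return self.values[rid - self.seqbase]

def positional_join(rids, column):
    """Join a list of rids against a column by position."""
    return [column.lookup(rid) for rid in rids]

level = VoidColumn([0, 1, None, None, 2])   # None marks an unused tuple
new_rid = level.append(3)
print(new_rid, positional_join([4, 0, new_rid], level))
```

Because the key is dense and ascending, a join on rid degenerates into array indexing.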
MonetDB/XQuery
Our own XML DBMS with (almost) full XQuery support.
• Built purely on an RDBMS, namely MonetDB
• Future: also middleware support (P2P!) in AmbientDB
Pathfinder compiler & “staircase join” (see later):
– Technical University Munich (Torsten Grust, et al.)
– Technical University Twente (Maurice van Keulen, et al.)
MonetDB High-Performance DBMS:
– CWI Amsterdam (Peter Boncz, Stefan Manegold, ...)
Useful for:
• Large XML databases!
• Querying XML annotations (multimedia, forensic NFI)
[Figure: XQuery -> Pathfinder Compiler -> Relational Algebra -> RDBMS (MonetDB)]
Current Projects
• Value indices & runtime optimization
• Code freeze, release this week
• Algebraic Query Optimization
• Some released, most still in the development version
• Distributed XQuery, P2P XQuery
• SOAP group communication, XQuery RPC
• VLDB'07 [Zhang, Boncz]
• Benchmarking beyond XMark
• ExpDB'06 Workshop [Manegold]
• Support for XML Interval Annotations
• XIME-P'06 Workshop [Alink et al.]
• ...
Conclusions
• Relational approach can be scalable & fast
• MonetDB/XQuery compares favorably with all other available systems
• Techniques that made it work
• Property-driven peephole optimization: order & other properties
• Loop-lifted XPath steps: evaluate sets of context nodes in a single pass
• Support for dense (autoincrement) keys: positional lookup
• Background Information & Literature
http://monetdb-xquery.org
http://pathfinder-xquery.org
Exam / Tentamen
Tuesday October 21 2008
9:45 – 11:45
REC-G S.14