TASM: Top-k Approximate Subtree Matching
Nikolaus Augsten¹ Denilson Barbosa²
Michael Böhlen³ Themis Palpanas⁴
¹Free University of Bozen-Bolzano
²University of Alberta
³University of Zurich
⁴University of Trento
ICDE 2010, March 3, Long Beach, CA, USA
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 1 / 28
Outline
1 Motivation and Problem Definition
2 TASM-Postorder
    Upper Bound on Subtree Size
    Prefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Motivation and Problem Definition
Motivation
Query (XML fragment): article(authors(author(Tim), author(John)), booktitle(ICDE))
Document (very large XML): DBLP, 28M nodes, 531MB
Which are the top-k matches?
Rank the top-k matches for the article query in the DBLP document!
Example answer for k = 3:
  inproceedings(authors(author(Tim), author(John)), booktitle(ICDE)) (1 error)
  article(author(Tim), author(John), booktitle(TKDE)) (2 errors)
  inproceedings(authors(author(Tim), author(John), author(Peter)), booktitle(ICDE)) (3 errors)
TASM: Top-k Approximate Subtree Matching
Definition (TASM: Top-k Approximate Subtree Matching)
Given: query tree Q, document tree T, size k of the ranking
Goal: compute a top-k ranking $R = (T_1, T_2, \ldots, T_k)$ of all subtrees $T_i$ of document T with respect to query Q, using the tree edit distance for the ranking.
Subtree $T_i$:
  a node and all its descendants
  the largest subtree is the document itself
Top-k ranking $R = (T_1, T_2, \ldots, T_k)$:
  subtrees sorted by distance to the query
  best k subtrees: $T_i \notin R \Rightarrow \mathrm{ted}(Q, T_k) \le \mathrm{ted}(Q, T_i)$
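The ranking condition can be checked mechanically. The following is a minimal sketch, assuming the tree edit distance from Q to every subtree has already been computed and is given as a dictionary from hypothetical subtree ids to distances; `top_k_ranking` and `is_valid_top_k` are illustrative names, not part of TASM. Ties at distance ted(Q, T_k) make the ranking non-unique, so the check accepts any valid choice.

```python
import heapq
from typing import Dict, Hashable, List

def top_k_ranking(distances: Dict[Hashable, int], k: int) -> List[Hashable]:
    """Return up to k subtree ids ordered by increasing distance to the query.

    heapq.nsmallest behaves like sorted(...)[:k], so ties keep dictionary order.
    """
    return heapq.nsmallest(k, distances, key=distances.__getitem__)

def is_valid_top_k(ranking: List[Hashable], distances: Dict[Hashable, int], k: int) -> bool:
    """Check the definition: R is sorted by distance, and every subtree outside R
    is at least as far from the query as the last ranked subtree T_k."""
    if len(ranking) != min(k, len(distances)) or len(set(ranking)) != len(ranking):
        return False
    if not ranking:
        return True
    ranked = [distances[t] for t in ranking]
    if ranked != sorted(ranked):
        return False
    worst = ranked[-1]  # ted(Q, T_k)
    chosen = set(ranking)
    return all(d >= worst for t, d in distances.items() if t not in chosen)

# Hypothetical subtree ids, using the error counts of the motivating example.
distances = {"inproceedings_1": 1, "article_2": 2, "inproceedings_3": 3, "book_4": 7}
print(top_k_ranking(distances, 3))  # ['inproceedings_1', 'article_2', 'inproceedings_3']
```

Any ranking that replaces a ranked subtree with a farther one (for example, `book_4` instead of `inproceedings_3`) violates the condition.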
Ranking Function: Tree Edit Distance (TED)
article(authors(author(Tim), author(John)), booktitle(ICDE))
  del(authors) → article(author(Tim), author(John), booktitle(ICDE))
  ren(ICDE → TKDE) → article(author(Tim), author(John), booktitle(TKDE))
Tree Edit Distance: minimum number of node edit operations (insert, rename, delete) that transform one tree into the other; in the example above, the distance is 2.
TASM computes the TED between the query and document subtrees.
The size and number of the document subtrees for which the TED is computed determine the complexity of TASM.
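For small trees, the tree edit distance can be computed directly from the classic forest recurrence: at the rightmost root, either delete it, insert it, or map the two roots onto each other (renaming if the labels differ). The sketch below is a naive memoized version of that recurrence, assuming unit costs and trees encoded as hypothetical `(label, children)` tuples; it omits the optimizations of the algorithms cited for TASM-Dynamic and is only meant for slide-sized trees.

```python
from functools import lru_cache

# A tree is a (label, children) tuple; a forest is a tuple of trees.
def leaf(label):
    return (label, ())

def node(label, *children):
    return (label, tuple(children))

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two forests (rightmost-root recurrence)."""
    if not f1:
        return sum(size(t) for t in f2)      # insert every remaining node
    if not f2:
        return sum(size(t) for t in f1)      # delete every remaining node
    (l1, c1), (l2, c2) = f1[-1], f2[-1]      # rightmost trees of each forest
    return min(
        forest_dist(f1[:-1] + c1, f2) + 1,   # delete the root of f1's rightmost tree
        forest_dist(f1, f2[:-1] + c2) + 1,   # insert the root of f2's rightmost tree
        forest_dist(c1, c2)                  # map the two roots onto each other,
        + forest_dist(f1[:-1], f2[:-1])
        + int(l1 != l2),                     # renaming if the labels differ
    )

def ted(t1, t2):
    return forest_dist((t1,), (t2,))

query = node("article",
             node("authors", node("author", leaf("Tim")), node("author", leaf("John"))),
             node("booktitle", leaf("ICDE")))
subtree = node("article",
               node("author", leaf("Tim")), node("author", leaf("John")),
               node("booktitle", leaf("TKDE")))
print(ted(query, subtree))  # 2: del(authors), ren(ICDE -> TKDE)
```

Because `lru_cache` keys on the forest tuples, identical subproblems are solved once; the number of subforests still grows quickly, which is why this version does not scale to documents.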
State of the Art
TASM-Dynamic: dynamic programming solution¹
  computes the distance to every subtree of the document
  uses smaller subtrees to compute larger ones
  ranks the subtrees by visiting the memoization table
  space complexity: O(mn), m: query size, n: document size
Space complexity limits the application to databases:
  in database applications n is huge (database size!)
  TASM-Dynamic maintains two m × n matrices in RAM
  > 6GB RAM for our tiny query (m = 8) on DBLP ($n = 28 \times 10^6$)
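A quick count of matrix cells shows why the memory requirement explodes. The number of bytes per cell depends on the implementation and is not stated here, so only the cell count is derived:

```latex
2 \cdot m \cdot n = 2 \cdot 8 \cdot 28 \times 10^{6} = 4.48 \times 10^{8} \text{ cells}
```

At 8 bytes per cell this alone is about 3.6 GB, and the count grows linearly with the document size n.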
For database-size problems, dynamic programming is too expensive.
State-of-the-art algorithms do not scale!
¹ Zhang and Shasha 1989; Demaine et al. 2007
Problem Definition
Find a solution for TASM (Top-k Approximate Subtree Matching) that
scales to very large documents
runs in small memory
ranks subtrees correctly (no heuristics!)
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
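1. Rank the first k subtrees of T in postorder: $R' = (T'_1, T'_2, \ldots, T'_k)$
   $T'_k$ is the worst match in R'; it consists only of nodes among the first k in postorder, so $|T'_k| \le k$.
   Deleting all of Q (Q → ∅) and then inserting all of $T'_k$ transforms Q into $T'_k$:
   (i) $\mathrm{ted}(Q, T'_k) \le |Q| + |T'_k|$
2. Final ranking $R = (T_1, T_2, \ldots, T_k)$ (= TASM result)
   R' already contains k subtrees with distance at most $\mathrm{ted}(Q, T'_k)$, so the $T_i$ in R are no worse than the worst match $T'_k$ of R':
   (ii) $\mathrm{ted}(Q, T_i) \le \mathrm{ted}(Q, T'_k) \le |Q| + |T'_k|$
3. Size upper bound for subtree $T_i$
   Transforming Q into $T_i$ requires at least inserting the missing nodes: $|T_i| - |Q| \le \mathrm{ted}(Q, T_i)$
   Hence $|T_i| \le \mathrm{ted}(Q, T_i) + |Q| \le 2|Q| + |T'_k| \le 2|Q| + k$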
Upper Bound on Subtree Size
Theorem (Upper Bound on Subtree Size)
TASM needs to consider only small document subtrees of size τ or less:
$\tau = 2|Q| + k$
The upper bound is very powerful:
  independent of document size and structure!
  linear in query size and k
Example: top-10 with the example query |Q| = 8 on DBLP (28M nodes)
  with the bound: maximum subtree size $\tau = 2 \cdot 8 + 10 = 26$
  without the bound: maximum subtree size is 28M (the whole document)!
Document-independent upper bound on subtree size!
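The theorem can be sanity-checked by brute force on a toy document: compute the distance from Q to every subtree, take a top-k ranking, and confirm that no ranked subtree exceeds τ. This sketch rests on several assumptions: unit-cost tree edit distance via a naive memoized forest recurrence, trees as hypothetical `(label, children)` tuples, ties broken by postorder position, and a made-up example document. None of the helper names come from the paper.

```python
from functools import lru_cache

def node(label, *children):
    return (label, tuple(children))   # hypothetical encoding: (label, children)

def size(t):
    return 1 + sum(size(c) for c in t[1])

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    # Naive unit-cost forest edit distance, decomposing at the rightmost root.
    if not f1:
        return sum(size(t) for t in f2)
    if not f2:
        return sum(size(t) for t in f1)
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    return min(forest_dist(f1[:-1] + c1, f2) + 1,
               forest_dist(f1, f2[:-1] + c2) + 1,
               forest_dist(c1, c2) + forest_dist(f1[:-1], f2[:-1]) + int(l1 != l2))

def ted(t1, t2):
    return forest_dist((t1,), (t2,))

def subtrees_postorder(t):
    """Every subtree of t (a node and all its descendants), in postorder of the roots."""
    result = []
    for child in t[1]:
        result.extend(subtrees_postorder(child))
    result.append(t)
    return result

def tau(query_size, k):
    return 2 * query_size + k

def brute_force_tasm(query, doc, k):
    """Exhaustive top-k ranking; the stable sort breaks ties by postorder position."""
    return sorted(subtrees_postorder(doc), key=lambda t: ted(query, t))[:k]

query = node("article", node("author", node("Tim")), node("title", node("X")))
doc = node("dblp",
           node("article", node("author", node("Tim")), node("title", node("X"))),
           node("proceedings",
                node("conf", node("VLDB")),
                node("article", node("author", node("John")), node("title", node("Y"))),
                node("article", node("author", node("Tim")), node("author", node("Ann")),
                     node("title", node("X")))),
           node("book", node("title", node("Z"))))

k = 3
ranking = brute_force_tasm(query, doc, k)
bound = tau(size(query), k)          # 2 * 5 + 3 = 13
assert all(size(t) <= bound for t in ranking)
print(bound, [size(t) for t in ranking], [ted(query, t) for t in ranking])
# 13 [5, 5, 7] [0, 2, 2]
```

On this toy input the bound is loose (the largest ranked subtree has 7 nodes), but it never needs information about the document to be computed.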
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
Example document: dblp(article(auth(John), title(X1)), proceedings(conf(VLDB), article(auth(Peter), title(X3)), article(auth(Mike), title(X4))), book(title(X2)))
(John,1) (auth,2) (X1,1) (title,2) (article,5) (VLDB,1) (conf,2) (Peter,1) (auth,2) (X3,1) (title,2) (article,5) (Mike,1) (auth,2) (X4,1) (title,2) (article,5) (proc,13) (X2,1) (title,2) (book,3) (dblp,22)
Postorder queue: a queue of (label, size) pairs
  dequeue removes the leftmost element, e.g., (John, 1)
  no random access!
Relevant and state of the art for XML parsing:
  the full subtree is known only at the closing tag
  closing tags appear in postorder
Efficient implementations are heavily used for:
  XML streams
  plain XML files (e.g., SAX)
  XML in databases (Dewey, interval encoding, ...)
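Because closing tags arrive in postorder, a streaming parser can produce the postorder queue directly. The sketch below uses Python's `xml.etree.ElementTree.iterparse` with start and end events and a stack of running descendant counts. It assumes data-centric XML in which text appears only inside leaf elements (mixed content is ignored) and models each text value as a leaf node, as in the example document; `postorder_queue` is an illustrative name.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

def postorder_queue(xml_bytes: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (label, size) pairs in postorder while streaming the document.

    Assumption: text occurs only in leaf elements; it becomes a leaf node.
    """
    open_counts = []  # descendants seen so far, one entry per open element
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            open_counts.append(0)
            continue
        descendants = open_counts.pop()
        text = (elem.text or "").strip()
        if descendants == 0 and text:
            yield (text, 1)              # text value as a leaf node
            descendants = 1
        size = descendants + 1           # count the element itself
        yield (elem.tag, size)           # size is known only at the closing tag
        if open_counts:
            open_counts[-1] += size      # report the subtree size to the parent
        elem.clear()                     # release the element's content once emitted

doc = b"""<dblp>
  <article><auth>John</auth><title>X1</title></article>
  <proceedings><conf>VLDB</conf>
    <article><auth>Peter</auth><title>X3</title></article>
    <article><auth>Mike</auth><title>X4</title></article>
  </proceedings>
  <book><title>X2</title></book>
</dblp>"""

print(list(postorder_queue(doc)))
```

The stack holds one counter per open element, so its size is bounded by the document depth rather than by the number of nodes.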
Candidate Subtrees
Candidate subtrees are all subtrees $T_i$ of the document with
  $|T_i| \le \tau$ AND
  $T_i$ is not contained in a larger subtree $T_j$ with $|T_j| \le \tau$
Pruning: find the candidate subtrees
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Problem: memory buffer can grow very large!must keep subtrees in memory until non-candidate ancestor is readworst case: memory buffer stores O(n) nodes(frequent in data-centric XML!)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
Page 71
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Problem: memory buffer can grow very large!must keep subtrees in memory until non-candidate ancestor is readworst case: memory buffer stores O(n) nodes(frequent in data-centric XML!)
Example: DBLP, τ = 50
99% of nodes are still in buffer when root node is read!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
Page 72
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
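Example document (postorder number of each node in parentheses):
dblp (22)
  article (5)
    auth (2)
      John (1)
    title (4)
      X1 (3)
  proceedings (18)
    conf (7)
      VLDB (6)
    article (12)
      auth (9)
        Peter (8)
      title (11)
        X3 (10)
    article (17)
      auth (14)
        Mike (13)
      title (16)
        X4 (15)
  book (21)
    title (20)
      X2 (19)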
Simple pruning approach (τ = 6 in the example above):
- add nodes to a memory buffer until a non-candidate node (|Ti| > τ) is added
- the subtrees of the non-candidate with |Ti| ≤ τ are candidate subtrees
Problem: the memory buffer can grow very large!
- subtrees must be kept in memory until their non-candidate parent is read
- worst case: the memory buffer stores O(n) nodes (frequent in data-centric XML!)
Example: DBLP with τ = 50: 99% of the nodes are still in the buffer when the root node is read!
Simple pruning is not feasible for large documents!
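The memory problem is easy to reproduce. The sketch below implements the simple approach as described: buffer every node with size ≤ τ, and when a non-candidate arrives, emit its buffered child subtrees as candidates. It also records the peak buffer size. The function name `simple_pruning` is hypothetical. The input is assumed to be a postorder queue of (label, subtree size) pairs for the example document.

```python
from typing import List, Tuple

Node = Tuple[str, int]  # (label, subtree size), listed in postorder

EXAMPLE_QUEUE: List[Node] = [
    ("John", 1), ("auth", 2), ("X1", 1), ("title", 2), ("article", 5),
    ("VLDB", 1), ("conf", 2),
    ("Peter", 1), ("auth", 2), ("X3", 1), ("title", 2), ("article", 5),
    ("Mike", 1), ("auth", 2), ("X4", 1), ("title", 2), ("article", 5),
    ("proceedings", 13), ("X2", 1), ("title", 2), ("book", 3), ("dblp", 22),
]


def simple_pruning(queue: List[Node], tau: int) -> Tuple[List[int], int]:
    """Return the candidate root positions (in emission order) and the peak buffer size."""
    buffer: List[Tuple[int, int]] = []  # (postorder position, subtree size)
    roots: List[int] = []
    peak = 0
    for i, (_, size) in enumerate(queue):
        if size <= tau:
            buffer.append((i, size))
            peak = max(peak, len(buffer))
            continue
        # Non-candidate: its subtree starts at position i - size + 1. Its buffered
        # descendants form the tail of the buffer, and each trailing entry is
        # the root of one candidate child subtree.
        first = i - size + 1
        emitted: List[int] = []
        while buffer and buffer[-1][0] >= first:
            root, root_size = buffer[-1]
            del buffer[-root_size:]
            emitted.append(root)
        roots.extend(reversed(emitted))
    if buffer:  # the whole document has size <= tau: only the root is a candidate
        roots.append(buffer[-1][0])
    return roots, peak


print(simple_pruning(EXAMPLE_QUEUE, 6))  # ([6, 11, 16, 4, 20], 17)
```

With τ = 6, the buffer peaks at 17 of the 22 nodes. The first article subtree (positions 0 to 4) can only be released when dblp, its non-candidate parent, is read at the very end.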
Efficient Pruning is Tricky!
Problem: when can we remove a node from the buffer?
- when we see |Ti| ≤ τ, we do not yet know about its parent (postorder!)
- the subtree of the parent might also have size ≤ τ!
Our solution does not wait for the parent:
- prefix ring buffer: a fixed-size buffer
- pruning rule: prune based on the following nodes
Pruning in Small Memory
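Prefix ring buffer (τ = 6); each entry is label,subtree size; s marks the leftmost node, e the next free slot:
full buffer: John,1 auth,2 X1,1 title,2 article,5 VLDB,1
after removing the leftmost subtree: VLDB,1
Prefix ring buffer of size τ + 1 (main memory):
- stores a prefix (τ nodes in postorder) of the document
Two operations:
- append a new node
- remove the leftmost subtree/node
Pruning rule: if the leftmost node in the full ring buffer is a
- leaf: the leftmost subtree is a candidate subtree
- non-leaf: the leftmost node is a non-candidate node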
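The two buffer operations can be sketched on a fixed array of τ + 1 slots with a start pointer s and an end pointer e. The class name is hypothetical. The explanation for the extra slot is also an assumption: one slot is kept free so that a full buffer (τ items) and an empty buffer can be told apart from the pointers alone.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class PrefixRingBuffer(Generic[T]):
    """Ring buffer with tau + 1 slots that holds at most tau items.

    s points to the leftmost item, e to the next free slot. Keeping one slot
    free lets 'empty' (s == e) and 'full' (e one step behind s) be told apart.
    """

    def __init__(self, tau: int) -> None:
        self.slots: List[Optional[T]] = [None] * (tau + 1)
        self.s = 0
        self.e = 0

    def __len__(self) -> int:
        return (self.e - self.s) % len(self.slots)

    def is_full(self) -> bool:
        return (self.e + 1) % len(self.slots) == self.s

    def __getitem__(self, i: int) -> T:
        if not 0 <= i < len(self):
            raise IndexError(i)  # also ends iteration via the __getitem__ protocol
        item = self.slots[(self.s + i) % len(self.slots)]
        assert item is not None
        return item

    def append(self, item: T) -> None:
        """Operation 1: append a new node at the end."""
        if self.is_full():
            raise OverflowError("prefix ring buffer is full")
        self.slots[self.e] = item
        self.e = (self.e + 1) % len(self.slots)

    def remove_leftmost(self, count: int = 1) -> List[T]:
        """Operation 2: remove the leftmost node (count=1) or the leftmost subtree."""
        if not 0 <= count <= len(self):
            raise ValueError(f"cannot remove {count} of {len(self)} items")
        removed = [self[i] for i in range(count)]
        for i in range(count):
            self.slots[(self.s + i) % len(self.slots)] = None
        self.s = (self.s + count) % len(self.slots)
        return removed


demo = PrefixRingBuffer(6)
for node in [("John", 1), ("auth", 2), ("X1", 1), ("title", 2), ("article", 5), ("VLDB", 1)]:
    demo.append(node)
full_before = demo.is_full()
article_subtree = demo.remove_leftmost(5)  # leftmost node John is a leaf
print(full_before, list(demo))  # True [('VLDB', 1)]
```

Both operations only move a pointer and touch the affected slots. Memory stays at τ + 1 slots no matter how large the document is.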
Pruning Rule – Intuition
Candidate subtree: the leftmost node is a leaf
- Ti: the leftmost subtree (the largest buffered subtree that starts with the leftmost node)
- Tj: the smallest subtree that contains Ti
- due to postorder, Tj contains all nodes in the buffer
- since |Ti| ≤ τ and |Tj| > τ, Ti is a candidate
Non-candidate node: the leftmost node is a non-leaf
- the leftmost non-leaf is the parent of previously removed nodes
- we remove either candidate subtrees or non-candidate nodes
- in both cases, the parent is a non-candidate
Prefix Ring Buffer Pruning – Example
Example document:
dblp
  article
    auth
      John
    title
      X1
  proceedings
    conf
      VLDB
    article
      auth
        Peter
      title
        X3
    article
      auth
        Mike
      title
        X4
  book
    title
      X2
1. fill the ring buffer
2. check the leftmost node
   - leaf: candidate subtree – move to the result
   - non-leaf: non-candidate – remove
3. repeat until the queue and the buffer are empty
τ = 6. Postorder queue (input), each node given as label,subtree size:
John,1 auth,2 X1,1 title,2 article,5 VLDB,1 conf,2 Peter,1 auth,2 X3,1 title,2 article,5 Mike,1 auth,2 X4,1 title,2 article,5 proc,13 X2,1 title,2 book,3 dblp,22
Trace of the prefix ring buffer (main memory) and the candidate subtrees (output):
- append until full: John,1 auth,2 X1,1 title,2 article,5 VLDB,1; leftmost John is a leaf: output article (auth John, title X1); buffer: VLDB,1
- append until full: VLDB,1 conf,2 Peter,1 auth,2 X3,1 title,2; leftmost VLDB is a leaf: output conf (VLDB); buffer: Peter,1 auth,2 X3,1 title,2
- append until full: Peter,1 auth,2 X3,1 title,2 article,5 Mike,1; leftmost Peter is a leaf: output article (auth Peter, title X3); buffer: Mike,1
- append until full: Mike,1 auth,2 X4,1 title,2 article,5 proc,13; leftmost Mike is a leaf: output article (auth Mike, title X4); buffer: proc,13
- append until the queue is empty: proc,13 X2,1 title,2 book,3 dblp,22; leftmost proc is a non-leaf: remove (non-candidate); buffer: X2,1 title,2 book,3 dblp,22
- leftmost X2 is a leaf: output book (title X2); buffer: dblp,22
- leftmost dblp is a non-leaf: remove (non-candidate); queue and buffer are empty
Candidate subtrees (output): article (John, X1), conf (VLDB), article (Peter, X3), article (Mike, X4), book (X2)
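The walkthrough can be replayed with a short sketch of the pruning loop. It assumes the label,size postorder queue shown above, and it uses a `collections.deque` capped at τ entries in place of the fixed-size ring buffer for brevity. The leftmost subtree is taken to be the largest buffered subtree that starts at the leftmost node. The function name `ring_buffer_pruning` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Node = Tuple[str, int]  # (label, subtree size), listed in postorder

EXAMPLE_QUEUE: List[Node] = [
    ("John", 1), ("auth", 2), ("X1", 1), ("title", 2), ("article", 5),
    ("VLDB", 1), ("conf", 2),
    ("Peter", 1), ("auth", 2), ("X3", 1), ("title", 2), ("article", 5),
    ("Mike", 1), ("auth", 2), ("X4", 1), ("title", 2), ("article", 5),
    ("proceedings", 13), ("X2", 1), ("title", 2), ("book", 3), ("dblp", 22),
]


def ring_buffer_pruning(queue: List[Node], tau: int) -> Tuple[List[List[Node]], List[str]]:
    """Return the candidate subtrees (as postorder lists) and a log of the pruning steps."""
    pending = deque(enumerate(queue))  # postorder input queue with positions
    buf: Deque[Tuple[int, str, int]] = deque()  # (position, label, size), at most tau entries
    candidates: List[List[Node]] = []
    trace: List[str] = []
    while pending or buf:
        # 1. fill the buffer from the input queue
        while len(buf) < tau and pending:
            pos, (label, size) = pending.popleft()
            buf.append((pos, label, size))
        # 2. check the leftmost node
        first_pos, first_label, first_size = buf[0]
        if first_size == 1:
            # leaf: the largest buffered subtree that starts here is a candidate
            end = max(j for j, (pos, _, size) in enumerate(buf)
                      if pos - size + 1 == first_pos)
            subtree: List[Node] = []
            for _ in range(end + 1):
                _pos, label, size = buf.popleft()
                subtree.append((label, size))
            candidates.append(subtree)
            trace.append(f"candidate {subtree[-1][0]},{subtree[-1][1]}")
        else:
            # non-leaf: all its descendants are gone, so it is a non-candidate
            buf.popleft()
            trace.append(f"non-candidate {first_label},{first_size}")
    return candidates, trace


cands, steps = ring_buffer_pruning(EXAMPLE_QUEUE, 6)
for step in steps:
    print(step)
```

On the example, the log lists the same sequence as the trace: three article candidates and a conf candidate in document order, then proceedings removed as a non-candidate, book emitted, and dblp removed. Unlike the simple approach, the buffer never holds more than τ nodes.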
TASM-Postorder
Algorithm TASM-postorder:
1. Start with an empty ranking R and set the tightening upper bound τ′ = τ.
2. For each candidate subtree Ti:
   a. if |R| = k, update τ′ = min(τ, max(R) + |Q|), where max(R) is the largest distance in R;
   b. compute the tree edit distance between Q and every subtree of Ti with at most τ′ nodes;
   c. update the ranking R.
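Step 2a relies on a size lower bound. Assuming unit costs (every insertion and deletion costs 1, which the slide does not state), turning Q into a larger subtree T needs at least |T| − |Q| insertions, so:

```latex
\mathrm{ted}(Q, T) \ge |T| - |Q|
\quad\Longrightarrow\quad
|T| > \max(R) + |Q| \;\Rightarrow\; \mathrm{ted}(Q, T) > \max(R)
```

Such a subtree cannot displace any entry of a full ranking, so restricting step b to subtrees with at most τ′ = min(τ, max(R) + |Q|) nodes loses no result.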
Theorem (TASM-Postorder)
The space complexity of TASM-postorder is independent of the document size:
O(m² + mk)
(m: query size, k: result size)
TASM-postorder scales to very large documents!
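The TASM-postorder loop can be sketched in Python as follows. Assumptions not in the slides: unit-cost tree edit distance, computed by a plain memoized recursion over ordered forests rather than an optimized algorithm such as Zhang and Shasha's; the ranking R kept as a max-heap on distance; and trees given as (label, children) tuples built with a hypothetical helper `node`. The names `ted`, `subtrees` and `tasm_postorder` are illustrative, and the sketch does not attempt the O(m² + mk) space bound.

```python
import heapq
from functools import lru_cache
from itertools import count
from typing import Iterable, List, Tuple

Tree = Tuple[str, tuple]  # (label, tuple of child trees); hashable for caching


def node(label: str, *children: Tree) -> Tree:
    return (label, tuple(children))


def size(t: Tree) -> int:
    return 1 + sum(size(c) for c in t[1])


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    """Unit-cost edit distance between ordered forests f and g."""
    if not f:
        return sum(size(t) for t in g)  # insert all of g
    if not g:
        return sum(size(t) for t in f)  # delete all of f
    (v, fv), (w, gw) = f[-1], g[-1]     # rightmost roots and their children
    return min(
        forest_dist(f[:-1] + fv, g) + 1,                               # delete v
        forest_dist(f, g[:-1] + gw) + 1,                               # insert w
        forest_dist(fv, gw) + forest_dist(f[:-1], g[:-1]) + (v != w),  # map v to w
    )


def ted(a: Tree, b: Tree) -> int:
    return forest_dist((a,), (b,))


def subtrees(t: Tree):
    """All subtrees of t in postorder."""
    for c in t[1]:
        yield from subtrees(c)
    yield t


def tasm_postorder(query: Tree, candidates: Iterable[Tree], k: int,
                   tau: int) -> List[Tuple[int, Tree]]:
    if k <= 0:
        return []
    heap: list = []  # ranking R as a max-heap: (-distance, tie-breaker, subtree)
    tie = count()
    q_size = size(query)
    for cand in candidates:
        tau_prime = tau
        if len(heap) == k:  # step 2a: tighten with the worst distance in R
            tau_prime = min(tau, -heap[0][0] + q_size)
        for sub in subtrees(cand):  # step 2b: only subtrees with <= tau' nodes
            if size(sub) > tau_prime:
                continue
            d = ted(query, sub)
            if len(heap) < k:  # step 2c: update the ranking
                heapq.heappush(heap, (-d, next(tie), sub))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, next(tie), sub))
    return sorted(((-nd, sub) for nd, _, sub in heap), key=lambda p: p[0])
```

For the query article(auth(Mike), title(X4)) over the five candidate subtrees of the example with k = 2 and τ = 6, the sketch ranks the Mike article first (distance 0), followed by an article at distance 2 (the John and Peter articles tie).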
Page 126
Experiments
Pruning Effectiveness
Prefix ring buffer pruning is very effective! The largest subtree for which a distance is computed shrinks from 37M nodes (the entire document) to 18 nodes.
Dataset: PSD protein sequences, 37M nodes, 683MB
Compute TASM (|Q| = 4, k = 1)
Compared: TASM-dynamic (state of the art) vs. TASM-postorder (our solution)
Histogram of computed subtrees (number of subtrees vs. subtree size in nodes, both axes log scale from 1e0 to 1e7):
TASM-Dynamic: largest subtree 37M nodes (entire document)
TASM-Postorder: largest subtree 18 nodes
Page 128
Experiments
Scalability: TASM-Postorder vs. TASM-Dynamic
TASM-postorder is much faster than TASM-dynamic.
Dataset: XMark (synthetic XML benchmark data)
Vary query size and document size
Compute TASM (k = 5)
Compared: TASM-dynamic (state of the art) vs. TASM-postorder (our solution)
Measure wall-clock time
Plot: time (seconds, log scale) vs. query size (4, 8, 16, 32, 64 nodes); series: dyn and pos on 112MB and 224MB documents (dyn = TASM-dynamic, pos = TASM-postorder)
Plot: time (seconds, log scale) vs. document size (112, 224, 448, 896, 1792 MB); series: dyn and pos with |Q| = 4 and |Q| = 8
Page 130
Experiments
Scalability with Result Size k
TASM-postorder scales well with k: increasing k by four orders of magnitude only doubles the runtime.
Plot: time (seconds, linear scale, 0 to 300) vs. k (1e0 to 1e4, log scale); series: dyn and pos on 112MB and 224MB documents
Dataset: XMark (synthetic XML benchmark data)
Vary k (size of the ranking)
Compute TASM (|Q| = 16)
Compared: TASM-dynamic (state of the art) vs. TASM-postorder (our solution)
Measure wall-clock time
Page 132
Experiments
Space complexity: TASM-Postorder vs. TASM-Dynamic
TASM-postorder: memory usage is independent of the document size!
Plot: memory (MB, log scale, 1e0 to 4e3) vs. document size (112, 224, 448, 896, 1792 MB); series: dyn and pos with |Q| = 4 and |Q| = 16; values marked on the plot: 3GB and 8MB
Dataset: XMark (synthetic XML benchmark data)
Vary document size
Compute TASM (k = 5)
Compared: TASM-dynamic (state of the art) vs. TASM-postorder (our solution)
Measure main memory usage
Page 135
Conclusion and Future Work
Conclusion
Prefix ring buffer for space-efficient pruning
Dynamic programming solutions do not scale to database-size documents.
Upper bound τ: limits the maximum subtree size for TASM
TASM-postorder: highly scalable TASM algorithm
TASM-postorder makes TASM feasible.
Future Work – New research opportunities:
tune the tree edit distance to different applications
index the document: can we avoid a document scan?
parallel TASM algorithm: where to split the document?
Page 136
Erik D. Demaine, Shay Mozes, Benjamin Rossman, and Oren Weimann. An optimal decomposition algorithm for tree edit distance. In ICALP, volume 4596 of LNCS, pages 146–157, Wroclaw, Poland, July 2007. Springer.
K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. on Computing, 18(6):1245–1262, 1989.