Page 1
Evaluating Reachability Queries over Path Collections*
P. Bouros¹, S. Skiadopoulos², T. Dalamagas³, D. Sacharidis³, T. Sellis¹,³
¹ National Technical University of Athens
² University of Peloponnese
³ Institute for Management of Information Systems – R.C. Athena
HDMS'09
* Long version of SSDBM’09 paper
Page 2
Introduction (I)
• Several applications store and query large collections of data sequences
  – Recent advances in GIS and geoservices resulted in large volumes of routes (e.g., Points of Interest (POI) sequences)
• Route collections
  – Points => nodes
  – Sequences => routes
Page 3
Introduction (II)
• Web sites retain huge collections of routes
  – ShareMyRoutes.com
  – TravelByGPS.com
• People visiting Athens
  – Track their sightseeing
  – Create routes of interesting places
• Frequent updates
  – Users upload new routes
Page 4
Problem
• Route collections
  1. Too large to fit in main memory
  2. Frequently updated, adding new routes
• Reachability queries
  – Q: path from Academy to Zappeion
  – A: Academy -> University of Athens (change to route p2) -> Parliament -> Zappeion
Page 6
Why not a graph-based solution?
• Transform route collection P into graph GP
1) Searching: depth-first or breadth-first search
  • Low storage and maintenance cost
  • Slow query evaluation
2) Encoding transitive closure
  • Fast query evaluation
  • Expensive precomputation, not suited to frequently updated graphs
  • Examples:
    1) 2-hop [CH+02], HOPI [STW05]
    2) DAGs: geometric-based & partitioning 2-hop [CY+06,08], interval LB [AB+89]
    3) GRIPP [TL07]
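The searching option 1) can be illustrated with a small baseline. This is a minimal sketch, assuming that every pair of consecutive nodes in a route becomes one directed edge of GP; the function names `routes_to_graph` and `dfs_reachable` are hypothetical, and the sample collection is the one used in the example slides of this deck.

```python
from collections import defaultdict

# Sample route collection from the example slides.
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}


def routes_to_graph(routes):
    """Build adjacency lists of GP: one edge per pair of consecutive nodes."""
    adj = defaultdict(set)
    for nodes in routes.values():
        for u, v in zip(nodes, nodes[1:]):
            adj[u].add(v)
    return adj


def dfs_reachable(adj, source, target):
    """Plain iterative depth-first search that expands one node at a time."""
    stack, visited = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return False


ADJ = routes_to_graph(ROUTES)
print(dfs_reachable(ADJ, "F", "C"))
```

The graph is cheap to build and to extend with new routes, but the search follows single edges, which is the slow query evaluation noted above.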
Page 7
Outline
• The pfs algorithm
  – Indexing route collections
  – Indexing route transitions
• Index maintenance
• Experimental evaluation
• Conclusions and Further work
Page 8
The pfs algorithm (I)
• Path-first search, basic idea:
  – Examine part of routes at once, not single nodes
• Extend depth-first search
  – Work with routes instead of graph edges
• For each route p containing current node v
  – Visit each node after v (successor) in p
  – Push to dfs stack set of successors at once
Page 12
The pfs algorithm (II)
• Find a path from node F to C
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
• Answer: (F, D, N, B, C)
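The example above can be traced with a short sketch of path-first search. It is an illustration under assumptions, not the authors' implementation: routes are scanned linearly (no index yet), a `parent` map is used to rebuild the answer, and the function name `pfs` is only borrowed from the slides.

```python
# Sample route collection from the slide above.
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}


def pfs(routes, source, target):
    """Depth-first search that pushes all successors of a node in a route at once."""
    stack = [source]
    parent = {source: None}  # doubles as the visited set
    while stack:
        v = stack.pop()
        if v == target:
            path = []
            while v is not None:  # walk the parent links back to the source
                path.append(v)
                v = parent[v]
            return path[::-1]
        for nodes in routes.values():  # assumption: linear scan over all routes
            if v not in nodes:
                continue
            prev = v
            for succ in nodes[nodes.index(v) + 1:]:
                if succ not in parent:
                    parent[succ] = prev  # predecessor of succ inside this route
                    stack.append(succ)
                prev = succ
    return None


print(pfs(ROUTES, "F", "C"))  # ['F', 'D', 'N', 'B', 'C']
```

From F, the successors D, N, B, T in p2 are pushed together; B then leads to C through p1, which reproduces the answer on the slide.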
Page 13
P-Index
• Inverted index on route collections
  – For each node, store the routes containing it
• Access routes containing the current node
• Better termination condition => pfsP
  – Identify a route containing the current node before the target
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
node    routes list
A       <p1,1>, <p2,1>, <p5,1>
B       <p1,2>, <p2,5>, <p4,3>
C       <p1,3>
D       <p1,4>, <p2,3>, <p4,1>
…       …
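The routes lists in the table can be reproduced with a few lines. This is a minimal in-memory sketch, assuming 1-based positions as in the `<route, position>` entries above; the name `build_p_index` is hypothetical, and the on-disk layout of the real index is not modeled.

```python
from collections import defaultdict

# Sample route collection from the slide above.
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}


def build_p_index(routes):
    """Map each node to the list of (route id, 1-based position) entries."""
    index = defaultdict(list)
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            index[node].append((rid, pos))
    return dict(index)


P_INDEX = build_p_index(ROUTES)
print(P_INDEX["B"])  # [('p1', 2), ('p2', 5), ('p4', 3)]
```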
Page 19
The pfsP algorithm
• Find a path from F to T
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
node    routes list
…       …
F       <p2,2>, <p4,4>, <p5,2>
…       …
T       <p2,6>
• JOIN routes[F] with routes[T]: p2 contains F (position 2) before T (position 6)
• Answer: (F, D, N, B, T)
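The JOIN step of this example can be sketched as a join between two routes lists. The sketch below uses a simple hash join for clarity, which is an assumption (the slides do not say how the join is evaluated), and `join_routes_lists` is a hypothetical name.

```python
def join_routes_lists(cur_list, tgt_list):
    """Return (route, i, j) if some route holds the current node at position i
    before the target at position j; otherwise return None."""
    tgt_positions = {}
    for rid, pos in tgt_list:
        tgt_positions.setdefault(rid, []).append(pos)
    for rid, i in cur_list:
        for j in tgt_positions.get(rid, ()):
            if i < j:
                return rid, i, j
    return None


# Entries copied from the routes lists on the slide.
ROUTES_F = [("p2", 2), ("p4", 4), ("p5", 2)]
ROUTES_T = [("p2", 6)]
P2 = ["A", "F", "D", "N", "B", "T"]

HIT = join_routes_lists(ROUTES_F, ROUTES_T)
RID, I, J = HIT
ANSWER = P2[I - 1:J]  # slice of p2 from F up to and including T
print(HIT, ANSWER)    # ('p2', 2, 6) ['F', 'D', 'N', 'B', 'T']
```

When the join succeeds the search stops immediately, without expanding any successor of F.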
Page 20
H-graph (I)
• Graph representation of the collection
  – Nodes
    • Routes of the collection
  – Edges (pi, pj, v)
    • All possible transitions among routes
    • Edge label v => shared node (link)
• Better termination condition => pfsH
  – Identify an edge on the H-graph
Page 27
H-graph (II)
• Find a path from node F to J
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
• Answer: (F, D, J)
Page 28
H-Index
• In practice: H-Index, the adjacency lists of the H-graph
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
route   edges list
p1      <p2, B:2:5>, <p2, D:4:3>, <p4, B:2:3>, <p4, D:4:1>
p2      <p1, D:3:4>, <p1, B:5:2>, <p3, N:4:1>, <p4, F:2:4>, <p4, D:3:1>, <p4, N:4:2>, <p4, B:5:3>, <p5, F:2:2>
…       …
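The edges lists can be derived from the routes themselves. The sketch below is an assumption-laden illustration: an entry `<pj, v:i:j>` is read as "node v sits at position i in the source route and at position j in pj", and edges are skipped when v is the first node of the source route or the last node of pj. That filtering rule is inferred from the example lists in this deck, not stated on the slides, and `build_h_index` is a hypothetical name.

```python
from collections import defaultdict

# Sample route collection from the slide above.
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}


def build_h_index(routes):
    """Adjacency lists of the H-graph: route -> [(other route, v, i, j), ...]."""
    occurrences = defaultdict(list)  # node -> [(route id, 1-based position)]
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            occurrences[node].append((rid, pos))
    h_index = defaultdict(list)
    for node, occ in occurrences.items():
        for ri, i in occ:
            for rj, j in occ:
                if ri == rj:
                    continue
                if i == 1:                # inferred rule: nothing precedes v in ri
                    continue
                if j == len(routes[rj]):  # inferred rule: nothing follows v in rj
                    continue
                h_index[ri].append((rj, node, i, j))
    return dict(h_index)


H_INDEX = build_h_index(ROUTES)
print(sorted(H_INDEX["p1"]))
```

Under these assumptions the sketch yields the same entries as the slide for p1 and p2, although not necessarily in the same order.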
Page 38
The pfsH algorithm
• Find a path from F to J
  – routes[F] = {<p2,2>, <p4,4>, <p5,2>}
  – routes[J] = {<p1,5>}
p1 (A, B, C, D, J)
p2 (A, F, D, N, B, T)
p3 (N, L, M)
p4 (D, N, B, F, K)
p5 (A, F, K)
route   edges list
p2      <p1, D:3:4>, <p1, B:5:2>, <p3, N:4:1>, <p4, F:2:4>, <p4, D:3:1>, <p4, N:4:2>, <p4, B:5:3>, <p5, F:2:2>
p4      <p1, B:3:2>, <p2, N:2:4>, <p2, B:3:5>, <p2, F:4:2>, <p3, N:2:1>, <p5, F:4:2>
p5      <p2, F:2:2>, <p4, F:2:4>
…       …
• JOIN the edges lists of the routes in routes[F] with routes[J]: edge <p1, D:3:4> of p2 links F (position 2 in p2) to J (position 5 in p1) through D
• Answer: (F, D, J)
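The termination check of this example can be sketched as follows. It is a minimal illustration, assuming the target occurs at most once per route and reading an entry `<pj, v:a:b>` as "v is at position a in the source route and position b in pj"; the name `join_edges_with_target` is hypothetical.

```python
# Routes and index entries copied from the slide.
ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}
H_INDEX = {
    "p2": [("p1", "D", 3, 4), ("p1", "B", 5, 2), ("p3", "N", 4, 1), ("p4", "F", 2, 4),
           ("p4", "D", 3, 1), ("p4", "N", 4, 2), ("p4", "B", 5, 3), ("p5", "F", 2, 2)],
    "p4": [("p1", "B", 3, 2), ("p2", "N", 2, 4), ("p2", "B", 3, 5),
           ("p2", "F", 4, 2), ("p3", "N", 2, 1), ("p5", "F", 4, 2)],
    "p5": [("p2", "F", 2, 2), ("p4", "F", 2, 4)],
}


def join_edges_with_target(routes, h_index, cur_list, tgt_list):
    """Look for one H-graph edge that connects a route holding the current node
    to a route holding the target, and return the resulting path (or None)."""
    tgt_pos = dict(tgt_list)  # assumption: target appears once per route
    for ri, i in cur_list:
        for rj, _link, a, b in h_index.get(ri, ()):
            t = tgt_pos.get(rj)
            # current node must not come after the link in ri,
            # and the link must not come after the target in rj
            if t is not None and i <= a and b <= t:
                return routes[ri][i - 1:a] + routes[rj][b:t]
    return None


PATH = join_edges_with_target(
    ROUTES, H_INDEX,
    cur_list=[("p2", 2), ("p4", 4), ("p5", 2)],  # routes[F]
    tgt_list=[("p1", 5)],                        # routes[J]
)
print(PATH)  # ['F', 'D', 'J']
```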
Page 39
Index maintenance
• P-Index, H-Index as inverted files on disk
  – Updates -> adding new routes
  – New routes are not considered one at a time
  – Batch updates: consider a set of new routes
• Basic idea:
  – Build memory-resident P-Index, H-Index for the new routes
  – Merge the disk-based indices with the memory-resident ones
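The batch-update idea can be sketched for the P-Index alone. This is only an illustration under strong assumptions: the disk-based index is simulated by an in-memory dictionary, merging is a per-node append of routes lists, and H-Index maintenance (which also needs edges between old and new routes) is left out. The names `build_p_index` and `merge_p_index` are hypothetical.

```python
from collections import defaultdict


def build_p_index(routes):
    """Map each node to the list of (route id, 1-based position) entries."""
    index = defaultdict(list)
    for rid, nodes in routes.items():
        for pos, node in enumerate(nodes, start=1):
            index[node].append((rid, pos))
    return dict(index)


def merge_p_index(disk_index, memory_index):
    """Return a new index: existing routes lists followed by the batch entries."""
    merged = {node: list(entries) for node, entries in disk_index.items()}
    for node, entries in memory_index.items():
        merged.setdefault(node, []).extend(entries)
    return merged


# Existing collection (stands in for the disk-based index).
OLD_ROUTES = {
    "p1": ["A", "B", "C", "D", "J"],
    "p2": ["A", "F", "D", "N", "B", "T"],
}
# Batch of new routes, indexed in memory first.
NEW_ROUTES = {
    "p3": ["N", "L", "M"],
    "p4": ["D", "N", "B", "F", "K"],
    "p5": ["A", "F", "K"],
}

DISK_INDEX = build_p_index(OLD_ROUTES)
MEMORY_INDEX = build_p_index(NEW_ROUTES)
MERGED = merge_p_index(DISK_INDEX, MEMORY_INDEX)
print(MERGED["A"])  # [('p1', 1), ('p2', 1), ('p5', 1)]
```

Processing the whole batch at once means each routes list is touched once per batch rather than once per new route.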
Page 41
Setup
• Synthetic route collections
  – |P|, lavg, |V|, zipf, U
• Compare
  – Convert collection to graph, dfs & adjacency lists
  – pfsP & P-Index
  – pfsH & P-Index, H-Index
• Construction cost, query evaluation: vary one of |P|, lavg, |V|, zipf
• Maintenance cost: vary U
Page 42
Index construction
[Plot] x-axis: |P| (×10^3); fixed: lavg = 10, |V| = 100000, zipf = 0.8
[Plot] x-axis: |V| (×10^3); fixed: |P| = 100000, lavg = 10, zipf = 0.8
Page 43
Query evaluation (I)
[Plot] x-axis: |P| (×10^3); fixed: lavg = 10, |V| = 100000, zipf = 0.8
[Plot] x-axis: lavg; fixed: |P| = 100000, |V| = 100000, zipf = 0.8
Page 44
Query evaluation (II)
[Plot] x-axis: |V| (×10^3); fixed: |P| = 100000, lavg = 10, zipf = 0.8
[Plot] x-axis: zipf; fixed: |P| = 100000, lavg = 10, |V| = 100000
Page 45
Index maintenance
[Plots] x-axis: U (%) in both panels; fixed: |P| = 100000, lavg = 10, |V| = 100000, zipf = 0.8
Page 46
Conclusions
• Reachability queries over frequently updated route collections
• The path-first search (pfs) algorithm
  – Indexing route collections: P-Index & pfsP
  – Indexing route transitions: H-Index & pfsH
• Handling frequent updates, adding new routes
• Experimental evaluation
  – P-Index & pfsP: low construction & maintenance cost
  – H-Index, P-Index & pfsH: fast query evaluation
Page 47
Further work
• Ongoing
  – New index that combines P-Index & H-Index advantages
    • Low construction and maintenance cost
    • Fast query evaluation
• Future work
  – Other types of queries
    • Considering constraints
Page 48
Thank you!