A Physical Query Algebra for DHT-based P2P Systems Kai-Uwe Sattler 1 , Philipp Rösch 1 , Erik Buchmann 2 , Klemens Böhm 2 1 Department of Computer Science and Automation, TU Ilmenau 2 Department of Computer Science, University of Magdeburg

Dec 19, 2015

E. Buchmann A Physical Query Algebra for DHT-based P2P Systems

Distributed Hash Tables

- Examples: CAN, Chord, Pastry, etc.
- Advantages of P2P systems, e.g., no single point of failure (SPOF), shared infrastructure costs, censorship resistance
- Manage huge sets of (key, value) pairs
- Cope with large numbers of parallel transactions
- Efficient query processing: greedy forward routing, but only simple exact-match queries on unstructured data sets

Extended Queries in DHT

Some extensions:
- Trigrams for text retrieval, e.g., beethoven: bee eet eth tho hov ove ven
- Bloom filters for hash-based AND
- Feature vectors for multimedia documents
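The trigram extension can be sketched as follows: every trigram of a word acts as one exact-match key, and a substring query intersects the entries of its trigrams. This is a minimal single-process sketch, assuming a plain dictionary in place of the DHT; the names `trigrams`, `build_index`, and `substring_search` are hypothetical.

```python
from collections import defaultdict


def trigrams(word):
    """Overlapping 3-character substrings of a word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


def build_index(words):
    """Map each trigram to the words containing it (one DHT key per trigram)."""
    index = defaultdict(set)
    for w in words:
        for g in trigrams(w):
            index[g].add(w)
    return index


def substring_search(index, pattern):
    """Exact-match lookups on the pattern's trigrams, then a local check."""
    grams = trigrams(pattern)
    if not grams:
        raise ValueError("pattern needs at least 3 characters")
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # a word can contain all trigrams without containing the pattern itself
    return {w for w in candidates if pattern in w}


print(trigrams("beethoven"))  # ['bee', 'eet', 'eth', 'tho', 'hov', 'ove', 'ven']
```

Each lookup is still an ordinary exact-match query, which is why this extension fits a plain DHT but does not generalize to other query types.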

But: the extensions are application-specific; there is no universal query algebra.

Idea: Relational data sets, SQL-like queries

Applications: management of genome data, the semantic web, distributed indexes

Relational Data in DHT?

Storing relational data in a DHT:
- Fragmentation scheme?
- Access via secondary keys?

Support for SQL-like query processing:
- Distribution scheme for complex queries?
- Join operations?
- Full-table scans without flooding?

Exploiting the P2P nature:
- No central instance, no global knowledge
- Parallel processing
- Problems with availability and failures

Outline of Our Approach

- Use Content-Addressable Networks (CAN)
- Locality-aware hash function: preserves the neighborhood of similar tuples via a space-filling curve
- API extension: multicast, temporary re-hashing
- Distributed query plan operators (POPs): selection, join, grouping/aggregation; POP distribution scheme

Content-Addressable Networks

- Proposed by S. Ratnasamy et al. (2001)
- Keys are d-dimensional points
- The key space is a torus in d dimensions

[Figure: example key space for d = 2]

Zones and Neighbors in CAN

- Each peer is responsible for one zone, i.e., it stores all (key, value) pairs of that zone
- Each peer knows the neighbors of its zone
- Peers are assigned to zones at random at startup
- Overloading of zones, multiple realities, ...

Greedy Forward Routing in CAN

get(k):
1. Forward the request to the neighbor whose zone is closest to k.
2. Repeat until the peer responsible for k is reached.

[Figure: get(k) forwarded hop by hop to the peer storing (k,v)]
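Greedy forward routing on a CAN-like key space can be sketched as follows. This is a minimal single-process model, assuming a hypothetical 4 x 4 grid of equal zones on the 2-d unit torus (real CAN zones result from repeated splits and differ in size); the distance from a key to a zone wraps around in every dimension, which is what makes the key space a torus. All names are illustrative.

```python
import math

N = 4  # hypothetical: a 4 x 4 grid of equal zones in the 2-d unit torus


def zone(peer):
    """Zone of peer (i, j): [i/N, (i+1)/N) x [j/N, (j+1)/N)."""
    i, j = peer
    return [(i / N, (i + 1) / N), (j / N, (j + 1) / N)]


def neighbors(peer):
    """Peers whose zones share an edge with this zone; indices wrap around (torus)."""
    i, j = peer
    return [((i + 1) % N, j), ((i - 1) % N, j), (i, (j + 1) % N), (i, (j - 1) % N)]


def circ(a, b):
    """Distance between two coordinates on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def dist_to_zone(k, z):
    """Euclidean distance on the torus from key point k to zone z (0 if k lies in z)."""
    total = 0.0
    for x, (lo, hi) in zip(k, z):
        gap = 0.0 if lo <= x < hi else min(circ(x, lo), circ(x, hi))
        total += gap * gap
    return math.sqrt(total)


def route(start, k):
    """Greedy forwarding: hop to the neighbor closest to k until k's zone is reached."""
    path = [start]
    while dist_to_zone(k, zone(path[-1])) > 0:
        path.append(min(neighbors(path[-1]), key=lambda p: dist_to_zone(k, zone(p))))
    return path


print(route((0, 0), (0.9, 0.1)))  # [(0, 0), (3, 0)]: one hop across the wrap-around edge
```

With equal zones, every hop strictly reduces the distance to k, so the loop terminates at the responsible peer.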

Managing Relational Data: Simple Approach

Relation $r \in R$, tuple $t \in r$ with $t = \{a_k, a_1, \ldots, a_n\}$; key $k' = h(a_k)$

Problems:
1. Tuples are disseminated irregularly over the key space, i.e., only exact-match queries are supported
2. No search on attributes other than the primary key

[Figure: tuples (x) scattered over the key space; how can $\sigma_{5<a_k<10}(r)$ or $\sigma_{a_b=20}(r)$ be evaluated?]

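Fragmentation Scheme

- Reverse bit interleaving (z-curve)
- Tuple $t \in r$, $t = \{a_k, a_1, \ldots, a_n\}$
- Two hash functions: key $k' = h_r(r) \circ h_k(a_k)$, i.e., (RelationID, key value)

[Figure: a key built from a RelationID part ($h_r$) and a key part ($h_k$), shown as the bit strings 0 0 0 1 0 1 0 0 and 0 0 0 1 0 0 1 0 and mapped to the point (1,2) on the axes Dimension #1 and Dimension #2; the figure also shows the simple-approach key $k' = h(a_k)$]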
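Reverse bit interleaving can be sketched as the inverse of the classic z-order (Morton) encoding: the bits of a curve position are dealt out round-robin to the d dimensions of the CAN key space. This is a minimal sketch; the bit-ordering convention (most significant bit first, dimension 1 first) is an assumption and need not match the one in the figure, and the function names are hypothetical.

```python
def interleave(coords, bits):
    """Point with `bits`-bit coordinates -> position on the z-curve.
    Most significant bits first, taking one bit from each dimension in turn."""
    z = 0
    for j in range(bits - 1, -1, -1):
        for c in coords:
            z = (z << 1) | ((c >> j) & 1)
    return z


def deinterleave(z, d, bits):
    """Inverse ('reverse bit interleaving'): split a (d * bits)-bit z-curve
    position back into d coordinates, dealing its bits out round-robin."""
    coords = [0] * d
    total = d * bits
    for i in range(total):
        bit = (z >> (total - 1 - i)) & 1   # i-th bit, counted from the most significant
        coords[i % d] = (coords[i % d] << 1) | bit
    return tuple(coords)


print(deinterleave(6, 2, 4))   # (1, 2)
print(interleave((1, 2), 4))   # 6
```

Consecutive curve positions tend to map to nearby points, so a key range is usually served by a small group of neighboring zones rather than by peers spread over the whole network.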

Two Hash Functions

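- Key $k' = h_r(r) \circ h_k(a_k)$
- $h_r(r)$: the RelationID determines the placement of the space-filling curve
- $h_k(a_k)$: the primary key determines the position on the curve (locality awareness)

[Figure: curves for relations $r_a$, $r_b$, $r_c$, with tuples $a_k = 0, 1, 2, 3, 4, \ldots$ placed along the curve]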
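The two hash functions can be sketched as follows, assuming that $\circ$ denotes bit-string concatenation and that the primary-key domain $[lo, hi]$ is known. The bit widths, the use of CRC-32 for $h_r$, and the linear scaling for $h_k$ are illustrative choices, not taken from the slides.

```python
import zlib

REL_BITS = 4    # hypothetical width of the RelationID part
KEY_BITS = 12   # hypothetical width of the primary-key part


def h_r(relation_name):
    """Any ordinary hash will do: it only selects where the relation's curve segment lies."""
    return zlib.crc32(relation_name.encode()) % (1 << REL_BITS)


def h_k(a_k, lo, hi):
    """Order-preserving: scale the key domain [lo, hi] linearly onto 0 .. 2**KEY_BITS - 1."""
    return (a_k - lo) * ((1 << KEY_BITS) - 1) // (hi - lo)


def curve_key(relation_name, a_k, lo, hi):
    """k' = h_r(r) ∘ h_k(a_k), with ∘ read as bit-string concatenation."""
    return (h_r(relation_name) << KEY_BITS) | h_k(a_k, lo, hi)
```

Because $h_k$ preserves order, a range predicate on $a_k$ becomes one contiguous interval of curve positions inside the relation's segment; a scrambling hash for $h_k$ would scatter the range again, as in the simple approach.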

Additional API Primitives

- Standard operations: put(k, v), v = get(k)
- Only two additional operations are needed for our query algebra: put_temp() and multicast()
- put_temp(k, v, t): a temporary put operation for re-hashing a given relation; it allows indexed access to attributes other than the primary key
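A possible reading of put_temp() is sketched below: a relation is re-hashed on a non-key attribute into temporary entries that can then be looked up by that attribute. This is a minimal in-memory sketch; the class and function names are hypothetical, and interpreting `t` as a lifetime after which the entry is ignored is an assumption, since the slides do not define `t`.

```python
from collections import defaultdict


class PeerStore:
    """Hypothetical in-memory stand-in for the storage of a DHT (no networking)."""

    def __init__(self):
        self.data = defaultdict(list)   # key -> permanent values
        self.temp = defaultdict(list)   # key -> [(value, expires_at)]

    def put(self, k, v):
        self.data[k].append(v)

    def get(self, k, now=0.0):
        """All values under k; several tuples may share one re-hashed key."""
        live = [v for v, expires_at in self.temp.get(k, []) if expires_at > now]
        return self.data.get(k, []) + live

    def put_temp(self, k, v, t, now=0.0):
        """Temporary put: assumption, t is a lifetime after which the entry is ignored."""
        self.temp[k].append((v, now + t))


def rehash(store, tuples, attr, h, t, now=0.0):
    """Re-hash a relation on a non-key attribute for indexed access to that attribute."""
    for tup in tuples:
        store.put_temp(h(tup[attr]), tup, t, now)
```

Unlike the slide's `v = get(k)`, this `get` returns a list, because several re-hashed tuples can share the same attribute value.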

Additional API Primitives (Cont.)

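multicast(zmin, zmax, POP):
- Sends a message to a group of peers
- The peers are identified by an interval of the z-curve
- Example: $\sigma_{3<a_k<6}(r)$ becomes multicast(3, 6, POP)

[Figure: multicast(3, 6, POP) split into send($\sigma_{a_k=3}$) and send($\sigma_{4<a_k<6}$) for the peers covering the interval]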
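The splitting of a multicast over the z-curve can be sketched as follows: each peer whose curve interval overlaps the requested range receives the POP restricted to its own part of the range. This sketch only computes who receives which sub-interval and leaves out the routing itself; treating intervals as inclusive and the zone assignment in the usage lines are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # inclusive [start, end] of z-curve positions


def multicast(zmin: int, zmax: int, pop: str,
              zones: List[Tuple[str, Interval]]) -> List[Tuple[str, Interval, str]]:
    """Return (peer, clipped interval, POP) for every peer whose curve interval
    overlaps [zmin, zmax]; each peer only handles its own part of the range."""
    deliveries = []
    for peer, (start, end) in zones:
        lo, hi = max(start, zmin), min(end, zmax)
        if lo <= hi:   # non-empty overlap
            deliveries.append((peer, (lo, hi), pop))
    return deliveries


zones = [("P1", (0, 3)), ("P2", (4, 7)), ("P3", (8, 15))]
print(multicast(3, 6, "select", zones))
# [('P1', (3, 3), 'select'), ('P2', (4, 6), 'select')]
```

With this hypothetical zone layout, the split mirrors the slide's example: one peer handles key 3 and the next handles keys 4 to 6.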

Query Plan Operators (POP)

- Hash-based implementations of selection, join, grouping, and aggregation
- Distributed query processing with operator trees

[Figure: operator tree over relations R, S, and T]

Selection

Selection POP:
- On the primary key, e.g., $\sigma_{3<a_k<6}(r)$: determine the interval on the z-curve and send the selection operator via multicast
- On other attributes, e.g., $\sigma_{3<a_5<6}(r)$: perform a full-table scan, e.g., multicast(min(a5), max(a5), POP)
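The choice between the two cases can be sketched as a small planning function that returns the z-curve interval a selection must be multicast to. This sketch assumes the key layout $h_r(r) \circ h_k(a_k)$ with a hypothetical 12-bit key part, an order-preserving $h_k$, and inclusive bounds; for the non-key case it simply covers the relation's whole curve segment, which is one way to realize the full-table scan.

```python
KEY_BITS = 12   # hypothetical width of the h_k part of k' = h_r(r) ∘ h_k(a_k)
MAX_KEY = (1 << KEY_BITS) - 1


def h_k(a_k, dom_lo, dom_hi):
    """Order-preserving scaling of the key domain onto 0 .. MAX_KEY."""
    return (a_k - dom_lo) * MAX_KEY // (dom_hi - dom_lo)


def plan_selection(rel_id, attr, lo, hi, key_attr, dom_lo, dom_hi):
    """z-curve interval that a selection lo <= attr <= hi is multicast to."""
    base = rel_id << KEY_BITS        # start of the relation's curve segment
    if attr == key_attr:
        # predicate on the primary key: only the matching sub-interval
        return base | h_k(lo, dom_lo, dom_hi), base | h_k(hi, dom_lo, dom_hi)
    # any other attribute: full-table scan over the relation's whole segment
    return base, base | MAX_KEY


print(plan_selection(1, "a_k", 3, 6, "a_k", 0, 4095))   # (4099, 4102)
print(plan_selection(1, "a_5", 3, 6, "a_k", 0, 4095))   # (4096, 8191)
```

The returned pair would be passed as (zmin, zmax) to multicast(); only the key predicate narrows the set of contacted peers.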

Join

Nested loop join POP, symmetric hash join POP:
- On the primary key: perform the join immediately
- On other attributes: first re-hash the relation using put_temp, then perform the join as above

Example: Symmetric Hash Join

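[Figure: symmetric hash join of R and S; the fragments R1, R2 and S1, S2 are re-hashed via put_temp(h(tR), tR, x) and put_temp(h(tS), tS, x), and the peers executing shjoin(R,S) produce the partial results RS1 and RS2]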
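The local work of one shjoin(R,S) peer can be sketched with the standard symmetric hash join: every arriving tuple is inserted into the hash table of its own input and immediately probed against the other table, so neither input blocks. In the figure's setting, each peer would run this on the tuples that put_temp re-hashed to it and produce one partial result such as RS1; the stream format and function name are hypothetical.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_r, key_s):
    """Pipelined symmetric hash join over an interleaved arrival stream.

    `stream` yields ("R", row) or ("S", row). Each row is inserted into the hash
    table of its own input and immediately probed against the other table, so a
    result is produced as soon as both join partners have arrived."""
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    key_of = {"R": key_r, "S": key_s}
    other = {"R": "S", "S": "R"}
    for side, row in stream:
        k = key_of[side](row)
        tables[side][k].append(row)                    # build
        for match in tables[other[side]].get(k, []):   # probe
            yield (row, match) if side == "R" else (match, row)


arrivals = [("R", ("r1", 1)), ("S", ("s1", 1)), ("S", ("s2", 2)), ("R", ("r2", 2))]
print(list(symmetric_hash_join(arrivals, key_r=lambda t: t[1], key_s=lambda t: t[1])))
# [(('r1', 1), ('s1', 1)), (('r2', 2), ('s2', 2))]
```

The pipelined behavior suits a P2P setting where tuples of both relations arrive at a peer in no particular order.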

Sorting/Aggregation

- Central grouping POP: one peer iterates over the z-curve and performs the sorting/aggregation centrally
- Hash group POP:
  - Redistribute the relation using a hash function on the attribute to be sorted/aggregated
  - “Aggregation peers” are responsible for sorting/aggregating the incoming attribute values
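The hash group POP can be sketched for a SUM aggregate: tuples are routed to aggregation peers by hashing the grouping attribute, each peer aggregates what it receives, and the per-peer results are unioned. This is a single-process sketch; the function name, the choice of SUM, and the peer count are assumptions for illustration.

```python
from collections import defaultdict


def hash_group_sum(tuples, group_attr, value_attr, n_peers):
    """Hash group POP sketch computing SUM(value_attr) GROUP BY group_attr."""
    # re-distribution: each tuple goes to aggregation peer hash(group value) mod n_peers
    inbox = [[] for _ in range(n_peers)]
    for t in tuples:
        inbox[hash(t[group_attr]) % n_peers].append(t)

    result = {}
    for received in inbox:
        sums = defaultdict(int)          # local aggregation on one aggregation peer
        for t in received:
            sums[t[group_attr]] += t[value_attr]
        result.update(sums)              # a group never spans two peers, so no merge step
    return result


rows = [{"g": "a", "v": 1}, {"g": "b", "v": 2}, {"g": "a", "v": 3}]
print(sorted(hash_group_sum(rows, "g", "v", n_peers=3).items()))   # [('a', 4), ('b', 2)]
```

Compared with the central grouping POP, the work is spread over several aggregation peers, at the cost of one re-distribution step.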

Query Evaluation

- Input: left-handed POP trees
- Design principles:
  - Stateless evaluation
  - Blocking operations: delivery of intermediate data (early aggregation)

[Figure: POP tree over relations R, S, and T]

Query Evaluation: Example

[Figure: evaluation of an example plan over peers P0 to P5, with intermediate results r1, r2a, r2b, r2, rra, rrb, and rr]

Conclusion

- Current state: the prototype is fully implemented and executes plans such as
  (shjoin a1=a2 (scan a3>42 REL1) (scan REL2))
- First experiments in a small CAN (100 peers) are promising
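The plan notation above reads like an s-expression. Assuming that format (operator first, then its arguments, with the input plans last), the sketch below parses a plan into a tree and checks the left-handed shape required of input POP trees, read here as left-deep: the right input of every join is a base-relation scan. The function names and the interpretation of the notation are assumptions.

```python
def tokenize(text):
    """Split the plan text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression: an atom or a parenthesized list of expressions."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok                      # operator name, predicate, or relation name
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)                       # consume ")"
    return node


def is_left_deep(plan):
    """True if the right input of every join is a scan of a base relation."""
    if plan[0] == "scan":
        return True
    left, right = plan[-2], plan[-1]    # assumption: the two inputs come last
    return isinstance(right, list) and right[0] == "scan" and is_left_deep(left)


plan = parse(tokenize("(shjoin a1=a2 (scan a3>42 REL1) (scan REL2))"))
print(plan)   # ['shjoin', 'a1=a2', ['scan', 'a3>42', 'REL1'], ['scan', 'REL2']]
print(is_left_deep(plan))   # True
```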

Conclusion (cont.)

Future topics:
- Experiments with large data sets and many nodes (100,000 nodes, 10 million queries, test data from the TPC-H benchmark)
- Optimization of the different POP implementations
- Efficient range queries
- Dynamic query operations