Top Banner
Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003
26

Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Jan 20, 2018

Download

Documents

Paul Burstein: PIER, 11/10/20033 Motivation Inject a degree of distribution into databases Internet scale systems vs. hundred node systems Large scale applications requiring database functionaity
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Querying the Internet with PIER

CS294-4Paul Burstein11/10/2003

Page 2: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

2

Outline Motivation Architecture Join Algorithms Evaluation Discussion

Page 3: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

3

Motivation Inject a degree of distribution into

databases Internet scale systems vs. hundred

node systems Large scale applications requiring

database functionaity

Page 4: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

4

Applications P2P Databases

Highly distributed and available data

Network Monitoring Intrusion detection Fingerprint queries

Page 5: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

5

Design Principles Relaxed Consistency

Sacrifice Consistency in face of Availability and Partition tolerance

Organic Scaling Growth with deployment

Natural Habitats for Data Data remains in original format with a DB

interface Standard Schemas

Achieved though common software

Page 6: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

6

Outline Motivation Architecture Join Algorithms Evaluation Discussion

Page 7: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

7

PIER Architecture

Page 8: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

8

DHT Design Implemented with CAN

and Chord Routing Layer

Mapping for keys Storage Manager

Node data storage Provider

Storage access interface for higher levels

Page 9: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

9

Routing & Storage Routing Layer

DHT-based API locationMapChange – local

key set change Storage Manager

Easy to realize API Efficient performance relative

to network Main-memory storage

manager used

Page 10: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

10

Provider Couples the routing

and storage layers namespace – relation resourceId – primary key

namespace + resourceId key instanceId – distinguishes objects with same

namespace and resourceID lifetime – item storage duration

multicast – contacts namespace’s nodes lscan – iterates over a node’s local data newData – application callback on data arrival

Page 11: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

11

PIER Query Processor Query dataflow engine Operators:

Selection, projection, joins, grouping, aggregation

Operators push and pull data Current data modification is

though the DHT interface Relaxed consistency and reachable snapshot

Working only with nodes reachable at the time a query is issued

Page 12: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

12

Outline Motivation Architecture Join Algorithms Evaluation Discussion

Page 13: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

13

Join Algorithms Symmetric Hash Join

Rehashes the relations Scan and copy

Fetch Matches One relation already hashed on join attribute

R, S – relations Nr, Ns – relation namespaces Nq - DHT-based temporary table

Page 14: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

14

Join Rewriting Aimed at lowering the bandwidth

utilization

Symmetric semi-join Local projections to join keys Global fetch matches join

Bloom joins Local bloom filters are published into

temporary namespaces Filters multicast to opposite relation’s nodes

Page 15: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

15

How does this scale?

Page 16: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

16

Outline Motivation Architecture Join Algorithms Evaluation Discussion

Page 17: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

17

Workload Parameters CAN configuration: d = 4 R 10 times larger than S Constants provide 50% selectivity f(x,y) evaluated after the join 90% of R tuples match a tuple in S Result tuples are 1KB each Symmetric hash join used

Page 18: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

18

Simulation Setup Up to 10,000 nodes Network cross-traffic, CPU and

memory utilizations ignored1. 100ms and 10Mbps fully connected links2. GT-ITM transit-stub topology

Page 19: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

19

Scalability 1MB data per node Fully-connected topology Variable number of

computation nodes

Network congestion is an issue with few computation nodes

How is the computation workload distributed?

Page 20: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

20

Join Algorithms (1/2) Infinite Bandwidth 1024 data and computation nodes Core join Algorithms

Perform faster Rewrites

Bloom Filter: two multicasts Semi-join: two CAN lookups

Page 21: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

21

Join Algorithms (2/2) Limited Bandwidth

10Mbps inbound capacity 25GB relations, 1024 nodes

Symmetric Hash Join Rehashes both tables

Semi-join Transfers only matching tuples

At 40% selectivity, bottleneck switches from computation nodes to query sites

Page 22: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

22

Soft State Failure detection and

recovery 15 second failure

detection 4096 nodes

Refresh period Time to reinsert lost

tuples

Page 23: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

23

Transit Stub Topology GT-ITM

4 Domains, 10 nodes per domain, 3 stubs per node

50ms, 10ms, 2ms latency 10Mbps inbound links

Similar trends as fully connected topology A bit longer end-to-end

delays

Page 24: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

24

Experimental Results 64 PCs on 1Gbps

network All nodes are

computation nodes

Page 25: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

25

Outline Motivation Architecture Join Algorithms Evaluation Discussion

Page 26: Querying the Internet with PIER CS294-4 Paul Burstein 11/10/2003.

Paul Burstein: PIER, 11/10/2003

26

Discussion PIER presents a distributed query

engine

What remains to be done? DB issues Networking issues