Unstructured P2P Overlay

Unstructured P2P Overlay. Improving Search in Peer-to-Peer Networks, ICDCS 2002, Beverly Yang, Hector Garcia-Molina.

Jan 13, 2016


Mae Franklin

Unstructured P2P Overlay


Improving Search in Peer-to-Peer Networks

ICDCS 2002

Beverly Yang, Hector Garcia-Molina


Current Techniques

• Gnutella
– BFS with depth limit D.
– Wastes bandwidth and processing resources.

• Freenet
– DFS with depth limit D.
– Poor response time.


Iterative Deepening

• Basic idea is to reduce the number of nodes that process a query

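• Under policy P = {a, b, c} and waiting time W: the source first searches to depth a; if the query is not satisfied within time W, the search is extended to depth b, and then to depth c.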

• See example.
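The idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the graph representation, the `has_result` predicate and the `wanted` threshold are assumptions made for the example, and the waiting time W is not modelled (each deeper round simply starts when the previous one did not satisfy the query).

```python
from collections import deque

def nodes_within(graph, source, depth):
    """Return the set of nodes reachable from source in at most `depth` hops."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # depth limit reached, do not expand further
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen

def iterative_deepening(graph, source, has_result, policy, wanted=1):
    """Search to each depth in `policy` in turn; stop once `wanted` results exist.

    Returns the result nodes and the number of nodes (other than the source)
    that had to process the query.
    """
    processed = set()
    results = []
    for depth in policy:
        reached = nodes_within(graph, source, depth)
        # Only nodes not reached in an earlier round process the query now.
        for node in sorted(reached - processed):
            if node != source and has_result(node):
                results.append(node)
        processed = reached
        if len(results) >= wanted:
            break
    return results, len(processed) - 1

# Hypothetical line topology a - b - c - d, with the answer stored at c.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
found, cost = iterative_deepening(graph, "a", lambda n: n == "c", [1, 2, 3])
```

With this policy the query stops at depth 2, so node d never processes it, which is the saving the slide refers to.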


Directed BFS

• A source sends query messages to only a subset of its neighbors

• A node maintains simple statistics on its neighbors
– Number of results received from each neighbor
– Latency of the connection


Candidate nodes

• Returned the highest number of results

• Returned response messages that took the lowest average number of hops

• Forwarded the highest number of messages
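A minimal sketch of how a node might keep such statistics and pick candidate neighbors. The `NeighborStats` fields and the two selection functions are illustrative names, not part of the original work, and only two of the heuristics are shown.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    results: int = 0      # results received through this neighbor so far
    hops_total: int = 0   # sum of hop counts over its response messages
    responses: int = 0    # number of response messages seen

    @property
    def avg_hops(self):
        # Neighbors with no responses yet rank last on this heuristic.
        return self.hops_total / self.responses if self.responses else float("inf")

def most_results(stats, fanout):
    """Pick the `fanout` neighbors that returned the most results."""
    return sorted(stats, key=lambda n: (-stats[n].results, n))[:fanout]

def fewest_hops(stats, fanout):
    """Pick the `fanout` neighbors whose responses travelled the fewest hops on average."""
    return sorted(stats, key=lambda n: (stats[n].avg_hops, n))[:fanout]

# Hypothetical statistics for three neighbors.
stats = {
    "n1": NeighborStats(results=5, hops_total=12, responses=3),
    "n2": NeighborStats(results=9, hops_total=10, responses=5),
    "n3": NeighborStats(),
}
```

The query is then sent only to the selected neighbors instead of being flooded to all of them.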


Local Indices

• Each node n maintains an index over the data of all nodes within a radius of r hops.

• All nodes at depths not listed in the policy simply forward the query.

• Example: policy P = {1, 5}: nodes at depths 1 and 5 process the query; nodes at all other depths only forward it.
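The effect of a policy can be illustrated with a small sketch. It assumes a tree-like neighborhood, so that a processing node at depth d with an index of radius r answers for nodes at depths d - r through d + r; the function names and the value of r are assumptions chosen for illustration, not values taken from the slides.

```python
def action_at(depth, policy):
    """A node processes the query only if its depth is listed in the policy."""
    return "process" if depth in policy else "forward"

def covered_depths(policy, r):
    """Depths whose data is searched when nodes at the policy depths answer
    from an index covering everything within r hops (tree-like assumption)."""
    covered = set()
    for depth in policy:
        covered.update(range(max(0, depth - r), depth + r + 1))
    return covered

policy = {1, 5}
coverage_r1 = covered_depths(policy, 1)   # depths 0-2 and 4-6
coverage_r2 = covered_depths(policy, 2)   # depths 0-7 without a gap
```

The sketch shows why the policy depths and the index radius have to be chosen together: a radius that is too small leaves some depths unsearched.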


Experimental Setup

• For each response, we log:
– Number of hops it took
– IP address from which the Response message came
– Response time
– Individual results


Experimental result


Routing Indices For P-to-P Systems

ICDCS 2002


Introduction

• Search in a P2P system
– Mechanisms without an index
– Mechanisms with specialized index nodes (centralized search)
– Mechanisms with indices at each node

• Structured P2P network

• Unstructured P2P network

• Parallel vs. sequential search
– Response time
– Network traffic


Routing indices (RI)

• Query
– Documents are on zero or more “topics”, and queries request documents on particular topics.
– Document topics are independent

• Local index

• RI
– Each node has a local routing index, which contains the following information
• The number of documents along each path
• The number of documents on each topic of interest
– Allows a node to select the “best” neighbors to send a query to


• The RI may be “coarser” than the local indices


• Goodness measure
– Number of results in a path

• Using Routing indices
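One way to estimate the number of results along a path, given the independence assumption stated earlier, is to scale the total document count by the fraction of documents on each query topic. The sketch below is an illustration under that assumption; the function name, the index layout and all counts are hypothetical.

```python
def goodness(total_docs, per_topic, query_topics):
    """Estimate how many documents along a path match all query topics,
    assuming topics are independent of each other."""
    if total_docs == 0:
        return 0.0
    estimate = float(total_docs)
    for topic in query_topics:
        estimate *= per_topic.get(topic, 0) / total_docs
    return estimate

# Hypothetical routing index at one node: one row per neighbor.
routing_index = {
    "B": (100, {"databases": 20, "languages": 30}),
    "C": (1000, {"databases": 0, "languages": 50}),
    "D": (200, {"databases": 100, "languages": 150}),
}

query = ["databases", "languages"]
ranking = sorted(
    routing_index,
    key=lambda nb: -goodness(routing_index[nb][0], routing_index[nb][1], query),
)
```

The node forwards the query to the neighbor ranked first and falls back to the next one only if more results are needed.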


– Storage space
• N: number of nodes in the P2P network
• b: branching factor
• c: number of categories
• s: counter size in bytes

Centralized index: s × (c + 1) × N

Distributed system: s × (c + 1) × b (at each node)
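A quick worked example of the two formulas. The parameter values are made up for illustration and do not come from the slides.

```python
def centralized_index_bytes(s, c, n):
    """Storage for one central index: s * (c + 1) * N."""
    return s * (c + 1) * n

def routing_index_bytes_per_node(s, c, b):
    """Storage for the routing index at each node: s * (c + 1) * b."""
    return s * (c + 1) * b

# Assumed values: 4-byte counters, 10 categories, 60,000 nodes, 4 neighbors per node.
central = centralized_index_bytes(s=4, c=10, n=60_000)       # 2,640,000 bytes
per_node = routing_index_bytes_per_node(s=4, c=10, b=4)      # 176 bytes
```

The per-node cost depends on the branching factor b rather than on the network size N, which is why the distributed index stays small as the network grows.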


• Creating routing indices


• Maintaining routing indices
– Trade-off between RI freshness and update cost
– Does not require the participation of a disconnecting node

• Discussion
– What if the search topics are dependent?
– Can the number of “hops” necessary to reach a document be estimated?


Alternative Routing Indices

• Hop-count RI
– Aggregated RIs for each “hop” up to a maximum number of hops are stored


– Search cost
• Number of messages

– The goodness of a neighbor
• The ratio between the number of documents available through that neighbor and the number of messages required to get those documents

– Regular tree with fanout F

– It takes F^h messages to find all documents at hop h

– Storage cost?
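The regular-tree cost model can be turned into a goodness score by dividing the documents at each hop by the messages needed to reach that hop. In this sketch the indexing is an assumption: `docs_by_hop[0]` holds the documents one hop past the neighbor and is charged a single message, and each further hop costs F times more. The counts are illustrative.

```python
def hop_count_goodness(docs_by_hop, fanout):
    """Documents per message under the regular-tree model.

    docs_by_hop[j] is the number of matching documents j hops past the
    neighbor (assumed indexing); reaching them is charged fanout**j messages.
    """
    return sum(n / fanout ** j for j, n in enumerate(docs_by_hop))

# Two hypothetical neighbors with fanout F = 3.
x = hop_count_goodness([13, 10], fanout=3)   # 13 + 10/3
y = hop_count_goodness([0, 31], fanout=3)    # 0 + 31/3
```

Neighbor y leads to more documents in total, but x scores higher because its documents are cheaper to reach.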


• Exponentially aggregated RI
– Store the result of applying the regular-tree cost formula to a hop-count RI

– How to compute the goodness of a path for a query containing several topics?


Cycles in the P2P network


Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems

Kunwadee Sripanidkulchai

Bruce Maggs

Hui Zhang

IEEE INFOCOM 2003


Motivation

• Although flooding is simple and robust, it is not scalable.

• A content location solution in which peers are organized into an interest-based structure on top of Gnutella.

• The algorithm is called interest-based shortcuts.


Interest-based locality


Shortcuts Architecture and Design Goals

• To create additional links on top of a peer-to-peer system’s overlay

• As a separate performance enhancement layer on top of existing content location mechanisms


Content location paths


Shortcut Discovery

• The first lookup returns a set of peers that store the content

• These are potential candidates.

• One peer is selected at random from the set and added to the shortcut list

• For scalability, each peer allocates a fixed amount of storage to implement shortcuts.

• Alternatives for shortcut discovery
– Exchanging shortcut lists between peers


Shortcut selection

• We rank shortcuts based on their perceived utility

• A peer locates content by sequentially asking all of the shortcuts on its list.


Ranking metrics

• Probability of providing content

• Latency of the path to the shortcut

• Load at the shortcut

• A combination of metrics can be used based on each peer’s preference
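Discovery, ranking and sequential lookup can be sketched together. This is a minimal illustration that ranks only by the first metric (observed success rate); the class name, the eviction rule and the capacity are assumptions made for the example.

```python
class ShortcutList:
    """Fixed-size list of shortcut peers, ranked by observed success rate."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.stats = {}  # peer -> [successes, attempts]

    def _rate(self, peer):
        successes, attempts = self.stats[peer]
        return successes / attempts if attempts else 0.0

    def ranked(self):
        return sorted(self.stats, key=lambda p: (-self._rate(p), p))

    def add(self, peer):
        """Add a newly discovered peer, evicting the lowest-ranked one if full."""
        if peer in self.stats:
            return
        if len(self.stats) >= self.capacity:
            del self.stats[self.ranked()[-1]]
        self.stats[peer] = [0, 0]

    def lookup(self, has_content):
        """Ask shortcuts one by one in rank order; return (peer, peers asked)."""
        asked = 0
        for peer in self.ranked():
            asked += 1
            ok = has_content(peer)
            self.stats[peer][1] += 1
            if ok:
                self.stats[peer][0] += 1
                return peer, asked
        return None, asked

shortcuts = ShortcutList(capacity=2)
shortcuts.add("a")
shortcuts.add("b")
first = shortcuts.lookup(lambda p: p == "b")    # asks a, then b
second = shortcuts.lookup(lambda p: p == "b")   # b now ranks first
```

When every shortcut fails, `lookup` returns `None` and the peer would fall back to the underlying content location mechanism (flooding in Gnutella).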


Potential and Limitations

• Adding 5 shortcuts at a time produces success rates that are close to the best possible.

• Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.


Efficient and Scalable Query Routing for Unstructured Peer-to-Peer Networks

A. Kumar, J. Xu and W.W. Zegura


Overview

• As the distance from the node hosting the object increases, fewer bits are used to represent information about the direction in which the object is located.


Design

• Exponential decay Bloom filter (EDBF)
– A Bloom filter is a data structure for approximately answering set membership questions
• k hash functions, and an array A
• A[h_i(x)] = 1, for i = 1…k
• θ(x) = |{i | A[h_i(x)] = 1, i = 1…k}|
– Number of 1’s in the filter
– θ(x)/k roughly indicates the probability of finding x along a specific link in the overlay
– Noise?

– When there is no noise
• one hop away from the object x, θ(x) is approximately k bits
• two hops away from the object x, θ(x) is approximately k/d

– Decay implementation
• Decay rate is 1/d
• Nodes reset each of the bits in the EDBFs received from upstream neighbors with probability (1 − 1/d)
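A minimal sketch of such a filter, assuming the hash functions are derived from SHA-256 with an index prefix and that decay keeps each set bit with probability 1/d. The class name, the filter size m and the hashing scheme are illustrative choices, not details from the slides.

```python
import hashlib
import random

class EDBF:
    """Bloom filter whose bits can be probabilistically decayed per hop."""

    def __init__(self, m=1024, k=8):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, x):
        # k hash functions h_1..h_k, simulated by prefixing the index i.
        return [
            int(hashlib.sha256(f"{i}:{x}".encode()).hexdigest(), 16) % self.m
            for i in range(self.k)
        ]

    def add(self, x):
        for p in self._positions(x):
            self.bits[p] = 1

    def theta(self, x):
        """Number of the k positions of x that are set."""
        return sum(self.bits[p] for p in self._positions(x))

    def decayed(self, d, rng):
        """Copy of the filter in which each set bit survives with probability 1/d."""
        out = EDBF(self.m, self.k)
        out.bits = [1 if b and rng.random() < 1.0 / d else 0 for b in self.bits]
        return out

rng = random.Random(0)
local = EDBF()
local.add("song.mp3")
one_hop = local.decayed(d=2, rng=rng)    # theta drops to roughly k/2 on average
```

Comparing `theta` for the same object across the filters received from different neighbors indicates which link most likely leads toward it.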


• Creation and Maintenance of routing tables


• The initial advertisement is created by taking the union of all advertisements received from neighbors other than the target neighbor

• Decay the combined advertisement by the decay factor d

• Union the result with the local EDBF
– The local EDBF is propagated without attenuation

• Loops
– Split horizon with poisoned reverse
• Information received from a neighbor j will not be advertised back to j
– Exponential decay
• The count-to-infinity problem manifests itself as a “decay to an infinitely small amount of information”
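The advertisement steps above can be sketched with plain bit lists standing in for EDBFs. The function names are hypothetical, and decay is assumed to keep each set bit with probability 1/d.

```python
import random

def union(filters, m):
    """Bitwise OR of a collection of m-bit filters."""
    out = [0] * m
    for f in filters:
        out = [a | b for a, b in zip(out, f)]
    return out

def decay(bits, d, rng):
    """Keep each set bit with probability 1/d (assumed decay rule)."""
    return [1 if b and rng.random() < 1.0 / d else 0 for b in bits]

def advertisement_for(target, received, local, d, rng):
    """Build the filter advertised to `target`.

    Advertisements that came from `target` itself are left out (split horizon),
    the rest are merged and decayed, and the local filter is added undecayed.
    """
    others = [f for neighbor, f in received.items() if neighbor != target]
    combined = union(others, len(local))
    return union([decay(combined, d, rng), local], len(local))

rng = random.Random(0)
received = {"j": [1, 0, 0, 0], "k": [0, 1, 0, 0]}   # advertisements from neighbors
local = [0, 0, 0, 1]                                # this node's own content
adv = advertisement_for("j", received, local, d=1, rng=rng)
```

With d = 1 nothing is decayed, so the advertisement sent to j contains k's bit and the local bit but not j's own bit.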


• Query forwarding


• If the query is satisfied locally, it is answered

• Otherwise, if the TTL of the query has not expired
– If the query was previously seen, it is forwarded to a randomly chosen neighbor
– Otherwise, the query is forwarded to the neighbor with the highest θ(x)
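The forwarding rule can be written as a single decision function. This is a sketch: the argument names are hypothetical, and `theta_by_neighbor` is assumed to hold the value θ(x) computed from each neighbor's advertised filter.

```python
import random

def next_hop(query_id, ttl, has_locally, seen, theta_by_neighbor, rng):
    """Decide what to do with a query at one node.

    Returns "answer" if the object is held locally, None if the TTL has
    expired, and otherwise the neighbor to forward the query to.
    """
    if has_locally:
        return "answer"
    if ttl <= 0:
        return None
    if query_id in seen:
        # A repeated query suggests a loop, so pick a random neighbor instead.
        return rng.choice(sorted(theta_by_neighbor))
    seen.add(query_id)
    return max(theta_by_neighbor, key=theta_by_neighbor.get)

rng = random.Random(0)
seen = set()
thetas = {"a": 3, "b": 7}
first = next_hop("q1", 5, False, seen, thetas, rng)    # best neighbor by theta
repeat = next_hop("q1", 4, False, seen, thetas, rng)   # random neighbor
```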


Structured P2P Overlay


Similarity Discovery in Structured P2P Overlays

ICPP


Introduction

• Structured P2P network
– Only supports search with a single keyword

• Similarity between two documents
– Keyword sets
– Vector space
– Measure: $\theta_{ab} = \cos^{-1}\left(\dfrac{a \cdot b}{|a|\,|b|}\right)$

• Problems
– Search problem
– New keyword?
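The angle between two keyword vectors can be computed directly from this measure. A small sketch follows; the vectors are illustrative, and the clamp guards against floating-point values slightly outside [-1, 1].

```python
import math

def angle(a, b):
    """Angle in radians between two non-zero vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

# Documents as keyword-count vectors over the same (assumed) keyword order.
orthogonal = angle([1, 0], [0, 1])   # no keywords in common
parallel = angle([1, 1], [2, 2])     # same keyword proportions
```

A smaller angle means the documents are more similar; documents with no keywords in common are at a right angle.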


Meteorograph

• Absolute angle


Publishing and Searching

• Publish
– Hash
– Publish the item to the node n_p whose hash key is closest to the hash value


• Search problem
– Nearest answers
– k-nearest answers

• Partial

• Comprehensive

• Search strategy

• Discussions

• What happened when keyword vector is represented by ?


Other issues

• Load balance

• Changes of vector space
– Republished?
– Comprehensive set of keywords
– Other methods?


SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks

Farnoush Banaei-Kashani, Cyrus Shahabi

(CIKM04)


PDN access method

• Defines

• How to organize the PDN topology into an index-like structure

• How to use the index structure


Hilbert space

• Hilbert space (V, Lp)

• Key k = (a1, a2, …, ad)
– d: the dimension of the vector space
– The domain is a contiguous and finite interval of R

• The Lp norm, with p in Z+
– The distance function used to measure dissimilarity
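For reference, the Lp distance between two keys is the p-th root of the sum of the p-th powers of the coordinate differences. A short sketch with illustrative keys:

```python
def lp_distance(a, b, p):
    """Lp distance between two d-dimensional keys, for a positive integer p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

k1, k2 = (0, 0), (3, 4)
manhattan = lp_distance(k1, k2, 1)   # L1: 3 + 4
euclidean = lp_distance(k1, k2, 2)   # L2: sqrt(9 + 16)
```

A smaller distance means the two tuples are more similar, so a similarity query asks for keys within a given distance of the query key.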


Topology

• Topology of a PDN can be modelled as a directed graph G(N, E)

• A(n) is the set of neighbors for node n

• A node maintains
– A limited amount of information about its neighbors, including
• The keys of the tuples maintained at the neighbors
• The physical addresses of the neighbors


• The processing of the query is completed when all expected tuples in the relevant result set are visited

• Access methods
– Join and leave, for virtual nodes
– Forward, for using local information to process queries and make forwarding decisions


The small world example

• Grid component

• Random graph component

• The processing of queries (exact, range, kNN) in a topology with high locality


Flat partitioning

• SWAM also employs the space partitioning idea: flat partitioning


Query Processing

• Exact-Match query processing

• Range query processing

• kNN Query processing


Similarity Search in Peer-to-Peer Databases

IEEE International Conference on Distributed Computing Systems 2005


Data and Query Model

• All data objects are unit vectors in a d-dimensional Euclidean space

• Cosine distance

• Can


Design Details

• The indexing scheme
– A locality-sensitive hashing function is used to reduce the dimensionality
• r is a d-dimensional unit vector
• h(x) is the concatenation of the bits b_r1(x), b_r2(x), …, b_rk(x)

– Objects with the same hash value belong to the same cluster and are stored at the node which owns the DHT key h(x)
• Group nearby objects to indices with low Hamming distance
• To avoid the situation that nearby objects differ in some bit positions in their index
– t hashing functions are used (replication)
» This ensures that there is a high probability of two related objects hashing onto indices with low Hamming distance in at least one of these sets
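The definition of the bit b_r(x) did not survive in the transcript. The sketch below assumes the standard random-hyperplane scheme for cosine similarity, where b_r(x) is 1 if r · x ≥ 0 and 0 otherwise; the function names and parameters are illustrative.

```python
import random

def random_unit_vectors(k, d, rng):
    """Draw k random d-dimensional unit vectors r_1..r_k."""
    vectors = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = sum(c * c for c in v) ** 0.5
        vectors.append([c / norm for c in v])
    return vectors

def lsh_key(x, vectors):
    """h(x): concatenation of the bits b_r(x), one per random vector.

    Assumption: b_r(x) = 1 if r . x >= 0, else 0 (random-hyperplane hashing).
    """
    return tuple(
        1 if sum(a * b for a, b in zip(r, x)) >= 0 else 0 for r in vectors
    )

rng = random.Random(1)
vectors = random_unit_vectors(k=8, d=3, rng=rng)
x = [0.6, 0.8, 0.0]
key = lsh_key(x, vectors)
```

Under this scheme two vectors separated by a small angle disagree on few bits, which is what allows nearby objects to land on indices with low Hamming distance. Using t independent sets of vectors gives t keys per object, corresponding to the replication mentioned above.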


• The search algorithm
– Node u generates Query (x, )
– Compute h(x)
– Compute the set V of all indices whose Hamming distance from h(x) is at most r.
– Node u queries each of the nodes in V
– Nodes in V return all data objects which match u’s query
– How to determine r?
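The set V can be enumerated by flipping every combination of at most r bits of h(x). A minimal sketch, with an illustrative key:

```python
from itertools import combinations
from math import comb

def hamming_ball(key, r):
    """All bit tuples whose Hamming distance from `key` is at most r."""
    k = len(key)
    ball = []
    for distance in range(r + 1):
        for positions in combinations(range(k), distance):
            flipped = list(key)
            for p in positions:
                flipped[p] ^= 1   # flip the chosen bit
            ball.append(tuple(flipped))
    return ball

key = (0, 1, 1, 0)
V = hamming_ball(key, r=1)
expected_size = sum(comb(len(key), i) for i in range(2))   # C(4,0) + C(4,1)
```

For a k-bit key, V contains C(k,0) + C(k,1) + … + C(k,r) indices, so the number of nodes to contact grows quickly with r. This is one reason the choice of r matters.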


• Adaptive replication
– Ensure the number of copies of each key in the network is proportional to its popularity
• The number of copies of each key is proportional to the rate at which queries arrive for this key

• Randomized lookup
– The lookup for a specific key terminates uniformly at random at one of the copies of this key

– Guarantees that the load is balanced uniformly across all copies of all keys in the system.


Discussion

• Search cost?

• What is the cardinality of set V?

• Availability ?


Guaranteeing Correctness and Availability in P2P Range Indices

SIGMOD 2005


Introduction

• Hashing destroys the value ordering among the search key values
– Cannot be used to process range queries efficiently

• Solution
– Range indices assign data items to peers directly based on their search key value
– Load balance?


P-ring overview

• Two types of peers
– Live peers
• Used to store data items
• The number of data items stored in each live peer is between sf and 2*sf (sf: storage factor)
– Free peers

• Overflow (> 2*sf)
– Split its assigned range with a free peer

• Underflow (< sf)
– Merge with its successor in the ring to obtain more entries
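The overflow and underflow rules can be sketched as a check on the number of items a live peer holds. The function names and the choice to split at the median are assumptions made for illustration; ring maintenance and concurrency are not modelled.

```python
def check_peer(items, sf):
    """Classify a live peer by how many items it stores relative to sf."""
    if len(items) > 2 * sf:
        return "split"   # overflow: hand part of the range to a free peer
    if len(items) < sf:
        return "merge"   # underflow: merge with the successor in the ring
    return "ok"

def split(items):
    """Split a peer's items at the median key (assumed split point).

    Returns the items kept by the live peer and those given to the free peer.
    """
    ordered = sorted(items)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

sf = 2
overloaded = [10, 3, 7, 1, 5]          # 5 items > 2 * sf
status = check_peer(overloaded, sf)
kept, handed_over = split(overloaded)
```

After the split both peers hold between sf and 2*sf items again, and each stores a contiguous range of key values, so range queries can still follow the ring.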


Incorrect query results

• Inconsistent Ring


• Concurrency in the data store


Solution

• Handling ring inconsistency
– Two states
• Joined and joining

• Peer p remains in the joining state until all relevant peers know about p

• Only store items in peers in the joined state

• Handling data store concurrency
– Peer p stays in a locked state until its successor p_succ locks its range


Supporting Complex Multi-dimensional Queries in P2P systems

IEEE International Conference on Distributed Computing Systems 2005

(HW)


Data Indexing in Peer-to-Peer DHT Networks

ICDCS 2004


• Locating data using incomplete information
– How to search data in a DHT

• Data descriptors and queries
– Semi-structured XML data


– Query
• Most specific query for d

• Relationship between queries


• Given the most specific query, finding the location of the file is simple

• How about less specific queries?

• Solution
– Provide a query-to-query service
• For a given query q, the index service returns a list of more specific queries covered by q

– The DHT storage system must be extended
• Insert(q, qi), with q → qi, adds a mapping (q; qi) to the index of the node responsible for key q.
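A query-to-query index of this kind can be sketched on top of a toy DHT. Everything here is an assumption made for illustration: the class and method names, the use of SHA-1 modulo the node count to find the responsible node, and the query strings.

```python
import hashlib
from collections import defaultdict

class QueryIndex:
    """Toy DHT in which each node stores mappings q -> more specific queries."""

    def __init__(self, num_nodes=4):
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]

    def _node_for(self, key):
        # The node responsible for a key is chosen by hashing the key.
        digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[digest % len(self.nodes)]

    def insert(self, q, qi):
        """Add the mapping (q; qi) at the node responsible for key q."""
        self._node_for(q)[q].add(qi)

    def lookup(self, q):
        """Return the more specific queries registered under q."""
        return sorted(self._node_for(q).get(q, ()))

    def resolve(self, q, files):
        """Follow mappings from q down to most specific queries that locate files."""
        found, stack, visited = [], [q], set()
        while stack:
            current = stack.pop()
            if current in visited:
                continue
            visited.add(current)
            if current in files:
                found.append(files[current])
            stack.extend(self.lookup(current))
        return sorted(found)

index = QueryIndex()
msq = "author=Smith/title=TCP/year=1989"     # hypothetical most specific query
index.insert("author=Smith", "author=Smith/title=TCP")
index.insert("author=Smith/title=TCP", msq)
files = {msq: "file-17"}                     # most specific query -> file location
```

A user who only knows the author starts from `author=Smith` and iteratively refines the query until a most specific query, and with it the file location, is reached.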