Routing Indices For P-to-P Systems ICDCS 2002

Routing Indices For P-to-P Systems ICDCS 2002. Introduction Search in a P2P system –Mechanisms without an index –Mechanisms with specialized index nodes.

Jan 01, 2016


Piers Richard
Transcript

Routing Indices For P-to-P Systems

ICDCS 2002


Introduction

• Search in a P2P system
– Mechanisms without an index
– Mechanisms with specialized index nodes (centralized search)
– Mechanisms with indices at each node

• Structured P2P network
• Unstructured P2P network

• Parallel vs. sequential search
– Response time
– Network traffic


Routing indices (RI)

• Query
– Documents are on zero or more “topics”, and queries request documents on particular topics.
– Document topics are independent

• Local index
• RI
– Each node has a local routing index which contains the following information
  • The number of documents along each path
  • The number of documents on each topic of interest
– Allows a node to select the “best” neighbors to send a query to


• The RI may be “coarser” than the local indices
– Overcounts
– Undercounts


• Goodness measure
– Number of results in a path

• Using Routing indices
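
The goodness measure can be sketched as follows. This is a minimal illustration, not necessarily the exact estimator used in the talk: it assumes that document topics are independent, so the expected number of results along a path is the path's document total multiplied by the fraction of documents on each queried topic. The `RoutingIndex` layout, the neighbor names, and all counts are hypothetical.

```python
from math import prod

# Hypothetical routing index: neighbor -> (total documents, documents per topic).
RoutingIndex = dict[str, tuple[int, dict[str, int]]]


def goodness(total: int, per_topic: dict[str, int], query_topics: list[str]) -> float:
    """Estimate the number of results along one path, assuming independent topics."""
    if total == 0:
        return 0.0
    return total * prod(per_topic.get(t, 0) / total for t in query_topics)


def best_neighbor(ri: RoutingIndex, query_topics: list[str]) -> str:
    """Pick the neighbor whose path has the highest estimated number of results."""
    return max(ri, key=lambda n: goodness(*ri[n], query_topics))


# Illustrative counts only (not taken from the slides).
ri: RoutingIndex = {
    "B": (100, {"databases": 20, "languages": 30}),
    "C": (1000, {"databases": 0, "languages": 50}),
    "D": (200, {"databases": 100, "languages": 150}),
}
```

With these made-up counts, `best_neighbor(ri, ["databases", "languages"])` selects `"D"`, because a large document total alone (neighbor `"C"`) does not help when one of the queried topics has no documents on that path.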


– Storage space
  • N: number of nodes in the P2P network
  • b: branching factor
  • c: number of categories
  • s: counter size in bytes

Centralized index: s*(c+1)*N

Distributed system: s*(c+1)*b (each node)
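
A short worked example of the two storage formulas may help. The parameter values below (4-byte counters, 1000 categories, 100,000 nodes, branching factor 4) are hypothetical and chosen only to show the difference in scale.

```python
def centralized_index_bytes(s: int, c: int, n: int) -> int:
    """Storage for a centralized index: s * (c + 1) * N."""
    return s * (c + 1) * n


def distributed_index_bytes_per_node(s: int, c: int, b: int) -> int:
    """Storage for the routing index kept at each node: s * (c + 1) * b."""
    return s * (c + 1) * b


# Hypothetical parameters, for illustration only.
central = centralized_index_bytes(s=4, c=1000, n=100_000)
per_node = distributed_index_bytes_per_node(s=4, c=1000, b=4)
print(central, per_node)
```

Under these assumed values the centralized index needs 400,400,000 bytes in one place, while each node in the distributed scheme stores 16,016 bytes.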


• Creating routing indices


• Maintaining Routing Indices
– Trade-off between RI freshness and update cost
– Does not require the participation of a disconnecting node

• Discussion
– What if the search topics are dependent?
– Can the number of “hops” necessary to reach a document be estimated?


Alternative Routing Indices

• Hop-count RI
– Aggregated RIs for each “hop” up to a maximum number of hops are stored


– Search cost
  • Number of messages

– The goodness of a neighbor
  • The ratio between the number of documents available through that neighbor and the number of messages required to get those documents

– Regular tree with fanout F

– It takes F^h messages to find all documents at hop h

– Storage cost?
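
The ratio described above can be sketched as a sum over hops. This is an illustration under stated assumptions: hops are numbered from 1, the cost of reaching hop h is taken to be F^h messages (a regular tree with fanout F), and the per-hop document counts are hypothetical. The function name `hop_count_goodness` is not from the slides.

```python
def hop_count_goodness(docs_per_hop: list[float], fanout: int) -> float:
    """Documents-per-message ratio for one neighbor.

    docs_per_hop[0] is the number of matching documents at hop 1,
    docs_per_hop[1] at hop 2, and so on. Reaching hop h is assumed
    to cost fanout ** h messages.
    """
    return sum(docs / fanout ** hop for hop, docs in enumerate(docs_per_hop, start=1))


# Hypothetical hop-count entries for two neighbors, fanout F = 3.
x = hop_count_goodness([13, 10], fanout=3)
y = hop_count_goodness([0, 31], fanout=3)
```

With these numbers neighbor X scores higher than neighbor Y even though Y leads to more documents in total, because Y's documents are one hop further away and therefore cost more messages to reach.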


• Exponentially aggregated RI
– Store the result of applying the regular-tree cost formula to a hop-count RI

– How to compute the goodness of a path for the query containing several topics?
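
As a rough sketch of the storage idea, an exponentially aggregated RI can be built by collapsing each topic's per-hop counts into a single stored value. The code assumes the regular-tree cost of F^h messages for hop h (hops numbered from 1); the dictionary layout and the counts are hypothetical. It does not answer the multi-topic question above, which is left open.

```python
def exponential_aggregate(hop_count_ri: dict[str, list[int]], fanout: int) -> dict[str, float]:
    """Collapse a hop-count RI (topic -> counts per hop) into one value per topic."""
    return {
        topic: sum(count / fanout ** hop for hop, count in enumerate(counts, start=1))
        for topic, counts in hop_count_ri.items()
    }


# Hypothetical hop-count RI for one neighbor: counts at hop 1 and hop 2.
eri = exponential_aggregate({"databases": [9, 27], "networks": [3, 0]}, fanout=3)
```

The point of the design is that only one number per topic is stored for each neighbor, instead of one number per topic per hop.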


Cycles in the P2P network (HW)


Improving Search in Peer-to-Peer Networks

ICDCS 2002

Beverly Yang, Hector Garcia-Molina


Outline

• Introduction

• Techniques

• Experiment


Introduction

• We present three techniques for efficient search in P2P systems.
– The basic idea is to reduce the number of nodes that process a query


Current Techniques

• Gnutella
– BFS with depth limit D.
– Wastes bandwidth and processing resources.

• Freenet
– DFS with depth limit D.
– Poor response time.


Iterative Deepening

• Under policy P = {a, b, c}; waiting time W

• See example.
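
A minimal sketch of iterative deepening under a policy of increasing depths. It is a simplification under stated assumptions: each iteration re-runs a depth-limited BFS from the source instead of resuming from the previous frontier, the waiting time W is represented only by a comment, and the graph, the `has_result` predicate, and the `wanted` threshold are hypothetical.

```python
from collections import deque


def nodes_within(graph, source, depth):
    """Depth-limited BFS: every node reachable from source in at most `depth` hops."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # depth limit reached, do not forward further
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return set(hops)


def iterative_deepening(graph, source, has_result, policy, wanted=1):
    """Try each depth in the policy until at least `wanted` results are found."""
    results, used_depth = [], None
    for depth in policy:
        used_depth = depth
        reached = nodes_within(graph, source, depth)
        results = sorted(n for n in reached if n != source and has_result(n))
        if len(results) >= wanted:
            break
        # A real system would wait up to W here before deepening the search.
    return used_depth, results


# Hypothetical chain topology A - B - C - D; only D holds a matching document.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
depth_used, found = iterative_deepening(graph, "A", lambda n: n == "D", policy=[1, 2, 3])
```

In this toy run the search stops at the third policy depth; a query that is satisfied close to the source never pays for the deeper iterations.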


Directed BFS

• A source sends query messages to just a subset of its neighbors

• A node maintains simple statistics on its neighbors
– Number of results received from each neighbor
– Latency of connection


Candidate nodes

• Returned the highest number of results

• Low hop-count

• High number of messages
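
Neighbor selection for Directed BFS can be sketched with a small statistics record per neighbor. This is an assumption-laden illustration: the slides list the criteria separately, while the code below combines them lexicographically (most results, then lowest hop-count, then most messages) purely for demonstration. `NeighborStats`, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_received: int = 0    # results returned through this neighbor
    avg_hops: float = 0.0        # average hop-count of its responses
    messages_forwarded: int = 0  # messages seen from this neighbor


def pick_neighbors(stats: dict[str, NeighborStats], k: int) -> list[str]:
    """Return the k neighbors to which the source forwards a query."""
    ranked = sorted(
        stats,
        key=lambda n: (
            -stats[n].results_received,
            stats[n].avg_hops,
            -stats[n].messages_forwarded,
        ),
    )
    return ranked[:k]


# Hypothetical statistics for three neighbors.
stats = {
    "n1": NeighborStats(results_received=12, avg_hops=3.0, messages_forwarded=40),
    "n2": NeighborStats(results_received=12, avg_hops=2.0, messages_forwarded=10),
    "n3": NeighborStats(results_received=1, avg_hops=1.0, messages_forwarded=90),
}
chosen = pick_neighbors(stats, k=2)
```

A single criterion can be used instead by reducing the sort key to one field.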


Local Indices

• Each node n maintains an index over the data of all nodes within a radius of r hops.

• All nodes at depths not listed in the policy simply forward the query.

• Example: policy P= { 1, 5}
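
The effect of a policy can be sketched by classifying depths and estimating which depths are answered. This is a simplified model: it assumes a tree-like topology in which a processing node at depth d can answer on behalf of nodes between depths d - r and d + r, and the radius value r = 2 used in the example is hypothetical since the slide does not give it.

```python
def classify_depths(policy, max_depth):
    """Map each depth 1..max_depth to 'process' (listed in the policy) or 'forward'."""
    listed = set(policy)
    return {d: ("process" if d in listed else "forward") for d in range(1, max_depth + 1)}


def covered_depths(policy, radius):
    """Depths whose data can be answered by some processing depth's local index."""
    covered = set()
    for depth in policy:
        covered.update(range(max(0, depth - radius), depth + radius + 1))
    return sorted(covered)


roles = classify_depths([1, 5], max_depth=5)
coverage = covered_depths([1, 5], radius=2)  # radius is an assumed value
```

Under these assumptions, only nodes at depths 1 and 5 process the query, yet depths 0 through 7 are covered by their indices.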


Experimental Setup

• For each response, we log:
– Number of hops taken
– IP from which the Response message came
– Response time
– Individual results


Experimental result


Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems

Kunwadee Sripanidkulchai

Bruce Maggs

Hui Zhang

IEEE INFOCOM 2003


Motivation

• Although flooding is simple and robust, it is not scalable.

• A content location solution in which peers are organized into an interest-based structure on top of Gnutella.

• The algorithm is called interest-based shortcuts


Interest-based locality


Shortcuts Architecture and Design Goals

• To create additional links on top of a peer-to-peer system’s overlay

• As a separate performance enhancement layer on top of existing content location mechanisms


Content location paths


Shortcut Discovery

• The first lookup returns a set of peers that store the content

• These are potential candidates.

• One peer is selected at random from the set and added as a shortcut

• For scalability, each peer allocates a fixed amount of storage to implement shortcuts.
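
Shortcut discovery with a fixed-size list can be sketched as below. The class name `ShortcutList`, the choice to insert new shortcuts at the front, and the eviction of the last entry when the list is full are all assumptions made for illustration; the slides state only that one peer is picked at random and that storage is bounded.

```python
import random
from typing import Optional


class ShortcutList:
    """Bounded list of shortcut peers (hypothetical structure)."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None):
        self.capacity = capacity
        self.peers: list[str] = []
        self._rng = rng or random.Random()

    def discover(self, peers_with_content: list[str]) -> Optional[str]:
        """Pick one peer at random from a lookup's result set and add it."""
        if not peers_with_content:
            return None
        peer = self._rng.choice(peers_with_content)
        if peer in self.peers:
            self.peers.remove(peer)   # avoid duplicates, refresh position
        self.peers.insert(0, peer)    # assumed policy: newest first
        del self.peers[self.capacity:]  # assumed policy: evict from the end
        return peer


shortcuts = ShortcutList(capacity=2, rng=random.Random(0))
for result_set in (["p1"], ["p2"], ["p3"]):
    shortcuts.discover(result_set)
```

After the three lookups above the list holds at most two peers, so storage stays constant no matter how many lookups a peer performs.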


Shortcut selection

• We rank shortcuts based on their perceived utility

• A peer sequentially asks all of the shortcuts on its list.
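
Sequential shortcut lookup can be sketched as a loop over the ranked list with a fallback to the underlying content location mechanism. The function name `locate`, the `has_content` predicate, and the `fallback` callable (standing in for, e.g., Gnutella flooding) are hypothetical.

```python
from typing import Callable, Optional


def locate(
    content: str,
    shortcuts: list[str],
    has_content: Callable[[str, str], bool],
    fallback: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Ask shortcuts in ranked order; use the underlying mechanism if none has it."""
    for peer in shortcuts:
        if has_content(peer, content):
            return peer, "shortcut"
    return fallback(content), "fallback"


# Hypothetical placement of content on peers.
holdings = {"p1": {"song.mp3"}, "p2": {"paper.pdf"}, "p9": {"video.avi"}}


def has_content(peer: str, content: str) -> bool:
    return content in holdings.get(peer, set())


def flood(content: str) -> Optional[str]:
    # Stand-in for the existing mechanism: scan every known peer.
    return next((p for p in sorted(holdings) if content in holdings[p]), None)


hit = locate("paper.pdf", ["p1", "p2"], has_content, flood)
miss = locate("video.avi", ["p1", "p2"], has_content, flood)
```

Because the fallback is only invoked after every shortcut fails, the shortcut layer stays a separate enhancement on top of the existing mechanism.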


Ranking metrics

• Probability of providing content

• Latency of the path to the shortcut

• Load at the shortcut

• A combination of metrics can be used based on each peer’s preference


Performance indices

• Success rate

• Load characteristics

• Query scope

• Minimum reply path lengths

• Additional state


Potential and Limitations

• Adding 5 shortcuts at a time produces success rates that are close to the best possible.

• Slightly increasing the shortest path length from 1 to 2 hops gives a better success rate.


Conclusion

• A simple and practical mechanism was proposed.


Similarity Discovery in structured P2P Overlays

ICPP


Introduction

• Structured P2P network
– Only supports search with a single keyword

• Similarity between two documents
– Keyword sets
– Vector space
– Measure

• Problems
– Search problem
– New keyword?

$\theta_{ab} = \cos^{-1}\left(\frac{a \cdot b}{|a|\,|b|}\right)$
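
The similarity measure between two keyword vectors, the angle given by the inverse cosine of their normalized dot product, can be computed as in the sketch below. The function name `angle_between` and the sample vectors are hypothetical; the clamp guards against floating-point values slightly outside [-1, 1].

```python
import math


def angle_between(a: list[float], b: list[float]) -> float:
    """Angle in radians between two keyword vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angle is undefined for a zero vector")
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Two documents sharing no keywords are orthogonal; parallel vectors are identical in direction.
orthogonal = angle_between([1, 0], [0, 1])
parallel = angle_between([1, 1], [2, 2])
```

A smaller angle means more similar documents, and vector length (for example, document size) does not affect the result.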


Meteorograph

• Absolute angle


Publishing and Searching

• Publish
– Hash
– Publish the item to a node np with the hash key closest to the hash value
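
Choosing the publishing node can be sketched as a nearest-key search. The hash function itself is not specified on the slide, so the sketch takes the hash value as an input. The linear (non-wrapping) key distance and the tie-break toward the smaller key are assumptions, and the key values are hypothetical.

```python
def closest_node(node_keys: list[int], hash_value: int) -> int:
    """Return the key of the node np whose hash key is closest to hash_value."""
    if not node_keys:
        raise ValueError("no nodes in the overlay")
    # Ties are broken toward the smaller key (an arbitrary, assumed rule).
    return min(node_keys, key=lambda k: (abs(k - hash_value), k))


# Hypothetical node keys and item hash value.
np_key = closest_node([10, 40, 90], hash_value=47)
```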


• Search problem
– Nearest answers
– k-nearest answers

• Partial

• Comprehensive

• Search strategy

• Discussions

• What happened when keyword vector is represented by ?


Other issues

• Load balance

• Changes of vector space
– Republished?
– Comprehensive set of keywords
– Other methods?


SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks

Farnoush Banaei-Kashani, Cyrus Shahabi

(CIKM04)


PDN access method

• Defines
– How to organize the PDN topology into an index-like structure
– How to use the index structure


Hilbert space

• Hilbert space (V, Lp)

• Key k = (a1, a2, …, ad)
– d: the dimension of the vector space
– The domain is a contiguous and finite interval of R

• The Lp norm, with p in Z+
– The distance function used to measure dissimilarity
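
The Lp distance between two keys can be written out directly. This is a small sketch; the function name `lp_distance` and the sample keys are hypothetical.

```python
def lp_distance(k1: tuple[float, ...], k2: tuple[float, ...], p: int) -> float:
    """Lp distance between two d-dimensional keys, for a positive integer p."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    if len(k1) != len(k2):
        raise ValueError("keys must have the same dimension d")
    return sum(abs(x - y) ** p for x, y in zip(k1, k2)) ** (1 / p)


manhattan = lp_distance((0, 0), (3, 4), p=1)
euclidean = lp_distance((0, 0), (3, 4), p=2)
```

A larger distance means the two tuples are more dissimilar; p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.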


Topology

• Topology of a PDN can be modelled as a directed graph G(N, E)

• A(n) is the set of neighbors for node n

• A node maintains
– A limited amount of information about its neighbors, which includes
  • The keys of the tuples maintained at neighbors
  • The physical addresses of neighbors


• The processing of the query is completed when all expected tuples in the relevant result set are visited

• Access methods
– Join, leave: for virtual nodes
– Forward: uses local information to process queries and make forwarding decisions


The small world example

• Grid component

• Random graph component

• Processing of queries (exact, range, kNN) in a topology with high locality


Flat partitioning

• SWAM also employs the space partitioning idea: flat partitioning


Query Processing

• Exact-Match query processing

• Range query processing

• kNN Query processing


Data Indexing in Peer-to-Peer DHT Networks

ICDCS 2004


• Locating data using incomplete information.
– How to search data in a DHT

• Data descriptors and queries
– Semi-structured XML data


– Query
  • Most specific query for d

• Relationship between queries


• Given the most specific query, finding the location of the file is simple

• How about less specific queries?

• Solution
– Provide a query-to-query service
  • For a given query q, the index service returns a list of more specific queries covered by q
– The DHT storage system must be extended
  • Insert(q, qi), q -> qi, adds a mapping (q; qi) to the index of the node responsible for key q.
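
The query-to-query service can be sketched with an in-memory stand-in for the DHT. Everything here is illustrative: `QueryIndex`, its `store` and `resolve` methods, and the query strings are hypothetical, and a plain dictionary replaces the nodes responsible for each key. Only `insert(q, qi)` corresponds to the operation named on the slide.

```python
from collections import defaultdict


class QueryIndex:
    """In-memory stand-in for a DHT extended with query-to-query mappings."""

    def __init__(self):
        self._index = defaultdict(list)  # q -> more specific queries covered by q
        self._files = {}                 # most specific query -> file location

    def insert(self, q: str, qi: str) -> None:
        """Add the mapping (q; qi) at the node responsible for key q."""
        if qi not in self._index[q]:
            self._index[q].append(qi)

    def store(self, most_specific_query: str, location: str) -> None:
        self._files[most_specific_query] = location

    def lookup(self, q: str) -> list[str]:
        """Return the more specific queries registered under q."""
        return list(self._index.get(q, []))

    def resolve(self, q: str) -> list[str]:
        """Follow mappings from q down to most specific queries; return file locations."""
        found, stack, seen = [], [q], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._files:
                found.append(self._files[current])
            stack.extend(self._index.get(current, []))
        return sorted(found)


# Hypothetical descriptors: a broad query refined in two steps.
dht = QueryIndex()
dht.store("author=Smith/title=P2P/year=2004", "node-17")
dht.insert("author=Smith", "author=Smith/title=P2P")
dht.insert("author=Smith/title=P2P", "author=Smith/title=P2P/year=2004")
locations = dht.resolve("author=Smith")
```

A user holding only partial information starts from a less specific query and iterates through the returned queries until a most specific one, and therefore the file, is reached.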