Searching in Unstructured Networks

Jan 19, 2016

macy

  • Searching in Unstructured Networks

    Joining Theory with P-P2P

  • What is P2P?

    Distributed system where all nodes play an equal role.

    In the "Internet world" homogeneity cannot be guaranteed (e.g. bandwidth, storage, processing power).

    Any such system requires an indexing or searching scheme in order to function.

  • P2P Architectures

    Centralized (Napster)

    Decentralized with Structure (all DHT architectures such as CAN, Chord, etc)

    Decentralized and Unstructured (Gnutella, Freenet). I call these...

  • Popular Peer to Peer (P-P2P)

    No centralized directory.

    No centralized control over object placement.

    No centralized control over network topology.

    Only the loosest guarantees and the weakest assumptions can be made.

  • Current Search Strategies

    Flooding (BFS)

      Reaches the most nodes (within depth D)

      Returns complete results

      Fastest brute-force search

      Does not scale

    Crawl (DFS)

      Successful searches terminate

      Minimal resources

      Poor response time: time increases exponentially in D
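
The trade-off between the two strategies can be sketched in a few lines. This is a minimal illustration, not any deployed protocol: the adjacency-dict `graph`, the `has_object` predicate, and the function names `flood` and `crawl` are all assumptions made here, and the message count simply charges one message per forwarded copy (including copies sent back toward the sender).

```python
from collections import deque

def flood(graph, source, has_object, max_depth):
    """Depth-limited flooding (BFS). Returns (hits, messages_sent)."""
    seen = {source}
    frontier = deque([(source, 0)])
    hits, messages = [], 0
    while frontier:
        node, depth = frontier.popleft()
        if has_object(node):
            hits.append(node)
        if depth == max_depth:
            continue                 # query has used up its depth budget
        for nbr in graph[node]:
            messages += 1            # every forwarded copy costs a message
            if nbr not in seen:      # duplicate copies are dropped on arrival
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return hits, messages

def crawl(graph, source, has_object, max_depth):
    """Depth-limited crawl (DFS). Stops at the first hit; returns (hit, messages_sent)."""
    seen = set()
    messages = 0

    def visit(node, depth):
        nonlocal messages
        seen.add(node)
        if has_object(node):
            return node
        if depth == max_depth:
            return None
        for nbr in graph[node]:
            if nbr not in seen:
                messages += 1        # one message per hop, one hop at a time
                found = visit(nbr, depth + 1)
                if found is not None:
                    return found
        return None

    return visit(source, 0), messages
```

Flooding returns every hit within the depth limit but pays for every edge it touches; the crawl stops at the first hit and sends far fewer messages, at the cost of visiting nodes strictly one after another.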

  • Outline

    Current Solutions

    A better approach by example

    A good first step

    Final thoughts

  • Current Solutions

    An Heuristic Approach to Heuristics

  • Directed BFS (DBFS)

    Keep state on search results returned from neighbours.

    Send the query only to those neighbours with the most successful history.

    Nodes in that subset then revert to standard BFS (flooding).

    Intelligently selecting a single neighbour produces success rates similar to BFS.
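
The neighbour-selection step can be sketched as follows. This is a minimal illustration under stated assumptions: "most successful history" is taken to mean the largest number of results returned so far, which is only one possible heuristic, and the class and method names are hypothetical.

```python
from collections import defaultdict

class DirectedBfsNode:
    """Keeps per-neighbour history and picks the best neighbours for a query."""

    def __init__(self, neighbours):
        self.neighbours = list(neighbours)
        self.results_from = defaultdict(int)   # neighbour -> results returned so far

    def record_results(self, neighbour, count):
        """Update the history after a neighbour returns `count` results."""
        self.results_from[neighbour] += count

    def pick_neighbours(self, fanout=1):
        """Return the `fanout` neighbours with the best history (ties keep list order)."""
        ranked = sorted(self.neighbours,
                        key=lambda n: self.results_from[n],
                        reverse=True)
        return ranked[:fanout]
```

For example, after `record_results("b", 5)` and `record_results("c", 2)` on a node with neighbours `["a", "b", "c"]`, `pick_neighbours(1)` selects `"b"`.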

  • Iterative Deepening

    Global policy, P = {a, b, c, …, W}

    AI technique called search over state space.

    Multiple BFSs with successively larger D.

    Requires globally unique identifier

    Reduces processing?

    Increases bandwidth?

    No analysis or intuition: where's the beef?
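
A simplified sketch of the idea, assuming the policy entries are successive depth limits. The waiting period W is left out, and each round re-floods from the source rather than resuming from the previous frontier, so this shows only the control flow; `nodes_within` and `iterative_deepening` are names invented here.

```python
from collections import deque

def nodes_within(graph, source, depth_limit):
    """Return {node: depth} for every node reachable within depth_limit hops."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == depth_limit:
            continue
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return depth

def iterative_deepening(graph, source, has_object, policy):
    """Run a BFS at each depth in `policy`, stopping at the first depth with a hit."""
    for depth_limit in policy:
        reached = nodes_within(graph, source, depth_limit)
        hits = sorted(n for n in reached if has_object(n))
        if hits:
            return depth_limit, hits
    return None, []
```

A query satisfied at a shallow depth never pays for the deeper floods, while an unlucky query pays for every round in the policy.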

  • Hierarchical Approach

    Nodes with higher bandwidth and processing power are designated super-nodes.

    In a graph, these nodes would have higher weight (or size).

    Connected via 2 types of edges:

      large-weight edges to other super-nodes

      smaller-weight edges to smaller nodes

    Founded on experience (e.g. the DNS).

  • Lead by Example:

    Replication in unstructured networks

  • Current Replication Strategies

    Owner Replication: the object is replicated at the node that requested it.

    Path Replication: the object is replicated at every node along the path from the object's origin to the requesting node.
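
The difference between the two strategies comes down to which nodes on the search path receive a copy. A minimal sketch, assuming the path is given as a list running from the requester (first element) to the node that supplied the object (last element); both function names are hypothetical.

```python
def owner_replication(path):
    """Copy the object only at the requester (first node of the search path)."""
    return {path[0]}

def path_replication(path):
    """Copy the object at every node on the path except the provider
    (last node), which already holds it."""
    return set(path[:-1])
```

For a path `["requester", "a", "b", "provider"]`, owner replication creates one new copy and path replication creates three.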

  • The Problem

    In an unstructured network, how many copies of each object should there be in order to minimize the cost of a search (assuming fixed storage)?

  • First start with the simplest case

    m objects, n nodes

    Each object, i, replicated at r_i random sites.

    Object i requested with rate q_i,

    Probability of successful search in k probes:
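
Under the assumption that each probe lands on a node chosen uniformly at random, independently of earlier probes (an assumption not stated on the slide), this probability takes a geometric form. A sketch, with p_i introduced here only as shorthand:

```latex
% Assumption: independent, uniformly random probes; p_i is shorthand introduced here.
\[
  p_i = \frac{r_i}{n}, \qquad
  \Pr[\text{first hit on probe } k] = p_i \,(1 - p_i)^{k-1}, \qquad
  \Pr[\text{hit within } k \text{ probes}] = 1 - (1 - p_i)^{k}
\]
```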

  • The simple case (cont)

    Average search size of i:

    The average messaging overhead during a query can be represented by A, the average search size over all objects:
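
Assuming independent, uniformly random probes, the search size for object i is the mean of a geometric distribution with success probability r_i/n. The sketch below also assumes the query rates are normalised to sum to 1, which the slides do not state; A_i is a symbol introduced here for the per-object search size.

```latex
% Assumptions: independent uniform probes; \sum_i q_i = 1; A_i is notation introduced here.
\[
  A_i = \sum_{k \ge 1} k \,\frac{r_i}{n}\left(1 - \frac{r_i}{n}\right)^{k-1} = \frac{n}{r_i},
  \qquad
  A = \sum_{i=1}^{m} q_i A_i = n \sum_{i=1}^{m} \frac{q_i}{r_i}
\]
```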

  • Replication Strategies

    Assume average replicas per site

    Uniform:

    Proportional:
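
One way to write the two allocations, assuming R denotes the total number of replicas in the system (so the average per site is R/n), that the query rates sum to 1, and that the average search size is A = n Σ q_i / r_i. The symbols R and ρ are assumptions introduced here:

```latex
% Assumptions: R = \sum_i r_i, \rho = R/n, \sum_i q_i = 1, A = n \sum_i q_i / r_i.
\[
  \text{Uniform: } r_i = \frac{R}{m}
  \;\Longrightarrow\;
  A = n \sum_{i=1}^{m} \frac{q_i\, m}{R} = \frac{n m}{R} = \frac{m}{\rho}
\]
\[
  \text{Proportional: } r_i = R\, q_i
  \;\Longrightarrow\;
  A = n \sum_{i=1}^{m} \frac{q_i}{R\, q_i} = \frac{n m}{R} = \frac{m}{\rho}
\]
```

Under these assumptions the two strategies give the same average search size, even though they distribute replicas very differently.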

  • Square-root Replication

    Put as a minimization problem, what is the optimal allocation of replicas so that A is minimized [Kleinrock, 1976]?

    This occurs when

    So,
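
As the slide title suggests, the minimising allocation makes r_i proportional to the square root of q_i. A small numeric check of that claim, assuming A = n * sum(q_i / r_i), query rates that sum to 1, and fractional replica counts (all assumptions made for illustration); `average_search_size` and `allocate` are hypothetical helper names.

```python
def average_search_size(q, r, n):
    """A = n * sum(q_i / r_i), assuming independent uniformly random probes."""
    return n * sum(qi / ri for qi, ri in zip(q, r))

def allocate(q, total, exponent):
    """Split `total` replicas with r_i proportional to q_i ** exponent.

    exponent 0   -> uniform
    exponent 1   -> proportional
    exponent 0.5 -> square-root
    """
    weights = [qi ** exponent for qi in q]
    scale = total / sum(weights)
    return [w * scale for w in weights]

if __name__ == "__main__":
    q = [0.64, 0.36]          # query rates (sum to 1)
    n, total = 100, 70        # nodes and total replicas, chosen arbitrarily
    for name, exponent in [("uniform", 0), ("proportional", 1), ("square-root", 0.5)]:
        r = allocate(q, total, exponent)
        print(name, r, average_search_size(q, r, n))
```

With these illustrative numbers, uniform and proportional allocation both give A = 100/35 (about 2.86), while the square-root allocation gives A = 2.8.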

  • Searching in Unstructured Networks: Revisited

  • Random Walks

    A query, at every node, is forwarded to a randomly chosen neighbour until the query succeeds.

    A strong body of knowledge exists:

      understand and quantify resource trade-offs? (i.e. How many walkers? How many hops?)

      gain insight into the merits of tweaks

      helps us avoid guessing, and trial & error
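
The forwarding rule can be sketched as a k-walker random walk, which makes the two resource knobs (walkers and hops) explicit. This is a minimal illustration: the `max_hops` cut-off is an assumption added here so the loop always terminates, and the function name and adjacency-dict `graph` are hypothetical.

```python
import random

def random_walk_search(graph, source, has_object, walkers=1, max_hops=100, rng=None):
    """k-walker random walk. Returns (found_node or None, messages_sent)."""
    rng = rng or random.Random()
    if has_object(source):
        return source, 0
    positions = [source] * walkers
    messages = 0
    for _ in range(max_hops):
        for w in range(walkers):
            # Each walker forwards the query to one randomly chosen neighbour.
            positions[w] = rng.choice(graph[positions[w]])
            messages += 1
            if has_object(positions[w]):
                return positions[w], messages
    return None, messages
```

More walkers shorten the expected time to a hit but multiply the messages sent per step, which is the trade-off the slide asks to quantify.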

  • Gossip

    "She tells 2 friends, and they tell 2 friends, and ..." [VO5]

    Randomly select k neighbours to whom queries are forwarded.

    A young and relatively unexplored solution that (as yet) lacks significant analysis and understanding.

    Perhaps this is the perfect opportunity.
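
A minimal sketch of the forwarding rule, assuming that only nodes seeing the query for the first time forward it and that a `max_rounds` bound stops the spread; both are assumptions added here, as are the function name and the adjacency-dict `graph` (whose values must be lists).

```python
import random

def gossip_search(graph, source, has_object, k=2, max_rounds=10, rng=None):
    """Each newly informed node forwards the query to k random neighbours.

    Returns (sorted hits among informed nodes, messages_sent).
    """
    rng = rng or random.Random()
    informed = {source}
    frontier = [source]
    messages = 0
    for _ in range(max_rounds):
        next_frontier = []
        for node in frontier:
            nbrs = graph[node]
            # Pick k distinct neighbours (or all of them if there are fewer than k).
            for nbr in rng.sample(nbrs, min(k, len(nbrs))):
                messages += 1
                if nbr not in informed:
                    informed.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        if not frontier:
            break
    hits = sorted(n for n in informed if has_object(n))
    return hits, messages
```

When k is at least every node's degree this degenerates into flooding; smaller k trades coverage for fewer messages.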

  • Final Thoughts

    DHTs cannot supplant P-P2P applications without large breakthroughs in hashing (or ad-hoc solutions to search problems).

    P-P2P inherently suffers from unscalable or unfinished searches.

    An unstructured P2P network is still a graph; capitalise on prior work & knowledge.