Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR
Christos Tryfonopoulos & Manolis Koubarakis
Intelligent Systems Lab
Dept. of Electronic & Computer Engineering
Technical University of Crete, Greece
http://www.intelligence.tuc.gr
Overview
Motivation
Distributed resource sharing
The DHTrie protocols
Local filtering algorithms
Conclusions
Motivation
Resource sharing is at the core of today’s computing (Web, P2P, Grid).
One-time as well as continuous querying functionality is needed.
Data models and languages based on Information Retrieval are useful for annotating and querying resources.
Many mature technologies to build on (e.g., overlay networks, agents).
The DHTrie protocols
Subscribing with a continuous query
Assume a query q of the form:
$$q = (A_1 \sim s_1) \wedge \dots \wedge (A_m \sim s_m) \wedge (A_{m+1} \sqsupseteq wp_{m+1}) \wedge \dots \wedge (A_n \sqsupseteq wp_n)$$
Then, for a randomly chosen attribute $A_i$ and a randomly chosen word $w_j$ contained in either $s_i$ or $wp_i$, we create the string $A_i w_j$ (the concatenation of $A_i$ and $w_j$) and use it as the key to forward the query to the peer with ID = $H(A_i w_j)$.
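A minimal sketch of this indexing step, assuming a Chord-like DHT in which "the peer with ID = H(A_i w_j)" is taken to be the successor of that hash on the identifier ring. The hash function (SHA-1 truncated to M bits), the ring size, the helper names H, successor and subscription_key, and the treatment of s_i and wp_i as plain whitespace-separated words are all illustrative assumptions.

```python
import hashlib
import random
from bisect import bisect_left

M = 32  # hypothetical identifier length in bits; IDs live in [0, 2**M)


def H(key: str) -> int:
    """Map a string onto the identifier ring (SHA-1 truncated to M bits, an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(ring: list[int], key_id: int) -> int:
    """Chord-style responsibility: first peer ID >= key_id, wrapping around the ring."""
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


def subscription_key(query: dict[str, str], rng: random.Random) -> str:
    """Pick a random attribute A_i and a random word w_j of its value; return A_i w_j."""
    attribute = rng.choice(sorted(query))
    word = rng.choice(query[attribute].split())
    return attribute + word


# Hypothetical query: attribute -> text value s_i or word pattern wp_i (plain words only here).
query = {"title": "distributed hash tables", "author": "koubarakis"}
peers = sorted(H(f"peer-{n}") for n in range(64))  # 64 hypothetical peer IDs

key = subscription_key(query, random.Random(7))
target = successor(peers, H(key))
print(f"index key {key!r} -> peer {target}")
```

Because the key is chosen at random, two identical subscriptions may be stored at different peers; this is why the publishing side has to contact every candidate peer.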
The DHTrie protocols (cont’d)
Publishing a resource
Assume a publication p of the form:
$$p = \{(A_1, s_1), (A_2, s_2), \dots, (A_m, s_m)\}$$
Obtain a list of peer IDs by hashing the string $A_i w_j$ for every word $w_j$ and every attribute $A_i$ in p (this is necessary to ensure correctness, since a query may have been indexed under any of its attribute-word pairs). Use indirect message passing and the DHT infrastructure to forward the message.
The receiving node contacts the neighbors included in the recipient list, removes them from the list, and forwards the message.
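Computing the recipient list for a publication can be sketched as below, assuming a Chord-style DHT in which a key is handled by the successor of its truncated SHA-1 hash on the ring. The helper names and the example publication are hypothetical, and each s_i is treated as whitespace-separated words. Every attribute-word pair of p is hashed, and pairs that map to the same peer are deduplicated.

```python
import hashlib
from bisect import bisect_left

M = 32  # hypothetical identifier length in bits


def H(key: str) -> int:
    """Map a string onto the identifier ring (SHA-1 truncated to M bits, an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(ring: list[int], key_id: int) -> int:
    """First peer ID >= key_id on the sorted ring, wrapping around."""
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


def publication_recipients(publication: list[tuple[str, str]], ring: list[int]) -> list[int]:
    """Hash A_i w_j for every attribute and every word of p; return the distinct responsible peers."""
    recipients = set()
    for attribute, text in publication:
        for word in text.split():
            recipients.add(successor(ring, H(attribute + word)))
    return sorted(recipients)


# Hypothetical publication p = {(A_1, s_1), (A_2, s_2)}
publication = [("title", "publish subscribe with distributed hash tables"),
               ("author", "tryfonopoulos koubarakis")]
peers = sorted(H(f"peer-{n}") for n in range(64))

recipients = publication_recipients(publication, peers)
print(f"{len(recipients)} distinct peers must receive p")
```

Skipping any of these peers could miss a query that happened to be indexed under the corresponding attribute-word pair.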
Direct message passing
The traditional way to forward a message to more than one recipient:
Send a lookup() message for each recipient. For k recipients we need O(k log N) lookup messages.
Multicast techniques are not applicable, since the group of peers to be contacted is not known a priori.
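The O(k log N) cost of direct message passing can be illustrated with a small simulation. It assumes a static Chord ring with complete finger tables as a representative DHT; the identifier size M, the N = 256 peers, the k = 20 recipients and the random seed are hypothetical, and the class ChordRing is an illustration, not part of the DHTrie protocols. Each recipient is reached by an independent lookup from the same source, so routing work is repeated for every recipient.

```python
import math
import random
from bisect import bisect_left

M = 16            # hypothetical identifier length in bits
RING = 2 ** M


def between(x: int, a: int, b: int) -> bool:
    """True if x lies in the open ring interval (a, b), which may wrap past zero."""
    if a < b:
        return a < x < b
    return x > a or x < b


class ChordRing:
    """Static Chord ring with complete finger tables (no churn); illustration only."""

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)

    def successor(self, key: int) -> int:
        i = bisect_left(self.nodes, key % RING)
        return self.nodes[i % len(self.nodes)]

    def closest_preceding(self, node: int, key: int) -> int:
        # Finger i of node is successor(node + 2**i); scan from the farthest finger down.
        for i in reversed(range(M)):
            finger = self.successor(node + 2 ** i)
            if between(finger, node, key):
                return finger
        return self.successor(node + 1)

    def lookup(self, start: int, key: int) -> tuple[int, int]:
        """Route from start to the peer responsible for key; return (peer, hops)."""
        target = self.successor(key)
        node, hops = start, 0
        while node != target:
            succ = self.successor(node + 1)
            if key == succ or between(key, node, succ):
                node = succ                      # key in (node, succ]: last hop
            else:
                node = self.closest_preceding(node, key)
            hops += 1
        return node, hops


rng = random.Random(1)
ring = ChordRing(rng.sample(range(RING), 256))   # N = 256 hypothetical peers
source = ring.nodes[0]
recipients = rng.sample(ring.nodes[1:], 20)      # k = 20 recipients

# Direct message passing: one independent lookup per recipient.
total_hops = sum(ring.lookup(source, r)[1] for r in recipients)
bound = len(recipients) * math.log2(len(ring.nodes))
print(f"k={len(recipients)}  total hops={total_hops}  k*log2(N)={bound:.0f}")
```

Comparing the total with k log2 N gives a feel for the bound; since Chord lookups often take fewer than log2 N hops, the printed total is usually below this rough estimate.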
Indirect message passing
Incorporate the recipient list into the message.
Avoid asking the same routing question more than once.
Opportunistic forwarding.
Increase in message size due to:
Publication size: process the publication (remove stopwords, apply stemming) and use an inverted (and compressed) index.
Recipient list size: use gap compression (send gaps instead of full peer IDs).
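Gap compression of the recipient list can be sketched as follows: sort the peer IDs, transmit the first ID and then only the differences between consecutive IDs, and encode each difference with a variable-byte code so that small gaps take few bytes. The variable-byte code, the 32-bit example IDs and the function names are assumptions for illustration; the slide does not specify which integer code DHTrie uses.

```python
def encode_varint(n: int) -> bytes:
    """Variable-byte (LEB128-style) encoding: 7 payload bits per byte, high bit = 'more'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data: bytes) -> list[int]:
    """Decode a concatenation of variable-byte integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def compress_recipients(peer_ids) -> bytes:
    """Sort the IDs, then store the first ID followed by the gaps between neighbours."""
    ids = sorted(set(peer_ids))
    if not ids:
        return b""
    gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return b"".join(encode_varint(g) for g in gaps)


def decompress_recipients(data: bytes) -> list[int]:
    """Rebuild the sorted ID list by prefix-summing the decoded gaps."""
    ids, running = [], 0
    for gap in decode_varints(data):
        running += gap
        ids.append(running)
    return ids


recipients = [1_003_000, 1_000_000, 2_500_000, 1_000_450]   # hypothetical 32-bit peer IDs
packed = compress_recipients(recipients)
print(len(packed), "bytes instead of", 4 * len(recipients))  # 10 bytes instead of 16
```

The saving grows as the recipient list becomes denser, because the gaps shrink while a full ID always costs the same number of bytes.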
The DHTrie protocols
Notifying interested subscribers
To find all matching queries at a peer, we use the filtering algorithm BestFitTrie.
[Tryfonopoulos, Koubarakis, Drougas, SIGIR 2004]
Once all matching queries are found, a notification message is created and forwarded to peers using indirect message passing.
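One way to build the notification message is sketched below, assuming the local filtering step returns the IDs of matching queries and that each query records the peer that posted it. The grouping by subscriber peer, the clockwise ordering of the recipient list, and the message layout (build_notification, subscriber_of, the dict fields) are hypothetical; the resulting recipient list is what indirect message passing would then carry along.

```python
from collections import defaultdict

RING = 2 ** 16  # hypothetical identifier space


def build_notification(doc_id: str, matching: list[str], subscriber_of: dict[str, int],
                       current_peer: int) -> dict:
    """Group matched queries by subscriber peer and build one notification message.

    Recipients are ordered by clockwise ring distance from current_peer (an assumption).
    """
    by_peer = defaultdict(list)
    for qid in matching:
        by_peer[subscriber_of[qid]].append(qid)
    recipients = sorted(by_peer, key=lambda p: (p - current_peer) % RING)
    return {
        "doc": doc_id,
        "recipients": recipients,                      # peers still to be notified
        "queries": {p: sorted(by_peer[p]) for p in recipients},
    }


# Hypothetical output of the local filtering step at peer 30000.
msg = build_notification("d42", ["q1", "q2", "q3"],
                         {"q1": 500, "q2": 40000, "q3": 500}, current_peer=30000)
print(msg["recipients"])  # [40000, 500]
```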
Filtering algorithms at each super-peer
Query clustering algorithm BestFitTrie
The data structure is a hash table of tries; the hash table provides fast access to the trie roots.
We search for the best place to store a query q in two phases:
1. Best position trie-wise
2. Best position forest-wise
The matching procedure examines only the tries whose roots are contained in the incoming document.
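The hash-table-of-tries structure and the two placement phases can be sketched as below. This is a simplified illustration in the spirit of BestFitTrie, not the published algorithm: it assumes queries are conjunctions of words, searches every trie whose root word occurs in the query for the deepest node reachable through the query's words (phase 1), keeps the deepest of these (phase 2), and starts a new trie under the alphabetically smallest word when no trie applies. Class names, tie-breaking and the root choice are assumptions.

```python
class Node:
    def __init__(self, word: str):
        self.word = word
        self.children: dict[str, "Node"] = {}
        self.queries: list[str] = []          # IDs of queries stored at this node


class WordSetForest:
    """Simplified set-based trie forest in the spirit of BestFitTrie (illustration only)."""

    def __init__(self):
        self.roots: dict[str, Node] = {}      # hash table: root word -> trie root

    def _best_in_trie(self, node: Node, remaining: frozenset):
        """Deepest node reachable via children whose words are still in the query."""
        best = (0, node, remaining)
        for word, child in node.children.items():
            if word in remaining:
                depth, deeper, rest = self._best_in_trie(child, remaining - {word})
                if depth + 1 > best[0]:
                    best = (depth + 1, deeper, rest)
        return best

    def insert(self, qid: str, words: set[str]) -> None:
        words = frozenset(words)
        best = None
        # Phase 2 (forest-wise): compare the best position in every trie whose root is in q.
        for w in sorted(words):
            if w in self.roots:
                candidate = self._best_in_trie(self.roots[w], words - {w})  # phase 1
                if best is None or candidate[0] > best[0]:
                    best = candidate
        if best is None:                       # no usable trie: start a new one
            root_word = min(words)             # arbitrary root choice (assumption)
            node = self.roots[root_word] = Node(root_word)
            rest = words - {root_word}
        else:
            _, node, rest = best
        for w in sorted(rest):                 # hang the uncovered words below the best node
            node = node.children.setdefault(w, Node(w))
        node.queries.append(qid)

    def match(self, document_words) -> set[str]:
        """Return IDs of queries whose words all occur in the document."""
        doc = set(document_words)
        found = set()
        stack = [self.roots[w] for w in doc if w in self.roots]  # only tries rooted in doc
        while stack:
            node = stack.pop()
            found.update(node.queries)
            stack.extend(child for word, child in node.children.items() if word in doc)
        return found


forest = WordSetForest()
forest.insert("q1", {"dht", "pubsub"})
forest.insert("q2", {"dht", "pubsub", "trie"})
forest.insert("q3", {"filtering", "trie"})
print(sorted(forest.match({"dht", "pubsub", "trie", "peer"})))  # ['q1', 'q2']
```

Because every node on a query's path carries one of its words, a document matches a stored query exactly when the traversal, which only enters children whose word is in the document, reaches the node holding that query.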
Filtering algorithms at each super-peer
PrefixTrie: Prefix-based clustering (handle a query as a sequence of words)
BestFitTrie: Set-based clustering (handle a query as a set of words)
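The difference between the two clustering styles shows up in how many trie nodes queries can share. The sketch below counts the nodes of a prefix-based trie, assuming (for illustration) that each query's words are sorted lexicographically before insertion; the function name is hypothetical.

```python
def prefix_trie_nodes(queries: list[set[str]]) -> int:
    """Count trie nodes when each query is inserted as a sorted word sequence.

    Prefix-based clustering: two queries share nodes only along a common prefix.
    Sorting the words lexicographically is an assumption made for this sketch.
    """
    root: dict = {}
    count = 0
    for words in queries:
        node = root
        for w in sorted(words):
            if w not in node:
                node[w] = {}
                count += 1
            node = node[w]
    return count


queries = [{"dht", "peer", "trie"}, {"dht", "filter", "peer", "trie"}]
print(prefix_trie_nodes(queries))  # 6
```

Here filter sorts between dht and peer, so the two sequences share only the prefix dht and 6 nodes are needed. A set-based scheme that is free to reorder words could place the second query below the path dht, peer, trie and add a single node for filter, 4 nodes in total.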
Filtering algorithms at each super-peer
[Figure: comparison of BestFitTrie and PrefixTrie; data series labelled 1M and 3M]
Other interesting issues
Load balancing
The frequency of occurrence of words may overload certain peers.
Index queries under infrequent words.
Use controlled replication.
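A minimal sketch of indexing a query under an infrequent word: instead of choosing a word at random, pick the query word with the lowest estimated frequency, so that fewer queries pile up at the peer responsible for a popular word. Estimating frequencies from a small sample of publications, and the function name choose_index_word, are assumptions; controlled replication is not shown.

```python
from collections import Counter


def choose_index_word(query_words: set[str], frequency: Counter) -> str:
    """Pick the least frequent word of the query (ties broken alphabetically)."""
    return min(query_words, key=lambda w: (frequency[w], w))


# Hypothetical frequency estimate gathered from a sample of publications.
sample_docs = ["peer to peer networks", "peer review", "distributed hash tables for peer networks"]
frequency = Counter(w for doc in sample_docs for w in doc.split())

print(choose_index_word({"peer", "hash", "networks"}, frequency))  # hash
```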
Word frequency computation
Also useful for other types of queries (VSM).
Global vs. local ranking schemes.
Propose a hybrid ranking scheme, with updating and