Distributed Cache Table (DCT): Efficient Query-Driven Processing of Multi-Term Queries in P2P Networks
Gleb Skobeltsyn, Karl Aberer
EPFL Ecole Polytechnique Fédérale de Lausanne, Switzerland
P2PIR’2006, collocated with CIKM’06, Arlington VA, USA
Nov 11, 2006
How the naïve approach works (1)
• Naïve approach 1: store the terms’ inverted lists in a DHT
• An inverted list contains document IDs.
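[Figure: DHT peers, each holding a table with columns K (key) and I (inverted list). Stored entries: (h(T1), {I1,I2}), (h(T2), {I2,I3}), (h(T3), {I4,I5}). Query: “T1 AND T2”. Labels on the query path: {I1,I2}, then {I2}.]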
This slide was borrowed from the presentation by B. T. Loo, J. M. Hellerstein, R. Huebsch, S. Shenker, I. Stoica: “Enhancing P2P File-Sharing with an Internet-Scale Query Processor”
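The naïve approach can be illustrated with a minimal single-process sketch. It assumes a toy hash function `h` that maps a term to one of `NUM_PEERS` peers (both names are hypothetical stand-ins for real DHT routing) and answers an AND query by intersecting the inverted lists of its terms. It is not the implementation from the talk or from the borrowed slide.

```python
import hashlib

NUM_PEERS = 8  # hypothetical network size, for illustration only


def h(term: str) -> int:
    """Toy stand-in for DHT routing: map a term to a peer index."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PEERS


# One dictionary per peer: term -> inverted list (set of document ids).
peers = [dict() for _ in range(NUM_PEERS)]


def publish(term, doc_ids):
    """Store (part of) a term's inverted list at the peer responsible for it."""
    peers[h(term)].setdefault(term, set()).update(doc_ids)


def lookup(term):
    """Fetch a term's inverted list from the responsible peer."""
    return peers[h(term)].get(term, set())


def query_and(*terms):
    """Answer 'T1 AND T2 AND ...' by intersecting the terms' inverted lists."""
    if not terms:
        return set()
    result = set(lookup(terms[0]))
    for term in terms[1:]:
        result &= lookup(term)
    return result


# The entries shown in the figure.
publish("T1", {"I1", "I2"})
publish("T2", {"I2", "I3"})
publish("T3", {"I4", "I5"})

print(query_and("T1", "T2"))  # {'I2'}
```

In a real network each `lookup` is a remote call, so whole inverted lists travel between peers for every multi-term query.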
• Meta-index is based on the standard DHT indexing functionality.
• Index update: if a peer π caches a query q, it advertises the cache's availability in the meta-index: it inserts a tuple {q -> address(π)} at the peer responsible for a random term from q.
• Lookup: if a query q = t1 & t2 & … & tn is submitted, every peer responsible for one of t1, t2, …, tn is asked to provide the set of caches it indexes that subsume q. One of them (if any) is chosen randomly.
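The update and lookup steps can be sketched as follows. This is a simplified single-process model, not the authors' code: the class `MetaIndex` and its method names are hypothetical, the dictionary `by_term` stands in for the peers responsible for each term, and the sketch assumes that a cached conjunctive query subsumes q when its term set is a subset of q's terms.

```python
import random


class MetaIndex:
    """Toy meta-index: by_term[t] models the tuples stored at the peer
    responsible for term t (hypothetical structure, for illustration)."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.by_term = {}  # term -> list of (cached query, peer address)

    def advertise(self, query_terms, peer_address):
        """Index update: store {q -> address} under one random term of q."""
        q = frozenset(query_terms)
        term = self.rng.choice(sorted(q))
        self.by_term.setdefault(term, []).append((q, peer_address))

    def lookup(self, query_terms):
        """Ask the peer of every term of q for caches that subsume q,
        then pick one of the answers at random (None if there are none)."""
        q = frozenset(query_terms)
        candidates = []
        for term in sorted(q):
            for cached_q, address in self.by_term.get(term, []):
                # ASSUMPTION: subsumption means cached terms are a subset of q.
                if cached_q <= q:
                    candidates.append((cached_q, address))
        if not candidates:
            return None
        return self.rng.choice(candidates)


index = MetaIndex(random.Random(0))
index.advertise({"a", "b"}, "peer-1")
print(index.lookup({"a", "b", "c"}))  # the cache for {a, b} subsumes a&b&c
print(index.lookup({"a"}))            # None: {a, b} does not subsume a
```

Because the tuple is stored under a single term of the cached query, the lookup has to contact the peers of all terms of q to be sure of finding it.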
• Each peer provides some storage space s0 for caches
• Caches with low profits are evicted:
profit(q)=popularity(q) / (|RS(q)|+1)
• Every time a peer has to broadcast a query, it tries to cache it
• The query q with the result set size |RS(q)| is cached if:
– There is enough free space to store |RS(q)|,
– There is NOT enough free space but the least
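The profit formula and the admission rule can be sketched as below. The first admission case follows the slide directly. The second case is cut off in the transcript, so the eviction rule used here (evict the lowest-profit caches only while their profit is below the candidate's) is an assumption, as are the names `PeerCache` and `try_cache` and the choice to measure storage in result-set entries.

```python
from dataclasses import dataclass


def profit(popularity, result_size):
    """profit(q) = popularity(q) / (|RS(q)| + 1), as given on the slide."""
    return popularity / (result_size + 1)


@dataclass
class CacheEntry:
    query: frozenset
    result_size: int   # |RS(q)|
    popularity: float  # popularity(q); how it is measured is an assumption


class PeerCache:
    def __init__(self, capacity):
        self.capacity = capacity  # s0, counted in result-set entries (assumption)
        self.entries = {}         # query -> CacheEntry

    def used(self):
        return sum(e.result_size for e in self.entries.values())

    def try_cache(self, query, result_size, popularity):
        """Return True if the query ends up cached."""
        q = frozenset(query)
        if q in self.entries:
            self.entries[q].popularity = popularity
            return True
        free = self.capacity - self.used()
        if result_size <= free:
            # Case 1 (from the slide): enough free space to store |RS(q)|.
            self.entries[q] = CacheEntry(q, result_size, popularity)
            return True
        # Case 2: ASSUMPTION, since the slide text is truncated here.
        # Evict lowest-profit caches, but only those less profitable than q.
        candidate_profit = profit(popularity, result_size)
        by_profit = sorted(self.entries.values(),
                           key=lambda e: profit(e.popularity, e.result_size))
        victims = []
        for entry in by_profit:
            if profit(entry.popularity, entry.result_size) >= candidate_profit:
                break
            victims.append(entry)
            free += entry.result_size
            if free >= result_size:
                break
        if free < result_size:
            return False
        for victim in victims:
            del self.entries[victim.query]
        self.entries[q] = CacheEntry(q, result_size, popularity)
        return True
```

The `+1` in the denominator keeps the profit finite for queries with an empty result set, so popular queries with no results can still be cached cheaply.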
• Distributed Cache Table: a (quite) large-scale distributed cache for P2P IR applications based on both:
– Query load
– Data distribution
• Properties:
– Efficiently utilizes and adapts to the available storage space
– Trade-off between huge index size and extra traffic costs for broadcasting rare queries
– Subsumption is important: resilient to query load changes
– Sufficiently load balanced
– Requires 1-2 orders of magnitude less traffic than the naïve approach
– Requires substantially less storage than a per-term index