Paraskevi Raftopoulou 1,2 and Euripides G.M. Petrakis 2
1 Max-Planck Institute for Informatics, Saarbruecken, Germany http://www.mpi-inf.mpg.de/
2 Technical University of Crete, Chania, Greece http://www.intelligence.tuc.gr/
A Measure for Cluster Cohesion in Semantic Overlay Networks
Workshop on Large-Scale Distributed Systems for Information Retrieval Napa Valley, California, 30 October, 2008
Paraskevi Raftopoulou Max-Planck Institute for Informatics & Technical University of Crete
Outline
Motivation & Related work
Distributed resource sharing
iCluster architecture
Measuring clustering quality
Experimental evaluation
Conclusion
Motivation & Related work
Motivation
Resource sharing is at the core of today's computing (Web, P2P, Grid)
Information retrieval functionality is needed
Overlay networks are a good technology to build on
Measures are used for evaluating network organisation and retrieval efficiency
Peer rewiring
A peer p
1. computes its intra-cluster similarity (average similarity with its neighbours)
2. initiates rewiring if similarity < threshold θ
3. sends a message (msg) with its interest to m neighbours
All peers receiving msg append their interest and forward msg to m neighbours
The message is sent back to p when TTL τR = 0
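The rewiring steps can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not stated on the slide: interests are numeric vectors compared with cosine similarity, the m neighbours are picked at random, and the interests carried by all copies of the message are merged into one dictionary instead of being returned to p as separate messages. The names `cosine`, `intra_cluster_similarity`, and `rewiring_walk` are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intra_cluster_similarity(p, neighbours, interest):
    """Step 1: average similarity between p and its current neighbours."""
    nbrs = neighbours[p]
    if not nbrs:
        return 0.0
    return sum(cosine(interest[p], interest[n]) for n in nbrs) / len(nbrs)


def rewiring_walk(p, neighbours, interest, theta, m, ttl, rng=random):
    """Steps 2 and 3: collect {peer: interest} along the rewiring message.

    Returns an empty dict when p's intra-cluster similarity is already
    at or above theta, i.e. when no rewiring is initiated.
    """
    if intra_cluster_similarity(p, neighbours, interest) >= theta:
        return {}
    collected = {}
    frontier = [p]
    while ttl > 0 and frontier:
        next_frontier = []
        for peer in frontier:
            nbrs = neighbours[peer]
            # Assumption: the m receivers are chosen uniformly at random.
            for target in rng.sample(nbrs, min(m, len(nbrs))):
                if target != p and target not in collected:
                    collected[target] = interest[target]  # peer appends its interest
                    next_frontier.append(target)
        frontier = next_frontier
        ttl -= 1  # one hop consumed
    return collected
```

With the collected interests, p could then pick new neighbours whose interests are closer to its own; that selection step is not shown on this slide and is left out of the sketch.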
Query processing
A peer p
1. compares q against its interests & selects the interest int most similar to q
2. if similarity ≥ threshold θ, forwards a message (msg) including q to all its neighbours with TTL τb
3. if similarity < threshold θ, forwards msg to the m of its neighbours most similar to q
All peers receiving msg follow the same process
The message is forwarded until TTL τf = 0
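The following Python sketch shows one possible reading of these steps. It rests on assumptions that go beyond the slide: interests and queries are vectors compared with cosine similarity, a peer knows its neighbours' interests, each peer handles a query only once, and once a similar peer starts a broadcast the message keeps flooding for τb hops. The function names and the `(peer, ttl, broadcasting)` message layout are hypothetical.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_similarity(q, peer_interests):
    """Step 1: similarity between q and the peer interest closest to q."""
    return max((cosine(q, v) for v in peer_interests), default=0.0)


def process_query(start, q, neighbours, interests, theta, m, tau_f, tau_b):
    """Return the set of reached peers whose best interest is similar to q."""
    matched, seen = set(), set()
    # Each entry: (peer, hops the message may still travel, broadcast mode?)
    queue = deque([(start, tau_f, False)])
    while queue:
        peer, ttl, broadcasting = queue.popleft()
        if peer in seen:
            continue  # simplification: each peer handles the query once
        seen.add(peer)
        sim = best_similarity(q, interests[peer])
        if sim >= theta:
            matched.add(peer)
        if broadcasting:
            if ttl > 0:
                queue.extend((n, ttl - 1, True) for n in neighbours[peer])
        elif sim >= theta:
            # Step 2: similar peer, broadcast to all neighbours with TTL tau_b.
            if tau_b > 0:
                queue.extend((n, tau_b - 1, True) for n in neighbours[peer])
        elif ttl > 0:
            # Step 3: forward to the m neighbours most similar to q.
            ranked = sorted(
                neighbours[peer],
                key=lambda n: best_similarity(q, interests[n]),
                reverse=True,
            )
            queue.extend((n, ttl - 1, False) for n in ranked[:m])
    return matched
```

The split into a forwarding phase bounded by τf and a broadcast phase bounded by τb is a design choice of this sketch; the slide only names the two TTLs.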
Measuring clustering quality
Clustering coefficient
The ratio of the number of links between the peers within pi's neighborhood to the number of links that could possibly exist between them:

\[ c_i = \frac{\left|\{(p_j, p_k)\}\right|}{s\,(s-1)}, \qquad p_j, p_k \in RI_i,\ p_k \in RI_j \]

where s is the number of short-range links of pi
[Figure: four example neighbourhoods of pi, with ci = 0, ci = 1/6, ci = 1/2, and ci = 1]
Takes values in the interval [0, 1]
If ci = 1, every peer connected to pi is also connected to every other peer within the neighborhood
If ci = 0, no peer that is connected to pi connects to any other peer connected to pi
Takes into account only the immediate neighbours of the peer
Takes high values when there are cliques
Loses the general view of the network
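As a small illustration of the ratio described on this slide, the Python sketch below counts ordered pairs of neighbours of a peer that are linked to each other and divides by the number of ordered pairs that could exist. It assumes the overlay is given as a dictionary mapping each peer to the set of peers it links to, and that every neighbour link of the peer is counted; `clustering_coefficient` is a hypothetical helper name.

```python
def clustering_coefficient(p, neighbours):
    """Fraction of possible links among p's neighbours that actually exist.

    `neighbours` maps each peer to the set of peers it links to. Ordered
    pairs (j, k) are counted, so the denominator is s * (s - 1).
    """
    nbrs = set(neighbours[p]) - {p}
    s = len(nbrs)
    if s < 2:
        return 0.0  # assumption: undefined cases are reported as 0
    links = sum(1 for j in nbrs for k in nbrs if j != k and k in neighbours[j])
    return links / (s * (s - 1))


# Example: p has four neighbours and only a and b are linked to each other.
overlay = {
    "p": {"a", "b", "c", "d"},
    "a": {"p", "b"},
    "b": {"p", "a"},
    "c": {"p"},
    "d": {"p"},
}
coefficient = clustering_coefficient("p", overlay)  # one of six possible links
```

Because only the immediate neighbours are inspected, the value says nothing about similar peers that sit two or more hops away.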
Clustering efficiency
A new measure that quantifies network organisation and reflects retrieval effectiveness
Based on the network organisation and on the query processing protocols
Consider that the neighborhood of a peer pi consists of all peers within radius τb of pi
Clustering efficiency
The number of peers similar to pi that can be reached from pi within τb hops, divided by the total number of peers similar to pi:

\[ \kappa_i = \frac{\left|\{\, p_j : d_G(p_i, p_j) \le \tau_b,\ sim(p_i, p_j) \ge \theta \,\}\right|}{\left|\{\, p_k : sim(p_i, p_k) \ge \theta \,\}\right|}, \qquad j, k = 1, \dots, N \]

Takes values in the interval [0, 1]
If κi = 1, the neighborhood of pi contains all peers similar to pi
If κi = 0, the neighborhood of pi contains no peer similar to pi
[Figure: example neighbourhood of pi with ci = 0 and κi = 1]
Gives information about the underlying network organisation involving more than just the immediate neighbours
Looks at how the network is organised at a larger scale
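A minimal Python sketch of this measure follows. It assumes the overlay is a dictionary mapping each peer to the set of peers it links to, that hop distance is found by breadth-first search, that the peer itself is excluded from both counts, and that a peer with no similar peers anywhere gets the value 0. The similarity function is passed in by the caller; `peers_within` and `clustering_efficiency` are hypothetical names.

```python
from collections import deque


def peers_within(p, neighbours, radius):
    """Peers reachable from p in at most `radius` hops (p itself excluded)."""
    dist = {p: 0}
    queue = deque([p])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[p]
    return set(dist)


def clustering_efficiency(p, neighbours, sim, theta, tau_b):
    """Similar peers reachable within tau_b hops over all similar peers."""
    similar = {x for x in neighbours if x != p and sim(p, x) >= theta}
    if not similar:
        return 0.0  # assumption: undefined case reported as 0
    reachable = peers_within(p, neighbours, tau_b)
    return len(similar & reachable) / len(similar)


# Example: no two neighbours of p are linked, yet all similar peers but one
# lie within two hops of p.
overlay = {
    "p": {"a", "b"}, "a": {"p", "c"}, "b": {"p"},
    "c": {"a", "d"}, "d": {"c"}, "e": set(),
}
topic = {"p": "x", "a": "x", "b": "x", "c": "x", "d": "x", "e": "y"}


def same_topic(u, v):
    """Toy similarity: 1.0 for peers sharing a topic, 0.0 otherwise."""
    return 1.0 if topic[u] == topic[v] else 0.0


kappa = clustering_efficiency("p", overlay, same_topic, theta=0.9, tau_b=2)
```

In this toy overlay the neighbours of p share no links, so a measure restricted to immediate neighbours would report no clustering, while the radius-based count still sees most of the similar peers.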
Experimental evaluation
Experimental Evaluation
Used different parameters: data corpus, similarity threshold, query TTL, forwarding strategies
Parameter               Symbol   Value
peers                   N        2,000
short-range links       s        8
long-range links        l        4
similarity threshold    θ        0.9
rewiring TTL            τR       4
fixed forwarding TTL    τf       6
broadcast TTL           τb       2
message fanout          m        2
OHSUMED TREC: 30,000 medical articles, 10 categories
TREC-6: 556,000 documents, 100 categories
The start of the rewiring is randomly chosen from the time interval [0, 4K]
The periodicity is randomly selected from a normal distribution of 2K
Looked into: network organisation, recall
The better the network organisation is, the better the performance of retrievals should be!
The experiments are intended to associate the performance of retrievals with the quality of network organisation, and to recommend the clustering measure that better represents this association
Experimental Evaluation
Clustering coefficient ci for different forwarding strategies
Experimental Evaluation
Clustering efficiency κi for different forwarding strategies
Experimental Evaluation
Retrieval
Outlook
Conclusion
The idea: focus on IR on top of SONs; look at how the network is organised at a large scale