Dec 30, 2015
April 14, 2009, Arizona State University
Committee: Andrea W. Richa (Chair) Goran Konjevod Rida Bazzi Christian Scheideler
Overlay Network Construction in Highly Decentralized Networks
Melih Onus, PhD Thesis Defense
Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 attached to a shared message bus. Subscriptions: N1 = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X}. Publish(M1, A) sends message M1 on topic A over the bus.]
Scalability of Pub/Sub
Most traditional pub/sub systems are geared towards small-scale deployment
– E.g., Isis MDS, TIB, MQSeries, Gryphon
New generation of applications…
– Large data centers: Amazon, Google, Yahoo, eBay, …
– RSS, feed/news readers, online stock trading and banking
– Web 2.0, Second Life
…drive dramatic growth in scale
– 10,000s of nodes, 1000s of topics, Internet-wide distribution
Emerging systems address this trend using P2P techniques
Overlay-Based Pub/Sub
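[Figure: the same five nodes joined by overlay links instead of a message bus. Subscriptions: N1 = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X}. The publication (M1, A) is forwarded hop by hop along overlay links; one node is labeled "Relay".]
Example systems: SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA, ...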
Overlay Topologies for Pub/Sub
“Good” overlay will allow for efficient and simple publication routing
– Small routing tables, low load on relays
– Low latency
Ideally, overlay is topic-connected: i.e., one connected component for each topic-induced sub-graph
– Most existing implementations construct topic-connected overlays
Topic-Connectivity
Topics B,C,X,E are connected
Topics A and D are disconnected
[Figure: an example overlay over N1 = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X} illustrating connected and disconnected topics.]
Topic-Connectivity: Simple Solution
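[Figure: the example nodes N1 = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X} with a separate connecting structure drawn for each topic.]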
Node degree grows linearly with the subscription size
– Roughly twice as big as the subscription size for rings/trees
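A minimal sketch of the ring-per-topic construction follows, to make the degree growth concrete. The function name `ring_per_topic`, the dictionary layout, and the choice to order each ring by node name are assumptions made for illustration; links shared by several topics are counted once.

```python
def ring_per_topic(interest):
    """interest: dict node -> set of topics. Returns a set of (u, v) edges, u < v."""
    topics = sorted(set().union(*interest.values()))
    edges = set()
    for t in topics:
        subs = sorted(v for v in interest if t in interest[v])
        # Chain consecutive subscribers of this topic.
        for a, b in zip(subs, subs[1:]):
            edges.add((a, b))
        # Close the ring only when it has at least three members.
        if len(subs) >= 3:
            edges.add((subs[0], subs[-1]))
    return edges


def degrees(nodes, edges):
    """Degree of every node in the undirected overlay."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
RING_EDGES = ring_per_topic(INTEREST)
RING_DEGREES = degrees(INTEREST, RING_EDGES)
```

With this particular ring ordering, the five-node example ends up with 8 distinct links and a maximum degree of 4; a different ordering could give slightly different numbers.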
Scalability of the Simple Solution
Negative impact on performance due to
– CPU load: neighbor monitoring, message processing
– Connection maintenance and header overhead
– Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.
Scalability barrier for large systems offering a wide range of subscription choices
Can we do better?
Outline
Minimum Maximum Degree Publish-Subscribe Overlay Network Design
Parameterized Maximum and Average Degrees in Publish-Subscribe Overlay Network Design
Constant Diameter Publish-Subscribe Overlay Network Design
The MinMax-TCO Problem
Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem:
– For a set of nodes $V$, a set of topics $T$, and $\mathit{Interest}: V \times T \to \{\text{true}, \text{false}\}$
– Construct a topic-connected overlay $G$ with the minimum possible maximum degree
TCO (decision version):
– Decide whether there is a topic-connected overlay with maximum degree $k$ (for a given $k$)
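The definition above can be checked mechanically. The sketch below is one possible way to test topic-connectivity and measure maximum degree, assuming the Interest function is stored as a dictionary from node to topic set; the names `is_topic_connected` and `max_degree` are hypothetical.

```python
from collections import defaultdict


def is_topic_connected(interest, edges):
    """True if, for every topic, its subscribers induce a connected subgraph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for t in set().union(*interest.values()):
        subs = {v for v in interest if t in interest[v]}
        start = next(iter(subs))
        seen, stack = {start}, [start]
        # Depth-first search that only walks through subscribers of t.
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in subs and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != subs:
            return False
    return True


def max_degree(nodes, edges):
    """Largest number of overlay links incident to any node."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg.values(), default=0)


INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
OVERLAY = [("N2", "N4"), ("N1", "N3"), ("N1", "N2"), ("N4", "N5"), ("N3", "N5")]
```

Here `OVERLAY` is a hand-picked example: it is topic-connected for the five-node instance with maximum degree 2, and dropping the link between N3 and N5 disconnects topic A.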
GM Algorithm
The GM algorithm can produce a maximum degree of $\Theta(n)$ even when a constant maximum degree overlay network exists.
Complexity of MinMax-TCO
Lemma: MinMax-TCO(V,T,Interest,k) $\in$ NP
Proof: Topic connectivity is verifiable in polynomial time
Lemma: MinMax-TCO(V,T,Interest,k) is NP-hard
Proof:
1. Define an auxiliary problem, Single Node TCO (SN-TCO), which is to decide whether there is a topic-connected overlay in which the degree of a single given node is at most $d$
2. Set Cover is polynomially reducible to SN-TCO
3. SN-TCO is polynomially reducible to TCO
Theorem: MinMax-TCO is NP-complete
Approximating MinMax-TCO
The idea: exploiting subscription overlaps
– Connecting the nodes with overlapping interests improves connectivity of several topics at once
Overlay Design Algorithm (ODA):
– Start from a singleton connected component for each $(v, t) \in V \times T$
– At each iteration: add an edge that reduces the number of connected components for the largest number of topics, among the edges that increase the maximum degree minimally
– Stop once there is a single connected component for each topic
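The greedy steps above can be sketched as follows. This is an unoptimized illustration under stated assumptions, not the thesis implementation: the function name `oda`, the dictionary representation of Interest, and the tie-breaking rule (first candidate in sorted node order) are all choices made here.

```python
from itertools import combinations


def oda(interest):
    """interest: dict node -> set of topics. Returns the list of added edges."""
    nodes = sorted(interest)
    # comp[(v, t)] labels the connected component of v within topic t.
    comp = {(v, t): v for v in nodes for t in interest[v]}
    deg = {v: 0 for v in nodes}
    edges = []

    def benefit(u, v):
        # Topics whose component count drops by one if (u, v) is added.
        return sum(1 for t in interest[u] & interest[v]
                   if comp[(u, t)] != comp[(v, t)])

    while True:
        cur_max = max(deg.values(), default=0)
        best = None
        for u, v in combinations(nodes, 2):
            b = benefit(u, v)
            if b == 0:
                continue
            # First minimize the resulting maximum degree, then maximize benefit.
            key = (max(cur_max, deg[u] + 1, deg[v] + 1), -b)
            if best is None or key < best[0]:
                best = (key, u, v)
        if best is None:
            return edges  # every topic has a single component
        _, u, v = best
        edges.append((u, v))
        deg[u] += 1
        deg[v] += 1
        for t in interest[u] & interest[v]:
            old, new = comp[(v, t)], comp[(u, t)]
            if old != new:
                # Merge the two components of topic t.
                for w in nodes:
                    if comp.get((w, t)) == old:
                        comp[(w, t)] = new


INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
ODA_EDGES = oda(INTEREST)
```

On the five-node example this sketch adds five edges and reaches a maximum degree of 2. The order in which edges are added depends on the tie-breaking rule, so it need not match the order shown on the slides.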
Overlay Design Algorithm
[Figure, shown over six animation frames: N1 = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X}; each frame after the first adds one overlay edge.]
Number of connected components per topic, initially and after each added edge:
Topic   Initial   Edge 1   Edge 2   Edge 3   Edge 4   Edge 5
A       4         3        3        3        2        1
B       3         2        2        1        1        1
C       2         2        2        1        1        1
D       2         2        1        1        1        1
X       2         2        2        2        1        1
E       1         1        1        1        1        1
Maximum degree of 2 vs. almost 4 for ring-per-topic!
ODA Running Time
$O(|V|^4 |T|)$
– At most $|V|^2$ iterations
– At most $|V|^2$ edges inspected at each iteration
– At most $|T|$ steps to inspect an edge
Can be optimized to run in $O(|V|^2 |T|)$
– For each $e \in V \times V$, weight(e) = the number of connected components merged by e
– At each iteration, output the heaviest edge and adjust the other edge weights accordingly
– Stop once there are no more edges with weight > 0
Approximability Results
Lemma: The number of edges in the overlay constructed by GM is at most $\log(|V||T|) \cdot \mathrm{OPT}$
Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
– Uses Maximum Weighted Matching
– Uses Edge Coloring
Theorem: No algorithm can approximate MinMax-TCO within a constant factor (unless P=NP)
Proof: Existence of such an algorithm would imply existence of the constant factor approximation for Set Cover which is known to be impossible (unless P=NP)
Outline
Minimum Maximum Degree Publish-Subscribe Overlay Network Design
Parameterized Maximum and Average Degrees in Publish-Subscribe Overlay Network Design
Constant Diameter Publish-Subscribe Overlay Network Design
ODA Algorithm
The ODA algorithm can produce an average degree of $\Theta(n)$ even when a constant average degree overlay network exists.
[Figure: three example overlays drawn on nodes v1, v2, v3, …, vn-1, vn.]
ODA and GM Algorithms
GM Algorithm: choose the edge with maximum benefit
– Average degree: O(log nt) approximation
– Maximum degree: O(n) approximation
ODA Algorithm: choose the edge with maximum benefit among the ones that increase the maximum degree minimally
– Average degree: O(n) approximation
– Maximum degree: O(log nt) approximation
How to approximate both average and maximum degree?
Parameterized Algorithm
e1: Edge with maximum benefit
e2: Edge with maximum benefit among the ones that increases maximum degree minimally
If w(e2) > w(e1) / k, choose e2
Otherwise, choose e1
Parameter: 1 < k < n
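The selection rule above can be sketched as a single function. The name `choose_edge` and the data layout (a dictionary of candidate edges with positive benefit w(e), plus current node degrees) are assumptions made for illustration.

```python
def choose_edge(candidates, deg, k):
    """candidates: dict (u, v) -> benefit w(e) > 0; deg: dict node -> degree."""
    cur_max = max(deg.values(), default=0)

    def new_max(e):
        # Maximum degree of the overlay if edge e were added.
        u, v = e
        return max(cur_max, deg[u] + 1, deg[v] + 1)

    # e1: edge with maximum benefit overall.
    e1 = max(candidates, key=lambda e: candidates[e])
    # e2: edge with maximum benefit among those raising the max degree least.
    least = min(new_max(e) for e in candidates)
    e2 = max((e for e in candidates if new_max(e) == least),
             key=lambda e: candidates[e])
    return e2 if candidates[e2] > candidates[e1] / k else e1


DEG = {"a": 3, "b": 0, "c": 0, "d": 0}
CANDIDATES = {("a", "b"): 6, ("c", "d"): 2}
```

In this toy input, e1 is ("a", "b") with benefit 6 and e2 is ("c", "d") with benefit 2. With k = 2 the test 2 > 6/2 fails and e1 is chosen; with k = 4 the test 2 > 6/4 holds and e2 is chosen, which shows how a larger k favors keeping the maximum degree low.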
Algorithms
GM Algorithm:
– Average degree: O(log nt) approximation
– Maximum degree: O(n) approximation
ODA Algorithm:
– Average degree: O(n) approximation
– Maximum degree: O(log nt) approximation
P-ODA Algorithm:
– Average degree: O(k * log nt) approximation
– Maximum degree: O((n/k) * log nt) approximation
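As a quick reading of the P-ODA bounds above, and assuming the two ratios are compared purely as functions of k, they balance at k = sqrt(n):

```latex
k \log(nt) = \frac{n}{k} \log(nt)
\;\iff\; k^2 = n
\;\iff\; k = \sqrt{n},
\qquad \text{giving } O\!\left(\sqrt{n}\,\log(nt)\right) \text{ for both the average and the maximum degree.}
```

Smaller k moves the guarantee toward the GM end (better average degree), larger k toward the ODA end (better maximum degree).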
Outline
Minimum Maximum Degree Publish-Subscribe Overlay Network Design
Parameterized Maximum and Average Degrees in Publish-Subscribe Overlay Network Design
Constant Diameter Publish-Subscribe Overlay Network Design
Constant Diameter Overlays
Constant Diameter Topic-Connected Overlay (CD-TCO) problem:
– For a set of nodes $V$, a set of topics $T$, and $\mathit{Interest}: V \times T \to \{\text{true}, \text{false}\}$
– Construct a topic-connected, constant diameter overlay $G$ with the minimum possible average degree
The GM algorithm can produce a diameter of $\Theta(n)$, where n is the number of nodes in the pub/sub system.
Constant Diameter Overlay Algorithm
Constant Diameter Overlay Design Algorithm:
– At each iteration:
• Find number of neighbors for each node
• Add a star which connects maximum number of nodes
• Remove topics which are connected by the star
– Stop once there is a single connected component for each topic
Number of neighbors of node u:
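The formula for the number of neighbors is not reproduced here, so the sketch below assumes a hypothetical definition: the neighbors of u are the other nodes that share at least one not-yet-connected topic with u. The function name `star_overlay` and the tie-breaking by node name are likewise assumptions.

```python
def star_overlay(interest):
    """interest: dict node -> set of topics. Returns a set of (u, v) edges, u < v."""
    all_topics = set().union(*interest.values())
    # Topics with fewer than two subscribers are trivially connected.
    remaining = {t for t in all_topics
                 if sum(t in ts for ts in interest.values()) >= 2}
    edges = set()
    while remaining:
        def neighbors(u):
            # ASSUMED definition: nodes sharing a still-unconnected topic with u.
            live = interest[u] & remaining
            return {v for v in interest if v != u and interest[v] & live}

        # Star center: the node with the most neighbors (first by name on ties).
        center = max(sorted(interest), key=lambda u: len(neighbors(u)))
        for v in neighbors(center):
            edges.add((min(center, v), max(center, v)))
        # Every remaining topic of the center is now connected through the star.
        remaining -= interest[center]
    return edges


INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
STAR_EDGES = star_overlay(INTEREST)
```

On the five-node example the first star is centered at N2 and connects topics A, B and C; two further single-edge stars handle D and X, for six edges in total. Each topic's subscribers are then within two hops of one another through the star center that covered that topic.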
Constant Diameter Overlay Algorithm I
Constant Diameter Overlay Design Algorithm I:
– At each iteration:
• Find weight for each node
• Add a star which connects the node with maximum weight
• Remove topics which are connected by the star
– Stop once there is a single connected component for each topic
Weight of node u:
Constant Diameter Overlay Algorithm II
Constant Diameter Overlay Design Algorithm II:
– At each iteration:
• Find number of neighbors for each node
• Add a star which connects the node with maximum density
• Remove topics which are connected by the star
– Stop once there is a single connected component for each topic
Density of node u:
Experimental Results I
Average node degree, varying #nodes
– #topics: 100
– #subscriptions: 10
– Uniform distribution
Only 2.3 times more edges
Experimental Results II
Average node degree, varying #topics
– #nodes: 100
– #subscriptions: 20
– Uniform distribution
Only 1.9 times more edges
Experimental Results III
Average node degree, varying #subscriptions
– #nodes: 100
– #topics: 100
– Uniform distribution
Only 1.8 times more edges
Conclusions
Formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
Defined the problem (MinMax-TCO) capturing the cost of constructing topic-connected overlays
– NP-completeness, polynomial approximation, inapproximability results
Empirical evaluation showed effectiveness of our approximation algorithm on practical inputs
Parameterized algorithm with low maximum and average degree
Defined the problem (CD-TCO), empirical results
Future Directions
Study dynamic case
Investigate other overlay design problems
Study distributed case
– Partial knowledge of other nodes' interests
– Dynamically changing interest assignments
Proving diameter results theoretically
Publications
Parameterized Maximum and Average Degrees in Topic-based Publish-Subscribe Overlay Network Design, M. Onus and A. W. Richa, submitted to 21st Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), August 2009.
Minimum Maximum Degree Publish-Subscribe Overlay Network Design, M. Onus and A. W. Richa, 28th Annual IEEE Conference on Computer Communications (INFOCOM), April 2009, Rio De Janeiro, Brazil.
Distributed Coloring with O(log n) bits, K. Kothapalli, M. Onus, C. Scheideler and C. Schindelhauer, To appear in Journal of Parallel and Distributed Computing (JPDC), 2008.
Linearization: Locally Self Stabilizing Sorting in Graphs, M. Onus, A. W. Richa, C. Scheideler, Workshop on Algorithm Engineering & Experiments (ALENEX), January 2007, New Orleans, Louisiana.
A Scalable Multilevel Algorithm for Community Structure Detection, H. Djidjev and M. Onus, 4th Workshop on Algorithms and Models for the Web-Graph (WAW), November 2006, Banff, Alberta.
Heuristics for Minimum Brauer Chain Problem, F.Gelgi and M.Onus, 21st International Symposium on Computer and Information Sciences (ISCIS), Springer LNCS 4263, November 2006, Istanbul, Turkey.
Distributed Coloring with O(log n) bits, K. Kothapalli, C. Scheideler, M. Onus and C. Schindelhauer, 20th IEEE Parallel & Distributed Processing Symposium (IPDPS), April 2006, Rhodes Island, Greece.
Efficient Broadcasting and Gathering in Wireless Ad-Hoc Networks, M. Onus, A. W. Richa, K. Kothapalli and C. Scheideler, International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN), December 2005, Las Vegas, Nevada.
Constant Density Spanners for Wireless Ad-Hoc Networks, K. Kothapalli, C. Scheideler, M. Onus and A. W. Richa. 17th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), July 2005, Las Vegas, Nevada.