A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in Enterprise Networks
Danny Bickson, Ezra N. Hoch, Nir Naaman and Yoav Tock
IBM Haifa Research Lab, Israel

Dec 31, 2015

Page 1: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks

A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in Enterprise Networks

Danny Bickson, Ezra N. Hoch, Nir Naaman and Yoav Tock

IBM Haifa Research Lab, Israel

Page 2: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Outline

Motivation
The channelization problem
Our hybrid approach
Experimental results
Conclusions

Page 3: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Motivation: large-scale publish-subscribe applications

Large number of information flows (topics) and subscribers
Each flow must be delivered to a subset of interested subscribers
Example: financial market data dissemination
  The publisher divides the data feed into a large number of information flows (~100K), e.g. stock symbols, futures, commodities
  Many stand-alone subscribers (~1K)
  Subscribers display interest heterogeneity: they are interested in different yet overlapping subsets of the topics
  Any single topic may be delivered to a large number of subscribers (hot / cold topics)

[Figure labels: Data Vendor, WAN, Publisher, Enterprise LAN, Subscribers, multiple information flows (topics)]

Page 4: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Common approaches

Use unicast (point-to-point) connections
  Limitations: poor utilization of network resources (duplicate transmissions)
Use broadcast (single multicast channel)
  Limitations: receivers filter unwanted content
Utilize multicast to transmit data
  Topics are mapped into multicast groups; each user joins the groups that cover their topic interest
  Reduces receiver filtering
  Limitations: limited number of multicast addresses
    Network element state problem
    Receiver resources (NICs)

Page 5: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Our novel contribution

Create a hybrid approach that combines both multicast and unicast
  Flexible allocation of transmissions
  Topics with high interest enjoy the efficiency of multicast
  Topics with low interest are transmitted in unicast
Formalize as an optimization problem
Propose a two-step alternating method for computing the resource allocation

Page 6: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


The Channelization Problem

n flows
Flow rates λ
k multicast groups
m users
Interest matrix W

The task: find mapping matrices X, Y that minimize the communication cost
  The cost of transmission – takes into account transmission to multiple groups
  The cost of reception – minimize excess filtering
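
The two cost components can be made concrete with a small sketch. The slide does not give the exact cost function, so the unweighted sums below are an assumption, as are the function name and the 0/1 list-of-lists encoding of W, X and Y.

```python
def channelization_cost(rates, W, X, Y):
    """Return (transmission, reception, filtering) for a multicast mapping.

    Assumed encoding (not taken from the slides):
      rates[i] : rate of flow i (lambda_i)
      W[j][i]  : 1 if user j is interested in flow i
      X[i][g]  : 1 if flow i is sent on multicast group g
      Y[j][g]  : 1 if user j joins multicast group g
    """
    n, m, k = len(rates), len(W), len(X[0])
    # A flow mapped to several groups is transmitted once per group.
    transmission = sum(rates[i] * X[i][g] for i in range(n) for g in range(k))
    # A user receives every flow carried by every group it joins.
    reception = sum(rates[i] * X[i][g] * Y[j][g]
                    for j in range(m) for g in range(k) for i in range(n))
    # Excess filtering: received traffic the user is not interested in.
    filtering = sum(rates[i] * X[i][g] * Y[j][g] * (1 - W[j][i])
                    for j in range(m) for g in range(k) for i in range(n))
    return transmission, reception, filtering


# Three flows, two users, two groups: flows 0 and 1 share group 0.
rates = [10, 5, 1]
W = [[1, 1, 0], [0, 1, 1]]
X = [[1, 0], [1, 0], [0, 1]]
Y = [[1, 0], [1, 1]]
print(channelization_cost(rates, W, X, Y))  # → (16, 31, 10)
```

In this toy case user 1 joins group 0 only for flow 1, and so has to discard flow 0 (rate 10); that is the excess filtering the mapping tries to minimize.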

Page 7: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


The Hybrid Channelization Problem

[Figure: flows F1, F2, F3, ..., Fn; multicast groups G1, G2, ..., Gk; users U1, U2, U3, ..., Um. Interest extraction (W) yields per-user interest sets such as {F1, F2}, {F1, F2, F8}, {F3, F4, F6} and {F1, Fn}.]

X – flow to group map

Y – user subscription map

T – unicast transmission map

Page 8: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


The Hybrid Channelization Problem

Modified cost function

Problem objective is to minimize the sum of:
  Cost of multicast reception
  Cost of multicast transmission
  Cost of unicast reception & transmission
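
One plausible way to write an objective with these three terms is sketched below. It is not necessarily the authors' exact formulation: the weights α, β, γ, the index conventions (i for flows, j for users, g for groups) and the coverage constraint are all assumptions introduced for illustration.

```latex
\min_{X,\,Y,\,T}\;
  \underbrace{\alpha \sum_{j=1}^{m} \sum_{g=1}^{k} y_{jg} \sum_{i=1}^{n} \lambda_i x_{ig}}_{\text{multicast reception}}
  \;+\;
  \underbrace{\beta \sum_{g=1}^{k} \sum_{i=1}^{n} \lambda_i x_{ig}}_{\text{multicast transmission}}
  \;+\;
  \underbrace{\gamma \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_i t_{ji}}_{\text{unicast reception and transmission}}
\qquad \text{s.t.} \qquad
  w_{ji} \;\le\; t_{ji} + \sum_{g=1}^{k} x_{ig}\, y_{jg} \quad \forall\, i, j
```

The constraint says that every flow a user is interested in must reach that user either by unicast or through at least one multicast group the user has joined.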

Page 9: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Proposed Solution

Unfortunately, the hybrid problem is NP-hard
We propose a two-step heuristic solution
  First step: solve the channelization problem (multicast mapping)
  Second step:
    Choose flow-user pairs for unicast
    Remove redundant assignments from multicast mapping
    Recalculate the cost
  Iterate until convergence, or until the unicast BW limit is exceeded

Page 10: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


First step: channelization problem solution

We have experimented with the following algorithms

K-Means (2005) performs best

Page 11: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


K-Means Mapping Algorithm

Input: interest matrix, topic rate vector

Basic insight
  Put “similar” topics in the same group
  “Similar” topics have a similar audience, which causes less filtering
  Take the rate into account

Iterative Clustering Algorithm (K-means)
  Init: topics are assigned into a fixed number of groups
  Move: in each step, remove a single topic and move it to the best group, the one producing the lowest cost
  Cost: after each epoch, compute the total filtering cost
  Stop: cost doesn’t improve | time elapsed | max # of iterations

[Figure: topics T1 ... T9 arranged in groups, with T5 as the candidate topic and its possible target groups marked “?”; the interest matrix (users vs. topics) with a user’s interest vector and a topic’s audience vector highlighted; rate vector R1, R2, ..., RK.]
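
The Init / Move / Cost / Stop loop can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the round-robin initialization, the exhaustive re-evaluation of the total cost for each trial move, and the names `filtering_cost`, `total_cost` and `channelize` are assumptions, and the time-elapsed stop condition is omitted.

```python
def filtering_cost(group, W, rates):
    """Traffic that the group's listeners receive but did not ask for."""
    cost = 0
    for row in W:                                   # one row per user
        if any(row[t] for t in group):              # user has to join this group
            cost += sum(rates[t] for t in group if not row[t])
    return cost


def total_cost(groups, W, rates):
    return sum(filtering_cost(g, W, rates) for g in groups)


def channelize(W, rates, k, max_epochs=100):
    n = len(rates)
    groups = [set() for _ in range(k)]
    for t in range(n):                              # Init: round-robin assignment
        groups[t % k].add(t)
    best = total_cost(groups, W, rates)
    for _ in range(max_epochs):                     # Stop: max number of epochs
        for t in range(n):                          # Move: re-place one topic at a time
            for g in groups:
                g.discard(t)
            trial_costs = []
            for g in groups:
                g.add(t)
                trial_costs.append(total_cost(groups, W, rates))
                g.remove(t)
            groups[trial_costs.index(min(trial_costs))].add(t)
        cost = total_cost(groups, W, rates)         # Cost: evaluate after each epoch
        if cost >= best:                            # Stop: no improvement
            break
        best = cost
    return groups, best


# Two disjoint audiences: users 0-1 want topics 0-1, users 2-3 want topics 2-3.
W = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
groups, cost = channelize(W, rates=[1, 1, 1, 1], k=2)
print(groups, cost)
```

Because a topic's current group is always among the candidates, a move never increases the total cost, so the loop terminates once an epoch brings no improvement.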

Page 12: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Second step: choosing user-flow pairs for unicast

Experimented with several heuristics
  Heavy users - all transmission to a specific heavy user is sent using unicast
  Lightweight flows - flows with low bandwidth are sent using unicast
  Greedy flows - move to unicast the flow which best minimizes the total cost
  Greedy users - move to unicast the user which best minimizes the total cost
An additional heuristic - greedy user-flow pairs - move to unicast the user-flow pair which best minimizes the total cost - very slow, impractical run-time
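
As an illustration of the greedy-flows idea, the sketch below repeatedly moves to unicast the single flow that lowers the total cost the most, subject to a unicast bandwidth limit. The cost model (traffic sent plus traffic received, unweighted) and all function names are assumptions made for this example; the slides do not specify them.

```python
def total_cost(groups, unicast, W, rates):
    """Assumed cost: traffic sent by the publisher plus traffic received by users."""
    users = range(len(W))
    cost = 0
    for group in groups:
        group_rate = sum(rates[i] for i in group)
        listeners = sum(1 for j in users if any(W[j][i] for i in group))
        cost += group_rate * (1 + listeners)    # sent once, received by each listener
    for i in unicast:
        fanout = sum(W[j][i] for j in users)
        cost += 2 * rates[i] * fanout           # one copy sent and received per user
    return cost


def unicast_bandwidth(unicast, W, rates):
    return sum(rates[i] * sum(row[i] for row in W) for i in unicast)


def greedy_flows(groups, W, rates, bw_limit):
    groups = [set(g) for g in groups]
    unicast = set()
    best = total_cost(groups, unicast, W, rates)
    while True:
        chosen = None
        for flow in sorted(set().union(*groups)):
            trial_unicast = unicast | {flow}
            if unicast_bandwidth(trial_unicast, W, rates) > bw_limit:
                continue                        # would exceed the unicast BW limit
            trial_groups = [g - {flow} for g in groups]
            cost = total_cost(trial_groups, trial_unicast, W, rates)
            if cost < best:
                best, chosen = cost, flow
        if chosen is None:                      # no move improves the cost
            return groups, unicast, best
        groups = [g - {chosen} for g in groups]
        unicast.add(chosen)


# Two hot flows (0, 1) wanted by users 0 and 1; one cold flow (2) wanted by user 2.
W = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(greedy_flows([{0, 1, 2}], W, rates=[10, 10, 1], bw_limit=5))
# → ([{0, 1}], {2}, 62)
```

Moving the cold flow to unicast frees user 2 from joining the group at all, which is why the cost drops from 84 to 62 in this toy case, while the hot flows stay on multicast.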

Page 13: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Experimental results

Construction of user-interest matrix W
  Random, uniform
  Market distribution – based on a model of NYSE stock volume
  IBM WebSphere cell – a real system

Page 14: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Channelization algorithms

K-Means (2005) performs best
  Takes rate into account
  Gradient descent on the true cost function

Page 15: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Effect of the interest matrix on channelization performance

The interest and rate have a significant effect on channelization performance

Some interests have patterns that are easy to “channelize”

Interests with less entropy, more order, are easier

Page 16: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Hybrid Algorithm Heuristics

Market dist. - Greedy users

Can use more unicast BW

WebSphere dist. - Greedy flows

Doesn’t need more than 20% unicast BW

Unicast BW limit – algorithm will use optimal amount up to the limit

Page 17: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Hybrid using greedy flow – unicast / multicast tradeoff

Unicast BW allocation – exact amount of unicast BW used

Every interest and rate distribution has an optimal amount of unicast BW it can use

The hybrid approach improves upon both unicast-only and multicast-only

Page 18: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Conclusions

We have presented a novel hybrid approach for publish-subscribe
We have shown, using extensive and realistic simulation results, that our approach reduces consumed network and host resources
K-Means (2005) performs best for channelization among the algorithms we tested
Greedy hybrid heuristics performed best in our tests
Relative competitiveness of the greedy-flows and greedy-users heuristics depends on the structure of the interest matrix and rate

~ The End ~

Page 19: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Backup – Model: Real Life Messaging Load Model

Model based on statistical analysis of NYSE daily trade data
  20K topics
  500 subscribers
  Avg. ~70 flows / user
  Min 15 flows / user
  Max 115 flows / user
  Avg. message fan-out ~10.1 clients
Multicast - a message is transmitted once
Unicast - transmitter data rate is x10 that of multicast!
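
The x10 figure follows directly from the average fan-out. As a rough check, assume each topic i has rate λ_i and fan-out f_i (the number of interested clients), that multicast sends each message exactly once, and that the quoted fan-out is averaged per message; the symbols f_i and \bar{f} are introduced here for illustration only.

```latex
\frac{\text{unicast transmit rate}}{\text{multicast transmit rate}}
  \;=\; \frac{\sum_i \lambda_i f_i}{\sum_i \lambda_i}
  \;=\; \bar{f} \;\approx\; 10.1
```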

Page 20: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Messaging Load Model – Based on Market Research

Financial front office
  Hundreds of users, requiring stock quotes and financial information from several markets
Topic space structure
  Within each market, symbol popularity and rate are exponentially distributed (NYSE market research)
  Several different markets, with avg. popularity and size proportional to ~1/m (assumption)
  20K flows, 10 markets, 500 users
User interest
  Each user selects some markets, then selects a percentage of the symbols from each chosen market, according to the said distributions
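
A toy generator in the spirit of this model is sketched below. It is heavily simplified and every parameter is an assumption: markets have equal size, market m is followed with probability 1/m, and the symbol at rank r within a followed market is selected with probability exp(-decay · r). The function name `build_interest` and its defaults are hypothetical and much smaller than the 20K-flow, 500-user model.

```python
import math
import random


def build_interest(n_markets=10, symbols_per_market=20, n_users=5,
                   decay=0.1, seed=0):
    """Return a 0/1 interest matrix W[user][symbol] for a toy market model."""
    rng = random.Random(seed)                       # seeded for reproducibility
    # Assumed market popularity: proportional to 1/m for market m = 1, 2, ...
    market_prob = [1.0 / (mkt + 1) for mkt in range(n_markets)]
    W = []
    for _ in range(n_users):
        row = [0] * (n_markets * symbols_per_market)
        for mkt in range(n_markets):
            if rng.random() >= market_prob[mkt]:
                continue                            # user does not follow this market
            for rank in range(symbols_per_market):
                # Assumed symbol popularity: exponential decay with rank.
                if rng.random() < math.exp(-decay * rank):
                    row[mkt * symbols_per_market + rank] = 1
        W.append(row)
    return W


W = build_interest()
fanout = [sum(row[i] for row in W) for i in range(len(W[0]))]
print("hottest symbol fan-out:", max(fanout))
print("flows per user:", [sum(row) for row in W])
```

With these choices the first market is followed by every user and its top-ranked symbol is always selected, which gives the hot / cold skew the model is meant to capture.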

[Figure: NYSE daily trade. Number of trades (log scale) vs. symbol rank; series: daily trade on July 7 2004, exponential fit, daily trade min/max in July.]

[Figure: Avg. message rate. Msg/Sec vs. symbols, by market and rank; Market 1, Market 2, ..., Market 10. Annotation: ~10% of symbols, ~55% of trade.]

Backup – Model

Page 21: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Mapping Algorithm

Input: interest matrix, topic rate vector
Basic insight
  Put “similar” topics in the same group
  “Similar” topics have a similar audience
  A group with a homogeneous audience causes less filtering
Take the rate into account
  The cost of putting two topics in the same group
  The cost of adding a new topic to a group of topics

[Figure: interest matrix (users vs. topics), marking topics with identical audience and topics with similar audience; example filtering cost for placing topics 1 and 2 in the same group: R1 + R2.]

Rk – the rate of topic k

Backup – Algorithm

Page 22: A Hybrid Multicast-Unicast Infrastructure for Efficient Publish-Subscribe in  Enterprise Networks


Iterative Clustering Algorithm (K-means)
  Init: topics are assigned into a fixed number of groups
  Move: in each step, remove a single topic and move it to the best group, the one producing the lowest cost
  Cost: after each epoch, compute the total filtering cost
  Stop: time elapsed | cost does not improve | exceeded max number of iterations

[Figure: the cost of adding candidate topic 5 to topic group {1,2,3}, computed from the group audience vector and the candidate topic's audience vector; the per-user costs shown are R1+R2+R3, R5 and 0, for a total of R1+R2+R3+R5.]

The best group for topic K is the group with the lowest cost
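
The incremental cost can be computed per user from the two audience vectors. The sketch below is one reading of the example, under the assumption that a user newly drawn into the group must filter the group's existing traffic and that an existing listener who does not want the new topic must filter that topic; the function name and the sample audience vectors are hypothetical.

```python
def added_filtering_cost(group_audience, group_rate, topic_audience, topic_rate):
    """Extra filtering caused by adding one topic to an existing group."""
    cost = 0
    for in_group, wants_topic in zip(group_audience, topic_audience):
        if wants_topic and not in_group:
            cost += group_rate      # new listener discards the group's existing topics
        elif in_group and not wants_topic:
            cost += topic_rate      # existing listener discards the new topic
    return cost


# Group {1,2,3} with rates R1=1, R2=2, R3=3 and a candidate topic with R5=5.
group_audience = [1, 1, 1, 1, 0, 0]     # users listening to the group
topic_audience = [1, 1, 1, 0, 1, 0]     # users interested in the candidate (made up)
print(added_filtering_cost(group_audience, 1 + 2 + 3, topic_audience, 5))  # → 11
```

Here one user pays R1+R2+R3 = 6 and another pays R5 = 5, giving R1+R2+R3+R5 = 11; evaluating this for every group and picking the minimum gives the best group for the topic.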

[Figure: topics T1 ... T9 arranged in groups, with T5 as the candidate topic and its possible target groups marked “?”.]

Backup – Algorithm