Cristian Lumezanu, Neil Spring, Bobby Bhattacharjee
Decentralized Message Ordering for Publish/Subscribe Systems
Decentralized Message Ordering for Publish/Subscribe Systems Middleware 2006
Ordering?
[Figure: publishers P1 and P2 send messages m1 and m2 to subscribers A, B, C, D; the figure shows both orders, m1 < m2 and m2 < m1]
Subscribers may observe an ambiguous order of messages
Applications
Network Games
  Subscribers = players
  Messages = events in the region of the game world to which the player belongs
  Common events must be seen in the same order for consistency
Messaging
  Chat rooms, buddy lists
  Example of messages in a chat room:
    Alice: “Who wants to go to Sydney?”
    Bob: “I do”
    Connor: “Who wants to go to Melbourne?”
    Diane: “I am going”
  Depending on the order in which the messages are seen:
    Bob goes to Sydney, Diane goes to Melbourne
    or Diane goes to Sydney, Bob goes to Melbourne
Naive solution
[Figure: publishers P1 and P2 reach subscribers A, B, C, D through a single sequencer]
  Not scalable
  Central point of failure
Distribute the task of ordering to many sequencers
Our solution
[Figure: publishers P1 and P2 reach subscribers A, B, C, D through a sequencer network]
Scalable | Practical
Groups
GROUP: all subscribers with the same subscription
Order among messages is enforced across groups
RULE 1: A sequencer (ingress-only sequencer) is associated with each group and establishes order among all messages addressed to the group, except for…
[Figure: groups G0 and G1 over subscribers A, B, C, D, E, F, G, with message streams m0, m1, … and m0’, m1’, …]
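The group definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `form_groups`, the topic names, and the subscriber-to-topic assignments are all hypothetical.

```python
from collections import defaultdict


def form_groups(subscriptions):
    """Map each distinct subscription to the set of subscribers holding it.

    `subscriptions` maps a subscriber name to the subscriptions it holds.
    Every distinct subscription yields one group.
    """
    groups = defaultdict(set)
    for subscriber, topics in subscriptions.items():
        for topic in topics:
            groups[topic].add(subscriber)
    return dict(groups)


# Hypothetical subscriptions (not taken from the talk).
subscriptions = {
    "A": ["t0"],
    "D": ["t0", "t1"],
    "E": ["t0", "t1"],
    "F": ["t1"],
}
groups = form_groups(subscriptions)
```

Here D and E end up in both groups, which is the situation the next rule has to handle.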
Double Overlapped Groups
DOUBLE OVERLAPPED GROUPS: groups that have at least two subscribers in common
Receivers may make inconsistent decisions about message order when they belong to double overlaps
RULE 2: A sequencer is associated with each double overlap
[Figure: groups G0 and G1 over subscribers A, B, C, D, E, F, G, with message streams m0, m1, … and m0’, m1’, …]
D: m0 < m0’ < m1 < m1’
E: m0 < m1 < m0’ < m1’
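The inconsistency between D and E can be checked mechanically. The sketch below compares two delivery orders and reports every pair of common messages that the two receivers see in opposite order; the helper name `inconsistent_pairs` is an assumption made for illustration.

```python
from itertools import combinations


def inconsistent_pairs(order_a, order_b):
    """Return pairs of common messages delivered in opposite relative order."""
    pos_b = {msg: i for i, msg in enumerate(order_b)}
    # Common messages, listed in the order receiver A delivered them.
    common = [msg for msg in order_a if msg in pos_b]
    # (x, y) has x before y at receiver A; it conflicts if B saw y first.
    return [(x, y) for x, y in combinations(common, 2) if pos_b[x] > pos_b[y]]


# Delivery orders of D and E from the slide.
order_d = ["m0", "m0'", "m1", "m1'"]
order_e = ["m0", "m1", "m0'", "m1'"]
conflicts = inconsistent_pairs(order_d, order_e)
```

For these two orders the only conflicting pair is m0’ and m1, which D and E deliver in opposite order.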
Sequencing scheme
SEQUENCING NETWORK
  A sequencer is created for each double overlap between groups and for each group that has no double overlaps
MESSAGE TRANSMISSION
  Messages traverse the sequencing network and receive sequence numbers from all sequencers associated with the destination group
MESSAGE RECEPTION
  Subscribers order messages unambiguously according to the sequence numbers
Sequencing Network: Construction
G0 = {A, B, C}
G1 = {B, C, E}
G2 = {A, B, D}
G3 = {B, E}
[Figure: sequencing network of sequencers Q0, Q1, Q2 with outgoing edges labeled to G0, to G1, to G2, to G3]
Properties
1. All members of the same group see the common messages in the same order
2. All destinations can make an immediate decision of whether to deliver or buffer arriving messages
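As a rough sketch of the first construction step, the code below creates one sequencer per double overlap and one per group that has no double overlap, using the four groups from this slide. The function name `build_sequencers` and the tuple keys are assumptions. The sketch does not connect the sequencers into a network, and it does not say which overlap corresponds to Q0, Q1, or Q2 in the figure.

```python
from itertools import combinations


def build_sequencers(groups):
    """Return {tuple of group names: subscribers served by that sequencer}."""
    sequencers = {}
    overlapped = set()
    for name_a, name_b in combinations(sorted(groups), 2):
        shared = groups[name_a] & groups[name_b]
        if len(shared) >= 2:  # double overlap: two or more common subscribers
            sequencers[(name_a, name_b)] = shared
            overlapped.update((name_a, name_b))
    for name in groups:
        if name not in overlapped:  # group with no double overlap
            sequencers[(name,)] = set(groups[name])
    return sequencers


groups = {
    "G0": {"A", "B", "C"},
    "G1": {"B", "C", "E"},
    "G2": {"A", "B", "D"},
    "G3": {"B", "E"},
}
sequencers = build_sequencers(groups)
```

With these groups the sketch finds three double overlaps (G0 and G1 share B, C; G0 and G2 share A, B; G1 and G3 share B, E), which matches the number of sequencers drawn in the figure.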
Sequencing Network: Operation
G0 = {A, B, C}
G1 = {B, C}
G2 = {A, B, D}
[Figure: sequencers Q0 and Q1 with outgoing edges labeled to G0, to G1, to G2; messages m0, m1, m2 collect sequence numbers as they traverse the network]
m# | Q0 | Q1
m0 | 1  | 2
m1 | 2  |
m2 |    | 1
When a message arrives, the receiver checks the sequence numbers assigned by the relevant sequencers and decides whether to deliver or buffer the message
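One way to picture the deliver-or-buffer decision is a receiver that tracks the next number it expects from each relevant sequencer. The sketch below is a simplified assumption, not the protocol from the talk: the `Receiver` class, gap-free numbering starting at 1, and the choice of Q0 and Q1 as the sequencers relevant to B are illustrative.

```python
class Receiver:
    """Delivers a message only when every relevant stamp is the next expected."""

    def __init__(self, relevant_sequencers):
        self.expected = {q: 1 for q in relevant_sequencers}
        self.buffer = []
        self.delivered = []

    def _ready(self, stamps):
        # Stamps from sequencers that are not relevant to this receiver are ignored.
        return all(n == self.expected[q] for q, n in stamps.items() if q in self.expected)

    def _deliver(self, msg, stamps):
        for q in stamps:
            if q in self.expected:
                self.expected[q] += 1
        self.delivered.append(msg)

    def receive(self, msg, stamps):
        self.buffer.append((msg, stamps))
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for item in list(self.buffer):
                if self._ready(item[1]):
                    self.buffer.remove(item)
                    self._deliver(*item)
                    progress = True


# Sequence numbers from the table; B is assumed to receive all three messages.
b = Receiver(["Q0", "Q1"])
b.receive("m0", {"Q0": 1, "Q1": 2})  # buffered: Q1 expects 1
b.receive("m1", {"Q0": 2})           # buffered: Q0 expects 1
b.receive("m2", {"Q1": 1})           # delivered, then m0 and m1 follow
```

After the third arrival B has delivered m2, m0, m1 in that order, regardless of the order in which the messages reached it.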
Sequencing Network: Conditions
C1: A single path must connect the sequencers associated with each group
C2: The undirected sequencing graph must be loop free
[Figure: sequencers Q0, Q1, Q2]
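Condition C2 can be tested with a standard union-find cycle check over the undirected edges between sequencers. This is a generic graph technique, not something described in the talk; the function name `is_loop_free` and the example edges are assumptions.

```python
def is_loop_free(edges):
    """Return True if the undirected graph given by `edges` has no cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:  # a and b are already connected: this edge closes a loop
            return False
        parent[root_a] = root_b
    return True


chain = [("Q0", "Q1"), ("Q1", "Q2")]
triangle = chain + [("Q2", "Q0")]
```

The chain Q0, Q1, Q2 passes the check, while adding an edge from Q2 back to Q0 fails it.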
Loop-free sequencing network
G0 = {A, B, D}
G1 = {A, B, C}
G2 = {B, C, D}
[Figure: sequencers Q0, Q1, Q2 with outgoing edges labeled to G0, to G1, to G2; messages m0, m1, m2 collect sequence numbers as they traverse the network]
m# | Q0 | Q1 | Q2
m0 | 1  | 2  |
m1 | 2  |    | 1
m2 |    | 1  | 2
B: m0 < m1 < m2 < m0 AMBIGUOUS
Loop-free sequencing network
G0 = {A, B, D}
G1 = {A, B, C}
G2 = {B, C, D}
[Figure: sequencers Q0, Q1, Q2 with outgoing edges labeled to G0, to G1, to G2; messages m0, m1, m2 collect sequence numbers as they traverse the network]
m# | Q0 | Q1 | Q2
m0 | 1  | 2  |
m1 | 2  |    | 2
m2 |    | 1  | 1
B: m2 < m0 < m1 UNAMBIGUOUS
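The difference between the two examples can be reproduced by turning the sequence numbers into pairwise "before" constraints and asking for a consistent total order. The sketch below uses a topological sort from Python's standard `graphlib` module (Python 3.9 or later); the function name `delivery_order` is an assumption, and this is an illustration of the ambiguity, not the receiver algorithm from the talk.

```python
from graphlib import CycleError, TopologicalSorter


def delivery_order(stamps):
    """Return an order consistent with all stamps, or None if they contradict.

    `stamps` maps a message to {sequencer: sequence number}.
    """
    preds = {msg: set() for msg in stamps}
    for a in stamps:
        for b in stamps:
            shared = stamps[a].keys() & stamps[b].keys()
            # a precedes b if a shared sequencer numbered a lower than b.
            if any(stamps[a][q] < stamps[b][q] for q in shared):
                preds[b].add(a)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None  # the constraints form a loop: ambiguous


# Sequence numbers from the first (ambiguous) example.
with_loop = {
    "m0": {"Q0": 1, "Q1": 2},
    "m1": {"Q0": 2, "Q2": 1},
    "m2": {"Q1": 1, "Q2": 2},
}
# Sequence numbers from the second (unambiguous) example.
loop_free = {
    "m0": {"Q0": 1, "Q1": 2},
    "m1": {"Q0": 2, "Q2": 2},
    "m2": {"Q1": 1, "Q2": 1},
}
```

`delivery_order(with_loop)` returns `None` because the constraints m0 < m1, m1 < m2, and m2 < m0 form a cycle, while `delivery_order(loop_free)` returns m2, m0, m1, the order B observes in the second example.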
Results
QUESTIONS
  What is the delay penalty incurred by the sequencing network?
  How many sequence numbers does each message receive?
EXPERIMENT SETUP
  Packet-level simulator over a 10,000 node topology
  End-hosts arranged into similar sized clusters distributed uniformly at random through the topology
  Each host belongs to zero or more groups
  The size of groups is generated from a Zipf distribution
  Sequencers are assigned to physical nodes using a heuristic that minimizes the distance between sequencers on the same path
Latency Stretch
Latency stretch
  ratio between the time taken for a message to traverse the sequencing network and the time taken using the direct unicast path
  expresses the delay penalty of an individual node when unambiguous delivery is required
  worst case results, since shortest unicast paths are rarely followed in publish/subscribe systems
[Figure annotation: sub-linear growth]
How is the increase in delay distributed?
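The latency stretch definition translates directly into a ratio. The snippet below is only a worked illustration: the function name `latency_stretch` and the delay values are hypothetical and are not results from the talk.

```python
def latency_stretch(sequenced_delay, unicast_delay):
    """Ratio of delay through the sequencing network to direct unicast delay."""
    if unicast_delay <= 0:
        raise ValueError("unicast delay must be positive")
    return sequenced_delay / unicast_delay


# Hypothetical delays in milliseconds.
far_pair = latency_stretch(30.0, 10.0)
near_pair = latency_stretch(12.0, 2.0)
```

With these made-up numbers the far pair has a stretch of 3.0 and the near pair a stretch of 6.0, since a small unicast delay in the denominator inflates the ratio.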
Distribution of latency increase
The highest ratios correspond to pairs in
which sender and destination are very close to each other
Sequencers on a Path
How many sequence numbers a message must collect
Vector timestamp approaches
  Sender belongs to the destination group
  Append to a message information about the last message received from all the other members of the group, for each group
  O(n x g) information [n nodes, g groups]
Our approach
  Appends to a message information for each sequencer traversed
  O(g)
Sequencers on a Path
The number of sequencers on a path is less than half of the total number of nodes that
participate
Conclusions and Future Work
CONCLUSIONS
  Method for ordering messages in a publish/subscribe system
  Practical and scalable
  Key insight: only messages to groups with two or more common members must be ordered
FUTURE WORK
  Scheme for optimizing the sequencing network and the placement of sequencers on physical nodes
  Dynamic behavior
  Different models for group membership
Thank You!
Backup slides
Sequencer state
State maintained by a sequencer
  Sequence number
  Group-local sequence number
  Forwarding table
  Reverse-path table
  Output retransmission buffer
  Buffer for messages from previous sequencers
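The state listed above could be held in a small record type. The sketch below is a guess at one possible layout: the field types, the per-group dictionary for the group-local number, and the `stamp` method are all assumptions, since the slide names the fields but does not describe how they are used.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SequencerState:
    sequence_number: int = 0                                   # sequence number
    group_local_sequence: dict = field(default_factory=dict)   # group -> number (assumed per group)
    forwarding_table: dict = field(default_factory=dict)       # assumed: group -> next hop
    reverse_path_table: dict = field(default_factory=dict)     # assumed: group -> previous hop
    retransmission_buffer: deque = field(default_factory=deque)  # output retransmission buffer
    inbound_buffer: deque = field(default_factory=deque)       # messages from previous sequencers

    def stamp(self, group):
        """Hypothetical helper: advance both counters for one message."""
        self.sequence_number += 1
        self.group_local_sequence[group] = self.group_local_sequence.get(group, 0) + 1
        return self.sequence_number, self.group_local_sequence[group]


state = SequencerState()
first = state.stamp("G0")
second = state.stamp("G1")
third = state.stamp("G0")
```

In this sketch the shared counter advances on every message while the group-local counter advances only for messages to that group.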
Placing sequencers
Co-locating sequencers on the same physical node
1. Place on the same physical node any sequencers whose corresponding overlaps have a subset relationship between them
2. Co-locate sequencers whose overlaps do not have a subset relationship but share at least one node
Mapping co-located sequencers (sequencing node) to physical machines
1. If no sequencing node associated with a group has been mapped, map one at random
2. If there are sequencing nodes already mapped to a physical node, pick the closest unassigned sequencing node on the path associated with the group and map it to neighboring physical nodes
Stress
Stress of a sequencing node – ratio between the number of groups for which it has to forward messages and the total number of groups
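The stress definition is a simple ratio, sketched below. The function name `stress` and the example values are hypothetical and do not come from the experiments.

```python
def stress(groups_forwarded, total_groups):
    """Fraction of all groups for which a sequencing node forwards messages."""
    if total_groups <= 0:
        raise ValueError("total number of groups must be positive")
    return len(set(groups_forwarded)) / total_groups


# Hypothetical sequencing node that forwards for 2 of 4 groups.
example = stress({"G0", "G1"}, 4)
```

Under this definition a node that forwards for every group has stress 1.0; the example above evaluates to 0.5.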