Dr. Multicast for Data Center Communication Scalability Ymir Vigfusson Hussam Abu-Libdeh Mahesh Balakrishnan Ken Birman Cornell University Yoav Tock IBM Research Haifa HotNets, October 5, 2008


Mar 27, 2015


John Cowan

Dr. Multicast for Data Center Communication Scalability

Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman (Cornell University)

Yoav Tock (IBM Research Haifa)

HotNets, October 5, 2008


IP Multicast in Data Centers

• IPMC is not used in data centers

• Would speed up products that use multicast


IP Multicast in Data Centers

• Why is IP multicast rarely used?
  o Limited IPMC scalability on switches/routers and NICs
  o Broadcast storms: loss triggers a horde of NACKs, which triggers more loss, etc.
  o Disruptive even to non-IPMC applications


IP Multicast in Data Centers

• IP multicast has a bad reputation
  o Works great up to a point, after which it breaks catastrophically


IP Multicast in Data Centers

• Bottom line:
  o Administrators have no control over multicast use...
  o Without control, they opt for never.


Dr. Multicast  


Dr. Multicast (MCMD)

• Policy: Permits data center operators to selectively enable and control IPMC

• Transparency: Standard IPMC interface; system calls are overloaded.

• Performance: Uses IPMC when possible, otherwise point-to-point unicast

• Robustness: Distributed, fault-tolerant service

 


Terminology

• Process: Application that joins logical IPMC groups

• Logical IPMC group: A virtualized abstraction

• Physical IPMC group: An ordinary IP multicast group, as usual

• UDP multi-send: New kernel-level system call

• Collection: Set of logical IPMC groups with identical membership


Acceptable Use Policy

• Assume a higher-level network management tool compiles policy into primitives

• Explicitly allow a process to use IPMC groups
  o allow-join(process, logical IPMC)
  o allow-send(process, logical IPMC)

• UDP multi-send always permitted

• Additional constraints
  o max-groups(process, limit)
  o force-udp(process, logical IPMC)
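The primitives above suggest a simple per-request check. This is a hypothetical sketch, not MCMD's code: the `Policy` class, its field names, and the return strings are invented here, and it assumes that a join without allow-join is refused while a send without allow-send (or with force-udp) falls back to UDP multi-send. "ipmc-allowed" only means IPMC may be used; whether a physical address is actually assigned is still up to the leader's mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Rule = Tuple[str, str]  # (process, logical IPMC group)


@dataclass
class Policy:
    """Hypothetical in-memory form of the compiled policy primitives."""
    allow_join: Set[Rule] = field(default_factory=set)
    allow_send: Set[Rule] = field(default_factory=set)
    max_groups: Dict[str, int] = field(default_factory=dict)  # process -> limit
    force_udp: Set[Rule] = field(default_factory=set)

    def may_join(self, process: str, group: str, joined: Set[str]) -> bool:
        """Assumed rule: a join needs allow-join and must respect max-groups."""
        if (process, group) not in self.allow_join:
            return False
        if group in joined:
            return True  # already a member, so no extra group is consumed
        limit = self.max_groups.get(process)
        return limit is None or len(joined) < limit

    def send_mode(self, process: str, group: str) -> str:
        """'ipmc-allowed' if IPMC may carry this send, else 'unicast-only'."""
        if (process, group) in self.force_udp:
            return "unicast-only"  # force-udp overrides allow-send
        if (process, group) in self.allow_send:
            return "ipmc-allowed"
        # Assumption: without allow-send the send still happens, but only via
        # UDP multi-send, which the policy always permits.
        return "unicast-only"
```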


Overview

• Library module

• Mapping module

• Gossip layer

• Optimization questions

• Results


MCMD Library Module

• Transparent. Overloads the IPMC functions
  o setsockopt(), send(), etc.

• Translation. Logical IPMC addresses map to a set of P-IPMC/unicast addresses.
  o Two extremes
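A minimal sketch of the translation step, assuming the library caches, for each logical (address, port) pair, either one P-IPMC address or the unicast endpoints of the group's members (the two extremes). `Mapping`, `destinations`, and `send_logical` are hypothetical names; a real overloaded send() would sit underneath the socket API rather than take a socket argument.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, int]  # (IPv4 address, UDP port)


@dataclass(frozen=True)
class Mapping:
    """Hypothetical cached translation entry for one logical IPMC group."""
    physical: Optional[Address]        # P-IPMC address, if one was assigned
    members: Tuple[Address, ...] = ()  # unicast endpoints of the members


def destinations(cache: Dict[Address, Mapping], logical: Address) -> List[Address]:
    """Expand a logical IPMC destination into the addresses actually used."""
    entry = cache.get(logical)
    if entry is None:
        return []                # no mapping cached yet (this sketch's choice)
    if entry.physical is not None:
        return [entry.physical]  # one extreme: a single IPMC packet
    return list(entry.members)   # other extreme: one unicast packet per member


def send_logical(sock, cache: Dict[Address, Mapping],
                 logical: Address, payload: bytes) -> int:
    """Stand-in for an overloaded sendto(); returns the number of packets sent."""
    targets = destinations(cache, logical)
    for addr in targets:
        sock.sendto(payload, addr)  # e.g. a socket.socket(AF_INET, SOCK_DGRAM)
    return len(targets)
```

With a real UDP socket, `send_logical` issues one `sendto` per returned address, so the same call costs one packet when a P-IPMC address exists and one packet per member otherwise.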


MCMD Mapping Role

• An MCMD agent runs on each machine
  o Contacted by the library modules
  o Provides a mapping

• One agent is elected to be the leader:
  o Allocates IPMC resources according to the current policy
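The slides do not say how the leader is chosen. One common approach, shown here purely as an assumption, is a deterministic rule applied to the replicated membership state: an agent counts as alive if its last heartbeat is recent, and the live agent with the smallest identifier leads. `live_agents`, `elect_leader`, and the `timeout` parameter are hypothetical.

```python
from typing import Dict, List, Optional


def live_agents(last_heard: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Agents whose most recent heartbeat is no older than `timeout` seconds."""
    return sorted(agent for agent, t in last_heard.items() if now - t <= timeout)


def elect_leader(last_heard: Dict[str, float], now: float,
                 timeout: float) -> Optional[str]:
    """Assumed rule: the live agent with the smallest identifier is the leader."""
    live = live_agents(last_heard, now, timeout)
    return live[0] if live else None
```

Because each agent applies the rule to its own, possibly stale, copy of the state, two agents can briefly disagree about the leader until the replicated state catches up.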


MCMD Mapping Role

• Allocating IPMC resources: An optimization problem

[Figure: two panels, labelled "Procs"/"L-IPMC" and "Procs"/"Collections"/"L-IPMC"; one box reads "This box intentionally left BLACK"]


MCMD Gossip Layer

• Runs system-wide as part of the agent

• Automatic failure detection

• Group membership fully replicated via gossip
  o Each node reports its own state
  o Future: replicate more selectively
  o Leader runs the optimization algorithm on the data and reports the mapping
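The replication scheme on this slide can be sketched as anti-entropy gossip under a few assumptions: each node owns exactly one entry (its own joined groups plus a version counter), only that node increments the version, and peers merge by keeping the higher version. The push-pull exchange with one random peer per round and all function names (including the `simulate` demo) are illustrative choices, not taken from MCMD.

```python
import random
from typing import Dict, FrozenSet, Tuple

Entry = Tuple[int, FrozenSet[str]]  # (version, logical groups the node has joined)
View = Dict[str, Entry]             # node id -> that node's latest known entry


def merge(mine: View, theirs: View) -> None:
    """Keep, for every node, whichever copy of its entry has the higher version."""
    for node, entry in theirs.items():
        if node not in mine or mine[node][0] < entry[0]:
            mine[node] = entry


def report_own_state(views: Dict[str, View], node: str, groups: FrozenSet[str]) -> None:
    """Only `node` itself writes its entry, bumping the version on every change."""
    version = views[node].get(node, (0, frozenset()))[0] + 1
    views[node][node] = (version, groups)


def gossip_round(views: Dict[str, View], rng: random.Random) -> None:
    """Every node does a push-pull exchange with one random peer (needs >= 2 nodes)."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merge(views[peer], views[node])  # push my entries to the peer
        merge(views[node], views[peer])  # pull the peer's entries back


def simulate(num_nodes: int, rounds: int, seed: int = 0) -> Dict[str, View]:
    """Demo: each node joins one group, then gossip runs for `rounds` rounds."""
    rng = random.Random(seed)
    views: Dict[str, View] = {f"n{i}": {} for i in range(num_nodes)}
    for i, node in enumerate(views):
        report_own_state(views, node, frozenset({f"G{i % 3}"}))
    for _ in range(rounds):
        gossip_round(views, rng)
    return views
```

Because a merge only ever replaces an entry with a higher version, repeated exchanges cannot lose a newer state, which is what lets all views converge.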


MCMD Gossip Layer

• But gossip is slow...

• Implications:
  o Slow propagation of group membership
  o Slow propagation of new maps
  o We assume a low rate of membership churn

• Remedy: Broadcast module
  o Leader broadcasts urgent messages
  o Bounded bandwidth of urgent channel
  o Trade-off between latency and scalability
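One way to give the urgent channel bounded bandwidth is a token bucket; the slides do not name a mechanism, so the `UrgentChannel` class below is an assumed design. Tokens refill at `rate_bytes_per_s` up to `burst_bytes`, and queued urgent messages are released only when enough tokens are available: a lower rate protects the network (scalability) at the cost of queueing delay (latency).

```python
from collections import deque
from typing import Deque, List


class UrgentChannel:
    """Token-bucket limit on the leader's urgent broadcasts (assumed design).

    Assumes no single message is larger than burst_bytes; such a message
    would never fit in the bucket and would block the queue.
    """

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous refill; clock starts at 0
        self.queue: Deque[bytes] = deque()

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def submit(self, message: bytes) -> None:
        """Queue an urgent message (e.g. a new map) for broadcast."""
        self.queue.append(message)

    def drain(self, now: float) -> List[bytes]:
        """Release, in FIFO order, the queued messages the budget allows at `now`."""
        self._refill(now)
        released = []
        while self.queue and len(self.queue[0]) <= self.tokens:
            message = self.queue.popleft()
            self.tokens -= len(message)
            released.append(message)
        return released
```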


Overview

• Library module

• Mapping module

• Gossip layer

• Optimization questions

• Results


Optimization Questions

[Figure: two panels, labelled "Procs"/"L-IPMC" and "Procs"/"Collections"/"L-IPMC"; one box marked "BLACK"]

• First step: compress logical IPMC groups
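Compressing logical groups in the exact sense of the terminology (a collection is a set of logical groups with identical membership) reduces to bucketing groups by their member sets. A minimal sketch with a hypothetical `collections_of` function; it does not attempt the harder problem of merging groups whose membership is merely similar, which would trade extra bandwidth for fewer collections.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def collections_of(membership: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Bucket logical IPMC groups by their exact member set.

    membership: logical group -> set of member processes.
    Returns member set -> logical groups sharing it; each entry is one collection.
    """
    buckets: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for group, members in membership.items():
        buckets[frozenset(members)].append(group)
    return dict(buckets)
```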


Optimization Questions

• How compressible are subscriptions?
  o Multi-objective optimization: minimize the number of collections and minimize bandwidth overhead on the network
  o Thm: The general problem is NP-complete
  o Thm: In uniform random allocation, there is "little" compression opportunity.
  o In practice, allocation is not uniform: social preferences, and lots of duplicates due to replication (e.g., for load balancing)


Optimization Questions

• Which collections get an IPMC address?
  o Thm: Assigning P-IPMC addresses greedily, in decreasing order of traffic*size, minimizes bandwidth.

• Tiling heuristic:
  o Sort L-IPMC groups by traffic*size
  o Greedily collapse identical groups
  o Assign IPMC addresses to collections in decreasing order of traffic*size; use UDP multi-send for the rest

• Building tilings incrementally
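The assignment step of the tiling heuristic can be sketched as follows, starting from collections that have already been formed. `Collection`, `assign_ipmc`, and `ipmc_pool` (the physical addresses the switches and NICs can support) are hypothetical, and traffic is taken to be messages per second. Incremental tiling is not covered.

```python
from typing import Dict, List, NamedTuple, Optional


class Collection(NamedTuple):
    name: str       # hypothetical label for the collection
    traffic: float  # messages per second sent to the collection
    size: int       # number of member processes


def assign_ipmc(collections: List[Collection],
                ipmc_pool: List[str]) -> Dict[str, Optional[str]]:
    """Give P-IPMC addresses to the collections with the largest traffic * size;
    every other collection maps to None, i.e. UDP multi-send.
    """
    ranked = sorted(collections, key=lambda c: c.traffic * c.size, reverse=True)
    return {
        c.name: ipmc_pool[i] if i < len(ipmc_pool) else None
        for i, c in enumerate(ranked)
    }
```

Ties in traffic*size keep their input order because Python's sort is stable.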

 


Experimental Results


Overhead (max. throughput)

• Insignificant overhead when mapping L-IPMC to P-IPMC.


Overhead (CPU utilization)

• Insignificant overhead when mapping L-IPMC to P-IPMC.


Network Overhead

• The gossip layer uses constant background bandwidth; the urgent channel behaves well

       


Latency

• Latency of propagation of joins/leaves and new maps

    


Policy control

• A malfunctioning node bombards an existing IPMC group.

• MCMD policy prevents ill effects

[Figure annotations: "Traffic starts", "New policy"]


Conclusion

• IPMC has been a bad citizen...

• Dr. Multicast has the cure!

• Opportunity for big performance enhancements and policy control.


Thank you!



Overhead

• Linux kernel module increases UDP multi-send throughput by 17% (compared to user-space UDP multi-send)


Latency of events

• Gossip: 99% of nodes aware of change within 9 epochs (now 1 sec)

    


Conclusions

• Policy: Allows data center operators to enable and control IPMC

• Transparency: Standard IPMC interface; system calls are overloaded.

• Performance: Uses IPMC when possible, otherwise point-to-point UDP

• Robustness: Distributed, fault-tolerant service

 


Results

• Library module
  o Insignificant slowdown
  o Linux kernel module provides 17% speed-up for UDP multi-send


Optimization questions

[Figure: two panels, labelled "Users"/"Topics" and "Users"/"Groups"/"Topics"; one box reads "This box intentionally left BLACK"]

• Multi-objective:
  o Minimize number of groups
  o Minimize bandwidth overhead on network

• Thm: This problem is NP-complete
  o Reduction to Minimum Normal Set Basis

   


MCMD Library Layer

• Overloads the IPMC functions
  o setsockopt(), send(), etc.

• Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy

• Notifies MCMD immediately about joins/leaves

• Learns about new mappings from MCMD

• Keeps statistics about group traffic rates
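The traffic statistics could be kept as simple per-group counters that the library resets once per reporting window. The sketch assumes fixed windows and per-group message and byte counts; `TrafficStats` and its methods are hypothetical, and the slides do not say how rates are measured or reported to MCMD.

```python
from collections import defaultdict
from typing import Dict, Tuple


class TrafficStats:
    """Per-logical-group traffic counters over fixed windows (hypothetical).

    The library would call record() on each send to a logical group and call
    rates() once at the end of every window of window_s seconds.
    """

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.message_counts: Dict[str, int] = defaultdict(int)
        self.byte_counts: Dict[str, int] = defaultdict(int)

    def record(self, group: str, payload_len: int) -> None:
        self.message_counts[group] += 1
        self.byte_counts[group] += payload_len

    def rates(self) -> Dict[str, Tuple[float, float]]:
        """(messages/s, bytes/s) per group for the window just ended; then reset."""
        result = {
            group: (count / self.window_s, self.byte_counts[group] / self.window_s)
            for group, count in self.message_counts.items()
        }
        self.message_counts.clear()
        self.byte_counts.clear()
        return result
```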


MCMD Library Layer

• Overloads the IPMC functions
  o setsockopt(), send(), etc.

• Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy

• Caches translation maps

• Maintains a connection to MCMD for updates
