Reliable Multicast for Time-Critical Systems Mahesh Balakrishnan Ken Birman Cornell University.
Post on 29-Mar-2015
Reliable Multicast for Time-Critical Systems
Mahesh Balakrishnan, Ken Birman
Cornell University
Mission-Critical Datacenters
COTS Datacenters
Online e-tailers, search engines, corporate applications
Web-services
Mission-Critical Apps
Need: Scalability, Availability, Fault-Tolerance … Timeliness!
The Time-Critical Datacenter
Migrating time-critical applications to commodity datacenters…
… conversely, providing datacenter web-services with time-critical performance.
What’s a Time-Critical System?
Not ‘real time’, but ‘real fast’!
Financial calculators, military command and control… air traffic control (ATC)
… foobooks.com!
Technology Gap: Real-Time focuses on determinism, scale-up architectures
The French ATC System
Mid to late 90’s
Teams of 3-5 air traffic controllers on a cluster of desktop consoles
50-200 of these console clusters in an air traffic control center
Why study the French ATC?
ATC Subsystems
Radar Image
Weather Alert
Track Updates
Updates to Flight Plans
Console to Console State Updates
System Management and Monitoring
ATC center to center Updates
Multicast ubiquitous…
Two Kinds of Multicast
Virtually Synchronous Multicast: very reliable, not particularly fast
Unreliable Multicast: very fast, not particularly reliable
Nothing in between!
Two Kinds of Subsystems
Category 1: Complete reliability (virtual synchrony), e.g. routing decisions
Category 2: Careful application design + natural hardware properties + management policies, e.g. radar
Multicast in the French ATC
Engineering Lessons:
Structure application to tolerate partial failures
Exploit natural hardware properties
Can we generalize to modern systems?
Research Direction: Time-Critical Reliability
Can we design communication primitives that encapsulate these lessons?
Anatomy of a Cloned Service
RACS
Updates multicast to whole group
Queries unicast to single nodes
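The update/query split above can be illustrated with a minimal in-process sketch. Everything here is an assumption for illustration: `ClonedService`, its methods, and the key name are hypothetical, and a loop over in-memory dictionaries stands in for real multicast and unicast traffic.

```python
import random


class ClonedService:
    """In-process stand-in for a group of identical replicas (clones)."""

    def __init__(self, n_clones, seed=None):
        # Each clone holds its own full copy of the service state.
        self.clones = [dict() for _ in range(n_clones)]
        self._rng = random.Random(seed)

    def update(self, key, value):
        # An update is "multicast": every clone in the group applies it.
        for state in self.clones:
            state[key] = value

    def query(self, key):
        # A query is "unicast": exactly one clone is chosen to answer.
        clone = self._rng.choice(self.clones)
        return clone.get(key)


svc = ClonedService(n_clones=4, seed=1)
svc.update("sku-42", 17)
print(svc.query("sku-42"))  # 17, whichever clone answers
```

Because every clone applies every update, any single clone can serve a query; a clone that misses an update would answer with stale data, which is why the reliability of the multicast matters.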
Services
An Amazon web-page is constructed by 100s of co-operating services*
Multicast is used for:
Updating Cloned Services
Publish-Subscribe / Eventing
Datacenter Management/Monitoring
* Werner Vogels, CTO of amazon.com, at SOSP 2005
Multicast in the Datacenter
A node is in many multicast groups:
One for each service it hosts
One for each topic it subscribes to
One or more administration groups
Large Numbers of Overlapping Groups!
Service Semantics
Product Popularity Service
Shipping Scheduler
Store Inventory
User History Service
Product Recommendations
User Profile Data
Data Store Services: stale data can result in overselling / underselling, a loss of real-world dollars
Cache Services: updated periodically by back-end data-stores
The Challenge
Datacenter Blades are failure-prone:
Crash failures
Byzantine behavior
Bursty Packet Loss: end-host kernels drop packets when subjected to traffic spikes.
A New Reliability Model
Rapid delivery is more important than perfect reliability
Probabilistic Timeliness
Graceful Degradation
Wanted: a multicast primitive that
1. Scales to large numbers of arbitrarily overlapping multicast groups
2. Delivers multicasts quickly
3. Tolerates datacenter failure modes – bursty packet loss, node failures
4. Offers probabilistic properties
5. ‘Gives up’ on lost data after a threshold period
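Requirement 5 can be sketched as a receiver-side tracker that stops waiting for a missing packet once it has been outstanding for a threshold period. This is only an illustration under stated assumptions, not the mechanism used by any particular protocol: the `GapTracker` class, the `give_up_after` parameter, and the use of per-sender sequence numbers are all hypothetical.

```python
class GapTracker:
    """Tracks missing sequence numbers and abandons them after a deadline."""

    def __init__(self, give_up_after):
        self.give_up_after = give_up_after  # seconds (hypothetical parameter)
        self.next_expected = 0
        self.missing = {}       # seq -> time at which the gap was first noticed
        self.abandoned = set()  # seqs the receiver has given up on

    def on_receive(self, seq, now):
        if seq >= self.next_expected:
            # Every sequence number skipped over is now a known gap.
            for s in range(self.next_expected, seq):
                self.missing[s] = now
            self.next_expected = seq + 1
        else:
            # A late or repaired packet fills an existing gap, if still open.
            self.missing.pop(seq, None)

    def expire(self, now):
        # Give up on gaps that have been open for at least the threshold.
        expired = sorted(s for s, t in self.missing.items()
                         if now - t >= self.give_up_after)
        for s in expired:
            del self.missing[s]
            self.abandoned.add(s)
        return expired


tracker = GapTracker(give_up_after=0.1)
tracker.on_receive(0, now=0.00)
tracker.on_receive(3, now=0.01)   # packets 1 and 2 are now known to be missing
tracker.on_receive(1, now=0.05)   # packet 1 is repaired in time
lost = tracker.expire(now=0.20)   # packet 2 exceeded the threshold
print(lost)  # [2]
```

Bounding the wait this way keeps a burst of loss from stalling delivery of later data, at the cost of the application having to tolerate the occasional abandoned packet.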
Ricochet: Lateral Error Correction
Receivers exchange error correction XORs of multicast traffic
Works very well with multiple groups – scales up to a thousand groups per node
Probabilistic Timeliness: probability distribution of delivery latencies
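The idea of receivers exchanging XORs can be sketched as follows. This is a simplified illustration, not Ricochet's actual packet format or packet-selection policy: the helper names (`xor_bytes`, `make_repair`, `recover`) are hypothetical, payloads are assumed to be of equal length, and a repair is assumed to list the sequence numbers it covers.

```python
from functools import reduce


def xor_bytes(a, b):
    # XOR two byte strings, zero-padding the shorter one.
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets):
    """Build a repair from received packets (dict: seq -> payload)."""
    seqs = sorted(packets)
    return seqs, reduce(xor_bytes, (packets[s] for s in seqs))


def recover(repair, have):
    """Recover the single packet covered by the repair that `have` lacks.

    Returns (seq, payload), or None if zero or more than one is missing.
    """
    seqs, xor = repair
    missing = [s for s in seqs if s not in have]
    if len(missing) != 1:
        return None
    for s in seqs:
        if s in have:
            # XORing out each packet we hold leaves only the missing one.
            xor = xor_bytes(xor, have[s])
    return missing[0], xor


receiver_a = {1: b"trk1", 2: b"trk2", 3: b"trk3"}   # received everything
receiver_b = {1: b"trk1", 3: b"trk3"}               # lost packet 2

repair = make_repair(receiver_a)      # A sends this XOR to B
print(recover(repair, receiver_b))    # (2, b'trk2')
```

A single XOR can repair only one loss among the packets it covers, so recovery is probabilistic: whether a lost packet is repaired depends on which repairs a receiver happens to get and what else it is missing.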
Predictive Total Ordering (Plato)
Delivers messages to applications with no ordering delay in most cases
Orders messages only if there is a high probability of out-of-order delivery across different nodes
Probabilistic Timeliness: probability distribution of ordered delivery latency
Performance
SRM takes seconds to recover lost packets
Ricochet recovers almost all packets within ~70 milliseconds
Conclusion
Move from real-time (R/T) to time-critical (T/C) yields huge benefits!
Ricochet is faster… slashes latency… scalable…
Clean delivery delay curve is a powerful design tool, replacing traditional hard (but conservative) limits
We’re open for business:
Software and detailed paper available for download
Give it a try… tell us what you think!
www.cs.cornell.edu/projects/quicksilver/ricochet.html