Load Management and High Availability in Borealis
Magdalena Balazinska, Jeong-Hyon Hwang, and the Borealis team
MIT, Brown University, and Brandeis University
Borealis is a distributed stream processing system (DSPS) based on Aurora and Medusa
Contract-Based Load Management
HA Semantics and Algorithms
Network Partitions
Approach: 1 - Offline, participants negotiate and establish bilateral contracts that:
• Fix or tightly bound price per unit-load
• Are private and customizable (e.g., performance, availability guarantees, SLA)
Properties:
• Simple, efficient, and low overhead (provable small bounds)
• Provable incentives to participate in mechanism
• Experimental result: A small number of contracts and small price-ranges suffice to achieve acceptable allocation
Approach: Favor availability. Use updates to achieve consistency
• Use connection points to create replicas and stream versions
• Downstream nodes
  • Monitor upstream nodes
  • Reconnect to available upstream replica
  • Continue processing with minimal disruptions
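The monitor-and-reconnect behavior of downstream nodes can be sketched roughly as follows. This is a minimal illustration, not the Borealis implementation: the heartbeat table, the `timeout` parameter, and the function name `pick_upstream` are all hypothetical, and liveness is assumed to be judged from the time of the last heartbeat received.

```python
from typing import Dict, List, Optional

def pick_upstream(current: str,
                  replicas: List[str],
                  last_heartbeat: Dict[str, float],
                  now: float,
                  timeout: float) -> Optional[str]:
    """Return the upstream replica a downstream node should read from.

    Assumption: a replica counts as available if its last heartbeat
    arrived no more than `timeout` seconds before `now`.
    """
    def is_available(node: str) -> bool:
        seen = last_heartbeat.get(node)
        return seen is not None and now - seen <= timeout

    # Stay with the current upstream node while it is still reachable,
    # so processing continues without any switch.
    if is_available(current):
        return current

    # Otherwise reconnect to the first available replica, in list order.
    for node in replicas:
        if is_available(node):
            return node

    # No replica is reachable (for example, during a partition).
    return None
```

For example, with heartbeats `{"B": 0.0, "B'": 9.5}`, `now=10.0`, and `timeout=2.0`, a node currently reading from `"B"` would switch to `"B'"`.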
Goal: Handle network partitions in a distributed stream processing system
Challenges:
• Maximize availability
• Minimize reprocessing
• Maintain consistency
[Figure: convex cost function with a contract at price p. x-axis: offered load (msgs/sec); y-axis: total cost (delay, $). Contract price labels shown: p, [p, p+e], 0.8p]
Task t moves from A to B if:
• Unit marginal cost (MC) of task t at A > p
• Unit marginal cost (MC) of task t at B < p
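The movement rule above can be checked with a small sketch. The quadratic cost function and the names `unit_marginal_cost` and `should_move` are assumptions made for illustration; the poster only states that each participant has a private convex cost function and that the contract fixes a price p per unit of load.

```python
from typing import Callable

CostFn = Callable[[float], float]

def quadratic_cost(load: float) -> float:
    """Hypothetical convex cost function: total cost grows as load squared."""
    return load ** 2

def unit_marginal_cost(cost: CostFn, base_load: float, task_load: float) -> float:
    """Extra total cost per unit of load when a task is added to base_load."""
    return (cost(base_load + task_load) - cost(base_load)) / task_load

def should_move(cost_a: CostFn, load_a: float,
                cost_b: CostFn, load_b: float,
                task_load: float, price: float) -> bool:
    """Apply the rule: move the task from A to B if MC at A > p and MC at B < p.

    Assumption: load_a already includes the task, so the task's marginal
    cost at A is measured relative to A's load without it.
    """
    mc_at_a = unit_marginal_cost(cost_a, load_a - task_load, task_load)
    mc_at_b = unit_marginal_cost(cost_b, load_b, task_load)
    return mc_at_a > price and mc_at_b < price
```

With A at load 10, B at load 2, a task of 1 unit, and price 8, the marginal cost is 19 at A and 5 at B, so the task moves. If B were already at load 6, its marginal cost would be 13, above the price, and the task would stay at A.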
HA algorithms:
• Upstream backup: lowest runtime overhead
• Active standby: shortest recovery time
• Passive standby: most suitable for precise recovery
[Figures: node chain A, B, C with replica B’ of B; message labels: ACK, Trim, Replay]
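The ACK, Trim, and Replay labels in the figures suggest an upstream node that keeps its output until it is acknowledged. The following is a minimal sketch of that idea only; the class name `OutputBuffer`, the sequence-number scheme, and the cumulative ACK semantics are assumptions, not details taken from the poster.

```python
from collections import deque
from typing import Any, Deque, List, Tuple

class OutputBuffer:
    """Upstream-side log of emitted tuples, trimmed on acknowledgment."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._log: Deque[Tuple[int, Any]] = deque()

    def emit(self, value: Any) -> int:
        """Record a tuple sent downstream and return its sequence number."""
        seq = self._next_seq
        self._log.append((seq, value))
        self._next_seq += 1
        return seq

    def ack(self, seq: int) -> None:
        """Trim every logged tuple up to and including seq.

        Assumption: ACKs are cumulative, so one ACK covers all earlier tuples.
        """
        while self._log and self._log[0][0] <= seq:
            self._log.popleft()

    def replay(self) -> List[Tuple[int, Any]]:
        """Return the unacknowledged tuples, oldest first, for a recovering node."""
        return list(self._log)
```

For instance, after emitting four tuples (sequence numbers 0 to 3) and receiving `ack(1)`, `replay()` returns only tuples 2 and 3, which is what a replacement for the failed downstream node would need to reprocess.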
Goal: Streaming applications can tolerate different types of failure recovery:
• Gap recovery: may lose tuples
• Rollback recovery: produces duplicates but does not lose tuples
• Precise recovery: takes over precisely from the point of failure
Operator classes and example operators:
• Repeatable: Filter, Map, Join
• Convergent: BSort, Resample, Aggregate
• Deterministic
• Arbitrary: Union, operators with timeouts
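The three recovery types can be illustrated by comparing where the failed node stopped with where the backup resumes. This is a simplified sketch under stated assumptions: output tuples are numbered, the backup produces exactly the same tuples as the failed node would have (which need not hold for non-deterministic operators), and the names `recovery_type` and `downstream_view` are hypothetical.

```python
from typing import List

def recovery_type(failed_after: int, resume_from: int) -> str:
    """Classify recovery by comparing the failure point with the resume point.

    failed_after: number of output tuples delivered before the failure.
    resume_from:  index of the first output tuple the backup delivers.
    """
    if resume_from > failed_after:
        return "gap"        # tuples in between are lost
    if resume_from < failed_after:
        return "rollback"   # some tuples are delivered twice
    return "precise"        # takes over exactly at the point of failure

def downstream_view(outputs: List[int], failed_after: int, resume_from: int) -> List[int]:
    """Tuples seen downstream: those sent before the failure, then the backup's."""
    return outputs[:failed_after] + outputs[resume_from:]
```

With outputs `[1, 2, 3, 4, 5, 6]` and a failure after three tuples, resuming at index 5 loses tuples 4 and 5 (gap), resuming at index 1 repeats tuples 2 and 3 (rollback), and resuming at index 3 reproduces the original sequence (precise).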
[Figures: node chain A, B, C with replica B’ (labels: ACK, Checkpoint); participant diagram with nodes A, B, C, D]
Goals:
• Manage load through collaborations between autonomous participants
• Ensure acceptable allocation where each node’s load is below threshold
[Figure: participants linked by contracts, including a contract specifying that A will pay C $p per unit of load]
Challenges: Operator and processing non-determinism
2 - At runtime, load moves only between participants that have a contract. Movements are based on marginal costs:
• Each participant has a private convex cost function
• Load moves when it’s cheaper to pay partner than to process locally
Challenges: Incentives, efficiency, and customization