1 Seneca: Remote Mirroring Done Write Minwen Ji , Alistair Veitch and John Wilkes HP Labs June 20, 2022

1

Seneca: Remote Mirroring Done Write

Minwen Ji, Alistair Veitch and John Wilkes

HP Labs

April 21, 2023


2

Motivations: Reliability and Availability

• 2 out of 5 enterprises that experience a disaster go out of business within 5 years [Gartner Report]

• Outages cost >$250K/hour (25%) or >$5M/hour (4%) [Eagle Rock Online Survey]


3

Our Contributions

• A taxonomy of the design choices for remote mirroring

• An asynchronous protocol that is designed to handle many kinds and sequences of failures

• Checking the correctness of the protocol using I/O automata-based simulation


4

Remote Mirroring Overview

Competing goals: high performance, low cost, and low data loss

[Figure: a Local site and a Remote site connected by a Wide Area Network. At each site, applications (App) sit above a Mirroring Module. Labels in the figure: Primary, Secondary, Primary Log, Secondary Log.]


5

Design Choices

• Synchronous vs. asynchronous
  – Propagate update to mirror before vs. after write request returned to application

• Divergence: zero bounded, op/byte/time bounded, resource bounded, unbounded
  – Amount of data allowed to be out-of-sync between mirrors

• As-is logging vs. write coalescing
  – Store all versions vs. a subset of versions of overwritten data in log
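The difference between as-is logging and write coalescing can be illustrated with a minimal Python sketch. It is not Seneca's implementation: the function names, the `(time, block, data)` tuple layout, and the sample trace are all assumptions made for this example. Coalescing here means that, within each fixed time window, only the last version written to a block is kept.

```python
def as_is_log(writes):
    """As-is logging: keep every version, one log entry per write."""
    return [(block, data) for _, block, data in writes]

def coalesced_log(writes, interval):
    """Write coalescing (hypothetical sketch): within each window of
    `interval` seconds, keep only the last version of each block."""
    batches = {}  # window index -> {block: latest data in that window}
    for t, block, data in writes:
        window = int(t // interval)
        batches.setdefault(window, {})[block] = data
    return [(block, data)
            for batch in batches.values()
            for block, data in batch.items()]

# (time in seconds, block number, payload): a made-up sample trace
writes = [(0, 7, "a"), (5, 7, "b"), (12, 3, "c"), (29, 7, "d"), (31, 7, "e")]

print(len(as_is_log(writes)))     # one entry per write
print(coalesced_log(writes, 30))  # overwritten versions of block 7 dropped per window
```

In this sketch the order of entries inside one window no longer matches the original write order, so a receiver would presumably have to apply each window's batch as a unit. That is an assumption of the example, not a statement about the protocol.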


6

Seneca’s Choices

• Synchronous vs. asynchronous
  – Low data loss vs. smooth traffic and high performance

• Divergence: zero bounded, op/byte/time bounded, resource bounded, unbounded
  – Low data loss vs. smooth traffic and high availability

• As-is logging vs. write coalescing
  – Little secondary log space vs. low primary log space and low traffic
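The divergence trade-off listed above can be made concrete with a small byte-bounded asynchronous mirror in Python. This is a sketch under stated assumptions, not the Seneca protocol: the class `BoundedAsyncMirror`, its methods, and the policy of refusing a write when the bound would be exceeded are all hypothetical. It shows why a tighter bound lowers potential data loss but can make the system unavailable for writes while the network is slow.

```python
from collections import deque

class BoundedAsyncMirror:
    """Hypothetical byte-bounded asynchronous mirror (illustration only)."""

    def __init__(self, max_pending_bytes):
        self.max_pending_bytes = max_pending_bytes
        self.pending = deque()   # updates acknowledged but not yet propagated
        self.pending_bytes = 0
        self.primary = {}
        self.secondary = {}

    def write(self, block, data):
        """Acknowledge immediately if the divergence bound allows it."""
        if self.pending_bytes + len(data) > self.max_pending_bytes:
            return False         # caller must wait until drain() frees space
        self.primary[block] = data
        self.pending.append((block, data))
        self.pending_bytes += len(data)
        return True

    def drain(self, n=1):
        """Propagate up to n pending updates to the secondary, in order."""
        for _ in range(min(n, len(self.pending))):
            block, data = self.pending.popleft()
            self.secondary[block] = data
            self.pending_bytes -= len(data)

mirror = BoundedAsyncMirror(max_pending_bytes=8)
print(mirror.write(1, b"aaaa"), mirror.write(2, b"bbbb"))  # both fit in the bound
print(mirror.write(3, b"cc"))   # refused: would leave 10 bytes out of sync
mirror.drain()                  # propagate the oldest update
print(mirror.write(3, b"cc"))   # accepted now
```

If the primary failed at any point in this sketch, at most `max_pending_bytes` of acknowledged data would be missing from the secondary.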


7

A Taxonomy

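[Figure: remote-mirroring products placed on two axes, Propagation and Divergence Bound (ticks 0 to 4). Propagation levels: 1 Sync, 2 Async, 3 Async-Coalesce, 4 Async-Bitmap. Products shown: Veritas, IBM-PPRC, IBM-XRC, EMC, NetApps, HP-XP, HP-SV3000, Seneca. Trade-off annotations: "Avail +, Cost –, Loss +" and "Perf +, Cost –, Loss +".]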


8

Evaluation of Seneca’s Choices

Metrics:

Impact of asynchronous propagation and write coalescing on WAN traffic and log space

Trace        Capacity   Length      Mean Write Rate
Cello2002    1.44 TB    24 hours    0.78 MB/s
SAP          4 TB       15 mins     1.95 MB/s
RDW          500 GB     1.4 hours   0.34 MB/s
OpenMail     640 GB     1 hour      1.70 MB/s


9

Simulation Results

• Mean traffic: 5-40% reduction with write coalescing allowed within 30 sec intervals

• 95th percentile usage: reduced from 93% of 4 T3 lines to 85% of 3 T3 lines

• Log space: 100 GB log will cover a network outage for 14-81 hours

[Figures: Mean Traffic vs. Coalescing Interval; Traffic CDF with Coalescing On/Off; Log Size vs. Coalescing Interval]
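The log-space and T3 figures can be sanity-checked with simple arithmetic. The Python sketch below assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), that the log fills at each trace's mean write rate with no coalescing, and the nominal T3 line rate of 44.736 Mbit/s. These assumptions are ours, not stated on the slides. Under them, the fastest trace (SAP, 1.95 MB/s) fills 100 GB in about 14.2 hours and the slowest (RDW, 0.34 MB/s) in about 81.7 hours, which appears consistent with the quoted 14-81 hour range.

```python
GB = 10**9          # decimal units assumed
MB = 10**6
T3_MBIT_S = 44.736  # nominal T3 (DS3) line rate

# Mean write rates (MB/s) of the four traces used in the evaluation
mean_write_rate = {"Cello2002": 0.78, "SAP": 1.95, "RDW": 0.34, "OpenMail": 1.70}

def outage_hours(log_bytes, rate_bytes_per_s):
    """Hours until a log of log_bytes fills at a constant write rate."""
    return log_bytes / rate_bytes_per_s / 3600

for trace, rate in mean_write_rate.items():
    print(f"{trace}: {outage_hours(100 * GB, rate * MB):.1f} hours")

# 95th-percentile bandwidth implied by the two T3 configurations
before = 0.93 * 4 * T3_MBIT_S   # 93% of 4 T3 lines
after = 0.85 * 3 * T3_MBIT_S    # 85% of 3 T3 lines
print(f"{before:.1f} Mbit/s -> {after:.1f} Mbit/s "
      f"({100 * (before - after) / before:.0f}% lower)")
```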


10

How To Get Things Right

• Hard problems:
  – Rolling disasters
    • Primary fails => secondary inconsistent => system inaccessible
  – Failover dilemmas
    • Primary fails before propagation
    • Secondary takes over and continues to update
    • Old primary returns

• Our approach:
  – Finite state machines
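To show what a finite-state-machine treatment of the failover sequence above might look like, here is a toy Python sketch. The states (`PRIMARY`, `SECONDARY`, `FAILED`, `RESYNC`), the events, and the transition table are invented for illustration and are not Seneca's actual state machine. The point is only that an explicit table forces every (state, event) pair to be either defined or rejected, so a returning old primary cannot silently resume as primary.

```python
# Hypothetical per-site transition table: (state, event) -> next state
TRANSITIONS = {
    ("PRIMARY", "fail"): "FAILED",
    ("SECONDARY", "fail"): "FAILED",
    ("SECONDARY", "peer_failed"): "PRIMARY",  # secondary takes over
    ("FAILED", "recover"): "RESYNC",          # old primary must resynchronize
    ("RESYNC", "resync_done"): "SECONDARY",
}

def step(state, event):
    """Apply one event; undefined transitions are errors, not no-ops."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None

def at_most_one_primary(local, remote):
    """Safety property checked after every step in this sketch."""
    return not (local == "PRIMARY" and remote == "PRIMARY")

# Failover scenario: the primary fails, the secondary takes over,
# and the old primary returns.
local, remote = "PRIMARY", "SECONDARY"
for site, event in [("local", "fail"), ("remote", "peer_failed"),
                    ("local", "recover"), ("local", "resync_done")]:
    if site == "local":
        local = step(local, event)
    else:
        remote = step(remote, event)
    assert at_most_one_primary(local, remote)
    print(f"{site}:{event} -> local={local}, remote={remote}")
```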


11

[Figure: state diagrams for the Local Seneca State and the Remote Seneca State, each with a Primary State and a Secondary State]


12

Checking Correctness

• Simulation
  – Started with Input/Output Automata (a model checking language)
  – Constrained random walks in the state space
  – Implemented in C

• Correctness criteria
  – Coverage, safety and liveness

• Latest results
  – Detected and fixed many non-trivial implementation bugs in a relatively short time
  – Average failure injections before a bug is detected: 16435
  – Mean Time Between Failures for the protocol proper: 4100 years
  – The latest bug took 1.77M writes, 75.9K failures, 22.4K recoveries and 6.6M internal events to detect
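The idea of a constrained random walk with failure injection can be sketched on a toy model. The slides state that the real simulator was written in C. The Python below is only an illustration, and its state variables, events, and safety check are assumptions made for this example. "Constrained" here means that each step picks at random among the events that are enabled in the current state. After each step a safety invariant is checked, and per-event counts give a crude coverage measure.

```python
import random

def enabled_events(state):
    """Events allowed in this state (the constraint on the random walk)."""
    events = ["write"]
    if state["link_up"]:
        events.append("fail_link")              # failure injection
        if state["applied"] < state["written"]:
            events.append("propagate")
    else:
        events.append("recover_link")
    return events

def step(state, event):
    """Return the successor state of the toy mirroring model."""
    s = dict(state)
    if event == "write":
        s["written"] += 1                       # primary accepts a write
    elif event == "propagate":
        s["applied"] += 1                       # secondary applies the next write
    elif event == "fail_link":
        s["link_up"] = False
    elif event == "recover_link":
        s["link_up"] = True
    return s

def safe(state):
    """Safety: the secondary never holds writes the primary has not seen."""
    return 0 <= state["applied"] <= state["written"]

def random_walk(steps, seed):
    """Walk the state space; return (violating step or None, event counts)."""
    rng = random.Random(seed)                   # seeded so runs can be replayed
    state = {"written": 0, "applied": 0, "link_up": True}
    counts = {}
    for i in range(steps):
        event = rng.choice(enabled_events(state))
        state = step(state, event)
        counts[event] = counts.get(event, 0) + 1
        if not safe(state):
            return i, counts
    return None, counts

violation, counts = random_walk(steps=10_000, seed=1)
print("violation at step:", violation)
print("event coverage:", counts)
```

Seeding the random generator means a walk that finds a violation can be replayed exactly, which is one plausible reason to prefer this style of simulation for debugging. Liveness checks are left out of the sketch.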


13

Summary

• A taxonomy of design space for remote mirroring

• Evaluation of Seneca’s design choices

• A finite state machine description of the Seneca remote mirroring protocol

• Checking the correctness of Seneca with simulations