1 © Copyright 2013 EMC Corporation. All rights reserved. Characterization of Incremental Data Changes for Efficient Data Protection Hyong Shim, Philip Shilane, & Windsor Hsu Backup Recovery Systems Division EMC Corporation

Characterization of Incremental Data Changes for Efficient Data Protection

Feb 24, 2016



Transcript
Page 1: Characterization of Incremental Data Changes for Efficient Data Protection


Characterization of Incremental Data Changes for Efficient Data Protection

Hyong Shim, Philip Shilane, & Windsor Hsu

Backup Recovery Systems Division
EMC Corporation

Page 2: Characterization of Incremental Data Changes for Efficient Data Protection


Data Protection Environment

[Figure: Application servers and virtual machines connect over a SAN or LAN to primary storage (high I/O per sec., medium capacity), which connects over a WAN to data protection storage (large capacity, medium I/O per sec.).]

Page 3: Characterization of Incremental Data Changes for Efficient Data Protection


Contributions

Detailed analysis of data change characteristics from enterprise customers
Design for replication snapshots to lower overheads on primary storage
Evaluation of overheads on data protection storage
Rules-of-thumb for storage engineers and administrators

Page 4: Characterization of Incremental Data Changes for Efficient Data Protection


EMC Symmetrix VMAX Traces

Trace Set      # Volumes   # Storage Systems   Duration (hrs)   Estimated Capacity (GB)
1hr_1Wrt       109,263     125                 30.4 [78.3]      71 [203]
1hr_1GBWrt     16,100      120                 7.7 [6.7]        132 [262]
24hr_1GBWrt    508         13                  24.4 [1.2]       318 [439]

Collected from enterprise customer sites

Page 5: Characterization of Incremental Data Changes for Efficient Data Protection


Capacity and Write Footprint

Analysis for 1hr_1GBWrt
Not collected: applications using each volume

Page 6: Characterization of Incremental Data Changes for Efficient Data Protection


I/O Properties

Trace Set      # Write reqs (1000s)   Write size (GB)   # Read reqs (1000s)   Read size (GB)
1hr_1Wrt       72 [510]               2 [31]            167 [1963]            5 [66]
1hr_1GBWrt     429 [1270]             11 [80]           796 [4987]            25 [166]
24hr_1GBWrt    1803 [4839]            51 [338]          7824 [23875]          242 [763]

1.9-4.3X more read I/Os than write I/Os
2.3-4.7X more GB read than written
High variability
More analysis in the paper
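The read-to-write ratios quoted on this slide can be checked against the leading values in the I/O Properties table. The sketch below is only a consistency check: it assumes the leading value in each cell is a per-volume average, ignores the bracketed values, and divides averages directly, which is not necessarily how the authors derived their ranges. The names `IO_MEANS` and `read_write_ratios` are illustrative.

```python
# Leading values copied from the I/O Properties table (bracketed values omitted).
# Tuple order: write reqs (1000s), write size (GB), read reqs (1000s), read size (GB).
IO_MEANS = {
    "1hr_1Wrt": (72, 2, 167, 5),
    "1hr_1GBWrt": (429, 11, 796, 25),
    "24hr_1GBWrt": (1803, 51, 7824, 242),
}


def read_write_ratios(means):
    """Return {trace_set: (request ratio, byte ratio)} of reads to writes."""
    ratios = {}
    for name, (wr_reqs, wr_gb, rd_reqs, rd_gb) in means.items():
        ratios[name] = (rd_reqs / wr_reqs, rd_gb / wr_gb)
    return ratios


RATIOS = read_write_ratios(IO_MEANS)
for name, (req_ratio, gb_ratio) in RATIOS.items():
    print(f"{name}: {req_ratio:.1f}x requests, {gb_ratio:.1f}x GB")
```

Under these assumptions the smallest and largest request ratios round to 1.9 and 4.3, and the byte ratios to 2.3 and 4.7, matching the ranges stated on the slide.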

Page 7: Characterization of Incremental Data Changes for Efficient Data Protection


Sequential vs. Random Write I/O

We measure how much data are written, on average, after seeking to a non-consecutive sector.

Selected most sequential and most random for analysis

[Figure: Storage volume and trace timeline (w = Write I/O, r = Read I/O). Example: three sequential write runs of lengths 5, 1, and 3 give Sequential Write I/O = (5 + 1 + 3) / 3 = 3.]
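The sequentiality metric on this slide, the average amount of data written after seeking to a non-consecutive sector, could be computed along the following lines. This is a minimal sketch, not the authors' tool: it assumes each write is a hypothetical `(start_sector, length)` pair, that reads have already been filtered out, and that a write continues a run only when it starts exactly where the previous write ended.

```python
def mean_sequential_run(writes):
    """Average length of consecutive-write runs.

    writes: iterable of (start_sector, length) pairs in trace order.
    A new run begins whenever a write does not start at the sector
    immediately following the previous write.
    """
    runs = []
    prev_end = None
    for start, length in writes:
        if start == prev_end:
            runs[-1] += length      # continues the current sequential run
        else:
            runs.append(length)     # seek to a non-consecutive sector
        prev_end = start + length
    return sum(runs) / len(runs) if runs else 0.0


# Runs of 5, 1, and 3 sectors, mirroring the slide's (5 + 1 + 3) / 3 example.
example = [(0, 2), (2, 3), (100, 1), (50, 1), (51, 2)]
print(mean_sequential_run(example))
```

A larger value indicates a more sequential volume; a value near the typical request size indicates mostly random writes.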

Page 8: Characterization of Incremental Data Changes for Efficient Data Protection


Trace Analysis Methodology

[Figure: Trace timeline (w = Write I/O, r = Read I/O) divided into Replication Interval 1 and Replication Interval 2, with a transfer period, mapped onto storage volume sectors grouped into blocks.]

Create a snapshot to protect block data
Transfer period may require snapshot storage and I/O

Page 9: Characterization of Incremental Data Changes for Efficient Data Protection


Replication Snapshot

[Figure: Storage volume state before transfer takes place, showing blocks 0 to 4 with the modified blocks to be transferred marked, above a trace timeline (w = Write I/O).]

Goal: Create a snapshot technique that is integrated with replication that decreases overheads on primary storage

Change block tracking records modified blocks for next replication interval, possibly with a bit vector.

A snapshot has to maintain block values against overwrites.
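Changed block tracking with a bit vector, as mentioned above, might look like the following. This is an illustrative sketch only: the class name `ChangedBlockTracker`, its methods, and the 4096-byte block size in the example are assumptions, not details from the slides.

```python
class ChangedBlockTracker:
    """Records which fixed-size blocks were written during an interval."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.bits = bytearray((num_blocks + 7) // 8)  # one bit per block

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        if offset < 0 or last >= self.num_blocks:
            raise ValueError("write falls outside the tracked volume")
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def modified_blocks(self):
        """Return the sorted list of block numbers marked as modified."""
        return [b for b in range(self.num_blocks)
                if self.bits[b // 8] & (1 << (b % 8))]

    def start_next_interval(self):
        """Return the blocks to transfer and clear the vector."""
        to_transfer = self.modified_blocks()
        self.bits = bytearray(len(self.bits))
        return to_transfer


tracker = ChangedBlockTracker(num_blocks=5, block_size=4096)
tracker.record_write(offset=4096, length=100)       # touches block 1
tracker.record_write(offset=3 * 4096 - 1, length=2)  # straddles blocks 2 and 3
print(tracker.start_next_interval())
```

The list returned at the end of an interval is the set of tracked blocks that a replication snapshot would need to preserve against overwrites until they are transferred.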

Page 10: Characterization of Incremental Data Changes for Efficient Data Protection


Replication Snapshot

Baseline Snapshot: All writes cause copy-on-write

[Figure: Storage volume state before transfer takes place (blocks 0 to 4, modified blocks to be transferred marked), the snapshot area, and a trace timeline (w = Write I/O) of writes arriving while the transfer is in progress, shown for Baseline.]

Page 11: Characterization of Incremental Data Changes for Efficient Data Protection


Replication Snapshot

Changed Block Replication Snapshot (CB): Only writes to tracked blocks cause copy-on-write

[Figure: Blocks 0 to 4 and the snapshot area for writes (w) arriving while the transfer is in progress, shown for Baseline and CB.]

Page 12: Characterization of Incremental Data Changes for Efficient Data Protection


Replication Snapshot

Changed Block with Early Release Replication Snapshot (CBER): Only writes to tracked blocks cause copy-on-write, and blocks are released once transferred

[Figure: Blocks 0 to 4 and the snapshot area for writes (w) arriving while the transfer is in progress, shown for Baseline, CB, and CBER.]

Page 13: Characterization of Incremental Data Changes for Efficient Data Protection


Replication Snapshot

[Figure: Blocks 0 to 4 with the modified blocks to be transferred marked, and the snapshot area for writes (w), shown for Baseline, CB, and CBER.]

Baseline Snapshot: All writes cause copy-on-write
Changed Block Replication Snapshot (CB): Only writes to tracked blocks cause copy-on-write
Changed Block with Early Release Replication Snapshot (CBER): Only writes to tracked blocks cause copy-on-write, and blocks are released once transferred
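The difference between the three snapshot policies can be illustrated with a small simulation that counts copy-on-write operations during a transfer. This is a simplified model under stated assumptions, not the authors' implementation: the event encoding, the function name `count_cow`, and the rule that only the first write to a block needs a copy (because the old value is then already preserved) are all illustrative choices.

```python
def count_cow(policy, tracked, events):
    """Count copy-on-write operations while a transfer is in progress.

    policy:  "baseline", "cb", or "cber"
    tracked: set of blocks modified in the previous interval (to be transferred)
    events:  sequence of ("w", block) writes and ("t", block) transfer completions
    """
    copied = set()       # blocks whose old value is already in the snapshot area
    transferred = set()  # tracked blocks already sent to protection storage
    cow = 0
    for kind, block in events:
        if kind == "t":
            transferred.add(block)
            continue
        if block in copied:
            continue  # assumption: old value already preserved, no second copy
        if policy == "baseline":
            needs_copy = True
        elif policy == "cb":
            needs_copy = block in tracked
        elif policy == "cber":
            needs_copy = block in tracked and block not in transferred
        else:
            raise ValueError(f"unknown policy: {policy}")
        if needs_copy:
            copied.add(block)
            cow += 1
    return cow


tracked = {1, 3}
events = [("w", 0), ("t", 1), ("w", 1), ("w", 3), ("w", 3)]
for policy in ("baseline", "cb", "cber"):
    print(policy, count_cow(policy, tracked, events))
```

In this example the write to untracked block 0 costs a copy only under Baseline, and the write to block 1 after its transfer costs a copy under Baseline and CB but not under CBER.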

Page 14: Characterization of Incremental Data Changes for Efficient Data Protection


Snapshot Storage Overheads

Rule-of-thumb: Over-provision primary capacity by 8% for snapshots

Page 15: Characterization of Incremental Data Changes for Efficient Data Protection


Snapshot I/O Overheads

Rule-of-thumb: Over-provision primary I/O by 100% to support copy-on-write related write-amplification

Page 16: Characterization of Incremental Data Changes for Efficient Data Protection


Snapshot I/O Overheads

Rule-of-thumb: Over-provision primary I/O by 100% to support copy-on-write related write-amplification

Page 17: Characterization of Incremental Data Changes for Efficient Data Protection


Transfer Size to Protection Storage

Rule-of-thumb: 40% of written bytes are transferred to protection storage

Page 18: Characterization of Incremental Data Changes for Efficient Data Protection


IOPS Requirements for Protection Storage

Rule-of-thumb: Protection storage must support 20% of the I/O per second capabilities of primary storage

Page 19: Characterization of Incremental Data Changes for Efficient Data Protection


Related Work

Trace analysis
– Numerous publications
– Most closely related is Patterson [2002]

Snapshots
– Common paradigm for storage but rarely integrated with incremental transfer techniques
– Storage overheads: Azagury [2002] and Shah [2006]

Synchronous Mirroring
– Effective when change rates are low and geographic distance is small
– We are focused on periodic, asynchronous replication

Page 20: Characterization of Incremental Data Changes for Efficient Data Protection


Conclusion

[Figure: Application servers connect over a SAN or LAN to primary storage (high I/O per sec., medium capacity), which connects over a WAN to data protection storage (large capacity, medium I/O per sec.).]

Page 21: Characterization of Incremental Data Changes for Efficient Data Protection


Conclusion

Trace analysis shows diversity of storage characteristics
Snapshot overheads on primary storage can be decreased by improved integration with network transfer
Sequential versus random access patterns affect incremental change patterns on both primary and protection storage

Page 22: Characterization of Incremental Data Changes for Efficient Data Protection


Rules-of-Thumb

Over-provision primary capacity by 8% for snapshots
Over-provision primary I/O by 100% to support copy-on-write related write-amplification
A write buffer decreases snapshot I/O overheads but has little impact on storage overheads
40% of written bytes are transferred to protection storage
Schedule at least 6 hours between transfers to minimize clean data in transferred blocks
Schedule at least 12 hours between transfers to minimize peak network bandwidth requirements
Protection storage must support 20% of the I/O per second capabilities of primary storage
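As a worked example, the numeric rules-of-thumb above can be applied to a hypothetical deployment. The sketch below is illustrative only: the function and field names are invented, the input figures are made up, and it assumes that the "I/O per second capabilities of primary storage" refers to the over-provisioned primary IOPS figure.

```python
from dataclasses import dataclass


@dataclass
class ProvisioningEstimate:
    primary_capacity_gb: float  # capacity including snapshot space
    primary_iops: float         # IOPS including copy-on-write amplification
    transfer_gb: float          # data sent to protection storage per interval
    protection_iops: float      # IOPS required of protection storage


def apply_rules_of_thumb(capacity_gb, workload_iops, written_gb):
    """Apply the slide's percentages to a hypothetical workload."""
    primary_capacity = capacity_gb * 108 / 100  # +8% for snapshots
    primary_iops = workload_iops * 2            # +100% for copy-on-write
    transfer = written_gb * 40 / 100            # 40% of written bytes
    # Assumption: 20% is taken of the over-provisioned primary IOPS.
    protection_iops = primary_iops * 20 / 100
    return ProvisioningEstimate(primary_capacity, primary_iops,
                                transfer, protection_iops)


estimate = apply_rules_of_thumb(capacity_gb=1000, workload_iops=5000,
                                written_gb=50)
print(estimate)
```

For a 1000 GB volume serving 5000 IOPS with 50 GB written per interval, this yields 1080 GB of primary capacity, 10000 primary IOPS, 20 GB transferred, and 2000 IOPS on protection storage.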

Page 23: Characterization of Incremental Data Changes for Efficient Data Protection


Questions?

Page 24: Characterization of Incremental Data Changes for Efficient Data Protection
Page 25: Characterization of Incremental Data Changes for Efficient Data Protection


Trace Analysis: Replication of Snapshots

The amount of data to replicate drops by half with 12 hours between snapshots. 4 KB results are compared to Patterson 2002.

Page 26: Characterization of Incremental Data Changes for Efficient Data Protection


I/O Per Second (IOPS) Request Rate

Trace Set      Average Write Rate   Peak Write Rate (10 ms)   Average Read Rate   Peak Read Rate (10 ms)
1hr_1Wrt       0.7 [8]              1762 [2602]               2 [25]              1693 [2457]
1hr_1GBWrt     15 [37]              4360 [4379]               29 [118]            3603 [4135]
24hr_1GBWrt    20 [55]              9004 [8165]               89 [269]            5647 [7012]

Peak Values: IOPS are calculated for every 10 ms period, and the peaks for each volume are averaged.
More analysis in the paper
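The peak-rate calculation described above (IOPS per 10 ms period, with per-volume peaks averaged) could be sketched as follows. This is an assumption-laden illustration: it takes request timestamps as integer microseconds, uses fixed non-overlapping windows aligned to time zero, and the function names are invented.

```python
from collections import Counter


def peak_iops(timestamps_us, window_us=10_000):
    """Peak request rate of one volume, in I/Os per second.

    Requests are bucketed into fixed windows of `window_us` microseconds;
    the busiest window's count is scaled to a per-second rate.
    """
    if not timestamps_us:
        return 0.0
    counts = Counter(t // window_us for t in timestamps_us)
    return max(counts.values()) * 1_000_000 / window_us


def mean_peak_iops(volumes):
    """Average of the per-volume peaks."""
    peaks = [peak_iops(ts) for ts in volumes]
    return sum(peaks) / len(peaks) if peaks else 0.0


vol_a = [0, 1_000, 9_999, 10_000, 25_000]  # three requests in the first 10 ms
vol_b = [5_000, 40_000]                    # at most one request per window
print(peak_iops(vol_a), peak_iops(vol_b), mean_peak_iops([vol_a, vol_b]))
```

Because a 10 ms window is short, even a handful of requests in one window scales to hundreds of IOPS, which is consistent with the table's peak rates being far above the average rates.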

Page 27: Characterization of Incremental Data Changes for Efficient Data Protection


Snapshot I/O Overheads

Rule-of-thumb: Over-provision primary I/O by 100% to support copy-on-write related write-amplification