Digital Preservation Partners Meeting July 21, 2010 Andrew Woods awoods@duraspace.org DuraCloud: Data Integrity Monitoring in the Cloud

Dec 17, 2015


Merilyn Paul
Page 1: Digital Preservation Partners Meeting July 21, 2010 Andrew Woods awoods@duraspace.org DuraCloud: Data Integrity Monitoring in the Cloud.

Digital Preservation Partners Meeting, July 21, 2010

Andrew Woods, awoods@duraspace.org

DuraCloud: Data Integrity Monitoring in the Cloud

Page 2:

Overview

• What is DuraCloud?

• Fixity service use case

• Basic flow

• Cost and performance

• Next steps

Page 3:

What is it?

• Cloud-based service offered by the not for profit organization, DuraSpace

• An open source, cloud storage/compute application
  – Focused on preservation support and
  – Data access for reuse and sharing

• Cloud storage across multiple commercial & non-commercial providers

• An open canvas for cloud-based services

Page 4:

Fixity use case

• DuraCloud user has replicated content across one or more cloud stores

• Need for periodic verification of bit integrity

• Seeking balance between cost & trust

Page 5:

0: Content Topology

Page 6:

1: Data load

Page 7:

1a: Replicate

Page 8:

1b: MD5 export

Page 9:

2: Determine MD5s*

...running fixity service

Page 10:

3: Compare & Report
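
The basic flow on pages 8 to 10 (export MD5s, determine MD5s, compare and report) can be sketched in Python. This is a minimal illustration under stated assumptions, not the DuraCloud fixity service itself: the function names, the dictionary-based listings (content ID mapped to hex MD5), and the four report categories are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash content incrementally so large items need not fit in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def compare_and_report(expected: Dict[str, str],
                       actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare an exported MD5 listing against freshly determined MD5s.

    Both arguments map a content ID to a hex MD5 (an assumed format).
    """
    report: Dict[str, List[str]] = {
        "valid": [], "mismatch": [], "missing": [], "unexpected": [],
    }
    for content_id in sorted(expected):
        if content_id not in actual:
            report["missing"].append(content_id)      # listed, but no MD5 found
        elif actual[content_id] == expected[content_id]:
            report["valid"].append(content_id)        # bits are intact
        else:
            report["mismatch"].append(content_id)     # content has changed
    # Items present in the store but absent from the exported listing.
    report["unexpected"] = sorted(set(actual) - set(expected))
    return report
```

Keeping the comparison separate from the hashing means the same report step could be reused whichever way the MD5s were obtained.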

Page 11:

0: Trust vs. Cost

Trust in...
  – Underlying storage providers
  – DuraCloud and open source software
  – Requester of service (administrator)

Page 12:

1: Trust vs. Cost

Three approaches:
  – Request stored value
    • [inexpensive & fast]
  – Stream out content & re-calculate
    • [compute intensive & slow]
  – Stream out content & re-calculate with salt
    • [user intensive, compute intensive & slow]
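
The three approaches can be contrasted with a small Python sketch. Everything here is an assumption made for illustration: `InMemoryStore` is a hypothetical stand-in for a cloud store (not a DuraCloud or provider API), and prepending the salt to the content is one possible convention, not necessarily the one the fixity service uses.

```python
import hashlib
import io
from typing import BinaryIO, Dict


class InMemoryStore:
    """Hypothetical stand-in for a cloud store; not a real provider API."""

    def __init__(self) -> None:
        self._content: Dict[str, bytes] = {}
        self._md5: Dict[str, str] = {}

    def put(self, content_id: str, data: bytes) -> None:
        self._content[content_id] = data
        # The store records a checksum once, at upload time.
        self._md5[content_id] = hashlib.md5(data).hexdigest()

    def stored_md5(self, content_id: str) -> str:
        """Approach 1: return the recorded value (cheap, trusts the store)."""
        return self._md5[content_id]

    def stream(self, content_id: str) -> BinaryIO:
        return io.BytesIO(self._content[content_id])


def md5_of_stream(stream: BinaryIO, salt: bytes = b"",
                  chunk_size: int = 8192) -> str:
    """Approaches 2 and 3: read every byte and re-calculate the MD5.

    With a non-empty salt (assumed here to be prepended), a previously
    stored checksum cannot simply be replayed as the answer.
    """
    digest = hashlib.md5(salt)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()
```

In this sketch the salted variant is the "user intensive" one because the requester has to choose the salt and compute the expected salted MD5 from a trusted copy of the content.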

Page 13:

2: Determine MD5s*

Options for providing expected MD5:
  – With initial listing
  – After MD5 calculation

Page 14:

2a: MD5 at non-primary

Additional cost of processing content not local to compute

Page 15:

Next steps

• Scalability
  – MD5 calculation across Hadoop cluster

• Multi-administration efficiency
  – On-demand compute at secondary provider

• Event logging
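
The scalability item amounts to computing each item's MD5 independently and collecting the results, which is why it suits a map-style cluster job. The sketch below shows that shape on a single machine, swapping in a Python thread pool for the Hadoop cluster; the function names and the in-memory input are assumptions for illustration only.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def md5_item(item: Tuple[str, bytes]) -> Tuple[str, str]:
    """Map step: one content item in, one (content ID, MD5) pair out."""
    content_id, data = item
    return content_id, hashlib.md5(data).hexdigest()


def parallel_md5(items: Dict[str, bytes], workers: int = 4) -> Dict[str, str]:
    """Fan the per-item MD5 work out to a pool and gather the listing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(md5_item, items.items()))
```

Because no item depends on any other, the work can be divided among as many workers as are available.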

Page 16:

Thank you

Requesting comments & review

https://wiki.duraspace.org/display/duracloud/Fixity+Service

http://duracloud.org

https://wiki.duraspace.org/display/duracloud/DuraCloud

https://svn.duraspace.org/duracloud/trunk/