RACS: A Case for Cloud Storage Diversity Paper by Hussam Abu-Libdeh Cornell University Lonnie Princehouse Cornell University Hakim Weatherspoon Cornell University (1st ACM Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10-11, 2010. ACM 2010) Presented by Chayutra Pailom CS Department, UTSA
RACS: A Case for Cloud Storage Diversity
Paper by Hussam Abu-Libdeh (Cornell University), Lonnie Princehouse (Cornell University), Hakim Weatherspoon (Cornell University)
(1st ACM Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10-11, 2010. ACM 2010)
Presented by Chayutra Pailom, CS Department, UTSA
What is this topic about?
• Motivation: Diversify cloud storage providers.
• Key issue: The risk of relying solely on one particular cloud storage provider.
• Solution: Stripe data across vendors.
• Comparisons: Overhead expense and vendor mobility.
• Possible improvement: Avoid vendor lock-in, reduce the cost of switching providers, and better tolerate provider outages or failures.
Error Correction Code
• RACS uses Reed-Solomon error correction codes.
• The code works by oversampling a polynomial constructed from the data.
• The polynomial is evaluated at several points, and these values are sent or recorded.
• Sampling the polynomial more often than is necessary makes the polynomial over-determined.
• As long as the receiver gets "many" of the points correctly, it can recover the original polynomial even in the presence of a "few" bad points.
• Reed-Solomon codes are used in many commercial applications, for example in CDs, DVDs, and Blu-ray Discs.
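The oversampling idea above can be sketched in a few lines of Python. This is a toy illustration, not the codec RACS uses: it assumes a prime field GF(257) instead of the GF(2^8) arithmetic common in real Reed-Solomon implementations, and it only recovers from erasures (points known to be missing), not from silently corrupted points. The names `poly_eval`, `encode`, and `decode` are hypothetical.

```python
# Toy Reed-Solomon-style erasure code over the prime field GF(257).
# Assumption: illustrative only; production codecs use GF(2^8) and faster algorithms.
P = 257  # prime modulus, large enough to hold one byte per symbol


def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients low-order first) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):  # Horner's rule
        acc = (acc * x + c) % P
    return acc


def encode(data, n):
    """Use the m data symbols as coefficients and sample the polynomial at x = 1..n."""
    return [(x, poly_eval(data, x)) for x in range(1, n + 1)]


def decode(points, m):
    """Recover the m coefficients from any m distinct points (Lagrange interpolation)."""
    points = points[:m]
    coeffs = [0] * m
    for i, (xi, yi) in enumerate(points):
        num = [1]   # numerator polynomial: product of (x - xj) for j != i
        denom = 1   # scalar denominator: product of (xi - xj) for j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            nxt = [0] * (len(num) + 1)
            for k, c in enumerate(num):
                nxt[k] = (nxt[k] - c * xj) % P
                nxt[k + 1] = (nxt[k + 1] + c) % P
            num = nxt
            denom = (denom * (xi - xj)) % P
        # Modular inverse via Fermat's little theorem (P is prime).
        scale = yi * pow(denom, P - 2, P) % P
        for k, c in enumerate(num):
            coeffs[k] = (coeffs[k] + c * scale) % P
    return coeffs


data = [72, 105, 33]                          # m = 3 data symbols
samples = encode(data, 5)                     # n = 5 samples, so 2 are redundant
survivors = [samples[0], samples[3], samples[4]]  # two samples lost
recovered = decode(survivors, 3)
```

Because a polynomial of degree below m is fixed by any m of its points, `recovered` equals `data` no matter which two samples are dropped.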
Background: The Cloud Storage Market
• Cloud storage providers expose simple interfaces to developers.
• Amazon S3's data model provides flat namespaces (buckets) into which named objects can be uploaded for later retrieval.
• Other storage services can be mounted as network file systems.
• There is no widely agreed-upon standard interface, but S3's REST API has been adopted by smaller providers and by the open-source Eucalyptus server software.
Background: The Cloud Storage Market
• These interfaces differ, but are similar enough to be considered interchangeable.
• Storage providers are forced to compete on price rather than by offering unique services.
• Cloud storage is a highly competitive market.
• These are simplified pricing schemes for the top two cloud storage providers: Amazon S3 and Rackspace.
Introduction
Pricing scheme of different cloud storage providers
Background: Vendor lock-in
• It is expensive to switch providers.
• Data is subject to the possibility of loss.
• For businesses, this is a very big issue when deciding whether to go to the cloud or not.
• Ways to fix it, e.g., an economical approach: spread the data across multiple providers.
• But this has high storage and bandwidth costs.
• Use this paper’s approach!
Background: Why should we diversify…
• Outages and operational failures.
• Economic failures.
• By striping data across providers and adding appropriate redundancy, clients can tolerate outages and operational failures, as well as adapt to economic changes.
• Avoid vendor lock-in!
Background: Incentives for Redundant Striping
• RACS uses Reed-Solomon error correcting codes to tolerate failures without data loss.
• Starting with m equal-size disks of original data, we fill k additional disks with redundant data.
• Any combination of m disks (data or redundant) is sufficient to reconstruct the original data.
• We write (m, n) to indicate that there are n = m + k total disks, any m of which are sufficient to reconstruct all original data.
Credit: http://pubs.0xff.co/posters/racs.pdf
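To make the (m, n) notation concrete, here is a minimal sketch of the simplest case, k = 1, so n = m + 1. Note the substitution: RACS uses Reed-Solomon codes, while this sketch swaps in plain XOR parity, which only works for a single redundant share. The function names `stripe` and `reconstruct` are hypothetical, and zero-padding to equal-size shares is an assumption of the sketch.

```python
from functools import reduce
from operator import xor


def xor_bytes(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def stripe(data, m):
    """Split data into m equal-size shares (zero-padded) plus one XOR parity share."""
    size = -(-len(data) // m)                 # ceiling division
    padded = data.ljust(size * m, b"\0")
    shares = [padded[i * size:(i + 1) * size] for i in range(m)]
    return shares + [xor_bytes(shares)]       # n = m + 1 shares in total


def reconstruct(shares, length):
    """Rebuild the original bytes when at most one of the n shares is None."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one missing share")
    shares = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        shares[missing[0]] = xor_bytes(present)  # XOR of the rest restores it
    return b"".join(shares[:-1])[:length]        # drop parity and padding


original = b"cloud storage diversity"
shares = stripe(original, 3)   # a (3, 4) configuration
shares[1] = None               # one provider becomes unavailable
restored = reconstruct(shares, len(original))
```

Any 3 of the 4 shares are enough here, which is the "any m of n" property described above for the special case k = 1.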
Background: Incentives for Redundant Striping (cont.)
• The choice of parameters m and n is a trade-off: overhead for data storage and write operations increases by a factor of n/m.
• Interestingly, read operations are not significantly more expensive, since only m disks must be read under normal operating conditions.
• RAID-5 uses a similar strategy to tolerate up to one failure in an array of hard disks.
• The goal of RACS is slightly different from that of RAID-5. Cloud storage is assumed to be much more reliable than hard disks, so data loss prevention is a much less compelling reason to use error correcting codes.
• RACS lowers the cost of switching providers, e.g., as a result of economic failure. Only 1/m of all data needs to be moved to leave a vendor.
• By reducing the impact of vendor lock-in, RACS increases the leverage of customers when negotiating contracts with cloud providers.
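The trade-off on this slide can be tabulated directly from the two ratios it gives: storage and write overhead of n/m, and 1/m of the data to move when leaving a vendor. The sketch below assumes one share per provider; the function name `racs_tradeoff` and the sample configurations are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def racs_tradeoff(m, n):
    """Return the overhead and mobility ratios for an (m, n) configuration."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return {
        "storage_overhead": Fraction(n, m),   # storage and write cost factor
        "move_to_leave": Fraction(1, m),      # fraction of data moved to leave one vendor
        "tolerated_failures": n - m,          # k redundant shares
    }


# Sample configurations (assumed for illustration).
for m, n in [(1, 2), (2, 3), (4, 5), (5, 7)]:
    t = racs_tradeoff(m, n)
    print(f"({m}, {n}): overhead x{float(t['storage_overhead']):.2f}, "
          f"move {float(t['move_to_leave']):.0%} of data, "
          f"tolerates {t['tolerated_failures']} provider failure(s)")
```

Under these assumptions, (1, 2) is plain mirroring: it doubles storage and still requires moving all of the data to leave a vendor, while larger m lowers both the overhead ratio and the amount of data to move.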
RACS is implemented as an HTTP proxy with the same interface as Amazon S3.
Redundant Array of Cloud Storage
Multiple RACS proxies coordinate their actions using ZooKeeper
Trace represents 18 months of activity on the Internet Archive’s FTP site.
What are the estimated costs associated with storing library-type content, such as that of the Library of Congress and the Internet Archive, on the cloud?
What is the cost of changing storage providers for large organizations? What is the added cost of a DuraCloud-like system (replicating data across multiple cloud providers)?
What is the cost of avoiding vendor lock-in using RACS? And what is the added cost overhead of using RACS (it depends on the coding configuration, i.e., the additional data to be written)?
Evaluation
Inbound and outbound data transfers in the Internet Archive trace
Read/write requests in the Internet Archive trace
Evaluation Cost of Moving to The Cloud
Estimated monthly and cumulative costs of hosting on the cloud with different storage providers
Evaluation Cost of Switching Vendors
The cost of switching the Internet Archive’s storage provider for various configurations
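A rough way to reason about switching cost is to multiply the data that must move by the transfer prices on both sides. The sketch below is a simplification under stated assumptions: the prices and archive size are hypothetical placeholders, not figures from the paper or from any provider, and request fees and storage charges are ignored. The function name `switching_cost` is also hypothetical.

```python
def switching_cost(total_gb, m, out_price_per_gb, in_price_per_gb):
    """Estimate the transfer cost of replacing one provider in an (m, n) setup.

    Assumes each provider holds 1/m of the data, so only that share is
    downloaded from the old side and uploaded to the new provider.
    """
    moved_gb = total_gb / m
    return moved_gb * (out_price_per_gb + in_price_per_gb)


# Hypothetical inputs, chosen only to show the shape of the calculation.
TOTAL_GB = 1000
OUT_PRICE = 0.15   # assumed price per GB transferred out
IN_PRICE = 0.10    # assumed price per GB transferred in

single_provider = switching_cost(TOTAL_GB, 1, OUT_PRICE, IN_PRICE)
racs_4_5 = switching_cost(TOTAL_GB, 4, OUT_PRICE, IN_PRICE)
```

With these placeholder numbers, the (4, 5) configuration moves a quarter of the data and therefore pays a quarter of the single-provider transfer cost.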
Conclusion
• The commoditization of cloud services has brought with it the characteristics of an economy, good and bad.
• RACS: A simple application of technology to change the structure of a market.
• Erasure coding is applied to a different type of failure (economic) than in traditional storage systems.
• Microbenchmarks were run, and larger experiments were simulated using real-world traces.
• RACS enables cloud storage customers to explore trade-offs between overhead and mobility.