Evaluating Cloud Storage Strategies
James Bottomley; CTO, Server Virtualization
Profit from the cloud™ | 2
• Attachments:
  - Local (Direct – cheap)
    • SAS, SATA
  - Remote (SAN, NAS – expensive)
    • FC
    • net
• Types
  - Block
    • Spinning Disk Drive
    • SSD
    • RAID unit
  - File
    • NFS
    • CEPH
  - Object
    • RADOS
    • PCS
Introduction to Storage
• Block device
  - A unit of storage
  - May be divided inflexibly (by partitioning)
  - Usually locally attached, but may be on a SAN
• File based Storage
  - Exports views of a filesystem via NFS, CIFS or other protocols
  - Is flexible
    • Storage in views can be expanded and contracted on the fly
  - Suffers from metadata issues on the server
• Object Storage
  - Really just means a flexible block device
  - May be expanded and contracted on the fly
  - Easily administrable (unlike LUN partitioning in SANs)
A Closer Look at the Terms
Storage Types Comparison
• Inelastic • Hard to Aggregate • Attached to individual systems
• Slightly Elastic • Fixed size • Good B/W • Dedicated network
• Based on SAN • Limited Scaling
• Tuned to disk image size objects • Designed for rapid update • Scalable B/W
• Simple Web API • No easy way to update objects • Slow
• CEPH, Gluster • Object Size tuning problem
Cloud Utility
Hosting Utility
• A large number of Cloud storage systems are file based
  - CEPH, Gluster
• The specific problem is that updating any file requires a change in the metadata
  - This produces hot spots in the journal
  - As well as locking hierarchy issues
  - And communication with the metadata server
  - All of which slow the operations down
• Object storage only uses metadata when objects are resized, created or destroyed
  - Using a fixed size object incurs no metadata overhead whatsoever
• So objects providing virtual environment roots allow efficient embedded filesystems with zero metadata overhead
Object vs File and the Metadata Problem
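The metadata argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class names `FileStore` and `FixedObjectStore` and the `metadata_ops` counter are hypothetical, and one "metadata operation" stands in for the journal, locking and metadata-server work the slide describes. It does not model any particular product.

```python
class FileStore:
    """Toy file-based store: every write also updates per-file metadata."""

    def __init__(self):
        self.data = {}
        self.metadata_ops = 0

    def write(self, name, offset, payload):
        buf = self.data.setdefault(name, bytearray())
        end = offset + len(payload)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))  # grow the file with zeros
        buf[offset:end] = payload
        self.metadata_ops += 1  # assumed: size/mtime change on each write


class FixedObjectStore:
    """Toy object store: metadata changes only when an object is created."""

    def __init__(self):
        self.objects = {}
        self.metadata_ops = 0

    def create(self, name, size):
        self.objects[name] = bytearray(size)
        self.metadata_ops += 1

    def write(self, name, offset, payload):
        obj = self.objects[name]
        end = offset + len(payload)
        if end > len(obj):
            raise ValueError("write beyond fixed object size")
        obj[offset:end] = payload  # in-place update, no metadata touched


files = FileStore()
objects = FixedObjectStore()
objects.create("ve-root", 1 << 20)
for i in range(1000):
    files.write("ve-root", i * 512, b"x" * 512)
    objects.write("ve-root", i * 512, b"x" * 512)
print(files.metadata_ops, objects.metadata_ops)  # prints: 1000 1
```

In this model the fixed-size object pays for metadata once at creation, while the file-based store pays on every update.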
• FUSE (Filesystem in Userspace) is the Linux userspace filesystem interface
• Main problem is it’s incredibly SLOW
• However, it is very useful, so a large number of cloud filesystems use it
  - Gluster
• Parallels originally avoided using it.
• However, now we’ve decided we’ll fix it for everyone
• Parallels engineers are currently interacting with the Linux filesystems and FUSE lists
• The aim is to add write caching and mtime fixes to accelerate FUSE
• Tests show we can get ~95% of the performance of a natively written filesystem
FUSE Issues
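To show why write caching helps, here is a minimal sketch of the general write-back idea: small contiguous writes are merged in a buffer and passed to the slow backend in fewer, larger calls. It is not the actual FUSE patch set. The class `WriteBackCache`, the `backend_write` callback and the `flush_threshold` value are all hypothetical names chosen for illustration.

```python
class WriteBackCache:
    """Coalesce contiguous small writes before handing them to a backend."""

    def __init__(self, backend_write, flush_threshold=128 * 1024):
        self.backend_write = backend_write  # callable(offset, data)
        self.flush_threshold = flush_threshold
        self.start = None
        self.buffer = bytearray()

    def write(self, offset, data):
        # A non-contiguous write cannot be merged, so flush what we hold.
        if self.buffer and offset != self.start + len(self.buffer):
            self.flush()
        if not self.buffer:
            self.start = offset
        self.buffer += data
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.backend_write(self.start, bytes(self.buffer))
            self.buffer = bytearray()


calls = []
cache = WriteBackCache(lambda off, data: calls.append((off, len(data))),
                       flush_threshold=16)
for i in range(8):
    cache.write(i * 4, b"abcd")  # eight 4-byte writes
cache.flush()
print(calls)  # prints: [(0, 16), (16, 16)]
```

Eight small writes reach the backend as two larger ones. In a userspace filesystem each backend call costs a round trip between kernel and userspace, which is the overhead this kind of caching reduces.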
• Strong Consistency is hard to achieve in clusters
  - Strong Consistency means that all updates are seen immediately after they are committed
  - Strong consistency is most often violated across cluster reconfigurations
  - Ironically, this is precisely when you usually need it (HA)
  - Sheepdog, CEPH, PStorage
• Eventual Consistency is the usual norm
  - Means that all updates are eventually seen, but may not be immediately visible after they are committed
  - SWIFT, Gluster (does have a much slower strong consistency quorum enforcement mode)
• Weak Consistency
  - Does not guarantee write ordering and visibility
  - Too weak to be useful for most cloud storage
Consistency
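One common way to reason about the strong versus eventual distinction is a read/write quorum. The sketch below is a generic illustration, not a description of how any system named above works. It assumes N replicas, a write acknowledged by W of them and a read that consults R of them. The names `Replica`, `write`, `read` and `always_sees_latest` are hypothetical. The check enumerates every write set and read set to see whether a read can ever miss the latest committed value.

```python
from itertools import combinations


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None


def write(replicas, acked, version, value):
    """Apply a versioned write to the replicas whose indices are in `acked`."""
    for i in acked:
        replicas[i].version = version
        replicas[i].value = value


def read(replicas, consulted):
    """Return the value carrying the highest version among consulted replicas."""
    newest = max((replicas[i] for i in consulted), key=lambda r: r.version)
    return newest.value


def always_sees_latest(n, w, r):
    """True if every read quorum of size r observes the latest write of size w."""
    for acked in combinations(range(n), w):
        replicas = [Replica() for _ in range(n)]
        write(replicas, range(n), 1, "old")  # fully replicated older value
        write(replicas, acked, 2, "new")     # latest write, only w replicas
        for consulted in combinations(range(n), r):
            if read(replicas, consulted) != "new":
                return False
    return True


print(always_sees_latest(3, 2, 2))  # prints: True  (R + W > N)
print(always_sees_latest(3, 1, 1))  # prints: False (a stale replica can answer)
```

When R + W > N, every read set overlaps every write set, so committed updates are visible immediately. With smaller quorums a read may return the old value until the remaining replicas catch up, which is the eventual-consistency behaviour.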
• Cloud storage must be designed to scale not just per node, but also per Virtual Environment per node
• This requires there be no bottlenecks connecting a virtual environment to storage
  - Sheepdog problem: it uses a single threaded per-node gateway process causing its scalability per VE to be poor
• Ideally, a direct connection should be made between the virtual environment using the object and the storage providing it with no intermediate broker
  - Or using an intermediate broker tuned for scalability
• Chunking (large block size for objects) also improves performance
Performance and Scalability
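The chunking point can be made concrete with a small sketch that maps a byte range of an object onto chunks. The 64 MiB default and the function name `chunks_for_range` are assumptions for illustration. The slides do not state an actual chunk size.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size, not taken from the slides


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, offset_in_chunk, byte_count) covering a byte range."""
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset and length must be >= 0, chunk_size > 0")
    end = offset + length
    while offset < end:
        index, within = divmod(offset, chunk_size)
        count = min(chunk_size - within, end - offset)
        yield index, within, count
        offset += count


one_gib = 1 << 30
small = sum(1 for _ in chunks_for_range(0, one_gib, 4 * 1024))
large = sum(1 for _ in chunks_for_range(0, one_gib))
print(small, large)  # prints: 262144 16
```

Reading 1 GiB touches 262144 pieces at a 4 KiB block size but only 16 at 64 MiB, so far fewer lookups and connections are needed per transfer.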
• The cardinal hosting requirement is that existing local storage should be repurposed as generic object based storage for
  1. Supporting Existing Hosting Environments and additional services
  2. Enabling the provision of Cloud Services
• Equating to the technical requirements
  1. Performance must be wire speed SATA (100MB/s)
     • Tuned exactly for GB objects containing small files
  2. Storage must be object based to avoid metadata issues
  3. Objects should be capable of rapid random read/write updates
  4. Storage bandwidth should scale linearly with the cluster
Requirements for Hosting Storage
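A quick arithmetic check puts the 100MB/s target and the linear-scaling requirement in context. This sketch assumes decimal megabytes and ignores protocol overhead. The function names are hypothetical, and the linear model is the stated goal rather than a measured result.

```python
def line_rate_mb_per_s(gigabits_per_s):
    """Raw link rate in MB/s (decimal), ignoring protocol overhead."""
    return gigabits_per_s * 1000 / 8


def link_utilisation(target_mb_per_s, gigabits_per_s):
    """Fraction of the raw link rate that the target throughput consumes."""
    return target_mb_per_s / line_rate_mb_per_s(gigabits_per_s)


def aggregate_bandwidth(nodes, per_node_mb_per_s=100):
    """Ideal cluster bandwidth if throughput scales linearly with node count."""
    return nodes * per_node_mb_per_s


print(line_rate_mb_per_s(1))     # prints: 125.0
print(link_utilisation(100, 1))  # prints: 0.8
print(aggregate_bandwidth(10))   # prints: 1000
```

So 100MB/s per node is about 80% of the raw gigabit rate, and under the linear-scaling requirement a ten-node cluster would ideally deliver about 1000MB/s in aggregate.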
• Hosting Enhancements
  1. Free storage from individual nodes
     • Easy, fast migration of Virtual Environments
     • High Availability
  2. Simple and Efficient resizing with assist for legacy roots (ext3)
     • Makes storage easier to sell in increments
  3. Cloning and Snapshotting
     • Value add for templating block based roots
     • Permits easy backup
  4. Redundancy
     • Allows different storage SLAs for different prices
• Cloud Enhancements (Ideal Storage Solution)
  1. Dropbox like services
  2. Storage as a Service (like S3)
  3. Storage on Demand
  4. Tiered Storage Pricing
Simple Requirements for Additional Benefits
• Technical Specs
  - Metadata is the key to improving performance
  - Large Static objects with rapid updates have fixed metadata
  - 100MB/s performance over gigabit ethernet (no 10GE requirement)
• Avoid
  - Anything like a filesystem (CEPH, Gluster) because of
    • Locking problems
    • Speed issues from the per-file need to consult metadata
  - Anything using FUSE (Gluster)
    • At least anything using FUSE without the Parallels acceleration patches
  - Anything with a single threaded connection multiplexor (sheepdog)
    • Per cluster is worse (kills all scalability)
    • Per node is still bad (kills VE scalability)
Ideal solution
• Why Choose Us?
  - We’re the experts in the field (we studied the problem)
  - We fixed FUSE
  - We redid the Linux loop device to work efficiently for virtual environment roots
    • In collaboration with Oracle who did the Direct I/O patches
  - Loop device also modified to do snapshotting and legacy filesystem resizing.
  - All the necessary infrastructure patches are upstream in Linux
    • Or are moving that way
• What we provide
  - Complete leverage of existing local node storage
  - Strong Consistency and Redundancy
  - Wire speed transfers because of optimised data architecture
    • Up to 100MB/s/node over 1GigE
  - Hot object tiering and SSD caching
Introducing Parallels Cloud Storage
• Chunk Server based snapshotting
• De-duplication
• Thin Provisioning
  - Actual storage size can appear much larger than in-use backing store because of sparsity of objects
  - Also provides ability to do dynamic in-place upgrades of actual storage capacity
• Innovative redundancy algorithms
• Geographic Object Replication for advanced disaster recovery
Future Features
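Thin provisioning, as described above, can be sketched with a sparse block map. This is a toy model under stated assumptions: the class `ThinVolume`, its method names and the 4 KiB block size are hypothetical. Backing storage is consumed only by blocks that have actually been written.

```python
class ThinVolume:
    """Toy thin-provisioned volume: blocks are allocated on first write."""

    def __init__(self, virtual_size, block_size=4096):
        self.virtual_size = virtual_size
        self.block_size = block_size
        self.blocks = {}  # block index -> bytes, sparse by construction

    def _check(self, index):
        if not 0 <= index < self.virtual_size // self.block_size:
            raise IndexError("block index outside the virtual size")

    def write_block(self, index, data):
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read_block(self, index):
        self._check(index)
        # Blocks that were never written read back as zeros.
        return self.blocks.get(index, bytes(self.block_size))

    def allocated_bytes(self):
        return len(self.blocks) * self.block_size


vol = ThinVolume(virtual_size=1 << 40)  # appears as 1 TiB
vol.write_block(7, b"x" * 4096)
print(vol.virtual_size, vol.allocated_bytes())  # prints: 1099511627776 4096
```

The volume advertises 1 TiB while the backing store holds a single 4 KiB block, which is the gap between apparent size and in-use storage that the slide refers to.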
• Getting Cloud storage right for current hosting needs is not a simple problem
  - The basic construction of many cloud storage offerings is unsuitable for hosting provider environments
• Parallels has devoted considerable study and effort to mapping the needs of hosters onto cloud storage
• Parallels has studied the strengths and weaknesses of current cloud storage offerings and incorporated the best into our cloud storage offerings
  - While attempting to eliminate all the negative issues
  - And improve performance
• Parallels will leverage (and enhance) open source to achieve the best cloud storage system for hosters
Conclusions