Storage Virtualization Seminar Stephen Foskett Director of Data Practice, Contoural
May 20, 2015
Part 1: Breaking the Connections
Storage virtualization is here, breaking the connection between physical storage infrastructure and the logical way we use it
Agenda
What is storage virtualization?
Volume management
Advanced file systems
Virtualizing the SAN
Virtual NAS
Poll: Who is Already Using Storage Virtualization?
We talk about virtualization like it is new or strange…
…but your storage is already virtualized!
• Disk drives map logical blocks to physical locations
• RAID is as old as storage (conceived 1978-1988)
• Modern OSes include volume management and path management
• Network-attached storage (NAS) redirectors and DFS
• Storage arrays are highly virtualized (clustering, LUN carving, relocation, tiering, etc.)
According to ESG, 52% have already implemented
storage virtualization and 48% plan to! (ESG 2008)
SNIA Defines Storage Virtualization
1. The act of abstracting, hiding, or isolating the internal
function of a storage (sub)system or service from
applications, compute servers or general network
resources for the purpose of enabling application and
network independent management of storage or data.
2. The application of virtualization to storage services or
devices for the purpose of aggregating, hiding complexity
or adding new capabilities to lower level storage resources.
Storage can be virtualized simultaneously in multiple
layers of a system, for instance to create HSM like systems.
What and Why?
Virtualization removes the hard connection between storage hardware and users
• Address space is mapped to logical rather than physical locations
• The virtualizing service consistently maintains this metadata
• I/O can be redirected to a new physical location
We gain by virtualizing
• Efficiency, flexibility, and scalability
• Stability, availability, and recoverability
The Non-Revolution: Storage Virtualization
We’ve been talking about storage virtualization for 15 years!
Virtualization exists for both block and file storage networks
It can be located in server-based software, on network-based appliances or SAN switches, or integrated with the storage array
(Diagram: virtualization locations: software, switch, appliance, array)
Introducing Volume Management
Volume management = server-based storage virtualization
Volume managers abstract block storage (LUNs, disks, partitions) into virtual “volumes”
Very common – all* modern OSes have volume managers built in
• Windows Logical Disk Manager, Linux LVM/EVMS, AIX LVM, HP-UX LVM, Solaris Solstice, Veritas Volume Manager
Mostly used for flexibility
• Resize volumes
• Protect data (RAID)
• Add capacity (concatenate or expand stripe or RAID)
• Mirror, snapshot, replicate
• Migrate data
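At its core, a volume manager keeps a table that maps each logical block of a volume to a disk and a physical block. The toy model below contrasts the two basic layouts mentioned above, concatenation and striping. It assumes fixed-size blocks and an in-memory list of extents. The names Extent, map_concat and map_stripe are hypothetical and do not correspond to LVM, LDM, VxVM, or any other real volume manager's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    """A contiguous run of physical blocks on one disk (hypothetical model)."""
    disk: str
    start: int   # first physical block on that disk
    length: int  # number of blocks in the run

def map_concat(extents, lblock):
    """Concatenation: fill each extent in order before moving on to the next."""
    offset = lblock
    for ext in extents:
        if offset < ext.length:
            return ext.disk, ext.start + offset
        offset -= ext.length
    raise ValueError(f"logical block {lblock} is past the end of the volume")

def map_stripe(extents, lblock, stripe_blocks):
    """Striping: deal fixed-size chunks round-robin across equal-sized extents."""
    chunk, within = divmod(lblock, stripe_blocks)
    row, column = divmod(chunk, len(extents))
    ext = extents[column]
    pblock = row * stripe_blocks + within
    if pblock >= ext.length:
        raise ValueError(f"logical block {lblock} is past the end of the volume")
    return ext.disk, ext.start + pblock

# Growing a concatenated volume is just appending another extent to the list.
volume = [Extent("disk0", 0, 100), Extent("disk1", 0, 100)]
volume.append(Extent("disk2", 500, 50))
print(map_concat(volume, 210))  # ('disk2', 510): block 10 of the new extent
```

The applications and servers only ever see logical block numbers, which is why a volume can be resized or its data migrated without the host noticing: only the table changes.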
Logical Volume Managers
Platform | Volume Manager | Notes
AIX | Logical Volume Manager | OSF LVM, no RAID 5, no copy-on-write snapshots
HP-UX 9.0+ | HP Logical Volume Manager | OSF LVM, no RAID 5
FreeBSD | Vinum Volume Manager | No copy-on-write snapshots
Linux 2.2+ | Logical Volume Manager and Enterprise Volume Management System | Based on OSF LVM, no RAID 5
Solaris | Solaris Volume Manager (was Solstice DiskSuite) | Limited allocation options, no copy-on-write snapshots
AIX, HP-UX, Linux, Solaris, Windows | Symantec Veritas Volume Manager (VxVM), Storage Foundation | Full-featured multi-platform volume manager
Windows 2000+ | Logical Disk Manager | Co-developed with Veritas, limited allocation options, copy-on-write snapshots introduced in Server 2003
Solaris, BSD, Mac OS X 10.6+ | ZFS | Combined filesystem and volume manager
ZFS: Super File System!
ZFS (originally “zettabyte file system”) is a combined file system, volume manager, and disk/partition manager
• Open source (CDDL) project managed by Sun
• Will probably replace UFS (Sun) and HFS+ (Apple OS X Snow Leopard Server)
ZFS creates a truly flexible, extensible, and full-featured pool of storage across disks
• Filesystems contained in “zpools” on “vdevs” with striping and optional RAID-Z/Z2
• 128-bit addresses mean near-infinite capacity (in theory)
• Blocks are “copy-on-write” (enabling snapshots and clones) and checksummed for data integrity
…but there are some limitations
• Growing a RAID-Z vdev (and especially removing vdevs) is hard/impossible
• Stacked RAID is impossible
• There is no clustering (until Sun adds Lustre)
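The copy-on-write and checksum ideas above can be illustrated with a very small model. This is a toy sketch, not how ZFS is implemented: real ZFS stores each block's checksum in its parent block pointer, forming a tree, while this version uses flat dictionaries. The class and method names are hypothetical. It shows why copy-on-write makes snapshots cheap (a snapshot copies pointers, not data) and how a checksum verified on every read catches silent corruption.

```python
import hashlib

class CowStore:
    """Toy copy-on-write block store with per-block checksums (not real ZFS)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes; never modified after it is written
        self.checksums = {}  # block id -> SHA-256 digest recorded at write time
        self.live = {}       # logical block number -> block id for the live file system
        self.snapshots = {}  # snapshot name -> frozen copy of the pointer map
        self._next_id = 0

    def write(self, lbn, data):
        # Copy-on-write: allocate a new block and repoint; the old block is untouched.
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = bytes(data)
        self.checksums[bid] = hashlib.sha256(data).hexdigest()
        self.live[lbn] = bid

    def snapshot(self, name):
        # A snapshot copies pointers only, so its cost does not depend on data size.
        self.snapshots[name] = dict(self.live)

    def read(self, lbn, snapshot=None):
        pointers = self.live if snapshot is None else self.snapshots[snapshot]
        bid = pointers[lbn]
        data = self.blocks[bid]
        # Verify on every read so corrupted data is reported, never returned.
        if hashlib.sha256(data).hexdigest() != self.checksums[bid]:
            raise OSError(f"checksum mismatch on block {bid}")
        return data
```

A clone would simply be a writable copy of a snapshot's pointer map; old blocks stay valid for as long as any snapshot still points at them.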
Path Management Software
Path management virtualizes the connection from a server to a storage system
• Failover
• Load balancing strategies
A few choices
• Veritas DMP (cross-platform, with Storage Foundation)
• EMC PowerPath (supports EMC, HDS, IBM, HP)
• IBM SDD (free for IBM)
• HDS HDLM (Hitachi Dynamic Link Manager)
• Microsoft MPIO (Windows, supports iSCSI and most FC)
• VMware Failover Paths
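The two jobs of path management, load balancing and failover, can be sketched in a few lines. The products listed above each have their own policies and interfaces; the class below is a hypothetical round-robin-with-failover model, not the API of DMP, PowerPath, MPIO, or any other tool. Path names such as "hba0:portA" are made up for illustration.

```python
import itertools

class MultipathDevice:
    """Toy multipath device: round-robin load balancing with failover."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._rr = itertools.cycle(self.paths)  # endless round-robin iterator

    def mark_failed(self, path):
        self.failed.add(path)

    def restore(self, path):
        self.failed.discard(path)

    def next_path(self):
        # Try each path at most once per I/O, skipping any marked as failed.
        for _ in range(len(self.paths)):
            path = next(self._rr)
            if path not in self.failed:
                return path
        raise OSError("all paths to the LUN have failed")
```

With two healthy paths, I/O alternates between them; when one fails, every I/O goes down the survivor until the failed path is restored.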
Virtualizing the SAN
The storage area network (SAN) is a popular location for virtualization
• Can require less reconfiguration and server work
• Works with all servers and storage (potentially)
Resides on an appliance or switch placed in the storage network
• Some are in the data path, others are less so
• Brocade and Cisco switches have application blades
• Some use dedicated storage services modules (SSMs)
In-Band vs. Out-of-Band
In-band devices intercept traffic
• Host: “Where’s my data?” Device: “I got yer data right here!”
Out-of-band devices redirect traffic
• Host: “Where’s my data?” Device: “It’s over there!”
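The difference between the two architectures is where the data flows. In the toy model below, the in-band virtualizer sits in the data path: it reads the block itself and can therefore cache it. The out-of-band server only answers “where is it?”, and the host then reads the storage directly. All class and function names are hypothetical, and real products add locking, write handling, and host agents that this sketch leaves out.

```python
class Backend:
    """A physical storage device holding blocks."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # physical block number -> bytes

class InBandVirtualizer:
    """In-band: every read passes through the virtualizer, which can cache it."""
    def __init__(self, mapping):
        self.mapping = mapping  # virtual block -> (backend, physical block)
        self.cache = {}

    def read(self, vblock):
        if vblock in self.cache:
            return self.cache[vblock]
        backend, pblock = self.mapping[vblock]
        data = backend.blocks[pblock]  # data flows through the virtualizer
        self.cache[vblock] = data
        return data

class OutOfBandMetadataServer:
    """Out-of-band: only tells the host where the data lives."""
    def __init__(self, mapping):
        self.mapping = mapping

    def locate(self, vblock):
        return self.mapping[vblock]

def host_read_out_of_band(server, vblock):
    backend, pblock = server.locate(vblock)  # control path: ask the server
    return backend.blocks[pblock]            # data path: go straight to storage
```

This is why in-band devices can add caching (and become a potential bottleneck), while out-of-band devices stay out of the data path but need cooperation from the host.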
SAN Virtualization Products
Product | Architecture | Location | Thin Prov. | Repl. | Notes
DataCore SANsymphony | In-band IP | Generic x86 | Yes | Yes | Supports SCSI, FC, ATA drives over IP on a Wintel server
EMC Invista | Out-of-band FC | FC switch | No | No | No caching; supports Cisco & Brocade FC switch blades and SSPs
FalconStor IPStor NSS | In- or out-of-band IP | Generic x86 | No | Yes | Block-based; supports a variety of drive types on a Wintel server or Cisco FC blade
IBM SVC | In-band FC | Appliance | No | Yes | Supports most FC storage; large caches; IBM hardware
Incipient iNSP | Out-of-band FC | FC switch | No | No | No caching; supports Cisco FC blades
LSI StoreAge SVM | Out-of-band FC | Appliance & host SW | No | Yes | No caching; split-path FC with proprietary SSM
Reldata Unified Storage | In-band | Appliance | No | Yes | NAS and IP SAN
Sanrad V-Switch | In-band IP | Appliance | No | Yes | Bridges FC to iSCSI
Virtual NAS
File-based network-attached storage (NAS) lends itself to virtualization
• IP network connectivity and host processing possibilities
Multitude of file servers? Virtualize!
• Global namespace across all NAS and servers
• Share excess capacity
• Transparently migrate data (easier than redirecting users!)
• Tier files on large “shares” with a variety of data
• Create multiple virtual file servers
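A global namespace is essentially a lookup table from the paths users see to the physical shares that hold the data, which is what makes migration transparent: after the data is copied, only the link changes. The sketch below assumes a simple longest-prefix match; the class, method, and server names (//filer1, //filer2) are hypothetical, and it does not model how DFS or any listed product implements referrals.

```python
class GlobalNamespace:
    """Toy NAS virtualization layer mapping virtual folders to physical shares."""

    def __init__(self):
        self.links = {}  # virtual folder -> physical share path

    def link(self, virtual, physical):
        self.links[virtual.rstrip("/")] = physical.rstrip("/")

    def resolve(self, path):
        # Longest-prefix match, so a link for /corp/eng/archive overrides /corp/eng.
        for prefix in sorted(self.links, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self.links[prefix] + path[len(prefix):]
        raise FileNotFoundError(path)

    def migrate(self, virtual, new_physical):
        # Once the data has been copied, repoint the link; client paths never change.
        self.link(virtual, new_physical)
```

Users keep opening /corp/eng/... before and after a migration; only the resolved physical location moves.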
NAS Virtualization Products
Product | Architecture | Location | Notes
Attune Maestro | In-band | Appliance | Windows-focused with replication and snapshots
BlueArc | In-band | Clustered NAS | Clustered integrated NAS with global namespace
Brocade FME | In-band | Appliance | DFS/CIFS initially with NFS in the works
Brocade StorageX | Out-of-band | Host SW | DFS and NIS; also does data migration; also NetApp VFM and HDS
Data Domain | In-band | Appliance or host SW | Deduplication NAS/VTL/OST target for block storage
EMC Rainfinity | In-band | Appliance or host SW | DFS management
F5 Acopia | In-band | Appliance | Split-path architecture, non-DFS
Microsoft DFS | Out-of-band | Host SW | Windows/SMB only; Server 2003 R2+ enhanced management
Network Appliance vFiler | In-band | Clustered NAS | Clustered NAS “head” with global namespace
ONStor GNS | In-band | Clustered NAS & DFS | Combines clustered NAS with DFS into a single global namespace
Reldata Unified Storage | In-band | Clustered NAS | NAS and IP SAN
Transformed Storage Systems
Virtualization technology is common in storage array controllers
• Arrays create large RAID sets and “carve out” virtual LUNs for use by servers
• Controller clusters (and grids) redirect activity based on workload and availability
• Snapshots/mirrors and replication are common features
A new generation of arrays with virtualization features is appearing, with tiered storage, thin provisioning, migration, and de-duplication
• Sub-disk RAID = the end of RAID as we know it?
Virtual Tiered Storage
Array controllers can transparently move data from low-cost to high-performance disk
Most arrays support multiple drive types
• “Bulk” SATA or SAS drives are common (500 GB - 1 TB)
• Solid-state drives are the latest innovation
Some arrays can dynamically load balance
A few can “hide” other arrays “behind” them
• SAN: HDS USP-V and similar from Sun, HP
• NAS: Network Appliance vFiler, ONStor Bobcat
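Each array vendor uses its own tiering policy. As one simple illustration, assume a greedy “hottest first” rule: rank extents by recent I/O count, place as many as fit on the fast tier, and relocate only the extents whose tier changes. The function names and the "ssd"/"bulk" labels below are hypothetical.

```python
from collections import Counter

def plan_tiering(access_counts, fast_capacity):
    """Greedy hottest-first placement: the busiest extents go to the fast tier."""
    # most_common() sorts by count, highest first (ties keep first-seen order).
    ranked = [extent for extent, _ in Counter(access_counts).most_common()]
    hot = set(ranked[:fast_capacity])
    return {extent: ("ssd" if extent in hot else "bulk") for extent in access_counts}

def moves_needed(current, plan):
    """Extents whose tier changes; these are what the array would relocate."""
    return sorted(e for e, tier in plan.items() if current.get(e) != tier)

counts = {"ext1": 5, "ext2": 900, "ext3": 40, "ext4": 2}
plan = plan_tiering(counts, fast_capacity=2)
current = {"ext1": "ssd", "ext2": "bulk", "ext3": "ssd", "ext4": "bulk"}
print(moves_needed(current, plan))
```

Because the host addresses only virtual LUNs, these moves are transparent to it; a real array would also rate-limit relocation so it does not compete with production I/O.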
Thin Provisioning
Storage is commonly over-allocated to servers
Some arrays can “thinly” provision just the capacity that actually contains data
• 500 GB request for new project, but only 2 GB of initial data is written – array only allocates 2 GB and expands as data is written
• Symantec API for thin-unprovisioning (reclaiming space the file system has freed)
What’s not to love?
• Oops – we provisioned a petabyte and ran out of storage
• Chunk sizes and formatting conflicts
• Can it thin unprovision?
• Can it replicate to and from thin provisioned volumes?
Thin provisioning is an abdication of our responsibilities!
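The 500 GB / 2 GB example above, and the “oops” that follows, can be modeled with a pool that promises capacity up front but allocates a physical chunk only on the first write to it. This is a hypothetical sketch assuming 1 GB chunks; it is not any array's interface. The unmap method stands in for thin unprovisioning.

```python
class ThinPool:
    """Toy thin-provisioned pool: capacity is promised up front, allocated on first write."""

    def __init__(self, physical_chunks):
        self.physical_chunks = physical_chunks
        self.used = 0
        self.volumes = {}  # name -> (provisioned chunks, set of allocated chunk indexes)

    def create_volume(self, name, provisioned_chunks):
        # Nothing is reserved here, which is what allows over-allocation.
        self.volumes[name] = (provisioned_chunks, set())

    def write(self, name, chunk_index):
        provisioned, allocated = self.volumes[name]
        if not 0 <= chunk_index < provisioned:
            raise ValueError("write beyond the provisioned size")
        if chunk_index not in allocated:  # rewrites of a chunk cost nothing extra
            if self.used == self.physical_chunks:
                raise OSError("pool exhausted: we provisioned more than we own")
            allocated.add(chunk_index)
            self.used += 1

    def unmap(self, name, chunk_index):
        # "Thin unprovisioning": return a chunk the host no longer uses to the pool.
        _, allocated = self.volumes[name]
        if chunk_index in allocated:
            allocated.remove(chunk_index)
            self.used -= 1

    def subscription_ratio(self):
        provisioned = sum(p for p, _ in self.volumes.values())
        return provisioned / self.physical_chunks
```

With 10 physical chunks backing a 500-chunk volume, the pool is 50:1 oversubscribed; everything works until the eleventh distinct chunk is written.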
De-Duplication
The next frontier – efficiently storing duplicate content
• More appropriate to some applications than others
Software or appliance (and now array!) analyzes files or blocks, saving duplicates just once
• Block-based systems save more capacity by looking inside files
• Once common only for archives, now available for production data
Serious implications for performance and capacity utilization
• In-line devices process all data before it is written
• Post-processing systems scan written data for duplicates
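Block-based de-duplication can be sketched as: split the data into chunks, fingerprint each chunk with a cryptographic hash, store each unique chunk once, and keep an ordered “recipe” of fingerprints to rebuild the original. The sketch assumes fixed-size 4 KB chunks and treats SHA-256 collisions as negligible; many commercial products use variable-size chunking instead, and the function names here are hypothetical.

```python
import hashlib

def dedupe(data, chunk_size=4096):
    """Fixed-size block de-duplication: store each unique chunk once.

    Returns (store, recipe): store maps SHA-256 digest -> chunk bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store, recipe = {}, []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a duplicate chunk is not stored again
        recipe.append(digest)
    return store, recipe

def rehydrate(store, recipe):
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(store[d] for d in recipe)
```

An in-line device would run dedupe before data lands on disk; a post-processing system would run it later over data already written, trading capacity for write latency.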
“Cloud” Storage
Many companies are choosing managed services for servers and storage
Lots of managed archive and backup providers
• Zantaz, Google Postini, EMC Mozy, Symantec SPN, etc.
Managed storage services are coming into their own (finally!)
• Amazon S3 and Nirvanix
• EMC “Fortress”
The Next-Generation Data Center
Virtualization of servers and storage will transform the data center
• Clusters of capability host virtual servers
• Cradle-to-grave integrated management
SAN/network convergence is next
• InfiniBand offers converged virtual connectivity today
• iSCSI and FCoE converge on data center Ethernet (DCE) with converged network adapters (CNAs)
Questions?
Audience Response
Part 2: Storage in the Virtual World
Responding to the demands of server, application, and business users with new flexible technologies
Agenda
Why virtual storage for virtual servers?
The real world impact and benefits
Best practices for implementation
Poll: Who Is Using VMware?
(Chart: virtualization users by platform: VMware, Microsoft, Other, None)
Poll: Does Server Virtualization Improve Storage Utilization?
Why Use Virtual Storage For Virtual Servers?
1. Mobility of virtual machines between physical servers for load balancing
2. Improved disaster recovery
3. Higher availability
4. Enabling physical server upgrades
5. Operational recovery of virtual machine images
Server Virtualization = SAN and NAS
Server virtualization has transformed the data center and storage requirements
• VMware is the #1 driver of SAN adoption today!
• 60% of virtual server storage is on SAN or NAS (ESG 2008)
• 86% have implemented some server virtualization (ESG 2008)
Server virtualization has enabled and demanded centralization and sharing of storage on arrays like never before!
Three Pillars of VM Performance
Server Virtualization Recoil
Dramatically increased I/O
Patchwork of support, few standards
• “VMware mode” on storage arrays
• Virtual HBA/N_Port ID Virtualization (NPIV)
• Everyone is qualifying everyone and jockeying for position
Can be “detrimental” to storage utilization
Befuddled traditional backup, replication, and reporting
VMware Storage Options: Shared Storage
Shared storage – the common/workstation approach
• Stores VMDK image in VMFS datastores
• DAS or FC/iSCSI SAN
• Hyper-V VHD is similar
Why?
• Traditional, familiar, common (~90%)
• Prime features (Storage VMotion, etc.)
• Multipathing, load balancing, failover*
But…
• Overhead of two storage stacks (5-8%)
• Harder to leverage storage features
• Often shares storage LUN and queue
• Difficult storage management
(Diagram: guest OS on a VM host; its VMDK lives in VMFS on DAS or SAN storage)
VMware Storage Options: Shared Storage on NFS
Shared storage on NFS – skip VMFS and use NAS
• NFS is the datastore
Wow!
• Simple – no SAN
• Multiple queues
• Flexible (on-the-fly changes)
• Simple snap and replicate*
• Enables full VMotion
• Use fixed LACP for trunking
But…
• Less familiar (ESX 3.0+)
• CPU load questions
• Default limited to 8 NFS datastores
• Will multi-VMDK snaps be consistent?
(Diagram: guest OS on a VM host; its VMDK lives directly on NFS storage)
VMware Storage Options: Raw Device Mapping (RDM)
Raw device mapping (RDM) – guest VMs access storage directly over iSCSI or FC
• VMs can even boot from raw devices
• Hyper-V pass-through LUN is similar
Great!
• Per-server queues for performance
• Easier measurement
• The only method for clustering
But…
• Tricky VMotion and DRS
• No storage VMotion
• More management overhead
• Limited to 256 LUNs per data center
(Diagram: guest OS on a VM host sends I/O directly to SAN storage, located through a mapping file)
Physical vs. Virtual RDM
Virtual Compatibility Mode
• Appears the same as a VMDK on VMFS
• Retains file locking for clustering
• Allows VM snapshots, clones, VMotion
• Retains same characteristics if storage is moved
Physical Compatibility Mode
• Appears as a LUN on a “hard” host
• Allows V-to-P clustering, VMware locking
• No VM snapshots, VCB, VMotion
• All characteristics and SCSI commands (except “Report LUN”) are passed through – required for some SAN management software
Poll: Which VMware Storage Method Performs Best? VMFS, RDM (p), or RDM (v)
(Charts: Mixed Random I/O; CPU Cost Per I/O)
Source: “Performance Characterization of VMFS and RDM Using a SAN”, VMware Inc., 2008
Which Storage Protocol Is for You?
FC, iSCSI, NFS all work well
• Most production VM data is on FC
• Either/or? – 50% use a combination (ESG 2008)
• Leverage what you have and are familiar with
For IP storage
• Use TOE cards/iSCSI HBAs
• Use a separate network or VLAN
• Is your switch backplane fast?
• No VM Cluster support with iSCSI*
For FC storage
• 4 Gb FC is awesome for VMs
• Get NPIV (if you can)
Storage Protocols for Server Virtualization (IDC, Dec 2007): Fibre Channel 47%, Direct-Attached Storage (DAS) 24%, Network-Attached Storage (NAS) 22%, iSCSI SAN 7%
Poll: Which Storage Protocol Performs Best? Fibre Channel, NFS, iSCSI (sw), iSCSI (TOE)
(Charts: Throughput by I/O Size; CPU Cost Per I/O)
Source: “Comparison of Storage Protocol Performance”, VMware Inc., 2008
Storage Configuration Best Practices
Separate operating system and application data
• OS volumes (C: or /) on a different VMFS or LUN from applications (D: etc.)
• Heavy apps get their own VMFS or raw LUN(s)
Optimize storage by application
• Consider different tiers or RAID levels for OS, data, transaction logs – automated tiering can help
• No more than one VMFS per LUN
• Fewer than 16 production ESX .VMDKs per VMFS
Get thin
• Deduplication can have a huge impact on VMDKs created from a template!
• Thin provisioning can be very useful – Thin disk is in Server, not ESX!?!
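Two of these rules of thumb (one VMFS per LUN, fewer than 16 production VMDKs per VMFS) are mechanical enough to check against an inventory. The sketch below assumes a hypothetical inventory format, a list of dicts with "name", "lun", and "vmdks" keys, rather than any real VMware API; gathering that data from a real environment is left out.

```python
from collections import Counter

def check_layout(datastores, max_vmdks=16):
    """Return warnings for datastores that break two VMFS layout rules of thumb.

    datastores: list of dicts with hypothetical keys "name", "lun", "vmdks".
    """
    warnings = []
    vmfs_per_lun = Counter(ds["lun"] for ds in datastores)
    for lun, count in sorted(vmfs_per_lun.items()):
        if count > 1:  # rule: no more than one VMFS per LUN
            warnings.append(f"LUN {lun} hosts {count} VMFS datastores (limit 1)")
    for ds in datastores:
        if ds["vmdks"] >= max_vmdks:  # rule: fewer than 16 production VMDKs per VMFS
            warnings.append(f"{ds['name']} holds {ds['vmdks']} VMDKs (keep below {max_vmdks})")
    return warnings

inventory = [
    {"name": "vmfs-os", "lun": 1, "vmdks": 10},
    {"name": "vmfs-app", "lun": 1, "vmdks": 3},
    {"name": "vmfs-db", "lun": 2, "vmdks": 20},
]
for warning in check_layout(inventory):
    print(warning)
```

The other guidance (separating OS and application data, tiering by application) depends on knowing what each VMDK holds, so it is better handled by review than by a script like this.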
Why NPIV Matters
N_Port ID Virtualization (NPIV) gives each virtual server a unique WWN
• Easier to move and clone* virtual servers
• Better handling of fabric login
• Virtual servers can have their own LUNs, QoS, and zoning
• Just like a real server!
When looking at NPIV, consider:
• How many virtual WWNs does it support? T11 spec says “up to 256”
• OS, virtualization software, HBA, FC switch, and array support and licensing
• Can’t upgrade some old hardware for NPIV, especially HBAs
(Diagram: Without NPIV, three virtual servers share one WWN, 21:00:00:e0:8b:05:05:04. With NPIV, each virtual server has its own WWN: …05:05:05, …05:05:06, …05:05:07.)
Virtualization-Enabled Disaster Recovery
DR is a prime beneficiary of server and storage virtualization
• Fewer remote machines idling
• No need for identical equipment
• Quicker recovery (RTO) through preparation and automation
Who’s doing it?
• 26% are replicating server images, an additional 39% plan to (ESG 2008)
• Half have never used replication before (ESG 2008)
News: VMware Site Recovery Manager (SRM) integrates storage replication with DR
Enhancing Virtual Servers with Storage Virtualization
Mobility of server and storage images enhances load balancing, availability, and maintenance
• SAN and NAS arrays can snap and replicate server images
• VMotion moves the server, Storage VMotion (new in 3.5) moves the storage between shared storage locations
Virtualization-optimized storage
• Pillar and HDS claim to tweak allocation per VM
• Many vendors announcing compatibility with VMware SRM
• Most new arrays are NPIV-capable
Virtual storage appliances
• LeftHand VSA – a virtual virtualized storage array
• FalconStor CDP – a virtual CDP system
Enabling Virtual Backup
Virtual servers cause havoc for traditional client/server backups
• I/O crunch as schedules kick off – load is consolidated instead of balanced
• Difficult to manage and administer (or even comprehend!)
Storage virtualization can help
• Add disk to handle the load (VTL)
• Switch to alternative mechanisms (snapshots, CDP)
Consider VMware Consolidated Backup (VCB)
• Snapshot-based backup of shared VMware storage
• Block-based backup of all VMDKs on a physical server
Questions?
Audience Response
Part 3: Should You Virtualize?
A look at the practical benefits of virtualized storage
Agenda
Pooling for efficiency, flexibility, and scalability
Performance
Stability, availability, and recoverability
The down side
Cost benefit analysis
Where will you virtualize?
Pooling: Flexibility and Scalability
Effective allocation of resources
• The right amount of storage for the application
• The right type (tiered storage)
Quickly add and remove on demand
Move storage from one device to another
• Tiering, expansion, retirement
Larger systems have fewer capacity limitations
How Green Am I?
Server virtualization can dramatically reduce power, cooling, and space requirements
• Fewer physical servers
• Better (any) power management
Storage virtualization offers fewer green benefits
• Does not normally reduce equipment footprint
• Enterprise storage systems not very energy efficient
Transformed storage systems might help
• De-duplication, tiered storage, and archiving can slow growth
• New MAID and spin-down devices offer power/cooling savings
Performance
A battle royale between in- and out-of-band!
• In-band virtualization can improve performance with caching
• Out-of-band stays out of the way, relying on caching at the device level
• Split-path adds scalability to in-band
Large arrays perform better (usually) than lots of tiny RAIDs or disks
• First rule of performance: Spindles
• Second rule of performance: Cache
• Third rule of performance: I/O Bottlenecks
Solid State Drives (and Myths)
The new (old) buzz
• RAM vs. NAND flash vs. disk
• EMC added flash drives to the DMX (CX?) as “tier-0”; CEO Joe Tucci claims flash will displace high-end disk after 2010
• Sun, HP adding flash to the server as a cache
• Gear6 caches NAS with RAM
But…
• Are they reliable?
• Do they really perform that well?
• Will you be able to use them?
• Is the 10x-30x cost justified?
• Do they really save power?
Metric | 1 GB Thumb Drive | 64 GB MacBook Air | 146 GB Symmetrix SSD
Capacity (MB) | 977 | 62,500 | 142,578
Max. Write Rate (MB/s) [1] | 14 | 42 | 115
Min. Endurance (writes) [2] | 100,000 | 100,000 | 1,000,000
Capacity x Endurance / Write Rate | 7,000,000 seconds | 149 million seconds | 1.24 billion seconds
Minimum Lifespan at Max. Write Rate | 81 days | 4.7 years | 39 years
Notes: [1] No one writes this fast 24x7. [2] Manufacturers claim 2x to 10x better endurance.
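The lifespan row of the table follows from one formula: total bytes the device can absorb (capacity times per-cell endurance) divided by the write rate. The snippet reproduces the table's arithmetic. Like the table, it assumes perfect wear leveling and continuous writing at the maximum rate, and it uses 365-day years; the function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

def min_lifespan_seconds(capacity_mb, endurance_writes, write_rate_mb_s):
    """Seconds until every cell is worn out, assuming perfect wear leveling
    and nonstop writes at the maximum rate."""
    return capacity_mb * endurance_writes / write_rate_mb_s

devices = {
    "1 GB Thumb Drive": (977, 100_000, 14),
    "64 GB MacBook Air": (62_500, 100_000, 42),
    "146 GB Symmetrix SSD": (142_578, 1_000_000, 115),
}
for name, specs in devices.items():
    s = min_lifespan_seconds(*specs)
    print(f"{name}: {s:,.0f} s = {s / SECONDS_PER_DAY:,.0f} days "
          f"= {s / SECONDS_PER_YEAR:.1f} years")
```

Since real workloads write far below the maximum rate (note 1) and endurance ratings are conservative (note 2), these figures are worst-case floors rather than expected lifetimes.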
Stability, Availability, and Recoverability
Replication creates copies of storage in other locations
• Local replicas (mirrors and snapshots) are usually frequent and focused on restoring data in daily use
• Remote replicas are used to recover from disasters
Virtualization can ease replication
• Single point of configuration and monitoring
• Can support different hardware at each location
We Love It!
Efficiency, scalability, performance, availability, recoverability, etc.
Without virtualization, none of this can happen!
The Down Side
Consolidation and centralization create bigger baskets for your precious data
Downtime and performance affect more systems
Harder to back out if unsatisfied
Additional complexity and interoperability concerns
Scalability issues – ever-bigger systems
Implementation Issues
Many virtualization systems require additional software loaded on servers
• Device drivers, path managers, agents, “shims”
Additional maintenance and configuration can offset “single pane” benefits
Organizational issues can crop up
• Virtualization blurs the lines between who owns what
• Future datacenter combines server, storage, network
• What about applications?
Cost Benefit Analysis
Benefits
• Improved utilization
• Tiering lowers per-GB cost
• Reduced need for proprietary technologies
• Potential reduction of administrative/staffing costs
• Flexibility improves IT response time
• Performance boosts operational efficiency
Costs
• Additional hardware and software cost
• Added complexity, vendors
• Training and daily management
• Reporting and incomprehensibility
• Possible negative performance impact
• Stability and reliability concerns
Where Will You Virtualize?
Approach | Pro | Con | Best For…
Host volume manager | Full featured; Proven; Widely available | Impacts on CPU, I/O, RAM; Can’t benefit across systems | Smaller shops or anyone seeking flexibility, especially with recent storage investment
In-band network | Consolidates assets; Replication; Caching | Sticky; High cost; Performance hit? | Large environments with heterogeneous storage looking to consolidate and add flexibility
Out-of-band network | Consolidates assets; Low impact; Medium cost | Client impact; Support matrix | Large Windows and other supported technology environments concerned about adding a “choke point”
Storage system | Performance; Familiarity; Enhanced tiered storage | Support matrix; High cost; Scalability concerns | Shops standardized on the high-end systems offering this kind of virtualization
Closing Thought: What Is Virtualization Good For?
Virtualization is a technology, not a product
What will you get from using it?
• Better DR?
• Improved service levels and availability?
• Better performance?
• Shortened provisioning time?
The cost must be justified based on business benefit, not cool technology
Audience Response
Questions?
Stephen Foskett
Contoural, Inc.
http://blog.fosketts.net