Indianapolis VMUG
Next Generation Best Practices for Storage and VMware
Brian Lewis, vSpecialist – Central US
The “Great” Protocol Debate
• Every protocol can be made highly available, and generally every protocol can meet a broad performance band
• Each protocol has different configuration considerations
• Each protocol has a VMware “super-power”, and also a “kryptonite”
• In vSphere, there is core feature equality across protocols
Conclusion: there is no debate – pick what works for you!
The best flexibility comes from a combination of VMFS and NFS.
Key Things to Know – “A” to “F”
Best Practices circa 2010/2011
“A” – Leverage Key Documentation
Best Practices circa 2010/2011
Key Papers
Key VMware docs:
• Fibre Channel SAN Configuration Guide
• iSCSI SAN Configuration Guide
• Storage/SAN Compatibility Guide
Understand the VMware storage taxonomy:
• Active/Active (LUN ownership)
• Active/Passive (LUN ownership)
• Virtual Port (iSCSI only)
Highly recommended reading.
Use Your Storage Partner’s Docs
• Each array is very different – arrays vary more from vendor to vendor than servers do
• Find, read, and stay current on your array’s best-practices doc – most are excellent
• Even if you’re NOT on the storage team, read them – it will help you
TechBooks – highly recommended reading:
http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf
http://www.emc.com/collateral/hardware/technical-documentation/h5536-vmware-esx-srvr-using-celerra-stor-sys-wp.pdf
http://www.emc.com/collateral/software/solution-overview/h2197-vmware-esx-clariion-stor-syst-ldv.pdf
“B” – Configure Multipathing
Best Practices circa 2010/2011
Understanding the vSphere Pluggable Storage Architecture (PSA)
What’s “out of the box” in vSphere 4.1?

[root@esxi ~]# vmware -v
VMware ESX 4.1.0 build-260247
[root@esxi ~]# esxcli nmp satp list
Name                 Default PSP       Description
VMW_SATP_SYMM        VMW_PSP_FIXED     Placeholder (plugin not loaded)
VMW_SATP_SVC         VMW_PSP_FIXED     Placeholder (plugin not loaded)
VMW_SATP_MSA         VMW_PSP_MRU       Placeholder (plugin not loaded)
VMW_SATP_LSI         VMW_PSP_MRU       Placeholder (plugin not loaded)
VMW_SATP_INV         VMW_PSP_FIXED     Placeholder (plugin not loaded)
VMW_SATP_EVA         VMW_PSP_FIXED     Placeholder (plugin not loaded)
VMW_SATP_EQL         VMW_PSP_FIXED     Placeholder (plugin not loaded)
VMW_SATP_DEFAULT_AP  VMW_PSP_MRU       Placeholder (plugin not loaded)
VMW_SATP_ALUA_CX     VMW_PSP_FIXED_AP  Placeholder (plugin not loaded)
VMW_SATP_CX          VMW_PSP_MRU       Supports EMC CX that do not use the ALUA protocol
VMW_SATP_ALUA        VMW_PSP_RR        Supports non-specific arrays that use the ALUA protocol
VMW_SATP_DEFAULT_AA  VMW_PSP_FIXED     Supports non-specific active/active arrays
VMW_SATP_LOCAL       VMW_PSP_FIXED     Supports direct attached devices
What’s “out of the box” in vSphere?
PSPs:
• Fixed (default for Active/Active LUN-ownership models)
– All IO goes down the preferred path, and reverts to the preferred path after the original path is restored
• MRU (default for Active/Passive LUN-ownership models)
– All IO goes down the active path, and stays there after the original path is restored
• Round Robin
– n IO operations go down the active path, then it rotates to the next path (the default n is 1000)
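A quick way to see these plugins on a host – the same NMP namespace used by esxcli nmp satp list above also lists the PSPs and the per-device settings:

[root@esxi ~]# esxcli nmp psp list
[root@esxi ~]# esxcli nmp device list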
What’s “out of the box” in vSphere?
HOWTO – setting the PSP for a specific device (overrides the default selected by the SATP-detected array ID); see the example below:
esxcli nmp device setpolicy --device <device UID> --psp VMW_PSP_RR
(check with your vendor first!)
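A minimal sketch – the naa device UID below is a made-up example (find real ones with esxcli nmp device list):

# Set Round Robin on one device (confirm with your array vendor first)
esxcli nmp device setpolicy --device naa.60060160abcd08150123456789abcdef --psp VMW_PSP_RR
# Or change the default PSP for every device claimed by a given SATP
esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR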
Or the New Way…
Changing the Round Robin IOOperationLimit
esxcli nmp roundrobin setconfig --device <device UID> --iops <n> --type iops
Check with your storage vendor first! This setting can cause problems on some arrays. It has been validated as OK on others, but it is not necessary in most cases (see the example below).
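A hedged example – the device UID is hypothetical, and 1 IO per path before rotating is the commonly discussed tweak:

esxcli nmp roundrobin setconfig --device naa.60060160abcd08150123456789abcdef --iops 1 --type iops
# Verify the resulting policy
esxcli nmp roundrobin getconfig --device naa.60060160abcd08150123456789abcdef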
Effect of different RR IOOperationLimit settings
NOTE: This is with a SINGLE LUN – the case where the larger default IOOperationLimit looks worst.
In a real-world environment, lots of LUNs and VMs result in decent overall load balancing.
Recommendation – if you can, stick with the default.
What is Asymmetric Logical Unit Access (ALUA)?
• Many storage arrays have Active/Passive LUN ownership
• All paths show in the vSphere Client as Active (can be used for I/O)
– I/O is accepted on all ports
– All I/O for a LUN is serviced by its owning storage processor
• In reality, some paths are preferred over others
• Enter ALUA to solve this issue
– Support introduced in vSphere 4.0
[Diagram: a LUN owned by SP A, with host paths through both SP A and SP B]
What is Asymmetric Logical Unit Access (ALUA)?
• ALUA allows paths to be profiled:
– Active (can be used for I/O)
– Active non-optimized (not normally used for I/O)
– Standby
– Dead
• Ensures optimal path selection/usage by the vSphere PSPs and 3rd-party MPPs
– Supports the Fixed, MRU, & RR PSPs
– Supports EMC PowerPath/VE
• ALUA is not supported in ESX 3.5
[Diagram: a LUN owned by SP A, with optimized paths via SP A and non-optimized paths via SP B]
PowerPath – a Multipathing Plugin (MPP)
• Simple storage manageability
– Simple provisioning = “pool of connectivity”
– Predictable and consistent
– Optimizes server, storage, and data-path utilization
• Performance and scale
– Tunes infrastructure performance with LUN/path prioritization
– Predictive, array-specific load-balancing algorithms
– Automatic HBA, path, and storage-processor fault recovery
• Other 3rd-party MPPs:
– Dell/EqualLogic PSP
• Uses a “least deep queue” algorithm rather than basic round robin
• Can redirect IO to different peer storage nodes
[Diagram: dozens of VMs (APP/OS) spread across four ESX hosts, each running PowerPath/VE, all connected to shared storage]
NFS Considerations
General NFS Best Practices
Start with vendor best practices:
– EMC Celerra H5536 & NetApp TR-3749
– While these are constantly being updated, at any given time they are authoritative
Use the EMC & NetApp vCenter plug-ins – they automate the best practices.
Use multiple NFS datastores & 10GbE. 1GbE requires more complexity to address I/O scaling, because NFSv3 uses one data session per connection (see the sketch below).
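A hedged sketch of the “multiple datastores, multiple paths” idea from the ESX service console – the IP addresses, export paths, and datastore names are made-up examples:

# Mount two datastores against two different array IP aliases, so that
# NFSv3 gets two TCP sessions to spread load across
esxcfg-nas -a -o 192.168.10.21 -s /vol/vmware_ds01 nfs_ds01
esxcfg-nas -a -o 192.168.10.22 -s /vol/vmware_ds02 nfs_ds02
esxcfg-nas -l   # list the current NFS mounts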
General NFS Best Practices – Timeouts
Configure the following on each ESX server (automated by the vCenter plugins; see the sketch below):
NFS.HeartbeatFrequency = 12
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 10
Increase guest OS disk time-out values to match:
– Back up your Windows registry
– Select Start > Run, regedit
– In the left-panel hierarchy view, double-click HKEY_LOCAL_MACHINE > System > CurrentControlSet > Services > Disk
– Set the TimeOutValue data value to 125 (decimal)
– Note: this is not reset when VMware Tools are updated
Increase Net.TcpipHeapSize (follow your vendor’s recommendation)
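A minimal sketch of applying these by hand, assuming an ESX 4.x service console for the host side and a Windows guest for the registry change (the vCenter plug-ins automate the host side):

# On the ESX host – set the NFS heartbeat values from this slide
esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
esxcfg-advcfg -s 5 /NFS/HeartbeatTimeout
esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures

rem Inside the Windows guest – raise the disk timeout to 125 seconds
reg add HKLM\SYSTEM\CurrentControlSet\Services\Disk /v TimeOutValue /t REG_DWORD /d 125 /f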
General NFS Best Practices – Traditional Ethernet Switches
Mostly seen with older 1GbE switching platforms:
– Each switch operates independently
– More complex network design
– Depends on routing; requires two (or more) IP subnets for datastore traffic
– Multiple Ethernet options based on EtherChannel capabilities and preferences
– Some links may be passive standby links
General NFS Best Practices – Multi-Switch Link Aggregation
Allows two physical switches to operate as a single logical fabric:
– Much simpler network design
– Single IP subnet
– Provides multiple active connections to each storage controller
– Easily scales to more connections by adding NICs and aliases
– Storage-controller connection load balancing is automatically managed by the EtherChannel IP load-balancing policy
iSCSI & NFS – Ethernet Jumbo Frames
• What is an Ethernet jumbo frame?
– An Ethernet frame with more than 1500 bytes of payload (9000 is common)
– Commonly thought of as having better performance due to greater payload per packet / fewer packets
• Should I use jumbo frames?
– Supported by all major storage vendors & VMware
– Adds complexity, and the performance gains are marginal with common block sizes
– FCoE uses an MTU of 2240, which is auto-configured via the switch and CNA handshake
• All IP traffic transfers at the default MTU size
Stick with the defaults when you can (if you do enable jumbo frames, see the sketch below).
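A hedged ESX 4.x sketch for enabling jumbo frames – vSwitch1, the IP details, and the port-group name are made-up examples, and the physical switch and array ports must be set to 9000 as well:

# Raise the vSwitch MTU
esxcfg-vswitch -m 9000 vSwitch1
# VMkernel NICs must be created with the jumbo MTU (it cannot be changed in place in 4.x)
esxcfg-vmknic -a -i 192.168.20.11 -n 255.255.255.0 -m 9000 "IP Storage"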
Summary of “Set Up Multipathing Right”
• VMFS/RDMs
– The Round Robin NMP policy is the default best practice on most storage platforms
– PowerPath/VE further simplifies/automates multipathing on all EMC (and many non-EMC) platforms
• Notably supports MSCS/WSFC, including vMotion and VM HA
• NFS
– For load balancing, distribute VMs across multiple datastores on multiple I/O paths
– Follow the resiliency procedure in the TechBook to ensure VM resiliency to storage failover and reboot over NFS
“C” – Track Alignment
Best Practices circa 2010/2011
“Alignment = Good Hygiene”
• Misalignment of filesystems results in additional work on the storage controller to satisfy IO requests
• Affects every protocol, and every storage array
– VMFS on iSCSI, FC, & FCoE LUNs
– NFS
– VMDKs & RDMs with NTFS, EXT3, etc.
• Filesystems exist in both the datastore and the VMDK
[Diagram: datastore alignment – VMFS blocks (1MB-8MB) lining up with array chunks (4KB-64KB); guest alignment – guest filesystem clusters (FS 4KB-1MB) lining up through VMFS (1MB-8MB) with array chunks (4KB-64KB)]
Alignment – Best Solution: “Align VMs”
• VMware, Microsoft, Citrix, and EMC all agree: align partitions
– Plug-and-play guest operating systems
• Windows 2008, Vista, & Win7 – they just work, as their partitions start at 1MB
– Guest operating systems requiring manual alignment (see the sketch below)
• Windows NT, 2000, 2003, & XP (use diskpart to set a 1MB offset)
• Linux (use fdisk expert mode and align on sector 2048 = 1MB)
Alignment – “Fixing It After the Fact”
• VMFS is misaligned
– Occurs if you created the VMFS via the CLI (not the vSphere Client) and didn’t specify an offset
– Resolution:
• Step 1: Take an array snapshot/backup
• Step 2: Create a new datastore & migrate the VMs using Storage VMotion (see the sketch below)
• The filesystem in the VMDK is misaligned
– Occurs if you are using older OSes and didn’t align when you created the guest filesystem
– Resolution:
• Step 1: Take an array snapshot/backup
• Step 2: Use tools to realign (the VM must be shut down)
– GParted (free, but some assembly required)
– Quest vOptimizer (good mass scheduling and reporting)
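For the Storage VMotion step, a hedged vSphere CLI sketch – the vCenter URL, datacenter, VM path, and datastore names are made-up examples (the tool prompts for credentials):

# Move myvm from datastore old_ds to the newly created, aligned new_ds
svmotion --url=https://vcenter.example.com/sdk --datacenter=DC1 --vm="[old_ds] myvm/myvm.vmx:new_ds"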
“D” – Utilize Free vCenter Plugins and VAAI
Best Practices circa 2010/2011
“Leverage Free Plugins and VAAI”
• Use vendor plug-ins for VMware vSphere
– All provide better visibility
– Some provide integrated provisioning
– Some integrate array features like VM snapshots, dedupe, compression, and more
– Some automate multipathing setup
– Some automate best practices and remediation
– Most are FREE
VAAI – vStorage APIs for Array Integration
Block Zero
– What: 10x less IO for common tasks
– How: eliminates redundant and repetitive write commands – just tell the array to repeat via SCSI commands
Full Copy
– What: 10x faster VM deployment, clone, snapshot, and Storage VMotion
– How: leverages the array’s ability to mass copy, snapshot, and move blocks via SCSI commands
Hardware Assisted Locking
– What: 10x more VMs per datastore
– How: stop locking whole LUNs and start locking only blocks
“What? VAAI isn’t working…”
How do I know?
– Test Storage VMotion/cloning with offload versus no-offload (see the check below)
What do I do:
– Ensure the array is running a VAAI-compliant code level
– Ensure the storage initiators for the ESX host are configured with ALUA on
– Look at IO bandwidth in the vSphere Client and on the storage array
– The benefit tends to be higher when you Storage VMotion across SPs
– The biggest benefit isn’t any single operation being faster, but the overall system (vSphere, network, storage) load being lightened
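A quick way to confirm the VAAI primitives are enabled on a vSphere 4.1 host (1 = enabled) – these are the standard advanced settings for the three primitives:

esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove     # Full Copy
esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit     # Block Zero
esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking      # Hardware Assisted Locking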
“E” – Keep It Simple
Best Practices circa 2010/2011
“Keep It Simple on Layout”
• Use VMFS and NFS together – there’s no reason not to
• Strongly consider 10GbE, particularly for new deployments
• Avoid RDMs; use “pools” (VMFS or NFS)
• Make the datastores big (see the sketch below)
– VMFS – make them ~1.9TB in size (2TB minus 512 bytes is the max for a single volume; 64TB for a single filesystem)
– NFS – make them whatever size you want (16TB is the max)
• With vSphere 4.0 and later, you can have ___ VMs per VMFS datastore
• On the array, default to storage pools, not traditional RAID groups / hypers / metas
• Default to single-extent VMFS datastores
• Default to thin-provisioning models at the array level, and optionally at the VMware level
– Make sure you enable vCenter managed-datastore alerts
– Make sure you enable Unisphere/SMC thin-provisioning alerts and auto-expansion
• Use “broad” data services – i.e. FAST, FAST Cache (things that are “set in one place”)
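A hedged sketch of creating one big single-extent VMFS-3 datastore from the service console – the device path is a made-up example, and the 8MB block size allows the largest (2TB minus 512 bytes) files:

# Create a VMFS-3 filesystem labeled Big_DS on partition 1 of the device
vmkfstools -C vmfs3 -b 8m -S Big_DS /vmfs/devices/disks/naa.60060160abcd08150123456789abcdef:1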
“F” – Use SIOC (If You Can)
Best Practices circa 2010/2011
“Use SIOC If You Can”
• “If you can” means:
– vSphere 4.1, Enterprise Plus
– VMFS (NFS is targeted for future vSphere releases – not purely a qual)
• Enable it (it is not on by default), even if you don’t use shares – it will ensure no VM swamps the others
• Bonus: you get guest-level latency alerting!
• The default threshold is 30ms
– Leave it at 30ms for 10K/15K drives, increase to 50ms for 7.2K, decrease to 10ms for SSD
– Fully supported with array auto-tiering – leave it at 30ms for FAST pools
• Hard IO limits are handy for View use cases
• Some good recommended reading:
– http://www.vmware.com/files/pdf/techpaper/VMW-vSphere41-SIOC.pdf
– http://virtualgeek.typepad.com/virtual_geek/2010/07/vsphere-41-sioc-and-array-auto-tiering.html
– http://virtualgeek.typepad.com/virtual_geek/2010/08/drs-for-storage.html
– http://www.yellow-bricks.com/2010/09/29/storage-io-fairness/
General ‘Gotchas’
Best Practices circa 2010/2011
“My storage team gives me tiny devices”
How do I know?
– “My storage team can only give us 240GB”
What do I do:
• This means you have an “oldey timey” storage team
• Symmetrix uses hyper devices, and hypers are assembled into meta devices (which are then presented to hosts)
• Hyper devices have a maximum size of 240GB
• Configuring meta devices is EASY (see the sketch below)
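A heavily hedged SYMCLI sketch, assuming a hypothetical Symmetrix ID 1234 and hypers 0100-0103 – exact syntax varies by Enginuity/Solutions Enabler version, so treat this as illustrative only:

# Form a striped meta from hyper 0100 and add three more members,
# yielding one larger device to present to the ESX hosts
symconfigure -sid 1234 -cmd "form meta from dev 0100, config=striped; add dev 0101:0103 to meta 0100;" commit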
“My NFS-based VM is impacted following a storage reboot or failover”
How do I know?
– The VM freezes or, even worse, crashes
What do I do:
– Check your ESX NFS timeout settings against your storage partner’s recommendations (EMC – see the TechBook)
– Review your VM and guest OS settings for resiliency; see the TechBook for a detailed procedure on VM resiliency
When Do the Best Practices Not Apply?
Best Practices circa 2010/2011
5 Exceptions to the Rules
1. Create “planned datastore designs” (rather than big pools corrected after the fact) for larger IO use cases (View, SAP, Oracle, Exchange)
– Use the VMware + array vendor reference architectures – bake the cake
– Over time, SIOC may prove to be a good approach
– There are some relatively rare cases where large spanned VMFS datastores make sense
2. When NOT to use “datastore pools”, but pRDMs instead (narrow use cases!)
– MSCS/WSFC
– Oracle – pRDMs and NFS can do rapid V-to-P with array snapshots
3. When NOT to use NMP Round Robin
– Arrays that are not active/active and do not use ALUA (i.e. rely on SCSI-2 reservations only)
4. When NOT to use array thin-provisioned devices
– Datastores with an extremely high amount of small-block random IO
– In FLARE 30, always use storage pools; LUN-migrate to thick devices if needed
5. When NOT to use the vCenter plugins? Trick question – the answer is always “yes”
THANK YOU – AND COME & PLAY!
Win a 320GB eGo Drive at the booth!
Lab 1: EMC vCenter Plugin Tour
Lab 2: Virtual Storage Integrator
Lab 3: vStorage APIs (VAAI) with CLARiiON
Lab 4: VPLEX GUI Tour
Lab 5: UIM v2 Tour
Lab 6: Unisphere GUI Tour
Hands-on Labs in Room 101 H