Networking: Virtual SAN's Backbone
Bhumik Patel, VMware, Inc. | John Kennedy, Cisco
STO4474  #STO4474
CONFIDENTIAL 2
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1. Virtual SAN Networking Requirements
2. How Multicasting Works
3. Best Practices
4. Switch Configurations
5. Troubleshooting Tips
Unprecedented Customer Momentum for Virtual SAN
2,000+ customers since launch
"In my experience VMware solutions are rock solid…we're ready to nearly double our VSAN deployment."
"It really did work as advertised…the fact that I have been able to set it and forget it is huge!"
Networking for Virtual SAN:
The Implications
You can find yourself responsible for creating and maintaining Virtual SAN without authority or control over the networking!
The Complications
• Virtual SAN is heavily reliant on networking
• Virtual SAN uses multicasting, which even network admins may find daunting
• As a server/virtualization admin, you may not have control of the network equipment
Virtual SAN Implementation Requirements
Virtual SAN requires:
– Minimum of 3 hosts in a cluster configuration
  • 2-node ROBO introduced with 6.1
– All 3 hosts MUST contribute storage
– Maximum of 64 hosts
– Locally and DAS attached devices
  • Magnetic devices
  • Flash-based devices
[Diagram: vSphere cluster of esxi-01, esxi-02, and esxi-03, each contributing local storage as a disk group of SSD + HDD]
Roles within Virtual SAN & Communications
• Three roles in VSAN – Master, Agent & Backup
  – No explicit control over roles by admins
• Master
  – Elected during cluster discovery – all nodes participate in electing a master
  – 1 master responsible for getting CMMDS (clustering service) updates from all nodes
  – Distributes these updates to agents
• Backup
  – Assumes the master role if the master fails; contains a full copy of directory contents, making failover seamless
• Communications
  – Between all hosts – via multicasting; a heartbeat is sent from master to all hosts once every second
  – Between master, backup & agent nodes – multicasting, but relatively light traffic
  – VM disk I/O – majority of traffic – unicast
Virtual SAN Networking Requirements
• 10G network for All-Flash
• Virtual Switch & vmnic
• Multicasting
• Traffic Isolation & QoS
Firewall Ports
• Virtual SAN Vendor Provider (VSANVP) – inbound and outbound – TCP 8080 – unicast; between vCenter and ESXi
• Virtual SAN Clustering Service (CMMDS) – inbound and outbound – UDP 12345, 23451 – multicast
• RDT (Reliable Datagram Transport) – inbound and outbound – TCP 2233 – unicast
• VSAN Observer – TCP 8010 – unicast
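The port list above can be captured as a small lookup table for firewall scripting — a sketch in Python (the table shape and function name are my own, not part of any VMware tooling):

```python
# Virtual SAN firewall ports, as listed on the slide.
VSAN_PORTS = {
    "VSANVP":   {"proto": "tcp", "ports": [8080],         "cast": "unicast"},
    "CMMDS":    {"proto": "udp", "ports": [12345, 23451], "cast": "multicast"},
    "RDT":      {"proto": "tcp", "ports": [2233],         "cast": "unicast"},
    "Observer": {"proto": "tcp", "ports": [8010],         "cast": "unicast"},
}

def rules_for(cast: str):
    """Return (service, protocol, port) tuples filtered by traffic type."""
    return [(svc, entry["proto"], port)
            for svc, entry in VSAN_PORTS.items()
            for port in entry["ports"]
            if entry["cast"] == cast]
```

For example, `rules_for("multicast")` yields only the two CMMDS UDP ports — the ones the IGMP discussion below is concerned with.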
Multicasting
Multicasting
• Like broadcasting, but only to "subscribers"
• Hosts "register" for multicast traffic acceptance
  – Done through the IGMP protocol
  – Hosts send a registration message; the MROUTER registers it and pulls the required packets from upstream router ports
• Uses "Class D" addresses (any address that starts with binary 1110)
  – 224.0.0.1 – 239.255.255.255
• The switch identifies a multicast packet by its MAC address
  – The switch then sends multicast packets to all hosts in the host group
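The IP-to-MAC mapping the switch relies on is mechanical: the fixed prefix 01:00:5e plus the low 23 bits of the group address. A small illustration (the function name is mine; the mapping rule itself is standard IPv4-over-Ethernet behavior):

```python
import ipaddress

def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast (Class D) address to its Ethernet MAC.

    Only the low 23 bits of the IP survive, so 32 different group
    addresses share each MAC - one reason switches also need IGMP
    state, not just MAC filtering, to forward precisely.
    """
    address = ipaddress.IPv4Address(ip)
    if not address.is_multicast:
        raise ValueError(f"{ip} is not a Class D (multicast) address")
    low23 = int(address) & 0x7FFFFF
    return "01:00:5e:{:02x}:{:02x}:{:02x}".format(
        (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF)
```

For instance, the default VSAN master group 224.1.2.3 maps to 01:00:5e:01:02:03, and 225.1.2.3 collides with it on the same MAC.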
Multicast Snooping
In the beginning, multicast packets were sent to the entire switch, all ports
• Too much traffic ensued
IGMP "snooping" switches were invented
• The switch listens for multicast registrations and remembers those ports
• Now packets only go to ports that have registered hosts
Multicast IGMP Snooping Problems
Snooping gets complicated with port channels (vPC, EtherChannel, LACP).
• If a registration is sent on a port of switch A, the port on switch B never gets listed by IGMP snooping
[Diagram: two uplinked Cisco Nexus 3048 switches (A and B), each with IGMP-registered and non-IGMP ports; a multicast packet entering switch A never reaches the host on switch B, and vice versa]
RESULT: packets originating on switch A may or may not go to receiving hosts on switch B
OTHER RESULT: uplinks may not allow packets from switch B to traverse to switch A, if switch A uplinks never get listed as being part of a multicast group
IGMP Snooping Solution – Querier
• IGMP Querier – acts like an MROUTER
• Sends Query messages, which cause hosts to re-register
• Since some of those queries come from the other fabric, uplinks get marked as IGMP-registered ports, allowing all traffic to flow
Spine/leaf vs. core-agg-access
Port channels/VPCs
Best Practices
Network Design Considerations
Bandwidth Requirements
• Network bandwidth has more impact on host evacuation and rebuild times than on workload performance
• 10Gb shared with NIOC for QoS is recommended and will support most environments
NIC Teaming for Redundancy
• Virtual SAN traffic is NOT designed to load balance across multiple teamed network interfaces
Network Design Considerations
MTU & Jumbo Frames
• Testing shows reduced CPU utilization & improved throughput
• Gains are nominal, since TCP Segmentation Offload (TSO) & Large Receive Offload (LRO) are already used by vSphere
• Ensure end-to-end configuration – otherwise high latencies can result if any element chops the frame into 1500-byte chunks
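One way to see why jumbo-frame gains are nominal: even at MTU 1500 a full frame is already roughly 95% payload on the wire. A rough efficiency calculation, assuming standard Ethernet L2 overhead (38 B including header, FCS, preamble, and inter-frame gap) and IPv4+TCP headers without options (40 B) — these overhead figures are my assumptions, not from the slide:

```python
def frame_efficiency(mtu: int, l2_overhead: int = 38, l3l4_overhead: int = 40) -> float:
    """Fraction of wire time carrying application payload for a full frame.

    l2_overhead: Ethernet header + FCS + preamble + inter-frame gap.
    l3l4_overhead: IPv4 + TCP headers with no options.
    """
    payload = mtu - l3l4_overhead
    return payload / (mtu + l2_overhead)
```

`frame_efficiency(1500)` is about 0.949 and `frame_efficiency(9000)` about 0.991 — a real but modest gain, consistent with the slide's point that TSO/LRO already absorb most of the per-packet CPU cost.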
Multicast Considerations – L2
• Recap: L2 multicast is required for VMkernel ports used by Virtual SAN
• Configuration options:
  – IGMP Snooping enabled, with IGMP Snooping Querier enabled
  – You can disable IGMP Snooping for a dedicated & non-routed Virtual SAN VLAN
TIP – Ask your networking team to:
• Isolate Virtual SAN traffic using VLANs
• "Trunk" your Virtual SAN VLAN
• Allow the Virtual SAN VLAN on both Virtual SAN switches for redundancy
Multicast Considerations – L3
• Supported with vSphere 5.5 Patch-04 onwards
• Topology
  – On the L3 switch:
    • Enable multicast routing
    • Static routes between subnets
  – On the ESXi host:
    • IP configurations
    • Static route
Virtual SAN – Stretched Cluster
• Requires 3 fault domains (Preferred, Secondary & a Witness)
  – More than 3 fault domains not supported
• Fault domains are expected to mimic sites; the Witness fault domain contains metadata
• Read locality supported
  – All reads are preferred to be local within a fault domain – expected to be faster
• Writes are replicated as usual – across both sites before a write ack
• Only FTT=1 supported
Virtual SAN – Stretched Cluster
• Network requirements between data fault domains/sites
  – 10Gbps connectivity or greater
  – Latency: up to 5ms RTT
  – L2 or L3 network connectivity with multicast
• Network requirements to the witness fault domain
  – 100Mbps connectivity
  – Latency: up to 100ms RTT
  – L3 connectivity without multicast
• Network bandwidth requirements – based on write operations between fault domains
  – Kbps = (Nodes * Writes * 125)
  – A deployment of 5+5+5 with ~300 VMs would be ~4Gbps
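The sizing rule above is a one-liner worth scripting when comparing designs. The workload numbers in the usage note are hypothetical, chosen only to show a combination that lands near the slide's ~4Gbps figure:

```python
def stretched_cluster_kbps(nodes: int, writes_per_sec: int) -> int:
    """Inter-site bandwidth estimate from the slide's rule:
    Kbps = Nodes * Writes * 125."""
    return nodes * writes_per_sec * 125

def kbps_to_gbps(kbps: int) -> float:
    """Convert Kbps to Gbps (decimal units)."""
    return kbps / 1_000_000
```

For example, 10 data nodes each averaging 3,200 writes/sec gives 4,000,000 Kbps, i.e. 4 Gbps of inter-site bandwidth.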
CONFIDENTIAL 24
NIC Teaming for Redundancy – Uplinked Switches
• Virtual SAN traffic has NOT been designed to load balance across multiple NICs when teamed
• Multiple VMkernel adapters on different networks (VLANs, separate physical fabric) are supported but not recommended
• Use a separate port group for each traffic type, with a teaming & failover policy per port group
  – Allows use of a different active adapter within the team if possible
  – Exception: IP hash-based policy
• vSphere Network I/O Control is supported
NIC Teaming for Redundancy – Stacked Switches
• IP hash-based network adapter teaming policy is supported
  – Active-Active mode only
  – All physical switch ports connected to the active vmnic uplinks must be configured with static EtherChannel or LACP
  – vSphere Network I/O Control is supported
Leaf-Spine
• Leaf switches are typically oversubscribed
• For example:
  – A fully utilized 10GbE uplink used by the VSAN network might achieve 2.5Gbps throughput with a 4:1 oversubscription ratio
• Check the multicast traffic configuration between leaf and spine switches
  – Check/add static routes to permit traffic
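The 2.5Gbps figure follows directly from dividing link speed by the oversubscription ratio — a worst-case estimate that assumes every downlink is saturated at once (function name and framing are mine):

```python
def effective_throughput_gbps(link_gbps: float, oversubscription: float) -> float:
    """Worst-case per-link throughput on an oversubscribed leaf switch.

    oversubscription is the downlink:uplink capacity ratio, e.g. 4.0
    for 4:1; the estimate assumes all downlinks are fully utilized.
    """
    if oversubscription < 1:
        raise ValueError("oversubscription ratio must be >= 1")
    return link_gbps / oversubscription
```

So a 10GbE host uplink behind a 4:1 leaf degrades to 2.5Gbps in the worst case — worth knowing before sizing rebuild traffic over that fabric.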
CONFIDENTIAL 27
QoS
• Leverage vSphere Network I/O Control to set QoS for Virtual SAN traffic
  – Over the same network adapter uplink in a vDS shared with other vSphere traffic
  – Also leverage network adapter teaming to maximize network capacity utilization
• For bandwidth allocation, use "shares" instead of "limits"
• Do not set a limit on the Virtual SAN traffic
  – Unlimited by default
• Note: this requires use of a vSphere Distributed Switch (vDS)
Configurations
Multicast UCS IGMP Config
Multicast Nexus IGMP Config
switch# configure terminal
switch(config)# vlan configuration 5
switch(config-vlan-config)# ip igmp snooping last-member-query-interval 3
switch(config-vlan-config)# ip igmp snooping querier 172.20.52.106
switch(config-vlan-config)# ip igmp snooping explicit-tracking
switch(config-vlan-config)# ip igmp snooping fast-leave
switch(config-vlan-config)# ip igmp snooping report-suppression
switch(config-vlan-config)# ip igmp snooping mrouter interface ethernet 1/10
switch(config-vlan-config)# ip igmp snooping static-group 230.0.0.1 interface ethernet 1/10
switch(config-vlan-config)# end
switch#
Multicast Routing Configuration for L3
switch# configure terminal
switch(config)# int vlan 401
switch(config-if)# ip pim dr-priority 4000000000
switch(config-if)# ip pim dense-mode
switch(config-if)# ip igmp version 3

switch# configure terminal
switch(config)# int vlan 402
switch(config-if)# ip pim dr-priority 4000000000
switch(config-if)# ip pim dense-mode
switch(config-if)# ip igmp version 3
Troubleshooting
Network Failure – 60 Minute Delay
• Absent components – Virtual SAN will wait the default setting of 60 minutes before starting to copy objects and components onto other disks, disk groups, or hosts
• NIC failures and physical network failures can lead to network partitions
  – Multiple hosts could be impacted in the cluster
[Diagram: 4-host cluster (esxi-01 through esxi-04) with a RAID-1 vmdk and witness; after a network failure and a 60-minute wait, a new mirror copy of the impacted component is created]
Issue: Network Misconfiguration Detected
Problem
Network Misconfiguration Detected (cont'd)
Verify VSAN configuration
• Ensure all hosts have a VSAN vmknic configured
  – Web Client: Host -> Manage -> Networking -> VMkernel adapters
  – On ESXi: esxcli vsan network list
  – In RVC: vsan.cluster_info <cluster>
• Ensure VSAN vmknics are on the right IP subnet (and VLAN)
  – Web Client: Host -> Manage -> Networking -> VMkernel adapters
  – On ESXi: esxcli network ip interface ipv4 get
  – In RVC: vsan.cluster_info <cluster>
• Ensure all hosts can ping each other (size 9000 for MTU check):
  – On ESXi: vmkping -I <vmknic> -s 9000 <OtherEsxHostVSANVmknicIP>
• Ensure MTU is set correctly:
  – On ESXi: esxcli network ip interface list and esxcli network vswitch standard list
• Check VSAN multicast settings:
  – On ESXi: esxcli vsan network list
Steps for Testing Multicast Config
1. Configure each host (using VSI) to respond to ICMP echo requests that arrive via multicast:
   # vsish -e set /net/tcpip/instances/defaultTCPipStack/sysctl/_net_inet_icmp_bmcastecho 1
   – You must do this on EVERY host in the VSAN cluster
2. Ping the multicast group, specifying the VSAN vmk interface:
   # ping -I <VSAN vmknic> 224.2.3.4
   – If multicast connectivity is OK, all the hosts should reply
3. Ping the multicast master group:
   # ping -I <VSAN vmknic> 224.1.2.3
   – If multicast connectivity is OK, only the master and backup should reply
Note: the above commands assume the default multicast addresses. If custom addresses have been specified, substitute the applicable addresses.
Multicast Configuration Test – Packet Captures
Ensure multicast is correctly configured by running this packet capture command:
tcpdump-uw -i <VMkernel port configured for Virtual SAN> -s0 udp port 23451
You should see the Master & Backup nodes sending the heartbeat to the Agent group multicast IP
Changing the Multicast Address Used for a Virtual SAN Cluster
When is this required?
• If there are multiple Virtual SAN clusters on the same Layer 2 network
• In test environments where multiple Virtual SAN clusters are managed by a single vCenter
Display the current configuration with esxcli vsan network list
Change the multicast address on each host using the commands in KB 2075451
Virtual SAN Network Performance
• Leverage Virtual SAN Observer along with ESXTOP measurements
• Understand application latency requirements
  – For VDI testing, < 20ms latency is expected
• IOPS will not be uniform across hosts, based on VM placement within the cluster
• Abnormal latency values usually correspond with networking issues in the environment
Key Takeaways
• Make sure vSphere & Virtual SAN networking requirements and best practices are followed for a solid deployment
• Understand the role of multicasting in your Virtual SAN deployment
• Evaluate whether Virtual SAN over L3 &/or over Stretched Cluster is a fit
• Leverage not just the Virtual SAN networking & troubleshooting guides but also your networking vendor's best practices