VMware vSphere Scales on the Amazing Next-Gen Intel Xeon Architecture
Tom Adelmeyer, Principal Engineer, Intel
Richard A. Brunner, Principal Engineer, VMware
FUT3056BU
#VMworld #FUT3056BU
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1 Introduction to the Intel® Xeon® Scalable Processor
2 Modernize Your Datacenter
3 Simplify Your Datacenter
4 Accelerate Your Datacenter
5 Summary
July 2017: Intel® Xeon® Scalable platform
Supported now by VMware vSphere 6.0.u3 and vSphere 6.5
Delivers 1.65x average performance boost over prior generation (1)
(1) Up to 1.65x Geomean based on Normalized Generational Performance going from Intel® Xeon® processor E5-26xx v4 to Intel® Xeon® Scalable processor (estimated based on Intel internal testing of OLTP Brokerage, SAP SD 2-Tier, HammerDB, Server-side Java, SPEC*int_rate_base2006, SPEC*fp_rate_base2006, Server Virtualization, STREAM* triad, LAMMPS, DPDK L3 Packet Forwarding, Black-Scholes, Intel Distribution for LINPACK).
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase. Configurations: see slides 32, 33.
• Simplify: Deployment / Provisioning
• Modernize: TCO / Security
• Accelerate: Compute, Storage, and Network
Rapid Technology Innovation
Intel Processor Development* (2009 to 2017)
Intel® Xeon® Processor:
• Nehalem, 45nm: new microarchitecture (“Thurley” Platform)
• Westmere, 32nm: new process technology (“Thurley” Platform)
• Sandy Bridge, 32nm: new microarchitecture (“Romley” Platform)
• Ivy Bridge, 22nm: new process technology (“Romley” Platform)
• Haswell, 22nm: new microarchitecture (“Grantley” Platform)
• Broadwell, 14nm: new process technology (“Grantley” Platform)
Intel® Xeon® Scalable Processor:
• Skylake, 14nm: new microarchitecture (“Purley” Platform)
* Source: Intel, edited by VMware
Intel® Xeon® Platinum 8100
Intel® Xeon® Gold 6100/5100
Intel® Xeon® Silver 4100
Intel® Xeon® Bronze 3100
Day 0 Support by VMware vSphere 6.0.u3, 6.5.ga, and vSAN 6.2 and 6.6
Intel® Xeon® Scalable Processor: Re-architected from the Ground Up (Code name “Skylake-SP”)
• Intel® AVX-512 with 32 DP flops per cycle per core
• Data center optimized cache hierarchy: 1MB L2 per core, non-inclusive L3
• New mesh interconnect architecture
• Enhanced memory subsystem
• Intel® Speed Shift Technology
• Optional Integrated Intel® Omni-Path Fabric (Intel® OPA)
• Modular IO with integrated devices
Features | Intel® Xeon® Processor E5-2600 v4 | Intel® Xeon® Scalable Processor
Cores Per Socket | Up to 22 | Up to 28
Threads Per Socket | Up to 44 threads | Up to 56 threads
Last-level Cache | Up to 55 MB | Up to 38.5 MB (non-inclusive)
QPI/UPI Speed (GT/s) | 2x QPI channels @ 9.6 GT/s | Up to 3x UPI @ 10.4 GT/s
PCIe* Lanes / Controllers / Speed (GT/s) | 40 / 10 / PCIe* 3.0 (2.5, 5, 8 GT/s) | 48 / 12 / PCIe* 3.0 (2.5, 5, 8 GT/s)
Memory Population | 4 channels of up to 3 RDIMMs, LRDIMMs, or 3DS LRDIMMs | 6 channels of up to 2 RDIMMs, LRDIMMs, or 3DS LRDIMMs
Max Memory Speed (MT/s) | Up to 2400 | Up to 2666
TDP (W) | 55W-145W | 70W-205W
[Block diagram: cores around a shared L3; 2 or 3 UPI links; 6 channels DDR4; 48 lanes PCIe* 3.0; DMI3; optional Omni-Path HFI]
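The memory figures in the feature comparison can be turned into a theoretical peak bandwidth per socket. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes the Max Memory Speed values are in MT/s and that each DDR4 channel moves 8 bytes per transfer; the function names are illustrative only.

```c
#include <stdio.h>

/* Theoretical peak DRAM bandwidth per socket, in decimal GB/s.
 * Assumption (not stated on the slide): each DDR4 channel is 64 bits
 * wide, so it moves 8 bytes per transfer. */
static double peak_mem_bw_gbs(int channels, int mega_transfers_per_sec)
{
    const double bytes_per_transfer = 8.0;
    return channels * (double)mega_transfers_per_sec * bytes_per_transfer / 1000.0;
}

/* Prints both generations side by side using the figures from the table. */
static void print_mem_bw_comparison(void)
{
    double prev = peak_mem_bw_gbs(4, 2400); /* E5-2600 v4: 4 channels */
    double next = peak_mem_bw_gbs(6, 2666); /* Scalable:   6 channels */
    printf("E5-2600 v4: %.1f GB/s\n", prev);
    printf("Scalable:   %.1f GB/s (%.2fx)\n", next, next / prev);
}
```

Under these assumptions the peak rises from about 76.8 GB/s to about 128 GB/s per socket. Real throughput depends on DIMM population and workload.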
Breakthrough CPU Design: Intel® Mesh Architecture
Designed for next-generation Data Centers
✓ Maximizes performance
✓ Enables consistent, low latencies
✓ Optimized for data sharing and memory access between all CPU cores/threads for ideal memory bandwidth and capacity
✓ Data flows scale efficiently for 2, 4 & 8+ socket configurations
✓ Designed for modern virtualized and hybrid cloud implementations
[Figure: Ring Architecture compared with Mesh Architecture]
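One way to see why a mesh can keep latencies low as core counts grow is to compare average hop counts in a toy model. The sketch below is only an illustration: it treats the ring as a single bidirectional ring and the mesh as a plain rectangular grid, counts hops rather than cycles, and does not model the actual Intel topologies. The function names and the 36-stop example are assumptions chosen for the illustration.

```c
#include <stdlib.h>

/* Average shortest-path hop count between distinct stops on a
 * bidirectional ring with n stops (toy model). */
static double ring_avg_hops(int n)
{
    long total = 0, pairs = 0;
    for (int a = 0; a < n; a++) {
        for (int b = 0; b < n; b++) {
            if (a == b) continue;
            int d = abs(a - b);
            if (n - d < d) d = n - d;   /* take the shorter way around */
            total += d;
            pairs++;
        }
    }
    return pairs ? (double)total / (double)pairs : 0.0;
}

/* Average Manhattan distance between distinct stops on a
 * rows x cols grid (toy model of a 2D mesh). */
static double mesh_avg_hops(int rows, int cols)
{
    long total = 0, pairs = 0;
    int n = rows * cols;
    for (int a = 0; a < n; a++) {
        for (int b = 0; b < n; b++) {
            if (a == b) continue;
            total += abs(a / cols - b / cols) + abs(a % cols - b % cols);
            pairs++;
        }
    }
    return pairs ? (double)total / (double)pairs : 0.0;
}
```

In this model, 36 stops on a ring average a little over 9 hops, while the same 36 stops arranged as a 6 x 6 grid average 4 hops.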
Intel® Mesh Architecture: Distributed Caching Home Agent
• Intel® UPI caching and home agents are now distributed with each LLC bank
• Distributed CHA benefits
– Reduces traffic on mesh by eliminating home agent to LLC interaction
– Reduces latency by launching snoops earlier; obviates need for different snoop modes
[Figure (10-core, 2 UPI example): ten CHA / Core tiles, each with a core, a CHA, and a 1.375 MB LLC slice, connected through mesh stops to two memory controllers (IMC 0, IMC 1) with DIMM0/DIMM1 per channel, UPI0/1, PCIe x16 blocks, and DMI]
Source: Intel
Re-Architected L2 & LLC (Last-Level) Cache Hierarchy
Previous Architectures:
• Each core: L2 (256KB private)
• Shared LLC: 2.5MB/core (inclusive)
Intel® Xeon® Scalable Processor:
• Each core: L2 (1MB private)
• Shared LLC: 1.375MB/core (non-inclusive)
• On-chip cache balance shifted from shared-distributed (prior architectures) to private-local (Skylake-SP):
– Shared-distributed: the shared-distributed LLC is the primary cache
– Private-local: the private L2 becomes the primary cache, with the shared LLC used as an overflow cache
• Shared LLC changed from inclusive to non-inclusive:
– Inclusive (prior architectures): LLC has copies of all lines in L2
– Non-inclusive (Skylake architecture): lines in L2 may not exist in LLC
“Skylake-SP” cache hierarchy architected specifically for the data center use case
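The per-core cache sizes above tie back to the socket-level LLC totals quoted earlier in the deck (55 MB and 38.5 MB). The sketch below works through that arithmetic and also gives a rough upper bound on distinct cached data per core. The upper bound is an assumption-laden simplification: it treats an inclusive LLC as duplicating everything in L2 and a non-inclusive LLC as duplicating nothing, which real hardware does not guarantee. Function names are illustrative.

```c
/* Total shared LLC for a socket, in KB. */
static int total_llc_kb(int cores, int llc_kb_per_core)
{
    return cores * llc_kb_per_core;
}

/* Rough upper bound on distinct cached data per core, in KB.
 * Assumption: an inclusive LLC holds a copy of every L2 line, so L2 adds
 * no unique capacity; a non-inclusive LLC may hold entirely different
 * lines, so L2 and the LLC slice can add up in the best case. */
static int max_unique_kb_per_core(int l2_kb, int llc_kb_per_core, int llc_is_inclusive)
{
    return llc_is_inclusive ? llc_kb_per_core : l2_kb + llc_kb_per_core;
}
```

With 22 cores at 2560 KB each the total is 56320 KB (55 MB), and with 28 cores at 1408 KB each it is 39424 KB (38.5 MB). The best-case per-core figure is 2560 KB for the inclusive design and 2432 KB for the non-inclusive one, with far more of it private to the core.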
Modernize Your Datacenter: Lower Operational Expenses
• Considerations
– Deployment ease
– Server utilization
– Energy costs
– Space
• Up to 4.2x more VMs per server compared to a 4-year-old server (1)
• Up to 65% lower total cost of ownership (2)
– Upgrading from a 4-year-old server to the Intel® Xeon® Scalable processor
– Reduced software and OS licensing fees, acquisition, maintenance and infrastructure costs
Support More Virtual Machines Per Server
• 4.2 : 1 consolidation (1): from a 4-5 year old system (Intel® Xeon® processor E5-2690, Sandy Bridge, launched Q1’12) to the Intel® Xeon® Platinum 8180 processor
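To make the consolidation ratio concrete, the sketch below estimates how many new servers would host the VMs of an existing fleet. It is a best-case illustration only: it applies the "up to 4.2 : 1" figure uniformly, which real workloads may not achieve, and the function name is hypothetical.

```c
/* Servers needed after consolidation, rounded up.
 * ratio_tenths is the consolidation ratio times ten (42 means 4.2 : 1).
 * Assumption: every old server consolidates at the same best-case ratio. */
static int servers_after_refresh(int old_servers, int ratio_tenths)
{
    return (old_servers * 10 + ratio_tenths - 1) / ratio_tenths;
}
```

Under that assumption, 100 older servers would map onto 24 new ones, and 42 older servers onto 10.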
Modernize Your DC w/ Intel® Solid State Drives: New for Intel® Xeon® Scalable Processors
• New line of Intel® 3D NAND SATA SSDs with the same rock-solid reliability, enterprise RAS features, and consistent performance.
• Packed with a deep feature set, Intel® 3D NAND SSDs for data centers are optimized for the data caching needs of cloud storage and software-defined infrastructures.
• Intel® Optane™ SSDs help eliminate data center storage bottlenecks and allow bigger, more affordable data sets. They can accelerate applications and reduce transaction costs for latency-sensitive workloads.
Products shown: Intel® SSD DC P4600 Series (3D NAND), Intel® SSD DC S3700 Series, Intel® Optane™ SSD DC P4800X Series (3D XPoint™ Technology)
The Right Solution with Intel® SSDs
Data Tiers | Solution | Description
Memory Expansion | DRAM + Intel® Optane™ SSD + Intel® Memory Drive Technology SW | Bigger memory for new insights from larger working sets
Caching & Fast Storage | Intel® Optane™ SSD DC P4800X | New caching or fast storage tier for the most latency sensitive applications
Caching & Fast Storage | Intel® SSD DC P4600 Series | Accelerate cache tier and mixed workloads for faster results and increased storage scaling
Mainstream Storage | Intel® SSD DC P4500 Series | High performance, massively scalable storage
Enterprise High Availability Storage Solutions | Dual Port versions of the Intel® Data Center SSD family for PCIe | Hot tier and mixed workloads for Enterprise High Availability Storage Solutions
Secure your DC with Intel® Trusted Execution Technology (Intel® TXT)
Server with TPM/Intel® TXT:
1. System powers on and Intel® TXT verifies BIOS/OS
2. Hypervisor measure matches (MATCH!) or does not match (POSSIBLE EXPLOIT!)
3. On a match, OS and applications are launched, known trusted; on a mismatch, policy action is enforced, known untrusted
• System boot stack gets crypto-hashed before execution
• Hash values get safely stored in a Trusted Platform Module (TPM)
• Match to known-good values determines system trust status
• NEW: One-Touch Activation: OOB TXT/TPM remote discovery, enablement, activation independent of OEM/OS
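The measure-and-compare flow described above can be sketched in a few lines. This is a conceptual illustration only, not how Intel® TXT or a TPM is programmed: the hash is a non-cryptographic stand-in (64-bit FNV-1a) used so the example needs no external library, whereas a real measured launch uses cryptographic hashes recorded in the TPM. All names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum trust_status { KNOWN_TRUSTED, KNOWN_UNTRUSTED };

/* Stand-in measurement: 64-bit FNV-1a over the image bytes.
 * NOT cryptographic; a real platform would use a cryptographic hash. */
static uint64_t toy_measure(const void *image, size_t len)
{
    const unsigned char *p = (const unsigned char *)image;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Measure the image before it runs and compare with the known-good
 * value. A match means launch as trusted; a mismatch means the
 * configured policy action applies. */
static enum trust_status verify_launch(const void *image, size_t len,
                                       uint64_t known_good)
{
    return toy_measure(image, len) == known_good ? KNOWN_TRUSTED
                                                 : KNOWN_UNTRUSTED;
}
```

A caller would record the known-good value once from a trusted image, then pass each boot's image to verify_launch and act on the returned status.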
Simplifying Deployment
For any specific workload or application that I manage:
• What are the best hardware and software components to deploy?
• How do I get everything integrated and working well together?
• How do I optimize and tune settings for best performance?
• How can I go faster?
• What happens when something changes?
Simplify Deployment with Intel® Select Solutions
• Simplified evaluation: Tightly-specified HW and SW components, eliminating guesswork
• Workload optimized: Designed and benchmarked to perform optimally for specific workloads
• Fast & easy to deploy: Pre-defined settings and system-wide tuning, enabling smooth deployment
All Intel® Select Solution configurations and benchmark results are verified by Intel
Intel® Select Solutions for VMware vSAN
Visit www.intel.com/selectsolutions
Delivering Performance Beyond Benchmarks
Intel® Advanced Vector Extensions-512 (AVX-512)
• AVX-512 performance:
– Achieves more work per cycle (doubles width of data registers versus Prior CPU)
– Minimizes latency & overhead (doubles the number of registers)
– 2x FMA processing engines are available on Intel® Xeon® Platinum and Intel® Xeon® Gold
[Figure: one AVX-512 (512-bit) instruction adds eight doubles at once, x7..x0 + y7..y0 = x7 + y7, ..., x0 + y0; SSE is 128-bit and AVX is 256-bit]
Vectorized loop using SSE, AVX, and AVX-512:
double *x, *y, *z;   /* each points to N doubles */
for (int i = 0; i < N; i++) { z[i] = x[i] + y[i]; }
Family | Instruction Set | SP FLOPs / cycle | DP FLOPs / cycle
Skylake | Intel® AVX-512 & FMA | 64 | 32
Haswell / Broadwell | Intel AVX2 & FMA | 32 | 16
Sandy Bridge | Intel AVX (256b) | 16 | 8
Nehalem | SSE4.2 (128b) | 8 | 4
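The FLOPs-per-cycle figures follow from vector width, element size, and the number of execution units. The sketch below reproduces them under stated assumptions: an FMA counts as two floating-point operations per lane, FMA-capable cores issue to two FMA units per cycle, and the pre-FMA cores issue one vector add and one vector multiply per cycle. The function name is illustrative.

```c
/* Peak FLOPs per cycle per core.
 *   lanes          = vector_bits / element_bits
 *   flops_per_lane = 2 for a fused multiply-add, 1 for a plain add or multiply
 *   units          = vector operations issued per cycle
 * Assumption: two units per cycle for every family in the table
 * (two FMA units, or one add unit plus one multiply unit before FMA). */
static int flops_per_cycle(int vector_bits, int element_bits,
                           int flops_per_lane, int units)
{
    int lanes = vector_bits / element_bits;
    return lanes * flops_per_lane * units;
}
```

For example, AVX-512 with FMA on doubles gives 8 lanes x 2 x 2 = 32, and SSE4.2 on doubles gives 2 lanes x 1 x 2 = 4, matching the DP column.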
Accelerates performance for demanding computational tasks
LINPACK Performance (1), GFLOPs: SSE4.2 669, AVX 1178, AVX2 2034, AVX512 3259
https://www.codeproject.com/Articles/1182515/Vectorization-Opportunities-for-Improved-Perform
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. (1) Source as of June 2017: Intel internal measurements on platform with Xeon Platinum 8180, Turbo enabled, UPI=10.4, SNC1, 6x32GB DDR4-2666 per CPU, 1 DPC
VMmark 3.0 Improvement
For two very similar high-performance 2-socket servers
New Intel Xeon Platinum 8180: ~19% VMmark 3.0 improvement over previous Intel Xeon E5-2699A v4
VMmark 3.0 Results CPU Configuration:
• Intel Xeon Platinum 8180, 2-socket, 56 cores / 112 threads, 2.5 GHz
• Intel Xeon E5-2699A v4, 2-socket, 44 cores / 88 threads, 2.4 GHz
www.vmware.com/products/vmmark/results3x.html
Accelerate VMware vSAN with Intel® 3D NAND SSDs
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. System Configuration: 4 Node vSAN* Cluster. Per Node configuration: Supermicro* SuperServer 2028U-TN24R4T+ Dual Intel® Xeon® E5-2687Wv4 (12 Core @ 3.0 Ghz), Supermicro* Server Board, 256 GB DDR4 RAM, Boot Drive, 1x Intel® SSD DC S3710 Series (200 GB, 2.5”), vSAN Intel 3D NAND Cluster: Virtual SAN SSDs - 2 Disk Groups comprised of 2x Intel® SSD DC P4600 Series (1.6TB, 2.5” SFF), 8x Intel® SSD DC P4500 Series (4 TB, 2.5” SFF), vSAN Intel 2D NAND Cluster: Virtual SAN SSDs - 2 Disk Groups comprised of 2x Intel® SSD DC P3700 Series (800GB, 2.5” SFF), 8x Intel® SSD DC P3500 Series (2 TB, 2.5” SFF), Intel® Ethernet Server Adapter X540-DA2
Increased Storage Capacity & Efficiency with Lower Cost
vSAN Node, Intel® 2D NAND SSDs
• Server: 2 x Intel® Xeon® E5
• Cache Tier: 2x Intel® SSD DC P3700 @ 800 GB
• Capacity Tier: 8x Intel® SSD DC P3500 @ 2TB
vSAN Node, Intel® 3D NAND SSDs
• Server: 2 x Intel® Xeon® E5
• Cache Tier: 2x Intel® SSD DC P4600 @ 1.6 TB
• Capacity Tier: 8x Intel® SSD DC P4500 @ 4TB
Transition to Intel® 3D NAND SSDs:
• Up to 2x: Increase Storage Capacity
• Up to 63%: Lower Cost/GB
• Up to 6%: Increase IOPs
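The 2x capacity figure can be checked from the drive counts in the two node configurations. The sketch below computes raw tier capacity for the 4-node cluster named in the system configuration. It assumes decimal units (1 TB = 1000 GB) and ignores vSAN replication and formatting overhead, so usable capacity will be lower. The function name is illustrative.

```c
/* Raw capacity of one storage tier across the cluster, in GB.
 * Assumptions: decimal units, no replication or formatting overhead. */
static long raw_tier_gb(int nodes, int drives_per_node, long gb_per_drive)
{
    return (long)nodes * drives_per_node * gb_per_drive;
}
```

For the capacity tier this gives 64000 GB with 8x 2 TB drives per node and 128000 GB with 8x 4 TB drives per node; the cache tier doubles in the same way, from 6400 GB to 12800 GB.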
Intel S2600WF (Wolf Pass)
Per-socket: 2 Memory Controllers, 6 Memory Channels
12 DIMM slots
4 OCuLink connectors for NVMe U.2 drives
Intel “Purley” Design Recommendation: Support a minimum of four U.2 NVMe SSDs in cabled PCIe topology directly from CPU
Intel “Skylake-SP” Volume Management Device (VMD)
• VMD is a mode of the Skylake-SP integrated PCIe Root Complex. It requires a special VMD driver.
• VMD allows vSAN to support NVMe hotplug and LED management
– VMD intercepts all PCIe hotplug events
– Hotplug looks like SAS LUN add/remove
– LED management uses a script or an esxcli tool
• vSphere ESXi sees only a VMD PCIe “HW RAID” device and loads the VMD driver
• VMD driver for ESXi scans PCI, finds the NVMe controllers, and enumerates the NVMe devices
– Performed by VMD driver built into NVMe driver
• VMD driver exposes each NVMe device’s namespace as a SCSI LUN on the VMD controller
– Similar to the LUNs on a SAS controller
– Only the VMD driver knows how to access the NVMe controller memory and config space
[Figure: NVMe SSDs attach to the Intel VMD HW, which handles PCIe config, errors & events, and I/O; above it, the Intel VMD driver (PCIe domain management) and the Intel NVMe driver expose block I/O to the vSphere storage interface]
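The idea of presenting each NVMe namespace as a LUN, with hotplug appearing as LUN add and remove, can be pictured with a small bookkeeping structure. The sketch below is purely illustrative: the types, the 16-LUN limit, and the function names are hypothetical and do not reflect the actual Intel VMD driver or any ESXi interface.

```c
#define VMD_MAX_LUNS 16   /* arbitrary limit for the illustration */

/* One LUN slot: which NVMe controller and namespace it represents. */
struct vmd_lun {
    int in_use;
    int controller_id;
    int namespace_id;
};

/* Hypothetical view of a VMD controller as a table of LUNs. */
struct vmd_controller {
    struct vmd_lun luns[VMD_MAX_LUNS];
};

/* Hot-add: expose a namespace as the first free LUN.
 * Returns the LUN number, or -1 if the table is full. */
static int vmd_hot_add(struct vmd_controller *vmd,
                       int controller_id, int namespace_id)
{
    for (int lun = 0; lun < VMD_MAX_LUNS; lun++) {
        if (!vmd->luns[lun].in_use) {
            vmd->luns[lun].in_use = 1;
            vmd->luns[lun].controller_id = controller_id;
            vmd->luns[lun].namespace_id = namespace_id;
            return lun;
        }
    }
    return -1;
}

/* Hot-remove: drop every LUN that belongs to the removed controller.
 * Returns how many LUNs were removed. */
static int vmd_hot_remove(struct vmd_controller *vmd, int controller_id)
{
    int removed = 0;
    for (int lun = 0; lun < VMD_MAX_LUNS; lun++) {
        if (vmd->luns[lun].in_use &&
            vmd->luns[lun].controller_id == controller_id) {
            vmd->luns[lun].in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

The point of the sketch is the shape of the abstraction: the layer above sees only LUNs appearing and disappearing, while the details of the NVMe controllers stay behind the VMD layer.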
Intel® QuickAssist Technology (Intel® QAT)
• Intel® QAT is a set of scalable hardware accelerators first introduced in 2012.
• Intel® QAT offers higher performance than software for:
– Symmetric cryptography functions including cipher operations and authentication operations
– Public key functions including RSA, Diffie-Hellman, and elliptic curve cryptography
– Compression and decompression functions including DEFLATE and LZS
• Intel® QAT on “Skylake-SP” offers outstanding capabilities:
– 100 Gb/s crypto, 100 Gb/s compression, 100 kOps/s RSA 2K decrypt.
• Offered in two forms:
– Integrated in a SKU of the C620 Series Platform Chipset (“Lewisburg”), and
– Add-in PCIe card using the same “Lewisburg” chipset.
[Figure: how it is offered: chipset option and PCIe* card option]
www.intel.com/content/www/us/en/embedded/technology/quickassist/overview.html
Accelerate with Intel® QuickAssist Technology
[Chart: Security Benchmarks, software-based OpenSSL compared with Intel® QAT, for RSA 2K Decrypt (kOps/s), IPSec Forwarding (Gbps), and SSL Web Proxy (Gbps); axis 0 to 110]
1. NGINX* and OpenSSL* connections/second. Conducted by Intel Applications Integration Team. Claim is actual performance measurement. Intel® microprocessor. Processor: Intel® Xeon® processor Scalable family with C6xxB0 ES2. Performance tests use cores from a single CPU, Memory configuration:, DDR4–2400. Populated with 1 (16 GB) DIMM per channel, total of 6 DIMMs. Intel® QuickAssist Technology driver: QAT1.7.Upstream.L.0.8.0-37 Fedora* 22 (Kernel 4.2.7) BIOS: PLYDCRB1.86B.0088.D09.16060117363. Cloudera* 5.4.2 with Snappy* Software vs. Intel® QuickAssist Technology hardware solution. Conducted by Intel Applications Integration Team. Claim is actual performance measurement. Intel® Xeon® processor E5-2699 v4 (56 cores enabled) 256 GB DDR4 1.6 TB NVMe SSD 1 Intel® C6xxx-based card (24x). 10 Gbps CentOS* 6.7 w/ 2.6.32 kernel Cloudera* 5.4.2; QAT driver 0.9.1 Snappy* 1.1.2 (popular, fast compression codec); One NameNode Eight DataNodes 10 Gbps network2. 24 Core Intel(r) Xeon Scalable Platform -SP @1.8GHz, Single (UP) Processor configuration. Intel(r) C627 PCH with crypto acceleration capability (in x16 mode) Neon City platform. DDR4 2400MHz RDIMMs 6x16GB(total 96 GB), 6 Channels, 1 x Intel®Corporation Red Rock Canyon 100GbE Ethernet Switch in the x16 PCIe slot on Socket 0. 8 cache ways allocated for DDIO.
Broad VMware Support For The Intel® Xeon® Scalable Processor
• Long History of Collaborating Together
• VMware vSphere 6.5 and 6.0.u3 provided Day 0 (launch) support for servers using the Intel® Xeon® Scalable Processor.
– vSphere 6.5 supports using AVX-512 in a Virtual Machine.
• 56 Server Designs were listed on Day 0 of launch.
– Currently at 103 Server Designs across 15 Server OEMs
• vSAN 6.6 provided support for Ready Node Models using the Intel® Xeon® Scalable Processor running VMware vSphere 6.5.u1 and 6.0.u3
– 30 vSAN Ready Node Models listed today
Summary: Time to Upgrade!
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance/datacenter. Configurations: see slide 35. *Other names and brands may be claimed as the property of others.