White Paper

Integrate Cisco UCS C240 M4 Rack Server with NVIDIA GRID Graphics Processing Unit Card and Citrix XenDesktop 7.6 on VMware vSphere 6

© 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.

Contents

What You Will Learn

Cisco Unified Computing System
  Cisco UCS Manager
  Cisco UCS Mini
  Cisco UCS Fabric Interconnect
  Cisco UCS 6324 Fabric Interconnect
  Cisco UCS C-Series Rack Servers
  Cisco UCS C240 M4 Rack Server
  Cisco UCS VIC 1227 Modular LOM

NVIDIA GRID Cards
  NVIDIA GRID Technology
  NVIDIA GRID GPU
  NVIDIA GRID Accelerated Remoting
  NVIDIA GRID Virtualization
  NVIDIA GRID vGPU

VMware vSphere 6.0

Graphics Acceleration in Citrix XenDesktop and XenApp
  Outstanding User Experience over Any Bandwidth
  GPU Acceleration for Microsoft Windows Desktops
  GPU Acceleration for Microsoft Windows Server
  GPU Sharing for Citrix XenApp RDS Workloads
  HDX 3D Pro Requirements

Solution Configuration
  Configure Cisco UCS
  Configure the GPU Card

Configure vDGA Pass-Through GPU Deployment
  Prepare a Virtual Machine for vDGA Configuration
  Download and Install Virtual Machine GPU Drivers
  Verify That Applications Are Ready to Support the vGPU

Configure vGPU Deployment
  Configure the VMware ESXi Host Server for vGPU Configuration
  Prepare a Virtual Machine for vGPU Configuration
  Verify That Applications Are Ready to Support the vGPU
  Why Use NVIDIA GRID vGPU for Graphics Deployments

Additional Configurations
  Install the HDX 3D Pro Virtual Desktop Agent Using the CLI
  Install and Upgrade NVIDIA Drivers
  Use HDX Monitor
  Optimize the HDX 3D Pro User Experience
  Use GPU Acceleration for Microsoft Windows Server DirectX, Direct3D, and WPF Rendering
  Use GPU Acceleration for Windows Server: Experimental GPU Acceleration for NVIDIA CUDA or OpenCL Applications
  Use OpenGL Software Accelerator

Conclusion

For More Information


What You Will Learn

With the increased processing power of today’s Cisco UCS® B-Series Blade Servers and C-Series Rack Servers,

applications with demanding graphics components are now being considered for virtualization. To enhance the

capability to deliver these high-performance and graphics-intensive applications, Cisco offers support for the

NVIDIA GRID K1 and K2 cards in the Cisco Unified Computing System™

(Cisco UCS) portfolio of PCI Express

(PCIe) cards for the Cisco UCS C-Series Rack Servers.

With the addition of the new graphics processing capabilities, the engineering, design, imaging, and marketing

departments of organizations can now experience the benefits that desktop virtualization brings to these

applications.

This new graphics capability enables organizations to centralize their graphics workloads and data in the data

center. This capability greatly benefits organizations that need to be able to shift work geographically. Until now,

graphics files have been too large to move, and the files have had to be local to the person using them to be

usable.

The PCIe graphics cards in the Cisco UCS C-Series offer these benefits:

● Support for full-length, full-power NVIDIA GRID cards in a 2-rack-unit (2RU) form factor

● Cisco UCS Manager integration for management of the servers and GRID cards

● End-to-end integration with Cisco UCS management solutions, including Cisco UCS Central Software and

Cisco UCS Director

● More efficient use of rack space with a Cisco UCS C240 M4 Rack Server carrying two NVIDIA GRID cards than with the 2-slot, 2.5-rack-unit equivalent: an HP WS460c workstation blade with the GRID card in a second slot

An important element of this document’s design is VMware’s support for the NVIDIA GRID virtual graphics

processing unit (vGPU) in VMware vSphere 6. Prior versions of vSphere supported only virtual direct graphics

acceleration (vDGA) and virtual shared graphics acceleration (vSGA), so support for vGPU in vSphere 6 greatly

expands the range of deployment scenarios using the most versatile and efficient configuration of the GRID cards.

The purpose of this document is to help our partners and customers integrate NVIDIA GRID graphics processing

cards and Cisco UCS C240 M4 servers on VMware vSphere and Citrix XenDesktop in vDGA and vGPU modes.

Please contact our partners, NVIDIA, Citrix, and VMware, for lists of applications that are supported by the card,

hypervisor, and desktop broker in each mode.

Our objective is to provide the reader with specific methods for integrating Cisco UCS C240 M4 servers with GRID

K1 and K2 cards with VMware vSphere and Citrix products so that the servers, hypervisor, and desktop broker are

ready for installation of graphics applications.

Cisco Unified Computing System

Cisco UCS is a next-generation data center platform that unites computing, networking, and storage access. The

platform, optimized for virtual environments, is designed using open industry-standard technologies and aims to

reduce total cost of ownership (TCO) and increase business agility. The system integrates a low-latency, lossless

10 Gigabit Ethernet unified network fabric with enterprise-class, x86-architecture servers. It is an integrated,

scalable, multichassis platform in which all resources participate in a unified management domain.


Figure 1. Cisco UCS Components

The main components of Cisco UCS (Figure 1) are:

● Computing: The system is based on an entirely new class of computing system that incorporates blade servers based on Intel® Xeon® processor E5-2600/4600 v3 and E7-2800 v3 family CPUs.

● Network: The system is integrated onto a low-latency, lossless, 10-Gbps unified network fabric. This

network foundation consolidates LANs, SANs, and high-performance computing (HPC) networks, which are

separate networks today. The unified fabric lowers costs by reducing the number of network adapters,

switches, and cables, and by decreasing the power and cooling requirements.

● Virtualization: The system unleashes the full potential of virtualization by enhancing the scalability,

performance, and operational control of virtual environments. Cisco security, policy enforcement, and

diagnostic features are now extended into virtualized environments to better support changing business and

IT requirements.

● Storage access: The system provides consolidated access to local storage, SAN storage, and network-

attached storage (NAS) over the unified fabric. With storage access unified, Cisco UCS can access storage


over Ethernet, Fibre Channel, Fibre Channel over Ethernet (FCoE), and Small Computer System Interface

over IP (iSCSI). This capability provides customers with choice for storage access and investment

protection. In addition, server administrators can preassign storage-access policies for system connectivity

to storage resources, simplifying storage connectivity and management and helping increase productivity.

● Management: Cisco UCS uniquely integrates all system components, enabling the entire solution to be

managed as a single entity by Cisco UCS Manager. The manager has an intuitive GUI, a command-line

interface (CLI), and a robust API for managing all system configuration processes and operations.

Cisco UCS is designed to deliver:

● Reduced TCO and increased business agility

● Increased IT staff productivity through just-in-time provisioning and mobility support

● A cohesive, integrated system that unifies the technology in the data center; the system is managed,

serviced, and tested as a whole

● Scalability through a design for hundreds of discrete servers and thousands of virtual machines and the

capability to scale I/O bandwidth to match demand

● Industry standards supported by a partner ecosystem of industry leaders

Cisco UCS Manager

Cisco UCS Manager provides unified, embedded management of all software and hardware components of Cisco

UCS through an intuitive GUI, a CLI, and an XML API. The manager provides a unified management domain with

centralized management capabilities and can control multiple chassis and thousands of virtual machines.
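
The XML API mentioned above can be exercised with any HTTP client. The following is a minimal sketch, not part of this reference architecture: the Cisco UCS Manager address and credentials are placeholders. It logs in with aaaLogin, retrieves the rack-server inventory with configResolveClass, and logs out.

# Minimal sketch of the Cisco UCS Manager XML API (placeholder address and credentials).
# aaaLogin, configResolveClass, and aaaLogout are standard UCSM XML API methods.
import xml.etree.ElementTree as ET
import requests

UCSM = "https://192.0.2.10/nuova"   # placeholder UCS Manager address

# Log in and obtain a session cookie.
login = requests.post(UCSM, data='<aaaLogin inName="admin" inPassword="password"/>', verify=False)
cookie = ET.fromstring(login.text).attrib["outCookie"]

# Query the rack-server inventory (class computeRackUnit).
query = ('<configResolveClass cookie="%s" classId="computeRackUnit" inHierarchical="false"/>' % cookie)
resp = requests.post(UCSM, data=query, verify=False)
for unit in ET.fromstring(resp.text).iter("computeRackUnit"):
    print(unit.get("dn"), unit.get("model"), unit.get("serial"))

# Log out to release the session.
requests.post(UCSM, data='<aaaLogout inCookie="%s"/>' % cookie, verify=False)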

Cisco UCS Mini

Cisco UCS Mini was incorporated into this solution to manage the Cisco UCS C240 M4 server and the NVIDIA

GRID cards. In addition, Cisco UCS Mini hosted the virtual infrastructure components such as domain controllers

and desktop broker using Cisco UCS B200 M4 Blade Servers. Cisco UCS Mini is an optional part of this reference

architecture. Another choice for managing the Cisco UCS and NVIDIA equipment is the Cisco UCS 6200 Series

Fabric Interconnects.

Cisco UCS Mini is designed for customers who need fewer servers but still want the robust management

capabilities provided by Cisco UCS Manager. This solution delivers servers, storage, and 10 Gigabit networking in

an easy-to-deploy, compact form factor (Figure 2).


Figure 2. Cisco UCS Mini

Cisco UCS Mini consists of the following components:

● Cisco UCS 5108 Blade Server Chassis: A chassis can accommodate up to eight half-width Cisco UCS

B200 M4 Blade Servers.

● Cisco UCS B200 M4 Blade Server: Delivering performance, versatility, and density without compromise, the

Cisco UCS B200 M4 addresses the broadest set of workloads.

● Cisco UCS 6324 Fabric Interconnect: The Cisco UCS 6324 provides the same unified server and networking capabilities as the top-of-rack Cisco UCS 6200 Series Fabric Interconnects, embedded within the Cisco UCS 5108 Blade Server Chassis.

● Cisco UCS Manager: Cisco UCS Manager provides unified, embedded management of all software and

hardware components in a Cisco UCS Mini solution.

Cisco UCS Fabric Interconnect

Cisco UCS 6300 Series Fabric Interconnects provide the management and communication backbone for the Cisco

UCS B-Series Blade Servers, 5100 Series Blade Server Chassis, and C-Series Rack Servers. In addition, the

firmware for the NVIDIA GRID cards can be managed using both the 6300 and 6200 Series Fabric Interconnects,

an exclusive feature of the Cisco UCS portfolio.

The chassis, blades, and rack-mount servers that are attached to the interconnects are part of a single highly

available management domain. By supporting unified fabric, the fabric interconnects provide the flexibility to

support LAN and storage connectivity for all blades within the domain at configuration time.

Typically deployed in redundant pairs, the 6300 Series Fabric Interconnects deliver uniform access to both

networks and storage to help create a fully virtualized environment.


Cisco UCS 6324 Fabric Interconnect

Cisco UCS 6324 (Figure 3) offers several major features and benefits that reduce TCO, including:

● Bandwidth of up to 500 Gbps

● Ports capable of line-rate, low-latency, lossless 1 and 10 Gigabit Ethernet, FCoE, and 8-, 4-, and 2-Gbps

Fibre Channel

● Centralized management with Cisco UCS Manager software

● Cisco UCS 5108 Blade Server Chassis capabilities for cooling and serviceability

● Quad Enhanced Small Form-Factor Pluggable (QSFP+) port for rack-mount server connectivity

Figure 3. Cisco UCS 6324 Fabric Interconnect

Cisco UCS C-Series Rack Servers

Cisco UCS C-Series Rack Servers keep pace with Intel Xeon processor innovation by offering the latest

processors with an increase in processor frequency and improved security and availability features. With the

increased performance provided by the Intel Xeon processor E5-2600 and E5-2600 v2 product families, C-Series

servers offer an improved price-to-performance ratio. They also extend Cisco UCS innovations to an industry-

standard rack-mount form factor, including a standards-based unified network fabric, Cisco® VN-Link virtualization

support, and Cisco Extended Memory Technology.

Designed to operate both in standalone environments and as part of Cisco UCS, these servers enable

organizations to deploy systems incrementally—using as many or as few servers as needed—on a schedule that

best meets the organization’s timing and budget. C-Series servers offer investment protection through the

capability to deploy them either as standalone servers or as part of Cisco UCS.

One compelling reason that many organizations prefer rack-mount servers is the wide range of I/O options

available in the form of PCIe adapters. C-Series servers support a broad range of I/O options, including interfaces

supported by Cisco as well as adapters from third parties.


Cisco UCS C240 M4 Rack Server

The Cisco UCS C240 M4 (Figures 4 and 5 and Table 1) is designed for both performance and expandability over a

wide range of storage-intensive infrastructure workloads, from big data to collaboration. The enterprise-class C240

M4 server further extends the capabilities of the Cisco UCS portfolio in a 2RU form factor with the addition of the

Intel Xeon processor E5-2600 v3 product family, which delivers an outstanding combination of

performance, flexibility, and efficiency gains.

The C240 M4 offers up to two Intel Xeon processor E5-2600 v3 CPUs, 24 DIMM slots, 24 disk drives,

and four 1 Gigabit Ethernet LAN-on-motherboard (LOM) ports to provide exceptional levels of internal memory and

storage expandability and exceptional performance.

The C240 M4 interfaces with the Cisco UCS virtual interface card (VIC). The VIC is a virtualization-optimized FCoE

PCIe 2.0 x8 10-Gbps adapter designed for use with C-Series Rack Servers. The VIC is a dual-port 10 Gigabit

Ethernet PCIe adapter that can support up to 256 PCIe standards-compliant virtual interfaces, which can be

dynamically configured so that both their interface type (network interface card [NIC] or host bus adapter [HBA])

and identity (MAC address and worldwide name [WWN]) are established using just-in-time provisioning. An

additional five PCIe slots are available for certified third-party PCIe cards. The server is equipped to handle 24 on-

board serial-attached SCSI (SAS) or solid-state disk (SSD) drives along with shared storage solutions offered by

our partners.

The C240 M4 server's disk configuration delivers balanced performance and expandability to best meet individual

workload requirements. With up to 12 large form-factor (LFF) or 24 small form-factor (SFF) internal drives, the

C240 M4 optionally offers 10,000- and 15,000-rpm SAS drives to deliver a high number of I/O operations per

second (IOPS) for transactional workloads such as database management systems. In addition, high-capacity

SATA drives provide an economical, large-capacity solution. Superfast SSD drives are a third option for workloads

that demand extremely fast access to smaller amounts of data. A choice of RAID controller options also helps

increase disk performance and reliability.

The C240 M4 further increases performance and customer choice over many types of storage-intensive

applications such as:

● Collaboration

● Small and medium-sized business (SMB) databases

● Big data infrastructure

● Virtualization and consolidation

● Storage servers

● High-performance appliances

The C240 M4 can be deployed as a standalone server or as part of Cisco UCS, which unifies computing,

networking, management, virtualization, and storage access into a single integrated architecture that enables end-

to-end server visibility, management, and control in both bare-metal and virtualized environments. Within a Cisco

UCS deployment, the C240 M4 takes advantage of Cisco’s standards-based unified computing innovations, which

significantly reduce customers’ TCO and increase business agility.

For more information about the Cisco UCS C240 M4 Rack Server, see:

● http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-c240-m4-rack-server/index.html


● http://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c240-m4-rack-server/datasheet-c78-732455.html

Figure 4. Cisco UCS C240 M4 Rack Server Front View


Figure 5. Cisco UCS C240 M4 Rack Server Rear View

Table 1. Cisco UCS C240 M4 PCIe Slots

PCIe Slot    Length    Lane
1            ¾         x8
2            Full      x16
3            Full      x8
4            ¾         x8
5            Full      x16
6            Full      x8

Cisco UCS VIC 1227 Modular LOM

A Cisco innovation, the Cisco UCS VIC 1227 (Figures 6 and 7) is a dual-port, SFP+, 10 Gigabit Ethernet and

FCoE–capable, PCIe modular LOM (mLOM) adapter. It is designed exclusively for the M4 generation of Cisco

UCS C-Series Rack Servers and for the Cisco UCS C3160 Rack Server, which provides dense storage capacity.

New to Cisco rack servers, the mLOM slot can be used to install a VIC without consuming a PCIe slot, providing

greater I/O expandability. The VIC 1227 incorporates next-generation converged network adapter (CNA)

technology from Cisco, providing investment protection for future feature releases. The card enables a policy-

based, stateless, agile server infrastructure that can present up to 256 PCIe standards-compliant interfaces to the

host, which can be dynamically configured as either NICs or HBAs. In addition, the VIC 1227 supports Cisco Data

Center Virtual Machine Fabric Extender (VM-FEX) technology, which extends the Cisco UCS fabric interconnect

ports to virtual machines, simplifying server virtualization deployment.


For more information about the VIC, see:

● http://www.cisco.com/c/en/us/products/interfaces-modules/ucs-virtual-interface-card-1227/index.html

● http://www.cisco.com/c/en/us/products/collateral/interfaces-modules/unified-computing-system-adapters/datasheet-c78-732515.html

Figure 6. Cisco UCS VIC 1227 CNA

Figure 7. Cisco UCS VIC 1227 CNA Architecture


NVIDIA GRID Cards

For desktop virtualization applications, the GRID K1 and K2 cards are an optimal choice for high graphics

performance (Table 2).

Table 2. Technical Specifications for NVIDIA GRID Cards

                                                    NVIDIA GRID K1               NVIDIA GRID K2
GPU                                                 4 NVIDIA Kepler GK107        2 high-end NVIDIA Kepler GK104
Compute Unified Device Architecture (CUDA) Cores   768 (192 per GPU)            3072 (1536 per GPU)
Memory Size                                         16-GB DDR3 (4 GB per GPU)    8-GB GDDR5
Maximum Power                                       130 watts (W)                225W
Auxiliary Power Requirement                         6-pin connector              8-pin connector
PCIe                                                x16                          x16
OpenGL                                              Release 4.0                  Release 4.0
Microsoft DirectX                                   Release 11                   Release 11
vGPU Support                                        Yes                          Yes
Number of Users                                     4 to 100                     2 to 64

NVIDIA GRID Technology

The NVIDIA GRID virtualization solution is built on more than 20 years of software and hardware innovations in the

accelerated graphics field to deliver a rich graphics experience to users running virtual desktops and applications.

For more information about NVIDIA GRID technology, see http://www.nvidia.com/object/grid-technology.html.

NVIDIA GRID GPU

NVIDIA Kepler-based GRID K1 and K2 boards are specifically designed to enable rich graphics in virtualized

environments. They offer these main features:

● High user density: GRID boards have an optimized multiple-GPU design that helps increase user density.

The GRID K1 board has four GPUs and 16 GB of graphics memory. In combination with NVIDIA GRID

vGPU technology, the GRID K1 supports up to 32 users on a single board.

● Power efficiency: GRID boards are designed to provide data center–class power efficiency, including the

revolutionary new streaming multiprocessor, SMX. The result is an innovative, proven solution that delivers

revolutionary performance per watt for the enterprise data center.

● Reliability 24 hours a day, 7 days a week: GRID boards are designed, built, and tested by NVIDIA for

operation all day, every day. Working closely with Cisco helps ensure that GRID cards perform optimally

and reliably for the life of the system.

For more information about GRID boards, see http://www.nvidia.com/object/grid-technology.html.


NVIDIA GRID Accelerated Remoting

NVIDIA's patented low-latency remote display technology greatly improves the user experience by reducing the lag

that users feel when interacting with a virtual machine. With this technology, the virtual desktop screen is encoded

and pushed directly to the remoting protocol.

Built into every Kepler GPU is a high-performance H.264 encoding engine capable of encoding simultaneous

streams with superior quality. This feature greatly enhances cloud server efficiency by offloading encoding

functions from the CPU and allowing the encoding function to scale with the number of GPUs in a server.

The GRID Accelerated Remoting technology is available in industry-leading remote display protocols.

NVIDIA GRID Virtualization

GRID cards enable GPU-capable virtualization solutions from Microsoft and VMware, delivering the flexibility to

choose from a wide range of proven solutions. GRID GPUs can be dedicated to a single high-end user or shared

among multiple users.

NVIDIA GRID vGPU

GRID boards use NVIDIA Kepler-based GPUs that, for the first time, allow hardware virtualization of the GPU.

Thus, multiple users can share a single GPU, improving user density while providing true PC performance and

application compatibility.

For more information about the GRID vGPU, see http://www.nvidia.com/object/virtual-gpus.html.

VMware vSphere 6.0

VMware provides virtualization software. VMware’s enterprise software hypervisors for servers (VMware vSphere ESX, vSphere ESXi, and vSphere) are bare-metal hypervisors that run directly on server hardware without

requiring an additional underlying operating system. VMware vCenter Server for vSphere provides central

management and complete control and visibility into clusters, hosts, virtual machines, storage, networking, and

other critical elements of your virtual infrastructure.

vSphere 6.0 introduces many enhancements to vSphere Hypervisor, VMware virtual machines, vCenter Server,

virtual storage, and virtual networking, further extending the core capabilities of the vSphere platform.
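
As an illustration of this central management model, the short sketch below uses the open-source pyVmomi bindings to connect to a vCenter Server and list the ESXi hosts and virtual machines it manages. The vCenter address, credentials, and the relaxed certificate checking are placeholders for a lab setup, not part of this reference architecture.

# Sketch: enumerate hosts and virtual machines through vCenter with pyVmomi.
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab only: skip certificate checks
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Walk every host and virtual machine visible to this vCenter instance.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem, vim.VirtualMachine], True)
for obj in view.view:
    print(type(obj).__name__, obj.name)

Disconnect(si)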

The vSphere 6.0 platform includes these new features:

● Computing

◦ Increased scalability: vSphere 6.0 supports larger maximum configuration sizes. Virtual machines support

up to 128 virtual CPUs (vCPUs) and 4 TB of virtual RAM (vRAM). Hosts support up to 480 CPUs and 12 TB

of RAM, 1024 virtual machines per host, and 64 nodes per cluster.

◦ Expanded support: Get expanded support for the latest x86 chip sets, devices, drivers, and guest

operating systems. For a complete list of guest operating systems supported, see the VMware Compatibility

Guide.

◦ Outstanding graphics: The GRID vGPU delivers the full benefits of NVIDIA hardware-accelerated

graphics to virtualized solutions.

◦ Instant cloning: Technology built in to vSphere 6.0 lays the foundation for rapid cloning and deployment

of virtual machines—up to 10 times faster than what is possible today.


● Storage

◦ Transformation of virtual machine storage: vSphere Virtual Volumes enables your external storage arrays

to become virtual machine aware. Storage policy–based management (SPBM) enables common

management across storage tiers and dynamic storage class-of-service (CoS) automation. Together these

features enable exact combinations of data services (such as clones and snapshots) to be instantiated

more efficiently on a per–virtual machine basis.

● Network

◦ Network I/O control: New support for per–virtual machine VMware Distributed Switch (DVS) bandwidth

reservation helps ensure isolation and enforce limits on bandwidth.

◦ Multicast snooping: Support for Internet Group Management Protocol (IGMP) snooping for IPv4 packets

and Multicast Listener Discovery (MLD) snooping for IPv6 packets in VDS improves performance and

scalability with multicast traffic.

◦ Multiple TCP/IP stacks for VMware vMotion: Implement a dedicated networking stack for vMotion traffic,

simplifying IP address management with a dedicated default gateway for vMotion traffic.

● Availability

◦ vMotion enhancements: Perform nondisruptive live migration of workloads across virtual switches and

vCenter Servers and over distances with a round-trip time (RTT) of up to 100 milliseconds (ms). This

support for dramatically longer RTT—a 10x increase in the supported time—for long-distance vMotion now

enables data centers physically located in New York and London to migrate live workloads between one

another.

◦ Replication-assisted vMotion: Customers with active-active replication set up between two sites can

perform more efficient vMotion migration, resulting in huge savings in time and resources, with up to 95

percent more efficient migration depending on the amount of data moved.

◦ Fault tolerance (up to 4 vCPUs): Get expanded support for software-based fault tolerance for workloads

with up to four vCPUs.

● Management

◦ Content library: This centralized repository provides simple and effective management for content,

including virtual machine templates, ISO images, and scripts. With vSphere Content Library, you can now

store and manage content from a central location and share content through a publish-and-subscribe

model.

◦ Cloning and migration across vCenter: Copy and move virtual machines between hosts on different

vCenter Servers in a single action.

◦ Enhanced user interface: vSphere Web Client is more responsive, more intuitive, and simpler than ever

before.


Graphics Acceleration in Citrix XenDesktop and XenApp

Citrix HDX 3D Pro enables you to deliver the desktops and applications that perform best with a GPU for hardware

acceleration, including 3D professional graphics applications based on OpenGL and DirectX. (The standard virtual

delivery agent [VDA] supports GPU acceleration of DirectX only.)

Examples of 3D professional applications include:

● Computer-aided design (CAD), manufacturing (CAM), and engineering (CAE) applications

● Geographical information system (GIS) software

● Picture archiving and communication system (PACS) for medical imaging

● Applications using the latest OpenGL, DirectX, NVIDIA CUDA, and OpenCL versions

● Computationally intensive nongraphical applications that use CUDA GPUs for parallel computing

Outstanding User Experience over Any Bandwidth

● On WAN connections: Deliver an interactive user experience over WAN connections with bandwidth as

low as 1.5 Mbps.

● On LAN connections: Deliver a user experience equivalent to that of a local desktop on LAN connections

with bandwidth of 100 Mbps.

You can replace complex and expensive workstations with simpler user devices by moving graphics processing

into the data center for centralized management.

HDX 3D Pro provides GPU acceleration for Microsoft Windows desktops and Microsoft Windows Server. When

used with VMware vSphere 6 and NVIDIA GRID GPUs, HDX 3D Pro provides vGPU acceleration for Windows

desktops. For the supported hypervisor versions, see Citrix Virtual GPU Solution.

GPU Acceleration for Microsoft Windows Desktops

With HDX 3D Pro, you can deliver graphically intensive applications as part of hosted desktops or applications on

Windows PCs. HDX 3D Pro supports physical host computers (including desktop, blade, and rack workstations)

and virtual machines with GPU pass-through and virtual machines with vGPU technology.

Using the supported hypervisor’s capability to perform GPU pass-through, you can create virtual machines with

exclusive access to dedicated graphics processing hardware. You can install multiple GPUs on the hypervisor and

assign virtual machines to each of these GPUs on a one-to-one basis.

Using vGPU technology, multiple virtual machines can directly access the graphics processing power of a single

physical GPU. The true hardware GPU sharing provides full Windows client and server desktops suitable for users

with complex and demanding design requirements. Supported for NVIDIA GRID K1 and K2 cards, the GPU sharing

uses the same NVIDIA graphics drivers that are deployed on nonvirtualized operating systems.

HDX 3D Pro offers the following features:

● Adaptive H.264-based deep compression for optimal WAN and wireless performance: HDX 3D Pro

uses CPU-based deep compression as the default compression technique for encoding. This approach

provides optimal compression that dynamically adapts to network conditions. The H.264-based deep

compression codec no longer competes with graphics rendering for CUDA cores on the NVIDIA GPU. The

deep-compression codec runs on the CPU and provides bandwidth efficiency.


● Lossless compression option for specialized use cases: HDX 3D Pro also offers a CPU-based lossless

codec to support applications that require pixel-perfect graphics, such as medical imaging. Lossless

compression is recommended only for specialized use cases because it consumes significantly more

network and processing resources.

◦ When you use lossless compression:

− The lossless indicator, a system tray icon, notifies the user if the screen displayed is a lossy frame or a

lossless frame. This notification is helpful when the Visual Quality policy setting specifies Build to

Lossless. The lossless indicator turns green when the frames sent are lossless.

− The lossless switch enables the user to change to Always Lossless mode at any time in the session.

To select or deselect Always Lossless in a session, right-click the icon or use the shortcut Alt+Shift+1.

◦ For lossless compression, HDX 3D Pro uses the lossless codec for compression regardless of the codec

selected through policy.

◦ For lossy compression, HDX 3D Pro uses the original codec, either the default or the one selected

through policy.

◦ Lossless switch settings are not retained for subsequent sessions. To use the lossless codec for every

connection, select Always Lossless for the Visual Quality policy setting.

● Capability, in XenDesktop 7.6 Feature Pack 3, to override the default shortcut, Alt+Shift+1, to select

or deselect Lossless within a session: Configure a new registry setting at

HKLM\SOFTWARE\Citrix\HDX3D\LLIndicator.

◦ Name: HKLM_HotKey

◦ Type: String

◦ The format to configure a shortcut combination is C=0|1, A=0|1, S=0|1, W=0|1, K=val. Keys must be

comma "," separated. The order of the keys does not matter.

◦ A, C, S, W, and K are keys, where C = Control, A = Alt, S = Shift, W = Window, and K = a valid key.

Allowed values for K are 0–9, a–z, and any virtual-key code. For more information about virtual-key codes,

see Virtual-Key Codes on the Microsoft Developer Network (MSDN).

◦ Here are some examples of virtual-key codes:

− For F10, set K=0x79

− For Ctrl+F10, set C=1, K=0x79

− For Alt+A, set A=1, K=a or A=1, K=A or K=A, A=1

− For Ctrl+Alt+F5, set C=1, A=1, K=0x74 or A=1, K=0x74, C=1

− For Ctrl+Shift+F5, set C=1, S=1, K=0x74

Caution: Editing the registry incorrectly can cause serious problems that may require you to reinstall your

operating system. Citrix cannot guarantee that problems resulting from the incorrect use of Registry Editor can be

solved. Use Registry Editor at your own risk. Be sure to back up the registry before you edit it.
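
As a concrete illustration of the format described above, the sketch below sets the shortcut to Ctrl+F10 by writing the HKLM_HotKey value with Python's standard winreg module. It is an example only, subject to the caution above, and assumes it is run with administrative rights on the VDA machine.

# Sketch: set the HDX 3D Pro lossless-indicator shortcut to Ctrl+F10 (C=1, K=0x79).
# Run elevated on the VDA machine; back up the registry first.
import winreg

key_path = r"SOFTWARE\Citrix\HDX3D\LLIndicator"
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0, winreg.KEY_SET_VALUE) as key:
    # String (REG_SZ) value named HKLM_HotKey, using the C/A/S/W/K format described above.
    winreg.SetValueEx(key, "HKLM_HotKey", 0, winreg.REG_SZ, "C=1, K=0x79")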

● Multiple and high-resolution monitor support: For Windows 7 and 8 desktops, HDX 3D Pro supports

user devices with up to four monitors. Users can arrange their monitors in any configuration and can mix

monitors with different resolutions and orientations. The number of monitors is limited by the capabilities of

the host computer GPU, the user device, and the available bandwidth. HDX 3D Pro supports all monitor

resolutions and is limited only by the capabilities of the GPU on the host computer.


● Dynamic resolution: You can resize the virtual desktop or application window to any resolution.

● Support for NVIDIA Kepler architecture: HDX 3D Pro supports NVIDIA GRID K1 and K2 cards for GPU

pass-through and GPU sharing. The GRID vGPU enables multiple virtual machines to have simultaneous,

direct access to a single physical GPU, using the same NVIDIA graphics drivers that are deployed on

nonvirtualized operating systems.

● Support for VMware vSphere and ESX using vDGA: You can use HDX 3D Pro with vDGA for both

remote desktop service (RDS) and virtual desktop infrastructure (VDI) workloads. When you use HDX 3D

Pro with vSGA, support is limited to one monitor. Use of vSGA with large 3D models can result in

performance problems because of its use of API-intercept technology. For more information, see VMware

vSphere 5.1: Citrix Known Issues.

As shown in Figure 8:

● The host computer must reside in the same Microsoft Active Directory domain as the delivery controller.

● When a user logs on to Citrix Receiver and accesses the virtual application or desktop, the controller

authenticates the user and contacts the VDA for HDX 3D Pro to broker a connection to the computer

hosting the graphical application.

● The VDA for HDX 3D Pro uses the appropriate hardware on the host to compress views of the complete

desktop or of just the graphical application.

● The desktop or application views and the user interactions with them are transmitted between the host

computer and the user device through a direct HDX connection between Citrix Receiver and the VDA for

HDX 3D Pro.

Figure 8. Citrix HDX 3D Pro Process Flow


GPU Acceleration for Microsoft Windows Server

HDX 3D Pro allows graphics-intensive applications running in Windows Server sessions to render on the server's

GPU. By moving OpenGL, DirectX, Direct3D, and Windows Presentation Foundation (WPF) rendering to the

server's GPU, the server's CPU is not slowed by graphics rendering. Additionally, the server can process more

graphics because the workload is split between the CPU and the GPU.

GPU Sharing for Citrix XenApp RDS Workloads

RDS GPU sharing enables GPU hardware rendering of OpenGL and DirectX applications in remote desktop

sessions.

● Sharing can be used on bare-metal devices or virtual machines to increase application scalability and

performance.

● Sharing enables multiple concurrent sessions to share GPU resources (most users do not require the

rendering performance of a dedicated GPU).

● Sharing requires no special settings.

For DirectX applications, only one GPU is used by default. That GPU is shared by multiple users. The allocation of

sessions across multiple GPUs with DirectX is experimental and requires registry changes. Contact Citrix Support

for more information.

You can install multiple GPUs on a hypervisor and assign virtual machines to each of these GPUs on a one-to-one

basis: either install a graphics card with more than one GPU, or install multiple graphics cards with one or more

GPUs each. Mixing heterogeneous graphics cards on a server is not recommended.

Virtual machines require direct pass-through access to a GPU, which is available with VMware vSphere 6. When

HDX 3D Pro is used with GPU pass-through, each GPU in the server supports one multiuser virtual machine.

Scalability using RDS GPU sharing depends on several factors:

● The applications being run

● The amount of video RAM that the applications consume

● The graphics card's processing power

Some applications handle video RAM shortages better than others. If the hardware becomes extremely

overloaded, the system may become unstable, or the graphics card driver may fail. Limit the number of concurrent

users to avoid such problems.

To confirm that GPU acceleration is occurring, use a third-party tool such as GPU-Z. GPU-Z is available at

http://www.techpowerup.com/gpuz/.
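
On systems with the NVIDIA driver installed, the nvidia-smi utility offers a scriptable alternative for the same check. The sketch below is an illustration only and assumes nvidia-smi is on the PATH of the session or server being inspected.

# Sketch: sample GPU utilization and memory use through nvidia-smi
# (assumes the NVIDIA driver, and therefore nvidia-smi, is installed and on the PATH).
import subprocess

out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=name,utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader"],
    text=True)
for line in out.strip().splitlines():
    print(line)   # for example: "GRID K2, 34 %, 812 MiB, 4095 MiB"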

HDX 3D Pro Requirements

Servers and user devices must meet the minimum hardware requirements for the installed operating system.

The VDA for HDX 3D Pro is supported for installation on the following versions of Windows:

● Windows 8 64-bit and 32-bit editions

● Windows 7 64-bit editions with Service Pack 1

● Windows 7 32-bit editions with Service Pack 1


● Windows XP Professional 64-bit edition with Service Pack 2

● Windows XP Professional 32-bit edition with Service Pack 3

The VDA for HDX 3D Pro can be used with XenDesktop 7 or 5.6.

HDX 3D Pro can be used to deliver any application that is compatible with the supported host operating systems,

but is particularly suitable for use with DirectX- and OpenGL-based applications and with multimedia applications

such as video.

The computer hosting the application can be either a physical machine or a virtual machine with GPU pass-

through. The GPU pass-through feature is available with Citrix XenServer 6.0.2 and 6.0.1 and VMware vSphere 6.

Citrix recommends that the host computer include at least 4 GB of RAM and four vCPUs with a clock speed of 2.3

GHz or higher.

The GPU should meet these specifications:

● For CPU-based compression, including lossless compression, HDX 3D Pro supports any display adapter on

the host computer that is compatible with the application that you are delivering.

● For optimized GPU frame buffer access using the NVIDIA GRID API, HDX 3D Pro requires NVIDIA Quadro

cards with the latest NVIDIA drivers. The GRID API delivers a high frame rate, resulting in a highly

interactive user experience.

To access desktops or applications delivered with HDX 3D Pro, users must install Citrix Receiver. For more

information about Receiver system requirements, see Receiver and Plug-ins.

In addition, the user device should meet these specifications:

● HDX 3D Pro supports all monitor resolutions that are supported by the GPU on the host computer.

However, for optimum performance with the minimum recommended user device and GPU specifications,

Citrix recommends maximum monitor resolutions for users' devices of 1920 x 1200 pixels for LAN

connections and 1280 x 1024 pixels for WAN connections.

● Citrix recommends that user devices include at least 1 GB of RAM and a CPU with a clock speed of 1.6

GHz or higher. Use of the default deep-compression codec, which is required on low-bandwidth

connections, requires a more powerful CPU. For optimum performance, Citrix recommends that users'

devices be equipped with at least 2 GB of RAM and a dual-core CPU with a clock speed of 3 GHz or higher.

● For multiple-monitor access, Citrix recommends user devices equipped with quad-core CPUs.

Note: User devices do not need a dedicated GPU to access desktops or applications delivered with HDX 3D

Pro.


Solution Configuration

Figure 9 provides an overview of the solution configuration.

Figure 9. Reference Architecture

The hardware components of the solution are:

● Cisco UCS C240 M4 Rack Server (two Intel Xeon processor E5-2660 v3 CPUs at 2.60 GHz) with 256 GB of memory (16 GB x 16 DIMMs at 2133 MHz), used as the hypervisor host

● Cisco UCS VIC1227 mLOM

● Two Cisco Nexus® 9372 Switches (access switches)

● Two Cisco UCS 6324 Fabric Interconnects through Cisco UCS Mini

● Twelve 600-GB SAS disks at 10,000 rpm

● NVIDIA GRID K1 and K2 cards

The software components of the solution are:

● Cisco UCS firmware 3.0(2d)

● VMware ESXi 6.0 for VDI hosts

● Citrix XenApp and XenDesktop 7.6 with Feature Pack 3

● Microsoft Windows 7 SP1 64-bit

Configure Cisco UCS

This section describes the Cisco UCS configuration.


Install NVIDIA GRID or Tesla GPU Card on Cisco UCS C240 M4

Install the GPU card on the Cisco UCS C240 M4 server. Table 3 lists the minimum firmware required for the GPU

cards.

Table 3. Minimum Server Firmware Versions Required for GPU Cards

GPU Card            Minimum Cisco Integrated Management Controller (IMC) and BIOS Version
NVIDIA GRID K1      2.0(3a)
NVIDIA GRID K2      2.0(3a)
NVIDIA Tesla K10    2.0(3e)
NVIDIA Tesla K20    2.0(3e)
NVIDIA Tesla K20X   2.0(3e)
NVIDIA Tesla K40    2.0(3a)

For more information, see http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C240M4/install/C240M4/replace.html#pgfId-1372776.

Note the following NVIDIA GPU card configuration rules:

● You can mix GRID K1 and K2 GPU cards in the same server.

● Do not mix GRID GPU cards with Tesla GPU cards in the same server.

● Do not mix different models of Tesla GPU cards in the same server.

● All GPU cards require two CPUs and at least two 1400W power supplies in the server.

For more information, see:

● http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C240M4/install/C240M4/replace.html#pgfId-1372848

● http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C240M4/install/C240M4.pdf

The rules for configuring the server with GPUs differ, depending on the server version and other factors. Table 4

lists rules for populating the C240 M4 with NVIDIA GPUs. Figure 10 shows a one-GPU installation, and Figure 11

shows a two-GPU installation.

Table 4. NVIDIA GPU Population Rules for Cisco UCS C240 M4 Rack Server

Single GPU: Riser 1A, slot 2, or riser 2, slot 5
Dual GPU:   Riser 1A, slot 2, and riser 2, slot 5

Note: When you install a GPU card in slot 2, Network Communications Services Interface (NCSI) support in

riser 1 automatically moves to slot 1. When you install a GPU card in slot 5, NCSI support in riser 2 automatically

moves to slot 4. Therefore, you can install a GPU card and a Cisco UCS VIC in the same riser.


Figure 10. One-GPU Scenario

Figure 11. Two-GPU Scenario

For information about the physical configuration of GRID cards in riser slots 2 and 5 and for GPU card PCIe slot and support information, see http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/C240M4/install/C240M4/replace.html#75263.
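
The population and mixing rules above lend themselves to a simple preflight check. The following is a purely illustrative sketch (not a Cisco tool) that encodes those rules for a proposed C240 M4 GPU configuration.

# Illustrative check of the C240 M4 GPU population rules described above.
GRID_CARDS = {"GRID K1", "GRID K2"}
TESLA_CARDS = {"Tesla K10", "Tesla K20", "Tesla K20X", "Tesla K40"}
ALLOWED_SLOTS = {2, 5}          # riser 1A slot 2 and riser 2 slot 5

def check_gpu_population(cards_by_slot, cpu_count, psu_1400w_count):
    """cards_by_slot maps PCIe slot number -> card model string."""
    errors = []
    if not set(cards_by_slot) <= ALLOWED_SLOTS:
        errors.append("GPUs are supported only in slots 2 and 5")
    models = set(cards_by_slot.values())
    if models & GRID_CARDS and models & TESLA_CARDS:
        errors.append("Do not mix GRID and Tesla cards in the same server")
    if len(models & TESLA_CARDS) > 1:
        errors.append("Do not mix different Tesla models in the same server")
    if cards_by_slot and (cpu_count < 2 or psu_1400w_count < 2):
        errors.append("GPU cards require two CPUs and at least two 1400W power supplies")
    return errors

# Example: two GRID K2 cards in slots 2 and 5, two CPUs, two 1400W supplies.
print(check_gpu_population({2: "GRID K2", 5: "GRID K2"}, cpu_count=2, psu_1400w_count=2))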

Specify the Base Cisco UCS Configuration

To configure physical connectivity and implement best practices for Cisco UCS C-Series server integration with

Cisco UCS Manager, see

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/ucs_2_2_rn.html.


Configure the GPU Card

1. After the NVIDIA GPU cards are physically installed and the C240 M4 Rack Server is discovered in Cisco UCS

Manager, select the server and choose Inventory > GPUs.

As shown in Figure 12, PCIe slots 2 and 5 are used with two GRID K2 cards running firmware Version

80.04.D4.00.09 | 2055.0552.01.08.

Figure 12. NVIDIA GRID Cards Inventory Displayed in Cisco UCS Manager

You also can use Cisco UCS Manager to perform firmware upgrades to the NVIDIA GPU cards.

2. Create host firmware policy by selecting the Servers tab in Cisco UCS Manager. Choose Policies > Host

Firmware Packages. Right-click and select Create Host Firmware Package (Figure 13).


Figure 13. Cisco UCS Manager Firmware Package

3. Select the Simple configuration of the host firmware package, and for Rack Package, choose 3.0(2d)C (Figure

14).

Figure 14. Cisco UCS Manager Firmware Package Configuration

4. Click OK to display the list of firmware packages (Figure 15).

Figure 15. Cisco UCS Manager Firmware Package Selection for NVIDIA Components


5. Apply the host firmware package through the firmware policy of the service profile template or service profiles. After the firmware upgrades are completed, the running firmware version for the GPUs is displayed (Figure 16).

Figure 16. Running Firmware Version for the GPUs

Note: Virtual machine hardware Version 9 or later is required for vGPU and vDGA configuration. Virtual

machines with hardware Version 9 or later should have their settings managed through the vSphere Web Client.


Configure vDGA Pass-Through GPU Deployment

This section outlines the installation process for configuring an ESXi host and virtual machine for vDGA support. Figure 17 shows the GRID GPU components used for pass-through support.

Figure 17. NVIDIA GRID GPU Pass-through Components

1. With the GRID cards installed in the C240 M4 Rack Server, use the vSphere Web Client to select the ESXi host server, choose the Manage tab, select Settings, and choose Hardware > PCI Devices > Edit (Figure 18).

Figure 18. Enabling GRID Card Pass-Through Mode on ESXi Server


2. A dialog box will appear showing all PCI devices along with the GRID cards. Select the GRID card types

installed on the server from among the available adapters (Figure 19).

Figure 19. NVIDIA Pass-Through Devices Displayed on the ESXi Host

3. After making changes for pass-through configuration, reboot the ESXi host.

4. Verify that the GPU devices to be used for pass-through configuration are marked as available (Figure 20). If a

device isn’t marked as available, refer to the VMware documentation to perform troubleshooting.

Figure 20. Displaying All Selected NVIDIA GRID GPU Card Adapters

Note: vDGA does not support live vSphere vMotion. vDGA bypasses the virtualization layer and uses vSphere DirectPath I/O to give the virtual machine direct access to the GPU card. By enabling direct pass-through from the virtual machine to the PCI device installed on the host, you effectively lock the virtual machine to that specific host. If you need to move a vDGA-enabled virtual machine to a different host, power off the virtual machine, migrate it to another host that has a GPU card installed, and re-enable pass-through to the specific PCI device on that host. Only then should you power on the virtual machine.

Prepare a Virtual Machine for vDGA Configuration

Use the following procedure to create the virtual machine that will be used as the VDI base image later.

1. Using vSphere Web Client, create a new virtual machine. To do this, right-click a host or cluster and choose

New Virtual Machine. Work through the New Virtual Machine wizard. Unless another configuration is specified,

select the configuration settings appropriate for your environment (Figure 21).

Figure 21. Creating a New Virtual Machine in VMware vSphere Web Client

2. Choose “ESXi 6.0 and later” from the “Compatible with” drop-down menu. This selection enables you to use

the latest features, including the mapping of shared PCI devices, which is required for the vGPU (Figure 22).

Figure 22. Selecting Virtual Machine Hardware Version 11


Note: If you are using existing virtual machines for 3D enablement, be sure that the virtual machine is

configured as Version 9 or later for compatibility. Virtual machine Version 11 is recommended.

3. Reserve all guest memory. In the virtual machine Edit Settings options, select the Resources tab. Select

“Reserve all guest memory (All locked)” as shown in Figure 23.

Figure 23. Reserving All Guest Memory

4. If the virtual machine has more than 2 GB of configured memory, adjust pciHole.start. Add the following

parameter to the .vmx file of the virtual machine (you can add this parameter at the end of the file):

pciHole.start = "2048"

Note: This step is required only if the virtual machine has more than 2 GB of configured memory.
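For example, the relevant lines of the .vmx file for a virtual machine configured with 8 GB of RAM might look like the following (a minimal sketch; the memsize line is shown only for context, and the file should be edited only while the virtual machine is powered off):

memsize = "8192"
pciHole.start = "2048"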

5. Add a PCI device:

a. In the virtual machine Edit Settings, choose New device > PCI Device. Click Add to add the new PCI

device (Figure 24, steps 1 and 2).

b. Select the PCI device and choose NVIDIA GPU from the drop-down list (Figure 24, step 3).

c. Click OK to complete the process (Figure 24, step 4).

Note: Only one virtual machine can be powered on at a time if the same PCI device is added to multiple virtual machines.


Figure 24. Adding a PCI Device to the Virtual Machine to Attach GPU Controller Pass-Through Mode

6. Install and configure Windows on the virtual machine:

a. Configure the virtual machine with four vCPUs and 8 GB of RAM (as an example configuration).

b. Install VMware Tools.

c. Join the virtual machine to the Active Directory domain.

d. Choose “Allow remote connections to this computer” in the Windows System Properties menu.


Download and Install Virtual Machine GPU Drivers

1. Download the virtual machine drivers from the NVIDIA website:

http://www.nvidia.com/Download/index.aspx?lang=en-us (Figure 25).

Note: Select 32-bit or 64-bit graphics drivers based on the guest OS type.

Figure 25. Download NVIDIA Drivers

2. Install the drivers.

a. Accept the terms of the license agreement (Figure 26); then click Next.

Figure 26. NVIDIA Graphics Drivers Installation System Check


b. Select Custom (Advanced) installation; then click Next (Figure 27).

Figure 27. NVIDIA Graphics Driver Installation Options

c. Select the check box for each option and select “Perform a clean installation.” Then click Next (Figure 28).

Figure 28. Select All Components Available and Perform a Clean Installation


The installation begins, and a progress bar shows the status (Figure 29).

Figure 29. Installation Progress

d. Click Restart Now to reboot the virtual machine.

Note: After you restart the virtual machine, the mouse cursor may not track properly using the Virtual Network

Computing (VNC) or vSphere console. If so, use Remote Desktop.

3. Install Citrix XenDesktop HDX 3D Pro Virtual Desktop Agent (Figure 30). Reboot when prompted to do so.

Figure 30. Citrix XenDesktop HDX 3D Pro Virtual Desktop Agent Installation


Verify That Applications Are Ready to Support the vGPU

Verify that the virtual machine is using the NVIDIA GPU and driver.

1. Verify that the correct display adapters were installed using Windows Device Manager (Figure 31).

Figure 31. Verifying the Adapters Installed

2. Connect through Citrix’s HDX protocol to the virtual desktop machine and verify that the GPU is active by

viewing the displayed information in the DirectX Diagnostic Tool:

a. Click the Start menu from the virtual machine to which the GRID card pass-through device is attached.

b. Type dxdiag, and when DxDiag appears in the list, press Enter; or click DxDiag in the list.

c. After DxDiag launches, click the Display tab to verify that the virtual machine is using the NVIDIA GPU

and driver (Figure 32).


Figure 32. Launching DxDiag from the Virtual Machine to Check the Driver Status

3. Verify that all the GRID card controllers are present on the host by using the following command:

lspci | grep NVIDIA

The command should return output similar to Figure 33 if you are using two GRID K2 cards.

Figure 33. Displaying All the GRID Card Controllers Present on the Host
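As a quick sanity check from the ESXi Shell, you can also count the NVIDIA devices that the host reports; with two GRID K2 cards (two GPUs per card), the count should be 4. This is a minimal sketch assuming the standard ESXi Shell utilities are available:

lspci | grep -i nvidia | wc -l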

Note: When you deploy vDGA, it uses the graphics driver from the GPU vendor rather than the virtual machine’s

vGPU 3D driver. To provide frame-buffer access, vDGA uses an interface between the remote protocol and the

graphics driver.


Configure vGPU Deployment

This section outlines the installation process for configuring an ESXi host and virtual machine for vGPU support.

Figure 34 shows the components used for vGPU support.

Figure 34. NVIDIA GRID vGPU Components

Configure the VMware ESXi Host Server for vGPU Configuration

This section outlines the installation process for configuring an ESXi host for vGPU support.

1. Download the NVIDIA GRID GPU driver pack for vSphere ESXi 6.0 from

http://www.nvidia.com/download/driverResults.aspx/85391/en-us (Figure 35).

Note: Do not select the GRID Series drivers.

Figure 35. Download the ESXi 6.0 GPU Driver Pack


2. Enable the ESXi Shell and the Secure Shell (SSH) protocol on the vSphere host from the Troubleshooting

menu of the vSphere Configuration Console (Figure 36).

Figure 36. ESXi Configuration Console

3. Upload the NVIDIA driver (vSphere Installation Bundle [VIB] file) to the /tmp directory on the ESXi host using a

tool such as WinSCP (shared storage is preferred if you are installing drivers on multiple servers) or using the

VMware Update Manager.

4. Log in as root to the vSphere console through SSH using a tool such as PuTTY.

Note: The ESXi host must be in maintenance mode for you to install the VIB module.
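If the host is not already in maintenance mode, you can place it there from the same SSH session before installing the VIB (the vSphere Web Client can also be used):

esxcli system maintenanceMode set -e true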

5. Enter the following command to install the NVIDIA vGPU drivers:

esxcli software vib install --no-sig-check -v /<path>/<filename>.VIB
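For example, assuming the VIB was uploaded to the /tmp directory, the command might look like the following (the file name shown here is illustrative only; substitute the name of the VIB file you downloaded):

esxcli software vib install --no-sig-check -v /tmp/NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver.vib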

The command should return output similar to Figure 37.

Figure 37. ESX SSH Console Connection for vGPU Driver Installation

Note: Although the display states “Reboot Required: false”, a reboot is necessary for the VIB file to load and for

xorg to start.

6. Exit the ESXi host from maintenance mode and reboot the host by using the vSphere Web Client or by

entering the following commands:

esxcli system maintenanceMode set -e false

reboot

a. After the host has rebooted successfully, verify that the NVIDIA VIB was installed successfully by using the following command:

esxcli software vib list | grep -i nvidia

The command should return output similar to Figure 38.


Figure 38. ESX SSH Console Connection for Driver Verification
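You can also confirm that the NVIDIA kernel module itself has loaded (a quick additional check from the same SSH session; the module name reported may vary by driver version):

esxcli system module list | grep -i nvidia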

Note: See the VMware knowledge base article for information about removing any existing NVIDIA drivers

before installing new drivers for any vGPU or vDGA test:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033434.

7. Confirm GRID GPU detection on the ESXi host. The GPUs' utilization, memory usage, temperature, and power draw can be displayed by entering the following command:

nvidia-smi

The command should return output similar to Figure 39 if you are using two GRID K2 cards.

Figure 39. ESX SSH Console Connection for GPU Card Detection

Note: The NVIDIA System Management Interface (nvidia-smi) also allows continuous GPU monitoring using the following command, which loops and automatically refreshes the display: nvidia-smi -l
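For example, to refresh the display every five seconds (the interval is optional and is given in seconds):

nvidia-smi -l 5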

Prepare a Virtual Machine for vGPU Configuration

Use the following procedure to create the virtual machine that will later be used as the VDI base image.


1. Using vSphere Web Client, create a new virtual machine. To do this, right-click a host or cluster and choose

New Virtual Machine. Work through the New Virtual Machine wizard. Unless another configuration is specified,

select the configuration settings appropriate for your environment (Figure 40).

Figure 40. Creating a New Virtual Machine in VMware vSphere Web Client

2. Choose “ESXi 6.0 and later” from the “Compatible with” drop-down menu so that you can use the latest features, including the mapping of shared PCI devices, which is required for the vGPU (Figure 41).

Figure 41. Selecting Virtual Machine Hardware Version 11


3. In customizing the hardware of the new virtual machine, add a new shared PCI device, select the appropriate

GPU profile, and reserve all virtual machine memory (Figure 42).

Note: If you are creating a new virtual machine and using the vSphere Web Client's virtual machine console

functions, then the mouse will not be usable in the virtual machine until after both the operating system and the

VMware Tools have been installed. If you cannot use the traditional vSphere Client to connect to the virtual

machine, do not enable NVIDIA GRID vGPU at this time.

Figure 42. Adding a Shared PCI Device to the Virtual Machine to Attach the GPU Profile

4. Install and configure Windows on the virtual machine:

a. Configure the virtual machine with the appropriate amount of vCPU and RAM according to the GPU

profile selected.

b. Install the VMware Tools.

c. Join the virtual machine to the Active Directory domain.

d. Choose “Allow remote connections to this computer” in the Windows System Properties menu.

GRID K1 and K2 Profile Specifications

The GRID vGPU allows up to eight users to share each physical GPU, assigning the graphics resources of the

available GPUs to virtual machines using a balanced approach. Each GRID K1 card has four GPUs, allowing 32

users to share a single card. Each GRID K2 card has two GPUs, allowing 16 users to share a single card. Table 5

summarizes the user profile specifications.

For more information, see http://www.nvidia.com/object/grid-enterprise-resources.html.


Table 5. User Profile Specifications for GRID K1 and K2 Cards

NVIDIA GRID Card | Virtual GPU Profile | Application Certification | Graphics Memory (MB) | Max Displays per User | Max Resolution per Display | Max Users per Board | Use Case
GRID K2          | K280Q               | Yes                       | 4096                 | 4                     | 2560 x 1600                | 2                   | Designer
GRID K2          | K260Q               | Yes                       | 2048                 | 4                     | 2560 x 1600                | 4                   | Designer and power user
GRID K2          | K240Q               | Yes                       | 1024                 | 2                     | 2560 x 1600                | 8                   | Designer and power user
GRID K2          | K220Q               | Yes                       | 512                  | 2                     | 2560 x 1600                | 16                  | Power user
GRID K1          | K180Q               | Yes                       | 4096                 | 4                     | 2560 x 1600                | 4                   | Power user
GRID K1          | K160Q               | Yes                       | 2048                 | 4                     | 2560 x 1600                | 8                   | Power user
GRID K1          | K140Q               | Yes                       | 1024                 | 2                     | 2560 x 1600                | 16                  | Knowledge worker
GRID K1          | K120Q               | Yes                       | 512                  | 2                     | 2560 x 1600                | 32                  | Knowledge worker

NVIDIA vGPU Software (Driver) and Citrix HDX 3D Pro Agent Installation

Use the following procedure to install the GRID vGPU drivers on the desktop virtual machine and to install the HDX

3D Pro VDA to prepare this virtual machine to be managed by the XenDesktop controller. To fully enable vGPU

operation, the NVIDIA driver must be installed.

Before the NVIDIA driver is installed on the guest virtual machine, the Device Manager shows the Standard VGA

Graphics Adapter (Figure 43).

Figure 43. Device Manager Before the NVIDIA Driver Is Installed

1. Copy the Windows drivers from the NVIDIA GRID vGPU driver pack downloaded earlier to the master virtual

machine. Alternatively, download the drivers from http://www.nvidia.com/drivers and extract the contents

(Figure 44).

Note: Do not select the GRID Series drivers.


Figure 44. NVIDIA Driver Download

2. Copy the 32- or 64-bit NVIDIA Windows driver from the vGPU driver pack to the desktop virtual machine and

run setup.exe (Figure 45).

Figure 45. NVIDIA Driver Pack

Note: The vGPU host driver and guest driver versions must match. Do not attempt to use a newer guest driver with an older vGPU host driver, or an older guest driver with a newer vGPU host driver. In addition, the vGPU driver from NVIDIA is a different driver than the GPU pass-through driver.

3. Install the graphics drivers using the Express option (Figure 46). After the installation has completed successfully, restart the virtual machine.

Note: Ensure that remote desktop connections have been enabled. After this step, console access to the virtual machine may not be usable when connecting from a vSphere Client.


Figure 46. Selecting the Express Installation Option

4. To validate the successful installation of the graphics drivers as well as the vGPU device, open Windows

Device Manager and expand the Display Adapter section (Figure 47).

Figure 47. Validating the Driver Installation

Note: If you continue to see an exclamation mark, the following are the most likely reasons:

● The GPU driver service is not running.

● The GPU driver is incompatible.


5. To start the HDX 3D Pro VDA installation, mount the XenApp and XenDesktop 7.6 (or later) ISO image on the virtual machine, or copy the Feature Pack VDA installation media to the desktop virtual machine.

6. Install Citrix XenDesktop HDX 3D Pro Virtual Desktop Agent (Figure 48). Reboot when prompted to do so.

Figure 48. XenDesktop Virtual Delivery Agent Installation Setup

7. Reboot the virtual machine after the VDA for HDX 3D Pro has been installed successfully (Figure 49).


Figure 49. Successful Installation of the Virtual Delivery Agent

8. After the HDX 3D Pro Virtual Desktop Agent has been installed and the virtual machine has rebooted successfully, install the graphics applications, benchmark tools, and sample models that you want to deliver to all users. Refer to the following blog post for a list of graphics tools that you can use for evaluation and testing purposes:

http://blogs.citrix.com/2014/08/13/citrix-hdx-the-big-list-of-graphical-benchmarks-tools-and-demos/

Verify That Applications Are Ready to Support the vGPU

Verify that the NVIDIA driver is running.

1. Right-click the desktop and choose NVIDIA Control Panel from the menu to open the control panel.

2. Select System Information in the NVIDIA Control Panel to see the vGPU that the virtual machine is using, the vGPU’s capabilities, and the NVIDIA driver version that is loaded (Figure 50).


Figure 50. NVIDIA Control Panel


Why Use NVIDIA GRID vGPU for Graphics Deployments

GRID vGPU allows multiple virtual desktops to share a single physical GPU, and it allows multiple GPUs to reside on a single physical PCI card. These configurations provide the 100 percent application compatibility of vDGA pass-through graphics, but at lower cost because multiple desktops share a single graphics card. With XenDesktop, you can centralize,

pool, and more easily manage traditionally complex and expensive distributed workstations and desktops. Now all

your user groups can take advantage of the benefits of virtualization.

The GRID vGPU brings the full benefits of NVIDIA hardware-accelerated graphics to virtualized solutions. This technology provides virtual desktops with graphics performance equivalent to that of local PCs while sharing a GPU among multiple users.

GRID vGPU is the industry's most advanced technology for sharing true GPU hardware acceleration among

multiple virtual desktops—without compromising the graphics experience. Application features and compatibility

are exactly the same as they would be at the user's desk.

With GRID vGPU technology, the graphics commands of each virtual machine are passed directly to the GPU,

without translation by the hypervisor. By allowing multiple virtual machines to access the power of a single GPU in

the virtualization server, enterprises can increase the number of users with access to true GPU-based graphics

acceleration on virtual machines.

The physical GPU in the server can be configured with a specific vGPU profile. Organizations have a great deal of

flexibility in how best to configure their servers to meet the needs of various types of end users.

vGPU support allows businesses to use the power of the NVIDIA GRID technology to create a whole new class of

virtual machines designed to provide end users with a rich, interactive graphics experience.

vGPU Profiles

In any given enterprise, the needs of individual users vary widely. One of the main benefits of GRID vGPU is the

flexibility to use various vGPU profiles designed to serve the needs of different classes of end users.

Although the needs of end users can be diverse, for simplicity users can be grouped into the following categories:

knowledge workers, designers, and power users.

● For knowledge workers, the main areas of importance include office productivity applications, a robust web

experience, and fluid video playback. Knowledge workers have the least-intensive graphics demands, but

they expect the same smooth, fluid experience that exists natively on today’s graphics-accelerated devices

such as desktop PCs, notebooks, tablets, and smartphones.

● Power users are users who need to run more demanding office applications, such as office productivity

software, image editing software such as Adobe Photoshop, mainstream CAD software such as Autodesk

AutoCAD, and product lifecycle management (PLM) applications. These applications are more demanding

and require additional graphics resources with full support for APIs such as OpenGL and Direct3D.

● Designers are users in an organization who run demanding professional applications such as high-end CAD

software and professional digital content creation (DCC) tools. Examples include Autodesk Inventor, PTC

Creo, Autodesk Revit, and Adobe Premiere. Historically, designers have used desktop workstations and

have been a difficult group to incorporate into virtual deployments because of their need for high-end

graphics and the certification requirements of professional CAD and DCC software.


vGPU profiles allow the GPU hardware to be time-sliced to deliver exceptional shared virtualized graphics

performance (Figure 51).

Figure 51. GRID vGPU GPU System Architecture


Additional Configurations

Install the HDX 3D Pro Virtual Desktop Agent Using the CLI

When you use the installer's GUI to install a VDA for a Windows desktop, simply select Yes on the HDX 3D Pro

page. When you use the CLI, include the /enable_hdx_3d_pro option with the XenDesktop VdaSetup.exe

command.
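For example, a minimal command-line sketch might look like the following (the installer executable name, the quiet-mode and component switches, and the Delivery Controller address shown here are illustrative assumptions based on typical XenDesktop 7.6 media; consult the Citrix documentation for the full set of supported options):

XenDesktopVdaSetup.exe /quiet /components vda /enable_hdx_3d_pro /controllers "ddc01.example.local"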

To upgrade HDX 3D Pro, uninstall both the separate HDX 3D for Professional Graphics component and the VDA

before installing the VDA for HDX 3D Pro. Similarly, to switch from the standard VDA for a Windows desktop to the

HDX 3D Pro VDA, uninstall the standard VDA and then install the VDA for HDX 3D Pro.

Install and Upgrade NVIDIA Drivers

The NVIDIA GRID API provides direct access to the frame buffer of the GPU, providing the fastest possible frame

rate for a smooth and interactive user experience. If you install NVIDIA drivers before you install a VDA with HDX

3D Pro, NVIDIA GRID is enabled by default.

To enable GRID on a virtual machine, disable the Microsoft Basic Display Adapter in Device Manager and then run the following command:

Montereyenable.exe -enable -noreset

Then restart the VDA.

If you install NVIDIA drivers after you install a VDA with HDX 3D Pro, GRID is disabled. Enable GRID by using the

MontereyEnable tool provided by NVIDIA.

To disable GRID, run the following command:

Montereyenable.exe -disable -noreset

Then restart the VDA.

Use HDX Monitor

Use the HDX Monitor tool (which replaces the Health Check tool) to validate the operation and configuration of

HDX visualization technology and to diagnose and troubleshoot HDX problems. To download the tool and learn

more about it, see https://taas.citrix.com/hdx/download/.

Optimize the HDX 3D Pro User Experience

To use HDX 3D Pro with multiple monitors, be sure that the host computer is configured with at least as many

monitors as are attached to user devices. The monitors attached to the host computer can be either physical or

virtual.

Do not attach a monitor (either physical or virtual) to a host computer while a user is connected to the virtual

desktop or application providing the graphical application. Doing so can cause instability for the duration of a user's

session.

Let your users know that changes to the desktop resolution (by them or an application) are not supported while a

graphical application session is running. After closing the application session, a user can change the resolution of

the Desktop Viewer window in the Citrix Receiver Desktop Viewer Preferences.

When multiple users share a connection with limited bandwidth (for example, at a branch office), Citrix

recommends that you use the “Overall session bandwidth limit” policy setting to limit the bandwidth available to


each user. This setting helps ensure that the available bandwidth does not fluctuate widely as users log on and off.

Because HDX 3D Pro automatically adjusts to make use of all the available bandwidth, large variations in the

available bandwidth over the course of user sessions can negatively affect performance.

For example, if 20 users share a 60-Mbps connection, the bandwidth available to each user can vary between 3

Mbps and 60 Mbps, depending on the number of concurrent users. To optimize the user experience in this

scenario, determine the bandwidth required per user at peak periods and limit users to this amount at all times.

For users of a 3D mouse, Citrix recommends that you increase the priority of the generic USB redirection virtual

channel to 0. For information about changing the virtual channel priority, see CTX128190.

Use GPU Acceleration for Microsoft Windows Server DirectX, Direct3D, and WPF Rendering

DirectX, Direct3D, and WPF rendering is available only on servers with a GPU that supports display driver interface

(DDI) Version 9ex, 10, or 11.

● On Windows Server 2008 R2, DirectX and Direct3D require no special settings to use a single GPU.

● On Windows Server 2012, RDS sessions on the remote desktop session host server use the Microsoft

Basic Render driver as the default adapter. To use the GPU in RDS sessions on Windows Server 2012,

enable the “Use the hardware default graphics adapter for all Remote Desktop Services sessions” setting in

the group policy by choosing Local Computer Policy > Computer Configuration > Administrative Templates

> Windows Components > Remote Desktop Services > Remote Desktop Session Host > Remote Session

Environment.

● On Windows Server 2008 R2 and Windows Server 2012, all DirectX and Direct3D applications running in all

sessions use the same single GPU by default. To enable experimental support for distributing user sessions

across all eligible GPUs for DirectX and Direct3D applications, create the following settings in the registry of

the server running Windows Server sessions:

◦ [HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]

"DirectX"=dword:00000001

◦ [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]

"DirectX"=dword:00000001

● To enable WPF applications to render using the server's GPU, create the following settings in the registry of

the server running Windows Server sessions:

◦ [HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Multiple Monitor Hook]

"EnableWPFHook"=dword:00000001

◦ [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\CtxHook\AppInit_Dlls\Multiple Monitor

Hook] "EnableWPFHook"=dword:00000001

Use GPU Acceleration for Windows Server: Experimental GPU Acceleration for NVIDIA CUDA or

OpenCL Applications

Experimental support is provided for GPU acceleration of CUDA and OpenCL applications running in a user

session. This support is disabled by default, but you can enable it for testing and evaluation purposes.

To use the experimental CUDA acceleration features, enable the following registry settings:

● [HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]

"CUDA"=dword:00000001


● [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]

"CUDA"=dword:00000001

To use the experimental OpenCL acceleration features, enable the following registry settings:

● [HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]

"OpenCL"=dword:00000001

● [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\CtxHook\AppInit_Dlls\Graphics Helper]

"OpenCL"=dword:00000001

Use OpenGL Software Accelerator

The OpenGL Software Accelerator is a software rasterizer for OpenGL applications such as ArcGIS, Google Earth,

Nehe, Maya, Blender, Voxler, CAD, and CAM. In some cases, the OpenGL Software Accelerator can eliminate the

need to use graphics cards to deliver a good user experience with OpenGL applications.

Important: The OpenGL Software Accelerator is provided as is and must be tested with all applications. It may not

work with some applications and is intended as a solution to try if the Windows OpenGL rasterizer does not provide

adequate performance. If the OpenGL Software Accelerator works with your applications, it can be used to avoid

the cost of GPU hardware.

The OpenGL Software Accelerator is provided in the Support folder on the installation media, and it is supported on

all valid VDA platforms.

Try the OpenGL Software Accelerator in the following cases:

● If the performance of OpenGL applications running in virtual machines is a concern, try using the OpenGL

accelerator. For some applications, the accelerator outperforms the Microsoft OpenGL software rasterizer

that is included with Windows because the OpenGL accelerator uses SSE4.1 and AVX. The OpenGL

accelerator also supports applications using OpenGL versions up to Version 2.1.

● For applications running on a workstation, first try the default version of OpenGL support provided by the

workstation's graphics adapter. If the graphics card is the latest version, in most cases it will deliver the best

performance. If the graphics card is an earlier version or does not deliver satisfactory performance, then try

the OpenGL Software Accelerator.

● 3D OpenGL applications that are not adequately delivered using CPU-based software rasterization may

benefit from OpenGL GPU hardware acceleration. This feature can be used on bare-metal devices or virtual

machines.

Conclusion

The combination of Cisco UCS Manager, Cisco UCS C240 M4 Rack Servers, NVIDIA GRID K1 and K2 or Tesla

cards using VMware vSphere ESXi 6.0, and Citrix XenDesktop 7.6 provides a high-performance platform for

virtualizing graphics-intensive applications.

By following the guidance in this document, our customers and partners can be assured that they are ready to host

the growing list of graphics applications that are supported by our partners.


For More Information

● Cisco UCS C-Series Rack Servers

◦ http://www.cisco.com/en/US/products/ps10265/

◦ http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-c240-m4-rack-server/index.html

◦ http://www.cisco.com/c/en/us/products/interfaces-modules/ucs-virtual-interface-card-1227/index.html

◦ http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-c-series-rack-servers/index.html

● NVIDIA

◦ http://www.nvidia.com/object/grid-citrix-deployment-guide.html

◦ http://www.nvidia.com/object/grid-enterprise-resources.html#deploymentguides

◦ http://www.nvidia.com/content/cloud-computing/pdf/nvidia-grid-datasheet-k1-k2.pdf

◦ http://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/tesla_kseries_overview_lr.pdf

◦ http://www.nvidia.com/content/grid/pdf/GRID_K1_BD-06633-001_v02.pdf

◦ http://www.nvidia.com/content/grid/pdf/GRID_K2_BD-06580-001_v02.pdf

◦ http://www.nvidia.com/content/tesla/pdf/tesla-kseries-overview-lr.pdf

● Citrix XenApp and XenDesktop 7.6

◦ https://www.citrix.com/content/dam/citrix/en_us/documents/products-solutions/virtualize-3d-professional-graphics-design-guide.pdf

◦ http://docs.citrix.com/en-us/xenapp-and-xendesktop/7/cds-deliver-landing/hdx-enhance-ux-xd/hdx-sys-reqs.html

◦ http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-6/xad-hdx-landing/xad-hdx3dpro-gpu-accel-desktop.html

◦ http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-6/xad-hdx-landing/xad-hdx3dpro-gpu-accel-server.html

◦ http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-6/xad-hdx-landing/xad-hdx-opengl-sw-accel.html

◦ https://www.citrix.com/content/dam/citrix/en_us/documents/products-solutions/reviewers-guide-for-hdx-3d-pro.pdf

◦ https://www.citrix.com/content/dam/citrix/en_us/documents/go/reviewers-guide-remote-3d-graphics-apps-part-2-vsphere-gpu-passthrough.pdf

◦ http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-6/xad-hdx-landing/xad-hdx3dpro-intro.html

◦ http://support.citrix.com/proddocs/topic/xenapp-xendesktop-76/xad-whats-new.html

◦ http://support.citrix.com/proddocs/topic/xenapp-xendesktop-76/xad-architecture-article.html

◦ XenDesktop 7.x Handbook: http://support.citrix.com/article/CTX139331

◦ XenDesktop 7.x Blueprint: http://support.citrix.com/article/CTX138981

◦ http://blogs.citrix.com/2014/08/13/citrix-hdx-the-big-list-of-graphical-benchmarks-tools-and-demos/


● Microsoft Windows and Citrix optimization guides for virtual desktops

◦ http://support.citrix.com/article/CTX125874

◦ http://support.citrix.com/article/CTX140375

◦ http://support.citrix.com/article/CTX117374

● VMware vSphere ESXi and vCenter Server 6

◦ http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107948

◦ http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109712

◦ http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033434

Printed in USA C11-736417-00 12/15