SUM209: XenDesktop in the Enterprise - Best Practices and Lessons Learned Nick Rintalan, Senior Architect Thomas Berger, Architect Citrix Consulting May 2011
Page 1: Sum209

SUM209: XenDesktop in the Enterprise - Best Practices and Lessons Learned

Nick Rintalan, Senior Architect
Thomas Berger, Architect
Citrix Consulting
May 2011

Page 2: Sum209

XenDesktop Architecture Review

[Architecture diagram. Components: User, Web Interface, XD Controller, Active Directory, License Server, SQL Server, Provisioning Server, Storage, Hypervisor, Virtual Desktop. Connection labels: HTTP(s), XML, LDAP, UDP, HTTP(s) / WCF, ICA, NFS / iSCSI / FC, CIFS / NFS / iSCSI / FC.]

Page 3: Sum209

XenDesktop Architecture Review


Page 4: Sum209

Web Interface Scalability

• WI 5.x itself scales to the IIS specification
  • The real bottleneck in “Web Interface Scalability” is the XML Service

• Placing WI on either 2003 or 2008 has no real impact on scalability
  • However, using 2008-based XML Brokers reduces scalability by almost 50%!
  • We can no longer “exploit” the Log on Locally user right assignment in 2008

• A 2008 R2 WI box with 2 CPU and 2 GB RAM has been proven to scale to ~31k users/hour or ~9 users/sec
  • Almost 60k users/hour with 2003-based XML Brokers

Page 5: Sum209

Citrix Confidential - Do Not Distribute

Enterprise Considerations

• Always deploy 2 Web Interface servers for redundancy

• Use a hardware load balancer if possible (e.g. NetScaler)
  • Intelligent monitoring of WI availability / XML Service

• WI is a good candidate for virtualization
  • 2 vCPUs and 4 GB RAM is a good starting spec

• Check if encryption is required (User → WI → XML)
  • Otherwise user credentials are transferred as clear text

• Disable Socket Pooling if not using SSL

Page 6: Sum209

XenDesktop Architecture Review


Page 7: Sum209

The All New XenDesktop 5 Controller

• Important to understand the new XenDesktop architecture
  • No more IMA service or Data Store
  • New XD5 Controllers are stateless and the database (SQL) is relational
  • The SQL database must be made highly available
  • Many of the services (XML, etc.) have been re-written in .NET
  • Registry-based discovery & registration is now used by default
  • New architecture allows for greater scalability

Page 8: Sum209

DDC Bottlenecks (XD4 vs. XD5)

• In XenDesktop 4 it was a best practice to dedicate servers to certain roles
  • 2 DDCs for IMA and PMS
  • 2 DDCs for XML and Controller

• In XenDesktop 5 it is recommended to not dedicate servers
  • 2 DDCs per XD5 “site” (instead of 4 DDCs per XD4 farm)
  • Site services are load balanced automatically
  • Specify all XD5 Controllers as XML Brokers in Web Interface

Page 9: Sum209

XD5 Controller Scalability

• Scalability tests using two (physical) 2x4’s and 16 GB RAM:
  • Boot Storm (20,000 desktops in 10 minutes): 40-50% CPU utilization during the virtual desktop registration process
  • Logon Storm (20,000 logons in 13 minutes): 50-60% CPU utilization during user connection process

• User Perception: 99.9% of the brokered connections responded to launch requests in less than 2.5 seconds

Page 10: Sum209

XD5 Controller Scalability

• Rough estimate based on scalability tests:
  • A single XD5 site can scale to 10,000 desktops with 2 Controllers in most cases

• Need to get more granular?
  • 125-180 virtual desktop registrations per minute per dedicated core
  • 100-120 user logons per minute per dedicated core
  • Assumes the desktops are delivered via PVS and the Controller’s CPUs are not shared with other components
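
The per-core figures above can be turned into a rough core count for a planned boot or logon storm. The following is only a sketch: the function name is hypothetical, and sizing against the low (more conservative) end of each quoted range is an assumption made here, not something the test results prescribe.

```python
import math

# Per-core throughput ranges quoted on the slide (events per minute per dedicated core).
REGISTRATIONS_PER_MIN_PER_CORE = (125, 180)
LOGONS_PER_MIN_PER_CORE = (100, 120)


def cores_needed(events: int, minutes: float, per_core_range: tuple) -> int:
    """Return a conservative Controller core count for `events` spread over `minutes`.

    Assumption: size against the low end of the quoted per-core range.
    """
    rate_per_minute = events / minutes
    return math.ceil(rate_per_minute / per_core_range[0])


# Boot storm from the scalability test: 20,000 desktops in 10 minutes.
print(cores_needed(20_000, 10, REGISTRATIONS_PER_MIN_PER_CORE))  # 16
# Logon storm from the scalability test: 20,000 logons in 13 minutes.
print(cores_needed(20_000, 13, LOGONS_PER_MIN_PER_CORE))  # 16
```

Under these assumptions both storms come out at 16 dedicated cores across the site. Treat the result as a starting point for testing, not as a guarantee.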

Page 11: Sum209

XD SQL Database Scalability

• XenDesktop 5 uses the SQL database:
  • To store all configuration and session information
  • As a message bus between the Controllers

• Allows for a more flexible architecture (stateless DDCs!)

• This puts a much higher load on SQL

Page 12: Sum209

XD SQL Database Scalability

• Scalability tests using three (physical) 2x4’s with 16 GB RAM:
  • Boot Storm (20,000 desktops in 10 minutes):
    • 15-25% CPU (SQL principal database)
    • 5-10% CPU (SQL mirror)
    • SQL witness was essentially idle
  • Logon Storm (20,000 logons in 13 minutes):
    • 32% CPU (SQL principal database)
    • 10% CPU (SQL mirror)
    • SQL witness was essentially idle

Page 13: Sum209

Pay Attention to the Transaction Log!

• 20,000 desktop scenario

• High number of SQL transactions
  • 666 transactions / sec equals 20k desktops sending a heartbeat (every 30s)

• Can cause transaction log to grow excessively (gigabytes)
  • Check CTX126916
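
The 666 transactions / sec figure follows directly from the heartbeat interval. A minimal sketch of that arithmetic, assuming one SQL transaction per desktop per heartbeat (a simplification; the function name is hypothetical):

```python
def heartbeat_transactions_per_second(desktops: int, interval_seconds: float = 30.0) -> float:
    """Steady-state heartbeat rate, assuming one transaction per desktop per interval."""
    return desktops / interval_seconds


# 20,000 desktops, each sending a heartbeat every 30 seconds.
print(int(heartbeat_transactions_per_second(20_000)))  # 666
```

Boot and logon activity comes on top of this baseline, so the idle rate is a floor rather than a peak.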

Page 14: Sum209

Enterprise Considerations – XD SQL Database

• Ensure SQL database is made highly available
  • Check Citrix XenDesktop Design Handbook (bit.ly/xdhandbook) for SQL Database Sizing

• The database itself will be small (MBs), but the transaction log will be big (GBs)

• Leverage database mirroring, failover clustering or “HA” built into the hypervisor

Page 15: Sum209

XenDesktop Architecture Review


Page 16: Sum209

Hypervisor Scalability

• Most frequently asked question is:

How many VMs / box?

• The most definitive answer is:

It depends!

Page 17: Sum209

Hypervisor Scalability – Why Does it Depend?

…because all users and apps are different!

• Real world ratios range from…
  • 16 VMs per Core (Light Task Worker)
  • 4 Cores per VM (Heavy Trader / CAD)

Page 18: Sum209

Hypervisor Scalability – Where to Begin?

• You will need to test!
  • Formal P&S Testing with tools such as ESLT or LoadRunner is preferred

• If P&S Testing cannot be performed, conduct an extended pilot within your environment
  • This will at least allow you to gather some baseline data
  • Don’t forget to include a “buffer” in case you’re off

Page 19: Sum209

What if a Pilot is Not Possible?

• Gather some performance statistics from existing workstations
  • Only works if the workload will not change when going virtual

• 3rd party software may help, such as:
  • Liquidware Labs Stratusphere
  • Novell PlateSpin Recon
  • Microsoft Assessment and Planning (MAP) Toolkit for Hyper-V

• Make sure to include an even bigger “buffer”

Page 20: Sum209

Virtualization Layer…once it is implemented

Page 21: Sum209

Hypervisor Scalability

• XD on XenServer, Hyper-V and vSphere is all about the same in terms of user density
  • Architecture and features can be slightly different

• Processors that support nested paging are highly recommended
  • Extended Page Tables (Intel)
  • Rapid Virtualization Indexing (AMD)

• Remember to “save” 1 core* and ~1-3 GB of memory* for the hypervisor itself
  • However, XS 5.6 FP1 now uses 4 CPUs by default instead of 1

Page 22: Sum209

Hypervisor Scalability

• Certain memory over-commitment features can be helpful in VDI deployments
  • XenServer, vSphere and Hyper-V now all support dynamic memory management (essentially “ballooning”)
  • Transparent Page Sharing (TPS) doesn’t help much with “new” operating systems (legacy 4 KB pages vs. new large 2 MB pages)

• Don’t turn anything on/off unless you know what you’re doing

Page 23: Sum209

Hypervisors – Network Recommendations

• Dedicate NICs for certain functions
  • Ensures highest scalability
  • Eases monitoring / trend analysis

• Avoid single points of failure
  • Create Bonds/Teams whenever possible

• NIC virtualization (e.g. HP Virtual Connect Flex-10) can help here greatly

Page 24: Sum209

Hypervisor NIC Bonding

NICs: Embedded (dual port), Add on (quad port)

• eth0 + eth4: User and Infrastructure traffic
• eth1 + eth3: Storage traffic
• eth2 + eth5: Host Management and HA traffic

Page 25: Sum209

Hypervisors – General Recommendations

• Create separate resource pools or clusters for servers and desktops

• Be aware of the limitations of your hypervisor:
  • XenServer Pool Size
  • Maximum supported VMs per vCenter / SCVMM instance

• Be aware of the specific requirements of your hypervisor:
  • Hyper-V save state file
  • Resource requirements for vCenter / SCVMM
  • XenServer dom0 vCPU and RAM assignment

Page 26: Sum209

XenDesktop Architecture Review


Page 27: Sum209

Provisioning Server Scalability

• CPU and Memory are not typically the bottlenecks

• Disk and Network I/O are!

• With careful design and planning, PVS can be virtualized

• The virtual vs. physical decision depends on several factors:
  • Number and type of target devices (50 target devices vs. 5000, XA vs. XD, etc.)
  • Networking infrastructure and NICs (1 Gb vs. 10 Gb, SR-IOV compatibility, etc.)
  • Hypervisor and LACP support (vSphere vs. XenServer)

Page 28: Sum209

PVS Scalability – Eliminating the Disk Bottleneck

• PVS cannot cache, but Windows can!

• Block-level storage makes it easy
  • FC, iSCSI and even local disk storage
  • No caching by default with CIFS or NFS
    • It is possible, but need to tweak!

• 64-bit is the key to success
  • It’s all about the File/System Cache!
  • Read IOPS from the vDisk Store in the Steady State approach ZERO!

Page 29: Sum209

PVS Scalability – Enterprise Considerations

• The number of unique vDisks streamed simultaneously greatly affects scalability
  • Total RAM Required = 2 GB + (# vDisks x 2 GB)

• Shared storage optimized for write performance is ideal
  • 80-90% writes is common in XD deployments in the steady-state
  • RAID 1, 10 are good options (RAID 5 is bad)

• More streams = longer failover process
  • 1,000 streams ≈ 5 minutes
  • 1,500 streams ≈ 8 minutes
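
The RAM rule of thumb above is simple enough to script when comparing vDisk counts. A minimal sketch, with a hypothetical function name and the two 2 GB constants taken from the slide:

```python
def pvs_ram_gb(unique_vdisks: int, base_gb: int = 2, per_vdisk_gb: int = 2) -> int:
    """Total RAM Required = 2 GB + (# vDisks x 2 GB), per the slide's rule of thumb."""
    if unique_vdisks < 0:
        raise ValueError("unique_vdisks must be non-negative")
    return base_gb + unique_vdisks * per_vdisk_gb


for vdisks in (1, 3, 5):
    print(vdisks, "vDisk(s):", pvs_ram_gb(vdisks), "GB RAM")
```

The count that matters is the number of unique vDisks streamed simultaneously, not the number of target devices.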

Page 30: Sum209

PVS Scalability – Enterprise Considerations

• Implement at least two PVS servers
  • Use PVS HA Mode to distribute the load
  • In case of different hardware, leverage PVS “Power Rating” feature

• Teaming NICs for throughput is also highly recommended to achieve maximum scalability

• As a rule of thumb, for every 1 Gb NIC, expect ~500 target devices able to be streamed by PVS

• Citrix has scaled a single PVS box up to 3300 target devices
  • Consulting typically scales each PVS box to 1000-1500 target devices
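
These rules of thumb can be combined into a first-pass estimate of PVS server and NIC counts. The sketch below is illustrative only: the function names are hypothetical, and adding one spare server on top of the calculated count is an assumption made here, not a figure from the slides.

```python
import math


def pvs_servers_needed(target_devices: int, devices_per_server: int = 1000,
                       spare_servers: int = 1) -> int:
    """Servers needed at a given per-box target, never fewer than two.

    Assumption: one spare server (N+1) so a failover does not overload the rest.
    """
    base = math.ceil(target_devices / devices_per_server)
    return max(2, base + spare_servers)


def gigabit_nics_per_server(devices_per_server: int, devices_per_nic: int = 500) -> int:
    """Streaming NICs per server, using ~500 target devices per 1 Gb NIC."""
    return math.ceil(devices_per_server / devices_per_nic)


print(pvs_servers_needed(5000))       # 6
print(gigabit_nics_per_server(1000))  # 2
print(gigabit_nics_per_server(1500))  # 3
```

Keeping each box in the 1000-1500 device range also keeps the failover time in the range of minutes quoted for 1,000 to 1,500 streams.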

Page 31: Sum209

XenDesktop Architecture Review


Page 32: Sum209

Storage Recommendations

• Quick and Dirty estimates
  • 5 simultaneous bootups per spindle
  • 12 simultaneous logons per spindle
  • 14 simultaneous logoffs per spindle
  • 18 simultaneous users per spindle

• Capacity calculations impacted by
  • Disk speed
  • RAID level
  • Read/Write % (20/80)
  • User Activity (example values)

Activity    IOPS
Startup     26
Logon       12.5
Working     8
Logoff      10.7

RAID Level  Write Cost
0           1
1 or 10     2
5           4

Disk Speed (RPM)  Random IOPS
15,000            150
10,000            110
5,400             50
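
The quick estimates appear to be consistent with dividing the random IOPS of a 15,000 RPM disk by the per-activity IOPS and rounding down. That reading is an inference from the numbers, not something the slide states. A minimal sketch (hypothetical function name, RAID write cost deliberately ignored):

```python
import math

# Example values from the tables above.
ACTIVITY_IOPS = {"startup": 26, "logon": 12.5, "working": 8, "logoff": 10.7}
DISK_RANDOM_IOPS = {15_000: 150, 10_000: 110, 5_400: 50}


def activities_per_spindle(activity: str, rpm: int = 15_000) -> int:
    """How many desktops in a given activity one spindle can serve (rounded down)."""
    return math.floor(DISK_RANDOM_IOPS[rpm] / ACTIVITY_IOPS[activity])


for name in ACTIVITY_IOPS:
    print(name, activities_per_spindle(name))
# startup 5, logon 12, working 18, logoff 14
```

Passing `rpm=10_000` or `rpm=5_400` shows how quickly the per-spindle numbers drop on slower disks.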

Page 33: Sum209

Storage Usage: Write Cache

• It’s all about the Write IOPS

• 90/10 Write to Read Ratio is common in the steady-state

• RAID 1 or 10 is best, RAID 5 or 6 *not* recommended (unless there is a large number of spindles or a write cache on the RAID controller)

• Spread the write cache drives over several LUNs and ensure the LUNs are sized properly
  • For example, with 100 desktops and 5 GB write cache drives, consider using 4 or 5 100-150 GB RAID10 LUNs

Page 34: Sum209

Storage Usage: Write Cache (cont’d)

• Check registry setting (CTX123570 - FILE_NO_INTERMEDIATE_BUFFERING)

• When the Write Cache fills up, expect a BSOD or hang/freeze

• Things that cause Write Cache activity to be high:
  • Boot / Shutdown / User logging on or off
  • User starting application (streamed or local, hosted should have minimal effect)
  • Application behavior and profile solution

• Windows Perfmon <Physical Disk \ Disk Writes/sec> (\ Disk Transfers / sec gives you the whole picture)

Page 35: Sum209

Why is “How much disk space do I need for the write cache” a dangerous question?

Example (very simple):

• 1,000 VM environment (Windows 7, 20 GB vDisks)

• Write Cache max. size est. to 5 GB
  = Write Cache space 5 TB
  • 11 x 500 GB disks (RAID 5)
  • 20 x 500 GB disks (RAID 10)

• If we want to be cost-conscious, we opt for SATA

• IOPS est. to peak at 20 IOPS / user

• Estimated 100 users simultaneous (R/W ratio 30/70)

• IO load - Logical: 2,000 IOPS (600 reads / 1,400 writes)

• IO load - Physical (RAID 5) = 600 + 4*1400 = 6,200 IOPS
  • Over 120 SATA disks required (50 IOPS / disk)
  • Over 40 SCSI disks (150 IOPS / disk)
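
The example can be reproduced with a few lines of arithmetic, which makes it easy to try other RAID levels or disk types. This is a sketch with hypothetical function names. It uses the write-cost values from the Storage Recommendations slide and integer percentages to avoid rounding noise.

```python
import math

# Write cost per RAID level, as listed on the Storage Recommendations slide.
RAID_WRITE_COST = {"0": 1, "1": 2, "10": 2, "5": 4}


def physical_iops(logical_iops: int, read_percent: int, raid_level: str) -> int:
    """Back-end IOPS = reads + write cost x writes."""
    reads = logical_iops * read_percent // 100
    writes = logical_iops - reads
    return reads + RAID_WRITE_COST[raid_level] * writes


def disks_needed(backend_iops: int, iops_per_disk: int) -> int:
    """Disk count sized for IOPS only (capacity is a separate check)."""
    return math.ceil(backend_iops / iops_per_disk)


logical = 100 * 20                       # 100 simultaneous users at 20 IOPS each
raid5 = physical_iops(logical, 30, "5")  # 600 + 4 * 1400
print(raid5)                             # 6200
print(disks_needed(raid5, 50))           # 124 SATA disks
print(disks_needed(raid5, 150))          # 42 SCSI disks
```

Sized for capacity alone, 11 disks would do; sized for IOPS, the same RAID 5 design needs more than ten times as many, which is why the capacity question on its own is dangerous.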

Page 36: Sum209

Storage Considerations

• HBAs must support disk throughput
  • Especially important for shared HBA scenarios (e.g. Blades)

• Storage controllers must cope with the load
  • SAN may not be exclusively for VDI

• Prevent concurrent hard disk intensive tasks
  • Active Scan after antivirus pattern update / Scheduled defrag

• Use only fixed-size VHDs for write-cache drives and Provisioning Services vDisks
  • Disk can become fragmented on physical media

Page 37: Sum209

XenDesktop Architecture Review


Page 38: Sum209

Virtual Desktop Operating System

• Windows XP
  • Requires around 5000 IOPS each for startup
  • File system alignment issues need to be dealt with

• Windows 7
  • Generates more IOPS than XP for startup / logon (VRC shows +83% for boot)
  • However, generates considerably fewer IOPS than XP when working / idle
  • Optimized for virtualized environments (Host Integration Services)
  • Windows 7 performed better on Hyper-V (go figure!)

Page 39: Sum209

Average Resource Allocation

User Group  Operating System  vCPU Allocation  Memory Allocation  Avg IOPS (Steady State)  Estimate Users/Core
Light       Windows XP        1                768 MB-1 GB        4-8                      10-12
            Windows 7         1                1-1.5 GB           5-10                     8-10
Normal      Windows XP        1                1-1.5 GB           8-14                     8-10
            Windows 7         1                1.5-2 GB           10-15                    6-8
Power       Windows XP        1                1.5-2 GB           14-25                    6-8
            Windows 7         1-2              2-3 GB             15-30                    4-6
Heavy       Windows XP        1                2 GB               25-50                    4-6
            Windows 7         2                4 GB               30-60                    2-4

• Windows XP base image
  • Uniprocessor HAL: give 2 vCPUs and hypervisor won’t utilize
  • Multiprocessor HAL: use 1 vCPU and waste resources while system tries to align processors

Do NOT give virtual desktops more resources than needed
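
The users-per-core column can be combined with the earlier advice to reserve one core for the hypervisor to get a first-pass density range per host. A rough sketch only: the function name is hypothetical, the table values are the deck's estimates, and the result does not replace testing.

```python
# (user group, operating system) -> estimated users per core (low, high), from the table above.
USERS_PER_CORE = {
    ("Light", "Windows XP"): (10, 12), ("Light", "Windows 7"): (8, 10),
    ("Normal", "Windows XP"): (8, 10), ("Normal", "Windows 7"): (6, 8),
    ("Power", "Windows XP"): (6, 8),   ("Power", "Windows 7"): (4, 6),
    ("Heavy", "Windows XP"): (4, 6),   ("Heavy", "Windows 7"): (2, 4),
}


def desktops_per_host(user_group: str, os_name: str, physical_cores: int,
                      hypervisor_cores: int = 1) -> tuple:
    """Return (low, high) desktops per host after reserving cores for the hypervisor."""
    low, high = USERS_PER_CORE[(user_group, os_name)]
    usable = physical_cores - hypervisor_cores
    if usable <= 0:
        raise ValueError("no cores left for desktops")
    return usable * low, usable * high


# Example: an 8-core host running Normal users on Windows 7.
print(desktops_per_host("Normal", "Windows 7", 8))  # (42, 56)
```

Memory and IOPS limits from the same table should be checked separately, since either can cap density before CPU does.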

Page 40: Sum209

Desktop Optimizations

• There is a myriad of desktop optimization guides available

• Good guides are:
  • Citrix XenDesktop Design Handbook (bit.ly/xdhandbook)

• Windows XP / 7 – Optimization Guides
  • CTX124239
  • http://www.virtualrealitycheck.net
  • http://www.virtualfeller.com
  • http://www.citrixtools.net

Page 41: Sum209

XenDesktop Architecture Review


Page 42: Sum209

License Server Scalability

• License check-outs
  • A standalone Intel Xeon 2.83 GHz quad-core processor with 4 GB of RAM is able to handle 248 license check-outs per second (446,400 users / 30 minutes)
  • Dell PowerEdge 2650 with a 2.2 GHz processor can handle 170 license check-outs per second (306,000 users / 30 minutes)
  • Virtually no CPU resources consumed for check-ins

• Great candidate for virtualization!
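
The users-per-30-minutes figures are the check-out rates multiplied over a 30-minute window. A minimal sketch of that conversion (hypothetical function name), which can also be used to check a planned logon window against a measured check-out rate:

```python
def checkouts_in_window(checkouts_per_second: int, window_minutes: int = 30) -> int:
    """License check-outs a server can complete within the given window."""
    return checkouts_per_second * window_minutes * 60


print(checkouts_in_window(248))  # 446400
print(checkouts_in_window(170))  # 306000
```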

Page 43: Sum209

Before you leave…

• Recommended related breakout sessions:
  • SUM210 - Taking user experience to another level: understanding HDX technologies
  • SUM211 - It’s all about “me!” - user personalization and profiles

• Session surveys are available online at www.citrixsummit.com starting Thursday, May 26
  • Provide your feedback and pick up a complimentary gift at the registration desk

• Download presentations starting Friday, June 3, from your My Organizer Tool located in your My Synergy Microsite event account

Page 44: Sum209
Page 45: Sum209

Top 5 Questions
…very technical ones…

Page 46: Sum209

FAQ

• With the previous XD versions, we always had a chance to specify the connection account (to the DB) with the IMA service. Is there still such functionality available or is this relevant at all?

• The XD5 services access the database using their computer account logins (domain\machine$).

Page 47: Sum209

FAQ

• How big will the transaction log become?

• Number of Virtual Desktop Agents x 24 hours x approximately 62 kilobytes of data

• Example: 1,000 Virtual Desktop Agent farm in idle state:
  • 1,000 VDAs x 24 x 62 KB = 1,488,000 KB ≈ 1,488 megabytes

• Note: This can be substantially higher in active environments.
  • Check: CTX126916
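
The idle-state rule of thumb for the transaction log can be written as a small helper. A minimal sketch with a hypothetical function name. The 62 KB per VDA per hour figure is the approximate idle value quoted above, and active environments can be substantially higher.

```python
def idle_transaction_log_kb(vdas: int, hours: int = 24, kb_per_vda_hour: int = 62) -> int:
    """Number of VDAs x hours x ~62 KB, for an idle site."""
    return vdas * hours * kb_per_vda_hour


log_kb = idle_transaction_log_kb(1_000)
print(log_kb)          # 1488000 KB
print(log_kb / 1_000)  # 1488.0 MB, roughly 1.5 GB
```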

Page 48: Sum209

FAQ

• How can VDA requests be effectively load balanced?

• Configure all the desktops with the addresses of all brokers. The Virtual Desktop Agent randomly selects one DDC from the list and tries to register with that DDC.

• Note: Hardware load balancers / NLB do not work, as communication is Kerberos-based.
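
The random selection described above is what spreads registrations across the Controllers without a load balancer. The sketch below only illustrates the idea and is not the VDA's actual implementation: the function and host names are hypothetical, and trying the remaining Controllers after a failed attempt is an assumption added here.

```python
import random
from typing import Callable, Optional, Sequence


def register_with_random_controller(
    controllers: Sequence[str],
    try_register: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick Controllers in random order and return the first one that accepts.

    `try_register` stands in for the real registration call (hypothetical).
    """
    rng = rng or random.Random()
    candidates = list(controllers)
    rng.shuffle(candidates)          # random order spreads load across DDCs
    for controller in candidates:
        if try_register(controller):
            return controller
    return None                      # no Controller accepted the registration


# Example: only ddc2 is reachable in this simulated run.
ddcs = ["ddc1.example.local", "ddc2.example.local", "ddc3.example.local"]
print(register_with_random_controller(ddcs, lambda c: c.startswith("ddc2")))
```

With many desktops each making an independent random choice, registrations end up roughly evenly distributed across the listed Controllers.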

Page 49: Sum209

FAQ

• How does the VDA discover its XenDesktop site and controller?

• Registry based discovery (XD5 default)
  • HKLM\Software\Citrix\VirtualDesktopAgent\ListOfDDCs

• Active Directory browsing
  • Service Connection Point (SCP) / Controllers Group

• Quick deploy discovery (MCS only)
  • C:\Personality.ini contains list of Controllers

Page 50: Sum209

• Why does Citrix recommend NFS for MCS, but not iSCSI and Fibre Channel?

FAQ