
Introduction

Background Reading

Configuration

Supported Hardware

Supported software

Storage configuration

Logical Domain components

Domains

Networks

CPU

Memory

OpenBoot PROM

Physical and virtual console access

Onboard cryptographic acceleration

Resource Isolation

CPU Isolation

Memory Isolation

Network Isolation

Physical vs Virtual performance

Disk I/O Isolation

I/O Scaling

Logical Domain Administration

Building Logical Domains

Converting a physical host into a control domain

Configuring virtual storage for guest domains

Creating guest domains

Operational requirements

Startup and shutdown

Restarting domains

Restarting daemons

Adding storage to a guest

Patching implications

Console access

Commissioning

Decommissioning

Gotchas

Logical Domain Naming Conventions

Hostnames

Virtual disk servers

Virtual disk devices

Virtual disk

Virtual switches

Virtual networks

Supporting Technologies

Build automation - jumpstart

Monitoring

Capacity Management

Backup and recovery

Capacity Planning

Inventory

Outstanding work

Background and further reading

Appendix

cpuspin.master

cpuspin.child

Introduction



This page details how a virtualised Solaris environment shall be implemented using Logical Domains and covers the

following areas:

High level configuration and supported hardware and software components

A breakdown of the various components in Logical Domains

Resource isolation - how resources are isolated or shared between guest domains and the performance implications of

guests under load

LDOM administration - how to use the Logical Domains toolset along with support implications of managing a virtualised

Solaris environment

LDOM component naming conventions - keeping things consistent across the estate.

Supporting technologies - other technology components required to operate a virtualised Solaris environment

Background Reading

For those new to Logical Domains, the following is a suggested reading list to get up to speed with the concepts:

Beginners Guide to LDoms: Understanding and Deploying Logical Domains - good walkthrough on the basics

LDOMS 1.0.3 Admin Guide - fills in the technical gaps to give a complete understanding of the technology.

Octave Orgeron produced a series of articles for USENIX: Part 1, Part 2, Part 3 and Part 4

For keeping up to date with latest LDOM information, the following blogs are worth subscribing to:

Ariel Hendel,

Sun virtualisation,

Liam Merwick,

Sun Omicron,

The Hyper Trap,

The Navel of Narcissus,

Virtual Steve,

Virtuality,

C0t0d0s0,

Sparks and white noise

Configuration

This section covers the configuration components of Logical Domains and standards to be adhered to when deploying domains.

Supported Hardware

LDOMs are supported on the Niagara range of servers (T1000 through T5240); the T5220 has been standardised on for delivering

LDOMs for the following reasons:

T5x20 servers have eight Floating Point Units or FPUs - one per core. In comparison, the Tx000 systems only have a single

FPU for the server. Having multiple FPUs provides a more general purpose server to cope with differing workloads

T5x20 servers have effectively replaced the Tx000 systems, offering increased performance for a similar cost.

The T5x40 systems are dual CPU servers providing twice the compute resource of the T5x20. With Logical Domains being a

relatively new technology, this increased density of consolidation is considered too high for initial adoption.

T5120 and T5220 servers offer the same CPU and memory capacity. The servers are 1U and 2U respectively with the

T5220 providing 8 internal disk slots rather than 4 and 6 PCI-E slots rather than 3. The additional storage and I/O capability

of the T5220 provides a more flexible server type to be used by both virtualised and non-virtualised systems.

Supported software

The supported software stack for Logical Domains is as follows:

Control Domain

Logical Domain manager 1.0.3

LDOM Toolset 1.0

Veritas Volume Manager 5.0 MP3

Control and guest domains

Solaris 10 update 5


Storage configuration

There are two elements to LDOM storage configuration: Firstly the storage used to boot the control domain and secondly, storage

for guest domains.

As per existing Solaris standards, the control domain will boot from internal 146GB SAS disks which will be mirrored using Solaris

Volume Manager or SVM. Disk layouts will follow existing jumpstart standards and there are no LDOM-specific requirements with

regards to boot environment storage.

Storage for the guest domains will be SAN presented via a pair of Emulex LPe11000-S Host Bus Adaptors. This SAN storage will

be managed by Veritas Volume Manager and virtual disks for the guest OS will be constructed from Veritas volumes. The use of

SAN based storage for the guest domains along with the use of Veritas Volume Manager offers the following benefits:

SAN storage increases availability levels as all storage is accessed via two independent fabrics and storage is automatically

configured in a highly available configuration with the use of RAID 5.

Using SAN based storage moves towards decoupling the control and guests domains and more easily allows for guest

portability and migration in the future.

As storage requirements increase, it is quicker and more efficient to meet this storage requirement from SAN storage rather

than increasing internal storage or possibly needing to buy DAS storage.

SAN storage provides a richer set of functionality such as off-host snapshots and replication. Although not used at present,

this technology could be leveraged in the future.

In comparison to SVM, Veritas Volume Manager offers a powerful and more flexible interface to cope with future storage

demands.

ZFS is a relatively new product within Sun. ZFS offers considerable cost savings and will likely be investigated in the future

but at present, Veritas Volume Manager provides a more proven, trustworthy storage management platform.

14 * 20GB SAN LUNs are presented to the I/O domain providing 280GB of usable storage for the guest domains. This storage is

subsequently configured into 30GB Veritas volumes, each of which are presented to the guest domains via a virtual disk server

running on the control domain.

The following diagram illustrates how virtualised IO works within Logical Domains.

Using the current Solaris 10 build, a 40GB boot disk leaves 26GB of usable storage for applications. The next Solaris build will be a

slimmer install and provide more flexible partitioning for application installs.

Further details around configuring storage and allocating/scaling guest storage can be found in the

LDOM Administration section.

Logical Domain components

This section details the various resources used in an LDOM environment and how the resources are virtualised and made available

Domains

LDOMs allow four different types of domain to be created:

Control domain - creates and manages other domains. The control domain runs the Logical Domain Manager software provided

by the SUNWldm package. The first step in using Logical Domains is to convert a physical server build into the control domain. There

is a single control domain per server.

Service domain - provides virtual services to other domains. Such services include virtual switches, virtual disk and the virtual

console service. In this environment, the control and service domain are the same.

I/O domain - has direct ownership and access to I/O devices such as HBAs and network cards. I/O domains provide virtual disk

and network services to other domains. There can be a maximum of two I/O domains per server, one of which must be the control

domain. In this environment, the I/O and control domain are the same.

Guest domain - provides a virtual machine using services from the Service and I/O domain and managed by the Control domain.

For simplicity, the roles of the control, service and I/O domain are collapsed into a single domain. Throughout the document, the

terms Control, Service and I/O domain are used interchangeably.

The reason for collapsing control, service and I/O domains into a single domain is to reduce complexity and management

overhead. Additionally, the T5x20 server range has a single PCI-E root complex off which all PCI-E devices reside. This

configuration means it is not physically possible to configure a T5x20 in a split PCI configuration to enable two I/O domains.


Networks

The control domain is the only domain to have direct network connectivity. It "owns" the network adaptors and is responsible for

routing all data to and from the public network.

Conversely, guest domains have no direct network connectivity themselves. To allow guest domains to connect with other systems,

vswitches are created on the control domain to virtualise network access for guest domains.

The following diagram shows how the virtualised network layer will be configured on a typical LDOM server:

The control domain will be presented with the three standard network connections: Two FMA connections will be provided

via separate switches for primary network connectivity and a single network connection will be presented for the backup

LAN. All three connections will be gigabit.

Three vswitches will be created on the Control domain. Each vswitch is connected to a physical interface and will provide

external network connectivity to the guest domains.

As part of initialising the Control domain, the Solaris network configuration is updated to use vswitch devices rather than

physical interfaces. The e1000g0 device is swapped with vsw0 , e1000g1 with vsw1 and e1000g2 with vsw2 . This switching

from physical to vswitch devices is required to enable the control domain to communicate to the guest domains directly over

TCP/IP. This communication is not an LDOM requirement, more an operational requirement that all systems on a common

subnet should be able to communicate with each other directly.

After converting network connectivity to vswitches, IP Multipathing or IPMP is implemented across the two FMA network

connections on the control domain. IPMP is configured in an active/passive configuration and this provides highly available

network connectivity to the Control domain.

When adding guest domains, each domain is given three virtual network devices with each vnet device connecting into the

existing vswitches. As for control domains, active/passive IPMP is configured across the two FMA networks for availability

reasons.

See Gotchas for why network connectivity is not provided via two I/O domains.

CPU

The T5120 has an 8 core CPU with each core being able to run 8 separate threads. Additionally, each core has its own Floating

Point Unit or FPU. To the native operating system, the hardware appears as a 64 cpu system.

CPU allocation is managed at a thread level making it technically possible to have up to 64 domains on a single T5220. In practice,

supporting this number of small domains on a single server is inadvisable for the following reasons:

It is likely that the amount of resource allocated to each domain is too small to be usable.

Such a high density of applications on a small server poses too high a risk in terms of the impact of a single hardware failure.

Where multiple operating systems share a common cpu core, workload in one domain could impact the performance of

another.

When allocating CPU resources to a domain, it is not possible to specify an individual thread or cpu core to allocate from - the

lowest available thread will be allocated. To provide repeatable guest domain performance, domains will be configured in multiples

of 8 threads which ensures one or more complete cores are allocated to a single domain. By default, guest domains will be given a

single core of 8 threads although this could be scaled in multiples of 8 threads if required.

CPU resources are hard-allocated to a domain, so as long as each domain is allocated threads from a single CPU core, it should

not be possible for high CPU activity in one domain to impact another domain on the same server. (Benchmarking supports this

statement.)

CPU resources are hard-allocated, but it is possible to administratively move CPU resources between domains via the

ldm command on the control domain. This dynamic reconfiguration can be performed live without the need for a domain reboot.
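As a rough illustration of moving CPU resources with ldm (the guest name stella04 and the thread counts below are examples only, not part of the standard build):

# List unallocated CPU threads and other free resources
/opt/SUNWldm/bin/ldm ls-devices

# Grow stella04 from one core to two (8 to 16 threads) while it is running
/opt/SUNWldm/bin/ldm add-vcpu 8 stella04

# Shrink it back to a single core
/opt/SUNWldm/bin/ldm remove-vcpu 8 stella04

# Or set an absolute thread count in one step
/opt/SUNWldm/bin/ldm set-vcpu 8 stella04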

Memory

As for CPU resources, memory is also hard-allocated to a guest domain. Sun recommends a minimum of 512MB of memory per

Solaris 10 instance and LDOMs allow memory to be allocated in units as small as 4MB.

Although LDOMs allow memory to be allocated in arbitrarily small chunks, it is recommended that the memory allocated to a given

guest domain is directly related to the percentage of cpu resource allocated. Using the T5220 which allows for a recommended

maximum of 8 guests, 1/8th of the total system memory should be allocated along with each CPU core.

On each physical server, 128MB of physical memory is reserved for system use and cannot be allocated to guest domains. To


ensure consistent guest domain configurations, the control domain will be allocated 3968MB allowing all guests to be allocated a

full 4GB.

This pairing of CPU and memory allocation allows for easier estate management as the use of LDOMs grows. Systems will never

be left with unused cpu or memory resources and migrating guest domains between physical systems will be easier as all virtual

systems will be a fraction of all available resources rather than a fully customisable amount.
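A minimal sketch of sizing guest memory with ldm is shown below; stella04 is an illustrative guest and, in this LDOM release, memory changes to a bound or active domain are generally applied as a delayed reconfiguration at the next reboot rather than live:

# Show what remains in the free pool
/opt/SUNWldm/bin/ldm ls-devices

# Allocate 4GB to the guest, in line with the one-core-per-guest pairing above
/opt/SUNWldm/bin/ldm set-memory 4G stella04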

OpenBoot PROM

On traditional SPARC systems, the hardware provides a single OpenBoot PROM or OBP environment on which the operating

system will run. With the introduction of the hypervisor layer and the ability to run concurrent operating systems on a single server,

each guest domain must have its own OBP environment.

A T5220 server will have a single OBP image held on firmware and each domain (even for non-virtualised single-system T5220s)

will load a copy of the OBP into RAM and execute from here. Once the operating system has booted, the OBP memory will be

released to be used by the OS.

Although each domain runs a virtual OBP, each domain maintains its own OBP variables and these variables along with the basic

domain configurations are stored on the system controller and will persist across domain reboots. Although OBP variables persist

across reboots, they will not automatically persist across a hardware powercycle. Operational workarounds to this issue are

detailed in the Administration section.
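One such workaround, sketched here on the assumption that the standard ldm configuration subcommands are used, is to save the current domain configuration to the system controller so that it survives a powercycle (the configuration name "initial" is an example):

# Save the running configuration to the system controller under a named slot
/opt/SUNWldm/bin/ldm add-config initial

# Confirm which stored configuration will be used at the next poweron
/opt/SUNWldm/bin/ldm list-config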

Physical and virtual console access

The T5220 console is accessible via both the serial port and network management port. Both of these routes provide console

access to the primary or Control domain in the same manner as for a physical box.

To provide console access for guest domains, a virtual network console service is configured as part of the control domain

initialisation. The vntsd daemon can bind to ports in the 5000-5100 range and as guest domains are created, they are automatically

allocated a port number to provide virtual console connectivity. By default, these TCP ports are only available internally to the

control domain but the svc:/ldoms/vntsd:default SMF service is adjusted at control domain initialisation time to ensure the ports

are accessible on the control domain public network. This is in preparation for all Sun console connections to be externally secured

and managed with Conserver
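The adjustment referred to above is, in essence, a change to the vntsd listen address property; a sketch of the manual equivalent follows, with the control domain address 10.61.83.80 used purely as an illustration:

# Bind vntsd to the control domain's public address rather than localhost
svccfg -s svc:/ldoms/vntsd:default setprop vntsd/listen_addr = astring: 10.61.83.80
svcadm refresh svc:/ldoms/vntsd:default
svcadm restart svc:/ldoms/vntsd:default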

Onboard cryptographic acceleration

The Niagara CPU range provides cryptographic acceleration on the CPU with each of the 8 CPU cores having its own Modular

Arithmetic Unit or MAU. Performing cryptographic operations in hardware rather than software offers significant throughput

improvements. As an example, using dsa1024 operations in a single-threaded application provides a 245-fold improvement for

verify operations and 15-fold improvement for sign operations.

As part of the fidelity standard configuration, guest domains will be configured with a single MAU per guest domain based on the

configuration guidelines of a complete CPU core being allocated to each guest.

Note, for applications to make use of the onboard cryptographic acceleration, they must be configured to do so. Using the

Cryptographic Accelerator of the UltraSPARC T1 Processor is a good overview on how this can be achieved.
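To confirm that the MAU has been allocated and that the guest OS can see the hardware providers, checks along the following lines can be used (a sketch only; provider names differ between the T1 and T2 processors):

# On the control domain: confirm one MAU is bound to the guest (stella04 is an example)
/opt/SUNWldm/bin/ldm ls -l stella04 | grep -i mau

# On the guest domain: the ncp/n2cp hardware providers should be listed
# alongside the software providers
cryptoadm list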

Resource Isolation

The LDOM architecture provides a mixture of dedicated and shared resources between domains. In the case of CPU and memory,

resources are provided via the hypervisor directly to the guest domain whereas storage and network I/O is virtualised via the control

domain.

The summary position of how resource isolation works in LDOMs is as follows:

For CPU bound applications, it is not possible for CPU loading in one domain to impact another.

Memory bound applications cannot impact memory access in another domain.

Network loading scales across parallel LDOMs only marginally less than it would on a physical system.

IO scaling suffers a 6-10% throughput penalty per domain as guest domains perform concurrent I/O.

These results suggest LDOMs are suitable for the consolidation of all general purpose applications with the exception of

data-bound applications such as databases and backup servers. Backup and database servers are well defined applications for

which other technologies are available to drive up utilisation levels.


The following sections go into the detail of how each of the resource areas have been tested to illustrate domain isolation when

under specific loading.

CPU Isolation

When creating guest domains, CPU resources will be allocated in groups of 8 to ensure complete cores are allocated to a given

domain. To test the impact of high CPU loading between logical domains, a T5220 has been split into 8 domains as follows:

| *Hostname* | *Description* | *CPUs* | *Memory* |
| stella01 | Control and I/O domain | 0-7 | 4GB |
| stella03 | Guest domain | 8-15 | 4GB |
| stella04 | Guest domain | 16-23 | 4GB |
| stella05 | Guest domain | 24-31 | 4GB |
| stella06 | Guest domain | 32-39 | 4GB |
| stella07 | Guest domain | 40-47 | 4GB |
| stella08 | Guest domain | 48-55 | 4GB |
| stella09 | Guest domain | 56-63 | 4GB |

SciMark is a composite CPU benchmark which executes five computational kernels against small problem sets. For benchmarking,

the C version is used and executed three times and the average of the composite score is recorded.
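A minimal sketch of how the three runs might be scripted and averaged, assuming the C benchmark has been compiled locally to a binary called scimark2 that prints a Composite Score line:

# Run the benchmark three times and average the composite score
for run in 1 2 3
do
    ./scimark2
done | awk '/Composite Score/ { total += $NF; runs++ } END { print "Average composite:", total / runs }'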

SciMark was run on the guest domain stella09 under three load scenarios as follows:

Scenario 1: SciMark running on stella09. All remaining domains are idle.

Scenario 2: SciMark running on stella09. The remaining 6 guest domains are under heavy CPU load using the scripts detailed in the appendix. Control domain idle.

Scenario 3: SciMark running on stella09. All guest domains and the control domain are under heavy CPU load using the scripts detailed in the appendix.

The composite SciMark2 values are shown below:

| *Scenario* | *Description* | *Composite SciMark2 score* |
| 1 | All domains idle | 34.42 |
| 2 | All guest domains under cpu load | 34.37 |
| 3 | All guest domains and control domain under cpu load | 34.32 |

These results show that Logical Domains deliver highly effective CPU isolation between domains and high CPU loading does not

significantly impact the control domain.

Memory Isolation

LDOMS hard-allocate memory to each of the configured domains so it should not be possible for a memory-bound domain to

impact the performance of another domain. If applications have a higher memory footprint than provided by the domain

configuration, paging and ultimately swapping will occur which will use virtual I/O to access storage provided via the control domain.

As the virtual I/O layer and ultimately the underlying storage is shared between all guest domains, it is a possibility that significantly

oversubscribed domains could impact the performance of another guest domain. It is expected that application and standard OS

performance monitoring tools would detect and alert on such a scenario before it could impact other guest domains.

To prove memory bound applications do not significantly impact other running domains, the memrand benchmark has been used

from the libMicro portable microbenchmark suite. The choice of memrand follows a blog posting from Phil Harman illustrating

memory latency for increasingly parallel workloads.

As for the CPU testing, a T5220 was split into 8 domains with 4GB and 8 threads (1 complete CPU core) allocated to each domain

and memory tests were conducted under various levels of parallelism. Following Phil's example, memrand was run to perform

negative stride pointer chasing to show memory latency as below:

# *./bin/memrand -s 128m -B 1000000 -T 8 -C 10 -L*

# ./bin/../bin-sun4v/memrand -s 128m -B 1000000 -T 8 -C 10 -L

prc thr usecs/call samples errors cnt/samp size

memrand 1 8 0.16282 12 0 1000000 134217728

The following table shows the memory latency under increasing levels of parallelism.

| *Scenario* | *Description* | *Memory latency* |
| 1 | memrand run as a single thread on a single guest domain. All other domains idle | 160.2 ns |
| 2 | memrand run as a single thread on two guest domains | 160.2 ns |
| 3 | memrand run as a single thread on four guest domains | 161.0 ns |
| 4 | memrand run as a single thread on 7 guest domains | 161.7 ns |
| 5 | memrand run as a single thread on 7 guest domains and the control domain | 162.2 ns |

These results show that memory access times are not impacted when other domains are performing memory-bound operations.


Network Isolation

The standard LDOM configuration will be to have 8 domains on each T5220 server. Using gigabit network connections negates the

need for individual domains to have their own dedicated network adaptors: 8 domains sharing gigabit networks should provide

suitable bandwidth for standard applications. Three physical networks are presented to the control domain: two FMA connections

and a single backup connection.

The control domain network adaptors are configured as follows to provide virtual switches to the guest Logical Domains:

| *Underlying Interface* | *Network connection* | *Vswitch name* |
| e1000g0 | FMA network | primary-vsw |
| e1000g1 | Backup network | backup-vsw |
| e1000g2 | FMA network | alternate-vsw |

Vswitches are internal to the server and provide a virtual network segment to other guest domains. Each guest domain will connect

to the vswitch via a virtual network adaptor as shown in the diagram below:

Physical vs Virtual performance

To compare network performance between the physical hardware and Logical Domains, a pair of T2000 servers were used to

generate network load using iperf. All servers are connected via a 1Gb/s switched network and all hosts are running Solaris 10 as

shown below.

For the first set of physical tests, a T5220 with 32GB of memory and 8 cores was used as a load target and the T2000s were used

to generate an increasing number of network streams for 45 second intervals.

For the second set of tests, the same T5220 was configured into 8 domains with each domain having a single core and 4GB of

memory. Domain 1 is a joint control and I/O domain and domains 2 through 8 are guest domains.
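The load generation itself follows the usual iperf client/server pattern; a sketch is shown below, with the stella09 target and the stream count purely illustrative:

# On the load target (the T5220 itself or an individual guest domain)
iperf -s

# On each T2000 load generator: run the given number of parallel
# streams against the target for a 45 second interval
iperf -c stella09 -t 45 -P 2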

The following table and associated graph shows the aggregate throughput achieved when running parallel iperf streams against a

single T5220 followed by the same parallel testing against individual LDOMs on the same hardware.

Load generation to a physical T5220

| *Client load generator* | *Load targets* | *Total streams* | *Total throughput in Mbit/s* |
| T2000 a | T5220 | 1 | 390 |
| T2000 a, T2000 b | T5220 (1 stream each) | 2 | 909 |
| T2000 a, T2000 b | T5220 (2 streams each) | 4 | 963 |
| T2000 a, T2000 b | T5220 (3 streams each) | 6 | 932.9 |
| T2000 a, T2000 b | T5220 (4 streams each) | 8 | 1009.5 |

Load generation to parallel guest LDOMs

| *Client load generator* | *Load targets* | *Total streams* | *Total throughput in Mbit/s* |
| T2000 a | domain 8 | 1 | 399 |
| T2000 a, T2000 b | domain 8 / domain 7 | 2 | 879 |
| T2000 a, T2000 b | domain 6, domain 8 / domain 5, domain 7 | 4 | 920 |
| T2000 a, T2000 b | domain 4, domain 6, domain 8 / domain 3, domain 5, domain 7 | 6 | 926 |
| T2000 a | domain 2, domain 4, domain 6, domain 8 | 8 | 902.7 |

These results show that the introduction of a virtualised network layer has an observable but not hugely significant impact on

network throughput. It should be noted that increased CPU utilisation will be seen on the control domain during periods of high

network activity although not enough to significantly impact other guest domains. The following graph shows average CPU

utilisation on the I/O domain when running network tests to parallel guest domains.

With only a single core to perform all network I/O, cpu utilisation runs at 45% in the Logical Domain testing rather than 7% when

running the same tests against a single OS with all 8 cores available. This discrepancy is reasonable considering the reduction on

CPU available to service the network.

Disk I/O Isolation

LDOMs provide a virtualised I/O layer across guest domains. Storage for guest domains will be SAN based and presented via a

pair of HBAs to the I/O domain. This configuration provides no single point of failure external to the server and provides for future

guest mobility.

On the I/O domain, Veritas Volume Manager is used to manage the SAN storage and volumes will be created to present to the

guests as virtual disks. The following diagram illustrates the IO configuration of a typical LDOM host

I/O Scaling

Due to the number of variables involved and differing workloads, it is challenging to provide meaningful I/O benchmarks illustrating

scalability and separation. Each application may have different I/O patterns and block sizes and both host and array caching


influences the outcome.

To provide a simple but reasonably representative test of I/O scaling and throughput, a process is run on the guest domains to copy

a fixed size directory to local storage. The following command copies 12,271 files totalling 973MB:

cd / && tar cf - usr/lib > /opt/dump

The test is run on an increasing number of guest domains in parallel to measure the impact of concurrent I/O.
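A sketch of a single test iteration, timing the copy so that throughput can be derived from the 973MB transferred:

# Time one copy of /usr/lib (12,271 files, 973MB) to local storage
/usr/bin/time sh -c 'cd / && tar cf - usr/lib > /opt/dump'

# Throughput in MB/s is then 973 divided by the reported real time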

The LDOM configuration for this test is as follows:

I/O domain boots from internal storage.

10 * 20GB LUNS presented from IBM DS6000 storage to the I/O domain

SAN storage accessed via a pair of Emulex LP11000 HBAs

SAN storage managed by Veritas Volume Manager 4.1 including the IBM DS6000 Array Support Library

7 * 30GB volumes configured, each striped across all 10 LUNS.

Each volume is configured as a virtual disk to be presented to a single guest LDOM.

Guest domains were rebooted between each test to remove the effect of host based filesystem caching.

The following graph shows the total measured throughput as tests were run against an increasing number of parallel guest

domains. The dotted line shows expected throughput if scaling were to be linear:

The following graph shows aggregated I/O throughput for all active SAN devices along with an average disk busy percentage while

running the throughput tests:

This data shows that Logical Domains do not scale in a linear fashion with regards to filesystem based disk I/O but this is not due to

limitations in the underlying SAN storage.

There are a number of elements to note with regards to I/O scaling:

Aside from databases and storage-centric products such as backups, most applications are not I/O bound.

The throughput figures above are based on filesystem access when reading lots of small files and are not indicative of maximum achievable throughput.

It is recommended that Logical Domains are not used to consolidate database applications. Databases are typically well

understood standardised products for which other consolidation mechanisms would be more suitable.

Logical Domain Administration

Building Logical Domains

The following prerequisites must be met to use Logical Domains:

Hardware must be a T5220

T5220 firmware version must be 7.1.1 (provided by patch 136932-01)

Solaris build must be FSOS 1.0

System must be built with volume manager 5.0 MP3

XXX GB of SAN storage must be presented.

Jumpstart will need updating to meet the following requirements for LDOMS:

Logical Domains toolset must be installed

Logical Domain Manager 1.0.3 software must be installed

Required patches 125891-01 127755-01 118833-36 124921-02 125043-01 and 127127-11 must be installed

Until jumpstart is updated, add the Logical Domains Toolset package as follows and the control domain initialisation script will add

any required patches and packages:

pkgadd -d /shared/package/location SYSldom all

There are three steps to configuring logical domains:

1. Converting the physical host into a control domain

2. Configuring SAN storage to be used as virtual storage for the guest

3. Configuring guest domains


The Logical Domains toolset (SYSldom) contains a suite of scripts which have been produced to automate the creation and

management of logical domains. The following sections walk through these stages.

Converting a physical host into a control domain

The first step is to convert the physical host into the primary LDOM. The primary LDOM runs the Logical Domain Manager software,

provides a shared storage and network layer and ultimately manages the guest domains. Creating the primary domain will

configure the basic daemons and virtual services required for guest domains. It will also release most of the CPUs and memory

resources back into an available pool to be used when creating guest domains.

If the server has previously been used as an LDOM, the system controller will need to be reset to a default configuration and then

power cycled. If the system isn't showing the full 32GB of memory in the banner output, it's likely to have been configured as an

LDOM in a previous life. To reset a system to factory defaults, connect to the console (ssh admin@<hostname>-ilo), and run the following

commands:

sc> *bootmode config="factory-default"*

sc> *poweroff*

Are you sure you want to power off the system [y/n]? y

SC Alert: SC Request to Power Off Host.

SC Alert: Host system has shut down.

sc> *poweron*

SC Alert: Host System has Reset

To convert the physical host to a control domain, run the initialise_ldom script. The script is safe to re-run and will check whether

each stage needs to be done before making any changes. If anything fails, fix according to the alerts given and re-run the script. It

will be necessary to reboot the system on completion:

root@stella02# */opt/SYSldom/bin/initialise_ldom*

Checking prerequisites

Checking hardware is sun4v

Checking for non-virtualised OS

Checking system firmware version

Checking OS version is at least Solaris 10 11/6

Checking required packages

Installing missing packages

Adding SUNWldm.v

Adding SUNWldmib.v

Adding SUNWldlibvirt.v

Adding SUNWldvirtinst.v

Adding SUNWjass

Checking for required patches

Installing missing patches

adding 125891-01

adding 127755-01

adding 127127-11

OK - all prerequisites have been met

Executing JASS

Adjusting ldm_control-config.driver to standards

Adjusting ldm_control-hardening.driver to standards

Running JASS with ldm_control-secure.driver driver

Checking and configuring primary domain

svc:/ldoms/ldmd:default is currently in disabled state. Enabling

Creating virtual diskserver

Creating virtual console concentrator service

svc:/ldoms/vntsd:default is currently in disabled state. Enabling

Switching virtual console to listen on any address

Creating vswitches

Creating primary-vsw on e1000g0

Creating backup-vsw on e1000g1


Creating alternate-vsw on e1000g2

All hostnames are present in the system hosts files

Switching from e1000g0 to vswitch interface with probe-based IPMP

Switching from e1000g1 to vswitch interface

Switching from e1000g2 to vswitch interface with probe-based IPMP

Configuring primary domain

Removing crypto from primary domain

Restricting primary domain to 8 virtual cpus (1 core)

Restricting primary domain memory to ~4GB

Creating initial ldom config on system controller

installing /etc/rc2.d/S99ldombringup script

Adding root cron job to save LDOM configs

Configuration changes have been made which require a reboot

Please reboot with init 6

When the host has rebooted, the available memory should have been reduced to around 4GB, only 8 cpus should be available and

when listing domains, a single "primary" domain will be shown:

root@stella02# *prtdiag | egrep Memory\ size*

Memory size: 3968 Megabytes

root@stella02# psrinfo -vp

The physical processor has 8 virtual processors (0-7)

UltraSPARC-T2 (cpuid 0 clock 1165 MHz)

root@stella02# */opt/SUNWldm/bin/ldm ls*

NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME

primary active -n-cv SP 8 3968M 1.2% 36m

To list the unallocated resources on the host, run ldm ls-devices

All LDOM services and daemons are now ready to support guest domains but no storage has yet been configured to provide virtual

storage for the guests. The storage will be configured in the following section.

Configuring virtual storage for guest domains

All guest domain storage will be SAN presented and managed by Veritas Volume Manager. The configure_ldom_storage script

will configure veritas volume manager, create the appropriate veritas disk group and prompt the user as to which disks should be

added to be used by guest LDOMS.

As for the initialise_ldom script, the script is safe to re-run and will check whether each stage needs to be done before making

any changes. If anything fails, fix according to the alerts given and re-run the script. The only input required by the user is to confirm

which SAN disks should be initialised to be used for guest LDOM storage.

The following prerequisites must be met before running the configure_ldom_storage script:

Veritas Volume Manager 5.0 must be installed

SAN storage should be presented to the host and visible in format

SAN storage should not contain valid veritas data

root@stella02# /opt/SYSldom/bin/configure_ldom_storage

Checking veritas

Volume Manager licences already enabled

Checking veritas daemons

Enabling vxvm configuration daemon

Initialising vxvm

Do you want to initialise the SAN disks for use now? For new LDOMs, answer yes.

For taking over storage from another LDOM, answer no: (y/n): y

Adding storage to vxvm


The following disks are visibile to vxvm

IBM_DS8x000_0 auto:none - - online invalid

IBM_DS8x000_1 auto:none - - online invalid

IBM_DS60000_0 auto:none - - online invalid

IBM_DS60000_1 auto:none - - online invalid

c1t2d0s2 auto:none - - online invalid

c1t3d0s2 auto:none - - online invalid

c1t4d0s2 auto:none - - online invalid

c1t5d0s2 auto:none - - online invalid

c1t6d0s2 auto:none - - online invalid

c1t7d0s2 auto:none - - online invalid

Please enter the list of disks to use for LDOM storage

separated by spaces i.e. emcpower0s2 emcpower1s2 .....

Disk names: IBM_DS8x000_0 IBM_DS8x000_1 IBM_DS60000_0 IBM_DS60000_1

Validating disk names...

About to create stella02_ldom disk group using the following disks:

IBM_DS8x000_0 IBM_DS8x000_1 IBM_DS60000_0

IBM_DS60000_1

Are you sure (y/n): y

initialising IBM_DS8x000_0 in vxvm

initialising IBM_DS8x000_1 in vxvm

initialising IBM_DS60000_0 in vxvm

initialising IBM_DS60000_1 in vxvm

Initialising stella02_ldom disk group with IBM_DS8x000_0

Adding IBM_DS8x000_1 to stella02_ldom

Adding IBM_DS60000_0 to stella02_ldom

Adding IBM_DS60000_1 to stella02_ldom

All disks added to stella02_ldom

There is now 81906Mb available for guest LDOM storage

Once the configure_ldom_storage script has run, the control domain is fully setup and ready to create guest domains.

Creating guest domains

There are two parts required to create a guest domain. The first stage is to create a guest LDOM on the control domain and

allocate cpu, memory, network and I/O resources. The second stage is to jumpstart this virtual machine (the same as if it were a

physical host)

The standard guest configuration will be as follows:

8 virtual CPUs (one complete core)

4GB of memory

Three virtual network connections (2 FMA and 1 backup)

1 virtual console

1 * 30GB virtual disk for the OS.

To create a guest OS, the following are required:

The initialise_ldom and configure_ldom_storage must have already been run

The primary hostname, IPMP interfaces and backup interface must all be resolvable in DNS but not pingable.
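A quick way to confirm the second requirement is sketched below; stella17 and the interface names are illustrative values only:

# Each name should resolve...
getent hosts stella17 stella17-ipmp1 stella17-ipmp2 stella17-bak

# ...but none of them should answer a ping (5 second timeout per host)
for host in stella17 stella17-ipmp1 stella17-ipmp2 stella17-bak
do
    ping $host 5
done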

To create a new guest domain, run the /opt/SYSldom/bin/add_guest menu.

<verbatim> Guest LDOM creation menu

=================================

</verbatim>


<verbatim> h. Hostname of guest LDOM..............................undefined

c. Number of virtual cpus......................................8

m. Memory allocated in GB......................................4

r. Reset to defaults............................................

q. Quit and abandon changes.....................................

a. Add LDOM.....................................................

</verbatim>

LDOMs already configured: (primary)

56 cpu threads unallocated (8-63)

28.0 GB unallocated memory

stella02_ldom disk group is 80.0 GB with 80.0 GB free

Please make a choice:

Press the h key to enter the guest domain hostname. The menu will validate that the primary hostname, IPMP addresses and

backup interface are resolvable and not pingable. If this is true, the guest domain can be added with the a key.

CPU resources should be allocated in multiples of 8 to ensure each guest is allocated a complete core. If the number of cpus is

adjusted, the memory allocation will be updated to be half the number of cpu threads to ensure even allocation of cpu/memory

amongst guest domains.

The following shows the creation of the stella17 guest domain:

<verbatim> Guest LDOM creation menu

=================================

</verbatim>

<verbatim> h. Hostname of guest LDOM...............................stella17

c. Number of virtual cpus......................................8

m. Memory allocated in GB......................................4

r. Reset to defaults............................................

q. Quit and abandon changes.....................................

a. Add LDOM.....................................................

</verbatim>

LDOMs already configured: (primary)

56 cpu threads unallocated (8-63)

28.0 GB unallocated memory

stella02_ldom disk group is 80.0 GB with 80.0 GB free

Please make a choice: a

Validating LDOM

Stage 1/16: Executing ldm add-domain stella17

Stage 2/16: Executing ldm add-vcpu 8 stella17

Stage 3/16: Executing ldm set-mau 1 stella17

Stage 4/16: Executing ldm add-memory 4G stella17

Stage 5/16: Executing ldm add-vnet vnet_pri primary-vsw stella17

Stage 6/16: Executing ldm add-vnet vnet_bak backup-vsw stella17

Stage 7/16: Executing ldm add-vnet vnet_alt alternate-vsw stella17

Stage 8/16: Executing vxassist -g stella02_ldom make stella17.os 32g

Stage 9/16: Executing ldm add-vdsdev /dev/vx/dsk/stella02_ldom/stella17.os dev-stella17.os@primary-vds0

Stage 10/16: Executing ldm add-vdisk stella17.os dev-stella17.os@primary-vds0 stella17

Stage 11/16: Executing ldm set-variable "nvramrc=`cat /opt/SYSldom/etc/defaultnvramrc`" stella17

Stage 12/16: Executing ldm set-variable boot-device=disk stella17

Stage 13/16: Executing ldm set-variable use-nvramrc?=true stella17

Stage 14/16: Executing ldm set-variable auto-boot?=false stella17

Stage 15/16: Executing ldm bind-domain stella17


Stage 16/16: Executing ldm start-domain stella17

LDom stella17 started

LDOM successfully created and is ready to add to be jumpstarted

Terminal server connection: stella02:5000

Press enter to continue

In the above example, a virtual terminal server connection has been created on stella02, port 5000. Connecting to this port with

telnet will bring up the familiar OBP prompt and the host can be jumpstarted in the normal manner:

root@stella02# *telnet stella02 5000*

Trying 10.60.45.104...

Connected to stella02.

Escape character is '^]'.

Connecting to console "stella17" in group "stella17" ....

Press ~? for control options ..

{0} ok *banner*

SPARC Enterprise T5220, No Keyboard

Copyright 2008 Sun Microsystems, Inc. All rights reserved.

OpenBoot 4.28.0, 4096 MB memory available, Serial #66671006.

Ethernet address 0:14:4f:f9:51:9e, Host ID: 83f9519e.

Operational requirements

Startup and shutdown

From a cold-start, the T5220 hardware will require a poweron command from the console. Once the hardware is powered on and

has completed POST, the control domain will auto-boot. By default, guest LDOMs do not auto-start. To resolve this, an

/etc/rc2.d/S99ldombringup script is deployed when configuring the primary domain. This script will auto-start any bound domains that aren't

already running.

For shutdowns, the guest domains must be shutdown before the control domain to prevent data loss. This ordering should be

factored into any BCP or powerdown tests.
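A sketch of the ordering from the control domain is shown below; the guest names are examples and each ldm stop-domain requests an orderly shutdown of the guest OS:

# Stop each guest domain cleanly before touching the control domain
/opt/SUNWldm/bin/ldm stop-domain stella03
/opt/SUNWldm/bin/ldm stop-domain stella04

# Confirm no guests remain active
/opt/SUNWldm/bin/ldm ls

# Then shut down the control domain and power off from the system controller
init 5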

Restarting domains

When the control domain is rebooted, guest domains will freeze until the control domain comes back online. This frozen outage

period is typically around 2 minutes for a fully configured T5220 and existing network connections will not be dropped.

It is considered good practice to manually shut down all guest domains before restarting a control domain. The ability of guest

domains to ride through a control domain reboot should mainly be relied upon to cope with a hardware fault or panic on the

control domain, not as a way of avoiding a planned outage on the guest domains.

Restarting guest domains will have no impact on other domains.

Restarting daemons

The ldmd service svc:/ldoms/ldmd:default can be restarted on the primary domain at any time without impacting the guest

domains. It is designed to be a stateless service.

The Virtual console service svc:/ldoms/vntsd:default can be restarted at any time without impacting the guest domains. During

the service restart, any connected console sessions will be dropped and the user will need to reconnect.
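For reference, the restart commands and a quick health check (service FMRIs as quoted above):

# Restart the Logical Domain manager daemon - guests are unaffected
svcadm restart svc:/ldoms/ldmd:default

# Restart the virtual console service - connected console sessions are dropped
svcadm restart svc:/ldoms/vntsd:default

# Confirm both services have come back online
svcs svc:/ldoms/ldmd:default svc:/ldoms/vntsd:default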

Adding storage to a guest

If a guest domain requires more storage than the OS disk, a new virtual disk should be created and attached to the guest. Storage


should be taken from the <hostname>_ldom disk group and added following the Virtual disk naming conventions.

The following shows a 100M virtual disk being created and attached to the stella04 guest:

root@stella01# *vxassist -g stella01_ldom maxsize*

Maximum volume size: 65755136 (32107Mb)

root@stella01# *vxassist -g stella01_ldom make stella04.dat02 100M*

root@stella01# *dd if=/dev/zero of=/dev/vx/dsk/stella01_ldom/stella04.dat02 count=1024 bs=1024*

root@stella01# *ldm add-vdsdev /dev/vx/dsk/stella01_ldom/stella04.dat02 dev-stella04.dat02@primary-vds0*

root@stella01# *ldm add-vdisk stella04.dat02 dev-stella04.dat02@primary-vds0 stella04*

Initiating delayed reconfigure operation on LDom stella04. All configuration

changes for other LDoms are disabled until the LDom reboots, at which time

the new configuration for LDom stella04 will also take effect.

Overwriting the beginning of the Veritas volume is required to remove the old VTOC in case the storage is being re-used.

The guest should now be rebooted. When it restarts, log in and run devfsadm and the guest will be able to see a new 100M virtual

device:

root@stella04# *echo | format*

Searching for disks...done

AVAILABLE DISK SELECTIONS:

0. c0d0 <SUN-DiskImage-29GB cyl 38623 alt 2 hd 1 sec 1618>

/virtual-devices@100/channel-devices@200/disk@0

1. c0d1 <SUN-DiskImage-100MB cyl 339 alt 2 hd 1 sec 600>

/virtual-devices@100/channel-devices@200/disk@1

Specify disk (enter its number): Specify disk (enter its number):

Patching implications

Each of the guest domains runs an independent OS image, so there is no need for guest domains to be kept at the same patch level.

The only LDOM-specific patch requirement is a few additional steps to take before patching a control domain:

All guests should be shut down and the associated change control process followed

It may be advisable to disable the /etc/rc2.d/S99ldombringup script until patching is complete to prevent guest domains

from auto-booting when the primary domain reboots.

Normal patching process on the control domain should be followed including any follow-up checks.

When patch validation is complete, re-enable the /etc/rc2.d/S99ldombringup script and run the script to restart all guest

domains.
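A sketch of disabling and re-enabling the bringup script around a control domain patching window follows; the rename approach is one option and local conventions may differ:

# Before patching: stop the script from being picked up at the next reboot
mv /etc/rc2.d/S99ldombringup /etc/rc2.d/noS99ldombringup

# After patch validation: restore the script and run it to restart the guests
mv /etc/rc2.d/noS99ldombringup /etc/rc2.d/S99ldombringup
sh /etc/rc2.d/S99ldombringup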

Console access

See Physical and virtual console access.

Commissioning

See Building Logical Domains and Creating guest domains for the physical process of creating the domains.

With regards to guest domains, the following information should be added to the CMDB/inventory system:

For each guest, which physical host acts as its primary control domain

Virtual console server details

Decommissioning

The standard Solaris decommissioning process should be followed. Once the host has been decommissioned, running the

following commands on the control domain will replace the 'dispose hardware' stage:


ldm stop-domain -f guestname

ldm unbind-domain guestname

ldm rm-vnet vnet_pri guestname

ldm rm-vnet vnet_bak guestname

ldm rm-vnet vnet_alt guestname

ldm rm-vdisk guestname.os guestname

vxassist -g `hostname`_ldom -rf rm guestname.os

ldm remove-domain guestname

Gotchas

The following is a list of potential issues to be aware of when running a Solaris virtualised environment:

Virtualised operating systems have a virtualised OBP environment which is initially held in memory. Once the guest OS is

loaded, the in-memory OBP is released so it isn't possible to return to the ok> prompt. To get back to the ok> prompt on

guest domains, reboot the domain and send a break before the OS starts to boot.

Guest domains must be shutdown in the normal manner before shutting down the control domain.

Although the LDOM software allows for multiple I/O domains to be created, there are hardware limitations which mean this

cannot typically be adopted:

Split PCI bus is only available on the T5x40 range of servers (and the older T2000). The T5x20 range only has a

single PCI-E root complex so I/O cannot be split.

The internal disks are only available via a single controller. Where servers allow a Split PCI bus configuration, there is

no internal storage to provide a boot environment for the domain so technologies such as SAN or iSCSI boot would

be required.

Logical Domain Naming Conventions

To enable toolset automation for LDOM provisioning and management and to provide a uniform virtualised environment, a number

of conventions have been adopted. This section details the various conventions and their usage.

Hostnames

There is no requirement to couple a logical hostname to its physical control domain. Virtual hostnames will follow the same

convention used for existing physical hosts and be allocated in the same manner. This convention is documented in

the Hostname_Convention page.

Using the LDOM software, each logical domain will be named to match the hostname of the guest OS. The only exception to this is

the control domain which will be named primary. Using a generic name for the control domain is a requirement for guest mobility as

guest domain configurations reference services in the format <service-name>@<domain-name> (for example, dev-stella04.os@primary-vds0).

With a collapsed control, I/O and service domain, all resources will reference the primary domain. By using this common reference

for the control domain, guest domains can be moved to a new host by copying the underlying virtual disk devices and importing the

guest domain configuration onto the new host.

The following output is from the primary domain stella01 which provides 7 guest domains:

root@stella01# *ldm ls*

NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME

primary active -n-cv SP 8 3968M 0.7% 2h 16m

stella03 active -n--- 5000 8 4G 0.2% 2h 7m

stella04 active -n--- 5001 8 4G 0.2% 2h 7m

stella05 active -n--- 5002 8 4G 0.2% 2h 7m

stella06 active -n--- 5003 8 4G 0.2% 2h 7m

stella07 active -n--- 5004 8 4G 0.2% 2h 7m

stella08 active -n--- 5005 8 4G 0.3% 1h 48m

stella09 active -n--- 5006 8 4G 0.2% 1h 52m

Virtual disk servers

A virtual disk server runs on the control domain to present virtual disk devices to a guest domain. Virtual disk servers are named


<domain>-vds<N>, where <domain> is the domain providing the disk service and <N> is a unique instance number for the disk server within the given domain (for example, primary-vds0).

The current configuration provides a single virtual disk server and with current hardware available and availability requirements for

virtualisation, this configuration is unlikely to change.

The list of virtual disk servers, along with the virtual disks they provide, can be retrieved by querying the services:

root@stella01# *ldm list-services primary*

_<... output removed ...>_

VDS

NAME VOLUME OPTIONS DEVICE

primary-vds0 dev-stella03.os /dev/vx/dsk/stella01_ldom/stella03.os

dev-stella04.os /dev/vx/dsk/stella01_ldom/stella04.os

dev-stella05.os /dev/vx/dsk/stella01_ldom/stella05.os

dev-stella06.os /dev/vx/dsk/stella01_ldom/stella06.os

dev-stella07.os /dev/vx/dsk/stella01_ldom/stella07.os

dev-stella08.os /dev/vx/dsk/stella01_ldom/stella08.os

dev-stella09.os /dev/vx/dsk/stella01_ldom/stella09.os

Virtual disk devices

Virtual disk devices are managed by the control domain and are a way of abstracting underlying physical storage into a form that

can be presented to one or more guest domains. At the backend, virtual disk devices can be constructed in a number of ways;

Veritas volumes have been standardised on.

As part of the configure_ldom_storage tool, a <hostname>_ldom diskgroup will be created and all virtual disk devices will be constructed from

storage in this diskgroup.

Virtual disk devices will be named dev-<guest>.<tag>, where <guest> is the name of the guest to which the storage is presented and <tag> represents the use

of the storage within the guest. This naming convention follows through to the underlying Veritas volume, which will be named <guest>.<tag> in the <hostname>_ldom

diskgroup.

As part of a standard build, each guest LDOM will only have a single virtual disk device which will have a tag of os. As additional

storage is added to a guest domain the tag datNN should be used where NN is a two digit number starting at 01.

Each virtual disk device should be a standard Veritas volume created with vxassist. It is not necessary to stripe volumes across

available storage; leaving vxassist to automatically place the volume will suffice.

The following table shows the virtual disk devices listed for a guest stella04 which has an additional two data devices. Note, this

output shows which virtual disk devices are created on the control domain rather than which devices are mapped to a given guest

domain.

root@stella01# *ldm list-services primary*

_<... output removed ...>_

VDS

NAME VOLUME OPTIONS DEVICE

primary-vds0 dev-stella03.os /dev/vx/dsk/stella01_ldom/stella03.os

dev-stella04.os /dev/vx/dsk/stella01_ldom/stella04.os

dev-stella04.dat01 /dev/vx/dsk/stella01_ldom/stella04.dat01

dev-stella04.dat02 /dev/vx/dsk/stella01_ldom/stella04.dat02

dev-stella05.os /dev/vx/dsk/stella01_ldom/stella05.os

_<... output removed ...>_

Virtual disk

Virtual disks are how a virtual disk device is mapped to a given guest domain. Each virtual disk device may be mapped to multiple

guest domains, each domain having its own virtual disk entry. Although possible, shared virtual disk devices will not be

implemented, as storage sharing can be achieved at a more generic layer with technologies such as NFS.

The naming of a virtual disk will follow the naming of the underlying virtual disk device. For virtual disk devices following the dev-<guest>.<tag>@<vds>


naming convention, the virtual disk on the guest domain will be named <guest>.<tag> (for example, stella04.os).

The following output shows the naming of two virtual disks on a guest domain stella04 and their underlying virtual disk devices:

root@stella01# *ldm ls -l stella04*

_<... output removed ...>_

DISK

NAME VOLUME TOUT DEVICE SERVER

stella04.os dev-stella04.os@primary-vds0 disk@0 primary

stella04.dat01 dev-stella04.dat01@primary-vds0 disk@1 primary

Unlike standard SCSI disks, virtual disks do not have the concept of a target number and as such, are presented to the guest

operating system in the format c?d?s? rather than c?t?d?s?. All virtual disks from a given virtual disk server will have the same

controller number with disk numbers being allocated in the order of virtual disk creation.

At present, it is not envisaged that shared virtual disk devices will be used as other technologies such as NAS provide a generic

shared storage offering which is more flexible and cross platform.

Virtual switches

Virtual switches are labelled <name>-vsw, where <name> describes the network connection. By convention, the control domain will present three virtual switches to match the three required network connections of a unix host: two FMA switches and a single backup switch.

The following shows the three vswitches and their underlying network adaptors. The switches will be automatically configured as

part of the initialise_ldom script:

root@stella01# ldm list-services primary

<... output removed ...>

VSW

NAME            MAC                NET-DEV   DEVICE     MODE
primary-vsw     00:14:4f:d3:db:32  e1000g0   switch@0
backup-vsw      00:14:4f:d3:db:33  e1000g1   switch@1
alternate-vsw   00:14:4f:d3:db:34  e1000g2   switch@2

<... output removed ...>

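The switches above would be created with commands of roughly the following form; this is a hedged sketch of what the initialise_ldom script is assumed to run, not an extract from the script itself:

# create one virtual switch per physical network adaptor on the control domain (primary)
root@stella01# ldm add-vsw net-dev=e1000g0 primary-vsw primary
root@stella01# ldm add-vsw net-dev=e1000g1 backup-vsw primary
root@stella01# ldm add-vsw net-dev=e1000g2 alternate-vsw primary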
On the control domain, the vswitches are presented as vswN network adaptors where N matches the instance number in the device

column. For example, the backup network would be presented as vsw1. The following shows a typical network configuration on the

control domain with IPMP configured across the primary and alternate vswitches:

root@stella01# ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

vsw0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

inet 10.61.83.80 netmask ffffff00 broadcast 10.61.83.255

groupname MAIN

ether 0:14:4f:d3:db:32

vsw0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2

inet 10.61.83.81 netmask ffffff00 broadcast 10.61.83.255

vsw1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3

inet 10.61.17.72 netmask ffffe000 broadcast 10.61.31.255

ether 0:14:4f:d3:db:33

vsw2: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 4

inet 10.61.83.82 netmask ffffff00 broadcast 10.61.83.255

groupname MAIN

ether 0:14:4f:d3:db:34

Virtual networks

Virtual network adaptors are created on a guest domain and connected to a vswitch. Virtual networks will be labelled vnet_<xxx>, where <xxx> is a three-letter code matching the virtual switch name (for example vnet_pri for the primary-vsw switch).

The following shows the three virtual network devices for a given guest and the vswitches they connect into:

root@stella01# ldm ls -l stella04

<... output removed ...>

NETWORK

NAME        SERVICE                  DEVICE      MAC
vnet_pri    primary-vsw@primary      network@0   00:14:4f:f8:ad:46
vnet_bak    backup-vsw@primary       network@1   00:14:4f:f8:4d:64
vnet_alt    alternate-vsw@primary    network@2   00:14:4f:fb:f5:67

<... output removed ...>

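These virtual network devices would have been added to the guest with commands along these lines (a sketch using the names above, not output captured from a live build):

# attach one virtual network device per virtual switch to guest stella04
root@stella01# ldm add-vnet vnet_pri primary-vsw stella04
root@stella01# ldm add-vnet vnet_bak backup-vsw stella04
root@stella01# ldm add-vnet vnet_alt alternate-vsw stella04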
At a guest operating system level, the virtual networks will be presented as vnet devices with the device instance number matching

the DEVICE column in the ldm ls output:

root@stella04# grep network@ /etc/path_to_inst

"/virtual-devices@100/channel-devices@200/network@0" 0 "vnet"

"/virtual-devices@100/channel-devices@200/network@1" 1 "vnet"

"/virtual-devices@100/channel-devices@200/network@2" 2 "vnet"

Tying all the network layers together produces the following stack:

The control domain provides a backup-vsw@primary vswitch connected to the e1000g1 network adaptor.

On the control domain, the backup-vsw@primary vswitch is presented as vsw1 and configured as a network adaptor.

A vnet_bak virtual network device is created for the stella04 guest. This virtual network is connected to the backup-vsw@primary vswitch.

On the guest domain, a vnet1 device is configured as a network adaptor.

Supporting Technologies

Built automation - jumpstart

Solaris system provisioning is currently achieved via a combination of Jumpstart, post-build install scripts and manual configuration tasks. This process can be time-consuming and error-prone and should be reviewed as part of the Solaris Virtualisation program.

The driver for reviewing jumpstart is to provide a timely, repeatable and robust Solaris provisioning environment. One of the benefits of a virtualised operating environment is reduced provisioning time, and the move to a virtualised Solaris offering will be hampered without an automated Solaris provisioning environment to match.

The following are high-level Solaris build requirements that need to be met by an automated Solaris build environment. The list is not exhaustive, and the first point is expected to expand into a large collection of requirements in its own right:

All automation should be managed within jumpstart: boot net - install should provide an application-ready system.

A standard hardware layout should be used for each server type.

For security reasons, a minimal Solaris install should be performed rather than the full package set.

The use of Logical Domains introduces the following requirements of the jumpstart infrastructure:

Automatic system patching should include the Solaris and OBP patch requirements for LDOM 1.0.3.

For physical T5220 installs, the LDOM 1.0.3 software packages should be automatically installed along with the LDOM tools package.

The jumpstart configuration should support the smaller disk sizes presented by guest domains.

For guest domains, no root disk mirroring should be performed - virtual devices are already highly available at the control domain/SAN layer.

For guest domains, IPMP should be configured across vnet devices rather than e1000g devices (see the sketch after this list).

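The sketch below illustrates the last three requirements. The metacluster, slice layout, interface numbers and addresses are assumptions chosen to match the examples elsewhere in this document, not the agreed build standard.

A minimal guest domain jumpstart profile might look like:

# guest domain profile - minimal install, small virtual disk, no root mirroring (illustrative)
install_type    initial_install
cluster         SUNWCreq
partitioning    explicit
filesys         c0d0s1  4096    swap
filesys         c0d0s0  free    /

IPMP across vnet devices could then be expressed in the guest's interface configuration files, for example:

# /etc/hostname.vnet0 - data address plus non-failover test address on the primary network
stella04 netmask + broadcast + group MAIN up addif stella04-v0 deprecated -failover netmask + broadcast + up

# /etc/hostname.vnet2 - standby interface in the same IPMP group on the alternate network
stella04-v2 deprecated -failover netmask + broadcast + group MAIN standby up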
Monitoring

Solaris monitoring is currently managed with the HP OpenView toolset. There are concerns within the operational team that the current monitoring configuration does not fully monitor the Solaris operating environment.

The monitoring configuration should be reviewed to ensure that the physical device, operating system, network and storage layers

are all sufficiently monitored.

On top of the base Solaris monitoring requirements, Logical Domains introduce a number of monitoring requirements for both guest and control domains. It is expected that Logical Domain monitoring will be automatically included in the Solaris monitoring template.

The following are the monitoring requirements for logical domains:

A control domain can be detected automatically by the presence of the /opt/SUNWldm/bin/ldm binary.

A guest domain can be detected automatically by the presence of the '/virtual-devices.*"vnet"' pattern in /etc/path_to_inst (a minimal detection sketch follows this list).

For control domains, three SMF services must be in an online state:

svc:/platform/sun4v/drd:default

svc:/ldoms/ldmd:default

svc:/ldoms/vntsd:default

For guest domains, if the standard monitoring raises alerts on unmirrored disks, this should be disabled.

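A minimal sketch of how the detection and SMF checks above might be scripted is shown below. It is illustrative only - the warning format and script structure are assumptions, not the OpenView template:

#!/bin/sh
# Illustrative LDOM detection and health check - not the production monitoring template.
if [ -x /opt/SUNWldm/bin/ldm ]; then
    echo "`hostname` looks like a control domain"
    # all three LDOM-related SMF services must be online on a control domain
    for SVC in svc:/platform/sun4v/drd:default svc:/ldoms/ldmd:default svc:/ldoms/vntsd:default
    do
        STATE=`svcs -H -o state ${SVC} 2>/dev/null`
        if [ "${STATE}" != "online" ]; then
            echo "WARNING: ${SVC} is ${STATE:-not installed}"
        fi
    done
elif egrep '/virtual-devices.*"vnet"' /etc/path_to_inst > /dev/null 2>&1; then
    echo "`hostname` looks like a guest domain"
fi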
Capacity Management

At a basic level, the capacity requirements for Logical Domains are the same as for physical servers. Core resources such as CPU, memory, network and I/O must be recorded to identify over- or under-utilisation, along with trending to predict when resource thresholds may be breached.

With regard to CPU and memory, both are hard-allocated resources for a guest domain - there is no sharing between guests. As such, standard capacity management tools can be loaded onto the guest OS and will accurately report capacity data.

Capacity management for disk resources splits into two areas: utilisation and performance.

Storage utilisation should be captured and trended to predict when storage allocations will fill up. From a guest operating system

layer, storage is presented and utilised in the same way as for physical systems so standard capacity management tools will be

able to record and trend this data.

Gathering storage performance information at a guest domain level is not currently possible as the virtual disk driver does not

measure I/O activity or save kstats which could subsequently be read by the iostat command. This issue is being tracked as Sun

Bug ID 6503157. Storage performance monitoring will need to be implemented on the control domain, where the underlying physical storage resides. The use of veritas volumes to provide virtual storage for guest domains will assist in identifying heavy disk activity, as per-guest I/O counters are available via the vxstat command on the control domain.

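For example, per-guest I/O activity could be sampled on the control domain with something along these lines (the interval and volume list are illustrative):

# sample I/O statistics for stella04's volumes every 60 seconds
root@stella01# vxstat -g stella01_ldom -i 60 stella04.os stella04.dat01 stella04.dat02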
As for storage, capacity management for network resources splits into utilisation and performance. The virtual network drivers on the guest domain expose performance counters, meaning that standard capacity management tools loaded on the guest OS will accurately report capacity data. As the network layer is shared between all guest domains, network utilisation should also be measured on the control domain.

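As a hedged example, throughput could be sampled on both sides of the stack with the standard netstat interval mode (interface names as in the earlier examples; the interval is illustrative):

# backup network as seen from within the guest
root@stella04# netstat -I vnet1 30

# the same network measured at the virtual switch on the control domain
root@stella01# netstat -I vsw1 30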
The need to measure performance data on another host (the control domain) highlights an area where an accurate and automated inventory system is required to group guest and control domains together and assist the system administrator.

Backup and recovery

Veritas NetBackup 6.0 is the standard backup and recovery platform. LDOMs are supported by Symantec, as detailed in the NetBackup 6 OS Compatibility matrix.

Both control and guest domains should be added to NetBackup in the standard manner. Recovery scenarios and extra steps required for logical domains are detailed in the Logical Domain Administration section of this document.

Capacity Planning

Inventory

Outstanding work

The following related work streams need to be completed to allow the successful deployment of logical domains within fidelity:

Jumpstart

Monitoring

Inventory

Capacity planning

Operational readiness, training and handover

Tech questions

How much memory do we need per guest? Java can only use 2GB so we may only need to buy 16GB systems.

What size SAN luns do we use? 50GB are probably portable enough...

What size OS disks do we need? Talk to BAU to get indicative sizes.

Any additional storage

SYSldom package to be created and automatically delivered via jumpstart

Background and further reading

Jeff Savit's LDOMs Concepts and Examples is a good primer explaining the concepts of Logical Domains.

After reading the presentation, the Beginners Guide to LDoms is a walkthrough on how to configure LDOMs. This document refers to LDOMs 1.0 rather than 1.0.3, so some of the command output differs slightly, and 1.0.3 introduces some new capabilities (such as being able to use veritas volumes as virtual storage).

An alternative set of walkthrough documents is Octave Orgeron's series of articles: An Introduction to Logical Domains Part 1, Part 2, Part 3 and Part 4.

Finally, the Logical Domains 1.0.3 Administration Guide and associated Release Notes are the definitive and up-to-date reference

guides.

Appendix

cpuspin.master

#!/bin/sh
# cpuspin.master - spawn up to MAXCHILD copies of cpuspin.child to exercise the CPUs.
# If /tmp/stop exists, kill all running children and remove the flag file instead.
MAXCHILD=64

if [ -f /tmp/stop ]; then
    echo "`date '+%H:%M:%S'` Killing children on `hostname`..."
    KIDS=`ps -ef | egrep '[s]pin.child' | awk '{print $2}'`
    echo $KIDS | xargs -n 1 kill -9
    rm /tmp/stop
    exit
else
    echo "`date '+%H:%M:%S'` Starting to spawn cpu exercising scripts on `hostname`"
fi

# Keep spawning children until MAXCHILD copies of cpuspin.child are running.
while [ 1 ]
do
    CUR=`ps -ef | egrep '[c]puspin.child' | wc -l`
    if [ ${CUR} -lt ${MAXCHILD} ]; then
        nohup ./cpuspin.child > /dev/null 2>&1 &
    else
        echo "`date '+%H:%M:%S'` There are now ${MAXCHILD} \c"
        echo "cpu exercising scripts running on `hostname`"
        break
    fi
done

cpuspin.child

#!/bin/sh
# cpuspin.child - loop forever doing nothing, keeping one CPU thread busy.
while [ 1 ]
do
    :
done
