Copyright 2009 Peter Baer Galvin - All Rights Reserved
Solaris 10 Administration Topics Workshop 2 - Virtualization
By Peter Baer Galvin
For Usenix
Last Revision Apr 2009
Saturday, May 2, 2009
About the Speaker
Peter Baer Galvin - 781 273 4100
www.cptech.com
My Blog: www.galvin.info
Bio
Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete’s Super Systems, the systems administration column there. He is now Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions.
Objectives
Cover a wide variety of topics in Solaris 10
Useful for experienced system administrators
Save time
Avoid (my) mistakes
Learn about new stuff
Answer your questions about old stuff
Won't read the man pages to you
Workshop for hands-on experience and to reinforce concepts
Note – Security covered in separate tutorial
More Objectives
What makes a novice vs. an advanced administrator?
Bytes as well as bits, tactics and strategy
Knows how to avoid trouble
How to get out of it once in it
How to not make it worse
Has reasoned philosophy
Has methodology
Prerequisites
Recommend at least a couple of years of Solaris experience
Or at least a few years of other Unix experience
Best is a few years of admin experience, mostly on Solaris
About the Tutorial
Every SysAdmin has a different knowledge set
A lot to cover, but notes should make good reference
So some covered quickly, some in detail
Setting base of knowledge
Please ask questions
But let’s take off-topic off-line
Solaris BOF
Fair Warning
Sites vary
Circumstances vary
Admin knowledge varies
My goals
Provide information useful for each of you at your sites
Provide opportunity for you to learn from each other
Why Listen to Me
20 Years of Sun experience
Seen much as a consultant
Hopefully, you've used:
My Usenix ;login: column
The Solaris Corner @ www.samag.com
The Solaris Security FAQ
SunWorld “Pete's Wicked World”
SunWorld “Pete's Super Systems”
Unix Secure Programming FAQ (out of date)
Operating System Concepts (The Dino Book), now 8th ed
Applied Operating System Concepts
Slide Ownership
As indicated per slide, some slides copyright Sun Microsystems
Thanks to Jeff Victor for input
Feel free to share all the slides - as long as you don’t charge for them or teach from them for a fee
Overview
Lay of the Land
Schedule
Times and Breaks
Coverage
Solaris 10+, with some Solaris 9 where needed
Selected topics that are new, different, confusing, underused, overused, etc
Outline
Overview
Objectives
Virtualization choices in Solaris
Zones / Containers
LDOMS and Domains
VirtualBox
xVM (aka Xen)
Polling Time
Solaris releases in use?
Plans to upgrade?
Other OSes in use?
Use of Solaris rising or falling?
SPARC and x86
OpenSolaris?
Your Objectives?
Your Lab Environment
Apple Macbook Pro
3GB memory
Mac OS X 10.4.10
VMware Fusion 1.0
Solaris Nevada
50 Containers
Lab Preparation
Have a device capable of telnet on the USENIX network
Or have a buddy
Learn your “magic number”
Telnet to 131.106.62.100 + “magic number”
User “root”, password “lisa”
It’s all very secure
Lab Preparation
Or...
Use VirtualBox
Use your own system
Use a remote machine you have legit access to
Choosing Virtualization Technologies
(See separate “virtualization comparison” document)
[Diagram, Sun Microsystems Proprietary, Copyright 2008]
Axis: Multiple OS's to Single OS
Hard Partitions: Dynamic System Domains
Virtual Machines: Logical Domains, Sun xVM, Xen, VMware, Hyper-V
OS Virtualization: Solaris Containers (Zones + SRM), Containers for Linux, Solaris 8 Containers, Solaris 9 Containers
Resource Management: Solaris Resource Manager (SRM)
Trend to flexibility / Trend to isolation
Technology Sweet Spots
(Sun Microsystems Proprietary, Copyright 2008)
Solaris Containers: Reduces kernel instances, memory footprint, workload-driven CPU utilization, OS management costs, fine-grained security
Single-kernel, heterogeneous application environments
Dynamic System Domains: Maximizes hardware isolation
Logical Domains, Virtual Machines: Multiple kernels, full OS environments, heterogeneous
Technologies are complementary
Dual-Layer Server Virtualization
Solaris Containers and Virtual Machines
(Sun Microsystems Proprietary, Copyright 2008)
[Diagram: a computer hosting domains (Domain / LDom / Guest), each running several numbered Solaris Containers. Virtual machine layer by platform: x86 - Sun xVM; SPARC CMT - LDoms; SPARC - Dynamic Domains]
Zones, Containers, and LDOMS
Overview
Cover details and use of Zones/Containers and LDOMS
Note that Xen (x64 only) and VirtualBox (open source, x64 only) are coming
No slides yet
Zones Overview
Think of them as chroot on steroids
Virtualized operating system services
Isolated and “secure” environment for running apps
Apps and users (and superusers) in zone cannot see / affect other zones
Delegated admin control
Virtualized device paths, network interfaces, network ports, process space, resource use (via resource manager)
Application fault isolation
Detach and attach containers between systems
Cloning of a zone to create an identical new zone
Zones Overview - 2
Low physical resource use
Up to 8192 zones per system!
Differentiated file system
Multiple versions of an app installed and running on a given system
Inter-zone communication is only via network (but short-pathed through the kernel)
No application changes needed – no API or ABI changes
Can restrict disk use of a zone via the loopback file driver (lofi) using a file as a file system
Can dedicate an Ethernet port to a zone
Allows the zone to snoop, firewall, and manage that port
Other Virtualization Options
Many virtualization options to consider
Containers is just one of them
Xen (xVM) - being integrated into Solaris Nevada
Run other OSes (Linux, Windows) with S10+ as the host
Industry semi-standard
Para-virtualization, x86 only
LDOMs - hard partitions, shipped in May 2007
Run multiple copies of Solaris on the same CoolThreads chip (Niagara, Rock in the future)
Some resource management - move CPUs and memory
VMware - Solaris as a guest, not a host so far, x86 only
Traditional Sun Domains - SPARC only, Enterprise servers only
[Figure: zones block diagram. A global zone (zone root /) runs core services, platform administration, remote admin/monitoring, and zone management (zonecfg, zoneadm, zlogin); non-global zones with zone roots /zone/web, /zone/mysql, and /zone/app each have their own application environment on a virtual platform (network devices such as hme0 and ce0, zcons console, /usr, zoneadmd)]
(From the Solaris 10 Sun Net Talk about Solaris 10 Security)
Zone Limits
Only one OS installed on a system
One set of OS patches
Only one /etc/system
Although Sun is working to move as many settings as possible out of /etc/system
System crash / OS crash -> all zones crash
Each (sparse) zone uses
~ 100MB of disk
some VM and physical memory (for processes and daemons running in the zone) - ~40MB of physical memory
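The per-zone figures above make for quick back-of-the-envelope sizing. The snippet below is only a sketch: zone_footprint is a hypothetical helper (not a Solaris command), and it simply multiplies the approximate numbers quoted on this slide, so treat the result as a rough floor rather than a measurement.

```shell
#!/bin/sh
# Rough capacity estimate for N sparse-root zones, using the per-zone
# figures quoted above (~100MB of disk, ~40MB of physical memory).
# zone_footprint is a hypothetical helper, not a Solaris command.
zone_footprint() {
    zones=$1
    disk_mb=$((zones * 100))
    mem_mb=$((zones * 40))
    echo "${zones} zones: ~${disk_mb}MB disk, ~${mem_mb}MB memory"
}

# The lab environment in this deck runs 50 containers.
zone_footprint 50
```

Real usage depends on what runs inside each zone; the estimate covers only the idle overhead described above.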
Sparse vs. Whole Root Zone
Sparse
Loop-back mount of system directories (/usr, etc)
Little disk space use
Each zone shares global-zone system-binaries -> shared memory
Apps may not be supported
Cannot change system files
Inter-zone communication only via network
Whole-Root
Full install of all system files
Lots of disk space
Each binary independent -> memory use
Apps may not be supported (but more likely)
Can change system files
Inter-zone communication only via network
[Figure: Zone File Systems. Global view: global root / holds /usr, /dev, ... and /zones, where each zone root lives (e.g. /zones/zone1). Zone view: the zone sees its zone root as / with /bin, /usr, /dev, etc.]
[Figure: Zone File Systems - Sparse Root. The same global and zone views for a sparse-root zone]
Global Zone
Aka the usual system
Is assigned ID 0 by the system
Provides the single instance of the Solaris kernel that is bootable and running on the system
Contains a complete installation of the Solaris system software packages
Can contain additional software packages or additional software, directories, files, and other data not installed through packages
Global Zone - 2
Provides a complete and consistent product database that contains information about all software components installed in the global zone
Holds configuration information specific to the global zone only, such as the global zone host name and file system table
Is the only zone that is aware of all devices and all file systems
Global Zone - 3
Is the only zone with knowledge of non-global zone existence and configuration
Is the only zone from which a non-global zone can be configured, installed, managed, or uninstalled
Can see the file systems of the non-global zones (i.e. can copy files into the non-global zone roots for the non-global zones to see)
Non-global Zones
Is assigned a zone ID by the system when the zone is booted
Shares operation under the Solaris kernel booted from the global zone
Contains an installed subset of the complete Solaris Operating System software packages
Contains Solaris software packages shared from the global zone (“sparse zone”)
Can contain additional installed software packages not shared from the global zone
Non-global Zones - 2
Can contain additional software, directories, files, and other data created on the non-global zone that are not installed through packages or shared from the global zone
Has a complete and consistent product database that contains information about all software components installed on the zone, whether present on the non-global zone or shared read-only from the global zone
Is not aware of the existence of any other zones
Cannot install, manage, or uninstall other zones, including itself
Has configuration information specific to that non-global zone only, such as the non-global zone host name and file system table
“Sparse” and “Whole Root” Zones
By default /lib, /platform, /sbin, /usr are LOFS read-only mounted from global zone into child zone
Ergo those can’t be modified by child zone
Packages installed in child zone install only the components outside /lib, /platform, /sbin, /usr into the child zone’s file systems
Saves disk space
Saves memory
Whole root zone removes those mounts
Packages install entirely
Ergo child zone can modify its /lib, /platform, /sbin, /usr
Some apps not supported in zones, some only in whole root, some in sparse root
Per app check with app vendor!
Note that ZFS clone use for zone builds may mean that sparse root is no longer useful!
Non-global Zone States
Configured - The zone’s configuration is complete and committed to stable storage, not initially booted
Incomplete - During an install or uninstall operation
Installed - The zone’s configuration is instantiated on the system but no virtual platform. Files copied into zoneroot.
Ready - The virtual platform for the zone is established. The kernel creates the zsched process, network interfaces are plumbed, file systems are mounted, and devices are configured. A unique zone ID is assigned by the system, no processes associated with the zone have been started.
Running - User processes associated with the zone application environment are running.
Shutting down and Down - These states are transitional states that are visible while the zone is being halted. However, a zone that is unable to shut down for any reason will stop in one of these states.
(From System Administration Guide: N1 Grid Containers, Resource Management, and Solaris Zones)
Zone boot
Note that zoneadm allows “boot”, “reboot”, “halt”, and “shutdown”. Only “shutdown” and “boot” execute the smf commands
Also note that there are many options to these commands (such as zoneadm boot -- -m verbose)
Zone Configuration
Data from the following are not referenced or copied when a zone is installed:
Non-installed packages
Patches
Data on CDs and DVDs
Network installation images
Any prototype or other instance of a zone
In addition, the following types of information, if present in the global zone, are not copied into a zone that is being installed:
New or changed users in the /etc/passwd file
New or changed groups in the /etc/group file
Configurations for networking services such as DHCP address assignment, UUCP, or sendmail
Configurations for network services such as naming services
New or changed crontab, printer, and mail files
System log, message, and accounting files
Zone Configuration
zlogin -C logs in to the console of a just-booted virgin zone
Only root can zlogin – normal zone access is via network
The usual sysidconfig questions are asked (hostname, name service, timezone, kerberos)
The zone root directory must exist prior to zone installation
Zone reboots to put configuration changes into effect (a few seconds)
Messages look like a system reboot (within your window)
sysidcfg
Create to shorten first boot questions
File gets copied into <zonehome>/root/etc
Sample contents:
name_service=DNS {domain_name=petergalvin.info
                  name_server=63.240.76.19
                  search=arp.com}
network_interface=PRIMARY {hostname=zone00.petergalvin.info}
timezone=US/Eastern
terminal=vt100
system_locale=C
timeserver=localhost
root_password=aMG0YPkgZQPqo <obviously change this>
security_policy=NONE
nfsv4_domain=dynamic
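When several zones need the same answers, the file can be stamped out by a small script. The following is a minimal sketch, assuming the keywords from the sample above: write_sysidcfg and its arguments are hypothetical, the password value is a placeholder to be replaced with a real encrypted string, and the demo writes to a scratch directory rather than a real zonepath.

```shell
#!/bin/sh
# Sketch: write a sysidcfg for one zone into <zonepath>/root/etc.
# write_sysidcfg and its arguments are hypothetical; the keywords are
# the ones shown in the sample contents above.
write_sysidcfg() {
    zonepath=$1      # e.g. /opt/zone/app1
    hostname=$2      # e.g. zone00.petergalvin.info
    mkdir -p "$zonepath/root/etc" || return 1
    cat > "$zonepath/root/etc/sysidcfg" <<EOF
name_service=DNS {domain_name=petergalvin.info
                  name_server=63.240.76.19
                  search=arp.com}
network_interface=PRIMARY {hostname=$hostname}
timezone=US/Eastern
terminal=vt100
system_locale=C
timeserver=localhost
root_password=REPLACE_WITH_ENCRYPTED_PASSWORD
security_policy=NONE
nfsv4_domain=dynamic
EOF
}

# Demo against a scratch directory instead of a real zonepath.
demo_root=$(mktemp -d)
write_sysidcfg "$demo_root" zone00.petergalvin.info
grep hostname "$demo_root/root/etc/sysidcfg"
```

On a real system the first argument would be the zonepath of an installed but not yet booted zone.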
Zone Configuration - 2
# zonecfg -z app1
app1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:app1> create
zonecfg:app1> set zonepath=/opt/zone/app1
zonecfg:app1> set autoboot=false
zonecfg:app1> add net
zonecfg:app1:net> set physical=pnc0
zonecfg:app1:net> set address=192.168.118.140
zonecfg:app1:net> end
zonecfg:app1> add fs
zonecfg:app1:fs> set dir=/export/home
zonecfg:app1:fs> set special=/export/home
zonecfg:app1:fs> set type=lofs
zonecfg:app1:fs> end
zonecfg:app1> add inherit-pkg-dir
zonecfg:app1:inherit-pkg-dir> set dir=/opt/sfw
zonecfg:app1:inherit-pkg-dir> end
zonecfg:app1> verify
zonecfg:app1> commit
zonecfg:app1> exit
Zone Configuration - 3
# df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0d0s0 5678823 2689099 2932936 48% /
/devices 0 0 0 0% /devices
/dev/dsk/c0d0p0:boot 10296 1401 8895 14% /boot
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
fd 0 0 0 0% /dev/fd
swap 600780 28 600752 1% /var/run
swap 600776 24 600752 1% /tmp
/dev/dsk/c0d0s7 4030684 32853 3957525 1% /export/home
# zoneadm -z app1 verify
WARNING: /opt/zone/app1 does not exist, so it cannot be verified.
When 'zoneadm install' is run, 'install' will try to create
/opt/zone/app1, and 'verify' will be tried again,
but the 'verify' may fail if:
the parent directory of /opt/zone/app1 is group- or other-writable
or
/opt/zone/app1 overlaps with any other installed zones.
could not verify net address=192.168.118.140 physical=pnc0: No such device or address
zoneadm: zone app1 failed to verify
Zone Configuration - 4
# ls -l /opt/zone
total 2
drwx------ 4 root other 512 Aug 21 12:44 test
# mkdir /opt/zone/app1
# chmod 700 /opt/zone/app1
# ls -l /opt/zone
total 4
drwx------ 2 root other 512 Sep 16 15:14 app1
drwx------ 4 root other 512 Aug 21 12:44 test
# zoneadm -z app1 verify
could not verify net address=192.168.118.140 physical=pnc0: No such device or address
zoneadm: zone app1 failed to verify
# zonecfg -z app1
zonecfg:app1> info
zonepath: /opt/zone/app1
autoboot: false
Zone Configuration - 5
net:
address: 192.168.118.140
physical: pnc0
zonecfg:app1> remove net physical=pnc0
zonecfg:app1> add net
zonecfg:app1:net> set physical=pcn0
zonecfg:app1:net> set address=192.168.118.140
zonecfg:app1:net> end
zonecfg:app1> exit
# zoneadm -z app1 verify
# zoneadm -z app1 install
Preparing to install zone <app1>.
Creating list of files to copy from the global zone.
Copying <2199> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <779> packages on the zone.
Initializing package <0> of <779>: percent complete: 0%. . .
Zone Configuration - 6
Zone <app1> is initialized.
The file </opt/zone/app1/root/var/sadm/system/logs/install_log> contains a log of the zone installation.
# zoneadm list -v
ID NAME STATUS PATH
0 global running /
1 test running /opt/zone/test
# df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0d0s0 5678823 2766177 2855858 50% /
/devices 0 0 0 0% /devices
/dev/dsk/c0d0p0:boot 10296 1401 8895 14% /boot
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
fd 0 0 0 0% /dev/fd
swap 594332 32 594300 1% /var/run
swap 594500 200 594300 1% /tmp
/dev/dsk/c0d0s7 4030684 32853 3957525 1% /export/home
Zone Configuration - 7
# zoneadm -z app1 boot
zoneadm: zone 'app1': WARNING: pcn0:2: no matching subnet found in netmasks(4) for 192.168.118.131; using default of 192.168.118.131.
# zoneadm list -v
ID NAME STATUS PATH
0 global running /
1 test running /opt/zone/test
2 app1 running /opt/zone/app1
# telnet 192.168.118.140
Trying 192.168.118.140...
telnet: Unable to connect to remote host: Connection refused
# zlogin -C app1
[Connected to zone 'app1' console]
Select a Locale
0. English (C - 7-bit ASCII)
1. U.S.A. (UTF-8)
2. Go Back to Previous Screen
Please make a choice (0 - 2), or press h or ? for help: 0
. . .
Zone Configuration - 8
rebooting system due to change(s) in /etc/default/init
[NOTICE: Zone rebooting]
SunOS Release 5.10 Version s10_63 32-bit
Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: zone-app1
The system is coming up. Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Sep 16 15:48:24 zone-app1 sendmail[7567]: My unqualified host name (zone-app1) unknown; sleeping for retry
Sep 16 15:49:24 zone-app1 sendmail[7567]: unable to qualify my own domain name (zone-app1) -- using short name
WARNING: local host name (zone-app1) is not qualified; see cf/README: WHO AM I?
/etc/mail/aliases: 12 aliases, longest 10 bytes, 138 bytes total
Zone Configuration - 9
Creating new rsa public/private host key pair
Creating new dsa public/private host key pair
The system is ready.
zone-app1 console login: root
Password:
Sep 16 15:51:08 zone-app1 login: ROOT LOGIN /dev/console
Sun Microsystems Inc. SunOS 5.10 s10_63 May 2004
# cat /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
. . .
noaccess:x:60002:60002:No Access User:/:
nobody4:x:65534:65534:SunOS 4.x NFS Anonymous Access User:/:
Zone Configuration - 10
# useradd -u 101 -g 14 -d /export/home/pbg -s /bin/bash pbg
# passwd pbg
New Password:
Re-enter new Password:
passwd: password successfully changed for pbg
# zoneadm list -v
ID NAME STATUS PATH
3 app1 running /
# exit
zone-app1 console login: ~.
[Connection to zone 'app1' console closed]
Zone Configuration - 11
# zoneadm list -v
ID NAME STATUS PATH
0 global running /
1 test running /opt/zone/test
3 app1 running /opt/zone/app1
# uptime
3:53pm up 5:14, 1 user, load average: 0.23, 0.34, 0.43
# telnet 192.168.118.140
Trying 192.168.118.140...
Connected to 192.168.118.140.
Escape character is '^]'.
Login: pbg
Password:
Zones and ZFS
Installing a zone with its root on ZFS is not supported as the system then lacks the ability to be upgraded.
Note that “add fs” can be used to add access to a ZFS file system to a zone
Beyond that, “add dataset” delegates a ZFS file system to a zone, removes it from the global zone
The zone can manage the file system, except where management would affect other file systems / parent file system
Filesystem contents can still be seen from global zone via zonepath+mountpoint (e.g. /zones/zone00/zfs/zonefs/zone00)
# zfs create zfs/zonefs/zone00
# zonecfg -z zone00
zonecfg:zone00> add dataset
zonecfg:zone00:dataset> set name=zfs/zonefs/zone00
zonecfg:zone00:dataset> end
Zone Script
create -b
set zonepath=/opt/zones/zone0
set autoboot=false
add inherit-pkg-dir
set dir=/lib
end
add inherit-pkg-dir
set dir=/platform
end
add inherit-pkg-dir
set dir=/sbin
end
Zone Script
add inherit-pkg-dir
set dir=/usr
end
add inherit-pkg-dir
set dir=/opt/sfw
end
add net
set address=192.168.128.200
set physical=pcn0
end
add rctl
set name=zone.cpu-shares
add value (priv=privileged,limit=1,action=none)
end
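A command file like the one above can be generated per zone instead of edited by hand. This is a sketch under assumptions: make_zone_script is a hypothetical helper, it emits a shortened version of the script (only the /usr inherit-pkg-dir and the net resource), and the interface name and address are placeholders in the style of the slides. The last line only prints the zonecfg invocation, using the -f form shown later in this deck.

```shell
#!/bin/sh
# Sketch: emit a zonecfg command file like the script above for one zone.
# make_zone_script is a hypothetical helper; pcn0 and the address are
# placeholders in the style of the slides.
make_zone_script() {
    zone=$1
    address=$2
    cat <<EOF
create -b
set zonepath=/opt/zones/$zone
set autoboot=false
add inherit-pkg-dir
set dir=/usr
end
add net
set address=$address
set physical=pcn0
end
EOF
}

cfg=$(mktemp)
make_zone_script zone1 192.168.128.201 > "$cfg"
# On a Solaris 10 global zone the file would then be applied with:
echo "zonecfg -z zone1 -f $cfg"
```

Keeping the generated files around also helps later, since the same configuration is needed when a zone is attached to another system.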
Life in a Zone
# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
lo0:1: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
zone test
inet 127.0.0.1 netmask ff000000
lo0:2: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
zone app1
inet 127.0.0.1 netmask ff000000
pcn0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
inet 192.168.80.128 netmask ffffff00 broadcast 192.168.80.255
ether 0:c:29:44:a9:df
pcn0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
zone test
inet 192.168.80.139 netmask ffffff00 broadcast 192.168.80.255
pcn0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
zone app1
inet 192.168.80.140 netmask ffffff00 broadcast 192.168.80.255
Life in a Zone - 2
$ telnet 192.168.80.140
. . .
$ df -k
Filesystem kbytes used avail capacity Mounted on
/ 9515147 1894908 7525088 21% /
/dev 9515147 1894908 7525088 21% /dev
/export/home 10076926 10369 9965788 1% /export/home
/lib 9515147 1894908 7525088 21% /lib
/platform 9515147 1894908 7525088 21% /platform
/sbin 9515147 1894908 7525088 21% /sbin
/usr 9515147 1894908 7525088 21% /usr
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
fd 0 0 0 0% /dev/fd
swap 1043072 16 1043056 1% /var/run
swap 1043056 0 1043056 0% /tmp
$ touch /usr/foo
touch: /usr/foo cannot create
Note that virtual memory (and therefore swap) is a global resource
Life in a Zone - 3
$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 11120 11120 0 11:00:35 ? 0:00 zsched
pbg 11377 11347 0 11:01:28 pts/8 0:00 ps -ef
root 11229 11120 0 11:00:40 ? 0:00 /usr/sbin/cron
root 11341 11120 0 11:00:46 ? 0:00 /usr/sfw/sbin/snmpd
root 11266 11120 0 11:00:41 ? 0:00 /usr/lib/im/htt -port 9010 -syslog -message_locale C
root 11339 11336 0 11:00:46 ? 0:00 /usr/lib/saf/ttymon
root 11250 11120 0 11:00:41 ? 0:00 /usr/lib/utmpd
root 11264 11261 0 11:00:41 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 11261 11120 0 11:00:41 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 11227 11120 0 11:00:40 ? 0:00 /usr/sbin/nscd
root 11218 11120 0 11:00:40 ? 0:00 /usr/lib/autofs/automountd
root 11325 11120 0 11:00:45 ? 0:00 /usr/lib/dmi/snmpXdmid -s zone-app1
root 11239 11120 0 11:00:40 ? 0:00 /usr/lib/sendmail -bd -q15m
root 11265 11261 0 11:00:41 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 11230 11120 0 11:00:40 ? 0:00 /usr/sbin/inetd -s
root 11273 11266 0 11:00:42 ? 0:00 htt_server -port 9010 -syslog -message_locale C
root 11129 11120 0 11:00:36 ? 0:00 init
Life in a Zone - 4
# mount -p
/ - / ufs - no rw,intr,largefiles,logging,xattr,onerror=panic
/dev - /dev lofs - no zonedevfs
/export/home - /export/home lofs - no
/lib - /lib lofs - no ro,nodevices,nosub
/platform - /platform lofs - no ro,nodevices,nosub
/sbin - /sbin lofs - no ro,nodevices,nosub
/usr - /usr lofs - no ro,nodevices,nosub
proc - /proc proc - no nodevices,zone=app1
mnttab - /etc/mnttab mntfs - no nodevices,zone=app1
fd - /dev/fd fd - no rw,nodevices,zone=app1
swap - /var/run tmpfs - no nodevices,xattr,zone=app1
swap - /tmp tmpfs - no nodevices,xattr,zone=app1
# hostname
zone-app1
# zonename
app1
Zone Clone
As of S10 8/07, zones are “cloneable”
Much faster than installing a zone
As of 10/08 zones on ZFS -> ZFS clone - instantaneous
Usable only if the zones have similar configs
Configure a zone, e.g. zone00
Install the zone
Configure a new zone, e.g. zone01
Then rather than zoneadm install, with zone00 halted, do
# zoneadm -z zone01 clone -m copy zone00
Zone Clone (cont)
A cloned zone is unconfigured and must be configured
When ZFS is used as the clone file system
# zoneadm -z <newzone> clone <oldzone>
Can clone a zone’s previously-taken snapshot via
# zoneadm -z <newzone> clone -s <snapshot name> <oldzone>
Zone Clone (cont)
So to clone zone1 to make zone2
# zonecfg -z zone1 export -f configfile
Edit configfile to change zonepath and address (at least)
Create zone2 via zonecfg -z zone2 -f configfile
Halt zone1 via zoneadm -z zone1 halt
Clone zone1 via zoneadm -z zone2 clone zone1
Use “-m copy” if zone1 on UFS
Boot up both zones
Check status via zoneadm list -iv
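The steps above can be strung together in a script. What follows is a sketch, not a tested procedure: clone_zone and run are hypothetical helpers, the export-and-edit of configfile is assumed to have been done by hand first, and by default the script only prints the commands so it can be read on any system.

```shell
#!/bin/sh
# Sketch of the clone steps above. By default it only prints the commands;
# set DRY_RUN=0 on a Solaris 10 global zone to execute them.
# clone_zone and run are hypothetical helpers.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

clone_zone() {
    src=$1
    dst=$2
    cfg=$3          # command file exported from $src, then hand-edited
    run zonecfg -z "$dst" -f "$cfg"
    run zoneadm -z "$src" halt
    run zoneadm -z "$dst" clone "$src"    # add "-m copy" if $src is on UFS
    run zoneadm -z "$src" boot
    run zoneadm -z "$dst" boot
    run zoneadm list -iv
}

clone_zone zone1 zone2 configfile
```

Printing first and executing second is a cheap way to avoid halting the wrong zone.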
Zone Migration
Zones can be moved between like systems
Available S10 8/07
Separate the zone from its current system
# zoneadm -z <zone> detach
Note zone must be halted first
Attach a detached zone to a different system (assuming its file system is now visible there, send a tarball, etc)
# zoneadm -z <zone> attach [-F]
Note zone must be configured before this can work
Note new system is validated to assure the zone can function there
To create a config for a zone that is detached rather than having to zonecfg it from scratch
# zonecfg -z <zone> create -a zonepath
Zone Migration (cont)
Can dry-run an attach / detach via the “-n” option to see if the attach will work
Can upgrade the attaching zone on the attaching system via “-u” but only if all packages on the attaching system are as new or newer than the detaching system
Can force an attach if a detach could not be done (dead system for example)
Best to save your zone cfg files for use on the attach system (or you have to recreate them)
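The migration sequence can be written down as a per-host checklist. This is a minimal sketch: migration_plan is a hypothetical helper that only prints the commands named on these slides, and the step that makes the zonepath visible on the target (shared storage, tarball, and so on) is left as a comment because it is site-specific.

```shell
#!/bin/sh
# Sketch of a zone move, printing the commands for each host.
# migration_plan is a hypothetical helper; moving the zonepath between
# systems is left as a comment because it is site-specific.
migration_plan() {
    zone=$1
    zonepath=$2
    echo "# on the source system"
    echo "zoneadm -z $zone halt"
    echo "zoneadm -z $zone detach"
    echo "# make $zonepath visible on the target system"
    echo "# on the target system"
    echo "zonecfg -z $zone create -a $zonepath"
    echo "zoneadm -z $zone attach"
    echo "zoneadm -z $zone boot"
}

migration_plan zone00 /opt/zones/zone00
```

If the saved zonecfg command file is available on the target, it can be used in place of the create -a step.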
Other Cool Zone Stuff
ps -Z shows zone in which each process is running
Can use resource manager with zones
Zones can use global naming services
Use features to enable or disable accounts per zone
Interzone networking executed via loopback for performance
Labs
Create a “simple” zone
Install it
Boot it
Configure it
Look around in it - file systems, processes, resource use, users, etc
Halt it
Zones and DTrace
Zones can get some DTrace privileges (starting 11/06)
# zonecfg -z my-zone
zonecfg:my-zone> set limitpriv="default,dtrace_proc,dtrace_user"
zonecfg:my-zone> exit
DTrace can use zonenames as predicates to filter results
# dtrace -n 'syscall:::/zonename=="zone1"/
{@[probefunc]=count()}'
Fair-share Scheduling
Solaris has many scheduler classes available
A thread has priority 0-169, user threads are 0-59
The higher the priority, the sooner scheduled on CPU
Scheduler class decides how the priority is modified over time
Default user-land is Time-sharing
Time-sharing dynamically changes the priority of each thread based on its activity
If a thread uses up its time quantum, its priority decreases
(The quantum is the scheduling interval)
Kernel uses “sys” class
Have a look via ps -elfc
Fair-share Scheduling
[Figure: Fair Share Scheduler. Four workloads (Backup, App Server, Database, Web) hold 1, 2, 3, and 4 shares among them. Source: Sun Microsystems, 2007]
Database gets 4 / (4+3+2+1) = 40% of all CPU time available to the container
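The share arithmetic above generalizes to any set of share counts. Below is a small sketch, assuming bash (or ksh); fss_percent is a hypothetical helper, not a Solaris command. It reports the entitlement when every listed workload is busy. FSS only enforces shares under contention, so an otherwise idle machine lets one workload use more.

```shell
#!/bin/bash
# Sketch only: fss_percent is a hypothetical helper, not part of Solaris.
# Prints the integer percentage of CPU a workload is entitled to under FSS
# when all listed workloads are competing for CPU.
# Usage: fss_percent <my-shares> <shares of every competing workload, mine included>
fss_percent() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    if [ "$total" -eq 0 ]; then
        echo "no shares given" >&2
        return 1
    fi
    # Integer division truncates, e.g. 1 of 6 shares prints 16.
    echo $((100 * mine / total))
}

fss_percent 4 4 3 2 1   # the Database case: 4 of 10 shares, prints 40
fss_percent 1 4 3 2 1   # a workload holding 1 of 10 shares, prints 10
```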
Zones and Fair Share Scheduling
FSS allows all CPU to be used if needed, but overuse to be limited based on “shares” given to CPU users
Shares are given to projects et al, and/or to containers
Load the fair share scheduler as the default scheduling class
dispadmin -d FSS
Move all processes into the FSS class
priocntl -s -c FSS -i class TS
Give the global zone some (2) shares
Note this is not persistent across reboots!
prctl -n zone.cpu-shares -v 2 -r -i zone global
Zones and Fair-share scheduling (2)
Check the shares of the global zone
prctl -n zone.cpu-shares -i zone global
Add a zone-wide resource control (1 share) to a zone (within zonecfg) (before S10U5)
zonecfg:my-zone> add rctl
zonecfg:my-zone:rctl> set name=zone.cpu-shares
zonecfg:my-zone:rctl> add value (priv=privileged,limit=1,action=none)
zonecfg:my-zone:rctl> end
How many total shares are given out on a given machine?
FX Scheduler
Time-sharing is a heavyweight scheduler
Has to recalculate the priority of every thread that ran in the last quantum, every quantum
Plus decreases priority on CPU hogs
Instead consider “FX” - fixed-priority scheduler class
All priorities stay the same
Lightweight scheduler can gain back a few percent of CPU
Dynamic Resource Pools
(Slide source: Sun Microsystems, 2007)
Designed to group chosen resources such as CPUs, memory, I/O connections
A pool can be associated with CPUs and a scheduler
CPUs can be assigned:
dynamically, by configuring a minimum and maximum number of CPUs that a zone or pool should use
by Solaris when it decides to transfer CPUs among existing pools with 'threshold' and 'importance' parameters
statically, by 'pinning' a CPU to a pool - useful to ensure that a process stays on a CPU and doesn't share the CPU's cache
A CPU is moved between pools when an 'important' workload surpasses its utilization threshold for a sufficient period of time
Dynamic Resource Pools
(Slide source: Sun Microsystems, 2007)
There is one pool configuration per Solaris instance
By default, one pool exists, "pool_default"
These can be bound to a pool:
Process, task, project, Container
A Container can be statically assigned to an existing "shared" pool when the Container boots
Multiple Containers can share that pool
Such a Container only uses resources when it is running
A Container can be assigned to a temporary pool
Pool only exists while Container runs
That pool cannot be shared with other Containers
DRPs
You can make “DRP”s non-dynamic by not including a variation in the range (i.e. 2 to 2 rather than 1 to 2)
Probably preferred rather than real dynamic
With pools, interrupts and I/O only occur in the default pool
This can help pin a process to a set of CPUs
Cache stays hot, less context switching
So consider a DRP config with the kernel in the default pool and all apps in another pool
Zones and Dynamic Resource Pools
Assign zones to dedicated CPU resources
Used to assign zone to processor set
Can be dynamically created, deleted, modified
Can be used with FSS
Can be used to reduce Oracle (and other?) costs!
Consider two DRPs, one with an email container and one with 2 X web server containers (and global) (from http://www.sun.com/software/solaris/howtoguides/containersLowRes.jsp):
Zones and DRPs (cont)
Create a pool (from global zone) via
# # enable DRPs
# pooladm -e
# # save current config
# pooladm -s
# # show current state, at start only pool_default exists
global# pooladm
system my_system
    string system.comment
    int system.version 1
    boolean system.bind-default true
    int system.poold.pid 638
    pool pool_default
        int pool.sys_id 0
        boolean pool.active true
        boolean pool.default true
        int pool.importance 1
        string pool.comment
        pset pset_default
Zones and DRPs (cont)
    pset pset_default
        int pset.sys_id -1
        boolean pset.default true
        uint pset.min 1
        uint pset.max 65536
        string pset.units population
        uint pset.load 7
        uint pset.size 8
        string pset.comment
        cpu
            int cpu.sys_id 1
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 0
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 3
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 2
            string cpu.comment
            string cpu.status on-line
Zones and DRPs (cont)
Create a new one-CPU processor set called email-pset
# poolcfg -c 'create pset email-pset (uint pset.min=1; uint pset.max=1)'
Create a resource pool for the processor set
# poolcfg -c 'create pool email-pool'
Link the pool to the processor set
# poolcfg -c 'associate pool email-pool (pset email-pset)'
Set an objective (if including a range of processors, i.e. min <> max)
# poolcfg -c 'modify pset email-pset (string pset.poold.objectives="wt-load")'
Activate the configuration
# pooladm -c
Zones and DRPs (cont)
Check the config
# pooladm
system my_system
    string system.comment
    int system.version 1
    boolean system.bind-default true
    int system.poold.pid 638
    pool email-pool
        int pool.sys_id 1
        boolean pool.active true
        boolean pool.default false
        int pool.importance 1
        string pool.comment
        pset email
    pool pool_default
        int pool.sys_id 0
        boolean pool.active true
        boolean pool.default true
        int pool.importance 1
        string pool.comment
        pset pset_default
    pset email-pset
        int pset.sys_id 1
        boolean pset.default false
        uint pset.min 1
        uint pset.max 1
        string pset.units population
        uint pset.load 0
        uint pset.size 1
        string pset.comment
        cpu
            int cpu.sys_id 0
            string cpu.comment
            string cpu.status on-line
Zones and DRPs (cont)
Check the config
    pset pset_default
        int pset.sys_id -1
        boolean pset.default true
        uint pset.min 1
        uint pset.max 65536
        string pset.units population
        uint pset.load 7
        uint pset.size 7
        string pset.comment
        cpu
            int cpu.sys_id 1
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 3
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 2
            string cpu.comment
            string cpu.status on-line
DRPs
Note that you can give ranges of CPUs to be used in DRPs
If you do, be sure to set an “objective”, else nothing will be dynamic
Note that some software licenses allow licensing of the app for only those CPUs in the DRP that the zone is attached to (i.e. only pay for your DRP CPUs, not all CPUs)(!)
Zones and DRPs (cont)
Now enable FSS, make it default for pool_default
# poolcfg -c 'modify pool pool_default (string pool.scheduler="FSS")'
Create an instance of the configuration
# pooladm -c
Move all the processes in the default pool and its associated zones under the FSS.
# priocntl -s -c FSS -i class TS
# priocntl -s -c FSS -i pid 1
Now have the zones use the DRPs
# zonecfg -z email-zone
zonecfg:email-zone> set pool=email-pool
# zonecfg -z Web1-zone
zonecfg:Web1-zone> set pool=pool_default
zonecfg:Web1-zone> add rctl
zonecfg:Web1-zone:rctl> set name=zone.cpu-shares
zonecfg:Web1-zone:rctl> add value (priv=privileged,limit=3,action=none)
zonecfg:Web1-zone:rctl> end
# zonecfg -z Web2-zone
zonecfg:Web2-zone> set pool=pool_default
zonecfg:Web2-zone> add rctl
zonecfg:Web2-zone:rctl> set name=zone.cpu-shares
zonecfg:Web2-zone:rctl> add value (priv=privileged,limit=2,action=none)
zonecfg:Web2-zone:rctl> end
Zones, Resources, and S10 8/07
Much simpler now if you just want a zone to have dedicated CPUs, memory limits
(From http://blogs.sun.com/jerrysblog/feed/entries/atom?cat=%2FSolaris)
zonecfg:my-zone> set scheduling-class=FSS
zonecfg:my-zone> add dedicated-cpu
zonecfg:my-zone:dedicated-cpu> set ncpus=1-4
zonecfg:my-zone:dedicated-cpu> set importance=10
zonecfg:my-zone:dedicated-cpu> end
zonecfg:my-zone> add capped-memory
zonecfg:my-zone:capped-memory> set physical=50m
zonecfg:my-zone:capped-memory> set swap=128m
zonecfg:my-zone:capped-memory> set locked=10m
zonecfg:my-zone:capped-memory> end
You have to enable poold via svcadm if “importance” is used
Still use dispadmin to set system-wide scheduling
Zones, Resources, and S10 8/07 (cont)
Can use zonecfg for the global zone to persistently set resource management settings in global
Now can set other zone-wide resource limits easily
zone.cpu-shares
zone.max-locked-memory (the locked property of the capped-memory resource is preferred)
zone.max-lwps
zone.max-msg-ids
zone.max-sem-ids
zone.max-shm-ids
zone.max-shm-memory
zone.max-swap (the swap property of the capped-memory resource is the preferred way to set this control)
Zones and Networking S10 8/07
Can now create exclusive-IP zones (i.e. dedicate a network port to a zone) known as “IP Instances”
Need this if you want advanced networking features in a zone (firewalls, snooping, DHCP client, traffic shaping)
Each zone gets its own IP stack (and soon xVM will too)
zonecfg:my-zone> set ip-type=exclusive
zonecfg:my-zone> add net
zonecfg:my-zone:net> set physical=e1000g1
zonecfg:my-zone:net> end
Now the zone can set its own IP address et al, can do IPMP within a zone
“zonecfg set physical=” to one of the interfaces in an IPMP group
Project Crossbow will allow a virtual NIC to be the IP instance entity (no longer tying up an Ethernet port)
Limited to Ethernet devices that use GLDv3 drivers (dladm show-link not reporting “legacy”)
Zones, Resources and 5/08
CPU Caps
Can limit the aggregate amount of CPU that a container can consume
Although it is possible to use the prctl(1M) command to manage CPU caps, the capctl Perl script simplifies it
# capctl <-P project> <-p pid> <-Z zone> <-n name> <-v value>
* -P proj: Specify project id
* -p pid: Specify pid
* -Z zone: Specify zone name
* -n name: Specify resource name
* -v value: Specify resource value
For example, to set a cap for project foo to 50% you can say:
# capctl -P foo -v 50
To change the cap to 80%:
# capctl -P foo -v 80
To see the cap value:
# capctl -P foo
To remove the cap:
# capctl -P foo -v 0
prctl vs zonecfg
prctl can read resource settings in the global or child zones
Not persistent for setting variables
Can’t set variables in the child zone
zonecfg is persistent, but only runs in global zone
Zone Issues
Zone cannot reside on NFS
But zone can be NFS client
Each zone normally has a “sparse” installation of a package, if the package is from an “inherit-pkg-dir” directory tree
By default, a package installed in global zone is installed in all existing non-global zones
Unless the pkgadd -G or -Z options are used
See also SUNW_PKG_ALLZONES and SUNW_PKG_HOLLOW package parameters
A patch installed in the global zone is installed in all non-global zones
If any zone does not match patch dependencies, patch not installed
Zone issues - cont
Upgrading the global zone to a new Solaris release upgrades the non-global zones but depends on which upgrade method is used (hint - use live upgrade)
Best practice is to keep packages and patches synced between global and all non-global zones
Watch out for giving users root in a zone – could violate policy or regulations
Flash Archive (flar) can be used to capture system containing zones and clone it, but only if zones are halted.
Details at http://www.opensolaris.org/os/community/zones/faq/flar_zones
Zones and Packages
# pkgadd -d screen*
The following packages are available:
1 SMCscreen screen
(intel) 4.0.2
Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]:
## Not processing zone <zone10>: the zone is not running and cannot be booted
## Booting non-running zone <zone0> into administrative state
## waiting for zone <zone0> to enter single user mode...
## Verifying package <SMCscreen> dependencies in zone <zone0>
## Restoring state of global zone <zone0>
## Booting non-running zone <zone1> into administrative state
## waiting for zone <zone1> to enter single user mode...
. . .
## Booting non-running zone <zone0> into administrative state
## waiting for zone <zone0> to enter single user mode...
## waiting for zone <zone0> to enter single user mode...
## Installing package <SMCscreen> in zone <zone0>
Sparse Zones vs. Whole Root Zones
When should you use “sparse”, when should you use “whole root”?
Check per-application support and/or requirements
sparse zones don’t allow writes into /, /usr, etc by default, some apps don’t like that
Can intermix sparse and whole-root on the same system
Make a zone whole root rather than sparse when creating it (blank config, no inherit-pkg-dir entries)
# zonecfg -z <zone> create -b
In the future, likely that the world will use whole root zones and ZFS cloning
But zone roots on ZFS not supported until U6 because not upgradeable
Upgrading a System Containing Containers
Supported methods vary, depending on OS release being upgraded from
Generally liveupgrade is best, but many details to consider
Well documented at http://docs.sun.com/app/docs/doc/820-4041/gdzlc?a=view
Zone Best Practices
Note that global zone root can copy files directly into zones via their zonepath directory
Consider building at least one container per system
Put all users and apps in there
Fast to copy for testing
Fast reboot
Put it on shared storage for future attach / detach
But watch out for limits
dtrace
app support in a zone
Surprisingly, a global-zone mount within the zone file system is immediately seen in the zone
Zone Best Practices (2)
Use zonecfg export to save each zone’s config settings - store on a different system
For every zone created, in its “virgin state”, create a clone of it and store it on a different system
Put zones on ZFS for best feature set
Consider configuring child zones to send syslog output to central syslog server
Zones and /etc/system
Variables no longer in /etc/system can be set via resource controls (see rctladm), but only per project. This example is from the Sun installation guide for Weblogic on Solaris 10…
Modify /etc/project in each zone the app will run in to contain the following additions to the resource controls for user.root (assuming the application will run as root):
bash-3.00# cat /etc/project
system:0::::
user.root:1::::process.max-file-descriptor=(privileged,1024,deny);process.max-sem-ops=(privileged,512,deny);process.max-sem-nsems=(privileged,512,deny);project.max-sem-ids=(privileged,1024,deny);project.max-shm-ids=(privileged,1024,deny);project.max-shm-memory=(privileged,4294967296,deny)
noproject:2::::
default:3::::
group.staff:10::::
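A long user.root entry like the one above is hard to read by eye. Here is a small sketch, assuming bash (or ksh) and an awk that accepts -v (on Solaris 10 that likely means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk). project_rctls is a hypothetical helper, not a Solaris command. It relies on the project(4) layout of six colon-separated fields with the attributes last, separated by semicolons.

```shell
#!/bin/bash
# Sketch only: project_rctls is a hypothetical helper, not part of Solaris.
# Reads project(4)-format lines on stdin and prints one resource control
# per line for the named project.
# Fields: name:id:comment:users:groups:attributes
project_rctls() {
    awk -F: -v proj="$1" '
        $1 == proj {
            n = split($6, attrs, ";")        # attributes are ";"-separated
            for (i = 1; i <= n; i++)
                if (attrs[i] != "") print attrs[i]
        }'
}

# Shortened illustrative input; on a real system use: project_rctls user.root < /etc/project
project_rctls user.root <<'EOF'
system:0::::
user.root:1::::process.max-sem-ops=(privileged,512,deny);project.max-shm-ids=(privileged,1024,deny)
default:3::::
EOF
```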
Zones and /etc/system (cont)
Note that /etc/project is read at login
Also, to enable warnings via syslog if the resource limits are approached, execute the following commands once in each zone the app will run in (they update the /etc/rctladm.conf file)
Do this in the global zone, not persistent so script it:
# rctladm -e syslog process.max-file-descriptor
# rctladm -e syslog process.max-sem-ops
# rctladm -e syslog process.max-sem-nsems
# rctladm -e syslog project.max-sem-ids
# rctladm -e syslog project.max-shm-ids
# rctladm -e syslog project.max-shm-memory
Branded Zones
Shipped in S10 8/07
Allows native binary execution of bins from other operating systems
CentOS first
Install a brandz zone, install the “guest” OS, then install binaries (RPMs et al) and run them
Currently limited to CentOS and other Linux 2.4 kernel-based distros
Result - can use DTrace to analyze Linux perf problems
See man pages for brands(5), lx(5)
brandz
Example install given at http://milek.blogspot.com/2006/10/brandz-integrated-into-snv49.html
# zonecfg -z linux
linux: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:linux> create -t SUNWlx
zonecfg:linux> set zonepath=/home/zones/linux
zonecfg:linux> add net
zonecfg:linux:net> set address=192.168.1.10/24
zonecfg:linux:net> set physical=bge0
zonecfg:linux:net> end
zonecfg:linux> add attr
zonecfg:linux:attr> set name="audio"
zonecfg:linux:attr> set type=boolean
zonecfg:linux:attr> set value=true
zonecfg:linux:attr> end
zonecfg:linux> exit
brandz (cont)
# zoneadm -z linux install -d /mnt/iso/centos_fs_image.tar.bz2
A ZFS file system has been created for this zone.
Installing zone 'linux' at root directory '/home/zones/linux'
from archive '/mnt/iso/centos_fs_image.tar.bz2'
This process may take several minutes.
Setting up the initial lx brand environment.
System configuration modifications complete!
Setting up the initial lx brand environment.
System configuration modifications complete!
Installation of zone 'linux' completed successfully.
Details saved to log file:
"/home/zones/linux/root/var/log/linux.install.10064.log"
Solaris 8 and 9 Containers
Now available as a commercial product ($) from Sun
Uses brandz
Capture a Solaris 8 or Solaris 9 system via Archiver (aka P2V)
Updater Tool, processes Solaris 8 image and prepares it for new, virtualized environment
Create it as a container under S10
Apps think they are on S8 or S9
Sun “guarantees” compatibility
SPARC only
Solaris 8 and 9 Containers - cont
http://www.sun.com/software/solaris/pdf/solaris8and9containers_datasheet.pdf
# zonecfg -z zone8
zonecfg:zone8> create -t SUNWsolaris8
zonecfg:zone8> set zonepath = /export/home/zones/zone8
zonecfg:zone8> add net
zonecfg:zone8:net> set address = <IP Address>
zonecfg:zone8:net> set physical = e1000g1
zonecfg:zone8:net> end
zonecfg:zone8> verify
zonecfg:zone8> commit
zonecfg:zone8> exit
# zoneadm -z zone8 install -a <FLAR_image_location> {-u|-p}
Try for 90 days via http://www.sun.com/software/solaris/containers/getit.jsp
zonestat
Tool to monitor entire system performance, including per-zone
More information than prstat -Z
Download from http://opensolaris.org/os/project/zonestat/
# ./zonestat
|--Pool--|Pset|-------Memory-----|
Zonename| IT|Size|Used| RAM| Shm| Lkd| VM|
------------------------------------------
global 0D 2 0.1 556M 0.0 0.0 331M
zone1 0D 2 0.0 26M 0.0 0.0 24M
==TOTAL= === 2 0.1 608M 0.0 0.0 355M
# ./zonestat -l
|----Pool-----|------CPU-------|----------------Memory----------------|
|---|--Size---|-----Pset-------|---RAM---|---Shm---|---Lkd---|---VM---|
Zonename| IT| Max| Cur| Cap|Used|Shr|S%| Cap|Used| Cap|Used| Cap|Used| Cap|Used
-------------------------------------------------------------------------------
global 0D 2 0.0 0.1 5 83 556M 18E 0.0 18E 0.0 18E 331M
zone1 0D 2 0.0 0.0 1 16 26M 18E 0.0 18E 0.0 18E 24M
==TOTAL= --- ---- 2 ---- 0.1 --- -- 3.1G 608M 3.1G 0.0 3.0G 0.0 4.0G 355M
Zone Futures
Live migration
Improved networking via project crossbow
Not just ip-exclusive. Virtual network stack for each container
S10 containers?
Labs
Create a container with resource management (your choice)
What is your view of file systems?
What file systems are yours, what are shared?
What do the file systems look like from the global zone?
Test the resource management if possible
What does your networking look like?
What is your life like in a zone?
How are zones different from domains? From vmware?
What scheduler is in use in your zone?
If fair share, how many shares does your zone have?
Labs (cont)
If you are not fair share scheduled, turn it on and enable shares for your container
Clone the zone
Detach and attach the zone (to the same system if necessary)
LDOMS
LDOMs
Logical domains
Released April ’07
Only on Niagara and future CMT chips (Niagara II, Rock)
Like enterprise-system domains but within one chip
Slice the chip into multiple LDOMs, each with its own OS root, boot independently, etc
Now can run multiple OSes on 1 SPARC chip
LDOMs - Details
Can create up to 1 LDOM per thread(!)
Best practice seems to be max one LDOM per core
i.e. 8 LDOMs on Niagara I and II
Nice intro blog: http://blogs.sun.com/ash/entry/ultrasparc_t2_launched_today
And nice flash demo: http://www.sun.com/servers/coolthreads/ldoms/
Community cookbooks: http://wikis.sun.com/display/SolarisLogicalDomains/LDoms+Community+Cookbook
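The per-thread and per-core limits on this slide can be worked out from the chip geometry. A minimal sketch, assuming bash (or ksh) and fully populated chips (8 cores with 4 threads each on UltraSPARC T1, 8 cores with 8 threads each on UltraSPARC T2); ldom_limits is a hypothetical helper, not part of the LDOM software:

```shell
#!/bin/bash
# Sketch only: ldom_limits is a hypothetical helper, not part of LDoms.
# Prints the one-LDOM-per-thread ceiling and the one-LDOM-per-core
# rule of thumb for a chip with the given geometry.
# Usage: ldom_limits <cores> <hardware-threads-per-core>
ldom_limits() {
    cores=$1
    threads_per_core=$2
    echo "per-thread max: $((cores * threads_per_core)), per-core practice: ${cores}"
}

ldom_limits 8 4   # UltraSPARC T1 (Niagara): per-thread max 32, per-core practice 8
ldom_limits 8 8   # UltraSPARC T2 (Niagara II): per-thread max 64, per-core practice 8
```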
LDOMs Introduction and Hands-On Training
With thanks to: Tom Gendron, SPARC Systems Technical Specialist, Sun Microsystems
Peter Baer Galvin, Chief Technologist, Corporate Technologies
Agenda
• Virtualization Comparisons
• Concepts of LDOMs
• Requirements of LDOMs
• Examples
• Best Practices
The Data Center Today
[Figure: data center management stack (client, developer, network, application service, OS, server, storage) with database, app server, and mail server workloads]
Average server utilization between 5 to 15%
Server sprawl is hard to manage
Single application per server
Energy costs continue to rise
A widely understood problem
Virtualization: Who and Why
InformationWeek: Feb 12, 2007 http://www.informationweek.com/news/showArticle.jhtml?articleID=197004875
Server Virtualization
Multiple OSs: Hard Partitions, Virtual Machines. Single OS: OS Virtualization, Resource Mgmt.
Hard Partitions
> Very High RAS
> Very Scalable
> Mature Technology
> Ability to run different OS versions
> Complete Isolation
Virtual Machines
> Live OS migration capability
> Improved Utilization
> Ability to run different OS versions and types
> De-couples OS and HW versions
OS Virtualization
> Very scalable and low overhead
> Single OS to manage
> Cleanly divides system and application administration
> Fine grained resource management
Resource Mgmt.
> Very scalable and low overhead
> Single OS to manage
> Fine grained resource management
[Figure: example workloads (calendar, database, web, Sun Ray, app, mail, file, and identity servers) drawn as App / OS / Server stacks under each approach]
Para vs. Full Virtualization
• Para-virtualization:
> OS ported to special architecture
> Uses generic “virtual” device drivers
> More efficient since it is “hypervisor” aware
> “almost” native performance
• Full virtualization:
> OS has no idea it is running virtualized
> Must emulate real i/o devices
> Can be slow/need help from hardware
> May use traps, emulation or rewriting
[Figure: mail, web, and file server stacks (App / OS / Server) shown once for para-virtualization and once for full virtualization; the diagram also labels a Control Domain]
What is an LDOM?
• It is a virtual server
• Has its own console and OBP instance
• A configurable allocation of CPU, FPU, Disk, Memory and I/O components
• Runs a unique OS/patch image and configuration
• Has the capability to stop, start and reboot independently
• Utilizes a Hypervisor to facilitate LDOMs
Requirements for LDOMs
• Sun T-Series server
> T1/2000, T5x20 rack servers
> T6100, T6120 blade
> Any future CMT based server
• Up to date Firmware on service processor http://sunsolve.sun.com/handbook_pub/validateUser.do?target=index
• minimum Solaris 10 11/06 on T1/2000, T6100
• minimum Solaris 10 08/07 on T5x20, T6120
• Ldom Manager Software 1.0.1 + patches
Hypervisor
• A thin interface between the Hardware and Solaris
• The interface is called sun4v
• Solaris calls the sun4v interface to use hardware specific functions
• It is very simple and is implemented in firmware
• It allows for the creation of ldoms
• It creates communication channels between ldoms
Key LDOMs components
• The Hypervisor
• The Control Domain
• The Service Domain
• Multiple Guest Domains
• Virtualised devices
[Figure: hardware (shared CPU, memory and I/O) beneath the hypervisor. The primary domain (control and service, Solaris 10 08/07) runs ldmd, vntsd and drd and provides primary-vds0 and primary-vsw0. Guest domains ldom1 (Solaris 10 11/06) and ldom2 (Solaris 10 08/07) each get CPU, memory and crypto units plus virtual disks (vdisk0, vdisk1) and virtual network devices (vnet0, vnet1). Remaining CPU, memory and crypto units are unallocated.]
LDOMs types
• Different Ldom Types
- Control Domain - Hosts the Logical Domain Manager (LDM)
- Service Domains - Provide virtual services to other domains
- I/O Domains - Have direct access to physical devices
- Guest Domains - Used to run user environments
• Control, Service and I/O domains can be combined or separate
> One of the I/O domains must be the control domain
'Control' Domain
• Creates and manages other LDOMs
• Runs the LDOM Manager software
• Allows monitoring and reconfiguration of domains
• Recommendation:
> Make this Domain as secure as possible
'Service' Domain
• Provides services to other domains
– virtual network switch
– virtual disk service
– virtual console service
• Multiple Service domains can exist with shared or sole access to system facilities
• Allows for IO load separation and redundancy within domains deployed on a platform
• Often Control and Service Domains are one and the same
IO Domain
• IO Domain has direct access to physical input and output devices.
• The number of IO domains is hardware dependent
> currently limited to 2
> limited by PCI-E switch configuration
• One IO domain must also be the control domain
'Guest' Domains
• Contain the targeted applications the LDOMs were created to service.
• Multiple Guest domains can exist
  > Constrained only by hardware limitations
• May use one or more Service domains to obtain IO
  > Various redundancy mechanisms can be used
• Can be independently 'powered' and rebooted without affecting other domains
Key LDOMs components
• The Hypervisor
• The Control Domain
• The Service Domain
• Multiple Guest Domains
• Virtualised devices
[Diagram: LDoms architecture (hypervisor, primary control/service domain, guest domains ldom1 and ldom2, unallocated resources).]
Virtual devices
• Virtual devices are hardware resources abstracted by the hypervisor and made available for use by the other domains
• Virtual devices are:
  > CPUs - VCPU
  > Memory
  > Crypto cores - MAU
  > Network switches - VSW
  > NICs - VNET
  > Disk servers - VDSDEV
  > Disks - VDISK
  > Consoles - VCONS
Example 1
Install Ldom Manager & Setting up the Control Domain
Example 1 steps
• Update firmware to latest release
• Install supported version of Solaris
• Install Logical Domain Manager (LDM) software
• Configure the control domain
• Save initial domain config
• Reboot Solaris
A note on system interfaces
• Provide out-of-band management
• Two types (iLOM and ALOM)
• T1/2000 uses ALOM interface
• T5x20 uses iLOM
• iLOM “CLI” has an ALOM compatibility shell
  > ALOM shell used in the examples
• A web based interface is available
• (SC = system controller, SP = service processor)
  > essentially the same thing.
Web based iLOM interface
ALOM compatibility shell
• log in to SP as root/changeme
• -> create /SP/users/admin
• -> set /SP/users/admin role=Administrator
• -> set /SP/users/admin cli_mode=alom
  – Creating user ...
  – Enter new password: ********
  – Enter new password again: ********
  – Created /SP/users/admin
• exit
• log in as admin
Step 1
Firmware verification and update
System Identification and Update
• Check the Service Processor of your system for firmware levels
• Use ALOM mode (showhost is not available in the BUI)
  sc> showhost
  Sun System Firmware 7.0.1 2007/09/14 16:31
  Host flash versions:
    Hypervisor 1.5.1 2007/09/14 16:11
    OBP 4.27.1 2007/09/14 15:17
    POST 4.27.1 2007/09/14 15:43
• Check the SC firmware version (7.0.1 in this output)
• Upgrade your system firmware if needed...
  > flashupdate command
  > sysfwdownload (via Solaris on platform)
  > BUI
Firmware update example
  sc> showkeyswitch
  Keyswitch is in the NORMAL position.
  sc> flashupdate -s 10.8.66.15 -f /incoming//Sun_System_Firmware-6_4_6-Sun_Fire_T2000.bin
  Username: tgendron
  Password: ********
  SC Alert: System poweron is disabled.
  Update complete. Reset device to use new software.
  sc>
  sc> resetsc
Telnet back in and log in once the SC is up.
  sc> showhost
  Sun-Fire-T2000 System Firmware 6.5.5 2007/10/28 23:09
Firmware update example 2
Step 1: From Solaris running on the T5120 whose SP is to be updated, download the patch 127580-05.zip from SunSolve
Step 2: unzip and cd into 127580-05
Step 3: run sysfwdownload [image].pkg
Step 4: reboot Solaris
  sc> resetsc
Installing LDOM manager software
• T5x20 requires Solaris 10 8/07 or greater
• T1/2000 requires Solaris 10 11/06 or greater +
  * 124921-02 at a minimum
  * 125043-01 at a minimum
  * 118833-36 at a minimum
• 11/06 is minimum for guests
• ldm 1.0.2 is current
  > includes Solaris Security Toolkit (optional)
Install the LDM Software
• Unzip and install with the installation script
• Security of the Control Domain is important
  > Recommend selecting the JASS secure configuration
• Once complete, the entire system is one LDOM
• LDOM software is installed in /opt/SUNWldm
  # [cmt1/root] ldm list
  NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
  primary  active  -n-cv  SP    64    8064M   0.0%  3h 19m
  [cmt1/root]
• All the system resources are in domain “primary”
* Follow the Administration Guide to install the required OS and patches
Flag Definitions
  -  placeholder
  c  control domain
  d  delayed reconfiguration
  n  normal
  s  starting or stopping
  t  transition
  v  virtual I/O domain
  # ldm list
  NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
  primary  active  -n-cv  SP    32    32640M  0.1%  6d 20h 24m
  #
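The flag letters above can be decoded mechanically when scanning ldm list output. The following is a minimal Python sketch, not part of the LDoms software: the helper name decode_flags is hypothetical, and the meanings are copied from the flag table on this slide.

```python
# Flag letters and meanings copied from the "Flag Definitions" slide.
FLAG_MEANINGS = {
    "c": "control domain",
    "d": "delayed reconfiguration",
    "n": "normal",
    "s": "starting or stopping",
    "t": "transition",
    "v": "virtual I/O domain",
}


def decode_flags(flags):
    """Return the meanings of the flags that are set, skipping '-' placeholders.

    decode_flags is a hypothetical helper written for this example.
    """
    meanings = []
    for letter in flags:
        if letter == "-":
            continue
        # Report unknown letters instead of silently dropping them.
        meanings.append(FLAG_MEANINGS.get(letter, "unknown flag '%s'" % letter))
    return meanings


# FLAGS column of the primary domain in the listing above.
print(decode_flags("-n-cv"))
```

For the primary domain shown above, the three set letters are n, c, and v, so the domain is a normal control domain that also provides virtual I/O.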
Example 1, Part 2
Setting up the Control Domain
On naming things...
• Choose LDOM component names carefully
  > Names are used to manage the devices
  > Bad choices can be very confusing later on...
  > Keep names short and specific...
• You need names for ...
  > Disk Servers, and disk device instances
  > Network Virtual Switches, and network device instances
  > Domains
• Service and device names are only known to the Control and Service domains
  – Guest domains just see virtual devices.
Control/Service Domain
• On our 'Primary' Domain do the following ...
• In this example Control and Service are combined
  > Control domain runs the LDM
  > Service domain has these services set up:
• Set up the basic services needed.
  > vds - virtual disk service
  > vcc - virtual console concentrator
  > vsw - virtual network switch
• The service names in this example are below:
  > primary-vds0
  > primary-vcc0
  > primary-vsw0
• Allocate resources
  > CPU, Memory, Crypto, IO devices
[Diagram: the "primary" control and service domain (Solaris 10 08/07; ldmd, vntsd, drd) with its vcc0, vds0, and vsw0 services above the hypervisor; all other CPU, memory, and crypto resources are unallocated.]
Control/Service Domain set-up (1)
  # Add services to the control domain
  # The MAC address is taken from a physical interface, e.g., e1000g0.
  ldm add-vds primary-vds0 primary
  ldm add-vcc port-range=5000-5100 primary-vcc0 primary
  ldm add-vsw mac-addr=0:14:4f:6a:9e:dc net-dev=e1000g0 primary-vsw0 primary
  # Activate the virtual network terminal server
  svcadm enable vntsd
  # Allocate resources to the control domain and save
  ldm set-mau 1 primary
  ldm set-vcpu 8 primary
  ldm set-memory 2G primary
  ldm add-spconfig my-initial
  # Reboot required to have the configuration take effect.
  init 6
Crypto Note
Note: If you have any cryptographic devices in the control domain, you cannot dynamically reconfigure CPUs. So if you are not using cryptographic devices, set-mau to 0.
Control/Service Domain set-up (2)
  # Verify the primary domain configuration
  ldm list-domain
  NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
  primary  active  -n-cv  SP    8     2G      6.3%  6m
  # Enable networking
  ifconfig vsw0 plumb
  ifconfig e1000g0 down unplumb
  ifconfig vsw0 10.8.66.208 netmask 255.255.255.0 broadcast + up
  ifconfig -a
  lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
          inet 127.0.0.1 netmask ff000000
  vsw0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
          inet 10.8.66.208 netmask ffffff00 broadcast 10.8.66.255
          ether 0:14:4f:6a:9e:dc
Ldom Service details
Reconfiguration
• Dynamic reconfiguration
  > Resource changes that take effect without a reboot of the domain
• Delayed reconfiguration
  > Resource changes that take effect after a reboot
• Resource examples:
  > VCPU, Memory, IO devices
• Currently only VCPUs are dynamic
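The rule on this slide, combined with the crypto note earlier (a domain holding cryptographic devices cannot have its CPUs dynamically reconfigured), can be written as a small decision function. This is only an illustrative Python sketch of the rules as the slides state them for this release; the function name and the resource labels are assumptions made for the example, not ldm terminology.

```python
def reconfiguration_mode(resource, domain_has_crypto=False):
    """Classify a resource change as 'dynamic' or 'delayed'.

    Encodes two statements from the slides:
      * currently only VCPUs are dynamic
      * if the domain holds crypto units (MAUs), VCPU changes become delayed
    The resource labels ('vcpu', 'memory', 'io') are hypothetical names
    chosen for this example.
    """
    if resource == "vcpu" and not domain_has_crypto:
        return "dynamic"
    return "delayed"


for resource, has_crypto in [("vcpu", False), ("vcpu", True), ("memory", False)]:
    print(resource, has_crypto, reconfiguration_mode(resource, has_crypto))
```

This is why the control-domain setup suggests setting the MAU count to 0 when crypto is not needed: it keeps VCPU changes dynamic.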
Virtual Disk Server device (vds)
• VDS runs in a service domain
• Performs disk I/O on corresponding raw devices
• Device types can be
  > An entire physical disk or LUN (can be SAN based)
  > Single slice of disk or LUN
  > Disk image in a filesystem (e.g. ufs, zfs)
  > Disk volumes (zfs, svm, VxVM)
  > lofi devices NOT supported
• Virtual Disk Client (vdc drivers)
  > Requests standard block IO via the VDS
  > Classic client/server architecture
[Delayed Reconfiguration]
Virtual Disk devices
• Physical LUNs perform best
• Disk image files make efficient use of space
• ZFS snapshots and clones give rapid provisioning
• Network install not supported with
  > zfs volumes
  > single slice
• Network install requires
  > entire disk
  > disk image file
Virtual Network Switch services (vswitch)
• Implements a layer-2 network switch
• Connects virtual network devices
  > to the physical network
  > or to each other (internal private network)
• vswitch not automatically used by service domain
  > must be plumbed
[Delayed Reconfiguration]
Virtual Console Concentrator (vcc)
• Provides console access to LDoms
• Service domain VCC driver communicates with all guest console drivers over the Hypervisor
  > No changes required in guest console drivers (qcn)
• Makes each console available as a tty device on the Control/Service domain
• usage: telnet localhost <port>
[Delayed Reconfiguration]
Virtual Network Terminal Server daemon (vntsd)
• VCC implemented by vntsd
• Runs in the Control/Service domain
• Aggregates the VCC tty devices and makes them available over network sockets
  > Accessible once a domain is configured and bound
  > Attach prior to domain start to watch domain OBP boot sequence
• Only one user at a time can view a serial console
• Flexible support of port groups, IPs, port numbers etc
  > Not visible outside the Control/Service domain by default
[Delayed Reconfiguration]
Example 2
Setting up the Guest Domain
Guest Domain
In the control domain (T2000):
  ldm add-domain ldm1
  ldm add-mau 1 ldm1
  ldm add-vcpu 4 ldm1
  ldm add-memory 4G ldm1
  ldm add-vnet vnet0 primary-vsw0 ldm1
  ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0
  ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1
  ldm set-var auto-boot\?=false ldm1
  ldm set-var boot-device=vdisk ldm1
  ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm1
  ldm bind-domain ldm1
  ldm start-domain ldm1
• Watch the console of ldm1 using ...
  > telnet localhost 5000
[Diagram: the primary control and service domain exports ldm1-vol1 (backed by a physical disk) through primary-vds0 as ldm1-vdisk1, and e1000g0 through primary-vsw0 as vnet0, to guest domain ldm1; the remaining CPU, memory, and crypto resources are unallocated.]
Disk Service Setup
• Establish a Virtual Disk Service
  – 'primary-vds'
• Associate it with some form of media.
  – A real device or slice, e.g. /dev/dsk/c0t1d0s0
  – or a disk image, e.g. '/ldmzpool/ldg1'
• Create disk server device instance to be exported to guest domains
  – 'ldm1-vol1@primary-vds'
  ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0
  ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1
  (The disk device name can vary - find it via “ok show-devs”)
[Diagram: in the primary domain, /dev/c0t1d0s2 is exported as ldm1-vol1 by primary-vds; guest domain ldm1 sees it as ldm1-vdisk1 (/dev/dsk/c0d0s0).]
Virtual Disk Client (vdc)
• vdc's are the objects passed to OBP and the Operating System in guest systems
• Guest domain OBP and Solaris see normal SCSI devices
• Domain administrators may set up devaliases or use raw vdisk devices
• vdc's provide Guest domains with virtual disk devices (vdisks) via device instances from Virtual Disk Servers running in the Service Domain(s)
• A future release will provide virtualised access to DVD/CD-ROM in service domains
[Delayed Reconfiguration]
Network Setup
• Establish a Virtual Network Switch service
  – 'primary-vsw0'
  > Automatically associated with a vsw device instance
  – 'vsw0'
• May or may not choose to associate it with media.
  – 'e1000g0', a real NIC
  – or no NIC (in memory only)
• Create a network device instance to provide to guest domains
  – 'vnet0@ldm1'
[Diagram: in the primary domain, e1000g0 is attached to primary-vsw0; guest domain ldom1 connects through vnet0 (vnet0@ldm1).]
Virtual Network Device (vnet)
• Implements an ethernet device in a domain
  > Communicates with other vnets or the outside world over vswitch devices
• If the vSwitch is suitably configured, packets can be routed out of the server.
• vnet exports a GLDv3 interface
  > A simple virtual Ethernet NIC
  > Enumerates as a 'vnetx' device
  > For domain-domain transfers, vnets connect 'directly'.
[Delayed Reconfiguration]
Memory
• Memory is configured through the Control Domain
• Minimum allocatable chunk is 8kB
  > Minimum size is 12MB (for OBP)
  > Though most OS deployments will need > 512M
• If memory is added over time to a domain
  > Memory device bindings within a domain may appear to show that memory fragmentation is occurring
  > Not a problem, all handled in HW by the MMU
  > No performance penalty
[Delayed Reconfiguration]
vCPU's
• Each UltraSPARC T1 has up to 8 physical cores with 4 threads each
  > Each thread is considered a vCPU, so up to 32 vCPUs or Domains
• Each UltraSPARC T2 has up to 8 physical cores with 8 threads each
  > Each thread is considered a vCPU, so up to 64 vCPUs or Domains
• Maximum granularity is 1 vCPU per domain
• vCPU's can only be allocated to one Domain at a time.
• Can be dynamically allocated with the Domain running
  > Take care when removing a vCPU from a running domain: will there be enough compute power left in the domain?
[Immediate Reconfiguration]
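The vCPU counts above are simply cores multiplied by threads per core. As a quick planning aid, the arithmetic can be sketched in Python; the table and function names below are hypothetical, and the core and thread counts are the maximums quoted on this slide.

```python
# (physical cores, hardware threads per core), as stated on the slide.
CHIPS = {
    "UltraSPARC T1": (8, 4),
    "UltraSPARC T2": (8, 8),
}


def max_vcpus(chip):
    """Each hardware thread is one vCPU, so the total is cores * threads."""
    cores, threads_per_core = CHIPS[chip]
    return cores * threads_per_core


def cores_spanned(vcpus, chip):
    """Smallest number of physical cores that can hold `vcpus` threads."""
    _, threads_per_core = CHIPS[chip]
    return -(-vcpus // threads_per_core)  # ceiling division


for chip in CHIPS:
    print(chip, "max vCPUs:", max_vcpus(chip))
print("6 vCPUs on a T1 need at least", cores_spanned(6, "UltraSPARC T1"), "cores")
```

Because the minimum domain size is one vCPU, the same number is also the upper bound on the number of domains.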
Example 3
Guest Domains and ZFS
Using ZFS (1) – setup zfs
1. Remove the disk from the service domain
  ldm stop-domain ldm1
  LDom ldm1 stopped
  ldm unbind-domain ldm1
  ldm remove-vdsdev ldm1-vol1@primary-vds0
2. Create ZFS file systems in the pool (mypool) and a disk image file
  root@cmt1 > zfs create mypool/ldoms
  root@cmt1 > zfs create mypool/ldoms/ldm1
  root@cmt1 > cd /export/ldoms/ldm1
  root@cmt1 > ls
  root@cmt1 > mkfile 12G `pwd`/rootdisk
Using ZFS (2) – setup guest domain
3. Configure the guest domain
  root@cmt1 > ldm add-domain ldm1
  root@cmt1 > ldm add-vcpu 8 ldm1
  root@cmt1 > ldm add-memory 1G ldm1
  root@cmt1 > ldm add-vnet vnet0 primary-vsw0 ldm1
  root@cmt1 > ldm add-vdsdev /export/ldoms/ldm1/rootdisk ldm1-vol1@primary-vds0
  root@cmt1 > ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1
  root@cmt1 > ldm set-var auto-boot\?=false ldm1
  root@cmt1 > ldm set-var boot-device=ldm1-vdisk1 ldm1
  root@cmt1 > ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm1
Using ZFS (3) – setup guest domain
4. Start the guest domain
  root@cmt1 > ldm bind-domain ldm1
  root@cmt1 > ldm start-domain ldm1
  LDom ldm1 started
5. Inspect the domain
  root@cmt1 > ldm list-domain
  NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
  primary  active  -n-cv  SP    8     2G      0.7%  17h 12m
  ldm1     active  -t---  5000  8     1G      13%   7s
  telnet localhost 5000
  {ok} boot vnet0 - install
  (installation goes forward)
Provision the guest
6. Set up for jumpstart
Determine the MAC address
  root@cmt1 > ldm list-bindings ldm1
  [snip]
  NETWORK
      NAME   SERVICE               DEVICE     MAC
      vnet0  primary-vsw0@primary  network@0  00:14:4f:f8:2a:c4
          PEER                  MAC
          primary-vsw0@primary  00:14:4f:46:41:b4
  telnet localhost 5000
  {0} ok banner
  SPARC Enterprise T5120, No Keyboard
  [snip]
  Ethernet address 0:14:4f:fb:7:42, Host ID: 83fb0742.
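Note that the two listings print MAC addresses in different styles: ldm list-bindings zero-pads each octet, while the OBP banner does not, and the two addresses shown are not the same address. When comparing addresses or recording one for the install server, it can help to normalize them first. The helper below is a hypothetical Python sketch written for this example, not a Solaris or LDoms utility.

```python
def normalize_mac(mac):
    """Return a MAC address as six zero-padded, lower-case hex octets.

    normalize_mac is a hypothetical helper; it accepts the unpadded form
    printed by OBP (0:14:4f:fb:7:42) as well as the padded form printed
    by ldm list-bindings (00:14:4f:f8:2a:c4).
    """
    octets = mac.strip().lower().split(":")
    if len(octets) != 6:
        raise ValueError("expected 6 colon-separated octets: %r" % mac)
    return ":".join("%02x" % int(octet, 16) for octet in octets)


vnet_mac = normalize_mac("00:14:4f:f8:2a:c4")   # from ldm list-bindings
banner_mac = normalize_mac("0:14:4f:fb:7:42")   # from the OBP banner
print(vnet_mac, banner_mac, vnet_mac == banner_mac)
```

After normalization the vnet0 address and the banner address still differ, which matches the boot transcript on the next slide: the install request is made with the vnet0 address.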
Provision the guest (2)
  {0} ok boot vnet0 - install
  Boot device: /virtual-devices@100/channel-devices@200/network@0  File and args: - install
  Requesting Internet Address for 0:14:4f:f8:2a:c4
  SunOS Release 5.10 Version Generic_120011-14 64-bit
  ...
How to break
  telnet> send brk
  Debugging requested; hardware watchdog suspended.
  c)ontinue, s)ync, r)eboot, h)alt? r
  Resetting...
  {0} ok
Guest Domain (zfs) login
  {0} ok boot
  Boot device: ldm1-vdisk1  File and args:
  SunOS Release 5.10 Version Generic_120011-14 64-bit
  Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
  Use is subject to license terms.
  Hostname: ldm1
  ldm1 console login:
Using ZFS (2) – cloning domains
Snapshot and clone the installed boot disk
  tgendron@cmt1 > zfs list
  NAME              USED   AVAIL  REFER  MOUNTPOINT
  mypool            12.0G  54.9G  27.5K  /export
  mypool/ldoms      12.0G  54.9G  25.5K  /export/ldoms
  mypool/ldoms/ldm1 12.0G  54.9G  12.0G  /export/ldoms/ldm1
  root@cmt1 > zfs snapshot mypool/ldoms/ldm1@initial
Create the clones
  root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm2
  root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm3
  root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm4
  root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm5
Using ZFS (2) – Leverage the clones
4. Create the new guest domains (this should be easy to script)
  ldm add-domain ldm2
  ldm add-vcpu 8 ldm2
  ldm add-memory 1G ldm2
  ldm add-vnet vnet0 primary-vsw0 ldm2
  ldm add-vdsdev /export/ldoms/ldm2/rootdisk ldm2-vol1@primary-vds0
  ldm add-vdisk ldm2-vdisk1 ldm2-vol1@primary-vds0 ldm2
  ldm set-var auto-boot\?=false ldm2
  ldm set-var boot-device=vdisk ldm2
  ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm2
  ldm bind-domain ldm2
  ldm start-domain ldm2
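Since the same command sequence is repeated for every clone, one way to script it is to generate the commands from the domain name. The Python sketch below only prints the commands for review and does not run them. The function name and its defaults are assumptions made for this example; the command forms are the ones shown on this slide, except that boot-device is set to the per-domain vdisk name, as in the earlier "Using ZFS (2) – setup guest domain" slide, rather than to "vdisk".

```python
def guest_domain_commands(name, vcpus=8, memory="1G",
                          vswitch="primary-vsw0", vds="primary-vds0",
                          image_dir="/export/ldoms"):
    """Return the ldm command lines that define and start one cloned guest.

    Hypothetical helper: names follow the <domain>-vol1 / <domain>-vdisk1
    convention used in the slides, and the disk image is assumed to live
    at <image_dir>/<domain>/rootdisk.
    """
    volume = "%s-vol1@%s" % (name, vds)
    vdisk = "%s-vdisk1" % name
    vnet_path = "/virtual-devices@100/channel-devices@200/network@0"
    return [
        "ldm add-domain %s" % name,
        "ldm add-vcpu %d %s" % (vcpus, name),
        "ldm add-memory %s %s" % (memory, name),
        "ldm add-vnet vnet0 %s %s" % (vswitch, name),
        "ldm add-vdsdev %s/%s/rootdisk %s" % (image_dir, name, volume),
        "ldm add-vdisk %s %s %s" % (vdisk, volume, name),
        "ldm set-var auto-boot\\?=false %s" % name,
        "ldm set-var boot-device=%s %s" % (vdisk, name),
        "ldm set-var nvramrc-devalias vnet0 %s %s" % (vnet_path, name),
        "ldm bind-domain %s" % name,
        "ldm start-domain %s" % name,
    ]


# Print the commands for the four clones created above (ldm2 .. ldm5).
for number in range(2, 6):
    for command in guest_domain_commands("ldm%d" % number):
        print(command)
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be inspected and then pasted into a root shell on the control domain.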
Boot the cloned ldom
  {0} ok boot
  Boot device: vdisk  File and args:
  SunOS Release 5.10 Version Generic_120011-14 64-bit
  Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
  Use is subject to license terms.
  WARNING: vnet0 has duplicate address 010.030.019.178 (in use by 00:14:4f:f8:2a:c4); disabled
  Feb 13 19:55:29 svc.startd[7]: svc:/network/physical:default: Method "/lib/svc/method/net-physical" failed with exit status 96.
  Feb 13 19:55:29 svc.startd[7]: network/physical:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
  Hostname: ldm1
  ...
Example 4
Split Service Domains
Sun Fire T2000 Block Diagram
Split IO Example
Check which PCI bus ports we own and are currently using, and be sure to give away only unused ones, i.e. we need to retain the Control Domain boot disk controller and network device.
Providing a PCI bus to a Guest makes the selected domain a Service domain by definition: access to physical IO = Service Domain.
• Setting up a second Service domain with split PCI busses...
  -bash-3.00# ldm list-bindings primary
  Name: primary
  ...
  IO: pci@780 (bus_a)
      pci@7c0 (bus_b)
  ...
  -bash-3.00# df /
  / (/dev/dsk/c1t0d0s0 ):28233648 blocks 3450076 files
  -bash-3.00# ls -l /dev/dsk/c1t0d0s0
  lrwxrwxrwx 1 root root 65 Apr 11 13:25 /dev/dsk/c1t0d0s0 -> ../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a
  -bash-3.00# grep e1000g /etc/path_to_inst
  "/pci@780/pci@0/pci@1/network@0" 0 "e1000g"
  "/pci@780/pci@0/pci@1/network@0,1" 1 "e1000g"
  "/pci@7c0/pci@0/pci@2/network@0" 2 "e1000g"
  "/pci@7c0/pci@0/pci@2/network@0,1" 3 "e1000g"
  -bash-3.00# ldm remove-io pci@780 primary
  ..
  -bash-3.00# shutdown -i6 -y -g0
  ..
  -bash-3.00# ldm add-io pci@780 second-srvc-dom
  -bash-3.00# ldm start second-srvc-dom
  -bash-3.00# ldm list-bindings
  ..
  -bash-3.00#
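The check described above amounts to mapping each device the control domain still needs to its top-level PCI bus and giving away only a bus that none of them use. A minimal Python sketch of that reasoning follows; the function names are hypothetical, the device paths are copied from the transcript, and the choice of e1000g2 as the control domain's retained network interface is an assumption made for the example.

```python
def root_bus(device_path):
    """Return the top-level PCI bus (e.g. 'pci@7c0') of a /devices path."""
    for component in device_path.split("/"):
        if component.startswith("pci@"):
            return component
    raise ValueError("no PCI bus in path: %r" % device_path)


def removable_buses(owned_buses, paths_in_use):
    """Buses owned by the domain that none of its in-use devices sit on."""
    needed = {root_bus(path) for path in paths_in_use}
    return [bus for bus in owned_buses if bus not in needed]


owned = ["pci@780", "pci@7c0"]  # from 'ldm list-bindings primary'
in_use = [
    # boot disk, from 'ls -l /dev/dsk/c1t0d0s0'
    "../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a",
    # e1000g2, from /etc/path_to_inst (assumed to be the retained NIC)
    "/pci@7c0/pci@0/pci@2/network@0",
]
print(removable_buses(owned, in_use))
```

With these inputs only pci@780 is free to hand over, which is the bus removed from primary in the transcript. If the control domain were still using e1000g0 or e1000g1, neither bus could be given away without first moving its network configuration.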
Sun Fire T5x20 Block Diagram
[Block diagram: 16 FB-DIMMs; PCI-E switches (two PLX 8533, one PLX 8517); LSI1068E SAS controller with x4 SAS links to the disk chassis (1RU or 2RU); two Intel dual GbE controllers feeding the quad GbE connectors; two 10GbE ports (XFP fibre plug-in or copper PHY); USB 2.0 hub with the DVD attached through a USB-to-IDE bridge; PCI-E x8 and x16 slots (some 2RU only); ILOM service processor (MPC885, FPGA) with serial and network management ports.]
MPxIO considerations
• MPxIO can be used in the Service/Control domain
• Very straightforward to configure with defaults...
  > Ensure you have two FC-AL HBAs in a single service domain attached to the same SAN array
  > Check that you have two paths to the same SAN devices ('ls /dev/dsk/')
  > Enable MPxIO by running the command 'stmsboot -e' and rebooting the control/service domain
  > Check that you now have only a single path to the SAN devices...
IPMP considerations
• IPMP has several options for configurations
  > Refer to the Administration Guide for worked examples...
  > Options are Multipathing in the Service Domain or Multipathing in the Guest Domain
Ldom 1.0.1
Best Practice Guidance
Ldom Best Practice (1)
• Control Domain
  > Runs LDM daemon processes
  > Must have adequate CPU and memory
  > Start with 1 core (4 or 8 threads) and 1GB memory
  > Make this domain as secure as possible
Ldom Best Practice (2)
• I/O and service domains
  > Run IO for other domains
  > Resources will be sized based on IO load
  > Start with 1 core and 1GB memory
  > 4GB of memory if zfs is used for virtual disk images
  > Add complete cores for heavier I/O loads
Ldom Best Practice (3)
• Core/Thread Affinity
  > Core resources are shared by threads
  > E.g. L1 cache and MAU, FPU
• Best to avoid allocating the threads of a core to separate domains
• Create larger Ldoms first using complete cores
• Smaller domains last
Ldom Best Practice (4)
• Crypto Units
• Each T1/T2 physical CPU core has a Crypto Unit
  > 8 in total on an 8 core system
  > referred to as MAU
• Crypto cores can only be allocated to domains that have at least one vcpu (thread) on the same physical core as the crypto unit
• Crypto cores cannot be shared; they are owned by exactly one (or no) Domain
• Probably best to allocate all four/eight threads on a core to a domain that wants to use the Crypto core
[Delayed Reconfiguration]
More on Crypto Units
• For example we define three domains in the order LDOM1, then LDOM2, then LDOM3...
• LDOM1 has 3 threads (vCPUs) on Core 0
  > Only has access to MAU0 since it only has threads on Core 0
• LDOM2 has 6 vCPUs spread across Cores 0, 1 & 2
  > Potentially has access to MAUs 0, 1 & 2
  > BUT.. LDOM1 already binds MAU0
  > So it can only take MAU1 and MAU2
• LDOM3 has 3 vCPUs on Core 2
  > But can't access any MAUs since LDOM2 has already taken MAU2
• Adding and removing vCPUs can cause access to previously accessible MAUs to be lost; currently you can't select specific vCPUs, the framework does that itself
• When MAUs are allocated to Domains, vCPUs become delayed reconfiguration properties in those domains
[Diagram: T1 Cores 0, 1, and 2 with MAU0, MAU1, and MAU2; LDOM1 on Core 0, LDOM2 spanning Cores 0 to 2, LDOM3 on Core 2.]
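The three-domain example above can be reproduced with a small model. The Python sketch below is an illustration only: it assumes vCPUs are handed out as consecutive hardware threads in the order the domains are defined (4 threads per T1 core) and that every domain asks for every MAU it can reach. The real framework chooses vCPUs itself, as noted above, so this is not a description of its algorithm, and the function name is hypothetical.

```python
THREADS_PER_CORE = 4  # UltraSPARC T1, as in the example above


def allocate(domains, threads_per_core=THREADS_PER_CORE):
    """Model MAU availability for domains defined in order.

    domains: list of (name, vcpu_count) pairs.
    Returns {name: {"cores": [...], "maus": [...]}}.
    """
    next_thread = 0
    taken_maus = set()
    result = {}
    for name, vcpus in domains:
        threads = range(next_thread, next_thread + vcpus)
        next_thread += vcpus
        # A domain can reach the MAU of every core it has a thread on ...
        cores = sorted({thread // threads_per_core for thread in threads})
        # ... but a MAU is owned by at most one domain.
        maus = [core for core in cores if core not in taken_maus]
        taken_maus.update(maus)
        result[name] = {"cores": cores, "maus": maus}
    return result


plan = allocate([("LDOM1", 3), ("LDOM2", 6), ("LDOM3", 3)])
for name, info in plan.items():
    print(name, "cores", info["cores"], "MAUs", info["maus"])
```

Under these assumptions LDOM1 gets MAU0, LDOM2 gets MAU1 and MAU2, and LDOM3 gets none, matching the slide. Sizing domains in whole cores avoids the problem.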
Ldom Best Practice (5)
• Plan your LDOM configuration carefully; reconfiguration may become awkward
• Use easy to understand names
  > Try not to overload vds, vsw, ldom, vdisk, vnic etc...
• Use MPxIO or VxVM, VxFS, Sun Cluster on service domains (only VxFS in Guests) for resilient storage devices
• Use IPMP on Guest or Service Domains for resilient network connections
Ldom Best Practice (6)
• For high-speed inter-domain comms use device-less/in-memory VSW configs
• For high disk performance, allocate a whole real device via a dedicated, properly sized Virtual Disk Server and Service domain
• Look at the server architecture when configuring devices to ensure you get the bandwidth you expect
• For critical applications consider hot/warm standby domains across multiple physical servers; never rely on multiple instances within a single server.
LDOM's v1.0.1 Notes
• All domains can be stopped and started independently
  > Beware, Guest domains attempting to perform IO using a rebooting Service domain will stall until the Service domain returns.
• LDOM SNMP MIB available now with traps and requests to the LDOM framework
• MAC address on the OBP banner is different from the one that is RARP'd for jumpstart
• Only vCPUs can be dynamically reallocated
  > BUT... if the domain has crypto cores this becomes a delayed reconfiguration
  > You cannot choose which vCPUs are allocated to a domain
• By default the Control/Service domain cannot network with Guest domains
  > Plumb the vSwitch vsw device to enable communications
  > Give the vsw device the e1000g device's MAC address
• Check you have the latest versions of the documents, software & firmware
SVM, VxVM, ZFS Volume managers
• SVM, VxVM and ZFS volumes can be exported from a Service Domain to Guest domains and appear as virtual disks to the Guest Domains
  > Always appear as a disk with only one s0 slice
  > Can't be used as Solaris Install targets...yet, just use for data storage
• Can export a disk image file placed in one of these volumes as a full disk image to Guest domains
  > Allows use of the disk as Solaris Install Target
  > Doing this with ZFS allows very efficient re-use of images using ZFS Snapshotting and Cloning and Compression
  > Invisibly bestows the benefits of the underlying Volume manager on the disks available to the Guest domains
  > Using SVM allows either Guest or Service domain to access the disk image, allowing for off-line maintenance of the guest domain filesystems (only one at a time can mount the filesystem)
• VxVM can only be used in the Service domain, not Guest domains
Solaris Cluster 3.2 Support
• Sun Cluster 3.2 is now supported in IO Domains
  > i.e. domains with real physical devices, PCI busses or NIU devices
• Please check the web site here for more information on deployment scenarios
  > http://blogs.sun.com/SC/entry/announcing_solaris_cluster_support_in
Logical Domains (LDoms) Roadmap
• LDoms 1.0
  > Niagara support
  > Up to 32 LDOMs per system, guest domain may be rebooted independently
  > Virtualized console, ethernet, disk & cryptographic acceleration
  > Live re-configuration of virtual CPUs
  > FMA diagnosis for each domain
  > Control domain hardening
  * Requiring new Solaris 10 update
• LDoms 1.0.1 - CURRENT
  – Niagara2 support
  – I/O domain reboot support
  – Control domain minimization
  – SNMP MIB
  – Web management tool (freeware/unsupported)
References for further information
• http://www.sun.com/ldoms
• Sun Blueprints relating to LDOMs
  – http://www.sun.com/blueprints/0207/820-0832.html
  – http://www.sun.com/blueprints/0807/820-3023.html
• SDLC Release of LDOMs
  – http://www.sun.com/download/products.xml?id=46e5ba66
• Official Documentation for the SDLC release
  – http://www.sun.com/servers/coolthreads/ldoms/get.jsp
• LDOMs Blogs
  – http://blogs.sun.com/hlsu/entry/logincal_domains_1_0_1
• OpenSolaris LDOMs community
  – http://www.opensolaris.org/os/community/ldoms/
LDOMS Introduction and Hands-On-Training
With Thanks to: Tom Gendron, SPARC Systems Technical Specialist, Sun Microsystems
Peter Baer Galvin, Chief Technologist, Corporate Technologies
Agenda
• Virtualization Comparisons
• Concepts of LDOMs
• Requirements of LDOMs
• Examples
• Best Practices
The Data Center Today
• Average server utilization between 5 and 15%
• Server sprawl is hard to manage
• Single application per server
• Energy costs continue to rise
[Diagram: data center management spanning client, developer, network, server, OS, application service, and storage layers, with separate database, app, and mail servers.]
A widely understood problem
Virtualization: Who and Why
InformationWeek: Feb 12, 2007 http://www.informationweek.com/news/showArticle.jhtml?articleID=197004875
Server Virtualization
Multiple OSs:
• Hard Partitions
  > Very High RAS
  > Very Scalable
  > Mature Technology
  > Ability to run different OS versions
  > Complete Isolation
• Virtual Machines
  > Live OS migration capability
  > Improved Utilization
  > Ability to run different OS versions and types
  > De-couples OS and HW versions
Single OS:
• OS Virtualization
  > Very scalable and low overhead
  > Single OS to manage
  > Cleanly divides system and application administration
  > Fine grained resource management
• Resource Mgmt.
  > Very scalable and low overhead
  > Single OS to manage
  > Fine grained resource management
[Diagram: example workloads (calendar, database, web, SunRay, app, mail, file, and identity servers) drawn as App / OS / Server stacks under each approach.]
Para vs. Full Virtualization
• Para-virtualization:
  > OS ported to special architecture
  > Uses generic “virtual” device drivers
  > More efficient since it is “hypervisor” aware
  > “almost” native performance
• Full virtualization:
  > OS has no idea it is running virtualized
  > Must emulate real I/O devices
  > Can be slow/need help from hardware
  > May use traps, emulation or rewriting
[Diagram: mail, web, and file servers drawn as App / OS / Server stacks, once for para-virtualization and once for full virtualization; one stack is labelled Control Domain.]
What is an LDOM?
• It is a virtual server
• Has its own console and OBP instance
• A configurable allocation of CPU, FPU, Disk, Memory and I/O components
• Runs a unique OS/patch image and configuration
• Has the capability to stop, start and reboot independently
• Utilizes a Hypervisor to facilitate LDOMs
Requirements for LDOMs
• Sun T-Series server
  > T1/2000, T5x20 rack servers
  > T6100, T6120 blade
  > Any future CMT based server
• Up to date Firmware on service processor
  http://sunsolve.sun.com/handbook_pub/validateUser.do?target=index
• minimum Solaris 10 11/06 on T1/2000, T6100
• minimum Solaris 10 08/07 on T5x20, T6120
• Ldom Manager Software 1.0.1 + patches
Hypervisor
• A thin interface between the Hardware and Solaris
• The interface is called sun4v
• Solaris calls the sun4v interface to use hardware specific functions
• It is very simple and is implemented in firmware
• It allows for the creation of ldoms
• It creates communication channels between ldoms
Key LDOMs components
• The Hypervisor
• The Control Domain
• The Service Domain
• Multiple Guest Domains
• Virtualised devices
[Diagram: LDoms architecture. The hypervisor sits on shared hardware (CPU, memory, I/O). The combined control and service domain "primary" (Solaris 10 08/07; ldmd, vntsd, drd) exports vol1 (backed by /dev/lofi/1) through primary-vds0 and vsw0 through primary-vsw0 to guest domains ldom1 (Solaris 10 11/06) and ldom2 (Solaris 10 08/07), each with a vdisk and a vnet; the remaining CPU, memory, and crypto resources are unallocated.]
LDOMs types
• Different Ldom Types
  - Control Domain - Hosts the Logical Domain Manager (LDM)
  - Service Domains - Provide virtual services to other domains
  - I/O Domains - Have direct access to physical devices
  - Guest Domains - Used to run user environments
• Control, Service and I/O domains can be combined or separate
  > One of the I/O domains must be the control domain
Key LDOMs components
• The Hypervisor
• The Control Domain
• The Service Domain
• Multiple Guest Domains
• Virtualised devices
[Diagram: LDoms architecture (hypervisor, primary control/service domain, guest domains ldom1 and ldom2, unallocated resources).]
'Control' Domain
• Creates and manages other LDOMs
• Runs the LDOM Manager software
• Allows monitoring and reconfiguration of domains
• Recommendation:
  > Make this Domain as secure as possible
'Service' Domain
• Provides services to other domains
  – virtual network switch
  – virtual disk service
  – virtual console service
• Multiple Service domains can exist with shared or sole access to system facilities
• Allows for IO load separation and redundancy within domains deployed on a platform
• Often Control and Service Domains are one and the same
IO Domain
• An IO Domain has direct access to physical input and output devices.
• The number of IO domains is hardware dependent
  > currently limited to 2
  > limited by PCI-E switch configuration
• One IO domain must also be the control domain
'Guest' Domains
• Contain the targeted applications the LDOMs were created to service.
• Multiple Guest domains can exist
  > Constrained only by hardware limitations
• May use one or more Service domains to obtain IO
  > Various redundancy mechanisms can be used
• Can be independently 'powered' and rebooted without affecting other domains
Virtual devices
• Virtual devices are hardware resources abstracted by the hypervisor and made available for use by the other domains
• Virtual devices are:
  > CPUs - VCPU
  > Memory
  > Crypto cores - MAU
  > Network switches - VSW
  > NICs - VNET
  > Disk servers - VDSDEV
  > Disks - VDISK
  > Consoles - VCONS
Example 1: Install Ldom Manager & Setting up the Control Domain
Example 1 steps
• Update firmware to latest release
• Install supported version of Solaris
• Install Logical Domain Manager (LDM) software
• Configure the control domain
• Save initial domain config
• Reboot Solaris
A note on system interfaces
• Provide out-of-band management
• Two types (iLOM and ALOM)
• T1/2000 uses the ALOM interface
• T5x20 uses iLOM
• The iLOM “CLI” has an ALOM compatibility shell
  > ALOM shell used in the examples
• A web-based interface is available
• (SC = system controller, SP = service processor)
  > essentially the same thing.
Web based iLOM interface
ALOM compatibility shell
• Log in to the SP as root/changeme
• -> create /SP/users/admin
• -> set /SP/users/admin role=Administrator
• -> set /SP/users/admin cli_mode=alom
  – Creating user ...
  – Enter new password: ********
  – Enter new password again: ********
  – Created /SP/users/admin
• exit
• Log in as admin
Step 1: Firmware verification and update
System Identification and Update
• Check the Service Processor of your system for firmware levels
• Use ALOM mode (showhost is not available in the BUI)

sc> showhost
Sun System Firmware 7.0.1 2007/09/14 16:31

Host flash versions:
   Hypervisor 1.5.1 2007/09/14 16:11
   OBP 4.27.1 2007/09/14 15:17
   POST 4.27.1 2007/09/14 15:43

Check the SC firmware version (7.0.1 here)

• Upgrade your system firmware if needed...
  > flashupdate command
  > sysfwdownload (via Solaris on platform)
  > BUI
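The version check above is easy to automate once the `showhost` text has been captured. The following is a minimal sketch in Python, assuming the output format shown on this slide; the function name `parse_showhost` and the `REQUIRED` minimum are illustrative assumptions, not part of any Sun tool.

```python
import re

# Sample text copied from the showhost output on this slide.
SHOWHOST_OUTPUT = """\
Sun System Firmware 7.0.1 2007/09/14 16:31

Host flash versions:
   Hypervisor 1.5.1 2007/09/14 16:11
   OBP 4.27.1 2007/09/14 15:17
   POST 4.27.1 2007/09/14 15:43
"""

def parse_showhost(text):
    """Return {component: (major, minor, patch)} for each firmware component found."""
    pattern = re.compile(r"(System Firmware|Hypervisor|OBP|POST)\s+(\d+)\.(\d+)\.(\d+)")
    versions = {}
    for match in pattern.finditer(text):
        versions[match.group(1)] = tuple(int(part) for part in match.groups()[1:])
    return versions

versions = parse_showhost(SHOWHOST_OUTPUT)
REQUIRED = (7, 0, 1)  # hypothetical minimum, for illustration only
needs_update = versions["System Firmware"] < REQUIRED
print(versions, "update needed:", needs_update)
```

Comparing tuples rather than strings avoids mistakes such as treating 4.27.1 as older than 4.3.0.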
Firmware update example
sc> showkeyswitch
Keyswitch is in the NORMAL position.
sc> flashupdate -s 10.8.66.15 -f /incoming//Sun_System_Firmware-6_4_6-Sun_Fire_T2000.bin
Username: tgendron
Password: ********

SC Alert: System poweron is disabled.
Update complete. Reset device to use new software.
sc>
sc> resetsc

Telnet and log back in once the SC is up.

sc> showhost
Sun-Fire-T2000 System Firmware 6.5.5 2007/10/28 23:09
Firmware update example 2
Step 1: From Solaris running on the T5120 whose SP is to be updated, download patch 127580-05.zip from SunSolve
Step 2: unzip and cd into 127580-05
Step 3: run sysfwdownload [image].pkg
Step 4: reboot Solaris
sc> resetsc
Installing LDOM manager software
• T5x20 requires Solaris 10 8/07 or greater
• T1/2000 requires Solaris 10 11/06 or greater +
• 11/06 is minimum for guests
• ldm 1.0.2 is current
  > includes Solaris Security Toolkit (optional)

* 124921-02 at a minimum
* 125043-01 at a minimum
* 118833-36 at a minimum
Install the LDM Software
• Unzip and install w/installation script
• Security of the Control Domain is important
  > Recommend selecting the JASS secure configuration
• Once complete, the entire system is one LDOM
• LDOM software is installed in /opt/SUNWldm

# [cmt1/root] ldm list
NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
primary  active  -n-cv  SP    64    8064M   0.0%  3h 19m
[cmt1/root]

All the system resources are in domain “primary”

* Follow the Administration Guide to install required OS and patches
Flag Definitions

-   placeholder
c   control domain
d   delayed reconfiguration
n   normal
s   starting or stopping
t   transition
v   virtual I/O domain

# ldm list
NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
primary  active  -n-cv  SP    32    32640M  0.1%  6d 20h 24m
#
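To make the FLAGS column easier to read in scripts, the table above can be turned into a lookup. This is a small sketch in Python; the helper name `decode_flags` is an assumption for illustration, and the meanings are copied from this slide only.

```python
# Flag letters and meanings as listed on this slide.
FLAG_MEANINGS = {
    "-": "placeholder",
    "c": "control domain",
    "d": "delayed reconfiguration",
    "n": "normal",
    "s": "starting or stopping",
    "t": "transition",
    "v": "virtual I/O domain",
}

def decode_flags(flags):
    """Translate an ldm FLAGS string into a list of meanings, skipping placeholders."""
    return [FLAG_MEANINGS[letter] for letter in flags if letter != "-"]

print(decode_flags("-n-cv"))
print(decode_flags("-t---"))
```

For the `primary` domain shown above, `-n-cv` decodes to a normal control domain that also provides virtual I/O.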
Example 1, Part 2: Setting up the Control Domain
On naming things...
• Choose LDOM component names carefully
  > Names are used to manage the devices
  > Bad choices can be very confusing later on...
  > Keep names short and specific...
• You need names for ...
  > Disk Servers, and disk device instances
  > Network Virtual Switches, and network device instances
  > Domains
• Service and device names are only known to the Control and Service domains
  – Guest domains just see virtual devices.
Control/Service Domain
• On our 'Primary' Domain do the following ...
• In this example Control and Service are combined
  > Control domain runs the LDM
  > Service domain has these services set up:
• Set up the basic services needed.
  > vds - virtual disk service
  > vcc - virtual console concentrator
  > vsw - virtual network switch
• The service names in this example are below:
  > primary-vds0
  > primary-vcc0
  > primary-vsw0
• Allocate resources
  > CPU, Memory, Crypto, IO devices
[Figure: the primary (Control & Service) domain on Solaris 10 08/07 runs ldmd, vntsd and drd and provides vcc0, vds0 and vsw0 (primary-vds, primary-vsw) over PCI-E, the network and two 72GB disks; all other CPU, memory and crypto units are unallocated resources.]
Control/Service Domain set-up (1)
# Add services to the control domain
# The MAC address is taken from a physical interface, e.g., e1000g0.
ldm add-vds primary-vds0 primary
ldm add-vcc port-range=5000-5100 primary-vcc0 primary
ldm add-vsw mac-addr=0:14:4f:6a:9e:dc net-dev=e1000g0 primary-vsw0 primary
# Activate the virtual network terminal server
svcadm enable vntsd
# Allocate resources to the control domain and save
ldm set-mau 1 primary
ldm set-vcpu 8 primary
ldm set-memory 2G primary
ldm add-spconfig my-initial
# Reboot required to have the configuration take effect.
init 6
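Because the same sequence is typed on every new server, it can help to generate it from a few parameters and review it before running anything. The sketch below, in Python, only builds and prints the command lines from this slide; it does not execute them. The function name `control_domain_commands` and its parameters are illustrative assumptions.

```python
def control_domain_commands(mac_addr, net_dev, vcpus, memory, maus, config_name):
    """Build the control-domain set-up commands shown on this slide as strings."""
    return [
        "ldm add-vds primary-vds0 primary",
        "ldm add-vcc port-range=5000-5100 primary-vcc0 primary",
        f"ldm add-vsw mac-addr={mac_addr} net-dev={net_dev} primary-vsw0 primary",
        "svcadm enable vntsd",
        f"ldm set-mau {maus} primary",
        f"ldm set-vcpu {vcpus} primary",
        f"ldm set-memory {memory} primary",
        f"ldm add-spconfig {config_name}",
    ]

# Values taken from the example on this slide.
commands = control_domain_commands(
    mac_addr="0:14:4f:6a:9e:dc",
    net_dev="e1000g0",
    vcpus=8,
    memory="2G",
    maus=1,
    config_name="my-initial",
)
print("\n".join(commands))
```

The final reboot (`init 6`) is deliberately left out of the generated list so that it remains a conscious manual step.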
Crypto Note
Note: If you have any cryptographic devices in the control domain, you cannot dynamically reconfigure CPUs. So if you are not using cryptographic devices, set-mau to 0.
Control/Service Domain set-up (2)
# Verify the primary domain configuration
ldm list-domain
NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
primary  active  -n-cv  SP    8     2G      6.3%  6m
# Enable networking
ifconfig vsw0 plumb
ifconfig e1000g0 down unplumb
ifconfig vsw0 10.8.66.208 netmask 255.255.255.0 broadcast + up
ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
vsw0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.8.66.208 netmask ffffff00 broadcast 10.8.66.255
        ether 0:14:4f:6a:9e:dc
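The `ifconfig -a` output reports the netmask in hexadecimal and a broadcast address derived from it (the `broadcast +` argument). As a quick sanity check, the same values can be reproduced with Python's standard `ipaddress` module; the variable names below are only illustrative.

```python
import ipaddress

# Address and netmask as given to ifconfig on this slide.
iface = ipaddress.ip_interface("10.8.66.208/255.255.255.0")

netmask_hex = format(int(iface.network.netmask), "08x")  # form used by ifconfig -a
broadcast = str(iface.network.broadcast_address)

print(netmask_hex, broadcast)
```

Both values match the `vsw0` line in the output above, which confirms the interface came up with the intended subnet.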
Ldom Service details
Reconfiguration
• Dynamic reconfiguration
  > Resource changes that take effect w/out reboot of domain
• Delayed reconfiguration
  > Resource changes that take effect after a reboot
• Resource examples:
  > VCPU, Memory, IO devices
• Currently only VCPUs are dynamic
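The rules on this slide, together with the crypto note earlier (a domain holding crypto units cannot have its CPUs reconfigured dynamically), reduce to a very small decision function. The Python sketch below encodes just those two statements for the LDoms release described here; the function name and the resource labels are assumptions for illustration.

```python
def reconfiguration_mode(resource, domain_has_mau=False):
    """Return 'dynamic' or 'delayed' for a resource change, per the rules on these slides."""
    # Only vCPUs are dynamic, and only when the domain holds no crypto unit (MAU).
    if resource == "vcpu" and not domain_has_mau:
        return "dynamic"
    return "delayed"

for resource, has_mau in [("vcpu", False), ("vcpu", True), ("memory", False), ("io", False)]:
    print(resource, has_mau, reconfiguration_mode(resource, has_mau))
```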
Virtual Disk Server device (vds)
• VDS runs in a service domain
• Performs disk I/O on corresponding raw devices
• Device types can be
  > An entire physical disk or LUN (can be SAN based)
  > Single slice of disk or LUN
  > Disk image in a filesystem (e.g. ufs, zfs)
  > Disk volumes (zfs, svm, VxVM)
  > lofi devices NOT supported
• Virtual Disk Client (vdc drivers)
  > Requests standard block IO via the VDS
  > Classic client/server architecture
Delayed Reconfiguration
Virtual Disk devices
• Physical LUNs perform best
• Disk image files make efficient use of space
• ZFS snapshots and clones give rapid provisioning
• Network install not supported with
  > zfs volumes
  > single slice
• Network install requires
  > entire disk
  > disk image file
Virtual Network Switch services (vswitch)
• Implements a layer-2 network switch
• Connects virtual network devices
  > to the physical network
  > or to each other (internal private network)
• vswitch not automatically used by service domain
  > must be plumbed
Delayed Reconfiguration
Virtual Console Concentrator (vcc)
• Provides console access to LDoms
• Service domain VCC driver communicates with all guest console drivers over the Hypervisor
  > No changes required in guest console drivers (qcn)
• Makes each console available as a tty device on the Control/Service domain
• usage: telnet localhost <port>
Delayed Reconfiguration
Virtual Network Terminal Server daemon (vntsd)
• VCC implemented by vntsd
• Runs in the Control/Service domain
• Aggregates the VCC tty devices and makes them available over network sockets
  > Accessible once a domain is configured and bound
  > Attach prior to domain start to watch domain OBP boot sequence
• Only one user at a time can view a serial console
• Flexible support of port groups, IPs, port numbers etc
  > Not visible outside the Control/Service domain by default
Delayed Reconfiguration
Example 2: Setting up the Guest Domain
Guest Domain
In the control domain:
ldm add-domain ldm1
ldm add-mau 1 ldm1
ldm add-vcpu 4 ldm1
ldm add-memory 4G ldm1
ldm add-vnet vnet0 primary-vsw0 ldm1
ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0
ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1
ldm set-var auto-boot\?=false ldm1
ldm set-var boot-device=vdisk ldm1
ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm1
ldm bind-domain ldm1
ldm start-domain ldm1
• Watch the console of ldom1 using ...
  > telnet localhost 5000
[Figure: T2000 with the primary Control & Service domain (Solaris 10 08/07; ldmd, vntsd, drd; primary-vds0 and primary-vsw0/vsw0 on /dev/e1000g0) exporting ldm1-vol1 (/dev/c0t1d0s0 on a 72GB disk) to guest ldm1 (Solaris 10 11/06 + app + patches), which sees ldm1-vdisk1 as /dev/dsk/c0d0s0 and vnet0; the remaining CPU, memory and crypto units are unallocated resources.]
Disk Service Setup
• Establish a Virtual Disk Service
  – 'primary-vds'
• Associate it with some form of media.
  – A real device or slice, e.g. /dev/dsk/c0t1d0s0
  – or a disk image, e.g. '/ldmzpool/ldg1'
• Create a disk server device instance to be exported to guest domains
  – 'ldm1-vol1@primary-vds'
ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0
ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1
(The disk device name can vary - find it via “ok show-devs”)
[Figure: the primary Control & Service domain (Solaris 10 11/06; ldmd, vntsd, drd) exports ldm1-vol1 (/dev/c0t1d0s2 on a 72GB disk) through primary-vds; guest ldom1 (Solaris 10 11/06 + app + patches) sees ldm1-vdisk1 as /dev/dsk/c0d0s0.]
Virtual Disk Client (vdc)
• vdc's are the objects passed to OBP and the Operating System in guest systems
• Guest domain OBP and Solaris see normal SCSI devices
• Domain administrators may set up devaliases or use raw vdisk devices
• vdc's provide Guest domains with virtual disk devices (vdisks) via device instances from Virtual Disk Servers running in the Service Domain(s)
• A future release will provide virtualised access to DVD/CD-ROM in service domains
Delayed Reconfiguration
Network Setup
• Establish a Virtual Network Switch Service
  – 'primary-vsw0'
  > Automatically associated with a vsw device instance
  – 'vsw0'
• May or may not choose to associate it with media.
  – 'e1000g0', a real NIC
  – or no NIC: in memory only
• Create a network device instance to provide to guest domains
  – 'vnet0@ldm1'
[Figure: primary-vsw0 (primary-vsw) in the primary Control & Service domain (Solaris 10 08/07; ldmd, vntsd, drd) connects the physical NIC e1000g0 to vnet0@ldm1, which guest ldom1 (Solaris 10 11/06 + app + patches) sees as vnet0.]
Virtual Network Device (vnet)
• Implements an ethernet device in a domain
  > Communicates with other vnets or the outside world over vswitch devices
• If the vSwitch is suitably configured, packets can be routed out of the server.
• vnet exports a GLDv3 interface
  > A simple virtual Ethernet NIC
  > Enumerates as a 'vnetx' device
  > For domain-domain transfers, vnets connect 'directly'.
Delayed Reconfiguration
Memory
• Memory is configured through the Control Domain
• Minimum allocatable chunk is 8kB
  > Minimum size is 12MB (for OBP)
  > Though most OS deployments will need > 512M
• If memory is added over time to a domain
  > Memory device bindings within a domain may appear to show that memory fragmentation is occurring
  > Not a problem, all handled in HW by the MMU
  > No performance penalty
Delayed Reconfiguration
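The two sizing rules on this slide (8kB allocation chunks, 12MB minimum) can be checked before a size is handed to the Logical Domain Manager. The following Python sketch encodes only those two numbers; the function name `check_memory_size` is an illustrative assumption and says nothing about how ldm itself validates sizes.

```python
KB = 1024
MB = 1024 * KB
CHUNK = 8 * KB      # minimum allocatable chunk, from this slide
MINIMUM = 12 * MB   # minimum domain size (needed for OBP), from this slide

def check_memory_size(size_bytes):
    """Return a list of problems with a proposed domain memory size (empty if none)."""
    problems = []
    if size_bytes % CHUNK != 0:
        problems.append("not a multiple of 8kB")
    if size_bytes < MINIMUM:
        problems.append("below the 12MB minimum")
    return problems

print(check_memory_size(1024 * MB))   # a 1G guest, as used in the ZFS example
print(check_memory_size(4 * MB))
```

A size that passes both checks can still be too small for Solaris itself, which the slide notes usually needs more than 512M.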
vCPU's
• Each UltraSPARC T1 has up to 8 physical cores with 4 threads each
  > Each thread is considered a vCPU, so up to 32 vCPUs or Domains
• Each UltraSPARC T2 has up to 8 physical cores with 8 threads each
  > Each thread is considered a vCPU, so up to 64 vCPUs or Domains
• Maximum granularity is 1 vCPU per domain
• vCPUs can only be allocated to one Domain at a time.
• Can be dynamically allocated with the Domain running
  > Take care when removing a vCPU from a running domain: will there be enough compute power left in the domain?
Immediate Reconfiguration
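The vCPU counts quoted above are simply cores multiplied by threads per core. A short Python sketch makes the arithmetic explicit and adds a helper for rounding a vCPU request up to whole cores; both function names are assumptions for illustration.

```python
import math

def max_vcpus(cores, threads_per_core):
    """Total hardware threads, each of which the hypervisor presents as one vCPU."""
    return cores * threads_per_core

def cores_needed(vcpus, threads_per_core):
    """Number of physical cores touched if a domain is given whole cores."""
    return math.ceil(vcpus / threads_per_core)

print("T1:", max_vcpus(8, 4))   # 8 cores x 4 threads
print("T2:", max_vcpus(8, 8))   # 8 cores x 8 threads
print("6 vCPUs on a T1 span", cores_needed(6, 4), "cores")
```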
Example 3: Guest Domains and ZFS
Using ZFS (1) – setup zfs
1. Remove the disk from the service domain
ldm stop-domain ldm1
LDom ldm1 stopped
ldm unbind-domain ldm1
ldm remove-vdsdev ldm1-vol1@primary-vds0

2. Create a zpool
root@cmt1 > zfs create mypool/ldoms
root@cmt1 > zfs create mypool/ldoms/ldm1
root@cmt1 > cd /export/ldoms/ldm1
root@cmt1 > ls
root@cmt1 > mkfile 12G `pwd`/rootdisk
Using ZFS (2) – setup guest domain
3. Configure the guest domain
root@cmt1 > ldm add-domain ldm1
root@cmt1 > ldm add-vcpu 8 ldm1
root@cmt1 > ldm add-memory 1G ldm1
root@cmt1 > ldm add-vnet vnet0 primary-vsw0 ldm1
root@cmt1 > ldm add-vdsdev /export/ldoms/ldm1/rootdisk ldm1-vol1@primary-vds0
root@cmt1 > ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1

root@cmt1 > ldm set-var auto-boot\?=false ldm1
root@cmt1 > ldm set-var boot-device=ldm1-vdisk1 ldm1
root@cmt1 > ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm1
Using ZFS (3) – setup guest domain
4. Start the guest domain
root@cmt1 > ldm bind-domain ldm1
root@cmt1 > ldm start-domain ldm1
LDom ldm1 started

5. Inspect the domain
root@cmt1 > ldm list-domain
NAME     STATE   FLAGS  CONS  VCPU  MEMORY  UTIL  UPTIME
primary  active  -n-cv  SP    8     2G      0.7%  17h 12m
ldm1     active  -t---  5000  8     1G      13%   7s

telnet localhost 5000
{ok} boot vnet0 - install
(installation goes forward)
Provision the guest
6. Set up for jumpstart
Determine the MAC address
root@cmt1 > ldm list-bindings ldm1
[snip]
NETWORK
    NAME   SERVICE               DEVICE     MAC
    vnet0  primary-vsw0@primary  network@0  00:14:4f:f8:2a:c4
        PEER                  MAC
        primary-vsw0@primary  00:14:4f:46:41:b4

telnet localhost 5000
{0} ok banner
SPARC Enterprise T5120, No Keyboard
[snip]
Ethernet address 0:14:4f:fb:7:42, Host ID: 83fb0742.
Provision the guest (2)
{0} ok boot vnet0 - install
Boot device: /virtual-devices@100/channel-devices@200/network@0 File and args: - install
Requesting Internet Address for 0:14:4f:f8:2a:c4
SunOS Release 5.10 Version Generic_120011-14 64-bit
...

How to break
telnet> send brk
Debugging requested; hardware watchdog suspended.

c)ontinue, s)ync, r)eboot, h)alt? r
Resetting...

{0} ok
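Note that the guest requests its address using the vnet MAC reported by `ldm list-bindings`, not the Ethernet address printed by the OBP `banner` command. Solaris tools print MAC addresses with and without leading zeros, so a comparison is safer after normalizing. The Python sketch below uses the three addresses from these slides; the helper name `normalize_mac` is an illustrative assumption.

```python
def normalize_mac(mac):
    """Return a MAC address as six zero-padded, lower-case hex octets."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))

bindings_mac = "00:14:4f:f8:2a:c4"  # vnet0 MAC from ldm list-bindings
rarp_mac = "0:14:4f:f8:2a:c4"       # from "Requesting Internet Address for ..."
banner_mac = "0:14:4f:fb:7:42"      # Ethernet address from the OBP banner

same_as_bindings = normalize_mac(rarp_mac) == normalize_mac(bindings_mac)
banner_differs = normalize_mac(banner_mac) != normalize_mac(bindings_mac)
print(same_as_bindings, banner_differs)
```

For jumpstart, the address to register on the install server is therefore the one from `ldm list-bindings`.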
Guest Domain (zfs) login
{0} ok boot
Boot device: ldm1-vdisk1 File and args:
SunOS Release 5.10 Version Generic_120011-14 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: ldm1
ldm1 console login:
Using ZFS (2) – cloning domains
Snapshot and clone the installed boot disk
tgendron@cmt1 > zfs list
NAME               USED   AVAIL  REFER  MOUNTPOINT
mypool             12.0G  54.9G  27.5K  /export
mypool/ldoms       12.0G  54.9G  25.5K  /export/ldoms
mypool/ldoms/ldm1  12.0G  54.9G  12.0G  /export/ldoms/ldm1

root@cmt1 > zfs snapshot mypool/ldoms/ldm1@initial

Create the clones
root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm2
root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm3
root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm4
root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm5
Using ZFS (2) – Leverage the clones
4. Create the new guest domains (this should be easy to script)
ldm add-domain ldm2
ldm add-vcpu 8 ldm2
ldm add-memory 1G ldm2
ldm add-vnet vnet0 primary-vsw0 ldm2

ldm add-vdsdev /export/ldoms/ldm2/rootdisk ldm2-vol1@primary-vds0
ldm add-vdisk ldm2-vdisk1 ldm2-vol1@primary-vds0 ldm2

ldm set-var auto-boot\?=false ldm2
ldm set-var boot-device=vdisk ldm2
ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm2
ldm bind-domain ldm2
ldm start-domain ldm2
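As the slide suggests, creating the remaining guests is repetitive enough to script. The Python sketch below generates the command lines for ldm2 through ldm5 and prints them for review; it does not run them. The function name `clone_domain_commands` and its defaults are illustrative assumptions, and the nvramrc devalias line from the slide is left out for brevity.

```python
def clone_domain_commands(name, vcpus=8, memory="1G", base="/export/ldoms"):
    """Build the ldm command lines from this slide for one cloned guest domain."""
    volume = f"{name}-vol1@primary-vds0"
    return [
        f"ldm add-domain {name}",
        f"ldm add-vcpu {vcpus} {name}",
        f"ldm add-memory {memory} {name}",
        f"ldm add-vnet vnet0 primary-vsw0 {name}",
        f"ldm add-vdsdev {base}/{name}/rootdisk {volume}",
        f"ldm add-vdisk {name}-vdisk1 {volume} {name}",
        f"ldm set-var auto-boot\\?=false {name}",
        f"ldm set-var boot-device=vdisk {name}",
        f"ldm bind-domain {name}",
        f"ldm start-domain {name}",
    ]

script = []
for index in range(2, 6):  # ldm2 .. ldm5, one per ZFS clone
    script.extend(clone_domain_commands(f"ldm{index}"))
print("\n".join(script))
```

Each clone still carries the hostname and IP address of the original image, so the guests need their identities changed after first boot.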
Boot the cloned ldom
{0} ok boot
Boot device: vdisk File and args:
SunOS Release 5.10 Version Generic_120011-14 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
WARNING: vnet0 has duplicate address 010.030.019.178 (in use by 00:14:4f:f8:2a:c4); disabled
Feb 13 19:55:29 svc.startd[7]: svc:/network/physical:default: Method "/lib/svc/method/net-physical" failed with exit status 96.
Feb 13 19:55:29 svc.startd[7]: network/physical:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
Hostname: ldm1
...
Example 4: Split Service Domains
Sun Fire T2000 Block Diagram
Split IO Example

Check which PCI bus ports we own and are currently using, and be sure to give away only unused ones, i.e. we need to retain the Control Domain boot disk controller and network device...

Providing a PCI bus to a Guest makes the selected Domain a Service domain, by definition: access to physical IO = Service Domain.

• Setting up a second Service domain with split PCI busses...
-bash-3.00# ldm list-bindings primary
Name: primary
...
IO: pci@780 (bus_a)
    pci@7c0 (bus_b)
...
-bash-3.00# df /
/ (/dev/dsk/c1t0d0s0 ):28233648 blocks 3450076 files
-bash-3.00# ls -l /dev/dsk/c1t0d0s0
lrwxrwxrwx 1 root root 65 Apr 11 13:25 /dev/dsk/c1t0d0s0 -> ../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a
-bash-3.00# grep e1000g /etc/path_to_inst
"/pci@780/pci@0/pci@1/network@0" 0 "e1000g"
"/pci@780/pci@0/pci@1/network@0,1" 1 "e1000g"
"/pci@7c0/pci@0/pci@2/network@0" 2 "e1000g"
"/pci@7c0/pci@0/pci@2/network@0,1" 3 "e1000g"
-bash-3.00# ldm remove-io pci@780 primary
..
-bash-3.00# shutdown -i6 -y -g0
..
-bash-3.00# ldm add-io pci@780 second-srvc-dom
-bash-3.00# ldm start second-srvc-dom
-bash-3.00# ldm list-bindings
..
-bash-3.00#
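The check described above (which bus holds the boot disk, and which NICs go away with the other bus) can be worked out mechanically from the device paths. The Python sketch below uses the paths from this slide as sample input; the function names `root_bus` and `nics_by_bus` are illustrative assumptions.

```python
import re

# Sample data copied from the commands on this slide.
BOOT_DISK_PATH = "../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a"
PATH_TO_INST = '''\
"/pci@780/pci@0/pci@1/network@0" 0 "e1000g"
"/pci@780/pci@0/pci@1/network@0,1" 1 "e1000g"
"/pci@7c0/pci@0/pci@2/network@0" 2 "e1000g"
"/pci@7c0/pci@0/pci@2/network@0,1" 3 "e1000g"
'''

def root_bus(path):
    """Return the top-level PCI bus (e.g. 'pci@7c0') of a device path, or None."""
    match = re.search(r"/(pci@[0-9a-f]+)", path)
    return match.group(1) if match else None

def nics_by_bus(path_to_inst):
    """Group driver instances (e.g. 'e1000g0') by their top-level PCI bus."""
    buses = {}
    for line in path_to_inst.splitlines():
        match = re.match(r'"([^"]+)"\s+(\d+)\s+"([^"]+)"', line)
        if match:
            path, instance, driver = match.groups()
            buses.setdefault(root_bus(path), []).append(driver + instance)
    return buses

boot_bus = root_bus(BOOT_DISK_PATH)
nics = nics_by_bus(PATH_TO_INST)
removable = sorted(bus for bus in nics if bus != boot_bus)
print("keep:", boot_bus, nics[boot_bus])
print("can give away:", removable, [nics[bus] for bus in removable])
```

With this data the boot disk sits on pci@7c0, so pci@780 is the bus that can be removed, and the control domain must then use e1000g2 or e1000g3 for its own networking.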
Sun Fire T5x20 Block Diagram
[Figure: Sun Fire T5x20 block diagram. Labels: 16 x FB-DIMMs; disk chassis (1RU, 2RU/8) on x4 SAS links from an LSI1068E controller; PCI-E switches (two PLX 8533, one PLX 8517) with x4 and x8 links; two Intel dual GbE controllers feeding the quad GbE connectors; 10GbE SerDes with BCM8704 XFP (10GbE fibre plug-in) and 10GbE Cu PHY; PCI-E to USB bridge, USB 2.0 hub, front panel USB, DVD via USB to IDE; PCI-E x16 and x8 slots (some 2RU only); MPC885 ILOM service processor with FPGA, serial and network management ports; POSIX serial DB-9 on the rear panel; SSI.]
MPxIO considerations
• MPxIO can be used in the Service/Control domain
• Very straightforward to configure with defaults...
  > Ensure you have two FC-AL HBAs in a single service domain attached to the same SAN array
  > Check that you have two paths to the same SAN devices ('ls /dev/dsk/')
  > Enable MPxIO by running the command 'stmsboot -e' and rebooting the control/service domain
  > Check that you now have only a single path to the SAN devices...
IPMP considerations
• IPMP has several options for configuration
  > Refer to the Administration Guide for worked examples...
  > Options are multipathing in the Service Domain or multipathing in the Guest Domain
Ldom 1.0.1 Best Practice Guidance
Ldom Best Practice (1)
• Control Domain
  > Runs LDM daemon processes
  > Must have adequate CPU and memory
  > Start w/ 1 core (4 or 8 threads) and 1GB memory
  > Make this domain as secure as possible
Ldom Best Practice (2)
• I/O and service domains
  > Run IO for other domains
  > Resources will be sized based on IO load
  > Start w/ 1 core and 1GB memory
  > 4GB of memory if zfs is used for virtual disk images
  > Add complete cores for heavier I/O loads
Ldom Best Practice (3)
• Core/Thread Affinity
  > Core resources are shared by threads
  > E.g. L1 cache and MAU, FPU
• Best to avoid allocating the threads of a core to separate domains
• Create larger Ldoms first using complete cores
• Smaller domains last
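To see why creation order and whole-core sizing matter, it helps to model which cores end up shared. The Python sketch below assumes, purely for illustration, that hardware threads are handed out sequentially in the order domains are created; the slides do not state the Logical Domain Manager's actual placement policy, and the function name `shared_cores` is hypothetical.

```python
def shared_cores(allocation, threads_per_core):
    """Return {core: [domains]} for cores whose threads are split across domains.

    allocation is an ordered list of (domain, vcpu_count); threads are assumed
    to be assigned sequentially, which is a simplification for illustration.
    """
    core_users = {}
    next_thread = 0
    for domain, count in allocation:
        for thread in range(next_thread, next_thread + count):
            core_users.setdefault(thread // threads_per_core, set()).add(domain)
        next_thread += count
    return {core: sorted(users) for core, users in core_users.items() if len(users) > 1}

# Odd-sized domains on a T1 (4 threads per core) end up sharing cores...
print(shared_cores([("a", 3), ("b", 6), ("c", 3)], threads_per_core=4))
# ...while whole-core domains created largest-first do not.
print(shared_cores([("big", 8), ("small", 4)], threads_per_core=4))
```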
Ldom Best Practice (4)
• Crypto Units
• Each T1/T2 physical CPU core has a Crypto Unit
  > 8 in total on an 8-core system
  > referred to as the MAU
• Crypto cores can only be allocated to domains that have at least one vCPU (thread) on the same physical core as the crypto unit
• Crypto cores cannot be shared; they are owned by exactly one (or no) Domain
• Probably best to allocate all four/eight threads on a core to a domain that wants to use the Crypto core
Delayed Reconfiguration
More on Crypto Units
• For example, we define three domains in the order LDOM1, then LDOM2, then LDOM3...
• LDOM1 has 3 threads (vCPUs) on Core 0
  > Only has access to MAU0 since it only has threads on Core 0
• LDOM2 has 6 vCPUs spread across Cores 0, 1 & 2
  > Potentially has access to MAUs 0, 1 & 2
  > BUT... LDOM1 already binds MAU0
  > So it can only take MAU1 and MAU2
• LDOM3 has 3 vCPUs on Core 2
  > But can't access any MAUs since LDOM2 has already taken MAU2
• Adding and removing vCPUs can cause access to previously accessible MAUs to be lost. Currently you can't select specific vCPUs; the framework does that itself
• When MAUs are allocated to Domains, vCPUs become delayed reconfiguration properties in those domains
[Figure: T1 Cores 0, 1 and 2 with MAU0, MAU1 and MAU2, and the threads of each core assigned to LDOM1, LDOM2 and LDOM3.]
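The walk-through above amounts to first-come, first-served ownership of each core's MAU. The Python sketch below replays it, assuming each domain claims every MAU it can still reach, as in the slide's scenario; the function name `allocate_maus` is an illustrative assumption and not how ldm is invoked.

```python
def allocate_maus(domains):
    """Assign MAUs in creation order.

    domains is an ordered list of (name, cores), where cores is the set of
    physical cores on which the domain holds at least one thread. A MAU can be
    owned by only one domain, so later domains get whatever is left.
    """
    owner = {}    # core/MAU index -> owning domain
    granted = {}  # domain -> list of MAU indexes it obtained
    for name, cores in domains:
        available = sorted(core for core in cores if core not in owner)
        for core in available:
            owner[core] = name
        granted[name] = available
    return granted

result = allocate_maus([
    ("LDOM1", {0}),        # 3 threads on core 0
    ("LDOM2", {0, 1, 2}),  # 6 vCPUs across cores 0, 1 and 2
    ("LDOM3", {2}),        # 3 vCPUs on core 2
])
print(result)
```

Reordering the list so that LDOM3 is created before LDOM2 would give it MAU2, which is why planning the creation order matters.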
Ldom Best Practice (5)
• Plan your LDOM configuration carefully; reconfiguration may become awkward
• Use easy to understand names
  > Try not to overload vds, vsw, ldom, vdisk, vnic etc...
• Use MPxIO or VxVM, VxFS, Sun Cluster on service domains (only VxFS in Guests) for resilient storage devices
• Use IPMP on Guest or Service Domains for resilient network connections
Ldom Best Practice (6)
• For hi-speed inter-domain comms use device-less/in-memory VSW configs
• For high disk performance, allocate a whole real device via a dedicated, properly sized Virtual Disk Server and Service domain
• Look at the server architecture when configuring devices to ensure you get the bandwidth you expect
• For critical applications consider hot/warm standby domains across multiple physical servers, never rely on multiple instances within a single server.
LDOM's v1.0.1 Notes
• All domains can be stopped and started independently
  > Beware: Guest domains attempting to perform IO using a rebooting Service domain will stall until the Service domain returns.
• LDOM SNMP MIB available now with traps and requests to the LDOM framework
• The MAC address on the banner is different from the one that is RARP'd for jumpstart
• Only vCPUs can be dynamically reallocated
  > BUT... if the domain has crypto cores this becomes a delayed reconfiguration
  > You cannot choose which vCPUs are allocated to a domain
• By default the Control/Service domain cannot network with Guest domains
  > Plumb the vSwitch vsw device to enable communications
  > Give the vsw device the e1000g device's MAC address
• Check you have the latest versions of the documents, Software & Firmware
SVM, VxVM, ZFS Volume managers
• SVM, VxVM and ZFS volumes can be exported from a Service Domain to Guest domains and appear as virtual disks to the Guest Domains
  > Always appear as a disk with only one s0 slice
  > Can't be used as Solaris Install targets...yet, just use for data storage
• Can export a disk image file placed in one of these volumes as a full disk image to Guest domains
  > Allows use of the disk as Solaris Install Target
  > Doing this with ZFS allows very efficient re-use of images using ZFS Snapshotting, Cloning and Compression
  > Invisibly bestows the benefits of the underlying Volume manager on the disks available to the Guest domains
  > Using SVM allows either Guest or Service domain to access the disk image, allowing for off-line maintenance of the guest domain filesystems (only one at a time can mount the filesystem)
• VxVM can only be used in the Service domain, not Guest domains
Solaris Cluster 3.2 Support
• Sun Cluster 3.2 is now supported in IO Domains
  > i.e. domains with real physical devices, PCI busses or NIU devices
• Please check the web site here for more information on deployment scenarios
  > http://blogs.sun.com/SC/entry/announcing_solaris_cluster_support_in
Logical Domains (LDoms) Roadmap
• LDoms 1.0
  > Niagara support
  > Up to 32 LDOMs per system, guest domain may be rebooted independently
  > Virtualized console, ethernet, disk & cryptographic acceleration
  > Live re-configuration of virtual CPUs
  > FMA diagnosis for each domain
  > Control domain hardening
  * Requiring new Solaris 10 update
• LDoms 1.0.1 - CURRENT
  – Niagara2 support
  – I/O domain reboot support
  – Control domain minimization
  – SNMP MIB
  – Web management tool (freeware/unsupported)
References for further information
• http://www.sun.com/ldoms
• Sun Blueprints relating to LDOMs
  – http://www.sun.com/blueprints/0207/820-0832.html
  – http://www.sun.com/blueprints/0807/820-3023.html
• SDLC Release of LDOMs
  – http://www.sun.com/download/products.xml?id=46e5ba66
• Official Documentation for the SDLC release
  – http://www.sun.com/servers/coolthreads/ldoms/get.jsp
• LDOMs Blogs
  – http://blogs.sun.com/hlsu/entry/logincal_domains_1_0_1
• OpenSolaris LDOMs community
  – http://www.opensolaris.org/os/community/ldoms/
Domains
Overview
Long-standing Sun server feature
E10Ks and all servers since then
Hard partition of system resources (bus, CPU, memory, I/O)
Options vary depending on hardware (how many domains, CPUs per domain)
Sometimes used in conjunction with Dynamic Reconfiguration (DR)
Controlled via firmware commands (XSCF on M-servers)
Prep Work
Do this before installing Solaris / moving to production
Determine number of domains, resources per domain (CPU, memory, I/O)
Make sure I/O is redundant between allocation units (so for example a system board can be taken out of service without disabling I/O to a device)
PCI cards must support DR (per device)
Leave “kernel cage memory” enabled to minimize the number of system boards that kernel memory is allocated to
Enabled by default in S10 (but costs a little performance)
Disable via set kernel_cage_enable=0 in /etc/system
Prep Work (cont)
M-Servers
Server model   Max system boards   Max domains
M9000+EU       16                  24
M9000          8                   24
M8000          4                   16
M5000          2                   4
M4000          1                   2
Implementation
For M-servers, see http://docs.sun.com/source/819-3601-13
The XSCF commands setupfru, showfru, setdcl, addboard, showdcl, and showboards configure resources into domains
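As a rough illustration of how those commands fit together, the sketch below prints a command sequence for putting one system board into one domain. The option spellings are recalled from the XSCF reference manual and should be checked against the documentation for your firmware level; the helper name xscf_domain_plan and every argument value are hypothetical. It prints rather than runs the commands because they are typed at the XSCF> prompt on the service processor, not in Solaris.

```shell
#!/bin/sh
# Print (do not run) an XSCF command sequence that places one system board
# into one domain. All argument values are hypothetical examples.
xscf_domain_plan() {
    sb=$1       # physical system board number, e.g. 0
    domain=$2   # domain ID, e.g. 0
    lsb=$3      # logical system board number inside the domain, e.g. 0
    xsb=$4      # extended system board name, e.g. 00-0
    echo "setupfru -x 1 sb $sb"                # Uni-XSB mode: the whole board is one XSB
    echo "setdcl -d $domain -a $lsb=$xsb"      # map the LSB to the XSB in the domain component list
    echo "addboard -c assign -d $domain $xsb"  # assign the board to the domain
    echo "showdcl -d $domain"                  # review the domain component list
    echo "showboards -a"                       # review the status of all boards
}

# Example: board 0 as XSB 00-0, mapped to LSB 0 of domain 0.
xscf_domain_plan 0 0 0 00-0
```

Generating the sequence as text makes it easy to review the plan with a colleague before anyone touches the service processor.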
DR
Can install, remove, add, delete, move, register, configure, unconfigure, etc. system boards and I/O devices while Solaris remains running
A system board is in one domain at a time
Move resources as needed between domains
Movement can be automated or manual
Good details in http://docs.sun.com/source/819-5992-12
Implementation
XSCF used to configure DR
Shell and Web interfaces
Adds to the domain command set: showdcl, setdcl, addboard, deleteboard, moveboard, showdomainstatus
cfgadm and cfgadm_pci configure DR on I/O devices
Be sure to configure and implement all of this before going into production - don’t plan on adding a domain to a production system without practice and experience
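One way to get that practice safely is a dry-run wrapper around the Solaris-side cfgadm steps. The sketch below assumes a hot-pluggable PCI slot; the function names and the attachment point ID in the example are hypothetical, and real IDs come from "cfgadm -a" on the system in question.

```shell
#!/bin/sh
# Dry-run wrapper: print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Take a PCI slot out of service from within Solaris.
# $1 is an attachment point ID as listed by "cfgadm -a".
pci_slot_offline() {
    run cfgadm -c unconfigure "$1"   # detach the device from Solaris
    run cfgadm -c disconnect "$1"    # disconnect the slot
}

# Bring the slot back into service.
pci_slot_online() {
    run cfgadm -c connect "$1"       # reconnect the slot
    run cfgadm -c configure "$1"     # make the device available to Solaris again
}
```

With DRY_RUN unset, pci_slot_offline 'iou#0-pci#1' only prints the two cfgadm commands; setting DRY_RUN=0 runs them for real.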
xVM Virtualbox
Overview
Sun has a suite of xVM products
xVM Ops Center - patching for x86 and SPARC (Linux too) plus provisioning
xVM VirtualBox - desktop virtualization for x86
xVM Server (aka Xen) - hypervisor-like virtualization for x86
Virtualbox
Open source (GPL) virtualization environment for x86 (and closed source commercial version)
(Sun bought the independent developer)
Completes Sun’s virtualization picture by adding desktop / workstation virtualization tool
Competes with VMware Workstation, Parallels, Fusion
Platform Support
Runs on Windows, Linux, MacOS X, and OpenSolaris
Guest support is extensive, including Windows (NT 4.0, 98, 2000, XP, Server 2003, Vista), DOS/Windows 3.x, Linux (2.4 and 2.6), OpenBSD, Solaris, OpenSolaris
Full list at http://www.virtualbox.org/wiki/Guest_OSes
Features
Modular design
Active community
VM descriptions in XML
Guest tools to add functionality to some guests
Shared folders
Multiple snapshots of VM states
Supports VT-x and AMD-V (enable per-VM)
Seamless windows with Windows, Linux, and Solaris guests
Import of guest VMs in VMDK format
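Two of the features above, per-VM hardware virtualization and snapshots, can also be driven from the VBoxManage command line. The sketch below only prints the commands. The VM and snapshot names are hypothetical, and the double-dash option spelling is an assumption: releases current when these slides were written may expect the same option with a single dash, so check VBoxManage's built-in help first.

```shell
#!/bin/sh
# Print VBoxManage commands that turn on hardware virtualization (VT-x/AMD-V)
# for one VM and then snapshot it. Names passed in are hypothetical examples.
vbox_plan() {
    vm=$1     # VM name as registered in VirtualBox
    snap=$2   # name to give the new snapshot
    echo "VBoxManage modifyvm \"$vm\" --hwvirtex on"    # per-VM VT-x/AMD-V switch
    echo "VBoxManage snapshot \"$vm\" take \"$snap\""   # capture the current VM state
}
```

For example, vbox_plan sol10 pre-patch prints the two commands for a VM named sol10.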
Closed-source Features
Virtual USB controller
Remote Desktop Protocol (RDP) server support
Can connect to Virtualbox client from other systems, thin clients
USB over RDP works - guest can access local resources while displaying remotely
iSCSI initiator (can use iSCSI targets as virtual disks)
SATA controller (faster and less overhead than IDE)
xVM Server
xVM Server
Solaris-based bare-metal hypervisor based on Xen
Complete VM management
Goal is to be similar to VMware ESX
Brand-new
Server itself is open source, is free to try
xVM Infrastructure Enterprise - multinode management of VMs
xVM Infrastructure Datacenter - multinode management of physical servers and physical and virtual nodes
Features
Microsoft Windows Server 2003 and 2008, Red Hat 4.6 / 5.2, Solaris, and OpenSolaris guests
Live migration
Guest cloning / templating
xVM Ops Center integration
Java-based KVM access to guest OS consoles
Management is browser-based
VMDK-formatted guest OSes supported
Paravirtualized device drivers
NAS / CIFS storage support
Least privilege security model of services, management
DTrace integration (just how much?)
ZFS supported (guest OS file systems)
Implementation
TBD
References
You Are Now Free to Move About Solaris
References
[Kozierok] TCP/IP Guide, No Starch Press, 2005
[Nemeth] Nemeth et al, Unix System Administration Handbook, 3rd edition, Prentice Hall, 2001
[SunFlash] The SunFlash announcement mailing list run by John J. Mclaughlin. News and a whole lot more. Mail [email protected]
Sun online documents at docs.sun.com
[Kasper] Kasper and McClellan, Automating Solaris Installations, SunSoft Press, 1995
References (continued)
[O’Reilly] Networking CD Bookshelf, Version 2.0, O’Reilly 2002
[McDougall] Richard McDougall et al, Resource Management, Prentice Hall, 1999 (and other "Blueprint" books)
[Stern] Stern, Eisler, Labiaga, Managing NFS and NIS, 2nd Edition, O’Reilly and Associates, 2001
References (continued)
[Garfinkel and Spafford] Simson Garfinkel and Gene Spafford, Practical Unix & Internet Security, 3rd Ed, O’Reilly & Associates, Inc, 2003 (Best overall Unix security book)
[McDougall, Mauro, Gregg] McDougall, Mauro, and Gregg, Solaris Internals and Solaris Performance and Tools, 2007 (great Solaris internals, DTrace, mdb books)
References (continued)
Subscribe to the Firewalls mailing list by sending "subscribe firewalls <mailing-address>" to [email protected]
USENIX membership and conferences. Contact USENIX office at (714)588-8649 or [email protected]
Sun Support: Sun’s technical bulletins, plus access to bug database: sunsolve.sun.com
Solaris 2 FAQ by Casper Dik: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/Solaris2/FAQ
References (continued)
Sun Managers Mailing List FAQ by John DiMarco: ftp://ra.mcs.anl.gov/sun-managers/faq
Sun's unsupported tool site (IPv6, printing): http://playground.sun.com/
Sunsolve STBs and Infodocs: http://www.sunsolve.com
References (continued)
comp.sys.sun.* FAQ by Rob Montjoy: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/comp-sys-sun-faq
“Cache File System” White Paper from Sun: http://www.sun.com/sunsoft/Products/Solaris-whitepapers/Solaris-whitepapers.html
“File System Organization, The Art of Automounting” by Sun: ftp://sunsite.unc.edu/pub/sun-info/white-papers/TheArtofAutomounting-1.4.ps
Solaris 2 Security FAQ by Peter Baer Galvin: http://www.sunworld.com/common/security-faq.html
Secure Unix Programming FAQ by Peter Baer Galvin: http://www.sunworld.com/swol-08-1998/swol-08-security.html
References (continued)
Firewalls mailing list FAQ: ftp://rtfm.mit.edu/pub/usenet-by-group/Comp.answers/firewalls-faq
There are a few Solaris-helping files available via anon ftp at ftp://ftp.cs.toronto.edu/pub/darwin/solaris2
Peter’s Solaris Corner at SysAdmin Magazine: http://www.samag.com/solaris
Marcus and Stern, Blueprints for High Availability, Wiley, 2000
Privilege Bracketing in Solaris 10: http://www.sun.com/blueprints/0406/819-6320.pdf
References (continued)
Peter Baer Galvin's Sysadmin Column (and old Pete's Wicked World security columns, etc): http://www.galvin.info
My blog at http://pbgalvin.wordpress.com
Operating Environments: Solaris 8 Operating Environment Installation and Boot Disk Layout by Richard Elling (March 2000): http://www.sun.com/blueprints
Sun’s BigAdmin web site, including Solaris and Solaris X86 tools and information: http://www.sun.com/bigadmin
References (continued)
DTrace
http://users.tpg.com.au/adsln4yb/dtrace.html
http://www.solarisinternals.com/si/dtrace/index.php
http://www.sun.com/bigadmin/content/dtrace/