© 2007 IBM Corporation
IBM Global Engineering Solutions
IBM Blue Gene/P
Blue Gene Bring Up
IBM Blue Gene/P System Administration
Linux on Service Node
SuSE SLES 10
A RAID array is recommended, typically RAID1 or RAID5 depending on the number of disks available.
Use either one or two volume groups (rootvg and datavg), depending on the disk configuration.
Partitions
  Mount point   Size     Logical volume
  /             1 GB     rootlv
  /usr          3 GB     usrlv
  /var          2 GB     varlv
  /opt          10 GB    optlv
  /tmp          10 GB    tmplv
  swap          4 GB     swaplv
  /dbhome       20 GB    dbhomelv
  /bgsys        10 GB    bgsyslv
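When sizing the RAID array, it can help to total the partition layout. The sketch below only adds up the sizes listed on this slide; the variable names are illustrative, and it makes no allowance for a second volume group or for spare capacity.

```python
# Partition sizes in GB, copied from the layout above.
PARTITIONS_GB = {
    "/": 1,
    "/usr": 3,
    "/var": 2,
    "/opt": 10,
    "/tmp": 10,
    "swap": 4,
    "/dbhome": 20,
    "/bgsys": 10,
}

# Minimum space the logical volumes need across the volume group(s).
total_gb = sum(PARTITIONS_GB.values())
print(f"Minimum space for the listed logical volumes: {total_gb} GB")
```

The listed logical volumes add up to 60 GB, so the array needs at least that much usable space after RAID overhead.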
RPMs
  cpp, gcc, libgcc, gcc-c++, gcc-64bit, glibc-devel, libgcc-64bit, bison, texinfo, flex, termcap, termcap-64bit, gcc-fortran, gmp, gmp-64bit, gmp-devel, gmp-devel-64bit, ncurses-devel, ncurses-devel-64bit
  vacpp.rte-8.0.1-2.ppc64.rpm, xlsmp.rte-1.6.1-3.ppc64.rpm, xlsmp.msg.rte-1.6.1-3.ppc64.rpm
  bgp_os, bgp_base, bgptoolchain
Interfaces
  Functional network
  Service network
  Public network
Groups
  db2rasdb
  db2iadm1
  db2fadm1
  db2asgrp
  bgpuser
  bgpdeveloper
  bgpadmin
  bgpservice
Users
  bgpsysdb
  bgpdb2c
  bgpadmin
  mpirun
NFS
  I/O nodes mount /bgsys to finish their boot process, so /bgsys is exported on the functional network via NFS.
Front End Node
Groups
  bgpadmin
  bgpservice
  bgpdeveloper
  bgpuser
Users
  mpirun
Profile
  /etc/profile.d/bgp.sh
Group Roles (set using bguser.pl)
Role        Capability
user        Submit jobs via mpirun
            Read access to a small amount of data (job/block status) on the Service Node, via Navigator
            Access to the Front End Nodes
            Complete access to compilers, tool chain, etc. for development on the Front End Nodes
developer   Submit jobs via mpirun
            Read access to some data (job/block status) on the Service Node, via Navigator
            Controlled and limited access to the Service Node: requires a userid on the SN; no root access, but elevated privileges
            Complete access to compilers, tool chain, etc. for development on the Front End Nodes
            Debugs with coreprocessor
admin       Complete access to Blue Gene/P functions on the Service Node and Front End Node(s)
service     Access to required debug tools and system logs; read access to the database
DB2 Structure
DB2 - Why use a Database?
Need a software representation of the hardware
A machine of such large scale requires a persistent means of storing errors (RAS events), job history, block definitions, environmental readings, etc.
Operational state of the machine can be obtained without touching the hardware
Other Benefits of a Database
Setting values in the database can trigger actions in other components
Can simplify the design by storing policy in the database itself, via procedures, triggers, and constraints, instead of in the code
Information can be obtained using existing tools or SQL
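To illustrate how policy can live in the database instead of in application code, here is a minimal sketch using SQLite (from the Python standard library) as a stand-in for DB2. The demo_* tables, the status codes, and the location value are hypothetical and are not the Blue Gene/P schema. A CHECK constraint rejects invalid status values, and a trigger records every status change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE demo_nodecard (
    location TEXT PRIMARY KEY,
    status   TEXT NOT NULL CHECK (status IN ('A', 'M', 'E'))
);
CREATE TABLE demo_nodecard_history (
    location   TEXT,
    old_status TEXT,
    new_status TEXT
);
-- Setting a value triggers an action: each status change is logged.
CREATE TRIGGER demo_status_change
AFTER UPDATE OF status ON demo_nodecard
BEGIN
    INSERT INTO demo_nodecard_history
    VALUES (OLD.location, OLD.status, NEW.status);
END;
""")

conn.execute("INSERT INTO demo_nodecard VALUES ('R00-M0-N00', 'A')")
conn.execute("UPDATE demo_nodecard SET status = 'M' WHERE location = 'R00-M0-N00'")
history = conn.execute("SELECT * FROM demo_nodecard_history").fetchall()

# The constraint, not the calling code, enforces which values are legal.
try:
    conn.execute("UPDATE demo_nodecard SET status = 'bogus' WHERE location = 'R00-M0-N00'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(history, rejected)
```

Neither rule lives in the calling code, so any tool that updates the table gets the same behavior.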
DB2
Product Description
  Restricted license
  Enterprise Server Edition (ESE)
  Client
Database Location
  /dbhome/bgpsysdb
Instances
  bgpsysdb (server)
  bgpdb2c (client)
DB2 concepts
Schema
  The collection of database objects such as tables, views, indexes, and triggers that define the database.
Tables
  A named data object that consists of a specific number of columns and some unordered rows.
Views
  A logical table that consists of data that a query generates.
DB2 Naming Guidelines for BG/P
Tables always start with TBGP, such as TBGPNodeCard or TBGPLinkCard
  Names are NOT case sensitive in SQL
For each of the tables, there is a view that has the more user-friendly columns, such as location, and omits the VPD
  These are named without the T, such as BGPNodeCard
In cases where some information is omitted from the view, there is also an extra view for diags, such as BGPNodeCardAll
If there is no need for any derived columns in the view, or omitted columns, then an alias is created, e.g. BGPClockCard
The net effect is that almost all the time, using the "BGP" name will show you what you want
If there is a history being kept, then _history is added to the end
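The naming rules can be expressed as a small helper. This is only a sketch of the convention described here: the function name is hypothetical, and not every table has a diag view or a history table, so the derived names are candidates to check against the database rather than guaranteed objects.

```python
def related_names(table_name):
    """Derive the names implied by the BG/P naming guidelines."""
    if not table_name.upper().startswith("TBGP"):
        raise ValueError(f"{table_name!r} does not start with TBGP")
    base = table_name[1:]  # the view name drops the leading T
    return {
        "table": table_name,
        "view": base,                        # user-friendly view (or alias)
        "diag_view": base + "All",           # extra view for diags, if one exists
        "history": table_name + "_history",  # history table, if history is kept
    }

names = related_names("TBGPNodeCard")
for kind, name in names.items():
    print(f"{kind:10} {name}")
```

For TBGPNodeCard this yields BGPNodeCard, BGPNodeCardAll, and TBGPNodeCard_history.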
BG/P Tables
TBGPBlock TBGPBPBlockMap TBGPSmallBlock TBGPLinkBlockMap TBGPProductType TBGPMachine TBGPMachineSubnet TBGPMidplane TBGPNodeCard TBGPNode TBGPServiceCard TBGPLinkCard TBGPClockCard TBGPBulkPowerSupply TBGPSwitch TBGPCable TBGPClockCable TBGPLinkChip TBGPICON TBGPFanModule TBGPJob TBGPEthGateway TBGPEGWMachineMap TBGPPortBlockMap TBGPBlockUsers TBGPMidplaneSubnet TBGPNodeSubnet TBGPServiceAction TBGPUserPrefs
TBGPReplacement_history TBGPMachine_history TBGPMidplane_history TBGPNodeCard_history TBGPNode_history TBGPServiceCard_history TBGPLinkCard_history TBGPClockCard_history TBGPLinkChip_history TBGPIcon_history TBGPFanModule_history TBGPJob_history TBGPServiceCardEnvironment TBGPFanEnvironment TBGPClockCardEnvironment TBGPBULKPOWEREnvironment TBGPNodeCardPOWEREnvironment TBGPLinkCardPOWEREnvironment TBGPSrvcCardPOWEREnvironment TBGPLinkChipEnvironment TBGPLinkCardEnvironment TBGPNodeEnvironment TBGPNodeCardEnvironment TBGPEventLog TBGPERRCodes TBGPDiagRuns TBGPDiagBlocks TBGPDiagResults TBGPDiagTests
BG/P Views
BGPMidplane BGPMidplaneAll BGPNodeCard BGPNodeCardAll BGPNode BGPNodeAll BGPServiceCard BGPServiceCardAll BGPLinkCard BGPLinkCardAll BGPClockCardAll BGPBulkPowerSupplyAll BGPLinkChip BGPLinkChipAll BGPFanModule BGPFanModuleAll BGPLink BGPClockCardEnvironment BGPDiagTests
BGPNodeCardCount BGPLinkCardCount BGPServiceCardCount BGPNodeCount BGPBasePartition BGPBPBlockStatus BGPSwitchLinks BGPLinkBlockStatus BGPSwitchPort BGPPortBlockStatus BGPBlockSize
Database setup
Database Populate
  A Perl script that populates the database with the expected configuration for the Blue Gene system.
InstallServiceAction
  Verifies that the predefined structure matches the actual configuration.
VerifyCables
  Confirms that the torus network cabling is correct.
VerifyIpAddresses
  Confirms that the I/O card IP addresses are correct.
DB2/SQL examples
List all tables/views
  list tables
Describe table/view
  describe table TBGPmidplane
Extracting data
  select * from TBGPmidplane
More complex
  select a.position, count(isionode), a.status, a.seqid
  from tbgpnodecard a left outer join bgpnode b
  on b.midplanepos = a.midplanepos and b.nodecardpos = a.position and b.isionode = 'T' and b.status <> 'M'
  where a.midplanepos = 'R00-M0'
  group by a.position, a.status, a.seqid
  order by 1
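The more complex query counts I/O nodes per node card. Because the filters on isionode and status sit in the ON clause of a left outer join, node cards with no matching I/O nodes still appear, with a count of 0. The sketch below reproduces that behavior in SQLite with made-up rows. The two tables are cut down to the columns the query uses, bgpnode is created as a plain table here, and the position and status values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins with made-up data, not the real schema.
CREATE TABLE tbgpnodecard (midplanepos TEXT, position TEXT, status TEXT, seqid INTEGER);
CREATE TABLE bgpnode (midplanepos TEXT, nodecardpos TEXT, isionode TEXT, status TEXT);
INSERT INTO tbgpnodecard VALUES ('R00-M0', 'N00', 'A', 1), ('R00-M0', 'N01', 'A', 2);
INSERT INTO bgpnode VALUES
    ('R00-M0', 'N00', 'T', 'A'),  -- I/O node, counted
    ('R00-M0', 'N00', 'T', 'M'),  -- I/O node with status M, excluded
    ('R00-M0', 'N00', 'F', 'A'),  -- not an I/O node, excluded
    ('R00-M0', 'N01', 'F', 'A');  -- N01 has no I/O nodes at all
""")

rows = conn.execute("""
    SELECT a.position, COUNT(b.isionode), a.status, a.seqid
    FROM tbgpnodecard a LEFT OUTER JOIN bgpnode b
      ON b.midplanepos = a.midplanepos AND b.nodecardpos = a.position
     AND b.isionode = 'T' AND b.status <> 'M'
    WHERE a.midplanepos = 'R00-M0'
    GROUP BY a.position, a.status, a.seqid
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

N01 is still listed, with a count of 0. Moving the isionode and status filters into the WHERE clause would drop that row.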
Exercise
Log on to the service node as bgpadmin
db2 connect to bgdb0 user bgpsysdb
List tables in the database
List the serial numbers of the node cards
List only the compute cards
BGP RPMs
RPMs
  bgp_os
  bgpbase
  bgptoolchain
Directory tree
  /bgsys
  /bgsys/drivers/ppcfloor - symbolic link to the current driver software
  /bgsys/drivers/ppcfloor/bin - binaries
  /bgsys/drivers/ppcfloor/bareMetal - service action scripts
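Since ppcfloor is a symbolic link, a quick way to see which driver is current is to resolve it. The helper below is a hypothetical sketch, not part of the Blue Gene/P software. The demonstration builds a throwaway directory with a made-up driver name so that it runs anywhere; on a service node the argument would be /bgsys/drivers.

```python
import os
import tempfile

def current_driver(drivers_dir):
    """Return the directory that the ppcfloor symbolic link points to."""
    link = os.path.join(drivers_dir, "ppcfloor")
    if not os.path.islink(link):
        raise RuntimeError(f"{link} is not a symbolic link")
    return os.path.realpath(link)

# Demonstration against a temporary tree; "example-driver" is a made-up name.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example-driver")
    os.mkdir(target)
    os.symlink(target, os.path.join(tmp, "ppcfloor"))
    resolved = current_driver(tmp)
    demo_ok = os.path.basename(resolved) == "example-driver"
    print(resolved)
```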
Site Specific Configuration
Templates are located in /bgsys/local/etc
rc scripts
UIDs and GIDs
profiles
  /etc/profile.d/bgp.sh
Shutdown
Run a service action on the clock cards in each rack: tertiary, secondary, primary clock cards
'bgpmaster stop'
Stop db2
Power down rack(s)
Shut down FEN
Shut down service node
Startup
Service node
Front end node
Power up racks
'bgpmaster start'
End service actions on clock cards (primary, secondary, tertiary)
Verify all hardware is seen
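The order matters in both directions, and the clock cards are handled in opposite orders on shutdown and startup. A minimal sketch that records the startup steps as data and checks the clock card ordering is shown below. The step text is taken from the slides, the names are illustrative, and nothing here issues commands to the machine.

```python
# Clock card order as given on the Shutdown and Startup slides.
SHUTDOWN_CLOCK_ORDER = ["tertiary", "secondary", "primary"]
STARTUP_CLOCK_ORDER = ["primary", "secondary", "tertiary"]

STARTUP_STEPS = [
    "Service node",
    "Front end node",
    "Power up racks",
    "bgpmaster start",
    "End service actions on clock cards (primary, secondary, tertiary)",
    "Verify all hardware is seen",
]

def format_checklist(steps):
    """Return the steps as numbered checklist lines."""
    return [f"{number}. {step}" for number, step in enumerate(steps, start=1)]

# Startup handles the clock cards in the reverse of the shutdown order.
clock_orders_mirror = list(reversed(SHUTDOWN_CLOCK_ORDER)) == STARTUP_CLOCK_ORDER

print("\n".join(format_checklist(STARTUP_STEPS)))
print("Clock card orders mirror each other:", clock_orders_mirror)
```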
Unexpected Power Outage
Power off all systems
Power up and boot service node
Power up and boot FEN
Power up rack(s)
'bgpmaster start'
Run install service action
Exercise
Shut down and start up the system
Verify all is well