www.epikh.eu The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) CE+WN+siteBDII Installation and configuration Bouchra RAHIM([email protected]) Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators Rabat, 01.06.2011
Outline
• Computing Element overview
• Worker Node overview
• CE CREAM overview
• gLite stack overview
• gLite CE siteBDII
• gLite CE CREAM and WN
gLite stack overview
gLite overview
• User Interface (UI): the point of access for users to gLite grid services
• WMS: the component that optimizes resource usage
• CE: the machine that manages the worker nodes
• WN: the machines that actually execute applications
• SE: the machines where files are stored
• LFC: used to "find" files on the grid
• BDII: the service responsible for publishing all information about your site
• Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job has finished
Computing Element Overview
• The Computing Element provides some of the main services of a site.
• Main functionalities:
– job management (job submission, job control)
– job status updates for the WMS
– communication with the site BDII, which publishes all information regarding the Computing Element
• It can run several kinds of batch system:
– Torque + MAUI
– LSF
– SGE
– Condor
Torque + MAUI
• Torque server service:
– pbs_server provides basic batch services such as receiving/creating a batch job.
• Torque client service:
– pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
• MAUI system service:
– job_scheduler contains the site's policy to decide which job is going to be executed and when.
Site BDII*
• By default it was installed on the CE, but it is now better to install it on a dedicated server, physical or virtual.
• It collects information from all the site GRISes** (for example SE, RB, LFC, etc.)
• The service is named bdii
• Log file: /opt/bdii/var/bdii.log
• *BDII = Berkeley Database Information Index
• **GRIS = Grid Resource Information Service
Worker Node Element Overview
• These are the machines that actually execute your jobs.
• Users can only access their services through a Computing Element.
• Their characteristics are collected by the Computing Element, which publishes all information through the BDII services.
CE CREAM overview
• CREAM: Computing Resource Execution And Management
• It accepts job submission requests coming from a WMS, as well as other job management requests.
• It exposes a web services interface.
Requirements
• Three or more machines:
– One will be used for the CE installation;
– One will be used for the site BDII installation;
– The others will be used for the WN installation;
• Architecture: 64 bit
• Operating System: Scientific Linux 5
• Two machines (CE and BDII) with a public IP address and direct and reverse address resolution on a DNS
• The CE machine must be equipped with an X509 certificate
BDII Installation
Preparing the Linux machine
• Network Time Protocol settings
# yum install ntp
• Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (WinSCP)
• Synchronize the date
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
• Start the ntpd service and configure it to start on boot
# /etc/init.d/ntpd start
# chkconfig ntpd on
YAIM Configuration
• All the sample configuration files are located in the /opt/glite/yaim/examples/siteinfo directory
• It is better to make a copy of the original files
• You can find some template files in: ftp://repo.magrid.ma/pub/CE_WN_BDII/
• Edit the site-info.def file and change the following variables:
– SITE_NAME=MA-ZZ-School (name of the site)
– CE_HOST=pcXX.magrid.ma (XX: the machine that will be a CE)
– SITE_BDII_HOST=pcYY.magrid.ma (the current machine)
• Edit the services/glite-bdii_site file and change the following variables:
– SITE_NAME=MA-ZZ-School
– SITE_DESC="MA-ZZ-School"
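The variable edits above can also be scripted rather than typed by hand. The following is a minimal sketch, not part of the original slides: the helper name set_yaim_var is hypothetical, and the demonstration runs on a scratch file rather than on the real site-info.def. It assumes that values contain neither "|" nor "&", since sed treats both specially here.

```shell
#!/bin/sh
# Hypothetical helper (not part of YAIM): set KEY=VALUE in a siteinfo-style file.
# An existing assignment of KEY is replaced; otherwise a new line is appended.
# Assumption: VALUE contains no "|" or "&" characters.
set_yaim_var() {
    file=$1; key=$2; value=$3
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a scratch file standing in for site-info.def.
demo=$(mktemp)
printf 'SITE_NAME=example\nCE_HOST=ce.example.org\n' > "$demo"
set_yaim_var "$demo" SITE_NAME MA-ZZ-School
set_yaim_var "$demo" CE_HOST pcXX.magrid.ma
set_yaim_var "$demo" SITE_BDII_HOST pcYY.magrid.ma
cat "$demo"
```

On a real machine the first argument would be the copy of site-info.def under your siteinfo directory; keeping the original sample file untouched makes it easy to compare the two afterwards.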
Preparing the Linux machine
• Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
• Stop iptables
# /etc/init.d/iptables stop
# chkconfig iptables off
• Check that you have a valid hostname
# hostname -f
# cat /etc/hosts
• Reboot
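The SELinux and hostname checks can be collected in a small pre-flight script. This is a sketch under stated assumptions: the function names selinux_disabled and is_fqdn are hypothetical, and "valid hostname" is interpreted here simply as "contains at least one dot and no empty labels", which is weaker than a real DNS check.

```shell
#!/bin/sh
# Succeeds if the given SELinux config file has an active SELINUX=disabled line.
selinux_disabled() {
    grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "$1"
}

# Succeeds if the name looks fully qualified (at least one dot, no empty labels).
is_fqdn() {
    case $1 in
        ''|.*|*.|*..*) return 1 ;;
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}

# The config path can be overridden by the first argument, e.g. for testing.
config=${1:-/etc/selinux/config}
if [ -r "$config" ] && selinux_disabled "$config"; then
    echo "SELinux: disabled in $config"
else
    echo "SELinux: not disabled in $config (or file not readable)"
fi

name=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)
if is_fqdn "$name"; then
    echo "hostname: $name looks fully qualified"
else
    echo "hostname: $name is not fully qualified"
fi
```

The script only reports; it changes nothing, so it can be run before and after the reboot to confirm the settings survived.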
Repository set up-CE
• Add the middleware-specific repositories to the system repositories
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
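Before downloading anything, it can help to print the commands the loop would run. The sketch below is a dry run only: it echoes the wget commands instead of executing them. The function name repo_commands is hypothetical. Note that the list is read from a single variable, so the name used in the assignment and in the for line must match.

```shell
#!/bin/sh
# Dry run: print the wget commands instead of executing them.
MREPO=http://repo.magrid.ma/yumrepo/glite32
REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"

# One wget command per repository name in $REPOS.
repo_commands() {
    for name in $REPOS; do
        echo "wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo"
    done
}

repo_commands
```

If the printed list is empty, the loop variable is unset or misspelled, which is worth catching before running the real download as root.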
package installation-CE
• Use yum to install the needed packages
# yum clean all
# yum install lcg-CA ca-policy-egi-core ca-policy-lcg
• Due to a dependency problem within the Tomcat distribution in SL5, first install xml-commons-apis:
# yum install xml-commons-apis
• Then install the CREAM and Torque packages
# yum install glite-CREAM
# yum install glite-TORQUE_server glite-TORQUE_utils
Before configuration-HostCertificates
• Some preliminary steps before configuration:
YAIM configuration-CE
• The main file to edit is site-info.def, where you specify some general settings and the parameters of other components (CREAM CE)
• Other files to be edited are: wn-list.conf, users.conf, groups.conf, services/glite-creamce
• Set the variables to correct values, replacing the example ones.
# vi services/glite-creamce
CEMON_HOST=pcXX.$MY_DOMAIN
CREAM_DB_USER=eumed
CREAM_DB_PASSWORD=grid2011
BLPARSER_HOST=pcXX.$MY_DOMAIN
• Declare the worker nodes in wn-list.conf
# vi wn-list.conf
pcAA.magrid.ma
pcBB.magrid.ma
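A short name slipped into wn-list.conf is easy to miss. The following sketch lists entries that do not look like fully qualified names. The function name bad_wn_entries is hypothetical, and the check is purely syntactic (letters, digits and hyphens separated by dots); it does not query DNS and does not tolerate trailing spaces.

```shell
#!/bin/sh
# Print every non-empty line of a wn-list.conf-style file that does not
# look like a fully qualified host name (label.label[.label...]).
bad_wn_entries() {
    grep -v '^[[:space:]]*$' "$1" | grep -Ev '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$'
}

# Demonstration on a scratch file; pcCC is deliberately not fully qualified.
demo=$(mktemp)
printf 'pcAA.magrid.ma\npcBB.magrid.ma\npcCC\n' > "$demo"
bad_wn_entries "$demo" || true   # grep exits non-zero when nothing is printed
```

An empty result means every entry passed the syntactic check; on a real CE the argument would be the wn-list.conf in your siteinfo directory.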
CE_HOST=pcYY.magrid.ma
CE_CPU_MODEL=XEON #cat /proc/cpuinfo
CE_CPU_VENDOR=Intel
CE_CPU_SPEED=2230
CE_OS=ScientificSL
CE_OS_RELEASE=5.5 #cat /etc/redhat-release
CE_OS_VERSION="Boron"
CE_OS_ARCH=x86_64
CE_MINPHYSMEM=512 #cat /proc/meminfo on WN
CE_MINVIRTMEM=512
CE_PHYSCPU=1 #total cpu in site
CE_LOGCPU=4
CE_SMPSIZE=4
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=FALSE
CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
• For example, if you have an Intel XEON 5520 2.23 GHz with no Hyper-Threading, you will find in the table of the previous link a value of 95 and a conversion factor of 1HS06=40 so:
WN - YAIM Configuration
• You can use the same configuration files edited on the CE:
- this can be done on all the worker nodes of a site;
- so you don't need to re-edit anything!
• Copy the configuration files from the CE machine using the scp command:
# mkdir /opt/glite/yaim/etc/siteinfo/
# mkdir /opt/glite/yaim/etc/siteinfo/services
# Copy the following files from the CE: site-info.def, users.conf, groups.conf and wn-list.conf
# scp root@pcYY:/opt/glite/yaim/etc/siteinfo/site-info.def /opt/glite/yaim/etc/siteinfo/
# Copy the glite-wn file from examples/services
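The four files named above all live in the same directory on the CE, so the copy can be written as a loop. The sketch below only prints the scp commands (a dry run). It assumes the files are copied to the same path on the WN that was created with mkdir, and root@pcYY is the placeholder used on the slides; the function name copy_commands is hypothetical.

```shell
#!/bin/sh
# Dry run: print one scp command per configuration file to fetch from the CE.
CE=root@pcYY                         # placeholder host from the slides
SRC=/opt/glite/yaim/etc/siteinfo     # same directory on CE and WN (assumption)

copy_commands() {
    for f in site-info.def users.conf groups.conf wn-list.conf; do
        echo "scp $CE:$SRC/$f $SRC/"
    done
}

copy_commands
```

Once the printed commands look right, piping them to sh (or removing the echo) performs the actual copy.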
Tests on CE
• SSH to the CE to test whether the CE can see the WNs and whether all the main services are up and running
# pbsnodes
# /etc/init.d/gLite status
• SSH to the CE and then become a eumed pool user:
# su - eumed001
• Create a file and add the following:
$ vi test.sh
#!/bin/sh
sleep 20 #(it's useful to see the job status)
hostname
• Set the right permissions to make it executable:
$ chmod 700 test.sh
• Launch the job locally on the CE
$ qsub -q eumed test.sh
• Then check the list of jobs in execution on the CE
$ qstat -a
ce.localdomain:
                                                            Req'd  Req'd   Elap
Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
---------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
0.pc22.magrid.ma eumed001 short    test.sh      5839  --  --     -- 00:15 R   --
• In case you want to abort a job execution:
$ qdel 3 #that is the job ID
• In case you want more info:
$ qstat -f 3
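When many jobs are queued, the state column is the part of the qstat -a listing that matters most. The sketch below extracts the job ID and state from that listing with awk. The function name job_states is hypothetical, and it assumes that job names contain no spaces, so that the state is always the tenth whitespace-separated field.

```shell
#!/bin/sh
# Read `qstat -a` style output on stdin and print "<job id> <state>" per job.
# Job lines are recognised by a first field that starts with digits and a dot
# (e.g. 0.pc22.magrid.ma); header and separator lines are skipped.
job_states() {
    awk '$1 ~ /^[0-9]+\./ && NF >= 10 { print $1, $10 }'
}

# Sample listing from the slide; on a real CE use: qstat -a | job_states
job_states <<'EOF'
ce.localdomain:
                                                            Req'd  Req'd   Elap
Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
---------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
0.pc22.magrid.ma eumed001 short    test.sh      5839  --  --     -- 00:15 R   --
EOF
```

For the sample listing this prints the single job with state R (running); an empty result corresponds to the case where no jobs are left on the CE.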
• If the "qstat -a" command returns no output, no jobs are being executed on the CE. This means your previous job has terminated, so you can now list its output.
$ voms-proxy-init --voms eumed
Enter GRID pass phrase: [grid2011]
# glite-ce-job-submit -r pc22.magrid.ma:8443/cream-pbs-eumed -o ID hostname-cream.jdl
# glite-ce-job-status -i ID
Troubleshooting
• Which logs should you open if something goes wrong?
– /var/log/messages, for general errors
– /opt/glite/var/log (especially glite-ce-cream.log)
– /var/spool/pbs/server_priv/accounting/<date>, if even local submission to the batch system doesn't work
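A quick way to start is to look at the tail of each log in one pass. The following is a minimal sketch: the helper name show_logs is hypothetical, and the two paths in the example call are the system log and the CREAM log named above. Missing or unreadable files are reported rather than treated as errors.

```shell
#!/bin/sh
# Print the last N lines of each log that exists and is readable;
# report the others as missing instead of failing.
show_logs() {
    lines=$1; shift
    for log in "$@"; do
        if [ -f "$log" ] && [ -r "$log" ]; then
            echo "== $log"
            tail -n "$lines" "$log"
        else
            echo "== $log (missing or not readable)"
        fi
    done
}

show_logs 20 /var/log/messages /opt/glite/var/log/glite-ce-cream.log
```

The Torque accounting file for a given day can be passed as a further argument once you know which date you need.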