12th EELA TUTORIAL - USERS AND SYSTEM ADMINISTRATORS
www.eu-eela.org
E-infrastructure shared between Europe and Latin America
CE + WN installation and configuration
Vanessa Hamar
Universidad de Los Andes – Mérida, Venezuela
12th EELA Tutorial, Lima, 24-29 September 2007
Outline
• What is a Computing Element (CE)?
• What is a Torque Server?
• What is a Worker Node?
• How to install and configure a Computing Element with Torque Server.
• How to install and configure a Worker Node with Torque.
What is CE?
• The CE is a service representing a computing resource.
• Its main functionality is job management (job submission, job control, etc.).
• For job submission, the CE can work in:
– push model (where the job is pushed to a CE for its execution).
– pull model (where the CE asks the WMS for jobs).
What is Torque?
• TORQUE (Terascale Open-source Resource and QUEue Manager) is a resource manager providing control over batch jobs and distributed compute resources.
• The Torque system is composed of:
– pbs_server, which provides the basic batch services, such as receiving/creating a batch job or protecting the job against system crashes.
– job_scheduler, which contains the site's policy used to decide which job must be executed.
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
What is a Worker Node?
• The Worker Node (WN) is a set of clients required to run jobs sent by the CE via the Local Resource Management System. It currently includes:
– the gLite I/O Client,
– the Logging and Bookkeeping Client,
– the R-GMA Client and
– the WMS Checkpointing library.
Installing CE + Torque Server
WN + Torque
Preliminary and common steps
• Start from an installation of SLC 3.0.8
• Install the Java SDK
• Remove LAM and Postfix
• Check the hostname
• Install and configure the ntp daemon
• Install the X.509 host certificates in /etc/grid-security and check their file permissions.
• Install the latest version of glite-yaim
• Install the middleware
Installing pre-requisites
• Java is not included in the distribution. Install it separately (version >= 1.4.2_08):
– apt-get install j2sdk
Installing pre-requisites
• Depending on the package set you selected when installing the operating system, the lam package may be installed on your WN. If it is, remove it:
apt-get remove lam
• There is a known installation conflict between the 'torque-clients' rpm and the 'postfix' mail server (Savannah bug #5509). If you are going to install Torque, uninstall the postfix package:
apt-get remove postfix
Installing pre-requisites
• Check the FQDN hostname
– Ensure that the hostnames of your machines are correctly set. Run the command:
hostname -f
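The hostname check can also be scripted. The following is a minimal sketch, not part of the original tutorial: the helper name is_fqdn is hypothetical, and it only tests that the name has at least two non-empty, dot-separated labels. It does not verify that the name resolves in DNS.

```shell
# Hypothetical helper: succeed when the given name looks fully qualified,
# i.e. it has at least two non-empty labels separated by dots.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;  # empty, leading/trailing dot, or empty label
        *.*)           return 0 ;;  # contains a dot between labels
        *)             return 1 ;;  # short name without a domain part
    esac
}

# Example use on the node being installed:
#   is_fqdn "$(hostname -f)" || echo "hostname is not fully qualified" >&2
```

If the check fails, fix the hostname before continuing with the rest of the installation.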
Installing pre-requisites
• Time synchronization among all gLite nodes is mandatory. Install ntp if it is not already available on your system:
– apt-get install ntp
• Add your time server to /etc/ntp.conf:
– restrict <time_server_IP_address> mask 255.255.255.255 nomodify notrap noquery
– server <time_server_name>
– (you can use ntp-1.infn.it, IP 193.206.144.10)
• Edit /etc/ntp/step-tickers, adding the hostname(s) of your time server(s).
• If you are running a firewall, you will have to allow inbound communication on the NTP port:
– -A INPUT -s <NTP-serverIP-1> -p udp --dport 123 -j ACCEPT
• Activate the ntpd service with the following commands:
ntpdate <your ntp server name>
service ntpd start
chkconfig ntpd on
– You can check ntpd's status with:
ntpq -p
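As an illustration of the /etc/ntp.conf step, the sketch below prints the two lines for one time server so they can be reviewed before being appended to the file. The function name ntp_conf_lines is an assumption made for this example. The line contents follow the slide above.

```shell
# Hypothetical helper: print the /etc/ntp.conf lines for one time server.
#   $1 = time server IP address
#   $2 = time server hostname
ntp_conf_lines() {
    printf 'restrict %s mask 255.255.255.255 nomodify notrap noquery\n' "$1"
    printf 'server %s\n' "$2"
}

# Example use (as root), with the server suggested in the slide:
#   ntp_conf_lines 193.206.144.10 ntp-1.infn.it >> /etc/ntp.conf
```

Printing to standard output first makes it easy to inspect the result before changing the configuration file.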
Installing pre-requisites
• Request host certificates for the CE from a CA.
• Copy the host certificate files (hostcert.pem and hostkey.pem) into /etc/grid-security.
• Change the permissions:
– chmod 644 hostcert.pem
– chmod 400 hostkey.pem
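The certificate permissions can be verified after the chmod commands. This is a sketch under stated assumptions: the function names has_mode and check_host_cert_perms are hypothetical, the default directory is /etc/grid-security as named in the preliminary steps, and stat -c is the GNU coreutils syntax found on Linux systems.

```shell
# Hypothetical helper: succeed only if file $1 has exactly octal mode $2.
# Uses GNU stat; a missing file produces an empty mode and therefore fails.
has_mode() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "$2" ]
}

# Check both host certificate files in directory $1
# (assumed default: /etc/grid-security).
check_host_cert_perms() {
    dir="${1:-/etc/grid-security}"
    has_mode "$dir/hostcert.pem" 644 && has_mode "$dir/hostkey.pem" 400
}

# Example use:
#   check_host_cert_perms || echo "fix host certificate permissions" >&2
```

The check is strict on purpose: a private key readable by anyone other than its owner should make it fail.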
Installing CE+Torque Server via apt
• All site configuration values have to be set in a site configuration file using key-value pairs.
• This file is shared among all the different gLite node types, so edit it once and keep it in a safe place.
• Copy the /opt/glite/yaim/examples/site-info.def template (which comes from the glite-yaim-core package) to your reference directory for the installation (e.g. /root/siteinfo):
– cp /opt/glite/yaim/examples/site-info.def /root/siteinfo/site-info.def
• A good syntax test for your site configuration file is to source it manually by running the command:
– source site-info.def
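The syntax test can be wrapped so that it does not change the current shell. The sketch below is an assumption-laden illustration, not part of yaim: check_site_info is a hypothetical name, and it assumes bash is installed. It first parses the file without executing it (bash -n), then sources it in a subshell so that no variables leak into the calling session. Sourcing still executes whatever the file contains.

```shell
# Hypothetical helper: succeed if the file parses as shell code and can be
# sourced without error. Pass a path containing a slash, so that "." does
# not search PATH for the file.
check_site_info() {
    bash -n "$1" || return 1          # parse only, nothing is executed
    ( . "$1" ) >/dev/null 2>&1        # source in a throwaway subshell
}

# Example use:
#   check_site_info /root/siteinfo/site-info.def && echo "site-info.def looks OK"
```

A typical error caught this way is a value with an unterminated quote.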
Installing CE+Torque Server via apt
• The configuration is stored in a directory structure which will be extended in the near future. Currently the following files are used: site-info.def and the vo.d directory.
Installing CE+Torque Server via apt
• The /root/siteinfo/vo.d directory
• Each file name in this directory has to be the lower-cased version of a VO name defined in site-info.def. The matching file should contain the definitions for that VO, and these override the ones defined in site-info.def.
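The naming rule can be expressed as a one-line helper. This is a minimal sketch: the function name vo_file_name is hypothetical, and the VO name in the usage comment is only an example.

```shell
# Hypothetical helper: map a VO name to the lower-cased file name
# expected under the vo.d directory.
vo_file_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Example use, creating an empty per-VO file for an example VO name:
#   touch "/root/siteinfo/vo.d/$(vo_file_name EELA)"
```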