CE: compute element TP: CE & WN Compute Element Worker Node Installation configuration

Jan 21, 2016

Albert Dorsey
Page 1: CE: compute element TP: CE & WN Compute Element Worker Node Installation configuration.

CE: compute element

TP: CE & WN (Compute Element, Worker Node)

Installation and configuration

CE presentation

The Computing Element is the central service of a site.

• Its main functions are:

– manage jobs (job submission, job control)

– report job status to the WMS

– publish site information (site, queues, total and free CPUs)

• It can run several kinds of batch system:

– Torque + MAUI

– LSF

– Condor

TORQUE server presentation

• The Torque server consists of:

– pbs_server, which provides the basic batch services such as receiving and creating batch jobs.

• The Torque client consists of:

– pbs_mom, which places the job into execution. It is also responsible for returning the job’s output to the user.

• The MAUI system consists of:

– job_scheduler, which applies the site's policy to decide which job must be executed.

CE: site-info.def variables (1)

Main variables of the site configuration file for the CE:

CE_HOST=ce1.$MY_DOMAIN

# Jobmanager specific settings

JOB_MANAGER=lcgpbs

CE_BATCH_SYS=torque

BATCH_BIN_DIR=/usr/bin

BATCH_VERSION=torque-1.0.1b

BATCH_LOG_DIR=/var/spool/pbs/server_priv/accounting

# Architecture and environment specific settings

CE_CPU_MODEL=PIV

CE_CPU_VENDOR=intel

CE_CPU_SPEED=1001

CE_OS="Scientific Linux SL"

CE_OS_RELEASE="SL"

CE_OS_VERSION=3.0.5

CE_MINPHYSMEM=1024

CE : site-info.def variables (2)

CE_MINVIRTMEM=2048

CE_SMPSIZE=1

CE_SI00=381

CE_SF00=0

CE_OUTBOUNDIP=TRUE

CE_INBOUNDIP=FALSE

CE_RUNTIMEENV=" LCG-2 LCG-2_1_0 … GLITE-3_0_0 R-GMA "

# TORQUE - Change this if your torque server is not on the CE

TORQUE_SERVER=$CE_HOST

Worker Node list defined for the site “private.gridprototype”:

WN_LIST=/opt/glite/yaim/travail/wn-list.conf

Contents of wn-list.conf (one host name per line):

ce1.private.gridprototype

se1.private.gridprototype
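
Because site-info.def is a list of shell variable assignments, it can be checked before running YAIM. The sketch below sources the file in a subshell and reports any variable that is unset or empty. The function name check_site_info is hypothetical, and the list of variables to check is whatever the caller passes in; it is not an official YAIM tool.

```shell
# Hypothetical helper (not part of YAIM): report which of the listed
# variables are unset or empty after sourcing a site-info.def style file.
# Usage: check_site_info FILE VAR [VAR...]
check_site_info() {
    file=$1
    shift
    (
        # Source in a subshell so the caller's environment is untouched.
        . "$file" || exit 2
        rc=0
        for name in "$@"; do
            # Indirect lookup of the variable whose name is in $name.
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "missing: $name"
                rc=1
            fi
        done
        exit "$rc"
    )
}
```

For example, `check_site_info ./site-info.def CE_HOST JOB_MANAGER CE_BATCH_SYS TORQUE_SERVER WN_LIST` prints one "missing:" line per undefined variable and returns non-zero if any is missing.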

WN: worker node & Torque client presentation

• The Torque client consists of:

– pbs_mom, which places the job into execution. It is also responsible for returning the job’s output to the user.

• The Worker Node is a service where the jobs run. Its main functions are:

– execute the jobs

– report job status to the Computing Element

• It can run several kinds of batch system client:

– Torque

– LSF

CE certification:

Certificates are installed in the /etc/grid-security directory on the CE.

Get certificates from the BEINGRID CA Certification Authority:

http://voms.beingrid.fr.cgg.com/ca/

Back up the certificate as a <host>.p12 file and extract the public certificate and the private key:

openssl pkcs12 -clcerts -nokeys -in ce1.p12 -out ce1….cert

openssl pkcs12 -nocerts -nodes -in ce1.p12 -out ce1….key

For the CE1 machine, the certificate files are named:

ce1.private.gridprototype.crt

ce1.private.gridprototype.key

Link them to the host certificate names and set their permissions:

cd /etc/grid-security/

ln -s ce1.private.gridprototype.crt hostcert.pem

ln -s ce1.private.gridprototype.key hostkey.pem

chmod 644 hostcert.pem

chmod 400 hostkey.pem
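
A wrong mode on the host key is an easy mistake to make at this step, so it can help to verify the result. The following is a minimal sketch, assuming GNU stat (the -L and -c options); the function name check_host_cert_modes is hypothetical. It follows the symbolic links and compares the modes with the 644 and 400 values used on this slide.

```shell
# Hypothetical helper: verify the modes expected for the host certificate
# and key. Assumes GNU stat (-L follows symlinks, -c '%a' prints the octal mode).
# Usage: check_host_cert_modes [DIR]   (DIR defaults to /etc/grid-security)
check_host_cert_modes() {
    dir=${1:-/etc/grid-security}
    rc=0
    for pair in hostcert.pem:644 hostkey.pem:400; do
        f=$dir/${pair%%:*}      # file name part before the colon
        want=${pair##*:}        # expected mode after the colon
        if [ ! -e "$f" ]; then
            echo "$f: not found"
            rc=1
            continue
        fi
        got=$(stat -L -c '%a' "$f")
        if [ "$got" != "$want" ]; then
            echo "$f: mode $got, expected $want"
            rc=1
        fi
    done
    return "$rc"
}
```

Called without arguments, it checks /etc/grid-security and prints nothing when both files have the expected modes.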

List of mandatory configuration files:

the WN list defined for the site “private.gridprototype”:

WN_LIST=/opt/glite/yaim/travail/wn-list.conf

the mapped-users list defined for the site “private.gridprototype”:

/opt/glite/yaim/travail/users.conf

the mapped-groups list defined for the site “private.gridprototype”:

/opt/glite/yaim/travail/groups.conf
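
Since all three files are mandatory, a quick presence check before running YAIM can save a failed configuration run. This is a minimal sketch; the function name check_yaim_files is hypothetical, and it only tests that each file exists and is not empty, not that its contents are valid.

```shell
# Hypothetical helper: check that the three mandatory files exist and are
# non-empty in the given directory. File contents are not validated.
# Usage: check_yaim_files DIR
check_yaim_files() {
    dir=$1
    rc=0
    for f in wn-list.conf users.conf groups.conf; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

On the site described here it would be called as `check_yaim_files /opt/glite/yaim/travail`.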

CE installation and configuration

gLite-yaim generic command:

install_node site-info.def lcg-CE_torque glite-WN

The CE is a certified machine: install its certificates in the directory /etc/grid-security/ before configuring the node.

configure_node site-info.def CE_torque WN_torque BDII_site
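
The order of the steps on this slide (install, certificates, configure) can be captured in a small wrapper. This is only a sketch under stated assumptions: install_node and configure_node are assumed to be on the PATH (on many installations they must be called with their full path), and the function name run_ce_setup and the DRY_RUN variable are hypothetical. With DRY_RUN set, the commands are printed instead of executed.

```shell
# Hypothetical wrapper around the two YAIM commands shown above.
# Assumption: install_node and configure_node are on PATH.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
# Usage: run_ce_setup SITE_INFO [CERT_DIR]
run_ce_setup() {
    site_info=$1
    cert_dir=${2:-/etc/grid-security}
    run=${DRY_RUN:+echo}    # "echo" in dry-run mode, empty otherwise

    $run install_node "$site_info" lcg-CE_torque glite-WN || return 1

    # The host certificate and key must be in place before configuration.
    for f in hostcert.pem hostkey.pem; do
        if [ ! -e "$cert_dir/$f" ]; then
            echo "missing $cert_dir/$f; install host certificates first" >&2
            return 1
        fi
    done

    $run configure_node "$site_info" CE_torque WN_torque BDII_site
}
```

Running it once with DRY_RUN=1 shows the exact command lines before anything is changed on the machine.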

CE publication test

The CE should publish information to the BDII:

lcg-infosites --vo egeode ce

valor del bdii: rb1.private.gridprototype:2170

#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
   2     2           0        0        0  ce1.private.gridprototype:2119/jobmanager-lcgpbs-egeode

The CE should publish the status of the job queues. When the command below is run locally as the egeode005 user, the node list should match the WN list defined in /opt/glite/…/wn-list.conf:

pbsnodes -a

se1.private.gridprototype
     state = free
     np = 1
     properties = lcgpro
     ntype = cluster
     etc…

ce1.private.gridprototype
     state = free
     np = 1
     properties = lcgpro
     ntype = cluster
     etc…
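
The comparison between the pbsnodes output and the WN list can be automated. The sketch below is an illustration, not a Torque tool: the function name missing_nodes is hypothetical, and it assumes that node names are the only lines of the pbsnodes -a output that start in the first column, with attributes indented beneath them.

```shell
# Hypothetical helper: list hosts from the WN list that pbsnodes does not report.
# The output of "pbsnodes -a" is read from stdin. Assumption: node names are
# the only non-empty lines that start in column one.
# Usage: pbsnodes -a | missing_nodes WN_LIST_FILE
missing_nodes() {
    wn_list=$1
    # Keep only the unindented, non-empty lines (the node names).
    reported=$(grep -v '^[[:space:]]' | grep -v '^$')
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # -x: whole-line match, -F: fixed string (dots are not wildcards).
        printf '%s\n' "$reported" | grep -qxF "$host" || echo "$host"
    done < "$wn_list"
}
```

An empty result from `pbsnodes -a | missing_nodes /opt/glite/yaim/travail/wn-list.conf` means every host in the WN list is known to the Torque server.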

Local job submission on the CE

To submit jobs locally, the user must be logged in as the mapped user egeode005 on the newly installed CE machine.

cat test.sh

#!/bin/sh

/bin/hostname

/bin/sleep 300

qsub -q egeode test.sh

35.ce1.private.gridprototype

qstat -a

ce1.private.gridprototype:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
35.ce1.private. egeode00 egeode   test.sh     11239  --  --     -- 48:00 R
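
To follow a submitted job from a script, the state letter can be picked out of the qstat -a listing. This is a minimal sketch; the function name job_state is hypothetical. It assumes the column layout shown above, where the state is the tenth whitespace-separated field, and it matches on the numeric part of the job ID because qstat -a truncates the ID column.

```shell
# Hypothetical helper: print the state letter of one job from "qstat -a" output
# read on stdin. Assumptions: the state is the 10th whitespace-separated field
# (job names without spaces), and the job ID starts with "<number>.".
# Usage: qstat -a | job_state JOB_ID
job_state() {
    num=${1%%.*}    # numeric part, e.g. 35 from 35.ce1.private.gridprototype
    awk -v num="$num" 'index($1, num ".") == 1 { print $10 }'
}
```

For the job above, `qstat -a | job_state 35.ce1.private.gridprototype` would print R while the job is running.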

UI/GUI JAVA graphical interface

commands : edj-wl-ui-jobmonitor.sh edj-wl-ui-jdleditor.sh …

CE Torque/Maui documentation

TORQUE ADMIN GUIDE http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki

MAUI ADMIN GUIDE http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml


Questions on the CE ?