Upgrade D0 farm

Upgrade D0 farm. Reasons for upgrade RedHat 7 needed for D0 software New versions of –ups/upd v4_6 –fbsng v1_3f+p2_1 –sam Use of farm for MC and analysis.

Mar 31, 2015


Upgrade D0 farm


Reasons for upgrade

• RedHat 7 needed for D0 software

• New versions of
  – ups/upd v4_6
  – fbsng v1_3f+p2_1
  – sam

• Use of farm for MC and analysis

• Integration in farm network


MC production on farm

• Input: requests

• Request translated in mc_runjob macro

• Stages:

1. mc_runjob on batch server (hoeve)

2. MC job on node

3. SAM store on file server (schuur)


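[Diagram: MC production on the farm. Boxes: farm server, file server, node, SAM DB, datastore (FNAL, SARA). Size labels: 1.2 TB, 40 GB, 100 CPUs. Arrow types: control, data, metadata. Arrow labels: mcc request, mcc input, mcc output, fbs(mcc), fbs(rcp,sam). fbs job sections: 1. mcc, 2. rcp, 3. sam.]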


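[Diagram: MC production on the farm, variant with sam run from cron. Boxes: farm server, file server, node, SAM DB, datastore (FNAL, SARA). Size labels: 1.2 TB, 40 GB, 100 CPUs. Arrow types: control, data, metadata. Arrow labels: mcc request, mcc input, mcc output, fbs(mcc), fbs(rcp[,sam]). fbs job sections: 1. mcc, 2. rcp. cron: sam.]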


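[Diagram: MC job flow across hoeve, node and schuur, with the user that runs each step. Step labels: fbsuser: mc_runjob, fbs submit (twice), fbsuser: cp, fbsuser: mcc, fbsuser: rcp, willem: sam. Arrow types: data, control, cron.]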


SECTION mcc
    EXEC=/d0gstar/curr/minbias-02073214824/batch
    NUMPROC=1
    QUEUE=FastQ
    STDOUT=/d0gstar/curr/minbias-02073214824/stdout
    STDERR=/d0gstar/curr/minbias-02073214824/stdout

SECTION rcp
    EXEC=/d0gstar/curr/minbias-02073214824/batch_rcp
    NUMPROC=1
    QUEUE=IOQ
    DEPEND=done(mcc)
    STDOUT=/d0gstar/curr/minbias-02073214824/stdout_rcp
    STDERR=/d0gstar/curr/minbias-02073214824/stdout_rcp
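The job description above chains two sections: rcp starts only once mcc is done. As a rough illustration, such a file could be generated per job name with a small Python helper. The function name make_jdf and its parameters are hypothetical; only the keywords, queue names and paths come from the slide, and stdout and stderr share one file per section as they do there.

```python
def make_jdf(job, base="/d0gstar/curr"):
    """Return job description text: an mcc section and a dependent rcp section."""
    jobdir = "%s/%s" % (base, job)
    sections = [
        # (section name, executable, queue, dependency, log file)
        ("mcc", "batch", "FastQ", None, "stdout"),
        ("rcp", "batch_rcp", "IOQ", "done(mcc)", "stdout_rcp"),
    ]
    lines = []
    for name, exe, queue, depend, log in sections:
        lines.append("SECTION %s" % name)
        lines.append("EXEC=%s/%s" % (jobdir, exe))
        lines.append("NUMPROC=1")
        lines.append("QUEUE=%s" % queue)
        if depend:
            # The rcp section must wait for the mcc section to finish.
            lines.append("DEPEND=%s" % depend)
        lines.append("STDOUT=%s/%s" % (jobdir, log))
        lines.append("STDERR=%s/%s" % (jobdir, log))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_jdf("minbias-02073214824"))
```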


batch (runs on node)

#!/bin/sh

. /usr/products/etc/setups.sh
cd /d0gstar/mcc/mcc-dist
. mcc_dist_setup.sh

mkdir -p /data/curr/minbias-02073214824
cd /data/curr/minbias-02073214824
cp -r /d0gstar/curr/minbias-02073214824/* .
touch /d0gstar/curr/minbias-02073214824/.`uname -n`
sh minbias-02073214824.sh `pwd` > log
touch /d0gstar/curr/minbias-02073214824/`uname -n`
/d0gstar/bin/check minbias-02073214824

batch_rcp (runs on schuur)

#!/bin/sh
i=minbias-02073214824
if [ -f /d0gstar/curr/$i/OK ];then
  mkdir -p /data/disk2/sam_cache/$i
  cd /data/disk2/sam_cache/$i
  node=`ls /d0gstar/curr/$i/node*`
  node=`basename $node`
  job=`echo $i | awk '{print substr($0,length-8,9)}'`
  rcp -pr $node:/data/dest/d0reco/reco*${job}* .
  rcp -pr $node:/data/dest/reco_analyze/rAtpl*${job}* .
  rcp -pr $node:/data/curr/$i/Metadata/*.params .
  rcp -pr $node:/data/curr/$i/Metadata/*.py .
  rsh -n $node rm -rf /data/curr/$i
  rsh -n $node rm -rf /data/dest/*/*${job}*
  touch /d0gstar/curr/$i/RCP
fi
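The awk expression in batch_rcp takes the last nine characters of the job name and uses them to match output files on the node. A minimal Python equivalent, with the hypothetical function name job_suffix, shows what that yields for the job on this slide:

```python
def job_suffix(job_name):
    """Return the last nine characters, like awk substr($0, length-8, 9)."""
    return job_name[-9:]


if __name__ == "__main__":
    print(job_suffix("minbias-02073214824"))  # prints 073214824
```

The suffix, not the full job name, is what appears in the reco* and rAtpl* file patterns.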


sam script (runs on schuur; called by fbs or cron)

• declare gen, d0g, sim
• store reco, recoanalyze

#!/bin/sh
locate(){
  file=`grep "import =" import_${1}_${job}.py | awk -F \" '{print $2}'`
  sam locate $file | fgrep -q [
  return $?
}
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=hoeve
export SAM_STATION

tosam=$1
LIST=`cat $tosam`

for job in $LIST
do
  cd /data/disk2/sam_cache/${job}
  list='gen d0g sim'
  for i in $list
  do
    until locate $i || (sam declare import_${i}_${job}.py && locate ${i})
    do sleep 60;
    done
  done

  list='reco recoanalyze'
  for i in $list
  do
    sam store --descrip=import_${i}_${job}.py --source=`pwd`
    return=$?
    echo Return code sam store $return
  done
done
echo Job finished ...
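The until loop in the sam script retries once a minute until a file is known to SAM, attempting a declare after each failed locate. The control flow can be sketched in Python with the two SAM calls passed in as plain functions. The names ensure_declared, locate and declare are hypothetical stand-ins, not part of any SAM interface.

```python
import time


def ensure_declared(tier, job, locate, declare, wait=60, sleep=time.sleep):
    """Block until locate() succeeds, trying declare() after each miss.

    Mirrors: until locate || (declare && locate); do sleep 60; done
    Returns the number of sleeps that were needed.
    """
    sleeps = 0
    while True:
        if locate(tier, job):
            return sleeps
        # Not known yet: try to declare it, then check again.
        if declare(tier, job) and locate(tier, job):
            return sleeps
        sleep(wait)
        sleeps += 1
```

Passing the sleep function as a parameter keeps the loop testable without waiting for real time to pass.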


Filestream

• Fetch input from sam

• Read input file from schuur

• Process data on node

• Copy output to schuur


[Diagram: filestream job flow across hoeve, node and schuur. Step labels: mc_runjob, fbs submit (twice), rcp, d0exe, rcp, sam, attach filestream. Arrow types: data, control, cron.]


Analysis on farm

• Stages:
  – Read files from sam
  – Copy files to node(s)
  – Perform analysis on node
  – Copy files to file server
  – Store files in sam


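[Diagram: analysis on the farm. Boxes: farm server, file server, node, SAM DB, datastore (FNAL, SARA). Size labels: 1.2 TB, 40 GB, 100 CPUs. Arrow types: control (fbs), data, metadata. Steps: 1. sam + rcp, 2. analyze, 3. rcp + sam. Arrow labels: fbs(1), fbs(3); fbs(2).]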


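[Diagram: analysis job flow between triviaal and node-2, with the user that runs each step. Step labels: willem: sam (twice), fbsuser: rcp (twice), fbsuser: analysis program. Data labels: input, output.]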


batch.jdf

SECTION sam
    EXEC=/home/willem/batch_sam
    NUMPROC=1
    QUEUE=IOQ
    STDOUT=/home/willem/stdout
    STDERR=/home/willem/stdout

batch_sam

#!/bin/sh

. /usr/products/etc/setups.sh
setup sam
SAM_STATION=triviaal
export SAM_STATION

sam run project get_file.py --interactive > log

/usr/bin/rsh -n -l fbsuser triviaal rcp -r /stage/triviaal/sam_cache/boo node-2:/data/test >> log


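[Diagram: analysis on the farm, variant in which the node does its own copying. Boxes: farm server, file server, node, SAM DB, datastore (FNAL, SARA). Size labels: 1.2 TB, 40 GB, 100 CPUs. Arrow types: control (fbs), data, metadata. Steps: 1. sam, 2. rcp + analyze + rcp, 3. rcp + sam. Arrow labels: fbs(1), fbs(3); fbs(2).]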


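[Diagram: analysis job flow between triviaal and node-2, with the user that runs each step. Step labels: willem: sam (twice), fbsuser: fbs submit, fbsuser: rcp, analysis program, rcp. Data labels: input, output.]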


SECTION sam
    EXEC=/d0gstar/batch_node
    NUMPROC=1
    QUEUE=FastQ
    STDOUT=/d0gstar/stdout
    STDERR=/d0gstar/stdout

#!/bin/sh
uname -a
date

rsh -l fbsuser triviaal fbs submit ~willem/batch_node.jdf


#!/bin/sh
. /usr/products/etc/setups.sh
setup fbsng
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal fbs submit /home/willem/batch_node.jdf

SECTION sam
    EXEC=/home/willem/batch
    NUMPROC=1
    QUEUE=IOQ
    STDOUT=/home/willem/stdout
    STDERR=/home/willem/stdout

SECTION ana
    EXEC=/d0gstar/batch_node
    NUMPROC=1
    QUEUE=FastQ
    STDOUT=/d0gstar/stdout
    STDERR=/d0gstar/stdout

#!/bin/sh
rcp -pr server:/stage/triviaal/sam_cache/boo /data/test
. /d0/fnal/ups/etc/setups.sh
setup root -q KCC_4_0:exception:opt:thread
setup kailib
root -b -q /d0gstar/test.C

{
gSystem->cd("/data/test/boo");
gSystem->Exec("pwd");
gSystem->Exec("ls -l");
}


get_file.py

#
# This file sets up and runs a SAM project.
#
import os, sys, string, time, signal
from re import *
from globals import *
import run_project
from commands import *
#########################################
#
# Set the following variables to appropriate values

# Consult database for valid choices
sam_station = "triviaal"

# Consult Database for valid choices
project_definition = "op_moriond_p1014"

# A particular snapshot version, last or new
snapshot_version = 'new'

# Consult database for valid choices
appname = "test"
version = "1"
group = "test"

# The maximum number of files to get from sam
max_file_amt = 5

# for additional debug info use "--verbose"
#verbosity = "--verbose"
verbosity = ""

# Give up on all exceptions
give_up = 1

def file_ready(filename):
    # Replace this python subroutine with whatever
    # you want to do
    # to process the file that was retrieved.
    # This function will only be called in the event of
    # a successful delivery.
    print "File ",filename," has been delivered!"
#   os.system('cp '+filename+' /stage/triviaal/sam')
    return
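The file_ready hook is where per-file processing goes; the commented-out line in the slide copies each delivered file to a staging directory. A minimal sketch of that copy in current Python follows. It is an illustration, not the hook as shipped: the dest_dir parameter and the return value are assumptions, and the default destination is the path from the commented-out line.

```python
import os
import shutil


def file_ready(filename, dest_dir="/stage/triviaal/sam"):
    """Copy a delivered file into dest_dir and return the path of the copy."""
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    target = shutil.copy(filename, dest_dir)
    print("File", filename, "has been delivered!")
    return target
```

Because dest_dir has a default, the function can still be called with the file name alone.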


Disk partitioning hoeve

/d0
  /fnal
    /d0dist
    /d0usr
    /ups
      /db
      /etc
      /prd
  /mcc
    /mcc-dist
    /mc_runjob
    /curr

Symbolic links:

/fnal -> /d0/fnal
/d0usr -> /fnal/d0usr
/d0dist -> /fnal/d0dist
/usr/products -> /fnal/ups

/fbsng
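Reading the link line on this slide as four symbolic links (/fnal, /d0usr, /d0dist and /usr/products), a path such as the setups.sh file sourced by the batch scripts ends up under /d0. The sketch below follows the link prefixes in Python; the resolve function and the LINKS table are illustrative only and do not touch the file system.

```python
# Link table as read from the slide: link name -> target.
LINKS = {
    "/fnal": "/d0/fnal",
    "/d0usr": "/fnal/d0usr",
    "/d0dist": "/fnal/d0dist",
    "/usr/products": "/fnal/ups",
}


def resolve(path, links=LINKS, max_hops=10):
    """Rewrite leading link prefixes until no link applies."""
    for _ in range(max_hops):
        for link, target in links.items():
            if path == link or path.startswith(link + "/"):
                path = target + path[len(link):]
                break
        else:
            # No link matched on this pass: the path is fully resolved.
            return path
    raise RuntimeError("too many link hops: " + path)


if __name__ == "__main__":
    print(resolve("/usr/products/etc/setups.sh"))
```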


ana_runjob

• Is analogous to mc_runjob

• Creates and submits analysis jobs

• Input
  – get_file.py with SAM project name
    • Project defines files to be processed
  – analysis script


Integration with grid (1)

• At present separate clusters:
  – D0, LHCb, Alice, DAS cluster

• hoeve and schuur in farm network


Present network layout

[Diagram: present network layout. Elements: hoeve, schuur, switch, three nodes, router, hefnet, surfnet, ajax, NFS.]


New network layout

[Diagram: new network layout. Elements: farmrouter, three switches, D0, LHCb, alice, hefnet, lambda, hoeve, schuur, ajax, NFS, booder.]


New network layout

[Diagram: new network layout with das-2 added. Elements: farmrouter, three switches, D0, LHCb, alice, das-2, hefnet, lambda, hoeve, schuur, ajax, NFS, booder.]


Server tasks

• hoeve
  – software server
  – farm server

• schuur
  – fileserver
  – sam node

• booder
  – home directory server
  – in backup scheme


Integration with grid (2)

• Replace fbs with pbs or condor
  – pbs on Alice and LHCb nodes
  – condor on das cluster

• Use EDG installation tool LCFG
  – Install d0 software with rpm

• Problem with sam (uses ups/upd)


Integration with grid (3)

• Package mcc in rpm

• Separate programs from working space

• Use cfg commands to steer mc_runjob

• Find better place for card files

• Input structure now created on node


Grid job

#!/bin/sh

macro=$1

pwd=`pwd`

cd /opt/fnal/d0/mcc/mcc-dist
. mcc_dist_setup.sh

cd $pwd
dir=/opt/fnal/d0/mcc/mc_runjob/py_script
python $dir/Linker.py script=$macro

PBS job submit

[willem@tbn09 willem]$ cat test.pbs
# PBS batch job script

#PBS -o /home/willem/out
#PBS -e /home/willem/err
#PBS -l nodes=1

# Changing to directory as requested by user

cd /home/willem

# Executing job as requested by user

./submit minbias.macro
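The PBS script above is short enough to generate per macro. A minimal sketch, assuming a hypothetical helper make_pbs, is given below; the directives and the ./submit call are taken from the slide, while the parameter names and defaults are illustrative.

```python
def make_pbs(macro, workdir="/home/willem", nodes=1):
    """Return the text of a PBS script that runs ./submit on one macro."""
    lines = [
        "# PBS batch job script",
        "#PBS -o %s/out" % workdir,
        "#PBS -e %s/err" % workdir,
        "#PBS -l nodes=%d" % nodes,
        # Run from the directory that holds the submit script.
        "cd %s" % workdir,
        "./submit %s" % macro,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pbs("minbias.macro"))
```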


RunJob class for grid

class RunJob_farm(RunJob_batch) :
    def __init__(self,name=None) :
        RunJob_batch.__init__(self,name)
        self.myType="runjob_farm"

    def Run(self) :
        self.jobname = self.linker.CurrentJob()
        self.jobnaam = string.splitfields(self.jobname,'/')[-1]
        comm = 'chmod +x ' + self.jobname
        commands.getoutput(comm)
        if self.tdconf['RunOption'] == 'RunInBackground' :
            RunJob_batch.Run(self)
        else :
            bq = self.tdconf['BatchQueue']
            dirn = os.path.dirname(self.jobname)
            print dirn
            comm = 'cd ' + dirn + '; sh ' + self.jobnaam + ' `pwd` >& stdout'
            print comm
            runcommand(comm)


To be decided

• Location of minimum bias files

• Location of MC output


Job status

• Job status is recorded in
  – fbs
  – /d0/mcc/curr/<job_name>
  – /data/mcc/curr/<job_name>
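The batch and batch_rcp scripts shown earlier leave marker files in the job directory: a dot-prefixed node name before the job runs, the node name afterwards, an OK file that batch_rcp tests for, and an RCP file once the output is copied. A rough Python sketch of reading a job's stage from those markers follows. The function name job_stage and the stage labels are hypothetical, and the reading of each marker is inferred from the scripts rather than documented.

```python
import os


def job_stage(jobdir):
    """Guess how far a job got from the marker files in its directory."""
    names = set(os.listdir(jobdir))
    if "RCP" in names:
        return "copied to file server"
    if "OK" in names:
        return "checked"
    # Node names are assumed to start with "node", as in the ls node* call.
    if any(n.startswith("node") for n in names):
        return "finished on node"
    if any(n.startswith(".node") for n in names):
        return "started on node"
    return "not started"
```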


SAM servers

• On master node:
  – station
  – fss

• On master and worker nodes:
  – stager
  – bbftp