USCMS T2 Site Admin Toolkit, Samir Cury, MTF Meeting – May 26th, 2011

USCMS T2 Site Admin Toolkit

Samir Cury
MTF Meeting – May 26th, 2011

How it began

• OSG All Hands Meeting 2010, Fermilab
• Yearly T2 workshop
– Gathering of site admins
– A lot of ideas and comments
– Some code (scripts)

About site admins

• Frontline of site management
• On a daily basis they deal with:
– Many requests
– Many issues
– Many workarounds
• What happens with these?
– Relevant feedback for CMS
– Lack of features in existing software
– Lack of monitoring in existing systems
» May lead to operating them blindly
• Is there always someone to listen? Thanks, Monitoring Task Force!

Workarounds

• Following from the previous slide: this toolkit is all about workarounds.
• Complaining is not always the best way
– The request may never be implemented
– Not everyone will see the benefits/costs
• Different needs
– Developers do not always think about all user/ops needs
– Scripts are written to cover these needs
– These scripts can give operations a different approach
• Monitoring tools focused on admins' needs
– Can improve response time and error/waste detection
» Example: GridFTP Spy
» JobView / CPU efficiency on T1s
• Not essential, but it normally saves some time.

The goal

• What is really missing
– An official place for unofficial code
– Encouragement for people to share
• Call for tools
– Get the generic ones → package into RPMs
– Get the specific ones → turn into generic, then package into RPMs
• Standard place (repository)
• Standard deploy procedure
– If it's not quick, no one tries → RPMs
• Helping us to help ourselves.

What it is

• Full documentation/reference available:
https://twiki.cern.ch/twiki/bin/view/CMS/SIteAdminToolkit
– There we document each tool included in the toolkit, future plans, etc.
• A collection of scripts that may need some work to get running
– We try to avoid that by providing RPMs with all dependencies included, either in the packages or in the repos.
• A free-time task for everyone involved
– We normally don't have schedules, but we do have a plan.
• Shameless “coders”: that's what we need!
– We don't care how “badly written” it is, as long as it works.

What it certainly is not

• Something that is maintained by a lot of people
– Just a few people who contribute tools
– One dependency solver / packager (me)
» Would appreciate some help
• Something that will solve all problems
– That is not the goal; the goal is just to put together specific tools
• Something that has “professional quality”
– The people involved are very capable, but proportionally time-constrained

What we can learn

• “Sites” can also generate some useful code
– They will probably write it for themselves, so don't expect:
» High-quality code
» Something without a lot of dependencies
– Expect:
» Tools that you can adapt for your site with little effort
» To contribute and make them better instead of complaining
• “Sites” should be shameless enough to publish (and send us) tools they find useful.
– Ken Bloom gave me a slot at a USCMS T2 support meeting to present the proposal; after that, some tools showed up. (Thanks, Ken!)
– T2 coordinators could let us know when they see something useful in their support meetings, and also remind sites that the toolkit exists.

What I did learn

• Getting from a script to an RPM takes more work than I thought: many details, dependencies, etc.
• Life is easier with an intermediate step before the RPM:
https://github.com/samircury/US-CMS-T2-Admin-Toolkit
– People can download and edit the tools from there. It is a shortcut for those who really want to spend some time understanding and deploying tools that don't have an RPM yet.
– It helped me patch Stale Data, improving its CLI.

Tools we have right now

• CondorView (Caltech) – RPM ready
• GridFTP Spy (Caltech) – RPM ready
• Condor4Web (UERJ) – RPM ready
• Stale Data (Nebraska) – tested, needs packaging
• Condor Extract Mail (Nebraska) – to be tested
• dCache tools (Wisconsin) – to be tested
• Your tool here

CondorView

• GUI for managing Condor
– Lists every single job
– Can list ALL ClassAds for a given job
– Can do what you see in the menu
• Run from the cluster frontend
– Can SSH to the node, directly into the running job's temp dir
• Run from the site's CE
– Can kill, release, or restart jobs

GridFTP Spy

• Shows active GridFTP transfers in near real time
• Very useful for optimizing link usage and server settings
• Somewhat tricky to deploy
– Needs a shared FS for harvesting logs
– It works by reading the logs in real time and gathering the interesting info
• Never tested it myself – testers are welcome!
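The idea of reading logs in real time can be illustrated with a small polling sketch. This is not GridFTP Spy's code: the function name `read_new_lines`, the file name `gridftp.log`, and the log lines in the demo are hypothetical, and the real tool may follow its logs differently. The sketch only shows one common way to pick up newly appended lines without re-reading the whole file.

```python
import os
import tempfile


def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for complete lines appended after byte `offset`.

    Hypothetical helper: remembers a byte offset between polls so that each
    call only returns what was written since the previous call.
    """
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Consume only up to the last newline; a half-written line is left
    # in place and will be returned by a later poll once it is complete.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "gridftp.log")  # made-up log file name
        with open(log, "wb") as fh:
            fh.write(b"transfer started\n")
        lines, pos = read_new_lines(log)
        print(lines)
        with open(log, "ab") as fh:
            fh.write(b"transfer finished\nincomplete li")
        lines, pos = read_new_lines(log, pos)
        print(lines)
```

A monitoring loop would call `read_new_lines` every few seconds for each log on the shared FS, keeping one offset per file.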

Condor4Web

• Real-time batch system monitoring
• Visible from any corner of the world
• Your users like it
– They know what's going on with their jobs after the CE
• MC people like it
– For the same reason.

Live demos:

http://monitor.hepgrid.uerj.br/condor/

http://www.cmsaf.mit.edu/condor4web/

If you don't use Condor, try JobView:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/AnalysisOpsT2Monitoring

Stale Data

• Similar to the (un)popularity data service
– Shows which datasets nobody has run a single job against
• Tested and works fine; it has a lot of dependencies, which should be included in the RPM

Sample output:

date = 15-12-2010 , Starting Date = 01-12-2010

Getting json http://dashb-datasets.cern.ch/dashboard/request.py/inputCollectionsTable_JSON?collec_name=&sites=T2_BR_UERJ&date1=01-12-2010&date2=15-12-2010

Datasets idle since 01-12-2010

/JetMET/Run2010A-Dec4ReReco_v1/AOD , 2474.004614433 GB , Owned by AnalysisOps

/G2Jets_Pt-20to60_TuneZ2_7TeV-alpgen/Fall10-START38_V12-v1/AODSIM , 190.267690679 GB , Owned by top

/W2Jets_ptW-0to100_TuneZ2_7TeV-alpgen-tauola/Fall10-START38_V12-v1/GEN , 0.686380407 GB , Owned by DataOps

/QCD6Jets_Pt120to280-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO , 42.528487201 GB , Owned by top

/W1Jets_ptW-800to1600_TuneD6T_7TeV-alpgen-tauola/Fall10-START38_V12-v1/AODSIM , 11.951159415 GB , Owned by top

(Suppressed)

Space taken by stale datasets = 408.164419749117 TB

Broken down by group:

tracker-dpg => 9.250565041201

top => 40.841314603557

AnalysisOps => 157.50586599848

undef => 15.736526476068

FacOps => 1.899973228744

b-tagging => 18.694190177731

local => 164.130428192715

DataOps => 0.105556030621
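The report above can be reproduced in outline with a short sketch. This is not the Stale Data tool itself: the function names are hypothetical, the query URL and its parameters are copied from the sample output (the service may no longer respond, so nothing is fetched here), and the line format is assumed to be exactly the `dataset , SIZE GB , Owned by GROUP` form shown above.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Endpoint as it appears in the sample output above.
BASE_URL = "http://dashb-datasets.cern.ch/dashboard/request.py/inputCollectionsTable_JSON"

# Report lines copied from the sample output above.
SAMPLE_LINES = [
    "/JetMET/Run2010A-Dec4ReReco_v1/AOD , 2474.004614433 GB , Owned by AnalysisOps",
    "/G2Jets_Pt-20to60_TuneZ2_7TeV-alpgen/Fall10-START38_V12-v1/AODSIM , 190.267690679 GB , Owned by top",
    "/QCD6Jets_Pt120to280-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO , 42.528487201 GB , Owned by top",
]


def build_query_url(site, start, end):
    """Build the dashboard query URL for one site and a DD-MM-YYYY date range."""
    params = {"collec_name": "", "sites": site, "date1": start, "date2": end}
    return BASE_URL + "?" + urlencode(params)


def parse_report_line(line):
    """Split 'dataset , SIZE GB , Owned by GROUP' into (dataset, size_gb, group)."""
    dataset, size, owner = [part.strip() for part in line.split(" , ")]
    size_gb = float(size.split()[0])
    prefix = "Owned by "
    group = owner[len(prefix):] if owner.startswith(prefix) else owner
    return dataset, size_gb, group


def total_by_group(lines):
    """Sum the stale dataset sizes (in GB) per owning group."""
    totals = defaultdict(float)
    for line in lines:
        _, size_gb, group = parse_report_line(line)
        totals[group] += size_gb
    return dict(totals)


if __name__ == "__main__":
    print(build_query_url("T2_BR_UERJ", "01-12-2010", "15-12-2010"))
    for group, size_gb in sorted(total_by_group(SAMPLE_LINES).items()):
        print(group, "=>", size_gb, "GB")
```

Note that the per-group breakdown in the sample output is in TB, while this sketch keeps the GB unit of the individual dataset lines.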

“Condor Extract Mail”

Extracts the e-mail addresses of the users running jobs on your cluster from the grid proxies on your CEs

[root@red ~]# ~bbockelm/extract_email "Bockelman"

[email protected]

What CMS can gain

• Better than the code: the ideas
– Usability: you may find here potential features for existing real software
– Adapt ideas or tools that deserve it into CMS central monitoring, such as cmsweb
– Gives an overview of site admins' needs and what they would like to see in the software they use
– Some become patches, like Brian Bockelman's script
• The model of a free software community is a good example to follow
– Small patches from many people turn small things into great ones. Share!

Thanks to all involved

• Ken Bloom, Michael Thomas – initial effort to set up and make everything public
• Authors who submitted tools:
– Caltech – Michael Thomas: CondorView, GridFTP Spy
– Nebraska – Carl Lundsted and Brian Bockelman: Condor Extract Mail, Stale Data
– Wisconsin – Will: dCache Tools
– UERJ – Samir: Condor4Web

Feel free to send:

• Tools
• Suggestions
• Help

But first, we recommend a little reading here:

https://twiki.cern.ch/twiki/bin/view/CMS/SIteAdminToolkit

For the future

• Two trainees at UERJ are interested in helping with packaging
• Migrate the YUM repos to CERN web servers
• Finish testing and packaging the tools we already have

Recommended toolkit

http://datagrid.ucsd.edu/toolkit/

Thanks!