
Part III: PROOF

Jan 19, 2016

Page 1: Part III: PROOF

Part III: PROOF

Marco Meoni - CERN

Jan Fiete Grosse-Oetringhaus – CERN

Andrei Gheata - CERN

V3.1 – 19.02.10

Page 2: Part III: PROOF

PROOF

Parallel ROOT Facility
• Interactive parallel analysis on a local cluster
• Parallel processing of (local) data (trivial parallelism)
• Fast feedback
• Output handling with direct visualization
• Not a batch system

PROOF itself is not related to Grid
• Can access Grid files

The usage of PROOF is transparent
• The same code can be run locally and in a PROOF system (certain rules have to be followed)

PROOF is part of ROOT

Page 3: Part III: PROOF

PROOF Schema

[Figure: PROOF schema. Labels: Client (local PC) running root with the macro ana.C; Remote PROOF Cluster with a PROOF master and PROOF slaves node1 to node4, each running root with ana.C on its own data; each slave produces a result, and stdout/result is returned to the client.]

Page 4: Part III: PROOF

Event based (trivial) Parallelism
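Events are independent of each other, so they can be divided among workers and the partial results added up afterwards. The following is a minimal sketch of that idea in plain C++, not PROOF code: `splitEvents`, `processRange` and `processAll` are hypothetical names, the per-event "analysis" is just a sum of track counts, and the static split into equal ranges is an assumption made for illustration (the slides do not describe how PROOF actually assigns work).

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // [first, last)

// Divide nEvents into nWorkers contiguous ranges of nearly equal size.
std::vector<Range> splitEvents(std::size_t nEvents, std::size_t nWorkers) {
    assert(nWorkers > 0);
    std::vector<Range> ranges;
    const std::size_t base = nEvents / nWorkers;
    const std::size_t extra = nEvents % nWorkers;  // first 'extra' workers get one more
    std::size_t first = 0;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t size = base + (w < extra ? 1 : 0);
        ranges.emplace_back(first, first + size);
        first += size;
    }
    return ranges;
}

// Stand-in for the analysis of one range: sum the track counts of its events.
long processRange(const std::vector<int>& tracksPerEvent, const Range& r) {
    const auto begin = tracksPerEvent.begin();
    return std::accumulate(begin + static_cast<std::ptrdiff_t>(r.first),
                           begin + static_cast<std::ptrdiff_t>(r.second), 0L);
}

// Each range could run on a different worker; the partial sums are then added.
long processAll(const std::vector<int>& tracksPerEvent, std::size_t nWorkers) {
    long total = 0;
    for (const Range& r : splitEvents(tracksPerEvent.size(), nWorkers)) {
        total += processRange(tracksPerEvent, r);
    }
    return total;
}
```

Because no range depends on another, the total is the same for any number of workers, which is what makes this kind of parallelism "trivial".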

Page 5: Part III: PROOF

Terminology

Client
• Your machine running a ROOT session that is connected to a PROOF master

Master
• PROOF machine coordinating work between slaves

Slave/Worker
• PROOF machine that processes data

Query
• A job submitted from the client to the PROOF system
• A query consists of a selector and a chain

Selector
• A class containing the analysis code
• In ALICE we use the Analysis Framework, therefore an AliAnalysisTask is sufficient

Chain
• A list of files (trees) to process (more details later)

Page 6: Part III: PROOF

How to use PROOF

The analysis framework is used
• Files to be analyzed are put into a chain (TChain)
• The analysis is written as a task (already introduced in the previous tutorial): AliAnalysisTaskSE
• The same analysis as written previously can be used

If additional libraries are needed, these have to be distributed as a "package"

[Figure: Input Files (TChain) -> Analysis (AliAnalysisTaskSE) -> Output]

Page 7: Part III: PROOF

AliAnalysisTaskSE

Classes derived from AliAnalysisTaskSE can run locally, in PROOF and in AliEn

• "Constructor": once on your client
• UserCreateOutputObjects(): once on each slave
• ConnectInputData(): for each tree
• UserExec(): for each event
• Terminate(): once on your client
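To make the call frequencies concrete, here is a small simulation in plain C++. It is only a sketch: `CountingTask` and `simulateCalls` are hypothetical names, not part of the analysis framework, and a single counter object is shared between all simulated workers for brevity, whereas on a real cluster every slave works on its own copy of the task.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a task: it only counts how often each hook runs.
struct CountingTask {
    int createOutput = 0;
    int connectInput = 0;
    int exec = 0;
    int terminate = 0;
    void UserCreateOutputObjects() { ++createOutput; }
    void ConnectInputData() { ++connectInput; }
    void UserExec() { ++exec; }
    void Terminate() { ++terminate; }
};

// eventsPerTree[w][t] is the number of events in tree t handled by worker w.
CountingTask simulateCalls(const std::vector<std::vector<int>>& eventsPerTree) {
    CountingTask task;                       // "constructor": once on the client
    for (const std::vector<int>& trees : eventsPerTree) {
        task.UserCreateOutputObjects();      // once on each worker
        for (int nEvents : trees) {
            assert(nEvents >= 0);
            task.ConnectInputData();         // for each tree
            for (int i = 0; i < nEvents; ++i) {
                task.UserExec();             // for each event
            }
        }
    }
    task.Terminate();                        // once on the client
    return task;
}
```

With two workers, three trees and nine events in total, the counters come out as 2, 3, 9 and 1.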

Page 8: Part III: PROOF

Class TTree

A tree is a container for data storage

It consists of several branches
• These can be in one or several files
• Branches are stored contiguously (split mode)
• When reading a tree, certain branches can be switched off: this speeds up the analysis when not all data is needed

Set of helper functions to visualize content (e.g. Draw, Scan)

Compressed

[Figure: a tree with three branches. The members x, y, z of each point are stored branch by branch in the file: x x x ..., y y y ..., z z z ...]
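The benefit of split mode can be illustrated without ROOT. In the sketch below, `Point`, `SplitPoints`, `split` and `sumBranch` are hypothetical names chosen for this example; the layout mimics the idea of one contiguous array per member, it is not the actual TTree file format.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Row-wise view: one record per point.
struct Point {
    double x, y, z;
};

// Split view: one contiguous array ("branch") per member.
struct SplitPoints {
    std::vector<double> x, y, z;
};

// Convert a list of points into the split layout.
SplitPoints split(const std::vector<Point>& points) {
    SplitPoints s;
    for (const Point& p : points) {
        s.x.push_back(p.x);
        s.y.push_back(p.y);
        s.z.push_back(p.z);
    }
    assert(s.x.size() == points.size());
    return s;
}

// An analysis that needs only one branch touches only that array;
// the other branches are never read.
double sumBranch(const std::vector<double>& branch) {
    return std::accumulate(branch.begin(), branch.end(), 0.0);
}
```

Calling `sumBranch(s.x)` never looks at `s.y` or `s.z`, which corresponds to switching those branches off when reading a tree.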

Page 9: Part III: PROOF

TChain

A chain is a list of trees (in several files)

Normal TTree functions can be used
• Draw(...), Scan(...): these iterate over all elements of the chain

[Figure: Chain = Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)]
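Iterating over a chain as if it were one tree requires translating a global entry number into a tree and an entry inside that tree. A minimal sketch of this bookkeeping in plain C++ follows; `Location` and `locate` are hypothetical names for illustration and this is not the TChain implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Position of an entry inside a chain.
struct Location {
    std::size_t tree;  // index of the tree in the chain
    long localEntry;   // entry number inside that tree
};

// Map a global entry number onto (tree, local entry), given the
// number of entries of each tree in chain order.
Location locate(const std::vector<long>& entriesPerTree, long globalEntry) {
    assert(globalEntry >= 0);
    long offset = 0;  // global number of the first entry of the current tree
    for (std::size_t i = 0; i < entriesPerTree.size(); ++i) {
        if (globalEntry < offset + entriesPerTree[i]) {
            return {i, globalEntry - offset};
        }
        offset += entriesPerTree[i];
    }
    throw std::out_of_range("entry beyond the end of the chain");
}
```

For a chain whose trees hold 10, 5 and 20 entries, global entry 12 is local entry 2 of the second tree.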

Page 10: Part III: PROOF

Merging

The analysis runs on several slaves, therefore partial results have to be merged
• Objects are identified by name
• Standard merging implementation for histograms available
• Other classes need to implement Merge(TCollection*)
• When no merging function is available, all the individual objects are returned

[Figure: Result from Slave 1 + Result from Slave 2 -> Merge() -> Final result]
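For histograms, merging amounts to adding up the bin contents of objects that carry the same name. The sketch below shows this in plain C++ under simplifying assumptions: a histogram is reduced to a vector of bin counts, and `Histogram`, `Outputs` and `mergeOutputs` are hypothetical names, not the ROOT Merge(TCollection*) interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Histogram = std::vector<long>;              // bin contents only
using Outputs = std::map<std::string, Histogram>; // outputs of one worker, by name

// Merge the outputs of all workers: objects with the same name are added bin by bin.
Outputs mergeOutputs(const std::vector<Outputs>& perWorker) {
    Outputs merged;
    for (const Outputs& outputs : perWorker) {
        for (const auto& entry : outputs) {
            const std::string& name = entry.first;
            const Histogram& hist = entry.second;
            auto it = merged.find(name);
            if (it == merged.end()) {
                merged.emplace(name, hist);       // first time this name is seen
                continue;
            }
            assert(it->second.size() == hist.size());  // same binning required
            for (std::size_t i = 0; i < hist.size(); ++i) {
                it->second[i] += hist[i];
            }
        }
    }
    return merged;
}
```

An object that only one worker produced is simply passed through unchanged.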

Page 11: Part III: PROOF

Workflow Summary

[Figure: the input chain (Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)) and the analysis (AliAnalysisTask) are given to three proof nodes]

Page 12: Part III: PROOF

Workflow Summary

[Figure: each of the three proof nodes runs the analysis (AliAnalysisTask) and produces an output; the three outputs are combined into one merged output]

Page 13: Part III: PROOF

Packages

PAR files: PROOF ARchive. Like Java jar
• Gzipped tar file
• PROOF-INF directory
    • BUILD.sh: builds the package, executed per slave
    • SETUP.C: sets the environment, loads libraries, executed per slave

API to manage and activate packages
• UploadPackage("package")
• EnablePackage("package")

Page 14: Part III: PROOF

CERN Analysis Facility

The CERN Analysis Facility (CAF) will run PROOF for ALICE
• Prompt analysis of pp data
• Pilot analysis of PbPb data
• Calibration & Alignment

Available to the whole collaboration, but the number of users will be limited for efficiency reasons

Design goals
• 500 CPUs
• 100 TB of selected data locally available

Page 15: Part III: PROOF

Evaluation of PROOF

CAF1 since May 2006
• 40 machines, 2 CPUs each, 200 GB disk

CAF2 since Oct 2008
• 14 machines, 8 cores each, 2.33 TB disk

Tests performed
• Usability tests
• Speedup plot
• Evaluation of different query types
• Evaluation of the system when running a combination of query types

Goal: Realistic simulation of users using the system

Page 16: Part III: PROOF

Hands-On

Getting ready...

Run a task that accesses ESD
• Locally
• PROOF
• Modify it...

Run a task that accesses MC
• PROOF

Reading log files, resetting session, etc.

What about “interactive” grid?
• AliEn plug-in hands on

Page 17: Part III: PROOF

Warm up

Log into LXPLUS with your account

Preconditions
• Use bash shell (type “bash”)
• Grid certificate (usercert.pem/userkey.pem) in ~/.globus
    • Howto: convert from .p12 to .pem
        openssl pkcs12 -clcerts -nokeys -out usercert.pem -in cert.p12
        openssl pkcs12 -nocerts -out userkey.pem -in cert.p12
• On the tutorial page, save “Files for the PROOF tutorial (tgz)” to your home dir and extract it

Set up environment
• Execute the command
    source /afs/cern.ch/alice/caf/caf-lxplus.sh –alien v4-17-Release
• You will be prompted for your certificate password

Check ROOT
• Start it. Does it show ROOT version 5.24/00?

NOT NEEDED FOR THIS SESSION

Page 18: Part III: PROOF

Files to be used

CreateESDChain.C
• Creates a chain from a list of file names

ESD_LHC08b1.txt
• List of PDC08 files (First physics pp, Pythia6, 5kG, 10TeV) distributed on the CAF

AF-v4-19-04-AN.par
• Par archive for PDC10 data and analysis framework

AliAnalysisTaskPt.{cxx,h}
• Task that creates an uncorrected pT spectrum from ESD tracks

AliAnalysisTaskPtMC.{cxx,h}
• Task that creates a pT spectrum from the MC particles

Page 19: Part III: PROOF

Run a task locally

Start ROOT

Try the following lines and once they work add them to a macro run.C (enclose in {})

Load needed libraries
    gSystem->Load("libTree.so");
    gSystem->Load("libGeom.so");
    gSystem->Load("libVMC.so");
    gSystem->Load("libPhysics.so");
    gSystem->Load("libSTEERBase.so");
    gSystem->Load("libESD.so");
    gSystem->Load("libAOD.so");
    gSystem->Load("libANALYSIS.so");
    gSystem->Load("libANALYSISalice.so");

Add the AliRoot include path (only needed for local case)
    gROOT->ProcessLine(".include $ALICE_ROOT/include");

Page 20: Part III: PROOF

Run a task locally (2)

Create the analysis manager
    mgr = new AliAnalysisManager("testAnalysis");

Create the analysis task and add it to the manager
    gROOT->LoadMacro("AliAnalysisTaskPt.cxx++g");
    task = new AliAnalysisTaskPt("TaskPt");
    mgr->AddTask(task);
• "+" means compile; "g" means debug

Add the ESD handler (to access the ESD)
    esdH = new AliESDInputHandler;
    mgr->SetInputEventHandler(esdH);

Add the lines to the macro run.C

Page 21: Part III: PROOF

Run a task locally (3)

Create a chain
    gROOT->LoadMacro("$ALICE_ROOT/PWG0/CreateESDChain.C");
    chain = CreateESDChain("files.txt", 10);

Attach the input (the chain)
    cInput = mgr->GetCommonInputContainer();
    mgr->ConnectInput(task, 0, cInput);

Create a place for the output (a histogram: TH1)
    cOutput = mgr->CreateContainer("cOutput", TList::Class(), AliAnalysisManager::kOutputContainer, "Pt.root");
    mgr->ConnectOutput(task, 1, cOutput);

Enable debug (optional)
    mgr->SetDebugLevel(2);

Add the lines to the macro run.C

Page 22: Part III: PROOF

Run a task locally (4)

Initialize the manager
    mgr->InitAnalysis();

Print the status (optional)
    mgr->PrintStatus();

Run the analysis
    mgr->StartAnalysis("local", chain);

Add the lines to the macro run.C

After running, look at the output and check the content of the file Pt.root

Page 23: Part III: PROOF

run.C

Page 24: Part III: PROOF

Package Management

Connecting to the PROOF cluster
    gEnv->SetValue("XSec.GSI.DelegProxy", "2");
    TProof::Open("alicecaf");

Managing packages
• Upload (= copy to the cluster)
    gProof->UploadPackage("AF-v4-19-04-AN");
• Enable (= compile)
    gProof->EnablePackage("AF-v4-19-04-AN");
• Clean (= remove)
    gProof->ClearPackage("AF-v4-19-04-AN");
    • Known issue on AFS: removal may fail. Try again after a few seconds…
• Clean all (in case some libraries are messed up)
    gProof->ClearPackages();

Page 25: Part III: PROOF

PROOF datasets

A dataset represents a list of files (e.g. physics run X)
• Correspondence between AliEn collection and PROOF dataset

Users register datasets
• The files contained in a dataset are automatically staged from AliEn (and kept available)
• Datasets are used for processing with PROOF
    • They contain all relevant information to start processing (location of files, abstract description of content of files)
• Datasets are public for reading; common datasets are available (for data of common interest)

Learn about datasets at
http://aliceinfo/Offline/Activities/Analysis/CAF

Page 26: Part III: PROOF

Running a task in PROOF

Copy run.C to runProof.C

Add connecting to the cluster
    gEnv->SetValue("XSec.GSI.DelegProxy", "2");
    TProof::Open("alicecaf");

Replace the loading of the libraries with uploading the packages
    gProof->UploadPackage("AF-v4-19-04-AN");
    gProof->EnablePackage("AF-v4-19-04-AN");

Replace the loading of the task with
    gProof->Load("AliAnalysisTaskPt.cxx++g");

Replace in StartAnalysis
• "local" with "proof"
• The chain with the dataset "/COMMON/COMMON/LHC09d10_run10482X" (more on datasets on next slide)

Add only 100000 entries to be processed
• As last parameter of StartAnalysis()

Run it!

[Slide callouts: 20 files; 1850 files]

Page 27: Part III: PROOF

runProof.C

Page 28: Part III: PROOF

Progress dialog

[Screenshot of the PROOF progress dialog, with callouts:]
• Query statistics
• Abort query and view results up to now
• Abort query and discard results
• Show log files
• Show processing rate

Page 29: Part III: PROOF

Looking at the task

Constructor
• Called once when the task is created
• Input/Output is connected

UserCreateOutputObjects
• Called once per slave
• Create histograms

UserExec
• Called once per event
• Track loop, tracks are counted, histogram filled, output "posted"

Terminate
• Called once on the client (your laptop/PC)
• Histogram read back from the output stream, visualized, saved to disk

Page 30: Part III: PROOF

Changing the task

Add a |η| < 0.5 cut
    Float_t eta = track->Eta();
    if (TMath::Abs(eta) > 0.5)
        continue;
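The cut acts on the pseudorapidity η of each track, which depends only on the direction of the momentum: η = atanh(pz/|p|), equivalently -ln tan(θ/2). The following framework-free sketch shows the same selection in plain C++. `Track`, `pseudorapidity` and `selectCentral` are hypothetical names for this example; in the task itself, track->Eta() provides the value.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for a track: momentum components only.
struct Track {
    double px, py, pz;
};

// Pseudorapidity from the momentum direction: eta = atanh(pz / |p|).
double pseudorapidity(const Track& t) {
    const double p = std::sqrt(t.px * t.px + t.py * t.py + t.pz * t.pz);
    assert(p > std::fabs(t.pz));  // requires non-zero transverse momentum
    return std::atanh(t.pz / p);
}

// Keep only tracks with |eta| <= maxAbsEta, skipping the others
// in the same way as the 'continue' in the track loop.
std::vector<Track> selectCentral(const std::vector<Track>& tracks, double maxAbsEta) {
    std::vector<Track> kept;
    for (const Track& t : tracks) {
        if (std::fabs(pseudorapidity(t)) > maxAbsEta) {
            continue;
        }
        kept.push_back(t);
    }
    return kept;
}
```

A track perpendicular to the beam axis (pz = 0) has η = 0 and always passes; a track close to the beam axis has large |η| and is rejected.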

Page 31: Part III: PROOF

Changing the task (2)

Add a second plot: η distribution

Header file (.h file)
• Add new member: TH1F* fEta; // eta distribution

Constructor
• Initialize member: fEta(0)
• Add second output slot: DefineOutput(2, TH1F::Class())

UserCreateOutputObjects
• Create histogram
    fEta = new TH1F("fEta", "#eta distribution", 20, -2, 2);

UserExec
• Get η like in previous example
• Fill histogram: fEta->Fill(eta);
• Post output: PostData(2, fEta)
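The histogram booked with 20, -2, 2 has 20 equal-width bins between -2 and 2. What filling such a histogram involves can be sketched in plain C++ as below. `Hist1D` is a hypothetical, much reduced stand-in for TH1F (no errors, no weights, no axis titles), assuming bins that include their lower edge and exclude their upper edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-width one-dimensional histogram with underflow and overflow counters.
struct Hist1D {
    int nBins;
    double lo, hi;
    std::vector<long> bins;
    long underflow = 0;
    long overflow = 0;

    Hist1D(int n, double low, double high)
        : nBins(n), lo(low), hi(high), bins(static_cast<std::size_t>(n), 0) {
        assert(n > 0 && high > low);
    }

    // Increment the bin that contains x; each bin covers [lower edge, upper edge).
    void fill(double x) {
        if (x < lo) { ++underflow; return; }
        if (x >= hi) { ++overflow; return; }
        int i = static_cast<int>((x - lo) / (hi - lo) * nBins);
        if (i >= nBins) i = nBins - 1;  // guard against floating-point rounding
        ++bins[static_cast<std::size_t>(i)];
    }
};
```

With `Hist1D h(20, -2, 2)`, each bin is 0.2 wide, so a value of 0.0 lands in the bin with index 10 (counting from 0) and a value of exactly 2 is counted as overflow.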

Page 32: Part III: PROOF

Changing the task (3)

Terminate
• Read histogram from the output slot
    fEta = dynamic_cast<TH1F*> (GetOutputData(2));
• Introduce an if statement to check that the object was retrieved
    if (!fEta) { Printf("ERROR: fEta was not found"); return; }
• Draw the histogram
    new TCanvas;
    fEta->DrawCopy();

Copy runProof.C to runProof2.C and change:
• Add second output slot
    cOutput2 = mgr->CreateContainer("cOutput2", TH1::Class(), AliAnalysisManager::kOutputContainer, "Pt.root");
    mgr->ConnectOutput(task, 2, cOutput2);

Page 33: Part III: PROOF

Read Monte Carlo tracks

Use task AliAnalysisTaskPtMC.{h,cxx}
• Copy runProof.C to runProofMC.C
• Change AliAnalysisTaskPt to AliAnalysisTaskPtMC
• Add access to the MC event handler
    handler = new AliMCEventHandler;
    mgr->SetMCtruthEventHandler(handler);
• Change output filename to PtMC.root
• Run it!

Page 34: Part III: PROOF

runProofMC.C

Page 35: Part III: PROOF

Looking at the MC task

Very similar to ESD track case

Instead of looping over content of fESD, MC event is retrieved by

    AliMCEventHandler* eventHandler = dynamic_cast<AliMCEventHandler*>(AliAnalysisManager::GetAnalysisManager()->GetMCtruthEventHandler());
    if (!eventHandler) {
        Printf("ERROR: Could not retrieve MC event handler");
        return;
    }

    AliMCEvent* mcEvent = eventHandler->MCEvent();
    if (!mcEvent) {
        Printf("ERROR: Could not retrieve MC event");
        return;
    }

Page 36: Part III: PROOF

Reading log files

When your task crashes
• You can access the output of the last query by clicking on the “Show Log” button in the PROOF progress window
• You can retrieve the output from any previous query
    • Open ROOT
    • Get a PROOF manager object
        mgr = TProof::Mgr("alicecaf")
    • Get the log files from the last session
        logs = mgr->GetSessionLogs(0) // 0=last query
    • Display them
        logs->Display()
    • Search for a special word (e.g. segmentation violation)
        logs->Grep("segmentation violation")
    • Save them to a file
        logs->Save("*", "logs.txt")

Page 37: Part III: PROOF

Some Goodies...

Resetting environment
    TProof::Reset("alicecaf")

Compile with debug
    Load("<task>+g")

Create a package from AliROOT
    make PWG0base.par

Page 38: Part III: PROOF

A helper for AliEn analysis

Works as a plugin for the analysis manager (as event handlers)

One has to create and configure an AliAnalysisAlien object
• See: http://aliceinfo.cern.ch/Offline/Activities/Analysis/AnalysisFramework/AlienPlugin.html

Creates dataset, JDL, analysis macro, execution+validation scripts

Submits your job and merges the results

Page 39: Part III: PROOF

Important plug-in settings

plugin->SetRunMode(const char *mode)
• "full": generate files, copy in grid, submit, merge
• "offline": generate files, user can change them
• "submit": copy files in grid, submit, merge
• "terminate": merge available results
• "test": generate files + a small dataset, run locally as a remote job
    • plugin->SetNtestFiles(Int_t nfiles) (default: 1)

plugin->SetROOTVersion(rootver)
plugin->SetAliRootVersion(alirootver)
• Change whenever needed
• See command: aliensh[] packages

Page 40: Part III: PROOF

Describing the input data

plugin->SetGridDataDir(datadir)
• Put here the alien path before run numbers
• See pcalimonitor.cern.ch for relevant data paths

plugin->SetDataPattern(pattern)
• Use uniquely identifying patterns
    • e.g. */pass3/*/AliESDs.root
• Plugin supports making datasets on ESD, ESD tags or AOD

plugin->SetRunRange(min,max)
• Sets the run range to be analyzed
• Enumeration of run numbers allowed
• For existing data collections, use AddDataFile()

Page 41: Part III: PROOF

Other settings

Using par files
    plugin->EnablePackage("package.par")

Using other external libraries available in AliEn
    plugin->AddExternalPackage("fastjet::v2.4.0")

Compiling single source files
    plugin->SetAnalysisSource("mySource.cxx")
• But files have to be uploaded to AliEn from the current directory
    plugin->SetAdditionalLibs("mySource.cxx mySource.h")
• Extra libraries to be loaded (besides AF ones) have to be enumerated in the same method.

Page 42: Part III: PROOF

Configuring and running the AliEn plugin

Open CreateAlienHandler.C

Change working/output directories

Modify number of files/worker

Make sure the run mode is set to “full”

Run macro runGrid.C

Inspect the job status

Modify the run mode to “terminate” once job finished

Run again runGrid.C

Page 43: Part III: PROOF

Expert settings

Define outputs:
• Output directory: plugin->SetGridOutputDir()
    • Absolute or relative path
• Custom: plugin->SetOutputFiles("file1 file2 …");
• Default: plugin->SetDefaultOutputs()
• Output archive: plugin->SetOutputArchive()

Number of files per job
    plugin->SetSplitMaxInputFileNumber();

Number of runs per master job
    plugin->SetNrunsPerMaster()

Number of files to merge in a chunk
    plugin->SetMaxMergeFiles()

Prefix run numbers to match reconstructed data
    plugin->SetRunPrefix("000");
    plugin->SetRunRange(103313, 103350);
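The effect of a run prefix together with a run range can be pictured as building one directory name per run. The sketch below is an illustration in plain C++ only: `runDirectories` is a hypothetical helper, and it assumes that the prefix is simply prepended to the decimal run number and that every run in the range exists, neither of which is stated on the slide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the directory names for all runs in [minRun, maxRun],
// each formed as prefix + run number (assumed convention).
std::vector<std::string> runDirectories(const std::string& prefix, int minRun, int maxRun) {
    assert(minRun <= maxRun);
    std::vector<std::string> dirs;
    for (int run = minRun; run <= maxRun; ++run) {
        dirs.push_back(prefix + std::to_string(run));
    }
    return dirs;
}
```

With the prefix "000" and the range starting at 103313, the first name produced is 000103313.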

Page 44: Part III: PROOF

References

More information on http://aliceinfo.cern.ch/Offline/Activities/Analysis/CAF

Read the FAQ on the webpage above

Please join the mailing list [email protected] by going to http://listboxservices.web.cern.ch/listboxservices