Anaphe: OO Libraries for Data Analysis using C++ and Python
Andreas Pfeiffer, CERN IT/API, andreas.pfeiffer@cern.ch
Genova, 10-Dec-2001

Page 1: Anaphe
OO Libraries for Data Analysis using C++ and Python
Andreas Pfeiffer, CERN IT/API

Page 2: Outline
Motivation
  LHC computing challenge
Anaphe Components
  C++
Lizard: Interactive Data Analysis
  Python
Software quality control
Summary

Page 3: LHC Computing challenge

Page 4: LHC & The Alps
27 km circumference
~100 m deep
Interaction points

Page 5: The Large Hadron Collider
A completely new particle collider (start-up in 2006)
the largest superconductor installation in the world
A collision will take place every 25 nanoseconds
But only one in a billion will be interesting…
And only one in a trillion will be really interesting!
Real-time data filtering: Petabytes per second to Gigabytes per second
Accumulated data: Petabytes per year
Data mining by thousands of geographically dispersed scientists in hundreds of teams
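The rates implied by these round numbers can be worked out in a few lines. This is only a back-of-the-envelope sketch that takes the slide's figures literally (25 ns spacing, one in a billion, one in a trillion); the variable names are illustrative.

```python
# Back-of-the-envelope rates from the slide's round numbers (illustrative only).
BUNCH_SPACING_S = 25e-9                      # one collision every 25 ns
collisions_per_s = 1.0 / BUNCH_SPACING_S     # about 4e7 collisions per second

interesting_per_s = collisions_per_s / 1e9           # "one in a billion"
really_interesting_per_s = collisions_per_s / 1e12   # "one in a trillion"
seconds_per_really_interesting = 1.0 / really_interesting_per_s

print(f"collisions per second:            {collisions_per_s:.1e}")
print(f"interesting collisions per second: {interesting_per_s:.2f}")
print(f"seconds between 'really interesting' ones: {seconds_per_really_interesting:.0f}")
```

Under these assumptions the filter has to keep a few events out of every hundred million collisions, which is why the filtering software must be as reliable as the offline reconstruction.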

Page 6: LHC Computing Challenge
4 experiments will create huge amounts of data
  > 1 PetaByte/year for each experiment!
    10^15 Bytes
    1,000 TeraBytes
    20,000 Redwood tapes
    100,000 dual-sided DVD-RAM disks
    1,500,000 sets of the Encyclopaedia Britannica (w/o photos)
Need lots of CPU power to reconstruct/analyse
  about 1000 PC boxes per experiment (2005 ones!)
    40,000 of today’s boxes (dual P-III 800 MHz)
  complex data models
    reconstruction s/w is also used for online filtering
    needs high quality s/w in order not to waste beam time

Page 7: Lifetime of LHC software = 25 yrs
[Figure: technology timeline; surviving label: WWW]
Thanks to Dino Ferrero Merlino (IT)

Page 8: Technology (R)Evolution
10 yrs major cycle length (HW, SW, OS)
  ~12 evolutionary changes in the market
  1 revolutionary change
  towards greater diversity
  don’t forget changes of requirements
Consequences
  s/w written today most probably will be rewritten tomorrow
  we must anticipate changes

Page 9: Anaphe: what it is
Modular (OO/C++) replacement of CERNLIB functionality for use in HEP experiments
  memory management
  I/O
  foundation classes
  histogramming
  minimizing/fitting
  visualization
  interactive data analysis
Trying to use standards wherever possible
Trying to re-use existing class libraries
This talk will not cover detector simulation (GEANT-4)

Page 10: Anaphe Components
Data Analysis: Lizard - AIDA
Custom graphics (2-D): Qt - Qplotter
Basic graphics (3-D): OpenInventor - OpenGL
Basic math: NAG C
HEP foundation: CLHEP
Minimization/Fitting: FML - Gemini
Histograms: HTL
Database: HepODBMS
Persistency: ODMG/Objectivity DB
C++ Standard Libraries

Page 11: Use of Components with Abstract Interfaces
User Code uses only Interface classes
  IHistogram1D * hist = histoFactory->create1D("track quality", 100, 0., 10.);
Actual implementations are selected at run-time
  loading of shared libraries
No change at all to user code, but keep freedom to choose implementation
[Diagram: User Code depends only on Histo-IF and Fitter-IF; Histo-Impl. 1 and Histo-Impl. 2 sit behind Histo-IF, Fitter-Impl. X and Fitter-Impl. Y behind Fitter-IF]
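The same pattern can be sketched in Python. This is a minimal illustration, not the AIDA or Anaphe API: only the names IHistogram1D and create1D come from the slide, while the methods, the two implementations, and the name-based registry (standing in for run-time loading of shared libraries) are assumptions.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Interface seen by user code (methods are assumed for illustration)."""

    @abstractmethod
    def fill(self, x, weight=1.0): ...

    @abstractmethod
    def entries(self): ...


def _bin_index(x, nbins, lo, hi):
    # Fixed-width binning; the clamp guards against floating-point round-up.
    return min(int((x - lo) / (hi - lo) * nbins), nbins - 1)


class ListHistogram1D(IHistogram1D):
    """Hypothetical implementation 1: all bins stored in a list."""

    def __init__(self, title, nbins, lo, hi):
        self.title, self.nbins, self.lo, self.hi = title, nbins, lo, hi
        self.bins = [0.0] * nbins
        self._entries = 0

    def fill(self, x, weight=1.0):
        self._entries += 1
        if self.lo <= x < self.hi:
            self.bins[_bin_index(x, self.nbins, self.lo, self.hi)] += weight

    def entries(self):
        return self._entries


class DictHistogram1D(IHistogram1D):
    """Hypothetical implementation 2: only non-empty bins stored in a dict."""

    def __init__(self, title, nbins, lo, hi):
        self.title, self.nbins, self.lo, self.hi = title, nbins, lo, hi
        self.bins = {}
        self._entries = 0

    def fill(self, x, weight=1.0):
        self._entries += 1
        if self.lo <= x < self.hi:
            index = _bin_index(x, self.nbins, self.lo, self.hi)
            self.bins[index] = self.bins.get(index, 0.0) + weight

    def entries(self):
        return self._entries


# Stand-in for loading a shared library chosen at run time.
IMPLEMENTATIONS = {"list": ListHistogram1D, "dict": DictHistogram1D}


class HistoFactory:
    def __init__(self, implementation):
        self._cls = IMPLEMENTATIONS[implementation]

    def create1D(self, title, nbins, lo, hi):
        return self._cls(title, nbins, lo, hi)


def user_code(histo_factory):
    """Uses only the interface; never names a concrete implementation."""
    hist = histo_factory.create1D("track quality", 100, 0.0, 10.0)
    for value in (0.5, 3.2, 3.3, 12.0):
        hist.fill(value)
    return hist.entries()
```

Calling `user_code(HistoFactory("list"))` or `user_code(HistoFactory("dict"))` runs the identical user function against either implementation, which is the point of the slide.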

Page 12: The AIDA project
AIDA project (Abstract Interfaces for Data Analysis) was initiated at the HepVis’99 workshop in Orsay
Presently active: mainly developers from existing packages
  Tony Johnson (JAS)
  Andreas Pfeiffer (Lizard/Anaphe)
  Guy Barrand (OpenScientist)
  Mark Dönszelmann (Wired)
  Developers from LHCb/Gaudi
more on AIDA tomorrow ...

Page 13: ‘Layered’ Approach
Basic functionalities (histograms, fitting, etc.) are available as individual C++ class libraries.
  Easy to replace one part without throwing away everything
    Objectivity/DB to provide persistence
    HepODBMS library (“insulating layer”, “tags”)
    Histogram library (HTL)
    Fitting libraries (Gemini, HepFitting)
    Graphics libraries (Qt, Qplotter)
Insulate components through Abstract Interfaces
  “wrapper” layer to implement Interfaces in terms of existing libs
Apply s/w quality control tools
  code checking, testing

Page 14: Anaphe Components: Overview

Page 15: Anaphe Internals: Abstract Interfaces

Page 16: Anaphe components

Page 17: Basic 3D Graphic Libraries
OpenGL (basic graphics)
  De-facto industry standard for basic 3D graphics
  Used in CAD/CAE, games, VR, medical imaging
OpenInventor (scene mgmt.)
  OO 3D toolkit for graphics
    Cubes, polygons, text, materials
    Cameras, lights, picking
    3D viewers/editors, animation
  Based on OpenGL/MesaGL

Page 18: 2D Graphics libraries
Qt
  multi-platform C++ GUI toolkit
    C++ class library, not wrapper around C libs
    superset of Motif and MFC
    available on Unix and MS Windows
    no change for developer
  commercial but with public domain version
  www.troll.no
Qplotter
  “add-on” functionality for HEP
    “HIGZ/HPLOT”

Page 19: Mathematical Libraries
NAG (Numerical Algorithms Group) C Library
  Covers a broad range of functionality
    Linear algebra
    differential equations
    quadrature, etc.
  Special functions of CERNLIB added to Mark-6 release
    mostly for theory and accelerator
  Quality assurance
    extensive testing done by NAG
  www.nag.com

Page 20: CLHEP - foundation classes
HEP foundation class library
  Random number generators
  Physics vectors
    3- and 4-vectors
  Geometry
  Linear algebra
  System of units
  more packages recently added
will continue to evolve
wwwinfo.cern.ch/asd/lhc++/clhep/

Page 21: Histograms: the HTL package
Histograms are the basic tool for physics analysis
  Statistical information of density distributions
Histogram Template Library (HTL)
  design based on C++ templates
  Modular: separation between sampling and display
  Extensible: open for user defined binning systems
  Flexible: support transient/persistent at the same time
  Open: large use of abstract interfaces
  recent addition: 3D histograms
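The "Modular" and "Extensible" points can be illustrated with a small Python sketch. It is not the HTL API (HTL is a C++ template library); all class and function names below are hypothetical. The histogram only samples, the binning is a replaceable object, and display is a separate function.

```python
import bisect


class EvenBinning:
    """Fixed-width bins on [lo, hi)."""

    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi

    def index(self, x):
        if not (self.lo <= x < self.hi):
            return None
        i = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
        return min(i, self.nbins - 1)


class VariableBinning:
    """User-defined binning from a sorted list of bin edges."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.nbins = len(self.edges) - 1

    def index(self, x):
        if x < self.edges[0] or x >= self.edges[-1]:
            return None
        return bisect.bisect_right(self.edges, x) - 1


class Histogram1D:
    """Sampling only: knows how to fill, not how to draw."""

    def __init__(self, binning):
        self.binning = binning
        self.contents = [0.0] * binning.nbins

    def fill(self, x, weight=1.0):
        i = self.binning.index(x)
        if i is not None:
            self.contents[i] += weight


def render_text(hist):
    """Display is kept outside the histogram class."""
    return ["#" * int(content) for content in hist.contents]


log_like = Histogram1D(VariableBinning([0, 1, 10, 100]))
for value in (0.5, 5, 50, 7):
    log_like.fill(value)
```

Swapping `VariableBinning` for `EvenBinning` changes the binning system without touching `Histogram1D` or `render_text`.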

Page 22: Fitting and Minimization
Fitting and Minimization Library (FML)
  common OO interface
    NAG-C, MINUIT
  based on Abstract Interfaces
    IVector, IModelFunction, …
  fitting as a special case of minimization
    minimize “distance” between data and model
  replacement for HepFitting (and Gemini)
Gemini
  common interface to minimizer engine
  very thin layer
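"Fitting as a special case of minimization" can be made concrete with a short sketch. Nothing here is the FML or Gemini API: the minimizer is a deliberately simple coordinate search written for this example (not NAG or MINUIT), and the distance is an unweighted sum of squared residuals. The minimizer knows nothing about data or models; the fit is just one particular objective handed to it.

```python
def minimize(objective, start, step=1.0, tolerance=1e-8, max_iterations=10_000):
    """Generic derivative-free coordinate search (illustrative, not production grade)."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iterations):
        if step < tolerance:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
                    break
        if not improved:
            step /= 2.0  # refine the search once no neighbour is better
    return point, best


def squared_distance(model, xs, ys):
    """Build the objective: the 'distance' between data and model for given parameters."""
    def objective(params):
        return sum((y - model(x, params)) ** 2 for x, y in zip(xs, ys))
    return objective


def straight_line(x, params):
    intercept, slope = params
    return intercept + slope * x


XS = [0.0, 1.0, 2.0, 3.0]
YS = [1.0, 3.0, 5.0, 7.0]   # generated from y = 1 + 2x
best_params, best_value = minimize(squared_distance(straight_line, XS, YS), [0.0, 0.0])
print(best_params, best_value)
```

Replacing `minimize` with another engine, or `straight_line` with another model, leaves the rest unchanged, which mirrors the thin common interface the slide describes.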

Page 23: Opening bracket: Persistency

Page 24: Object persistency
Two concepts: serial and page I/O
“Sequential access to objects” (streaming)
  good in networking context or serial writes to file(s)
  much like “good old Fortran”
  often perceived to be “simpler” to implement (“<<”, “>>”)
“Navigational access to objects” (buffered)
  I/O on demand for complex data models
  location transparent (for user) access to object
    typically by de-referencing of a smart pointer
  optimized for (random) disk access (disks deliver pages)
  sequential write to file(s) still ok
Both concepts need to take care of changes to the internal structure of the objects (schema evolution)
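The two access styles can be contrasted in a small sketch. It uses Python's pickle purely as a stand-in serializer; `ObjectStore` and `Ref` are hypothetical names, with `Ref` playing the role of the smart pointer and a dict standing in for database pages.

```python
import io
import pickle


# Sequential (streaming) access: objects come back in the order they were written.
def write_stream(objects):
    buffer = io.BytesIO()
    for obj in objects:
        pickle.dump(obj, buffer)
    buffer.seek(0)
    return buffer


def read_stream(buffer):
    while True:
        try:
            yield pickle.load(buffer)
        except EOFError:      # end of the stream reached
            return


# Navigational access: follow a reference, I/O happens on demand.
class ObjectStore:
    def __init__(self):
        self._pages = {}
        self.loads = 0        # counts actual reads from the store

    def put(self, oid, obj):
        self._pages[oid] = pickle.dumps(obj)

    def load(self, oid):
        self.loads += 1
        return pickle.loads(self._pages[oid])


class Ref:
    """Stand-in for a smart pointer: the object is read only when dereferenced."""

    def __init__(self, store, oid):
        self._store, self._oid, self._target = store, oid, None

    def deref(self):
        if self._target is None:
            self._target = self._store.load(self._oid)
        return self._target


store = ObjectStore()
store.put("track/7", {"pt": 12.5})
track_ref = Ref(store, "track/7")   # no I/O yet
```

With streaming, reaching the last object means reading everything before it; with the reference, `track_ref.deref()` triggers exactly one load, and only if the user asks for the object.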

Page 25: Architectural Issue: Persistency (“Object-I/O”)
Brings a completely new quality into the design
Objects now have a lifetime
  don’t “delete” until you really are sure you want to
  persistency is a kind of “intended memory leak”
  would like to see no difference between memory and disk
“Layout” of objects may change during (extended) life
  “schema evolution”
    additions/deletions of attributes
    changes of inheritance relations
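One common way to cope with additions and deletions of attributes is to version stored records and upgrade old ones on read. The sketch below shows that idea only; it is not how Objectivity/DB handles schema evolution, and the field names (`charge`, `legacy_flag`, `schema_version`) are invented for the example.

```python
CURRENT_VERSION = 2


def upgrade(record):
    """Return a copy of `record` converted to the current layout."""
    record = dict(record)
    version = record.get("schema_version", 1)   # unversioned records are treated as v1
    if version == 1:
        record.setdefault("charge", 0)          # attribute added in v2, with a default
        record.pop("legacy_flag", None)         # attribute deleted in v2
        version = 2
    record["schema_version"] = version
    return record


old_track = {"pt": 3.0, "legacy_flag": True}    # written long ago with the v1 layout
new_track = upgrade(old_track)
```

Objects written years earlier stay readable, and the application only ever sees the current layout.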

Page 26: Architectural Issue: Persistency (“Object-I/O”) (II)
Objects can be placed (“clustering”)
  de-coupling of logical and physical view of data
Special care needed to ensure consistency in data set
  avoid reading group of objects (tracks, events, ...) for which writing/updating is not (yet) complete
  clean up if only part of the objects are written
  typically taken care of by using transactions
Complications possible in distributed computing
  need to protect disk access now like memory access in past (“Segmentation violation”)

Page 27: Physical Model and Logical Model
• Physical model may be changed to optimise performance
• Existing applications continue to work transparently!

Page 28: Object Model
Thanks to Vincenzo Innocente (CMS)

Page 29: Physical clustering
Thanks to Vincenzo Innocente (CMS)

Page 30: Closing bracket: Persistency

Page 31: “Tags”, Ntuples and Events
Tags - a special kind of Ntuple
  Always associated with an underlying persistent store
  Tags may be used to store “ntuple-like” data extracted from all over the event
    minPt, maxEmiss, nJets, nMuon, trigger, …
  Main use: speed up data selection for analysis …
  Tag simplifies selection without losing complexity
Events
  more complex than a tree structure (“CWN”)
  lots of cross-references between classes, containers
Association from the Tag to the Event may be used to navigate to any other part of the Event
  even from an interactive visualization program
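A minimal sketch of the Tag idea: select on a few flat attributes, then follow the association to the full event only for the survivors. The attribute names minPt, nJets and nMuon come from the slide; the classes and the in-memory `EventStore` are assumptions standing in for the persistent store.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    tracks: list = field(default_factory=list)   # the full, complex part


@dataclass
class Tag:
    minPt: float
    nJets: int
    nMuon: int
    event_id: int        # association from the Tag back to its Event


class EventStore:
    """Hypothetical stand-in for the underlying persistent store."""

    def __init__(self, events):
        self._events = {event.event_id: event for event in events}
        self.reads = 0

    def fetch(self, event_id):
        self.reads += 1
        return self._events[event_id]


def select(tags, predicate):
    """Selection touches only the small Tag objects."""
    return [tag for tag in tags if predicate(tag)]


store = EventStore([Event(1, ["t1"]), Event(2, ["t2", "t3"]), Event(3, [])])
tags = [Tag(5.0, 1, 0, 1), Tag(22.0, 3, 1, 2), Tag(14.0, 0, 2, 3)]

selected = select(tags, lambda tag: tag.nJets >= 2 and tag.minPt > 10)
events = [store.fetch(tag.event_id) for tag in selected]   # navigate to full events
```

Only one of the three events is read from the store here, which is the speed-up the slide refers to.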

Page 32: AIDA compliance of Anaphe
Presently (Anaphe 3.x) only AIDA 1.0 compliant
Plan to implement AIDA 2.2 Interfaces by end 2001 (Anaphe 4.x)
  initially as wrappers to existing interfaces/packages
Will maintain 3.x for some time
  ensures stability for users
Development will concentrate on 4.x
  while AIDA will evolve further
Similar time schedule as
  JAS (Tony Johnson)
  OpenScientist (Guy Barrand) already there

Page 33: Lizard: a tool for Interactive Data Analysis

Page 34: Interactive Data Analysis
Aim: “OO replacement for PAW” (at least)
  analysis of “ntuple-like data” (“Tags”, “Ntuples”, …)
  visualisation of data (Histograms, scatter-plot, “Vectors”)
  fitting of histograms (and other data)
  access to experiment specific data/code
Maximize flexibility and re-use
  Foresee customization/integration
    allow use from within experiment’s s/w
Plan for extensions
  “code for now, design for the future”
Ensure maintainability
  use of s/w quality control tools

Page 35: Lizard
An interactive analysis tool
  AIDA compliant
  Python scripting
  Visualization with Qt
  HTL histograms (via AIDA)
  Persistency with Objectivity
  Fitting with NAG Libraries (or Minuit)
Components available as shared libraries
  independent of the scripting language
  can also be used in C++ programs (Geant4)

Page 36: Scripting - why
Typical use of scripting is quite different from programming (reconstruction, analysis, ...)
  history: “go back to where I was before”
  repetition/looping - with “modifiable parameters”
avoid “one size fits all” or “using power-tool as hammer”
  rapid prototyping in “scripting language”
    quick turn-around times
  performance critical code in “core language”
    exploit richer set of features/functionality (e.g. templates in C++)
scripting languages usually less susceptible to changes than “mainstream languages”
  potentially longer lives

Page 37: Python - why
Python - OO (scripting) language
  no “strange $!%-variables”
  sensitive to indentation
  Easier for users than Java
Lots of user supplied modules available and ready for use
  scientific, numerics, graphics, GUI, network, OS, games, DBs, …
  example: http://www.vex.net/parnassus/
    Parnassus Totals: 1173 items in 49 categories.
Also usable in Java (Jython)
  used in JAS for scripting
  minimize changes needed within AIDA compliant environments

Page 38: Python - how
SWIG to (semi-)automatically create connection to chosen scripting language
  allows flexibility to choose amongst several scripting languages
    Python, Perl, Tcl, Guile, Ruby, (Java) …
Very easy to use
  swig -c++ -python -shadow -c myClass.h
  create shared lib from myClass.cpp and myClass_wrap.c
  start python and import myClass to use it
Very easy to extend
  simply inherit from “swiggified” class in python
  modifications can later be fed back into C++
    performance, type safety, special language features (templates), …
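The "inherit from a swiggified class" step looks like ordinary Python subclassing. Since SWIG output cannot be generated here, the sketch below uses a plain Python class as a stand-in for the generated shadow class; `MyClass`, its `transform` method, and the subclass are all hypothetical.

```python
class MyClass:
    """Stand-in for a SWIG-generated shadow class; a real one would forward to C++."""

    def __init__(self, scale):
        self.scale = scale

    def transform(self, x):
        return self.scale * x


class MyClassWithOffset(MyClass):
    """Prototype an extension in Python before feeding it back into C++."""

    def __init__(self, scale, offset):
        super().__init__(scale)
        self.offset = offset

    def transform(self, x):
        return super().transform(x) + self.offset


prototype = MyClassWithOffset(2.0, 1.0)
result = prototype.transform(3.0)
```

One caveat worth keeping in mind: an override like this is seen by Python callers, but calls that originate inside the compiled C++ library would normally still reach the C++ method unless the wrapper is generated with support for cross-language polymorphism.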

Page 39: PAW -> Lizard translation
Ntuple projection, Lizard
  lizard --useHBook
  :-) nt = ntm.findNtuple("higgscand.hbk::cands")
  :-) nplot1D(nt, "mass", "quality==5 && cut > 198")
  Selection string: any valid C++ expression
Ntuple projection, PAW
  pawX11
  paw> h/file 1 higgscand.hbk
  paw> nt/pl 10.mass quality=5.and.cut>198
Assuming file higgscand.hbk contains ntuple with number 10 and title cands

Page 40: Example script (ntuple)
# get list of names of all tuples from tuplemanager
ntm.listTuples()
nt1 = ntm.findNtuple("Charm1")  # retrieve tuple by name
# create 1D histos to project into
h1 = hm.create1D(10, "mass", 100, 0., 5000.)
h2 = hm.create1D(20, "mass for pt1>10", 100, 0., 5000.)
# project the attribute "MASS" into histo h1 without cut ("")
nt1.project1D(h1, "", "MASS")
# project the attribute "MASS" into histo h2 with cut ("PT1>10")
nt1.project1D(h2, "PT1>10", "MASS")

Page 42: Lizard: History and Present Status
Started after CHEP-2000
Full version out since June 2001
  “PAW like” analysis functionality, plus:
  on-demand loading of compiled code using shared libraries
    gives full access to experiment’s analysis code and data
  based on Abstract Interfaces
    flexible and extensible
“License free” version since Sep. 2001
  HBook for RWNtuples and Histogram storage
  Minuit as minimizer engine

Page 43: Users and Collaborations
AIDA spoken here!
  IGUANA (CMS visualization)
  GAUDI (LHCb/HARP) framework
  ATHENA (Atlas) framework
  Analyzer modules in Geant 4
  JAS
  Open Scientist
  …you?

Page 44: Software quality control

Page 45: Software quality control
Using tools for testing/checking has started
  Insure++, CodeWizard
Package dependencies: Ignominy
  Set of perl and shell scripts by Lassi Tuura (CMS)
  Ignominy scans…
    Make dependency data produced by the compilers (*.d files)
    Source code for #includes (resolved against the ones actually seen)
    Shared library dependencies (“ldd” output)
    Defined and required symbols (“nm” output)
  And maps…
    Source code and binaries into packages
    #include dependencies into package dependencies
    Unresolved/defined symbols into package dependencies
ignominy: dishonour, disgrace, shame; infamy; the condition of being in disgrace, etc. (Oxford English Dictionary)
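The step "#include dependencies into package dependencies" can be sketched as follows. This is not Ignominy itself (which is a set of perl and shell scripts); it is a small Python illustration that assumes, for simplicity, that the first path component of a file names its package and that sources are given as an in-memory mapping.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def package_of(path):
    # Assumption for this sketch: "fit/Fitter.h" belongs to package "fit".
    return path.split("/", 1)[0]


def package_dependencies(sources):
    """sources: {path: file text}. Returns {package: sorted list of packages it includes from}."""
    deps = {}
    for path, text in sources.items():
        owner = package_of(path)
        deps.setdefault(owner, set())
        for included in INCLUDE_RE.findall(text):
            if included in sources:               # resolve only against files actually seen
                target = package_of(included)
                if target != owner:
                    deps[owner].add(target)
    return {package: sorted(targets) for package, targets in deps.items()}


EXAMPLE_SOURCES = {
    "histo/H1.h": "",
    "fit/Fitter.h": '#include "histo/H1.h"\n#include <vector>\n',
    "fit/Fitter.cpp": '#include "fit/Fitter.h"\n',
}
example_deps = package_dependencies(EXAMPLE_SOURCES)
```

System headers such as `<vector>` drop out because they are not among the files seen, and includes within a package do not count as package dependencies.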

Page 46: Ignominy Analysis of Anaphe
Distribution of tools and utilities for LHC era physics
  Combination of commercial, free and HEP software
  Claims to be a toolkit
Seems to live up to its toolkit claims
  Good work on modularity
  Clean design is evident in many places
  Dependency diagrams often split naturally into functional units
Thanks to Lassi Tuura (CMS)

Page 47: Package Metrics
Project     Release  Packages  Avg. # direct deps  Cycles (pkgs involved)  # levels  ACD*  CCD*   NCCD*  Size
Anaphe      3.6.1    31        2.6                 --                      8         5.4   167    1.3    630/170k
ATLAS       1.3.2    230       6.3                 2 (92)                  96        70    16211  10     1350k
ATLAS       1.3.7    236       7.0                 2 (92)                  97        77    18263  11     1350k
CMS/ORCA    4.6.0    199       7.4                 7 (22)                  35        24    4815   3.6    420k
CMS/COBRA   5.2.0    87        6.7                 4 (10)                  19        15    1312   2.7    180k
CMS/IGUANA  2.4.2    35        3.9                 --                      6         5.0   174    1.2    150/38k
Geant4      4.3.2    108       7.0                 3 (12)                  21        16    1765   2.8    680k
ROOT        2.25/05  30        6.4                 1 (19)                  22        19    580    4.7    660k
*) John Lakos, Large-Scale C++ Software Design
Size = total amount of source code (not normalised across projects!)
ACD = average component dependency (~ libraries linked in)
CCD = cumulative component dependency
  sum of single-package component dependencies over whole release
NCCD = measure of CCD compared to a balanced binary tree
  A good toolkit’s NCCD will be close to 1.0
  < 1.0: structure is flatter than a binary tree (= independent packages)
  > 1.0: structure is more strongly coupled (vertical or cyclic)
  Aim: NCCD ~ 1 for given software/functionality
Thanks to Lassi Tuura (CMS)
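The three metrics can be computed from a package dependency graph in a few lines. This sketch follows the definitions given on the slide and the usual formulation from Lakos, where the CCD of a balanced binary tree of N components is (N+1)·log2(N+1) − N; the graph representation and function names are illustrative, not taken from Ignominy.

```python
import math


def reachable(graph, start):
    """Packages needed by `start`, itself included (transitive closure; cycles are fine)."""
    seen = {start}
    stack = [start]
    while stack:
        for dep in graph[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def ccd(graph):
    """Cumulative component dependency: sum of per-package dependency counts."""
    return sum(len(reachable(graph, package)) for package in graph)


def acd(graph):
    """Average component dependency."""
    return ccd(graph) / len(graph)


def ccd_balanced_binary_tree(n):
    """CCD of a balanced binary dependency tree with n components."""
    return (n + 1) * math.log2(n + 1) - n


def nccd(graph):
    """CCD normalised to the balanced binary tree of the same size."""
    return ccd(graph) / ccd_balanced_binary_tree(len(graph))


# graph: package -> packages it depends on directly
chain = {"a": ["b"], "b": ["c"], "c": []}          # vertical structure
flat = {"a": [], "b": [], "c": []}                 # independent packages
tree = {"r": ["x", "y"], "x": ["x1", "x2"], "y": ["y1", "y2"],
        "x1": [], "x2": [], "y1": [], "y2": []}    # balanced binary tree

print(nccd(chain), nccd(flat), nccd(tree))
```

As a cross-check against the table, 31 packages give a balanced-tree CCD of 32·5 − 31 = 129, so Anaphe's CCD of 167 corresponds to an NCCD of about 1.3, and 167/31 reproduces the ACD of 5.4.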

Page 48: Metrics: NCCD vs Cycles
[Figure: plot of NCCD (0 to 12) against Fraction of Packages in Cycles (0% to 70%) for Toolkits & Frameworks: Anaphe, ATLAS, ORCA, IGUANA, COBRA, G4, ROOT; one annotation reads “Includes Fortran”]
NCCD (“spaghetti index”): 1.0: good toolkit; < 1.0: indep. packages; > 1.0: strongly-coupled
Thanks to Lassi Tuura (CMS)

Page 49: Future enhancements
Access to other implementations of components
  HBOOK CWNtuples
  Reading of ROOT (> V3.0) files
    similar to Tony Johnson’s (Java) RootIO package
  AIDA Ntuple/Histo store
    optimized for Ntuples, Histograms
    as (compressed) XML
Communication with Java tools/packages (JAS, Wired)
  via AIDA
Adding other “scripting” languages
  Perl, Tcl, cint?

Page 50: Challenge: Distributed Computing
Motivation
  move code to data
  parallel analysis
Techniques
  services via Abstract Interfaces (AI)
  late binding
  plug-in architecture
End-user (Lizard)
  look-and-feel of local analysis
R&D started and first prototype available soon
  CORBA based

Page 51: Summary
The architecture of Anaphe shows some important items for flexible and modular data analysis:
  weak coupling between components through use of Abstract Interfaces
  basic functionality is covered by individual C++ class libraries
  emphasis on usability and maintainability
Major criteria are flexibility, extensibility and interoperability
  recent example: GEANT-4 examples (based on AIDA)
Lizard is an Interactive Data Analysis Tool based on Anaphe components and the Python scripting language (through SWIG)
  Lizard is young but has a very solid base in mature Anaphe libraries
  real plug-in structure
Software quality control is important
  tools help to optimize dependencies / minimize maintenance effort

Page 52: More information
cern.ch/Anaphe
cern.ch/Anaphe/Lizard
aida.freehep.org/
cern.ch/DB
wwwinfo.cern.ch/asd/lhc++/clhep/

Page 53: Additional slides

Page 54: Analysis of Geant4
Fairly large C++ project
  Very fine-grained (and multi-level) package structuring
  Seems quite clean from the preliminary analysis
Fine package subdivision helps in many ways but makes analysis and code understanding more complicated
One subsystem seems strongly coupled and needs attention
Need to study the use of the internal command system
Thanks to Lassi Tuura (CMS)

Page 55: Analysis of ROOT
ROOT developers have done a formidable job of breaking binary (shared library) dependencies, but…
  For example: By static analysis, nothing seems to use the postscript package directly (no incoming dependencies), but there is this code:
    void TPad::Print (const char *filename, Option_t *option) {
      […]
      TVirtualPS *psave = gVirtualPS;
      if (gROOT->LoadClass("TPostScript","Postscript")) return;
      gROOT->ProcessLineFast("new TPostScript()");
      gVirtualPS->Open(psname,pstype);
      gVirtualPS->SetBit(kPrintingPS);
      […]
    }
  Taking these and global objects into account makes the dependency diagrams very different
Sign of fast growth? Need a “next evolutionary step”?
  So “coherent” that replacing parts could get painful…
Thanks to Lassi Tuura (CMS)
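The effect described here, a dependency that exists at run time but is invisible to static analysis because the target is named only in a string, is easy to reproduce. The sketch below is a Python analogue, not ROOT code: `importlib.import_module("json")` plays the role of the string-based class loading, and a simple scan of import statements plays the role of the static analyser.

```python
import ast
import importlib

# Source text to analyse; it mirrors the render() function defined below.
SOURCE = '''
import importlib
import math

def render(values):
    backend = importlib.import_module("json")   # dependency named only in a string
    return backend.dumps([math.floor(v) for v in values])
'''


def static_imports(source):
    """Module names found by scanning import statements, as a static analyser would."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def render(values):
    backend = importlib.import_module("json")    # resolved only when this line runs
    return backend.dumps([int(v) for v in values])


found = static_imports(SOURCE)
print(sorted(found))             # "json" is not in the list
print(render([1.5, 2.0]))        # yet the code depends on it at run time
```

A dependency diagram built from `found` alone would show no edge to `json`, just as the static analysis of ROOT shows no incoming dependency on the postscript package.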

Page 56: Analysis of ROOT…
[Figures: dependency diagrams, “Binary only” versus “Binary + Source + Logical = Real”]
Thanks to Lassi Tuura (CMS)

Page 57: Metrics: NCCD vs ACD
[Figure: plot of NCCD (0 to 12) against Av. Component Deps (Fraction of Packages, 0% to 70%) for Toolkits & Frameworks: Anaphe, ATLAS, ORCA, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Page 58: Metrics: NCCD vs Size
[Figure: plot of NCCD (0 to 12) against Size (k-lines of source [files], 0 to 1600) for Toolkits & Frameworks: Anaphe, ATLAS, ORCA, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Page 59: Metrics: NCCD vs AID
[Figure: plot of NCCD (0 to 12) against Av. Immediate Deps (Fraction of Packages, 0% to 25%) for Toolkits & Frameworks: Anaphe, ATLAS, ORCA, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Page 60: Metrics: Packages vs Size
[Figure: plot of Packages (0 to 250) against Size (Own Only, 0 to 1600) for Toolkits & Frameworks: Anaphe, ATLAS, ORCA, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Page 61: Metrics: Packages vs Size
[Figure: plot of Packages (0 to 250) against Size (All, 0 to 1600) for Toolkits & Frameworks: Anaphe, ATLAS, ORCA, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Page 62: Scripting in Lizard
[Diagram: layers, top to bottom]
User
Python
Controller, Shadow classes (automatically generated by SWIG)
C++ interfaces (AIDA Interfaces)
C++ implementations (Anaphe implementations)

Page 63: Software life cycle for LHC expts.
LHC starts ~ 2006
  at least 10 yr of running
  additionally at least 5 yr of data analysis

Page 64: Lifetime of LHC software = 25 yrs
[Figure: timeline; labels recovered from the slide]
SPS 1969
Unix V6, first public version 1975
K&R C 1978
IBM PC 1981
W and Z 1983
Ethernet standard 1983
C++ 1985
LEP 1989
Linux V 0.01 1991
Intel Pentium 1992
Java 1995
XML 1.0 1997
LEP ends 2000
WWW