The ROOT Project in the multi-core CPU era CHEP06, Mumbai 15 February 2006 René Brun CERN
Page 1: The ROOT Project  in the multi-core CPU era

The ROOT Project in the multi-core CPU era

CHEP06, Mumbai

15 February 2006

René Brun

CERN

Page 2: The ROOT Project  in the multi-core CPU era

René Brun, CERN ROOT in the multi-core cpu era 2

Plan of talk

• ROOT: 11 years old!!

• Still many developments

• Multi Core cpus: parallelism

• ROOT, Software Obesity and the GRID

Page 3: The ROOT Project  in the multi-core CPU era

ROOT: a long story

• Started in January 1995. ROOT had to face many sociological obstacles at a time when most users were changing experiments and languages, and were lost in many fights.

“Every problem has its root in failure of a relationship” (The Times of India, Tuesday 14 February)

• This initial opposition has been a key element for the success of the project. By spotting the inevitable weaknesses of some early designs, it forced the team to react quickly. The development method involving more and more users has been essential to get feedback. Designing a large system like ROOT is an iterative process. This process has involved many people in many experiments.

• ROOT is now strongly supported at CERN and FNAL. Many thanks to the management and my colleagues in the LCG project for facilitating a convergent process.

Page 4: The ROOT Project  in the multi-core CPU era

ROOT project: some numbers

• The ROOT project is comparable in size and complexity to the software of each LHC experiment. See, for instance, the evaluation by the sloccount tool

• sloccount, by David A. Wheeler, estimates:

Total Physical Source Lines of Code (SLOC) = 1,709,170
Development Effort Estimate, Person-Years (Person-Months) = 495.97 (5,951.63)
Schedule Estimate, Years (Months) = 5.66 (67.97)
Estimated Average Number of Developers = 87.57
Total Estimated Cost to Develop = $66,998,665

(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
(average salary = $56,286/year, overhead = 2.40)
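The estimate can be re-derived from the constants quoted on the slide. The following is a minimal sketch, not sloccount itself: the variable names are illustrative, and the last two lines assume that the number of developers is effort divided by schedule and that the cost is effort in years times salary times overhead, which reproduces the quoted figures to within rounding.

```python
# Basic COCOMO estimate, using only the constants quoted on the slide.
SLOC = 1_709_170            # total physical source lines of code
AVERAGE_SALARY = 56_286     # dollars per year
OVERHEAD = 2.40

ksloc = SLOC / 1000.0
person_months = 2.4 * ksloc ** 1.05
schedule_months = 2.5 * person_months ** 0.38

# Assumed derivations (not spelled out on the slide):
developers = person_months / schedule_months
cost = (person_months / 12.0) * AVERAGE_SALARY * OVERHEAD

print(f"effort    : {person_months / 12:.2f} person-years ({person_months:,.2f} person-months)")
print(f"schedule  : {schedule_months / 12:.2f} years ({schedule_months:.2f} months)")
print(f"developers: {developers:.2f}")
print(f"cost      : ${cost:,.0f}")
```

Small differences in the last digits of the cost come from rounding of the intermediate values.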

Page 5: The ROOT Project  in the multi-core CPU era

ROOT person power

CERN + FNAL

Only people working full time on the project

Page 6: The ROOT Project  in the multi-core CPU era

Presentations about ROOT & co at CHEP06

98 - PROOF - The Parallel ROOT Facility
Distributed Data Analysis - Monday 13 February 15:00
Presenter: GANIS, Gerardo (CERN)

187 - ROOT GUI, General Status
Software Tools and Information Systems - Monday 13 February 16:40
Presenter: RADEMAKERS, Fons (CERN)

188 - From Task Analysis to the Application Design
Software Tools and Information Systems - Monday 13 February 17:00
Presenter: Mr. RADEMAKERS, Fons (CERN)

129 - ROOT I/O for SQL databases
Software Components and Libraries - Monday 13 February 17:40
Presenter: Dr. LINEV, Sergey (GSI DARMSTADT)

185 - Reflex, reflection for C++
Software Components and Libraries - Tuesday 14 February 14:00
Presenter: Dr. ROISER, Stefan (CERN)

Xxx Recent Developments in the ROOT I/O and TTrees
Software Components and Libraries - Monday 13 February 16:00
Presenter: Dr. Canal, Philippe (FNAL)

227 - New Developments of ROOT Mathematical Software Libraries
Software Components and Libraries - Tuesday 14 February 16:00
Presenter: Dr. MONETA, Lorenzo (CERN)

383 - New features in ROOT geometry modeller for representing non-ideal geometries
Software Components and Libraries - Wednesday 15 February 14:00
Presenter: CARMINATI Federico (CERN)

93 - ROOT 3D graphics
Software Components and Libraries - Wednesday 15 February 16:00
Presenter: BRUN, Rene (CERN)

407 - Performance and Scalability of xrootd
Distributed Data Analysis - Wednesday 15 February 17:00
Presenter: HANUSHEVSKY, Andrew (Stanford Linear Accelerator Center)

92 - ROOT 2D graphics visualisation techniques
Poster - Monday 13 February 11:00

91 - ROOT 3D graphics overview and examples
Poster - Monday 13 February 11:00

189 - Recent User Interface Developments
Poster - Monday 13 February 11:00

186 - ROOT/CINT/Reflex integration
Poster - Monday 13 February 11:00

228 - The structure of the new ROOT Mathematical Software Libraries
Poster - Wednesday 15 February 09:00

249 - XrdSec - A high-level C++ interface for security services in client-server applications
Poster - Wednesday 15 February 09:00

408 - xrootd Server Clustering
Poster - Wednesday 15 February 09:00

Page 7: The ROOT Project  in the multi-core CPU era

Multi Core cpus

Impact on ROOT

Page 8: The ROOT Project  in the multi-core CPU era

Multi Core CPUs

http://www.intel.com/technology/computing/archinnov/platform2015/

This is going to affect the evolution of ROOT in many areas

Page 9: The ROOT Project  in the multi-core CPU era

Moore’s law revisited

Your laptop in 2016 with:

32 processors

16 Gbytes RAM

16 Tbytes disk

> 50 of today’s laptops

Page 10: The ROOT Project  in the multi-core CPU era

• There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible.

• Multi-core often implies multi-threading. Several areas must be made not only thread-safe but also thread-aware.

• PROOF is the obvious candidate. By default, a ROOT interactive session should run in PROOF mode. It would be nice if this could be made totally transparent to the user.

• Speed up I/O with multi-threaded I/O and read-ahead

• Buffer compression in parallel

• Minimization function in parallel

• Interactive compilation with ACLIC in parallel

• etc.

Impact on ROOT
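As an illustration of the "buffer compression in parallel" item, here is a minimal Python sketch. It is not ROOT code: the buffers are hypothetical stand-ins for independent I/O buffers, and the function name `compress_buffers` is invented for the example. It relies on the fact that independent buffers can be compressed concurrently as long as their order is preserved.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_buffers(buffers, workers=4, level=6):
    """Compress independent buffers concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(lambda b: zlib.compress(b, level), buffers))


# Hypothetical stand-in for a set of I/O buffers: 8 blocks of 256 kB each.
buffers = [bytes([i]) * 262_144 for i in range(8)]
compressed = compress_buffers(buffers)

# Round trip: every buffer must decompress to its original content.
restored = [zlib.decompress(c) for c in compressed]
print(len(compressed), restored == buffers)
```

Threads are used here on the assumption that the compression library does its work outside the interpreter lock; a process pool would be the alternative if it did not.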

Page 11: The ROOT Project  in the multi-core CPU era

CPU/Node hierarchy

Level            Size               Latency
Laptop node      1->32->??N cpus    100 nanos
Local cluster    1000xN cpus        100 micros
GRID(s)          100x1000 nodes     100 millis

Batch jobs pushed to the GRID

Maximum number of jobs run in one week/month

Interactive jobs run on the laptop and use processors on the GRID

Real Time important for short/medium queries

Analysis mainly on laptop and ONE cluster on the GRID

Page 12: The ROOT Project  in the multi-core CPU era

Software Obesity

Use local power as much as possible.

Can we simplify software installation on the GRID?

A proposal

Page 13: The ROOT Project  in the multi-core CPU era

• A considerable amount of time is spent in installing software (up to one day for an expert).

• Porting to a new platform is non-trivial.

• Dependency problems in case many packages must be installed.

• Only a small subset of the software is used.

• The installation may require a huge amount of disk space. Users are scared to download a new version.

• This is not fitting well with the GRID concept.

• The GRID should be used to simplify this process and not to make it more complex.

Observations

Page 14: The ROOT Project  in the multi-core CPU era

                                   Alice      Atlas         CMS       ROOT
number of lines in header files    102282     698208        104923    153775
classes total                      1815       8910          ???       1500
classes in dict                    1669       >4120 2140    835       1422
lines in dict                      479849     ???           103057    698000
classes c++ lines                  577882     1524866       277923    857390
total lines classes+dict           1057731    ???           380980    1553390
total f77 lines                    736751     928574        ???       3000
directories                        540        19522         <500      958
comp time                          25’        750’          90’       30’
lines compiled/s                   1196       50 (70)       71        863

LHC software
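The "lines compiled/s" row can be checked against the other rows. The sketch below is an assumption about how the figures were obtained, since the slide does not say which line totals enter the ratio: dividing "total lines classes+dict" by the compilation time reproduces the CMS and ROOT figures, and adding the Fortran lines reproduces the Alice figure. The Atlas column is left out because its total is not given.

```python
# Line counts and compile times copied from the table above.
# name: (total lines classes+dict, total f77 lines or None, compile time in minutes)
packages = {
    "Alice": (1_057_731, 736_751, 25),
    "CMS": (380_980, None, 90),
    "ROOT": (1_553_390, 3_000, 30),
}


def lines_per_second(lines, minutes):
    """Average compilation throughput in source lines per second."""
    return lines / (minutes * 60)


cpp, f77, minutes = packages["Alice"]
alice_rate = lines_per_second(cpp + f77, minutes)   # Fortran lines included

cpp, _, minutes = packages["CMS"]
cms_rate = lines_per_second(cpp, minutes)           # no Fortran count available

cpp, _, minutes = packages["ROOT"]
root_rate = lines_per_second(cpp, minutes)          # Fortran lines not included

print(round(alice_rate), round(cms_rate), round(root_rate))
```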

Page 15: The ROOT Project  in the multi-core CPU era

Page 16: The ROOT Project  in the multi-core CPU era

Source of inefficiencies with Shared Libs

• Compiling with -fPIC (Position Independent Code) introduces a performance degradation of about 20 per cent (10 to 30%)

• In case of many shared libs, the percentage of classes and code used is small =>swapping (20%)

• Because shared libs are generated for maximum portability, one cannot use the advanced features of the local processor when compiling. The same optimization level is used everywhere

• But a very large fraction of the code does not need to be optimized: no gain at execution, big loss when compiling

• A small fraction of the code should be compiled with the highest possible optimization (10%)

• Maybe a factor of 2 loss!!!

Page 17: The ROOT Project  in the multi-core CPU era

• In the Fortran era, there was often one subroutine per file.

• The loader takes only the subroutines really referenced. However, the percentage of referenced but not used code has increased with time.

• Shared libs were efficient at a time when code could be shared between different tasks on time-sharing systems.

• Shared libs have partially solved the link time problem.

• Shared libs are not a solution for the long term.

• Archive libs are unusable in a large system, but nice to build static modules.

• What to do?

Shared Libs vs Archive Libs

Page 18: The ROOT Project  in the multi-core CPU era

Fraction of ROOT code really used in a batch job

(Plot; axis label: Shared lib size in bytes)

Page 19: The ROOT Project  in the multi-core CPU era

Fraction of ROOT code really used in a job with graphics

Page 20: The ROOT Project  in the multi-core CPU era

%classes used

%functions used

Fraction of code really used in one program

Page 21: The ROOT Project  in the multi-core CPU era

(Diagram labels: *.cxx, *.h: 100 Mb; c++: 800 l/s; ld; myapp; memory; *.so: 76 Mb; *.o: 110 Mb; Cint: 10000 l/s)

We are wasting a lot of time writing/reading .o or .so files to/from disk

Page 22: The ROOT Project  in the multi-core CPU era

Introducing BOOT

A Software Bootstrap system

Proposal for a new scenario

Page 23: The ROOT Project  in the multi-core CPU era

• A small system to facilitate the life of many users doing mainly data analysis with ROOT and their own classes (users + experiment).

• It is a very small subset of ROOT (5 to 10 per cent)

• The same idea could be extended to other domains, like simulation and reconstruction.

What is BOOT?

(Diagram labels: ROOT, BOOT)

Page 24: The ROOT Project  in the multi-core CPU era

• A small, easy to install, standalone executable module (< 5 Mbytes)

• One click in the web browser

• It must be a stable system that can cope with old and new versions of other packages including ROOT itself.

• It will include:
  • A subset of ROOT I/O, network and Core classes
  • A subset of Reflex
  • A subset of CINT (could also have a python flavor)
  • Possibly a GUI object browser

• From the BOOT GUI or command line, the referenced software (URL) will be automatically downloaded and locally compiled/cached in a transparent way.

What is BOOT?
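The "download on reference, then cache locally" behaviour described above can be sketched as follows. This is only an illustration of the idea, not a BOOT design: the class `PullCache`, the `repo://` URLs and the in-memory `REMOTE` dictionary are all hypothetical, and the compilation step is left out.

```python
class PullCache:
    """Fetch a source on first reference and keep it in a local cache."""

    def __init__(self, fetch):
        self._fetch = fetch     # callable: url -> source text
        self._cache = {}        # url -> source text already pulled
        self.downloads = 0      # number of real fetches performed

    def get(self, url):
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.downloads += 1
        return self._cache[url]


# Hypothetical in-memory "server"; a real one would be reached over the network.
REMOTE = {
    "repo://ROOT/TH1F": "class TH1F { /* ... */ };",
    "repo://ROOT/TFile": "class TFile { /* ... */ };",
    "repo://ROOT/TGeoManager": "class TGeoManager { /* ... */ };",
}

cache = PullCache(REMOTE.__getitem__)
for url in ["repo://ROOT/TH1F", "repo://ROOT/TFile", "repo://ROOT/TH1F"]:
    cache.get(url)

# Only the two referenced classes were pulled; the third was never touched.
print(cache.downloads, "of", len(REMOTE), "sources pulled")
```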

Page 25: The ROOT Project  in the multi-core CPU era

• BOOT must be able to run with the existing codes, possibly with reduced functionality.

• In the next slides, a few use cases to illustrate the ideas.

• Do not take the syntax as a final word.

BOOT and existing applications

Page 26: The ROOT Project  in the multi-core CPU era

• Assumes BOOT is already installed on your machine

• Nothing else on the machine, except the compiler (no ROOT, etc.)

• Import a ROOT file containing histograms, Trees and other classes (usecase1.root)

• Browse contents of file

• Draw a histogram

BOOT: Use Case 1

(Diagram labels: ROOT, BOOT)

Page 27: The ROOT Project  in the multi-core CPU era

usecase1.root (2 Mbytes)
Contains references (URL) to classes in namespace ROOT

http://root.cern.ch/coderoot.root
This is a compressed ROOT file containing the full ROOT source tree, automatically built from CVS (25 Mbytes)
+ ROOT classes dictionary DS generated by Reflex (5 Mbytes)
+ The full classes documentation objects generated by the source parser (5 Mbytes)

Local cache with the source of the classes really used
+ binaries for the classes or functions that are automatically generated from the interpreter (like ACLIC mechanism)

Use Case 1

Page 28: The ROOT Project  in the multi-core CPU era

code.root

usecase1.root

Use Case 1 pictures

Page 29: The ROOT Project  in the multi-core CPU era

//This code can be interpreted line by line,
//executed as a script, or compiled with C/C++
//after corresponding code generation

use ROOT, YYYY=http://cms.cern.ch/packages/yyyy

h = new TH1F("h","example",100,0,1);
v = new LorentzVector(….);
gener = new myClass(v.x());
h.Fill(gener.Something());
h.Draw();

Use Case 2

• BOOT already installed

• Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY

Page 30: The ROOT Project  in the multi-core CPU era

use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
use ROOT6=http://root.cern.ch/root6/code.root
use ROOT6::LorentzVector

h = new TH1F("h","example",100,0,1);
v = new LorentzVector(….);
gener = new myClass(v.x());
h.Fill(gener.Something());

Use Case 3

• A variant of Use Case 2

• A bug has been found in class LorentzVector of ROOT and fixed in new version ROOT6

Page 31: The ROOT Project  in the multi-core CPU era

use ROOT
use ATLFAST=http://atlas.cern.ch/atlfast/atlfastcode.root

TFile f("mcrun.root");

for each entry in f.Tree
    for each electron in Electrons
        h.Fill(electron.m_Pt);

h.Draw();

Use Case 4

• High Level ROOT Selector understanding named collections in memory (ROOT,STL) or collections in ROOT files.
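The nested iteration of this use case can be mimicked with plain Python containers. The sketch below is not the proposed selector: the `events` list is a hypothetical in-memory stand-in for the Tree in "mcrun.root", the field names mirror the slide, and the histogram is a plain list of bin counts with an assumed range of 0 to 100.

```python
# Hypothetical in-memory stand-in for the Tree in "mcrun.root".
events = [
    {"Electrons": [{"m_Pt": 12.5}, {"m_Pt": 48.0}]},
    {"Electrons": []},
    {"Electrons": [{"m_Pt": 99.9}]},
]


def fill(bins, value, low=0.0, high=100.0):
    """Increment the bin containing value; values outside [low, high) are ignored."""
    if low <= value < high:
        bins[int((value - low) / (high - low) * len(bins))] += 1


histogram = [0] * 10                        # 10 equal bins over [0, 100)
for entry in events:                        # for each entry in f.Tree
    for electron in entry["Electrons"]:     # for each electron in Electrons
        fill(histogram, electron["m_Pt"])   # h.Fill(electron.m_Pt)

print(histogram)
```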

Page 32: The ROOT Project  in the multi-core CPU era

Event data in a Tree

C++ scripts

Use Case 5: Event Displays

• In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).

• This is complex and not good for users and OUTREACH.

• A data file with the visualization scripts is far more powerful

• This implies that the GUI must be fully scriptable. This is the case for ROOT GUI.

Page 33: The ROOT Project  in the multi-core CPU era

Requirements: work to do

• libCore has already all the infrastructure for client-server communications and for accessing remote files on the GRID.

• We must understand how to use subsets of the compilers and linkers to bypass disk I/O.

• We must understand how to emulate a dynamic linker using pre-compiled objects in memory.

• We have to investigate various code generation tools and the coupling with an extended version of CINT (and possibly python).

• We must understand how to use the STL functionality without its penalty. Dynamic templates are also necessary.

Page 34: The ROOT Project  in the multi-core CPU era

Procedure

• These are just ideas. Making a firm proposal requires more investigations and prototyping.

• It must be clear that the top priority is the consolidation of ROOT to be ready for LHC data taking. This should not be an excuse to not look forward.

• This work will continue as a background activity.

Page 35: The ROOT Project  in the multi-core CPU era

Conclusions

• After more than 10 years of intensive development, the CORE work packages are consolidated.

• Important developments in PROOF, Math, CINT, Reflex, 3-D graphics.

• All packages must be adapted to a multi-threading environment made necessary by the multi core cpus.

• Instead of pushing gigabytes of source or shared libs to the GRID working nodes, BOOT could greatly optimize and simplify the use of the GRID. BOOT will use a PULL technique to download only the software necessary (source) to run an application and in an incremental way.

• Hoping to show a working BOOT at the next CHEP.

Page 36: The ROOT Project  in the multi-core CPU era

Spare Slides

Page 37: The ROOT Project  in the multi-core CPU era

“Classic” approach

G. Ganis, CHEP06, 15 Feb 2006

(Diagram labels: Storage, Batch farm, queues, manager, outputs, catalog, query, submit, files, jobs, data file splitting, myAna.C, merging, final analysis)

• “static” use of resources

• jobs frozen, 1 job / worker node

• “manual” splitting, merging

• limited monitoring (end of single job)

Page 38: The ROOT Project  in the multi-core CPU era

The PROOF approach

G. Ganis, CHEP06, 15 Feb 2006

(Diagram labels: catalog, Storage, PROOF farm, scheduler, query, MASTER, PROOF query: data file list, myAna.C, files, final outputs (merged), feedbacks (merged))

• farm perceived as extension of local PC

• more dynamic use of resources

• real time feedback

• automated splitting and merging
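The "automated splitting and merging" idea can be illustrated with a toy example. This is not the PROOF API: the file names, their contents and the functions `analyse` and `run_query` are hypothetical, and local threads stand in for the worker nodes of a farm. The master hands one file at a time to whichever worker is free and merges the partial results as they come back.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical file contents: each file holds a few integer "event" values.
FILES = {
    "a.root": [1, 2, 2],
    "b.root": [2, 3],
    "c.root": [1],
}


def analyse(file_name):
    """Per-file analysis run by a worker; returns a partial histogram."""
    return Counter(FILES[file_name])


def run_query(file_names, workers=3):
    """Distribute the files over the workers and merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(analyse, file_names):
            merged.update(partial)      # merging step, done by the master
    return merged


result = run_query(list(FILES))
print(dict(result))
```

The user only supplies the file list and the analysis function, which is the point of the comparison with the “classic” approach, where splitting and merging are done by hand.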

Page 39: The ROOT Project  in the multi-core CPU era

Atlas packages with > 10000 lines

211677 dice fortran=211641
187691 atrecon fortran=138126,cpp=49354
129793 MuonSpectrometer fortran=121321,python=3715,csh=2613,sh=2136
118504 Tools cpp=67337,ansic=19012,python=13770,sh=7373,yacc=5659,fortran=3024,lex=1971
116327 PhysicsAnalysis cpp=107348,python=6070,sh=1649,csh=1260
115143 geant3 fortran=115040,ansic=67
112445 TileCalorimeter cpp=108580,python=2209,csh=920,sh=736
108200 atutil fortran=108000,ansic=164
80866 Applications fortran=71764,cpp=6961,ansic=1865
74721 Calorimeter cpp=65917,python=7854,sh=490,csh=460
67822 atlfast fortran=67786
64838 Tracking cpp=60255,python=2092,csh=1380,sh=1104
59429 Generators fortran=28136,cpp=25538,python=4123,sh=872,csh=760
49926 graphics java=40719,cpp=8312,python=321,sh=255,csh=220
40058 AtlasTest cpp=25159,python=5131,sh=4815,perl=4145,csh=517
39576 Control cpp=22030,python=15904,sh=907,csh=693
31192 DetectorDescription ansic=29540,csh=680,sh=562,python=343
29500 TestBeam cpp=27433,python=1491,csh=320,sh=256
25001 Reconstruction sh=10297,fortran=7559,python=5393,csh=1667
18989 atlsim fortran=17561,cpp=1380
18328 InnerDetector python=11466,csh=2860,sh=2641,ansic=1343
17291 Simulation python=13653,sh=2126,csh=1302,fortran=169
16139 Database perl=8310,sh=4299,java=2209,csh=709,python=566
14250 Event cpp=13522,python=296,csh=240,sh=192
12930 gcalor fortran=12894
11955 Trigger python=7860,csh=1780,sh=1673,perl=634
11195 LArCalorimeter python=6133,ansic=2045,csh=1620,sh=1347

3 million lines of code

1200 packages

Page 40: The ROOT Project  in the multi-core CPU era

Alice packages with > 10000 lines

398742 PDF fortran=398729,ansic=13
146414 PYTHIA6 fortran=140748,cpp=5413,ansic=153,pascal=100
128337 HLT cpp=127601,ansic=605,sh=100,csh=31
128103 ITS cpp=128010,sh=93
105763 MUON cpp=105673,sh=90
94548 DPMJET fortran=94267,cpp=281
72400 STEER cpp=72400
52443 HBTAN cpp=51260,fortran=1183
51489 TPC cpp=51479,sh=10
50932 PHOS cpp=50639,csh=293
46176 TRD cpp=46176
41998 ISAJET fortran=40483,cpp=1494,pascal=21
39407 RALICE cpp=29764,ansic=9355,sh=288
35916 EMCAL cpp=35410,fortran=383,csh=123
31820 ANALYSIS cpp=31820
27751 HERWIG fortran=27246,cpp=477,ansic=28
27025 FMD cpp=27021,sh=4
26667 TOF cpp=26667
24258 EVGEN cpp=24258
21588 HIJING fortran=21099,cpp=489
20562 JETAN cpp=19687,fortran=875
18344 RAW cpp=18344
15232 STRUCT cpp=15232
13142 PMD cpp=13142
12945 RICH cpp=12945
10966 FASTSIM cpp=10966
10944 MONITOR cpp=10944
10659 ZDC cpp=10659

1.5 million lines of code

Page 41: The ROOT Project  in the multi-core CPU era

(Diagram: libraries involved in h.Draw() from CINT in local mode; pm = Plug-in Manager)

libCore: …, I/O, TSystem, …

libHist: …, TH1, TH2, …

libHistPainter: …, THistPainter, TPainter3DAlgorithms, …

libGpad: …, TPad, TFrame

libGraf: …, TGraph, TGaxis, TPave

libX11: drawline, drawtext

Page 42: The ROOT Project  in the multi-core CPU era

• STL containers are very nice. However, they have a very high cost in a real large environment.

• Compiling code with STL is much, much slower because of inlining (STL is only in header files). The situation improves a bit with precompiled headers (e.g. in gcc4), but not much.

• Object modules are bigger.

• The compiler or linker is able to eliminate duplicate code in ONE object file or shared lib, not across libraries.

• If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!

• Inlining is nice if used with care (or in toy benchmarks). It may have the opposite effect, generating more cache misses in a real application.

• Templates are statically defined and difficult to use in a dynamic interactive environment.

Problem with STL Inlining

Page 43: The ROOT Project  in the multi-core CPU era

Can we gain with a better packaging?

• Yes and no

• One shared lib per class implies more administration, more dictionaries, more dependencies.

• 80 shared libs for ROOT is already a lot. 500 would be nonsense.

• A CORE library is essential. However some developers do not like this and penalize/complicate the life of the vast majority of users.

• Plug-in Manager helps