3rd LHC Computing Workshop - 30th Sep 1999 BABAR Operational Experience with Objectivity ODBMS David R. Quarrie Lawrence Berkeley National Laboratory for BABAR Experiment [email protected]

Dec 18, 2015


3rd LHC Computing Workshop - 30th Sep 1999

BABAR Operational Experience with Objectivity ODBMS

David R. Quarrie

Lawrence Berkeley National Laboratory

for BABAR Experiment

[email protected]


Database Goals

- Provide storage and access for event data
  - Event store
- Provide storage and access for detector conditions data
  - Environmental conditions that vary with time
  - Conditions & Ambient databases
- Configuration Management
  - Keyed access to unique configurations
    - Trigger
    - Detector setpoints
  - Configuration Database
  - Not production management
- Handle distribution and access across whole collaboration
  - Wide area as well as local area access


Experiment Characteristics

Characteristic                                   Size
No. of Detector Subsystems                       7
No. of Electronic Channels                       ~250,000
Raw Event Size                                   ~32 kBytes
DAQ to Level 3 Trigger                           2000 Hz, 50 MByte/sec
Level 3 to Reconstruction                        100 Hz, 2.5 MByte/sec
Reconstruction                                   100 Hz, 7.5 MByte/sec
Event Rate                                       10^9 events/year
Storage Requirements (real & simulated data)     ~300 TByte/year
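Each rate and bandwidth pair in the table implies an average event size at that stage. The following is a minimal sketch of that arithmetic, not part of the original slides. It assumes 1 MByte = 1000 kBytes (the slide does not say which convention it uses), and the names are chosen here for illustration.

```python
# Rates (Hz) and bandwidths (MByte/sec) copied from the table above.
# Assumption: 1 MByte = 1000 kBytes; the slide does not state the convention.
STAGES = {
    "DAQ to Level 3 Trigger": (2000, 50.0),
    "Level 3 to Reconstruction": (100, 2.5),
    "Reconstruction": (100, 7.5),
}


def implied_event_size_kbytes(rate_hz, bandwidth_mbyte_per_sec):
    """Average size per event (kBytes) implied by a rate and a bandwidth."""
    return bandwidth_mbyte_per_sec * 1000.0 / rate_hz


if __name__ == "__main__":
    for stage, (rate_hz, bandwidth) in STAGES.items():
        size = implied_event_size_kbytes(rate_hz, bandwidth)
        print(f"{stage}: {size:.0f} kBytes/event")
```

Under that assumption the two trigger stages work out to 25 kBytes per event, a little below the ~32 kByte raw event size quoted in the same table, and the reconstruction stage to 75 kBytes per event.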


Performance Requirements

- Online Prompt Reconstruction
  - Baseline of 200 processing nodes
  - 100 Hz total (physics plus backgrounds)
    - 30 Hz of Hadronic Physics
      - Fully reconstructed
    - 70 Hz of backgrounds, calibration physics
      - Not necessarily fully reconstructed
- Physics Analysis
  - DST Creation
    - 2 users at 10^9 events in 10^6 secs (1 month)
  - DST Analysis
    - 20 users at 10^8 events in 10^6 secs
  - Interactive Analysis
    - 100 users at 100 events/sec
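The requirements above can be turned into aggregate rates. This is a rough sketch, not taken from the slides. It assumes that each user processes the full quoted sample within the quoted time, and that the prompt reconstruction load is spread evenly over the baseline nodes. All names are illustrative.

```python
# Figures copied from the slide; the interpretation is an assumption:
# every user processes the whole sample, and OPR load is spread evenly.
OPR_NODES = 200
OPR_RATE_HZ = 100

# (users, events per user, seconds allowed)
BATCH_ANALYSIS = {
    "DST Creation": (2, 10**9, 10**6),
    "DST Analysis": (20, 10**8, 10**6),
}
INTERACTIVE_USERS = 100
INTERACTIVE_RATE_PER_USER = 100  # events/sec


def aggregate_rate(users, events_per_user, seconds):
    """Combined events/sec if every user finishes the sample in time."""
    return users * events_per_user / seconds


def seconds_per_event_per_node(total_rate_hz, nodes):
    """Processing time budget per event on one node, for an even spread."""
    return nodes / total_rate_hz


if __name__ == "__main__":
    budget = seconds_per_event_per_node(OPR_RATE_HZ, OPR_NODES)
    print(f"OPR budget: {budget:.1f} s per event per node")
    for name, (users, events, seconds) in BATCH_ANALYSIS.items():
        print(f"{name}: {aggregate_rate(users, events, seconds):.0f} events/sec")
    interactive = INTERACTIVE_USERS * INTERACTIVE_RATE_PER_USER
    print(f"Interactive Analysis: {interactive} events/sec")
```

On these assumptions, DST creation and DST analysis each need about 2000 events/sec in aggregate, interactive analysis needs 10,000 events/sec, and each prompt reconstruction node has about 2 seconds per event.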


Functionality Summary

- Basic design/functionality ok
  - No performance or scaling problems with conditions, ambient and configuration databases
  - Security and data protection APIs added
    - Internal to a federation
    - Access to different federations
- Problems
  - Significant performance/scaling problems with event store
    - Online Prompt Reconstruction
    - Physics Analysis
  - Data Distribution problems
    - Internal within SLAC
    - External to/from remote Institutions
- Focus of the remainder of the talk


Computing Review 2-4 Aug 1999

- Identified database performance as major technical concern
- Recommended database reviews in Feb and Aug 2000
- Recommended development of limited-function short-term non-Objy solution for micro-DST analysis
- Recommended setting up of a dedicated Objectivity testbed in order to perform detailed scaling and performance tests


Production Federations

- Two groups
  - Physics
    - Online
    - Analysis
    - Reprocessing
  - Simulation
    - Generation
    - Analysis
    - Reprocessing
- Motivations
  - Minimization of interference (particularly with online)
  - Increase the available number of databases
- Operational experience caused the Online to be split
  - IR2
  - OPR


SLAC Design Hardware Configuration


SLAC Configuration at time of Review


Testbed Hardware Configuration

- Testbed hardware available from about 7th August
  - Two datamovers (450)
  - 100+ bronco clients (Ultra-5)
  - Conditions & catalog servers (250)
  - Journal servers (250)
  - Lock servers
- Two sets of tests
  - Online Prompt Reconstruction (OPR)
  - Physics Analysis
- Initial tests have focussed on OPR
  - Already well instrumented
  - Expect any performance improvements to apply to analysis as well
  - Dedicated analysis performance tests later


Baseline Configuration

- We baselined the testbed against the production system to ensure that we started off with the same performance
- Turned off filtering
  - All input events are being fully reconstructed
  - Easier to understand event rate
  - Will turn it back on again later on in the testing
- Some of the tests are preliminary and we need to go back & redo them
  - Don’t fully understand all the numbers yet
- The tests are still underway
  - Numbers are not final


Baseline Results at time of Review

[Figure: events/sec (0 to 40) versus number of nodes (0 to 200). Curves: baseline, noObjy. Annotations: Production set point, Asymptotic limit.]


Knobs to twiddle (tests so far)

- Minimize catalog operations
- Separate conditions DB server
- Separate catalog server
- Tune AMS server
- Client file descriptors
- Client cache sizes
- Initial container sizes
- Transaction lengths
- TCP configuration
- Multiple AMS processes
- Database clustering
- Autonomous partitions
- Disable filters
- Singleton Federations
- Veritas Filesystem optimization
- Decrease payload per event
- LM starvation?
- Loadbalance across datamovers
- More datamovers
- Database pre-creation
- Gigabit lockserver
- Caching handles
- Local bootfile
- Unlock instead of mini-transaction
- Run OPR with no output
- Run on shire to bypass AMS


Results so far

[Figure: events/sec (0 to 40) versus number of nodes (0 to 200). Curves: baseline, initNrPages, 2dbClusters, noObjy, 2AMSes, 2AMSes+2dbClusters(segr), 1Gb metadata, 4AMSes+4dbClusters(segr), fixedAMS (4 used), fixed AMS (1 used), FS defragm, 4 datamovers.]


Significant Items

- Minimize Catalog operations
  - e.g. Named containers
- Linkable AMS server slow (~3-4 Mbytes/sec)
  - Not the normal AMS - the special one allowing migration/staging
  - Inefficiency in handling 16k file descriptors
  - Located in Objy code
  - First improvement by Andy Hanushevsky
    - Probably more improvements to come
- Extending containers is expensive
  - During persistent object creation
  - Contrary to advice from Objy engineer
    - For a single process it’s low overhead
    - Causes locking
  - Presize to 50% of average final size


Significant Items (2)

- Database clusters
  - Grouping of nodes to databases
  - Reduce the number of processes accessing each database
  - Undocumented locking operation to extend containers
- Multiple AMS processes per server
  - Currently single threaded
  - Definite improvement with 4 – we’ll try 8
    - N.B. Most servers have 4 cpus
  - Won’t be necessary in 5.2 - the AMS is (finally) multi-threaded
- Veritas filesystem configuration
  - Single-threaded tests show 40MB/sec read & write
  - Random-write tests (non-Objy) show 7MB/sec throughput
  - We’re seeing about this with 180 nodes
  - Work in progress on optimization
    - Managed 8 MB/sec
- More datamovers
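The database cluster idea (grouping nodes so that fewer processes touch each database) can be illustrated with a small model. This is only a sketch: the round-robin policy and the function names are assumptions made here, not the BABAR implementation and not an Objectivity API.

```python
from collections import Counter


def assign_clusters(node_ids, n_clusters):
    """Map each node to a cluster index, round-robin (an assumed policy)."""
    return {node: i % n_clusters for i, node in enumerate(node_ids)}


def writers_per_cluster(assignment):
    """Count how many nodes write into each cluster's databases."""
    return Counter(assignment.values())


if __name__ == "__main__":
    # 180 nodes matches the node count quoted for the Veritas tests above.
    nodes = [f"node{i:03d}" for i in range(180)]
    for n_clusters in (1, 2, 4):
        counts = writers_per_cluster(assign_clusters(nodes, n_clusters))
        print(f"{n_clusters} cluster(s): up to {max(counts.values())} writers each")
```

With 180 nodes, moving from one cluster to four reduces the number of writers contending for any one database from 180 to 45 in this model.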


Problem - Payload per event

Component     Predicted     Real Data      Simulated Data
SIM           100 kBytes    -              331 kBytes
TRU           40 kBytes     -              81 kBytes
RAW           30 kBytes     124 kBytes     230 kBytes
REC           100 kBytes    150 kBytes     370 kBytes
ESD           20 kBytes     2.8 kBytes     6.8 kBytes
AOD           2 kBytes      17.4 kBytes    47.2 kBytes
TAG           200 Bytes     1020 Bytes     720 Bytes
Total Real    152 kBytes    295 kBytes     -
Total Sim     292 kBytes    -              1066 kBytes

- Problem is our poor implementation, not Objectivity overhead
- Work is underway to redesign/reimplement this
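The totals in the table can be cross-checked from the component rows, which also shows how far the measured payload exceeds the prediction. This is a minimal sketch, not from the original slides. It assumes that the SIM and TRU rows have no real-data entry (the slide's totals are consistent with that reading) and that 1 kByte = 1000 Bytes for the TAG row.

```python
# Sizes in kBytes, copied from the table above.
# Assumptions: SIM and TRU exist only for simulated data,
# and TAG sizes are converted using 1 kByte = 1000 Bytes.
PREDICTED = {"SIM": 100, "TRU": 40, "RAW": 30, "REC": 100,
             "ESD": 20, "AOD": 2, "TAG": 0.2}
REAL = {"RAW": 124, "REC": 150, "ESD": 2.8, "AOD": 17.4, "TAG": 1.02}
SIMULATED = {"SIM": 331, "TRU": 81, "RAW": 230, "REC": 370,
             "ESD": 6.8, "AOD": 47.2, "TAG": 0.72}


def total(sizes, components=None):
    """Sum the sizes of the given components (all of them by default)."""
    keys = sizes.keys() if components is None else components
    return sum(sizes[key] for key in keys)


def growth_factor(measured, predicted):
    """How many times larger the measured payload is than the prediction."""
    return measured / predicted


if __name__ == "__main__":
    real_predicted = total(PREDICTED, REAL.keys())
    sim_predicted = total(PREDICTED)
    print(f"Real: {total(REAL):.0f} kBytes, "
          f"x{growth_factor(total(REAL), real_predicted):.1f} predicted")
    print(f"Sim:  {total(SIMULATED):.0f} kBytes, "
          f"x{growth_factor(total(SIMULATED), sim_predicted):.1f} predicted")
```

The sums agree with the slide's totals to within 1 kByte, and on these numbers real data is roughly 1.9 times the predicted payload and simulated data roughly 3.7 times.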


Future Prompt Reconstruction Tests

- Reduce payload per event
- Autonomous partitions
  - Slight hint of lock server saturation (cpu load)
- Veritas filesystem optimization
- TCP configuration
- About to try 250 nodes
- The bottom line:
  - We’ve met the design goals (with filtering re-enabled)
  - Still lots of possibilities for improvements


Physics Analysis

- No quantitative tests yet
- Expect that improvements shown by prompt reconstruction will also improve performance for physics analysis
- Also expect to find and apply read-only optimizations
- 3 “typical” jobs being set up
  - CPU bound
  - Medium cpu “skim”
  - Fast physics analysis
- Testing about to start
- Also using shire (E10000) as database server
- Objy 5.2 (with SLAC extensions) will support dynamic load-balancing across multiple servers
  - 20MB/sec per server?


Data Distribution Issues

- Internal to SLAC
  - Sweeps of data between production federations
  - Database id allocation scheme works well
  - HPSS catalog used as primary location
  - Shadowing of databases as well as copying
  - Bookkeeping is biggest outstanding problem
    - Getting better but a ways to go…
- External to SLAC
  - Use of 10GB databases has caused major problems
    - Lots of unexpected infrastructure problems (perl, tcsh, etc.)
    - Bugs in size calculation have caused some nominally 2 GB databases to exceed this limit
      - Fix being installed into production now
  - Bandwidth of tools
    - File copies between computers at SLAC


Scaling Problems

- Total number of database files
  - Being addressed by longRefs in future release
  - Avoids the current need for database files >2GB
    - Cause significant infrastructure problems
  - Timescale “6-9 months”
- Number of nodes for parallel loading
  - We’re essentially there
  - In process of applying lessons from testbed to production
- Administration tools operate slower
  - Still an issue
- Update “starvation”
  - Administration problem since multiple read accesses prevent updates from being applied
  - MROW access expected to solve this


Reliability Problems

- Lock collisions
  - Better understanding of lock management
  - Avoid leaving lock trails behind
  - Automatic cleanup at end of job
  - Automatic cleanup at regular intervals
- Separate Online and OPR federations
  - Separated for reliability & OPR lock “firestorms”
  - Unable to provide full calibration feedback
  - Firestorms not in fact an interference between Online and OPR
    - Solved by Objy bug fix and lock optimization
  - New design allows closed loop calibration feedback with separate federations
- We’re gaining operational experience in production
  - Earlier tests (e.g. MDC2) didn’t scale


Lack of automation problems

- Goal was to achieve understanding and hence reliability using manual procedures, then install automatic procedures
  - Automatic procedures only work once we understand the issues and achieve reliable operation
- Most of underlying tools now in place
  - e.g. Sweeping of data from one federation to another
- Still lack necessary bookkeeping
  - Automatic procedures and logging mechanisms (e.g. web pages) slowly being put into place
  - More personnel now available to work on this
- Still a lot of learning to be done in this area


Risk Analysis - Alternatives to Objectivity?

- Should we be looking into an alternative?
- We have attempted to minimize direct dependency on Objectivity
  - Successful for reconstruction/analysis code
  - Not successful for infrastructure
    - Makefiles
    - Administration tools
    - Data distribution
- MicroDST based on ROOT I/O
  - Takes advantage of Converters & Modules classes for Objy.


Objectivity Usage Statistics

- >30 sites using Objectivity
  - USA, UK, France, Italy, Germany
- ~650 licensees
  - People who have signed the license agreement
- ~400 users
  - People who have created a test federation
- >100 simultaneous users
  - Monitoring distributed oolockmon statistics
- 60 developers
  - Have created or modified a persistent class
  - A wide range of expertise
    - 10-15 experts
- 485 persistent classes


Conclusions

- Basic design and technology ok
- Serious performance/scaling problems at startup
- Lots of learning about how to manage production environment
- Dedicated testbed has demonstrated good results
  - Prompt Reconstruction now achieving design performance
  - Similar improvements in physics analysis expected
- Not all these improvements have been fed back into production environments
  - Underway now
- Is Objectivity suitable for use within HEP?
  - Yes
- Is it the only solution?
  - No