Computing for Belle. CHEP2004, September 27, 2004. Nobu Katayama, KEK.

Transcript

Computing for Belle, CHEP2004, September 27, 2004, Nobu Katayama (KEK)

[Slide 2] Outline
- Belle in general
- Software
- Computing
- Production
- Networking and collaboration issues
- Super KEKB / Belle / Belle computing upgrades

[Slide 3] Belle detector

[Slide 4] Integrated luminosity
- May 1999: first collision
- July 2001: 30 fb⁻¹
- Oct. 2002: 100 fb⁻¹
- July 2003: 150 fb⁻¹
- July 2004: 287 fb⁻¹
- July 2005: 550 fb⁻¹ (projected)
- Super KEKB: 1~10 ab⁻¹/year!
>1 TB of raw data/day; more than 20% is hadronic events; ~1 fb⁻¹/day!

[Slide 5] Continuous injection [CERN Courier Jan/Feb 2004]
No need to stop the run: the machine stays at ~maximum currents and therefore maximum luminosity. ~30% more ∫L dt for both KEKB and PEP-II. ~1 fb⁻¹/day! (~1×10⁶ BBbar)
[Plot: HER/LER currents and luminosity vs. time, continuous injection (new) vs. normal injection (old)]

[Slide 6] Summary of b→sqq CPV
2.4σ deviation from sin2φ1 (A: consistent with 0). Something new in the loop?? We need a lot more data.

[Slide 7] The Belle Collaboration
13 countries, institutes, ~400 members: IHEP Moscow, IHEP Vienna, ITEP, Kanagawa U., KEK, Korea U., Krakow Inst. of Nucl. Phys., Kyoto U., Kyungpook Natl U., U. of Lausanne, Jozef Stefan Inst., Aomori U., BINP, Chiba U., Chonnam Natl U., Chuo U., U. of Cincinnati, Ewha Womans U., Frankfurt U., Gyeongsang Natl U., U. of Hawaii, Hiroshima Tech., IHEP Beijing, U. of Melbourne, Nagoya U., Nara Women's U., National Central U., Natl Kaoshiung Normal U., Natl Lien-Ho Inst. of Tech., Natl Taiwan U., Nihon Dental College, Niigata U., Osaka U., Osaka City U., Panjab U., Peking U., Princeton U., Riken-BNL, Saga U., USTC, Seoul National U., Shinshu U., Sungkyunkwan U., U. of Sydney, Tata Institute, Toho U., Tohoku U., Tohoku Gakuin U., U. of Tokyo, Tokyo Inst. of Tech., Tokyo Metropolitan U., Tokyo U. of A and T., Toyama Natl College, U. of Tsukuba, Utkal U., VPI, Yonsei U.

[Slide 8] Collaborating institutions
- Collaborators: major labs/universities from Russia, China, India; major universities from Japan, Korea, Taiwan, Australia; universities from the US and Europe
- KEK dominates in one sense: 30~40 staff work on Belle exclusively, and most of the construction and operating costs are paid by KEK
- Universities dominate in another sense: young students stay at KEK, help operations, and do physics analysis
- Human resource issue: always lacking manpower

Software

[Slide 10] Core software
- OS/C++: Solaris 7 on SPARC and RedHat 6/7/9 on PCs; gcc 3.0.4/3.2.2/3.3 (code also compiles with SunCC)
- No commercial software except for the batch queuing system and the hierarchical storage management system
- QQ, EvtGen, GEANT3, CERNLIB (2001/2003), CLHEP (~1.5), postgres 7
- Legacy FORTRAN code (GSIM/GEANT3 and old calibration/reconstruction code)
- I/O: home-grown stream IO package + zlib; the only data format for all stages (from DAQ to final user analysis skim files). Index files (pointers to events in data files) are used for final physics analysis.

[Slide 11] Framework (BASF)
- Event parallelism on SMP (1995~) using fork (for legacy Fortran common blocks); see the sketch below
- Event parallelism on multiple compute servers (dbasf, 2001~; V2, 2004)
- User code and reconstruction code are dynamically loaded
- The only framework for all processing stages (from DAQ to final analysis)
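To illustrate the fork-based event parallelism mentioned above, here is a minimal sketch, not the actual BASF implementation: the parent forks one worker per CPU, and because fork() gives each worker its own copy of global state (such as Fortran common blocks), the legacy code can run in parallel without modification. The function process_event is a hypothetical stand-in for a dynamically loaded user/reconstruction module, and the real framework also has to distribute input events and merge output, which this sketch omits.

```cpp
// Minimal sketch of fork()-based event parallelism on an SMP server.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a dynamically loaded user/reconstruction module.
void process_event(int event_id) {
  std::printf("pid %d processed event %d\n", getpid(), event_id);
}

int main() {
  const int n_workers = 4;      // e.g. one worker per CPU on a 4-CPU server
  const int n_events  = 1000;   // events in this "run"
  std::vector<pid_t> children;

  for (int w = 0; w < n_workers; ++w) {
    pid_t pid = fork();
    if (pid == 0) {             // child: process an interleaved slice of events
      for (int e = w; e < n_events; e += n_workers) process_event(e);
      _exit(0);
    }
    children.push_back(pid);
  }
  for (pid_t pid : children) waitpid(pid, nullptr, 0);  // parent waits for workers
  return 0;
}
```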
[Slide 12] Data access methods
The original design of our framework, BASF, allowed an appropriate IO package to be loaded at run time. Our (single) IO package grew to handle more and more ways of doing IO (disk, tape, etc.) and was extended to deal with special situations, becoming spaghetti-ball-like code. This summer we were again faced with extending it, to handle the new HSM and for tests with new software. We finally rewrote our big IO package into small pieces, making it possible to simply add one derived class for each IO method: a Basf_io base class with derived classes Basf_io_disk, Basf_io_tape, Basf_io_zfserv, Basf_io_srb. IO objects are dynamically loaded upon request.
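A minimal sketch of what such a refactoring can look like; the derived-class names come from the slide, but the methods, the factory, and the example main() are hypothetical, not Belle's actual interface.

```cpp
// One abstract IO base class, one small derived class per IO method,
// chosen (and in the real system dynamically loaded) at run time.
#include <cstdio>
#include <memory>
#include <string>

class Basf_io {                      // abstract interface for one IO method
 public:
  virtual ~Basf_io() = default;
  virtual bool open(const std::string& name) = 0;
  virtual bool read_record(std::string& buf) = 0;   // hypothetical record API
  virtual void close() = 0;
};

class Basf_io_disk : public Basf_io {   // plain disk files
 public:
  bool open(const std::string& name) override {
    f_ = std::fopen(name.c_str(), "rb");
    return f_ != nullptr;
  }
  bool read_record(std::string& buf) override { (void)buf; return false; /* stub */ }
  void close() override { if (f_) std::fclose(f_); f_ = nullptr; }
 private:
  std::FILE* f_ = nullptr;
};

// Basf_io_tape, Basf_io_zfserv, Basf_io_srb would follow the same pattern,
// each potentially living in its own shared library loaded on request.
std::unique_ptr<Basf_io> make_io(const std::string& method) {
  if (method == "disk") return std::make_unique<Basf_io_disk>();
  return nullptr;  // other methods would be looked up and loaded here
}

int main() {
  auto io = make_io("disk");
  if (io && io->open("/dev/null")) { std::string rec; io->read_record(rec); io->close(); }
  return 0;
}
```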
[Slide 13] Reconstruction software
- 30~40 people have contributed over the last several years
- For many parts of the reconstruction software we have only one package; very little competition. Good and bad.
- Identify weak points and ask someone to improve them; mostly organized within the sub-detector groups (physics motivated, though)
- Systematic effort to improve the tracking software, but very slow progress: for example, one year to bring the tracking systematic error down from 2% to less than 1% (a small Z bias for either forward/backward or positive/negative charged tracks)
- When the problem is solved we will reprocess all data again

[Slide 14] Analysis software
- Several to tens of people have contributed
- Kinematic and vertex fitter, flavor tagging, vertexing, particle ID (likelihood), event shape, likelihood/Fisher analysis
- People tend to use the standard packages, but the system is not well organized/documented
- Have started a task force (consisting of young Belle members)

[Slide 15] Postgresql database system
- The only database system Belle uses other than simple UNIX files and directories
- A few years ago we were afraid that nobody would use postgresql, but it now seems to be widely used and well maintained
- One master, several copies at KEK, many copies at institutions/on personal PCs
- ~120,000 records (4.3 GB on disk); the IP (interaction point) profile is the largest/most popular (see the query sketch below)
- It is working quite well, although consistency among the many database copies is a problem
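As an illustration of how such constants might be read back, here is a hedged libpq sketch. The table and column names (ip_profile, ip_x, ip_y, ip_z, exp_no, run_no) and the connection string are invented for the example; they are not Belle's actual schema.

```cpp
// Illustrative only: fetch an interaction-point profile record with libpq.
#include <libpq-fe.h>
#include <cstdio>

int main() {
  PGconn* conn = PQconnectdb("host=belle-db dbname=belle user=reader");
  if (PQstatus(conn) != CONNECTION_OK) {
    std::fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
    return 1;
  }
  // Hypothetical schema: one record per run with the beam-spot position.
  const char* param_values[] = {"31", "768"};
  PGresult* res = PQexecParams(conn,
      "SELECT ip_x, ip_y, ip_z FROM ip_profile WHERE exp_no = $1 AND run_no = $2",
      2, nullptr, param_values, nullptr, nullptr, 0);
  if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
    std::printf("IP = (%s, %s, %s)\n",
                PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1), PQgetvalue(res, 0, 2));
  PQclear(res);
  PQfinish(conn);
  return 0;
}
```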
[Slide 16] BelleG4 for (Super) Belle
We have also (finally) started building a Geant4 version of the Belle detector simulation program. Our plan is to build the Super Belle detector simulation code first, then write reconstruction code. We hope to increase the number of people who can write Geant4 code in Belle, so that in one year or so we can write the Belle G4 simulation code and compare G4, G3, and real data. Some of the detector code is in F77 and we must rewrite it.

Computing

[Slide 18] Computing equipment budgets
- Rental system: four/five-year contract (20% budget reduction!): 2.5B yen (~18M euro) for 4 years, now for 5 years. A new acquisition process is starting for 2006/1 onward.
- Belle purchased systems: the KEK Belle operating budget is 2M euro/year; of that, 0.4~1M euro/year goes to computing (tapes 0.2M euro, PCs 0.4M euro, etc.). Sometimes we get a bonus(!): about 2M euro in total over five years so far.
- Other institutions: 0~0.3M euro/year/institution; on average, very little money is allocated.

[Slide 19] Rental system
[Chart: total cost of the rental system over five years (M euro)]

[Slide 20] Sparc and Intel CPUs
- Belle's reference platform is Solaris 2.7; everyone has an account
- 9 workgroup servers (500 MHz, 4 CPU) and 38 compute servers (500 MHz, 4 CPU) with the LSF batch system, 40 tape drives (2 each on 20 servers), and fast access to the disk servers; 20 user workstations with DAT, DLT, and AIT drives; maintained by Fujitsu SE/CEs under the big rental contract
- Compute servers (Linux RH 6.2/7.2/9) and user terminals to log onto the group servers: 120 PCs (~50 Win2000 + X window software, ~70 Linux); user analysis (unmanaged)
- Compute/file servers at universities: a few to a few tens at each institution, used for generic MC production as well as physics analyses at each institution (tau analysis at Nagoya U., for example)

[Slide 21] PC farm of several generations
- Dell: 36 PCs (Pentium III ~0.5 GHz)
- Compaq: 60 PCs (Pentium III 0.7 GHz, 4 CPU): 168 GHz
- Fujitsu: 127 PCs (Pentium III 1.26 GHz ×2): 320 GHz
- Appro: 113 PCs (Athlon 1.67 GHz ×2): 377 GHz
- NEC: 84 PCs (Xeon 2.8 GHz ×2): 470 GHz
- Fujitsu: 120 PCs (Xeon 3.2 GHz ×2): 768 GHz
- Dell: 150 PCs (Xeon 3.4 GHz ×2): 1020 GHz, coming soon!
[Chart: integrated luminosity and total # of GHz vs. time]

[Slide 22] Disk
- 8 TB of NFS file servers, mounted on all UltraSparc servers via GbE
- 4.5 TB of staging disk for the HSM, mounted on all UltraSparc/Intel PC servers
- ~15 TB of local data disks on PCs; generic MC files are stored there and used remotely
- Inexpensive IDE RAID disk servers, with a tape library in the back end:
  160 GB × (7+1) × 16 = 100K euro (12/2002)
  250 GB × (7+1) × 16 = 110K euro (3/2003)
  320 GB × (6+1+S) × 32 = 200K euro (11/2003)
  400 GB × (6+1+S) × 40 = 250K euro (11/2004)

[Slide 23] Tape libraries
- Direct-access DTF2 tape library: 40 drives (24 MB/s each) on 20 UltraSparc servers; 4 drives for data taking (one used at a time); the library holds 2500 tapes (500 TB)
- We store raw data and DST using the Belle tape IO package (allocate, mount, unmount, free), a perl script (files-to-tape database), and LSF (exclusive use of tapes)
- HSM back end with three DTF2 tape libraries: three 40 TB tape libraries are the back end of 4.5 TB of disk; files are written/read as if they were all on disk. When the library becomes full we move tapes out of it, and human operators insert tapes when requested by users by email.

[Slide 24] Data size so far
- Raw data: 400 TB written since Jan. for 230 fb⁻¹ of data, on 2000 tapes
- DST data: 700 TB written since Jan. for 230 fb⁻¹ of data, on 4000 tapes, compressed with zlib
- MDST data (four-vectors, vertices, and PID): 15 TB for 287 fb⁻¹ of hadronic events (BBbar and continuum), compressed with zlib (see the sketch below); τ and two-photon: add 9 TB for 287 fb⁻¹
- Total: ~1 PB on DTF(2) tapes plus 200+ TB in SAIT
[Chart: number of DTF (40 GB) and DTF2 (200 GB) tapes used]
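A minimal sketch of the kind of per-record zlib compression mentioned above; this is illustrative only and not the actual Belle stream IO code.

```cpp
// Compress one serialized event record with zlib before writing it out.
#include <zlib.h>
#include <cstdio>
#include <vector>

std::vector<unsigned char> compress_record(const std::vector<unsigned char>& in) {
  uLongf out_len = compressBound(in.size());        // worst-case output size
  std::vector<unsigned char> out(out_len);
  if (compress2(out.data(), &out_len, in.data(), in.size(), Z_DEFAULT_COMPRESSION) != Z_OK)
    return {};                                      // caller handles failure
  out.resize(out_len);                              // shrink to actual size
  return out;
}

int main() {
  std::vector<unsigned char> event(100000, 0x42);   // fake serialized event
  auto packed = compress_record(event);
  std::printf("%zu bytes -> %zu bytes\n", event.size(), packed.size());
  return 0;
}
```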
[Slide 25] Mass storage strategy
- Development of next-generation DTF drives was canceled; SONY's new mass storage system will use SAIT drive technology (metal tape, helical scan). We decided to test it.
- Purchased a 500 TB tape library and installed it as the back end (HSM), using newly acquired inexpensive IDE-based RAID systems and PC file servers
- We are moving from direct tape access to a hierarchical storage system. We have learned that automatic file migration is quite convenient, but we need a lot of capacity so that we do not need operators to mount tapes.

[Slide 26] Compact, inexpensive HSM
- The front-end disk system consists of 8 dual-Xeon PC servers with two SCSI channels each, each channel connecting one IDE RAID disk system; total capacity is 56 TB (1.75 TB (6+1+S) × 2 × 2 × 8)
- The back-end tape system is a SONY SAIT PetaSite tape library in three racks of space: the main system (four drives) plus two cassette consoles, with a total capacity of 500 TB (1000 tapes)
- They are connected by a 16-port 2 Gbit FC switch
- Installed in Nov. 2003~Feb. 2004 and working well; we keep all new data on this system
- Lots of disk failures (even power failures), but the system is surviving. Please see my poster.

[Slide 27] PetaServe: HSM software
- Commercial software from SONY; we have been using it since 1999 on Solaris, and it now runs on Linux (RH7.3 based)
- It uses the Data Management API of XFS, developed by SGI
- When the drive and the file server are connected via the FC switch, data on disk can be written directly to tape
- The minimum size for migration, and the size of the first part of the file that remains on disk, can be set by users
- The backup software, PetaBack, works intelligently with PetaServe; in the next version, if a file has been shadowed (one copy on tape, one copy on disk), it will not be backed up, saving space and time
- So far it is working extremely well, but: files must be distributed among disk partitions by ourselves (now 32, many more in three months, as there is a 2 TB limit); the disk is a file system and not a cache, so the HSM disk cannot serve as an effective scratch disk; there is a mechanism for nightly garbage collection if users delete files by themselves

[Slide 28] Extending the HSM system
- The front-end disk system will add ten more dual-Xeon PC servers with two SCSI channels each, each channel connecting one RAID system; total capacity becomes 150 TB (1.7 TB (6+1+S) × 2 × 2 × 8 + 2.4 TB (6+1+S) × 2 × 2 × 10)
- The back-end tape system will add four more cassette consoles and eight more drives; total capacity becomes 1.29 PB (2500 tapes)
- A 32-port 2 Gbit FC switch will be added (32+16 ports, interconnected with 4 ports)
- Belle hopes to migrate all DTF2 data into this library by the end of the year; data must be moved from the old tape library to the new one at TB/day rates!

[Slide 29] Floor space required
DTF2 library systems: 663 TB capacity, 130 m² of floor space. SAIT library systems: 1.29 PB capacity, 20 m².
[Floor plan: the DTF2 library (~500 TB), HSM (~150 TB), and backup (~13 TB) installations compared with the compact SAIT system, with room dimensions]

[Slide 30] Use of LSF
- We have been using LSF since 1999, and on PCs since 2003/3
- Of ~1400 CPUs (as of 2004/11), ~1000 are under LSF, used for DST production, generic MC generation, calibration, signal MC generation, and user physics analyses
- DST production uses its own distributed computing (dbasf), and child nodes do not run under LSF; all other jobs share the CPUs and are dispatched by LSF
- For users we use fair-share scheduling
- We will be evaluating new features in LSF 6.0, such as reporting and multi-clusters, hoping to use it as a collaboration-wide job management tool

[Slide 31] Rental system plan for Belle
We have started the lengthy process of the computer acquisition for 2006 (-2010):
- 100,000 SPECint2000 rates (or whatever) of compute servers
- 10 PB storage system, with extensions possible
- Network fast enough to read/write data at 2-10 GB/s (2 for DST, 10 for physics analysis)
- A user-friendly and efficient batch system that can be used collaboration-wide
- Hope to make it a Grid-capable system, interoperating with systems at remote institutions that are involved in LHC grid computing
- Careful balancing of lease and yearly purchases

Production

[Slide 33] Online production and KEKB
As we take data we write the raw data directly to tape (DTF2), and at the same time we run the DST production code (event reconstruction) on 85 dual-Athlon PC servers (see Itoh-san's talk on RFARM on the 29th, in the Online Computing session). Using the results of event reconstruction we send feedback to the KEKB accelerator group, such as the location and size of the collision region, so that they can tune the beams to maximize instantaneous luminosity while keeping them colliding at the center. We also monitor the BBbar cross section very precisely, and we can change the machine beam energies by 1 MeV or so to maximize the number of BBbar pairs produced. The resulting DST are written to temporary disk and skimmed for detector calibration. Once detector calibration is finished, we run the production again and make DST/mini-DST for final physics analysis.
[Plots: vertex Z position and luminosity response to an RF phase change corresponding to 0.25 mm]

[Slide 34] Online DST production system
[Diagram: the Belle event builder (Tsukuba Hall E-hut/control room) connected to the computer center server room over 1000BASE-SX/100BASE-TX; dual-Athlon 1.6 GHz nodes (170 CPUs) running Linux as input distributor, output collector, control, and disk server nodes, plus DNS/NIS and dual PIII (1.3 GHz) / dual Xeon (2.8 GHz) servers; switches include Dell CX600, 3Com 4400, and Planex FMG-226SX]

[Slide 35] Online data processing
Run of Sept. 18, 2004: 211 pb⁻¹ accumulated, 9,029,058 events accepted, run time 25,060 s; peak luminosity was 9×10³³ cm⁻²s⁻¹ (we had started running from Sept. 9). The Level 3 software trigger recorded 52% of the events, using 186.6 GB on the DTF2 tape library. RFARM (82 dual-Athlon servers) wrote the hadron DST, 381 GB (95 KB/event), to the new SAIT HSM, plus skims: μ-pair (5711 written), Bhabha (15855), tight hadron (45819). More than 2M BBbar events!

[Slide 36] DST production
Reprocessing strategy. Goal: 3 months to reprocess all data using all KEK compute servers. Often we have to wait for constants; often we have to restart due to bad constants. Efficiency: 50~70%.
History:
- 2002: major software updates; 2002/7 reprocessed all data taken until then (78 fb⁻¹ in three months)
- 2003: no major software updates; 2003/7 reprocessed data taken since 2002/10 (~60 fb⁻¹ in three months)
- 2004: SVD2 was installed in summer 2003; the software is all new and was being tested until mid May; 2004/7 reprocessed data taken since 2003/10 (~130 fb⁻¹ in three months)
[Charts: elapsed time (actual hours) for the setup, database update, ToF (RFARM), and dE/dx steps against the Belle incoming data rate (1 fb⁻¹/day); luminosity processed per day (pb⁻¹) for the new system vs. the old one]
See the Adachi/Ronga poster. Using 1.12 THz of total CPU we achieved >5 fb⁻¹/day.
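As a rough consistency check (illustrative arithmetic, not from the slides): reprocessing the full 287 fb⁻¹ dataset within the three-month goal at the quoted 50~70% efficiency requires an average rate of about

\[
\frac{287\ \mathrm{fb}^{-1}}{90\ \mathrm{days}\times 0.6}\;\approx\;5.3\ \mathrm{fb}^{-1}/\mathrm{day},
\]

which is in line with the >5 fb⁻¹/day achieved with 1.12 THz of CPU.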
[Slide 39] Generic MC production
- Mainly used for physics background studies; 400 GHz of Pentium III gives ~2.5 fb⁻¹/day; 80~100 GB/fb⁻¹ of data in the compressed format; no intermediate (GEANT3 hits/raw) hits are kept
- When a new release of the library comes, we try to produce a new generic MC sample
- For every real data-taking run we try to generate 3 times as many events as in the real run, taking run dependence into account: detector background is taken from random-trigger events of the run being simulated
- At KEK, if we use all CPUs we can keep up with the raw data taking (×3)
- We ask remote institutions to generate most of the MC events; we have generated more than 2×10⁹ events so far using QQ
- 100 TB for 3 × 300 fb⁻¹ of real data; we would like to keep it on disk
[Chart: millions of events produced since Apr. 2004]

[Slide 40] Random number generator
Pseudo-random numbers can be difficult to manage when we must generate billions of events in many hundreds of thousands of jobs (each event consumes about one thousand random numbers in one millisecond). Encoding the date and time as the seed may be one solution, but there is always a danger that one sequence is the same as another. We tested a random-generator board based on thermal noise; the board can generate up to 8M true random numbers per second. We will run random-number servers on several PC servers so that we can run many event-generator jobs in parallel.
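A hedged sketch of how a generator job might pull seeds from such a server: the protocol (raw random bytes over TCP), the address, and the port are invented for illustration; the talk does not describe the actual interface.

```cpp
// Illustrative client: fetch a block of 32-bit seeds from a random-number
// server (assumed to be backed by the hardware noise board).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<uint32_t> fetch_seeds(const char* host, uint16_t port, size_t n) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return {};
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  inet_pton(AF_INET, host, &addr.sin_addr);
  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) { close(fd); return {}; }

  std::vector<uint32_t> seeds(n);
  size_t want = n * sizeof(uint32_t), got = 0;
  while (got < want) {   // assume the server simply streams raw random bytes
    ssize_t r = read(fd, reinterpret_cast<char*>(seeds.data()) + got, want - got);
    if (r <= 0) { close(fd); return {}; }
    got += static_cast<size_t>(r);
  }
  close(fd);
  return seeds;
}

int main() {
  auto seeds = fetch_seeds("192.0.2.1", 9999, 8);   // placeholder address/port
  for (uint32_t s : seeds) std::printf("seed %08x\n", s);
  return seeds.empty() ? 1 : 0;
}
```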
Network/data transfer

[Slide 42] Networks and data transfer
- The KEKB computer system has a fast internal network for NFS; we have added Belle-bought PCs, now more than 500 PCs and file servers
- We have connected dedicated 1 Gbps Super-SINET lines to four universities, and have also requested that this network be connected to the outside of KEK for tests of Grid computing
- Finally, a new Cisco 6509 has been added to separate the above three networks
- A firewall and login servers make data transfer miserable (100 Mbps max.); DAT tapes are used to copy compressed hadron files and MC generated by outside institutions
- Dedicated GbE networks to a few institutions are now being added, for a total of 10 Gbit to/from KEK; the network to most collaborators is still slow

[Slide 43] Data transfer to universities
We use Super-SINET, APAN, and other international academic networks as the backbone of the experiment.
[Diagram: Tsukuba Hall (Belle) and the KEK Computing Research Center linked at 10 Gbps to Tohoku U., Nagoya U., Osaka U., U. of Tokyo, and TIT over Super-SINET, with transfer rates of ~1 TB/day (~100 Mbps), 400 GB/day (~45 Mbps), and 170 GB/day on various links (some via NFS), and connections onward to Australia, the U.S., Korea, Taiwan, etc.]
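For scale (illustrative arithmetic): the ~1 TB/day quoted above corresponds to

\[
\frac{8\times 10^{12}\ \mathrm{bits}}{86{,}400\ \mathrm{s}}\;\approx\;93\ \mathrm{Mbps},
\]

i.e. roughly the ~100 Mbps shown, and only about a tenth of a dedicated 1 Gbps Super-SINET line.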
[Slide 44] Grid plan for Belle
Belle hasn't made a commitment to Grid technology (we hope to learn more here); Belle has not, but institutions might have, in particular when they are also involved in one of the LHC experiments.
- Parallel/distributed computing: nice, but we do have our own solution; event (trivial) parallelism works. However, as we accumulate more data we hope to have more parallelism (several tens to hundreds to thousands of CPUs), so we may adopt a standard solution.
- Parallel/distributed file system: yes, we need something better than many UNIX files on many nodes; for each run we create ~1000 files, and we have had O(10,000) runs so far
- Collaboration-wide/WAN issues: network connections are quite different from one institution to another
- Authentication: we use DCE to log in; we are testing VPN with OTP; a CA-based system would be nice
- Resource management: we want to submit jobs to remote institutions if CPU is available

[Slide 45] Belle's attempts
- We have separated the stream IO package so that we can connect to any (Grid) file management package
- We have started working with remote institutions (Australia, Tohoku, Taiwan, Korea); see our presentation by G. Molony on the 30th, in the Distributed Computing session
- SRB (Storage Resource Broker, by San Diego Univ.): we constructed test beds connecting Australian institutes and Tohoku University using GSI authentication; see our presentation by Y. Iida on the 29th, in the Fabric session
- gfarm (AIST): we are using the gfarm cluster at AIST to generate generic MC events; see the presentation by O. Tatebe on the 27th, in the Fabric session

[Slide 46] Human resources
- KEKB computer system + network: supported by the computer center (1 researcher, 3~4 system engineers + 1 hardware engineer, 2~3 operators)
- PC farms and tape handling: 1 KEK/Belle researcher, 2 Belle support staff (they help with production as well)
- DST/MC production management: 2 KEK/Belle researchers, 1 post-doc or student at a time from collaborating institutions
- Library/constants database: 2 KEK/Belle researchers + the sub-detector groups
We are barely surviving, and are proud of what we have been able to do: make the data usable for analyses.

Super KEKB, S-Belle and Belle computing upgrades

[Slide 48] More questions in flavor physics
- The Standard Model of particle physics cannot really explain the origin of the universe: it does not have enough CP violation. Models beyond the SM have many CP-violating phases.
- There is no principle to explain the flavor structure in models beyond the Standard Model: why |Vcb| >> |Vub|? What about the lepton sector? Is there a relationship between the quark and neutrino mixing matrices?
- We know that the Standard Model (including the Kobayashi-Maskawa mechanism) is the effective low-energy description of Nature; most likely New Physics lies in the O(1) TeV region. LHC will start within a few years and (hopefully) discover new elementary particles.
- Flavor-changing neutral currents (FCNC) are suppressed; New Physics without a suppression mechanism is excluded up to 10³ TeV: the New Physics flavor problem. Different mechanisms give different flavor structure in B decays (tau and charm as well). A luminosity upgrade will be quite effective and essential.

[Slide 49] Roadmap of B physics
- Discovery of CPV in B decays: yes!! (2001, sin2φ1)
- Precise test of the SM and search for NP (now, 280 fb⁻¹): CPV in B→ππ, φ3, Vub, Vcb, b→sγ, b→sll, new states, etc.; anomalous CPV in b→sss
- Study of NP effects in B and τ decays
- Identification of the SUSY-breaking mechanism (if NP = SUSY)
[Diagram: the above steps against time or integrated luminosity, alongside Tevatron (m ~ 100 GeV), LHC (m ~ 1 TeV, NP discovered 2010?), KEKB (10³⁴), and SuperKEKB (10³⁵); a concurrent program]

[Slide 50] Projected luminosity
[Plot: projected SuperKEKB luminosity, Oide scenario, reaching 50 ab⁻¹]

[Slide 51] UT measurements at 5 ab⁻¹
Two cases, with precisions of sin2φ1 = 0.013, φ2 = 2.9°, φ3 = 4°, |Vub| = 5.8%: "New Physics!" versus "SM only" (estimated from current results).

[Slide 52] With more and more data
We now have more than 250M B0 on tape. We can not only observe very rare decays but also measure the time-dependent asymmetry of the rare decay modes and determine the CP phases.
[Plots: 2716 events; 68 events (enlarged)]

[Slide 53] New CPV phase in b→s
φKs vs. J/ψKs at 5 ab⁻¹: with ΔS = 0.24 (the summer 2003 b→s world average), the effect would be seen at 6.2σ.

[Slide 54] SuperKEKB: schematics
[Schematic: 8 GeV e⁺ beam at 4.1 A and 3.5 GeV e⁻ beam at 9.6 A; luminosity scaling L ∝ I ξy / βy* with super quads; Super Belle]

[Slide 55] Detector upgrade: baseline design
- μ / KL detection: scintillator strip/tile
- Tracking + dE/dx: small cell/fast gas + larger radius
- Calorimeter: CsI(Tl), 16X0; pure CsI in the endcap
- Si vertex detector: 2 pixel layers + 3 layers of DSSD
- SC solenoid: 1.5 T (lower?)
- PID: TOP + RICH, R&D in progress

[Slide 56] Computing for Super KEKB
- Even at that luminosity the DAQ rate is 5 kHz at 100 KB/event, i.e. 500 MB/s (see the check below); physics rate ~1 PB/year
- 800 4-GHz CPUs to keep up with data taking (4-CPU PC servers)
- 10+ PB storage system (what media?)
- 300 TB-1 PB of MDST/year on online data disk
- Costing >50M euro?
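The quoted data rate follows directly (illustrative check):

\[
5\,000\ \mathrm{events/s}\times 100\ \mathrm{KB/event}=500\ \mathrm{MB/s},
\]

and over a canonical ~10⁷ s of running per year this is of order 5 PB at the DAQ level, to be compared with the ~1 PB/year quoted for the physics-level output.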
[Slide 57] Conclusions
- Belle has accumulated 274M BBbar events so far (July 2004); the data are fully processed, and everyone is enjoying doing physics (in tight competition with BaBar)
- A lot of computing resources have been added to the KEKB computer system to handle the flood of data
- The management team remains small: fewer than 10 KEK staff (not full time; two of them are group leaders of sub-detectors) plus fewer than 5 SE/CEs; in particular, the PC farms and the new HSM are managed by a few people
- We are looking for the Grid solution for us at KEK and for the rest of the Belle collaborators

[Slide 58] Belle jargon
- DST: output files of good reconstructed events
- MDST: mini-DST, summarized events for physics analysis
- (Re)production/reprocess: the event reconstruction process; (re)run all reconstruction code on all raw data; reprocess = process ALL events using a new version of the software
- Skim (calibration and physics): μ pairs, gamma-gamma, Bhabha, Hadron, J/ψ, single/double/end-point lepton, b→sγ, hh
- Generic MC (production): a lot of JETSET c, u, d, s pairs / QQ / EvtGen generic b→c decay MC events for background studies

[Slide 59] Belle software library
- CVS (no remote check-in/check-out); check-ins are done by authorized persons
- A few releases per year. It usually takes a few weeks to settle down after a major release, as we have had no strict verification, confirmation, or regression procedure so far; it has been left to the developers to check the new version of the code. We are now trying to establish a procedure to compare against old versions.
- All data are reprocessed and all generic MC is regenerated with a new major release of the software (at most once per year, though)
- Reprocess all data before the summer conferences: in April we have a version with improved reconstruction software; reconstruction of all data is done in three months; it is tuned for physics analysis and MC production; a final version before October is used for physics publications with this version of the data; it takes about 6 months to generate the generic MC samples
- Library cycle (2002~2004): releases 0416, 0424, 0703, 1003 (SVD1 and SVD2 eras)

DBASF overview
Main changes: based on the RFARM, adapted to off-line production, dedicated database server, input from tape, output to the HSM; stable, light, fast.
[Diagram: a typical DBASF cluster with tape input (bcs03), master basf, and output (bcs202, basf, zfserv) nodes; CPU power breakdown 34% / 15% / 51%]

[Slide 62] Belle PC farms
We have added farms as we take data:
- 99/06: 16 × 4-CPU 500 MHz Pentium III
- 00/04: 20 × 4-CPU 550 MHz Pentium III
- 00/10: 20 × 2-CPU 800 MHz Pentium III
- 00/10: 20 × 2-CPU 933 MHz Pentium III
- 01/03: 60 × 4-CPU 700 MHz Pentium III
- 02/01: 127 × 2-CPU 1.26 GHz Pentium III
- 02/04: 40 × 700 MHz mobile Pentium III
- 02/12: 113 × 2-CPU Athlon
- 03/03: 84 × 2-CPU 2.8 GHz Xeon
- 04/03: 120 × 2-CPU 3.2 GHz Xeon
- 04/11: 150 × 2-CPU 3.4 GHz Xeon (800)
We are barely catching up; we must get a few to 20 TFLOPS in the coming years as we take more data.
[Chart: computing resources vs. integrated luminosities]

Summary and prospects
Major challenges: database access (very heavy load at job start), input rate (the tape servers are getting old), network stability (shared disks, output). Old system: … fb⁻¹/day (online). Production performance (CPU hours): Farm 8 operational! >5.2 fb⁻¹/day.