Page 1:

Data GRID deployment in HEPnet-J

Takashi Sasaki

Computing Research Center

KEK

Page 2:

Who are we?

• KEK stands for
  – Kou = High
  – Enerugi = Energy
  – Kasokuki Kenkyu Kiko = Accelerator Research Organization
• We have been a governmental agency, like the other national universities and national laboratories in Japan, since 2004
  – We are an Inter-University Research Institute Corporation

Page 3:

Major projects at KEK

• Belle
  – CP violation
  – @KEK
• K2K, T2K
  – Neutrino
  – KEK/Tokai to Kamioka
• CDF
  – Hadron collider, top quark
  – Fermilab, US
• ATLAS
  – Hadron collider, SUSY
  – CERN, Switzerland
• J-PARC
  – Joint project with JAEA
  – Being built at Tokai
• ILC
  – International Linear Collider
  – Site not yet decided
    • International competition
    • Japan is interested in hosting
• Lattice QCD
  – Dedicated IBM Blue Gene (57.3 TFlops)
• Material and life science
  – Synchrotron radiation
  – Muon and meson science
• Technology transfer
  – Medical applications
    • Simulation
    • Accelerator

Page 4:

HENP institutes in Japan

• KEK is the only central laboratory in Japan
• Smaller-scale centers also exist
  – ICEPP (U. Tokyo), RIKEN, Osaka Univ. and a few others
• The majority are smaller groups at universities
  – Mostly 1-3 faculty members and/or researchers, plus graduate students
  – No engineers or technicians for IT
  – This is not HENP-specific, but commonly observed
• KEK has a role to offer them the necessary assistance
  – Unfortunately, graduate students in physics are the main human resource supporting IT

Page 5:

HEPnet-J

• Originally, KEK organized the HEP institutes in Japan to provide networking among them
  – We started from 9600 bps DECnet in the early 1980s
  – KEK was one of the first Internet sites and hosted the first web site in Japan (1983? and 1992)
• This year, Super SINET3 will be introduced as the final upgrade, with 20 Gbps on the backbone and 10 Gbps to the main nodes
  – Shift to a more application-oriented service rather than raw bandwidth
  – GRID deployment is an issue
  – Virtual Organization for HEP Japan

Page 6:

History of HEPnet-J

2003: Super SINET (backbone), IP, 10 Gbps

Page 7:

Belle at KEK

Page 8:

Page 9:

Belle collaboration: 13 countries, 55 institutes (Jan. 2006)

Country (members) and institute breakdown:

Korea (38): Chonnam National Univ. 2; Ewha Womans Univ. 2; Gyeongsang National Univ. 2; Korea Univ. 3; Kyungpook National Univ. 8; Seoul National Univ. 9; Sungkyunkwan Univ. 7; Yonsei Univ. 5

U.S.A. (32): Univ. of Cincinnati 8; Univ. of Hawaii 12; Univ. of Pittsburgh 1; Princeton Univ. 4; RIKEN-Brookhaven National Laboratory Research Center 3; Virginia Polytechnic Institute and State Univ. 4

Taiwan (27): National Central Univ. 6; National Taiwan Univ. 19; National United Univ. (National Lien-Ho Institute of Technology) 2

Russia (34): Budker Institute of Nuclear Physics 19; Institute for High Energy Physics 3; Institute for Theoretical and Experimental Physics 12

China (19): Institute of High Energy Physics, Chinese Academy of Science 12; Peking Univ. 2; Univ. of Science and Technology of China 5

Australia (18): Univ. of Melbourne 10; Univ. of Sydney 8

Poland (10): The Henryk Niewodniczanski Institute of Nuclear Physics, Polish Academy of Science 10

Austria (8): Institute of High Energy Physics 8

Slovenia (13): Jozef Stefan Institute 12; Nova Gorica Polytechnic 1

India (9): Panjab Univ. 3; Tata Institute of Fundamental Research 6

Switzerland (8): Laboratory for High Energy Physics (LPHE), EPF Lausanne 8

Germany (1): Univ. of Frankfurt 1

Subtotal (outside Japan): 217

Japan (178): Chiba Univ. 5; Hiroshima Institute of Technology 1; Kanagawa Univ. 1; Kyoto Univ. 3; Nagoya Univ. 21; Nara Women's Univ. 7; Nippon Dental Univ. 1; Niigata Univ. 6; Osaka City Univ. 4; Osaka Univ. 6; RIKEN (Wako) 1; Saga Univ. 12; Shinshu Univ. 4; Toho Univ. 5; Tohoku Gakuin Univ. 4; Tohoku Univ. 9; Univ. of Tokyo 10; Tokyo Institute of Technology 8; Tokyo Metropolitan Univ. 5; Tokyo Univ. of Agriculture and Technology 2; Toyama National College of Maritime Technology 1; Univ. of Tsukuba 2; KEK 60

Subtotal (Japan): 178

TOTAL: 395

Page 10:

Data flow model in Belle

• At every beam crossing, an interaction between particles happens and the final-state particles are observed by the detector
  – Event
    • Different types of interactions may happen at each beam crossing
    • Events are in time sequence
    • Something like one frame of a movie film
  – Run
    • Something like a reel of the movie film
    • Cut at a good file size for later processing (historically the size of a tape, 2 GB or 4 GB)
  – Data from the detector (signals) are called "raw data"
• Physical properties of each particle are "reconstructed"
  – Vectorization of images and conversion of units; a kind of signal processing
• Events are classified into types of interactions (pattern matching)
  – Data Summary Tape (DST)
• More condensed event samples are selected from the DST
  – Something like knowledge discovery in images
  – Called mini DST
  – Detector signals are stripped
  – Sometimes a subset of the mini DST, the micro DST, is produced
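
The chain above can be pictured as a pipeline over event records. Below is a minimal Python sketch of that idea; the Event fields, the stand-in reconstruction, and the selection rule are invented for illustration and are not taken from the Belle software:

    from dataclasses import dataclass, field

    @dataclass
    class Event:
        run: int                  # runs group events into file-sized chunks
        raw: bytes                # detector signals ("raw data")
        tracks: list = field(default_factory=list)  # set by reconstruction
        kind: str = ""            # interaction type, set during DST building

    def reconstruct(ev):
        # Stand-in for the real signal processing that turns detector
        # signals into physical properties of each particle.
        ev.tracks = [len(ev.raw)]
        return ev

    def build_dst(raw_events):
        # Reconstruct every event and classify it by interaction type
        # (the "pattern matching" step that defines the DST).
        dst = []
        for ev in raw_events:
            ev = reconstruct(ev)
            ev.kind = "hadronic" if ev.tracks[0] % 2 else "other"
            dst.append(ev)
        return dst

    def build_mini_dst(dst, wanted_kind):
        # Select a condensed sample and strip the detector signals,
        # which is what distinguishes the mini DST.
        mini = [ev for ev in dst if ev.kind == wanted_kind]
        for ev in mini:
            ev.raw = b""
        return mini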

Page 11:

Belle data analysis

• Frequency of reprocessing
  – Reconstruction from raw data: once a year or less
  – DST production: twice a year or less
  – Mini DST production: many times
  – Micro DST production: many times
  – End-user analysis: every day, very many times
• Monte Carlo production
  – More events than the real data
  – Mostly CPU-intensive jobs
    • Full simulation
    • Fast simulation
• Event size: 40 KB per raw event (signal only)
• Recording rate: 10 MB/sec
• Accumulated data in total: 1 PB
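
These figures imply the event rate and total statistics by simple division. A back-of-envelope check (my own arithmetic; it treats the whole 1 PB as raw-sized events, so the event count is an order-of-magnitude estimate only):

    # Back-of-envelope check of the Belle figures quoted above.
    RAW_EVENT_SIZE = 40e3    # 40 KB per raw event (signal only)
    RECORD_RATE = 10e6       # 10 MB/s recording rate
    TOTAL_DATA = 1e15        # ~1 PB accumulated in total

    events_per_second = RECORD_RATE / RAW_EVENT_SIZE  # = 250 events/s
    total_events = TOTAL_DATA / RAW_EVENT_SIZE        # = 2.5e10 events
    print(f"~{events_per_second:.0f} events/s, ~{total_events:.1e} events")

At 10 MB/s sustained, one year of running also adds roughly 300 TB, consistent with a PB-scale archive after a few years of data taking.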

Page 12:

Event processing

• Reconstruction and DST production are done on site due to the large data size
• Physics analysis jobs are executed locally against the mini DST or micro DST, and also the MC
  – What they mainly do is statistical analysis and visualization of histograms
  – Also software development
• Official jobs, like MC production, cross the levels
  – CPU-intensive jobs
• Mini DST and micro DST production are done by sub-groups and can be localized
• Most jobs are integer-intensive rather than floating-point
  – Many branches in the code

Page 13:

Data Distribution Model in Belle

• Level 0 (a few PB)
  – Only KEK has the raw data and the reconstructed data
  – Whole MC data
• Level 1 (a few 10 TB)
  – Big institutions may want a replica of the DST
  – Join MC production
• Level 2 (a few 100 GB)
  – Most institutions are satisfied with the mini DST
  – May join MC production
• Smaller institutions may be satisfied even with the micro DST

• Collaboration-wide data sets
  – Raw data
  – Reconstructed data
  – DST
  – MC events (background + signal)
• Sub-group-wide data sets
  – Mini DST
  – Micro DST
  – MC events (signals)
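
The same policy can be restated as a small lookup table; the tier labels below are invented here for readability, while the sizes and data sets are from the slide:

    # Belle distribution levels as a simple mapping (illustrative only).
    DISTRIBUTION_MODEL = {
        "level 0 (KEK)":        ("a few PB",     ["raw", "reconstructed", "all MC"]),
        "level 1 (big inst.)":  ("a few 10 TB",  ["DST replica", "MC production"]),
        "level 2 (most inst.)": ("a few 100 GB", ["mini DST", "optional MC production"]),
        "smaller institutions": ("smaller",      ["micro DST"]),
    }

    for tier, (size, data_sets) in DISTRIBUTION_MODEL.items():
        print(f"{tier:22s} {size:13s} {', '.join(data_sets)}")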

Page 14:

GRID deployment at KEK

• Bare Globus
  – Up to GT2; we gave up following it after that
• We have our own GRID CA
  – In production since this January
  – Accredited by the APGrid PMA
• Two LCG sites and one test bed
  – KEK-LCG-01: for R&D
  – KEK-LCG-02: for production; interface to HPSS
  – Test bed: training and tests
• NAREGI test bed
  – Under construction
• SRB (UCSD)
  – GSI authentication or password
  – SRB-DSI became available
    • Works as an SRM for the SRB world from the LCG side
    • Performance tests will be done
  – Performance tests among RAL, CC-IN2P3 and KEK are ongoing
• Gfarm
  – Collaboration with AIST

Page 15:

GRID deployment

• ATLAS definitely requires LCG/gLite
  – ICEPP (International Center for Elementary Particle Physics), Univ. of Tokyo will be a tier-2 center of ATLAS
    • They have been downgraded from tier-1
    • One professor, one associate professor and a few assistant professors are working on the tier-2 center
      – No technicians, no engineers, no contractors, only "physicists"
      – Can you believe this?
    • How can other ATLAS member institutes, mostly smaller groups, survive?
• Belle
  – Some of the collaborators requested us to support a GRID environment for data distribution and efficient analysis
  – Sometimes their collaborators also join one of the LHC experiments
    • They want to use the same thing for both

Page 16:

LCG/gLite

• LCG (LHC Computing GRID) is now based on gLite 3.0
• It is the only middleware available today that satisfies HEP requirements
  – US people are also developing their own
• Difficulties
  – Support
    • Language gaps
  – Quality assurance
  – Assumes rich manpower

Page 17:

NAREGI

• What we expect from NAREGI
  – Better quality
  – Easier deployment
  – Better support in our native language
• What we need but still seems not to be in NAREGI
  – File/replica catalogue and other data-GRID functionality
    • Needs more assessment
• It comes a little late
  – Earlier would have been better for us
  – We need something working today!
• The β release requires the commercial version of PBS

Page 18:

First stage plan

• Ask NAREGI to implement LFC on their middleware
  – We assume job submission between the two will be realized
  – Share the same file/replica catalogue space between LCG/gLite and NAREGI
  – Move data between them using GridFTP (see the sketch after this list)
• Try something by ourselves
  – Brute-force porting of LFC onto NAREGI
  – NAREGI <-> SRB <-> gLite will also be tried
• Assessments will be done for
  – Command-level compatibility (syntax) between NAREGI and gLite
  – Job description languages
  – Software in the experiments, especially ATLAS
    • How much does it depend on LCG/gLite?
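
To make the GridFTP step concrete: a minimal sketch of a third-party copy driven from Python, assuming globus-url-copy (the standard GridFTP client) is installed and a valid GSI proxy exists. The hosts and paths are placeholders, not real KEK or NAREGI endpoints:

    import subprocess

    # Hypothetical source (KEK storage element) and destination (NAREGI side).
    src = "gsiftp://se.kek.example.jp/belle/mdst/run0001.mdst"
    dst = "gsiftp://se.naregi.example.jp/belle/mdst/run0001.mdst"

    # globus-url-copy performs the GridFTP transfer; -p 4 asks for four
    # parallel streams, which helps on long, high-bandwidth paths.
    subprocess.run(["globus-url-copy", "-p", "4", src, dst], check=True)

With a shared catalogue, the new copy would then be registered as a replica of the same logical file name (for example with an LFC registration utility such as lcg-rf), so that jobs on either middleware resolve it through one namespace.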

Page 19:

Future strategy

• ILC, the International Linear Collider, will be the target
  – Interoperability among gLite, OSG and NAREGI will be required

Page 20:

Conclusion

• HE(N)P has a problem to be solved today
  – GRID seems to be the solution; however, its heavy consumption of human resources is a problem
• We expect much from NAREGI
  – Still we cannot escape from gLite
  – Interoperability is the issue
• We will work on this issue together with NAREGI and IN2P3