Evolution of storage and data management
Ian Bird
GDB: 12th May 2010
Ian.Bird@cern.ch
Discussion on evolving WLCG Storage and Data Management

• Background:
  – Experiments' concern with the performance and scalability of access to data
    • Particularly for analysis
  – Concern over the long-term sustainability of existing solutions
  – Recognize:
    • Evolution of technology (networking, file systems, etc.)
    • Huge amounts of disk available to the experiments (in aggregate!)
    • Other communities exist that have solved similar problems, or will have similar problems soon
  – Short-term concerns:
    • Performance, scalability, bottlenecks (e.g. SRM)
  – The MONARC model assumed networking was a problem
    • It is actually something we should invest more in
Scope

• Focus on analysis use cases
  – Production is in hand
• Start from the viewpoint of user access to data
  – Hide the details of back-end storage systems, etc.
• Timescale for solutions in production: the 2013 run
  – But with incremental working prototypes that address some of the short-term concerns
• Should not be seen as a reason for new development projects
  – Must use available tools as far as possible; must keep long-term sustainability in mind
• Working model:
  – More network-centric than now
  – A few large archival repositories for long-term data curation
  – A "cloud" of storage making optimal use of available capacity
• Should be a single effort of the community
  – There may be funding opportunities, but we should make sure that they help us towards our goals and in obtaining what we need
• Must be driven by the real needs of the experiments ...
  – But we must learn the lesson of SRM: too many unrealistic "requirements" and no consensus on implementation
Technical areas

• Data archives and storage cloud
  – Simplify so that "tape" is really an archive
  – Allow remote data access when needed
  – Look into peer-to-peer technologies
• Data access layer
  – E.g. Xrootd/GFAL plus some intelligence to determine when to cache data and when to use remote access
• Output datasets
  – Still need a service for the (asynchronous) movement of datasets to an archive
• Global home directory facility
  – The model is a global file system. Are industrial solutions available?
• "Catalogues"
  – Still need to locate data. There is an issue of consistency between the different storage systems.
• Authorization mechanisms
  – For access to files in the storage systems (archive + cloud), quotas, etc.
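The "intelligence" mentioned under the data access layer can be illustrated with a minimal sketch of a cache-versus-remote decision. The function name, thresholds, and cost model below are illustrative assumptions only, not part of Xrootd, GFAL, or any WLCG tool:

```python
# Hypothetical sketch of a data-access-layer policy: decide whether to
# copy a dataset into a local disk cache or stream it remotely.
# All parameters are illustrative assumptions.

def access_mode(size_gb, expected_reads, cache_free_gb):
    """Return 'cache' or 'remote' for a dataset access request."""
    if size_gb > cache_free_gb:
        return "remote"   # does not fit locally: stream over the network
    if expected_reads >= 2:
        return "cache"    # repeated analysis passes amortise the copy cost
    return "remote"       # one-shot read: avoid polluting the cache

print(access_mode(50, 5, 200))   # small file, repeated reads -> cache
print(access_mode(500, 5, 200))  # larger than free cache -> remote
```

A real policy would also weigh network bandwidth, storage load, and observed access patterns, but the point of the slide is that such a decision belongs in a common layer rather than in each experiment framework.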
Next steps

• Jamboree
  – A 3-day workshop to look at existing tools in each of the areas: June 16-18 in Amsterdam
• Elaboration of a more concrete plan and timelines; set up working groups
  – Could be at the WLCG workshop in London, July 7-9
• Develop demonstrator prototypes in each area: testing of components/technologies
• Experiment testing: integration into frameworks
Jamboree agenda – 1

10:00 Welcome & introduction to the Jamboree: goals of the workshop – Ian
10:15 Presentation of a strawman for a new model of data access and management; should include the background and rationale behind this – Experiment person (Ian F/Kors?)
11:00 Status of networking (cf. expectations in 2001), prospects including intercontinental and far (e.g. Asia) connectivity – David Foster
11:45 Back-end storage, to be used as a true archive: possible implications and simplifications. Should discuss the need not to know about, or have to access, back-end storage (tape). What should the interface to storage be? – Dirk Duellmann?
12:30 Lunch
14:00 Needs from a data access layer: how analysis needs to access data, etc. – Experiment person
14:45 Data transfer use cases: peer-to-peer (many-to-many) or point-to-point; on demand vs. scheduled – Experiment person
15:30 Coffee
15:45 Namespaces, authorization needs, quotas, catalogues: what is needed? – Experiment person
16:30 The need for a global home directory service, missing so far. What are the use cases? – Experiment person
17:00 Conclusions: summary of discussion points and points arising. Progress on the model? Short-term improvements possible?
Agenda – 2

Technology
9:00 File systems: summary of the work on Hadoop, Lustre, GPFS, etc. – HEPiX WG?
9:45 Xrootd: outlook – Fabrizio Furano
10:30 Coffee
10:45 FTS, LFC: what is OK, what is not? What is still useful? – ??
11:30 AliEn FC: why is it interesting? – Pablo Saiz
12:15 P2P technologies: can we simply adapt them? – ??
13:00 Lunch
14:00 ROOT: outlook and developments – Rene Brun

Site experiences
14:45 NDGF/ARC – Josva Kleist
15:15 GridKa: xrootd with tape back-end – ??
15:45 Coffee
16:00 CNAF: GPFS/TSM/StoRM – ??
16:30 Hadoop at a Tier 2 – Brian Bockelman?
17:00 Summary and conclusions: are there areas for potential early demonstrators?
Agenda – 3

9:00 Conclusions of the discussions: summaries from the secretaries of Days 1 and 2
10:00 Next steps and points to address: draft an outline plan with timelines; create working groups, agree mandates and convenors; set expectations for the July workshop
11:00 Summary: draft summary of the workshop and conclusions for presentation to the wider community
12:00 Close