Evolution of storage and data management
Ian Bird
GDB: 12th May 2010
Ian.Bird@cern.ch
Discussion on evolving WLCG Storage and Data Management

• Background:
  – Experiments' concern with the performance and scalability of access to data
    • Particularly for analysis
  – Concern over the long-term sustainability of existing solutions
  – Recognize:
    • Evolution of technology (networking, file systems, etc.)
    • Huge amounts of disk available to the experiments (in aggregate!)
    • Other communities exist that have solved similar problems, or will have similar problems soon
  – Short-term concerns:
    • Performance, scalability, bottlenecks (e.g. SRM)
  – The MONARC model assumed networking was a problem
    • It is actually something we should invest more in
Scope

• Focus on analysis use cases
  – Production is in hand
• Start from the viewpoint of user access to data
  – Hide the details of back-end storage systems, etc.
• Timescale for solutions in production: the 2013 run
  – But with incremental working prototypes that address some of the short-term concerns
• Should not be seen as a reason for new development projects
  – Must use available tools as far as possible; must keep long-term sustainability in mind
• Working model:
  – More network-centric than now
  – A few large archival repositories for long-term data curation
  – A "cloud" of storage making optimal use of available capacity
• Should be a single effort of the community
  – There may be funding opportunities, but we should make sure that they help us towards our goals and in obtaining what we need
• Must be driven by the real needs of the experiments ...
  – But we must learn the lesson of SRM: too many unrealistic "requirements" and no consensus on implementation
Technical areas

• Data archives and storage cloud
  – Simplify so that "tape" is really an archive
  – Allow remote data access when needed
  – Look into peer-to-peer technologies
• Data access layer
  – E.g. Xrootd/GFAL plus some intelligence to determine when to cache data and when to use remote access
• Output datasets
  – Still need a service for the (asynchronous) movement of datasets to an archive
• Global home directory facility
  – The model is a global file system. Are industrial solutions available?
• "Catalogues"
  – Still need to locate data. There is an issue of consistency between the different storage systems.
• Authorization mechanisms
  – For access to files in the storage systems (archive + cloud), quotas, etc.
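The "intelligence" mentioned under the data access layer can be illustrated with a minimal sketch of a cache-versus-remote decision. The function name, thresholds, and cost model below are illustrative assumptions only, not part of Xrootd, GFAL, or any WLCG tool:

```python
# Hypothetical sketch of a data-access-layer policy: decide whether to
# copy a dataset into a local disk cache or stream it remotely.
# All parameters are illustrative assumptions.

def access_mode(size_gb, expected_reads, cache_free_gb):
    """Return 'cache' or 'remote' for a dataset access request."""
    if size_gb > cache_free_gb:
        return "remote"   # does not fit locally: stream over the network
    if expected_reads >= 2:
        return "cache"    # repeated analysis passes amortise the copy cost
    return "remote"       # one-shot read: avoid polluting the cache

print(access_mode(50, 5, 200))   # small file, repeated reads -> cache
print(access_mode(500, 5, 200))  # larger than free cache -> remote
```

A real policy would also weigh network bandwidth, storage load, and observed access patterns, but the point of the slide is that such a decision belongs in a common layer rather than in each experiment framework.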
Next steps

• Jamboree
  – A 3-day workshop to look at existing tools in each of the areas: June 16-18 in Amsterdam
• Elaboration of a more concrete plan and timelines; set up working groups
  – Could be at the WLCG workshop in London, July 7-9
• Develop demonstrator prototypes in each area: testing of components/technologies
• Experiment testing: integration into frameworks
Jamboree agenda – 1

10:00 Welcome & introduction to the Jamboree: goals of the workshop – Ian
10:15 Presentation of a strawman for a new model of data access and management; should include the background and rationale behind this – Experiment person (Ian F/Kors?)
11:00 Status of networking (cf. expectations in 2001), prospects including intercontinental and far (e.g. Asia) connectivity – David Foster
11:45 Back-end storage, to be used as a true archive: possible implications and simplifications. Should discuss the need not to know about, or have to access, back-end storage (tape). What should the interface to storage be? – Dirk Duellmann?
12:30 Lunch
14:00 Needs from a data access layer: how analysis needs to access data, etc. – Experiment person
14:45 Data transfer use cases: peer-to-peer (many-to-many) or point-to-point; on demand vs. scheduled – Experiment person
15:30 Coffee
15:45 Namespaces, authorization needs, quotas, catalogues: what is needed? – Experiment person
16:30 The need for a global home directory service, missing so far. What are the use cases? – Experiment person
17:00 Conclusions: summary of discussion points and points arising. Progress on the model? Short-term improvements possible?
Agenda – 2

Technology
9:00 File systems: summary of the work on Hadoop, Lustre, GPFS, etc. – HEPiX WG?
9:45 Xrootd: outlook – Fabrizio Furano
10:30 Coffee
10:45 FTS, LFC: what is OK, what is not? What is still useful? – ??
11:30 AliEn FC: why is it interesting? – Pablo Saiz
12:15 P2P technologies: can we simply adapt them? – ??
13:00 Lunch
14:00 ROOT: outlook and developments – Rene Brun

Site experiences
14:45 NDGF/ARC – Josva Kleist
15:15 GridKa: xrootd with tape back-end – ??
15:45 Coffee
16:00 CNAF: GPFS/TSM/StoRM – ??
16:30 Hadoop at a Tier 2 – Brian Bockelman?
17:00 Summary and conclusions: are there areas for potential early demonstrators?
Agenda – 3

9:00 Conclusions of the discussions: summaries from the secretaries of Days 1 and 2
10:00 Next steps and points to address: draft an outline plan with timelines; create working groups, agree mandates and convenors; set expectations for the July workshop
11:00 Summary: draft summary of the workshop and conclusions for presentation to the wider community
12:00 Close