DOIs and Supercomputing. DataCite Summer 2013 Meeting. Terry Jones, Sudharshan Vazhkudai, Doug Fuller (Oak Ridge National Laboratory)

2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

May 06, 2015

2013 DataCite Summer Meeting - Making Research better
DataCite. Co-sponsored by CODATA.

Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30

Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
Page 1: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

DOIs and Supercomputing

DataCite Summer 2013 Meeting

Terry Jones, Sudharshan Vazhkudai, Doug Fuller

Oak Ridge National Laboratory

Page 2: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

2 Terry Jones, DataCite Summer 2013 / Washington DC

Why Supercomputers!? Because Innovation Drives The Economy…

• Over the last 5 years, 38% of the international innovation “R&D 100” awards went to US National Labs

[Chart: horizontal axis shows years 2009 to 2013; vertical axis runs from 0 to 50]

• This was done with YOUR tax money

• Ideas shape the course of history – John Maynard Keynes

• The central goal of economic policy should be to spur higher productivity through greater innovation – Joseph Schumpeter’s Innovation Economics

Page 3: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Why Supercomputers!? (part 2) …And in 2013, Supercomputers Drive Innovation

Computers have changed the way we conduct experiments. Given enough computer power, we can perform accurate experiments more quickly, more cheaply, and often with greater control.

Page 4: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

The New Laboratory: High-Performance Computing yields breakthroughs

Page 5: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Big Problems Require Big Solutions

Energy

Healthcare

Competitiveness

OLCF resources are available to academia and industry through open, peer-reviewed allocation mechanisms.

Page 6: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

• High Performance Production Computing for the Office of Science

• Characterized by a large number of projects (over 400) and users (over 4800)

• Leadership Computing for Open Science

• Characterized by a small number of projects (about 50) and users (about 800) with computationally intensive projects

• Linking it together – ESnet

• Investing in the future – R&E Prototypes

ESnet

Titan at ORNL (#2)

Mira at ANL (#5)

Hopper at LBNL (#24)

June 2013

DOE Office of Science HPC User Facilities

Page 7: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

DOE Office of Science HPC User Facilities

Super Scale: Bytes & Bandwidth

Page 8: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

With Big Computations Comes Big Data

• DOE HPC User Facilities produce enormous volumes of data

• Each User Facility has tertiary (archival) storage, often HPSS – statistics for one such computer center pictured here

• In addition, each center provides secondary storage – for example: a 10PB Lustre parallel file system

Doubling Every Year

Page 9: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

• Part of a Collaborative DOE Office of Science program at ORNL and ANL

• Mission: Provide the computational and data resources required to solve the most challenging problems.

• Access to the most powerful computer in the world for open access computing (Titan)

• Highly competitive user allocation programs (INCITE, ALCC).

• Projects receive 10x to 100x more resources than at other generally available centers.

• OLCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).

Oak Ridge Leadership Computing Facility (OLCF) -- A Leading DOE User Facility

Page 10: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

We have increased our system capability by 10,000 times since 2004

• Strong partnerships with supercomputer vendors.

• LCF users employ large portions of the machine for large fractions of time.

• Strong partnerships with our users to scale codes and algorithms.

Page 11: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

OLCF Future (Based On Extrapolation)

2009: Jaguar, 2.3 PF, leadership system for science

2012: Titan (OLCF-3), 10–20 PF, leadership system

2016: OLCF-4, 100–250 PF

2019: OLCF-5, 1 EF

• Computer system performance increases through parallelism
– Clock speed trend flat to slower over coming years
– In the last 28 years, systems have scaled from 64 cores to ~300,000
– Applications must utilize all inherent parallelism

• Our compute and data resources have grown 10,000X over the decade, are in high demand, and are effectively used.
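To put the growth figures on this slide in annual terms, the following minimal sketch computes the compound yearly multiplier implied by each claim. The function names are hypothetical, and the 10-year and 28-year spans are assumptions read off the slide's wording ("over the decade", "in the last 28 years").

```python
import math

def annual_growth_factor(total_factor: float, years: float) -> float:
    """Return the constant yearly multiplier that compounds to total_factor."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)

def doubling_time_years(total_factor: float, years: float) -> float:
    """Return how long one doubling takes at the implied constant rate."""
    if total_factor <= 1 or years <= 0:
        raise ValueError("total_factor must exceed 1 and years must be positive")
    return years * math.log(2) / math.log(total_factor)

# Figures quoted on the slide; the time spans are assumptions (see lead-in).
capability = annual_growth_factor(10_000, 10)    # "10,000X over the decade"
core_ratio = 300_000 / 64                        # 64 cores -> ~300,000 cores
cores = annual_growth_factor(core_ratio, 28)

print(f"capability: x{capability:.2f} per year")
print(f"core count: x{cores:.2f} per year, "
      f"doubling about every {doubling_time_years(core_ratio, 28):.1f} years")
```

Under these assumptions, capability grew by roughly 2.5x per year, which is much faster than the growth in core count alone.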

Page 12: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

The Data Deluge

2013: 4PB disk & 34PB tape [Titan]

2017: 64PB disk & 600PB tape [Coral]

2021: 1EB disk & 10EB tape (?)

• Key Challenge: Make Sense of So Much Data

• We’ll Need Better Tools

• If “many hands make light work,” how can we enable more people to make sense of the data?

FIND THE NEEDLE IN THE HAYSTACK
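The disk and tape figures on this slide can be checked against a simple "doubling every year" model, the trend quoted earlier for archival storage. This is a rough sketch, not the center's planning method: the names are hypothetical and it assumes decimal units (1 EB = 1000 PB).

```python
# Capacities quoted on the slide, in petabytes.
# Assumption: decimal units, so 1 EB = 1000 PB. The 2021 row is marked "(?)" on the slide.
ROADMAP_PB = {
    2013: {"disk": 4, "tape": 34},
    2017: {"disk": 64, "tape": 600},
    2021: {"disk": 1000, "tape": 10_000},
}

def project_doubling(start_pb: float, start_year: int, year: int) -> float:
    """Capacity in PB if it doubles once per year starting from start_year."""
    return start_pb * 2 ** (year - start_year)

def compare(kind: str) -> list[tuple[int, float, float]]:
    """Return (year, quoted PB, projected PB) rows for 'disk' or 'tape'."""
    base_year = min(ROADMAP_PB)
    base = ROADMAP_PB[base_year][kind]
    return [
        (year, float(caps[kind]), project_doubling(base, base_year, year))
        for year, caps in sorted(ROADMAP_PB.items())
    ]

for kind in ("disk", "tape"):
    for year, quoted, projected in compare(kind):
        print(f"{kind} {year}: quoted {quoted:g} PB, doubling model {projected:g} PB")
```

For disk, the doubling model reproduces the 2017 figure exactly (4 PB x 2^4 = 64 PB) and gives 1024 PB for 2021, close to the quoted 1 EB.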

Page 13: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

What Breakthroughs Are We Missing?

• HPC will remain important to Scientific Discovery
– Important for Climate, Material Science, Energy Security

• Today, the state-of-the-art is (still!) bibliographic publications

• But The Gains From Bibliographic Sharing Are Limited
– Constraints in paper length
– Limited Focus of paper
– Limited ability to convey with graphs, figures, tables

• Urgently Needed: A Quick Way To ‘Enable’ Data

Page 14: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

New External Drivers for Supercomputing Centers

• The push is on to squeeze more results from High-Performance Computing
– Scientists have difficulty in replicating (or even understanding) others’ results
– Taxpayers want more openness
– The Holdren memo

Page 15: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Our Response: Make Supercomputer Produced Data As Widely Available As Possible

• DOIs provide the necessary mechanism & implementation

• Makes sense for OLCF (uniquely qualified for 100TB datasets)

• Will benefit from DataCite’s integration with Thomson Reuters’ Data Citation Index and other services.

• Already successful for sensor-driven research like NASA

• As research goes forward, the project Principal Investigator stores “appropriate data”
– Presumably, if data can support a bibliographic result (graph, figure, data), the data is worth curation.

• After curation, the data is available to the entire scientific community

✔ Helps OLCF with ‘research tracking’
✔ Helps OLCF with ‘reporting to sponsors’
✔ Helps OLCF resolve data disposition questions
✔ All The Traditional Benefits To Researchers

Page 16: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

DOI Benefits

From User’s Perspective, DOIs can:

• Identify & cite key data products of interest and value, and annotate them.

• Safely share data with collaborators even before publishing the result in a scientific communication.

• Let future data analyses easily feed off of the data products, fostering a highly dynamic and collaborative environment.

• Preserve data products for the longer term, well beyond the expiration of their projects at the centers.

• Satisfy requirements from funding agencies on data management plans in terms of long-term preservation, sharing, and dissemination of research results.

From Center’s Perspective, DOIs can:

• Help with research tracking and identifying the major results coming out of a project allocation on the center’s resources.

• Aid in reporting to sponsors.

• Help the center answer questions on the disposition of the data, and search and discover them, since DOIs also capture some basic metadata along with the index.

• Enable better utilization of HPC center resources.

• Provide a tool to cull the data holdings, and tangible policies to users for long-term data preservation.

• Help the center evolve to support “data-only” users through data science tools such as DOIs.

• Provide an opportunity for our center to distinguish itself from other centers (they have the best data tools).

From Sponsor’s Perspective, DOIs can:

• Add the benefit of seeing data sharing flourish within the community, with more data analyses spawned from the data products.

• Give both users and centers that the sponsor funds rich tools for data management.

• Enable more value for the dollar spent. In addition to software tools, research artifacts, and papers, there is now a new entity, the citable data product.

Page 17: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Workflow for DOI Creation

1. User creates data

2. User requests DOI

3. ORNL requests DOI

4. OSTI provides DOI

5. DOI stored at data portal

6. Request Permanent Data Copy

7. Data Migrated to Archive

8. Archive success response

9. DOI success response
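The nine steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Registrar` and `Center` classes, their methods, and the `10.0000` prefix are hypothetical placeholders, not the actual OSTI or OLCF interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Registrar:
    """Stand-in for the DOI-issuing agency (OSTI on the slide). Hypothetical."""
    prefix: str = "10.0000"   # placeholder prefix, not a real assignment
    _count: int = 0

    def issue(self) -> str:
        """Mint the next identifier under the placeholder prefix."""
        self._count += 1
        return f"{self.prefix}/example.{self._count}"

@dataclass
class Center:
    """Stand-in for the computing center's data portal and archive. Hypothetical."""
    registrar: Registrar
    portal: dict[str, dict] = field(default_factory=dict)    # DOI -> metadata
    archive: dict[str, bytes] = field(default_factory=dict)  # DOI -> permanent copy

    def handle_doi_request(self, title: str, data: bytes) -> str:
        doi = self.registrar.issue()           # steps 3-4: center asks, registrar answers
        self.portal[doi] = {"title": title}    # step 5: DOI stored at the data portal
        self.archive[doi] = bytes(data)        # steps 6-7: permanent copy migrated to archive
        if self.archive[doi] != data:          # step 8: archive confirms the copy matches
            raise RuntimeError("archive copy failed verification")
        return doi                             # step 9: success response to the user

center = Center(Registrar())
dataset = b"simulation output"                            # step 1: user creates data
doi = center.handle_doi_request("Example run", dataset)   # step 2: user requests a DOI
print(doi)
```

Returning the DOI only after the archive copy is confirmed mirrors the ordering on the slide, where the DOI success response (step 9) follows the archive success response (step 8).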

Page 18: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Workflow for DOI Data Retrieval

1. User provides search criteria

2. Matches found via Metadata

3. User identifies needed data

4. Request Data Subset

5. Data Migrated for Upload

6. User retrieves data
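The retrieval steps can be sketched the same way: search the metadata, pick a match, then stage only the requested subset. All names, DOIs, and file names here are hypothetical, and a real portal would query an index rather than scan a list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """Minimal metadata entry kept at the portal. Fields are illustrative."""
    doi: str
    title: str
    keywords: frozenset[str]

def search(catalog: list[Record], terms: set[str]) -> list[Record]:
    """Steps 1-2: return records whose keywords contain every search term."""
    wanted = {t.lower() for t in terms}
    return [r for r in catalog if wanted <= r.keywords]

def stage_subset(archive: dict[str, dict[str, bytes]], doi: str,
                 names: list[str]) -> dict[str, bytes]:
    """Steps 4-5: copy only the requested files out of the archive for pickup."""
    files = archive[doi]
    return {name: files[name] for name in names}

catalog = [
    Record("10.0000/example.1", "Supernova run", frozenset({"astrophysics", "supernova"})),
    Record("10.0000/example.2", "Aquifer model", frozenset({"groundwater"})),
]
archive = {
    "10.0000/example.1": {"step_0001.dat": b"a", "step_0002.dat": b"b"},
    "10.0000/example.2": {"head.dat": b"c"},
}

matches = search(catalog, {"Astrophysics"})                     # steps 1-2
chosen = matches[0]                                             # step 3: user picks a match
staged = stage_subset(archive, chosen.doi, ["step_0002.dat"])   # steps 4-5
print(chosen.doi, sorted(staged))                               # step 6: user retrieves data
```

Staging a subset rather than the whole dataset matters at this scale, since a single dataset may be far larger than the disk space available for downloads.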

Page 19: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Some Challenges Are Expected

• How will permanent data storage be funded?
– Projects last 3 years.

• Researchers are affiliated with institutions that have their own data policies.
– For example, the Princeton Plasma Physics Lab may have policies affecting how we can support its fusion projects.

• Some fields will require effort to make their data “portable” for a wide audience.
– Astrophysics has a standard file format; Fusion does not.

• Developing good metadata is a human-intensive effort
– Getting PIs to provide the metadata
– Looking to OSTI & DataCite for some help with DOI Q&A
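Part of the metadata burden can be automated by checking a PI's submission for missing fields before a person reviews it. The sketch below assumes five required fields modeled on the core mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year). The field names, function name, and sample record are hypothetical, and the list should be verified against the schema version in use.

```python
# Assumed required fields, modeled on the DataCite Metadata Schema's core
# mandatory properties. Verify against the schema version actually in use.
REQUIRED_FIELDS = ("identifier", "creators", "title", "publisher", "publication_year")

def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a submitted record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing

submission = {
    "identifier": "10.0000/example.1",    # placeholder DOI
    "creators": ["Doe, Jane"],            # hypothetical PI
    "title": "Example turbulence dataset",
    "publisher": "",                      # left blank by the submitter
}
print(missing_fields(submission))
```

A check like this only catches absent fields. Judging whether a title or description is actually useful still needs a person.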

Page 20: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

…More Challenges

• What about Authenticated access to data? Or malicious users in general...

• What about the long-term QA aspects of maintaining data?

• What about the logistics of very large data?

– Staging

– Retrieving huge files (can’t be on disk)
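One common building block for both the long-term QA question and the very-large-file question is a checksum computed in fixed-size chunks, so a file never has to fit in memory. This is a generic sketch, not something described in the talk: the function name and chunk sizes are assumptions, and only Python's standard `hashlib` is used.

```python
import hashlib
import io
from typing import BinaryIO

def stream_sha256(source: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file-like object chunk by chunk so it never sits fully in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # empty read means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()

# Demo: an in-memory stream stands in for a staged archive file.
payload = b"x" * 1_000_000
recorded = stream_sha256(io.BytesIO(payload), chunk_size=64 * 1024)     # at deposit time
rechecked = stream_sha256(io.BytesIO(payload), chunk_size=256 * 1024)   # at audit time
print(recorded == rechecked)
```

Because the digest does not depend on the chunk size, the value recorded at deposit can be compared with one recomputed years later on different hardware.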

Where’s The Data?

Page 21: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Current Project Status

• Provided a DOI recommendation for the Center
– Pros and Cons
– Long term implications

• Designed the Workflow

• Created infrastructure to support the workflow
– Frontend infrastructure for storing & DOI association
– Backend infrastructure for search & retrieval

• Having conversations with a few selected HPC user communities
1. Astrophysics
2. Groundwater Simulation
3. Climate
4. Turbulence
5. Fusion

Page 22: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Summary

• High Performance Computing & Data are integral to scientific discovery

• Bibliographic publications cannot contain the wealth of insight available in the raw data

• ORNL is leading an effort to make HPC data available to all with DOIs

• In the future, “Publish” to a scientist will probably refer to obtaining a DOI for a supercomputer dataset

Page 23: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Acknowledgements

• OLCF DOI Team
– Sudharshan Vazhkudai
– Doug Fuller
– Terry Jones

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

• OSTI Support
– Mark Martin
– Jannean Elliott

• ORNL Support
– Jack Wells
– Giri Palanisamy
– John Cobb
– Stan White

Page 24: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Questions?

[email protected]

Page 25: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

Extra Viewgraphs

Page 26: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

How Does The OLCF Compare With Other Centers?

Four of Six SC13 Gordon Bell Finalists Used Titan

• High-Temperature Superconductivity: Taking a Quantum Leap in Time to Solution for Simulations of High-TC Superconductors (Peter Staar, ETH Zurich; run on Titan)

• Biofluidic Systems: 19 Petaflops Simulation of Protein Suspensions in Crowding Conditions (Massimo Bernaschi, ICNR-IAC Rome; run on Titan)

• Plasma Physics: Radiative Signatures of the Relativistic Kelvin-Helmholtz Instability (Michael Bussmann, HZDR - Dresden; run on Titan)

• Cosmology: HACC: Extreme Scaling and Performance Across Diverse Architectures (Salman Habib, ANL; run on Sequoia, Mira, Titan)

Page 27: 2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Ridge National Laboratory)

The New Laboratory (continued): High-Performance Computing is widely applicable