DISCOVER THE OCEAN. UNDERSTAND THE PLANET. BEYOND INFRASTRUCTURE GAPS CASRAI Canada ReConnect14 Benoît Pirenne, Director, User Engagement, Ocean Networks Canada. Ottawa, November 19, 2014
Transcript
Page 1: RDC - Benoit Pierenne: Data Interoperability I

DISCOVER THE OCEAN. UNDERSTAND THE PLANET.

BEYOND INFRASTRUCTURE GAPS

CASRAI Canada ReConnect14 Benoît Pirenne, Director, User Engagement, Ocean Networks Canada. Ottawa, November 19, 2014

Page 2: RDC - Benoit Pierenne: Data Interoperability I

OR: HOW WILL WE SOLVE RESEARCH DATA MANAGEMENT ISSUES IN CANADA?

Page 3: RDC - Benoit Pierenne: Data Interoperability I

Why data management?

❖ Research Data Management has recently received a lot of attention

- Scientific research equipment and programmes are costly to set up and/or operate, so their data must be re-used and shared with many other users

- There is potential for new insight to emerge from re-use of the data

- Too many (smaller) research programmes don’t have a data management plan, and their data end up being lost

Page 4: RDC - Benoit Pierenne: Data Interoperability I

DM Activities

[Diagram: Sensors, Other Digital Data; Data Acquisition; Archive; Format translation, data products; Initial Users; Other Users (≠ disciplines, public)]

Page 5: RDC - Benoit Pierenne: Data Interoperability I

Challenges of DM

❖ People focus on the hardware issues:

- That’s chasing the wrong rabbit!

- [LHC’s 25PB/yr]: “Storing the data is not a problem: hard drives are cheap and getting cheaper. The challenge is preserving knowledge that is less commonly stored — the software, algorithms and reference plots specific to each experiment. These often degrade or disappear with time”, says Cristinel Diaconu (nature.com Nov. 26, 2013)!

- Funding agencies prefer the hardware focus, because funding is a one-off!

Page 6: RDC - Benoit Pierenne: Data Interoperability I

Challenges of DM

❖ Real challenge: data description (metadata)

- Requires gathering, indexing, describing and curating research data at all stages of data collection, preparation, archival and distribution

- Metadata is essential for, and part of, data quality assessment

- Includes source, full description, calibration, annotations, space-time info, …, ownership, access authorizations, …

- Includes the link between data and resulting publications
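To make the metadata items on this slide more concrete, here is a minimal Python sketch of what a descriptive record could look like. It is only an illustration: the class name, the field names and the example values are all assumptions made for this sketch, not an Ocean Networks Canada schema or any published standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MetadataRecord:
    """Hypothetical descriptive record for one dataset (all names are illustrative)."""
    source: str            # instrument or sensor that produced the data
    description: str       # full free-text description
    owner: str             # ownership
    start_time: datetime   # start of the time span covered
    end_time: datetime     # end of the time span covered
    latitude: float        # decimal degrees
    longitude: float       # decimal degrees
    calibration: str = ""  # calibration notes or reference
    annotations: list[str] = field(default_factory=list)
    authorized_groups: list[str] = field(default_factory=list)  # access authorizations
    publication_ids: list[str] = field(default_factory=list)    # links to resulting publications


def missing_descriptions(record: MetadataRecord) -> list[str]:
    """Return the names of text fields left empty: a crude completeness check."""
    text_fields = ("source", "description", "owner", "calibration")
    return [name for name in text_fields if not getattr(record, name).strip()]


# Made-up example values, for illustration only.
record = MetadataRecord(
    source="hypothetical-ctd-01",
    description="Temperature and salinity time series (made-up example)",
    owner="Example Lab",
    start_time=datetime(2014, 1, 1, tzinfo=timezone.utc),
    end_time=datetime(2014, 6, 30, tzinfo=timezone.utc),
    latitude=48.4,
    longitude=-123.3,
)
print(missing_descriptions(record))
```

A check like `missing_descriptions` hints at why metadata is part of data quality assessment: an incomplete description can be flagged at the time of archival rather than discovered years later.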

Page 7: RDC - Benoit Pierenne: Data Interoperability I

DM Activities

[Diagram: Sensors, Other Digital Data; Data Acquisition; Archive; Format translation, data products; Initial Users; Other Users (≠ disciplines, public); Metadata]

Page 8: RDC - Benoit Pierenne: Data Interoperability I

Challenges of DM

❖ Real challenge: data description (metadata)

- Not popular with funding agencies because metadata requires having expert and dedicated staff to curate data

- Metadata requires software systems to be maintained to support the activity

- Metadata is a long-term commitment

Page 9: RDC - Benoit Pierenne: Data Interoperability I

Challenges of DM

❖ Data access

- Search through data (not always possible), search through metadata

- Metadata encoding and transport standards needed

- Data formats are discipline-specific

- Uniform, interoperable access is a huge challenge (e.g., the Virtual Observatory, VO)

Page 10: RDC - Benoit Pierenne: Data Interoperability I

Challenges of DM

- Convince PIs and funding agencies that good Data Management is important.

- But this battle is by now almost won. (NSF, TC3+, … )

- New CFI Cyber-Infrastructure initiative to be announced to support most needs of data stewardship

Page 11: RDC - Benoit Pierenne: Data Interoperability I

How can we afford DM?

❖ Data Management is affordable

- Experience shows that, across disciplines, the average cost of setting up DM is ~10% of the costs of the projects it supports

- Experience shows that the burden of operating DM is about 10% of the projects’ overall operating costs

- DM costs fall further when projects are no longer operational
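As a rough worked example of the ~10% rule of thumb on this slide, the short Python sketch below applies it to a made-up project. The function name, its parameters and the dollar figures are assumptions for illustration only; the slide gives no actual budgets.

```python
def estimate_dm_costs(project_setup_cost: int,
                      project_annual_operating_cost: int,
                      dm_share_percent: int = 10) -> dict[str, int]:
    """Apply the slide's ~10% rule of thumb to whole-dollar project costs."""
    return {
        "dm_setup": project_setup_cost * dm_share_percent // 100,
        "dm_annual_operation": project_annual_operating_cost * dm_share_percent // 100,
    }


# Made-up project figures, for illustration only.
estimate = estimate_dm_costs(project_setup_cost=5_000_000,
                             project_annual_operating_cost=800_000)
print(estimate)  # {'dm_setup': 500000, 'dm_annual_operation': 80000}
```

The share is kept as a parameter because the slide quotes an average across disciplines, so a given project may sit above or below 10%.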

Page 12: RDC - Benoit Pierenne: Data Interoperability I

Towards Data Stewardship facilities

❖ At the service of many projects in related disciplines

❖ Provides long-term data storage, access and stewardship, well beyond the lifetime of individual projects

❖ Need is particularly acute for small projects

❖ Avoid the creation of many ad-hoc systems that can’t be maintained long-term

❖ International quality standards exist (ICSU’s World Data System)

Page 13: RDC - Benoit Pierenne: Data Interoperability I

DM Activities

[Diagram: Sensors, Other Digital Data; Data Acquisition; Archive; Format translation, data products; Initial Users; Other Users (≠ disciplines, public); Data Stewardship Facility]

Page 14: RDC - Benoit Pierenne: Data Interoperability I

DSF for users

❖ Address the following:

❖ Too many data repositories for similar datasets

❖ poorly described results

❖ untraceable sources

❖ unreadable digital media

❖ “abandoned”, inaccessible records

❖ incomplete dataset description

Page 15: RDC - Benoit Pierenne: Data Interoperability I

DSF for users

❖ Are a one-stop-shop for data in a given discipline, and a portal to international resources

❖ Allow scientists to focus on science, not on data management

❖ Ensure stewardship of data beyond project funding

❖ Ensure data will remain citable

Page 16: RDC - Benoit Pierenne: Data Interoperability I

DSF for users

- Buy-in from users and PIs regarding:

- Development of trust with external entities managing their data

- The definition of a(n open) data policy, sharing of data

- Being thorough with data/experimentation description (Metadata)

- Realizing that data management is not achieved with a bit of hardware and software

In progress: use of clouds increasing

In progress: more and more open data policies around

Page 17: RDC - Benoit Pierenne: Data Interoperability I

DSF for Funding agencies

❖ Ability to achieve economies of scale

❖ DSF have expertise in data management and relevant science disciplines

❖ DSF have the wherewithal to remain at the leading edge of technology

❖ Users are already used to entrusting their data to “the Cloud” and to working with remote compute resources

❖ With similar international peers, have a voice at the interoperability and standards table

❖ Newest CFI Cyber Infrastructure program is a step in the right direction

Page 18: RDC - Benoit Pierenne: Data Interoperability I

Challenges For DSFs

❖ DSFs have to deal with users for whom the data volumes are unheard of!

Page 19: RDC - Benoit Pierenne: Data Interoperability I

Canadian DSF examples

❖ Canadian Astronomy Data Centre (CADC) is a great example of a discipline-specific Data Stewardship Facility

❖ Canadian Polar Data Network (CPDN) — includes multi-disciplinary data

❖ Canadian Research Data Centre Network (CRDCN) (social and population health statistics)

❖ …