(Open) Research Data Management in H2020
ISERD – Tel Aviv, Oct 31, 2016
@openaire_eu
Natalia Manola, OpenAIRE managing director
Athena Research & Innovation Centre
Credits: The OpenAIRE team; Sara Jones, Digital Curation Centre (DCC), UK; Marjan Grootveld, DANS, NL
Outline
• OpenAIRE – who are we?
• H2020 policies – what’s involved?
• Research data management – what about?
• Data Management Plan (DMP) – how to?
• Lessons learnt – what to avoid?
• OpenAIRE – what do we offer?
RDM Seminar @ ISERD, Tel Aviv - Oct 1, 2016 2
Who we are
• EU project
• In 24x7 operation since Dec 2010
  • OpenAIRE
  • OpenAIREplus
  • OpenAIRE2020
• Consortium of 50 partners
• One of 5 key EU e-Infrastructures
• A legal entity in 2017
Open Access experts
• Institutional, national and international perspectives on OA policies & e-Infrastructures
Information & Computer Science experts
• Building efficient e-Infra technologies
• State of the art technologies (big data, linked data)
Legal experts
• Legal & policy recommendations
Data communities
• Best practices for data
• Linking to data infrastructures
Human Network
50 partners from every EU country, and beyond
Data centers, universities, libraries, repositories, legal experts
Digital Network
… fosters the social and technical links that enable Open Science in Europe and beyond
National Open Access Desks (NOADs)
33 OA expert nodes across Europe
• (OA) policy alignment
• Technical assistance
• Training
Integrated Scientific Information System
• 17.2 million unique publications
• 760 validated data providers
• 370K publications linked to projects from 6 funders
• 28K datasets linked to publications or funders
• 3.5K links to software
H2020 Policies
The following European Commission branded slides come from the EC’s open access team and provide an overview of the key points. Content from Jean-Francois Dechamp and colleagues.
• Guidelines on Data Management in Horizon 2020: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Annotated model grant agreement, clause 29.3: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/amga/h2020-amga_en.pdf
• New infographic summarising key policy points: http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
Findable
• Use metadata and specify standards for metadata creation (if any). If there are no standards in your discipline, describe what type of metadata will be created and how
• Search keywords
• Persistent and unique identifiers such as DOIs or other handles
• File and folder naming conventions
• Versioning of the datasets and clear version numbers
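The naming and versioning points above can be made concrete with a small sketch. The convention used here (project, description, date, version, extension) is purely an assumption for illustration; the slides do not prescribe one, and the helper names `build_name` and `parse_name` are hypothetical.

```python
import re
from datetime import date

# Assumed convention: <project>_<description>_<YYYYMMDD>_v<major>.<minor>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, description, created, major, minor, ext):
    """Compose a file name that follows the assumed convention."""
    return f"{project}_{description}_{created:%Y%m%d}_v{major}.{minor}.{ext}"


def parse_name(name):
    """Return the parts of a name as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


name = build_name("demo", "survey-responses", date(2016, 10, 31), 1, 0, "csv")
print(name)
print(parse_name(name))
print(parse_name("Final data (2).xlsx"))
```

Whatever convention a project picks, writing it down in the DMP and checking file names against it automatically keeps version numbers unambiguous for later re-users.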
Metadata and documentation
• Metadata and documentation are needed to find and understand research data
• Think about what others would need in order to find, evaluate, understand, and reuse your data
• Get others to check the metadata to improve quality
• Use standards to enable interoperability
Where to find metadata standards
Metadata Standards Directory
Broad, disciplinary listing of standards and tools
Maintained by RDA group
http://rd-alliance.github.io/metadata-directory
Biosharing
A portal of data standards, databases, and policies for life, environmental and biomedical sciences
https://biosharing.org
Accessible
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions, e.g., through a data committee, a license, or arranged with the repository
• Will methods or software tools needed to access the data (if any) be included or documented?
• Deposit the data and associated metadata, documentation and code preferably in certified repositories which support Open Access
Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
What to deposit?
a. the data needed to validate the results presented in scientific publications, including the metadata;
b. any other data, including the metadata, as specified in the DMP;
c. plus for a-b the documentation and the tools that are needed to validate the results, e.g. specialised software or software code, algorithms and analysis protocols (when possible, these instruments themselves).
Choose appropriate file formats
If you want your data to be re-used and sustainable in the long term, you typically want to opt for open, non-proprietary formats.
[Table: Type | Recommended | Avoid for data]
File format considerations
• No clear-cut definition of a “sustainable file format”
• Each archive has its own expertise, relative to its designated community
Examples:
What should be preserved and shared?
• The data needed to validate results in scientific publications (minimally!).
• The associated metadata: the dataset’s creator, title, year of publication, repository, identifier etc.
• Follow a metadata standard in your line of work, or a generic standard, e.g. OpenAIRE or DataCite, and be FAIR.
• Documentation: code books, lab journals, informed consent forms – domain-dependent, and important for understanding the data and combining them with other data sources.
• Software, hardware, tools, syntax queries, machine configurations – domain-dependent, and important for using the data. (Alternative: information about the software etc.)
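As a rough illustration of the associated metadata listed above, the sketch below checks a record for the five elements the slide names. The field names and the sample record are assumptions made for this example; they are not a formal OpenAIRE or DataCite schema.

```python
# Field names mirror the slide's list and are assumptions, not a formal schema.
REQUIRED_FIELDS = ("creator", "title", "publication_year", "repository", "identifier")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical record for a dataset that has no persistent identifier yet.
record = {
    "creator": "Example, Researcher",
    "title": "Example interview dataset",
    "publication_year": 2016,
    "repository": "Zenodo",
    "identifier": "",
}
print(missing_fields(record))
```

A check like this is only a floor: a disciplinary metadata standard will usually ask for considerably more than these five elements.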
Basically, everything that is needed to replicate a study should be available. Plus everything that is potentially
• When regenerating data is cheaper than archiving, don’t archive. Select what data you’ll need and want to retain.
• 10 years is often stated in data policies and academic codes, but data can be valuable for much longer, in climatology, sociology, health sciences, astronomy, linguistics, … Look beyond minimal retention periods where relevant.
• “The lifetime of software is generally not as long as that of data” (Daniel Katz et al., http://bit.ly/2eScCKp)
How much does it cost? Who pays?
• What are the costs for making data FAIR in your project?
• Resources needed for long-term preservation
• Check the UK Data Service costing model
• The High Level Expert Group on the European Open Science Cloud recommends that “well budgeted data stewardship plans should be made mandatory and we expect that on average about 5% of research expenditure should be spent on properly managing and stewarding data”
• Who pays? How?
UKDS model: http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
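To get a feel for the 5% figure quoted above, the arithmetic can be sketched as follows. The project budget is a made-up number, and the function name `stewardship_budget` is hypothetical; actual costs depend on the project and should be worked out with a costing model.

```python
def stewardship_budget(total_budget_eur, percent=5):
    """Return the amount to reserve for data management and stewardship."""
    return total_budget_eur * percent / 100


# Hypothetical project budget, used only to illustrate the arithmetic.
project_budget_eur = 2_000_000
print(stewardship_budget(project_budget_eur))  # 100000.0
```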
What is a data management plan?
A brief plan written at the start of a project to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data
The DMP is a living document. You are not required to provide detailed answers to all the questions in the first version of the DMP (due M6).
Explain any selection criteria in the DMP
DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020 – FAIR principles
https://dmponline.dcc.ac.uk
Or choose your funder to get their specific template
Pick your uni to add local guidance and to get their template if no funder applies
Choose any additional optional guidance
DMPonline for H2020
Focus on how you will ensure your data are “FAIR”
KEEP IT UP TO DATE
Example DMP plans
Example plans
• 108 DMPs from the National Endowment for the Humanities
Travelling wave tube based W-band wireless networks with high data rate distribution, spectrum & energy efficiency
The academic institutions participating in TWEETHER have available appropriate depositories which in fact are linked to OpenAIRE.
Apart from these repositories, TWEETHER will also use ZENODO to ensure the maximum dissemination of the information generated in the project (research publications and data)…
Example metadata description
Minimal metadata
More than one dataset? Describe generically what you can, and per dataset what you must.
Data description examples
The final dataset will include self-reported demographic and behavioural data from interviews with the subjects and laboratory data from urine specimens provided.
NIH data sharing statements
Every two days, we will subsample E. affinis populations growing under our treatment conditions. We will use a microscope to identify the life stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook and then copy the data into an Excel spreadsheet. The Excel spreadsheet will be saved as a comma separated value (.csv) file.
DataOne – E. affinis DMP example
Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively.
ICPSR Framework for Creating a DMP
We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and works well for the types of data we will be producing. We will create these metadata using Morpho software, available through KNB. The metadata will fully describe the data files and the context of the measurements.
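The intermediate step in the example above, a .txt file stored with the data that describes columns, units and missing value identifiers, could look roughly like the sketch below. The column names, units and the `write_data_dictionary` helper are hypothetical; this does not produce EML, which the example plan creates later with dedicated software.

```python
import tempfile
from pathlib import Path


def write_data_dictionary(data_file, columns, missing_value="NA"):
    """Write a plain-text description of each column next to the data file."""
    lines = [
        f"Data dictionary for {data_file.name}",
        f"Missing values are coded as: {missing_value}",
        "",
    ]
    for name, unit, description in columns:
        lines.append(f"{name} [{unit}]: {description}")
    sidecar = data_file.with_suffix(".txt")
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar


# Hypothetical columns for a subsampling spreadsheet saved as .csv.
columns = [
    ("sample_date", "ISO 8601 date", "day the population was subsampled"),
    ("life_stage", "category", "life stage identified under the microscope"),
    ("sex", "category", "sex of the individual, if it could be determined"),
]

with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_data_dictionary(Path(tmp) / "subsample.csv", columns)
    print(sidecar.name)
    print(sidecar.read_text(encoding="utf-8"))
```

Keeping the description in a file with the same base name as the data file means the two travel together when the data are deposited.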
We will make the data and associated documentation available to users under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.
NIH data sharing statements
The videos will be made available via the bristol.ac.uk website (both as streaming media and downloads). HD and SD versions will be provided to accommodate those with lower bandwidth. Videos will also be made available via Vimeo, a platform that is already well used by research students at Bristol. Appropriate metadata will also be provided to the existing Vimeo standard.
All video will also be available for download and re-editing by third parties. To facilitate this Creative Commons licenses will be assigned to each item. In order to ensure this usage is possible, the required permissions will be gathered from participants (using a suitable release form) before recording commences.
Restriction on use examples
Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement.
NIH data sharing statements
1. Share data privately within 1 year. Data will be held in Private Repository, but metadata will be public
2. Release data to public within 2 years. Encouraged after one year to release data for public access.
3. Request, in writing, data privacy up to 4 years. Extensions beyond 3 years will only be granted for compelling cases.
4. Consult with creators of private CZO datasets prior to use. PIs required to seek consent before using private data they can
Data will be provided in file formats considered appropriate for long-term access, as recommended by the UK Data Service. For example, SPSS portable format and tab-delimited text for qualitative tabular data and RTF and PDF/A for interview transcripts. Appropriate documentation necessary to understand the data will also be provided. Anonymised data will be held for a minimum of 10 years following project completion, in compliance with LSHTM’s Records Retention and Disposal Schedule. Biological samples (output 3) will be deposited with the UK BioBank for future use.
Writing a Wellcome Trust Data Management and Sharing Plan
The investigators will work with staff at the UKDA to determine what to archive and how long the deposited data should be retained. Future long-term use of the data will be ensured by placing a copy of the data into the repository.
Start early
• Negotiation on licenses and consent agreements may preclude later sharing if not handled carefully
• Costs cannot be included retrospectively
• Useful to consider data issues at the consortium negotiation stage to make sure potential issues are identified and sorted asap
• Involve all work packages and partners to get a coherent plan
• Focus effort on datasets you’ll create rather than reuse
Decisions made early on affect what you can do later
Think backwards
What data organisation would a re-user like?
Data lifecycle: creating data → processing data → analysing data → preserving data → giving access to data → re-using data
Think about the desired end result and plan for this
“Sharing” means “outside the consortium”
Find who is responsible for what
• Institution: RDM policy, facilities, €$£
• Research funders
• Publishers: data availability policy
• Commercial partners
https://www.openaire.eu/briefpaper-rdm-infonoads
• Zenodo for all types of publications, data and software
• Claiming – linking research results
• Amnesia, an anonymization tool for all
• Data providers – Interoperability Guidelines, validation, …
• Project coordinators – reporting
• Funders and institutions – monitoring
• Research communities – gathering, monitoring all research
[Figure: OpenAIRE platform with dashboards, labelled “An EU-CRIS system”. Data providers: literature repositories, data repositories, OA journals, archives, aggregators, CRIS systems, funding info, … Content: metadata, full text, usage data. Platform steps: validation, cleaning, de-duplicating, enriching, inferring, linking, classification, clustering, analysis. Entities: organizations, projects, authors, datasets, publications, data providers. Services: discovery, crowdsourcing, APIs, routing, guidelines, monitoring, reporting, evaluation, impact, trends.]
Integrated Scientific Information System
• 17.3 million unique publications
• 760+ validated data providers
• 370K publications linked to projects from 6 funders
• 28K datasets linked to publications
• 3.5K links to software repositories
• 33K organizations
OpenAIRE-Connect
From January 2017
Organizations, projects, authors, datasets, publications, data providers, software, facilities, methods, research communities
World-wide alignment & synergies
Interoperability alignment, sharing technologies & services
• LA Referencia – Latin America repository network
• JAIRO – Japanese Institutional Repositories Online
• REMERI – Mexican Network of Institutional Repositories
• …
• ICSU/World Data System – A network of 70 certified data centers