Page 1
Because good research needs good data
Supporting Research Data Management
at the University of Stirling
Graham Pryor and Martin Donnelly
Digital Curation Centre
27 April 2012
This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License
Page 2
The Digital Curation Centre is
• a consortium comprising units from the Universities of
Bath (UKOLN), Edinburgh (DCC Centre) and Glasgow
(HATII)
• launched 1st March 2004 as a national centre for
solving challenges in digital curation that could not be
tackled by any single institution or discipline
• funded by JISC
• with additional HEFCE funding from 2011 for
• the provision of support to national cloud services
• targeted institutional development
Page 3
The DCC Mission
Helping to build capacity, capability and skills in data management and curation
across the UK’s higher education research community
– DCC Phase 3 Business Plan
Page 4
DCC institutional stakeholders
University managers
Researchers
Research support staff with a role to play in data management, particularly those from
• University libraries
• IT services
• The research and innovation office
• Digital repositories
Page 5
Why manage research data?
The impact of e-Science and the global network
• “Research data is a form of infrastructure, the basis
for data intensive research across many domains” –
EC Riding the Wave report, 2010
• “Funders expect research to be international in
scope. A third of all articles published are
internationally collaborative” – Royal Society, 2011
The governmental and funder imperative
• “Publicly-funded research data must be made
available for secondary scientific research” – ESRC
research data policy
Page 6
Why manage research data?
The researcher incentive
• “By making their data available via licensed
platforms researchers stand to improve their
status as researchers through the mandatory
citing and attribution of their original work”
– Mark Hahnel, FigShare, IDCC 2011
The same demanding, sometimes competing
community of perspectives that the Digital Curation
Centre was created to unravel…
Page 8
Where is the data in research?
The six datacentric phases of the research lifecycle
Page 9
Reflections: the research data lifecycle
Page 10
Three perspectives
Scale and complexity
– Volume and pace
– Infrastructure
– Open science
Policy
– Funders
– Institutions
– Ethics & IP
Management
– Storage
– Incentives
– Costs & Sustainability
Image: http://www.nonsolotigullio.com/effettiottici/images/escher.jpg/
Page 11
The data deluge
“Surfing the Tsunami”, Science, 11 February 2011
Page 12
Challenges of scale and complexity
– transformation and globalisation
Page 13
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009
“For science to effectively function,
and for society to reap the full
benefits from scientific endeavours,
it is crucial that science data be
made open”
Page 14
Open to all? Case studies of openness
in research
Choices are made according to context, with
degrees of openness reached according to:
• The kinds of data to be made available
• The stage in the research process
• The groups to whom data will be made
available
• On what terms and conditions it will be
provided
Default position of most:
• YES to protocols, software, analysis tools,
methods and techniques
• NO to making research data content freely
available to everyone
After all, where is the incentive?
Angus Whyte, RIN/NESTA, 2010
Page 15
“While many researchers are positive about sharing data in
principle, they are almost universally reluctant in practice. …
using these data to publish results before anyone else is the
primary way of gaining prestige in nearly all disciplines.”
– INCREMENTAL Project
“Data sharing was more readily discussed by early career
researchers.”
Page 16
Rules and regulations…
Compliance
Data Protection Act 1998
• Rights, Exemptions, Enforcement
Freedom of Information Act 2000
• Climategate, Tree Rings, Tobacco and… (what’s next?)
Computer Misuse Act 1990
• etc. etc. etc…
Page 17
Policy
• Public good
• Preservation
• Discovery
• Confidentiality
• First use
• Recognition
• Public funding
Page 18
RCUK Policy and Code of Conduct on the
Governance of Good Research Conduct (updated Oct 2011)
UNACCEPTABLE RESEARCH CONDUCT includes mismanagement or
inadequate preservation of data and/or primary materials, including failure
to:
• keep clear and accurate records of the research procedures followed
and the results obtained, including interim results;
• hold records securely in paper or electronic form;
• make relevant primary data and research evidence accessible to
others for reasonable periods after the completion of the research:
data should normally be preserved and accessible for 10 yrs (in some
cases 20 yrs or longer);
• manage data according to the research funder’s data policy and all
relevant legislation;
• wherever possible, deposit data permanently within a national
collection.
Responsibility for proper management and preservation of data and primary
materials is shared between the researcher and the research organisation.
Page 20
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
EPSRC’s nine expectations and
a roadmap - implications for HEIs
Page 21
DCC policy summary
http://www.dcc.ac.uk/resources/policy-and-legal
Page 22
Regulation, regulation…
…addressing where European copyright and database law poses
flaws and obstacles to the access to research data
Intellectual Property Rights and Digital Preservation
21.11.2011 at the Clifton Hill House, Bristol University
“a poor fit between technology, processes and
regulations constrains preservation actions and
significantly inhibits the benefits which long-term
access ought to deliver”
Page 23
Data access as headline news
JISC Legal
Page 24
Management – infrastructure and
data storage challenges...
The case for cloud computing in genome
informatics. Lincoln D Stein, May 2010
Scalable
Cost-effective (rent on-demand)
Secure (privacy and IPR)
Robust and resilient
Low entry barrier / ease-of-use
Has data-handling / transfer / analysis capability
Cloud services?
Page 25
“Departments don’t have guidelines or
norms for personal back-up and researcher
procedure, knowledge and diligence varies
tremendously. Many have experienced
moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
Page 26
Management - incentivisation,
recognition and reward
Page 27
Management -
costs, benefits
and value
Page 28
Help desk:
0131 651 1239
[email protected]
www.dcc.ac.uk
Page 29
DCC Institutional Support:
Tools and Services
Martin Donnelly
Digital Curation Centre
University of Edinburgh
University of Stirling, 27 April 2012
Page 30
Institutional Engagements
With funding from HEFCE we’re:
• Working intensively with 18 HEIs to increase RDM capability
– 60 days of effort per HEI drawn from a mix of DCC staff
– Deploy DCC & external tools, approaches & best practice
• Support varies based on what each institution wants/needs
• Lessons & examples to be shared with the community
www.dcc.ac.uk/community/institutional-engagements
Page 31
Some current IE activities
• Assessing needs
• RDM roadmaps
• Piloting tools, e.g. DataFlow
• Policy development
• Policy implementation
Page 32
Support offered by the DCC
• Assess needs
• Make the case
• Develop support and services
DCC support team:
• RDM policy development
• Customised Data Management Plans
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy to senior management
• Institutional data catalogues
• Pilot RDM tools
• …and support policy implementation
Page 33
DATA MANAGEMENT STRATEGY
(Research and Admin)
Five components:
• Policy
• Advocacy
• Planning
• Tools
• Training
Page 35
Your Data as Assets: DAF
• What are the characteristics of
research data assets?
– Number?
– Scale?
– Complexity?
– Dependencies?
– Liabilities?
• Why do researchers act the way they
do with respect to data?
• What do they need to do research?
Page 36
IN BRIEF
The Data Asset Framework provides a methodology
and online tool to identify research data assets and
find out how they are being managed. This
information will enable institutions to develop a data
strategy so their assets are preserved and remain
accessible in the long term. It is usually applied at
research group / department level to ensure the
scope is manageable.
URL: http://www.data-audit.eu
Page 37
Data Management Planning:
DMP Online
• A growing requirement from
funders, publishers and HEIs,
in the UK and internationally
• Supportive of good research
practice, according to RCUK
• A cross-cutting activity
involving multiple stakeholder
types (researchers, librarians,
IT managers, support staff)
Page 38
IN BRIEF
DMP Online is the DCC's web-based data
management planning tool. It allows you to build and
edit DMPs according to the requirements of the
major UK funders.
The tool also contains helpful guidance and links for
researchers and other data professionals. The
structure of the tool is based on the DCC’s Checklist
for a Data Management Plan.
URL: http://www.dcc.ac.uk/dmponline
Page 40
Capacity Assessment and Building: CARDIO
• How well does an institution (or department, School, etc.)
manage its data?
• Depends on:
– Finances
– Technology
– Policy management
– Organisational will
• Demands acknowledgement of many perspectives
Page 41
IN BRIEF
An online tool which helps departments or research
groups to identify and communicate their current data
management capabilities, and subsequently identify
coordinated pathways for future enhancement via a
dedicated knowledge base.
CARDIO emphasises a collaborative, consensus-
driven approach, and enables benchmarking with
other groups and institutions.
URL: http://cardio.dcc.ac.uk/
Page 43
Risk Management: DRAMBORA
• A variety of risk factors, both internal and external, affect the management of digital objects such as research data
• Risks can be tangible (fire/flood) or intangible (accidental data loss leading to reputational impact)
• They may exist in isolation, or lead to other risks if not adequately managed
Page 44
IN BRIEF
DRAMBORA is an audit methodology and tool for
identifying and planning for the management of risks
which may threaten the availability and/or usability of
content in a digital repository or archive.
URL: http://www.repositoryaudit.eu
Page 45
DCC Services
• Policy
• Strategy
• Training
• Other services…
Page 46
Policy (i)
The DCC has a number of guidance resources related to
research data policy. We can guide institutions on their
requirements to manage/share data, and offer practical
steps to help them develop data policies by:
- Providing templates and examples to demonstrate
what aspects could be incorporated into a data policy;
- Coordinating / contributing to meetings of relevant
stakeholders to ensure all activities and perspectives are
addressed;
- Reviewing and feeding back on draft policies;
- Assisting with communications to launch and
implement the policy.
Page 47
Policy (ii)
Benefits of developing a data policy:
- Compliance with funder guidelines, e.g. the EPSRC
expectation that HEIs have an RDM roadmap in place by
May 2012, and be fully compliant by May 2015;
- Assuring the good conduct of research in line with
Research Integrity guidelines (see RCUK & UKRIO docs);
- Clarity for researchers and demonstrable institutional
commitment to RDM;
- The prestige of joining a small but growing group of
leading institutions with a data policy:
http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
Page 48
Strategy (i)
We offer a half-day workshop in which key stakeholders
from an institution (e.g. librarians, senior IT staff, research
administration, repository staff, researchers, etc) convene
to discuss and develop an institutional strategy for RDM.
Benefits:
- Coherence across service providers and agreed
direction for RDM services;
- Ability to reference strategy / commitment to RDM (the
University of Oxford policy may be a useful example of
this - http://www.admin.ox.ac.uk/rdm);
- A move towards more efficient management of data.
Page 49
Strategy (ii)
Through practical breakout sessions, senior DCC staff can
lead and mediate discussion to help the institution
determine its priorities and define practical next steps.
These might include the development of infrastructure (e.g.
data repositories), new services (e.g. DMP support), policy
development, improved guidance or data management
training provision.
Suggested actions will depend on gaps/areas for
improvement as perceived by the institution.
Page 50
Training (i)
We offer a variety of training courses:
- DC101 introduction to data management
- Tools of the Trade courses which give practical
overviews and hands-on exercises using DCC tools
- Train-the-Trainer, which equips information professionals
to teach RDM courses.
We also organise regional data management roadshow
events which can incorporate a training element.
Generic training materials are available online, and
hardcopy packs can be produced.
Page 51
Training (ii)
The DCC can:
- Run courses, tailoring content to institutional needs;
- Assist in the development of online learning materials
(screencasts, audio-synced slides);
- Develop resources such as guidance documents, case
studies and manuals.
Key benefits of training provision are:
- Improved data management capacity;
- The opportunity to profile and raise awareness of
institutional support services.
Page 52
Other services...
• CARDIO: Used at research group or department level to assess activity and
data management infrastructure and contribute to an institution-wide view
• Data Asset Framework: DAF is a structured mechanism used to identify what
data exists and understand how research data are being managed and shared
• Customised DMP: We can work with you to develop an institution-specific
instance of DMP Online for developing data management plans that fit funder
requirements before and after an award of grant
• Policy development: We can assist in the development of institutional policy
• Workflow assessment: Using tested methodologies we can analyse current
research data workflows
• Training: We can train people in the use of many of the above tools and in
generic skills such as data quality assessment
• Costing: We can assist with the development of costing and pricing for data
management services
• Risk management: Working with you to identify risks in current or planned
research data management practice, we will make recommendations on
mitigation and the elimination of those risks
• Institutional data catalogues: We can recommend options for exposing
metadata about your research data via CRIS systems, repositories, or a mix
of these
Page 53
Recap: support offered by the DCC
• Assess needs
• Make the case
• Develop support and services
DCC support team:
• RDM policy development
• Customised Data Management Plans
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy with senior management
• Institutional data catalogues
• Pilot RDM tools
• …and support policy implementation
Page 54
Practicalities
• University Modernisation Fund provides
resource for 18 “institutional engagements”
between DCC and HEIs
• Up to 60 days of effort available per
institution, between now and March 2013
• Institution agrees a schedule of work with
the DCC, and each assigns a primary
contact / programme manager
Page 55
Questions and Thanks
For more information:
– Visit http://www.dcc.ac.uk
– Email [email protected] or
[email protected]
This work is licensed under a Creative Commons Attribution 2.5
UK: Scotland License. © Digital Curation Centre 2012