Jul 03, 2015
Inverting the Pyramid:
Kevin Ashley Digital Curation Centre
www.dcc.ac.uk @kevingashley
Reusable with attribution: CC-BY
The DCC is supported by Jisc
Maximising the value of research data to society
My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
2014-11-25 Kevin Ashley –IMCW/ICKM-2014, Antalya - CC-BY 2
DCC networks and partnerships
Original Slide: Martin Donnelly, DCC
About me
• 35 years ago – a mathematician in medical research
• Acquired a skill for rescuing old data:
– Lost code books
– Lost programs
– Bad or obsolete media or systems
• It was fun – but it should not have been necessary
Generic science data lifecycle
Adapted from: Harnessing the Power of Digital Data: Taking the Next Step. Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM.
PLAN COLLECT INTEGRATE/TRANSFORM
PUBLISH DISCOVER ARCHIVE/DISCARD
E-Science curation report - 2003
Hervé L’Hours’ analysis
• Data lifecycles are linear, cyclical or spiral (sometimes all three)
• See more at http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf11 - workflows & research data management
• Linear cycles are project-based or repository-based
Traditional knowledge management view of data
Image © John Curran @ designedforlearning.co.uk
Image from forwardmotion.eu
But in research…
"DIKW-diagram" by RobOnKnowledge - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:DIKW-diagram.png
I ♥ your data!
I don’t ♥ what you said about it.
LIDAR & RADAR images of ice cloud – H. Ruschennberg
The Old Weather project
Data for research, not from research
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
Data reuse - messages
• Often your data tells stories that your publications do not
• Not all data comes from other researchers
• One person’s noise is another person’s signal
• Discipline-bounded data discovery doesn’t give us all we need or want
Understanding Biodiversity
• We don’t understand what drives it
• What helps or hinders speciation
• No one project or data source is enough
• Biology, geology, climate science, chemistry…
• Big and small problems
• Reanalysis & gap analysis
Research on Biodiversity…
• Requires many different data sources
• Not all will be published
• Not all publications are for similar research reasons, so…
• Citing the publication is irrelevant
• Some is research data; some is government or reference data
Why care?
• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements
Why does this matter?
• Research quality – How close can we get to the truth?
• Research speed – How quickly can we get to the truth?
• Research finance – How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
Creative data reuse
• http://vimeo.com/38402965
Integrity – not without data
• Cyril Burt
– Twin studies on intelligence
– Questioned 1976; now discredited
• Duke case
– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
– Stapel – 55 publications – “fictitious data”
– Poldermans – fabricated data or negligence?
2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 23
“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256
Without data reuse:
• We can waste billions
• People suffer & die
Data reuse from Hubble
Data reuse is already happening – and researchers can change
Where can it happen?
Global, international
Nationally
Institution
By Subject
Research Group
Research data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%
• Unfortunately – many research domains have no relevant data centres
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
“Provision for data management, for curation and long-term preservation, and for the sharing and re-use of data, varies wildly between subject areas.”
“The data management needs of many researchers are little considered or catered for.”
If greater provision is to be made, a shortfall in infrastructure (both technical and human) must be overcome.
Policy makers are aware that in many areas of enquiry, researchers’ access to well-managed, open and reusable data opens up significant opportunities.
All from JISC MRD 2 call, 2010
The library as custodian
• Increasing role for library to provide access to institutional assets
• See Lorcan Dempsey’s thoughts on the inside-out library vs outside-in library
– http://www.slideshare.net/lisld/the-inside-out-library
• Build on library strengths – preservation, access, curation, selection
G8 UK – endorses OA
Open Data Charter – Policy Paper, 18 June 2013
Funder requirements
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
UK – RCUK
Canada
USA – NSF, NEH, etc.
Denmark
USA – non-government funders (Sloan, Gates, …)
Europe
RCUK policy - The 1-minute version
• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargoes OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.
EPSRC policy points
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
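The last point is a rule a repository could check automatically. A minimal sketch in Python, assuming a "last use" date is recorded for each dataset; the function names are hypothetical and nothing here is taken from EPSRC's own text beyond the 10-year figure on the slide:

```python
from datetime import date

# Minimum retention period stated on the slide: 10 years from last use.
RETENTION_YEARS = 10


def earliest_disposal_date(last_use: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which the minimum retention period has elapsed."""
    try:
        return last_use.replace(year=last_use.year + years)
    except ValueError:
        # last_use was 29 February and the target year is not a leap year,
        # so roll forward to 1 March (an assumption, not a policy statement).
        return last_use.replace(year=last_use.year + years, month=3, day=1)


def must_still_preserve(last_use: date, today: date) -> bool:
    """True while the dataset is still inside its minimum retention period."""
    return today < earliest_disposal_date(last_use)


if __name__ == "__main__":
    last_use = date(2014, 11, 25)
    print(earliest_disposal_date(last_use))
    print(must_still_preserve(last_use, date(2020, 1, 1)))
```

Note that every new access moves the date forward again, so data that keeps being reused is never eligible for disposal under this rule.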
Compliance expected by 2015
DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal
Helping make data reuse possible – experience from the DCC
Some lessons – a summary
• Data reuse is rarely as simple as people think it is
• It is already happening
• It is good for research, for researchers, for funders, for universities
• Without senior management attention and researcher involvement, your initiative will fail
• Research data management services cannot involve the library alone
• Researchers need to know your services exist
• Training for young researchers in good data practice is valuable
DCC ‘institutional engagement’
Assess needs
Make the case
Develop support and services
RDM policy development
Customised Data Management Plans
DAF & CARDIO assessments
Guidance and training
Workflow assessment
DCC support team
Advocacy with senior management
Institutional data catalogues
Pilot RDM tools
…and support policy implementation
Original Slide: Graham Pryor, DCC
Some institutional roles
• Leadership – coordinate action
• Audit – who has what, where does it go?
• Advice on access – data, wherever it is
• Preservation – permanence
• Citability
• Data/publication linking
• Promoting data in teaching
• Selection
• Education – early career researchers
Who (in the UK) is leading RDM work?
Library
IT
Research Office
RESEARCHERS
INSTITUTIONAL SERVICES
Some example services
• Storage – persistent, shareable
• Permanent, citable identifiers
• Database as a service (e.g. Oxford ORDS)
• Embed tools in Excel – DataUp, others
• Workflow management – Taverna
• Training for early career researchers
Make data creation easier
Make data citable
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240% (social sciences) *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Heather Piwowar, Todd J. Vision (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Edwin Henneken, Alberto Accomazzi (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Make data discoverable
• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Services include:
– Institutional catalogues
– national data registries
– Repository registries – databib, re3data
Dataverse – helping researchers make data findable & reusable
Gking.harvard.edu/data
DCC guidance
http://dataintelligence.3tu.nl/en/home/
Choice of RDM training materials for librarians
Up-skilling for data
http://datalib.edina.ac.uk/mantra/libtraining.html
2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 52
What data to keep
The Data Deluge is upon us
Sensors’ ability to produce data outstrips IT’s ability to process it
Roles and Responsibilities
What data to keep
IDCC15 – London, Feb 9-12 2015
http://www.dcc.ac.uk/events/idcc15
The 10th International Digital Curation Conference
My message to researchers
• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits
• The story doesn’t end with a publication