USE AND REUSE
Research data locally and globally
Kevin Ashley, Digital Curation Centre
www.dcc.ac.uk
@kevingashley
[email protected]
Reusable with attribution: CC-BY
The DCC is supported by Jisc & FP7
Jan 26, 2015
Why does this matter?
• Research quality – How close can we get to the truth?
• Research speed – How quickly can we get to the truth?
• Research finance – How much does the truth cost?
• Improving one or more of these is of interest to all actors:
• Researchers as data creators
• Researchers as data reusers
• Research institutions
• Funders – hence government and society
2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY
The Data Deluge is upon us
Sensors’ ability to produce data outstrips IT’s ability to process it
Funders are making demands
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
EPSRC expects all those institutions it funds to develop a roadmap that aligns … with EPSRC’s expectations by 1st May 2012; to be fully compliant … by 1st May 2015.
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
Where are funders making demands?
• USA – NSF, NEH, some philanthropic funders
• UK
• Germany – DFG
• Europe – European Commission (H2020)
Often tied to requirements on open access to research publications, but not as common as those requirements.
To universities, that looks like a problem
• Funder requirements exist for a reason:
– That data is valuable
• Value to funder, society from reuse
• Value to the institution is there also
BIS business case: £1.5m investment in research data services pays back 2.5 times after 5 years
Research Data Centres – the solution!
MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM
Data centres deliver value
Want a 400% to 1200% return on your investment?
Try BADC!
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
Data reuse from Hubble
Don’t trust government
http://thetyee.ca/News/2013/12/23/Canadian-Science-Libraries/
Commercial services
Cloud – sorted!
• Sorry, but it isn’t.
• High-use datasets and long tail present different economic and technical challenges
• See David Rosenthal’s analysis of the economics of Amazon for preservation
“Distributed digital preservation in the cloud”, IJDC 8(1), 2013, doi:10.2218/ijdc.v8i1.248
Cost of data for 100 years – local vs Amazon S3
Data from blog.dshr.org/2013/01/talk-at-idcc2013.html © David Rosenthal, used under CC-BY-SA licence
Cost of data for 100 years – local vs Amazon S3 AND Glacier
Data from blog.dshr.org/2013/01/talk-at-idcc2013.html © David Rosenthal, used under CC-BY-SA licence
National responses – supporting universities
• USA – NSF initiatives (DataONE, SEAD, Data Conservancy et al)
• Australia – ANDS, RDSI
• UK – DCC, Jisc ‘Managing Research Data’ programmes
• Netherlands – Research Data Netherlands
• Canada – Research Data Canada
• Also grassroots or funder-led work in Finland, Denmark, Germany
UK – Jisc acts through DCC to help
DCC ‘institutional engagement’
Assess needs
Make the case
Develop support and services
RDM policy development
Customised Data Management Plans
DAF & CARDIO assessments
Guidance and training
Workflow assessment
DCC support team
Advocacy with senior management
Institutional data catalogues
Pilot RDM tools
…and support policy implementation
http://dataintelligence.3tu.nl/en/home/
http://www.sheffield.ac.uk/is/research/projects/rdmrose
Choice of RDM training materials for librarians
Up-skilling for data
http://datalib.edina.ac.uk/mantra/libtraining.html
Australian National Data Service
National Service, backed with university-level initiatives
Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated”
– See above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
These excuses bear a strong resemblance to those used by politicians and civil servants who argue against the release of government records
This is not a group you want to be compared with
Integrity
• Not everyone publishes here
• Almost all fraud connected to unavailable data
• People suffer & die due to research fraud
• When your research is reproducible – it gets cited
Integrity – not without data
• Cyril Burt
– Twin studies on intelligence.
– Questioned 1976; now discredited
• Duke case
– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
– Stapel – 55 publications – “fictitious data”
– Poldermans – fabricated data or negligence?
“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256
Should all data be open?
• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication
Gentleman’s data centres
• Some data centres have club-like behaviour
– Barriers to access
– Only for contributors
– Territorial
• Not without value, but barriers to progress
Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1
* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Can we find it?
• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Institutional catalogues, national data registries, national and international domain-specific services
Data discovery around the world
• Research Data Australia
• UK data registry pilot & Gateway2Research
• Research Data Netherlands
• World Data System
• re3data.org & databib.org – discovering repositories
Repository finders
A re3data record
A databib record
Other global work of note
• Domain initiatives such as Belmont Forum
• International generic groups – RDA, CODATA
• Problem-specific services – DataCite, EZID,…
[Diagram stages: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
• The 19th-century logs and photographs that help us model climate change
Often your data tells stories that your publications do not
3TU treasure chest
Thanks for your attention
[email protected]
@kevingashley