Software – a different kind of research object?
http://dx.doi.org/10.6084/m9.figshare.5459542
3rd October 2017, Lancaster Data Conversations, Lancaster
Neil Chue Hong (@npch), Software Sustainability Institute (www.software.ac.uk)
ORCID: 0000-0002-8876-7606 | [email protected]
Slides licensed under CC-BY where indicated
Supported by Project funding from
What’s software got to do with my research?
The research community relies on software
Do you use research software?
What would happen to your research without software?
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014.
406 respondents covering a representative range of funders, disciplines and seniority.
56% develop their own software
71% have no formal software training
Software in Nature
Nangia and Katz: https://arxiv.org/pdf/1706.06527.pdf
Repeatability of published microarray gene expression analyses
56% of analyses could not be repeated, of which 30% were because of software issues. 50% did not state the software version, and 39% did not provide raw data.
Only 11% could be reproduced satisfactorily.
Ioannidis et al. Nature Genetics, 41, 2009. doi:10.1038/ng.295
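Since missing software versions were one of the reported obstacles, one low-effort habit is to save a version record alongside each set of results. The following is a minimal sketch, not a prescribed method: the function name `describe_environment` is hypothetical, and the package name passed in at the bottom is only an example.

```python
import platform
import sys
from importlib import metadata


def describe_environment(package_names):
    """Return the Python version and the installed version of each package."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of failing.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only an example; list the packages your analysis imports.
    print(describe_environment(["pip"]))
```

Writing this dictionary to a file next to the results gives a reader the versions needed to attempt a repeat of the analysis.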
Repeatability in Computer Science
Of 401 papers in ACM Computer Science journals and proceedings, only 85 provided a link to software. For 176 the software could not be obtained.
Collberg, Proebsting, Warren, University of Arizona TR 14-04, 2015 http://reproducibility.cs.arizona.edu/v2/RepeatabilityTR.pdf
Errors due to bioinformatics pipeline
The results presented in the Report “Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent” were affected by a bioinformatics error, which was identified because of open science.
Llorente et al. Science, 350, 6262. doi:10.1126/science.aad2879
Isn’t software just a type of data?
Authorship Lifecycle
[Cycle diagram; labels: Identify, Cite, Reuse, Research, Index]
Papers, data and software are all research outputs of a continuous cycle.
With software, technology makes it easier to track, but not reward.
We cannot separate papers, data and software when we release research.
http://openresearchsoftware.metajnl.com
The current process
[Flow diagram steps: Start research; Write software; Use software; Produce results; Publish research paper (which mentions software and data); Release data; Release software]
This process is simple but does not reward production or reuse of good software and data.
It also has a long contribution cycle.
A better process?
[Flow diagram steps: Start research; Identify existing software; Write software; Adapt/extend software; Use software; Produce results; Publish research paper (which references software and data papers); Release data; Release software; Publish software paper; Publish data paper]
Software and data papers are needed as proxies for rewarding reuse.
But it enables a shorter contribution cycle for data and software.
Boundary
What do we choose to identify:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
http://dx.doi.org/10.6084/m9.figshare.1497930
Granularity
[Diagram levels: Algorithm; Function; Program; Library / Suite / Package; …]
http://dx.doi.org/10.6084/m9.figshare.1497930
Versioning
[Diagram labels: Personal v1; Personal v2; Personal v3; Personal v2a; Personal v3a; Public v1; Public v2; Public v3]
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
http://dx.doi.org/10.6084/m9.figshare.1497930
Authorship
• Which authors have had what impact on each version of the software?
• Who had the largest contribution to the scientific results in a paper?
http://beyond-impact.org/?p=175
OGSA-DAI project statistics from Ohloh
http://dx.doi.org/10.6084/m9.figshare.1497930
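One crude way to start on the first authorship question is to count commits per author from version control history. The sketch below assumes the input is text with one author name per line, such as the output of `git log --format=%an` (optionally limited to a revision range between two version tags). The function names and the sample log are hypothetical, and commit counts are only a rough proxy for contribution: they say nothing about who contributed most to the scientific results.

```python
from collections import Counter


def commits_per_author(log_text):
    """Count commits per author from one-author-name-per-line log output."""
    authors = [line.strip() for line in log_text.splitlines() if line.strip()]
    return Counter(authors)


def contribution_shares(log_text):
    """Return each author's fraction of all commits in the log."""
    counts = commits_per_author(log_text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {author: count / total for author, count in counts.items()}


# Hypothetical sample standing in for real version control output.
SAMPLE_LOG = """Alice
Bob
Alice
Alice
"""

if __name__ == "__main__":
    print(contribution_shares(SAMPLE_LOG))
```

Running the same count separately for each released version gives a per-version picture, which is closer to the question the slide asks.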
If software is so important, why is most of it hard to reuse?
The Software Sustainability Institute
A national facility for cultivating better, more sustainable, research software to enable world-class research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
Supported by EPSRC Grant EP/H043160/1 + EPSRC/ESRC/BBSRC grant EP/N006410/1
Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/
Special Issue: Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
Research Culture Needs Changing
“This particular project was something I wrote a couple years ago to help me out with a workflow… I’d put it up on Github, so that others could potentially use it or use the code. So I went to see what people were saying about this project. It seemed like I’d done something fundamentally wrong, so stupid that it flabbergasts someone... So of course I start sobbing. Then I see these people’s follower count, and I sob harder. I can’t help but think of potential future employers that are no longer potential.”
Our research culture presents barriers but few incentives to sharing code
• There is a fear of being “found out” for poor code, but no encouragement or resources to improve software engineering skills
• There is no reward for publishing code in the current system of metrics. Researchers fear being “scooped” or losing ability to publish.
• Many organisations do not understand how to exploit open source licenses
Never be ashamed of making your software available
Vandewalle (2012) DOI: 10.1109/MCSE.2012.63
Research Software Workflow
develop: Developed and versioned using code repository
share: Published via code repository or website
preserve: Deposited in digital repository with paper / for preservation
Good Enough Practices To Please Your Future Self
• Data:
Save and backup raw data
Create analysis-friendly data
Record your processing steps
Anticipate the need to use multiple tables, and use a unique identifier for each record
Submit data to a repository and get a DOI
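Two of the data practices above, giving each record a unique identifier and recording processing steps, can be sketched in a few lines. This is an illustration under stated assumptions rather than a recommended tool: the CSV content, the column names, the `rec-0001` identifier format and both function names are hypothetical.

```python
import csv
import io

# Hypothetical raw data; in practice this would be a backed-up raw file
# that is read but never edited.
RAW_CSV = """site,temperature_c
north,11.5
south,
north,12.1
"""


def load_with_ids(raw_text):
    """Read raw CSV rows and give each record a unique identifier."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    for number, row in enumerate(rows, start=1):
        row["record_id"] = f"rec-{number:04d}"
    return rows


def drop_missing(rows, column, log):
    """Remove rows with an empty value in `column` and record the step."""
    kept = [row for row in rows if row[column] != ""]
    log.append(f"drop_missing({column}): {len(rows)} -> {len(kept)} rows")
    return kept


processing_log = []
records = load_with_ids(RAW_CSV)
clean_records = drop_missing(records, "temperature_c", processing_log)
```

Because identifiers are assigned before any filtering, a record keeps the same identifier in every derived table, and `processing_log` holds a replayable description of what was done to the raw data.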
Good Enough Practices To Please Your Future Self
• Software:
Document for your future self
• Brief descriptive comment at the start of your code
• Provide a simple example or test data set
• Give functions and variables meaningful names
• Make dependencies and requirements explicit
Learn to be modular
• Break programs into functions
• Don’t duplicate functionality
• Search for well-maintained libraries that do what you need
Make it accessible in the future
• Make the license explicit
• Keep track of changes
• Submit code to a reputable DOI-issuing repository
Good Enough Practices in Scientific Computing: https://doi.org/10.1371/journal.pcbi.1005510
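As a small illustration of the documentation and modularity points, here is a sketch of what a short script following them might look like. The temperature conversion task and all names in it are hypothetical, chosen only to keep the example tiny.

```python
"""Convert temperature readings from Celsius to Fahrenheit.

Requirements: Python 3.8 or later, standard library only.
Example: fahrenheit_readings([0.0, 100.0]) gives [32.0, 212.0].
"""


def celsius_to_fahrenheit(celsius):
    """Convert a single Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_readings(celsius_readings):
    """Convert a list of readings by reusing the single-value function."""
    return [celsius_to_fahrenheit(value) for value in celsius_readings]


# A tiny test data set that doubles as a usage example.
EXAMPLE_CELSIUS = [0.0, 100.0]

if __name__ == "__main__":
    print(fahrenheit_readings(EXAMPLE_CELSIUS))
```

The opening comment states the purpose and requirements, the names say what each piece does, and the list conversion reuses the single-value function instead of duplicating the formula.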
What you can do now
• Make sure you’re using version control
• Write a README file that describes how you can get your code up and running, and give it to a colleague to try out
The README should cover what it does, requirements / dependencies, a simple example of use, and input + output data
• Ask a collaborator to contribute a new piece of functionality, and get feedback on the process
• Talk to your library / IT services about the services they offer
Get some training
Teach basic lab skills for scientific computing so that researchers can do more in less time and with less pain.
Teach basic concepts, skills and tools for working more effectively with data. Workshops are designed for people with little to no prior computational experience.
The un-conference that most participants would recommend to their colleagues
Without data it’s difficult to validate results. But without code, we waste the opportunity to advance science.
These slides: http://dx.doi.org/10.6084/m9.figshare.5459542
“The only way to publish software in a scientifically robust manner is to share source code, and that means publishing via the internet in an open-access/open-source fashion.”
Warren Lyford DeLano, Creator of PyMOL, 2005
The Software Sustainability Institute
A national facility for cultivating better, more sustainable, research software to enable world-class research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
• Community Engagement (Lead: Shoaib Sufi): Fellowship Programme; Events and Workshops
• Consultancy (Lead: Steve Crouch): Open Call for Projects / Collaborations; Software Evaluation
• Policy and Publicity (Lead: Simon Hettrick): Case Studies / Policy Campaigns; Software and Research Blog
• Training (Lead: Aleksandra Nenadic): Software Carpentry and Data Carpentry (300+ students/year); Guides and Top Tips
• Journal of Open Research Software (Editor: Neil Chue Hong)
• Collaboration between universities of Edinburgh, Manchester, Oxford and Southampton
Supported by EPSRC Grant EP/H043160/1 + EPSRC/ESRC/BBSRC grant EP/N006410/1