*
* http://www.software.ac.uk/blog/2014-03-21-reproducible-research-impossible-dream
Slide 2
Slide 3
Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
Slide 4
Slide 5
Personal Introduction
• Defense of my Ph.D. thesis at TU-Sofia is pending
• Research in image/MR image segmentation
• Publications in peer-reviewed journals
• Some experience in industry
Slide 6
Slide 7
Introduction to Reproducible Research: Definitions
Reproducible Research (RR) is an approach aiming at complementing classical printed scientific articles with everything required to independently reproduce the results they present *. "Everything" covers:
• data
• computer codes
• a precise description of how the code was applied to the data
* Delescluse, Matthieu, et al. "Making neurophysiological data analysis reproducible: Why and how?" Journal of Physiology-Paris 106.3 (2012):159-170.
Slide 8
Introduction to Reproducible Research: Definitions
Another definition (signal processing): "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." * (D. Donoho)
* D. Donoho et al., “Reproducible Research in Computational Harmonic Analysis,” Computing in Science & Eng., vol. 11, no. 1, 2009, pp. 8–18
Slide 9
Introduction to Reproducible Research: Definitions
• Replication – independent researchers collect new data to verify a published finding * (Roger Peng). It is considered the scientific gold standard.
• Reproduction – independent researchers analyze the same data and obtain the same result * (Roger Peng). The focus is on the validity of the data analysis.
* http://simplystatistics.org/2011/12/02/reproducible-research-in-computational-science/
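Reproduction in Peng's sense can be checked mechanically: rerun the analysis on the same data and compare the outcome with the published number. The following is a minimal Python sketch under stated assumptions; the function `analyze`, the data, and the "published" value are all invented for illustration and do not come from the cited sources.

```python
import math
import statistics


def analyze(data):
    """Hypothetical analysis step: here simply the sample mean."""
    return statistics.mean(data)


def is_reproduced(published, recomputed, rel_tol=1e-9):
    """Return True if the recomputed result matches the published one."""
    return math.isclose(published, recomputed, rel_tol=rel_tol)


# Illustrative data and "published" value (both invented for this sketch).
data = [2.0, 4.0, 6.0, 8.0]
published_mean = 5.0

print(is_reproduced(published_mean, analyze(data)))
```

A tolerance is used instead of exact equality because floating-point results can differ slightly across platforms and library versions.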
Slide 10
Introduction to Reproducible Research: Definitions
* Peng, R. D. (2011). Reproducible research in computational science. Science (New York, N.Y.), 334(6060), 1226.
Slide 11
Introduction to Reproducible Research: History
The RR "movement" grew out of what economists have called replication since the early 1980s and developed into what is now called reproducible research in computational data analysis. Currently, it is influenced by the open science and open source movements.
Slide 12
Introduction to Reproducible Research: Relation to scientific method
Steps of a scientific method *:
1. Define a question
2. Observe – gather information and resources
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyze the data
6. Interpret the data and draw a conclusion
7. Publish results
8. Retest (reproduce) by other researchers
* Crawford S, Stucki L (1990), "Peer review and the changing research record", "J Am Soc Info Science", vol. 41, pp. 223–228
The steps related to Reproducible Research are in italic type
Slide 13
* https://scischol102.wordpress.com/category/science/
Slide 14
Introduction to Reproducible Research: Relation to scientific method
Principles of a scientific method:
1. Empirically testable
2. Replicable
3. Objective
4. Transparent
5. Falsifiable
6. Logically consistent
Slide 15
Introduction to Reproducible Research: Scheme
*
* http://www.biostat.jhsph.edu/~rpeng/research.html (mod.)
Slide 16
Introduction to Reproducible Research: Current situation
Current situation with RR in different fields:
• Medicine (cancer research), social sciences (psychology), etc.: replication/reproducibility crisis – the results of many scientific experiments cannot be replicated
• Natural sciences
• Computer science
Slide 17
* Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452-454.
Slide 18
Introduction to Reproducible Research: Current situation
Reproducibility in medical imaging, computer vision, and machine learning:
• Public test sets are available
• Code for most methods is available (papers from major conferences and journals)
• High pressure/workload on researchers to make their work reproducible
Slide 19
Introduction to Reproducible Research: Current situation
Reproducibility in medical imaging, computer vision, and machine learning (cont.):
• Benchmark comparison with other methods is compulsory
• Experiment automation
• Differences between the medical imaging field and the computer vision and machine learning fields
Example: IPOL journal
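Benchmark comparison and experiment automation can be combined in one small driver script that runs every method on the same test set. The sketch below is a Python illustration under stated assumptions: the two "methods", the accuracy metric, and the test set are hypothetical placeholders, not the benchmarks used in any of the fields named above.

```python
def method_threshold(x):
    """Hypothetical method A: label 1 if the value exceeds 0.5."""
    return 1 if x > 0.5 else 0


def method_always_one(x):
    """Hypothetical baseline B: always predict label 1."""
    return 1


def accuracy(method, test_set):
    """Fraction of test cases on which the method's label is correct."""
    correct = sum(method(x) == label for x, label in test_set)
    return correct / len(test_set)


def run_benchmark(methods, test_set):
    """Evaluate every method on the same test set."""
    return {name: accuracy(fn, test_set) for name, fn in methods.items()}


# Invented test set of (input, true label) pairs.
test_set = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
methods = {"threshold": method_threshold, "always_one": method_always_one}

results = run_benchmark(methods, test_set)
for name, score in sorted(results.items()):
    print(f"{name}: {score:.2f}")
```

Because every method goes through the same `run_benchmark` call, the whole comparison table can be regenerated with a single command.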
Slide 20
Introduction to Reproducible Research: Reasons
Reasons for the reproducibility/replication crisis:
• "Publish or perish" culture – pressure to obtain publishable results
• Reluctance to make method code public – improving its quality takes additional time and effort
• Most graduate students outside CS are not taught software engineering and statistics
*
* Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature,533(7604), 452-454.
Slide 21
Slide 22
Introduction to Reproducible Research: Reasons
Other problems:
• Insufficient description of the experiment in publications
• Test datasets and method code are not publicly available – common in the social sciences
• The mathematical methods used are prone to malpractice – p-hacking (data dredging), failing to report non-significant tests, including or excluding data points/results until the desired result is achieved
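Why data dredging is a problem can be shown with one line of arithmetic: if n independent tests are run at significance level alpha and every null hypothesis is true, the chance of at least one "significant" result is 1 - (1 - alpha)^n. The Python sketch below computes this; the independence assumption and the chosen values of alpha and n are illustrative, not taken from the slides.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent
    tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative values: the usual 5% level and a growing number of tests.
for n in (1, 5, 20):
    print(n, round(familywise_error_rate(0.05, n), 3))
```

With 20 tests at the 5% level, the chance of at least one spurious finding is already about 64%, which is why unreported non-significant tests distort the published record.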
Slide 23
Introduction to Reproducible Research: Reasons
Problems with method code:
• Reproducibility issues – missing data and code, errors in the code, not all figures and tables can be reproduced
• Documentation issues – missing README file, poor code documentation
• Programming style issues – poor coding style
*
* Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in global change research require open science by individual researchers. Global Change Biology, 18(7), 2102-2110.
Slide 24
Introduction to Reproducible Research: Guidance (Biostatistics journal)
Authors should provide all data and code needed to reproduce all results, images, and tables, with:
• README file
• Consistent coding style and documentation
• Test data sets
• Simulations and random numbers
• General advice
* Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics,10(3), 405-408.
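The "simulations and random numbers" item usually comes down to fixing and reporting the seed of the random number generator. A minimal Python sketch, assuming a hypothetical simulation that averages standard-normal draws; the seed value and function name are arbitrary choices for illustration.

```python
import random


def simulate_mean(n_samples, seed):
    """Draw n_samples standard-normal values with a fixed seed; return the mean."""
    rng = random.Random(seed)  # private generator, independent of global state
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(draws) / n_samples


SEED = 2016  # arbitrary value chosen for this sketch; report it with the results

first = simulate_mean(1000, SEED)
second = simulate_mean(1000, SEED)
print(first == second)  # same seed, same draws, same result
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the simulation unaffected by other code that also draws random numbers.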
Slide 25
Slide 26
Slide 27
Software tools
Recommended programs for achieving reproducibility:
• LaTeX (TeX editor)
• Version control systems – Git
• Make – pipeline
Literate programming concept (Knuth).
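The idea behind a Make pipeline is that every output declares what it depends on, so the steps can always be rerun in a valid order. The sketch below illustrates only that dependency idea in Python using the standard-library `graphlib` module (Python 3.9+); it is not Make itself, and all file names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each target maps to the set of targets it depends on (all names hypothetical).
pipeline = {
    "clean_data.csv": {"raw_data.csv"},
    "results.json": {"clean_data.csv", "analysis.py"},
    "figure1.pdf": {"results.json"},
    "paper.pdf": {"figure1.pdf", "paper.tex"},
}


def build_order(dependencies):
    """Return an order in which every target appears after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(pipeline)
print(order)
```

Make additionally compares file timestamps and rebuilds only outdated targets; that part is left out of this sketch.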
Slide 28
Software tools
Matlab programming language:
• Matlab File Exchange
• Proprietary Matlab toolboxes – disadvantages
• Examples of RR toolboxes – Wavelab, Sparselab
• Matlab publish – no literate programming support
Slide 29
Software tools
R programming language:
• RStudio – development environment for the R programming language
• Graphics packages, such as ggplot2
• Packages such as knitr or rmarkdown – literate programming support
Slide 30
Software tools
Python programming language:
• Many open scientific libraries available – scipy, numpy, etc.
• IPython notebook
• Sumatra package – saves parameter values, code state, output results and files
* ISMB/ECCB 2013 Keynote
*
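The kind of record such a tool keeps (parameter values, code state, output) can be illustrated with the Python standard library alone. The sketch below is not Sumatra's API; it is a hand-rolled stand-in, and the function `scale`, the `SOURCE` string, and the record fields are assumptions made for illustration.

```python
import hashlib
import json
import sys


def sha256_of(text):
    """Hex digest used to fingerprint code or output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_and_record(func, source_code, params):
    """Run func(**params) and return the result with a provenance record."""
    result = func(**params)
    record = {
        "parameters": params,
        "code_sha256": sha256_of(source_code),
        "output_sha256": sha256_of(json.dumps(result, sort_keys=True)),
        "python_version": sys.version.split()[0],
    }
    return result, record


# Hypothetical experiment: scale a list of numbers.
SOURCE = "def scale(values, factor): return [v * factor for v in values]"


def scale(values, factor):
    return [v * factor for v in values]


result, record = run_and_record(scale, SOURCE, {"values": [1, 2, 3], "factor": 2})
print(json.dumps(record, indent=2, sort_keys=True))
```

In a real project the code fingerprint would more likely be the hash of the script file or the version-control commit identifier rather than a string constant.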
Slide 31
Slide 32
Slide 33
The context – personal experience
Making a current research project reproducible only at the end of the process is not the best way…
* http://www.idiap.ch/~marcel/professional/BTAS_SS_2015.html
*
Slide 34
The context – personal experience
Difficulties with:
• Exact reproduction of all figures and results
• Setting exact parameter values
• Time needed to improve code quality and add documentation
Slide 35
The context – personal experience
Motivation for achieving reproducibility:
• Better visibility of research
• More citations and higher impact
• Increased trust in research quality (outside academia, e.g. from industry)
• Help from readers of the publication in improving the developed method
Slide 36
Slide 37
The situation in Bulgaria and abroad
RR in Bulgaria:
• Its introduction in the scientific community is still at an early stage
• Its principles need to be taught at undergraduate and graduate level
• In most fields, paper code and test datasets are generally not available online
Slide 38
The situation in Bulgaria and abroad
Advances in RR implementation would:
• Increase the impact of research conducted by Bulgarian researchers abroad
• Improve reputation and applicability – especially for people from industry
• Allow faster recognition of quality work and steady improvement of lower-quality papers
Slide 39
The situation in Bulgaria and abroad
Advances in RR implementation (cont.):
• Profit from the fast development of scientific computing, machine learning, data science, and AI
• Attract more bright young people to research (open source movement and open data)
Slide 40
The situation in Bulgaria and abroad
RR abroad:
• A major issue in the social and biomedical sciences
• An important criterion for reviewers evaluating manuscripts in many CS fields
• One of the major requirements of funding agencies abroad when evaluating project proposals
Slide 41
Slide 42
Additional resources for research and RR methods
MOOC courses:
1. Data Science Specialization (www.coursera.org) (Johns Hopkins University) – course 5, Reproducible Research
2. Methods and Statistics in Social Sciences Specialization (www.coursera.org) (University of Amsterdam)
3. Research Methods: An Engineering Approach (www.edx.org) (Wits University)
4. Research Data Management and Sharing (www.coursera.org) (The University of North Carolina at Chapel Hill & The University of Edinburgh)
Slide 43
Additional resources for research and RR methods
Software tools for RR:
1. Software Carpentry (www.Software-carpentry.org) – basic computing skills for researchers
2. Bootcamps – one- or two-day courses teaching coding and professional skills for researchers
3. MOOC courses – www.coursera.org, www.edx.org, www.udacity.org – for programming skills in R, Python, Matlab
Slide 44
Additional resources for research and RR methods
Books:
1. Stodden, V., Leisch, F., & Peng, R. D. (Eds.) (2014). Implementing Reproducible Research. CRC Press
2. Gandrud, C. (2013). Reproducible Research with R and RStudio. CRC Press
3. Subramanian, G. (2015). Python Data Science Cookbook. Packt Publishing Ltd
4. Milovanovic, I., Foures, D., & Vettigli, G. (2015). Python Data Visualization Cookbook. Packt Publishing Ltd
Slide 45
Slide 46
Discussion
Topics for discussion:
• What do you think about reproducibility in general?
• Have you already encountered RR in your work?
• How might the application of reproducibility impact your work as researchers, engineers, or programmers?
Slide 47
End