Repeatable and Reproducible Evaluation
Fraida Fund, NYU Polytechnic School of Engineering
[email protected]

Apr 09, 2022

Transcript
Page 1: Repeatable and Reproducible Evaluation

Repeatable and Reproducible Evaluation

Fraida Fund, NYU Polytechnic School of Engineering

[email protected]

Page 2: Repeatable and Reproducible Evaluation

“In industry, we ignore the evaluation in academic papers. It is often wrong and always irrelevant.”

- Head of a major industrial lab, 2011

Source of quote: Vitek, Jan, and Tomas Kalibera. "R3: Repeatability, reproducibility and rigor." ACM SIGPLAN Notices 47, no. 4a (2012): 30-36. http://janvitek.github.io/pubs/r3.pdf

Page 3: Repeatable and Reproducible Evaluation

Common problems in evaluation

● Unclear goals
● Meaningless measurements
● No baseline (or wrong baseline)
● Not representative
● Implicit assumptions
● Weak statistics (see the sketch after this list)
● Ineffective or misleading graphics
● Proprietary code and data
● Results are not reproducible
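
To make the "weak statistics" item concrete: a single run hides run-to-run variation, so at minimum report a mean and a confidence interval over repeated trials. A minimal sketch in Python; the throughput numbers are hypothetical:

    import statistics

    # Hypothetical throughput measurements (Mbps) from 10 repeated trials.
    trials = [94.1, 95.3, 93.8, 96.0, 94.7, 95.1, 93.5, 94.9, 95.6, 94.3]

    mean = statistics.mean(trials)
    stderr = statistics.stdev(trials) / len(trials) ** 0.5

    # 95% confidence interval; 2.262 is the t critical value for
    # n - 1 = 9 degrees of freedom (scipy.stats.t.ppf(0.975, 9)).
    half_width = 2.262 * stderr
    print(f"throughput: {mean:.2f} +/- {half_width:.2f} Mbps (95% CI, n={len(trials)})")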

Page 4: Repeatable and Reproducible Evaluation

Repetition

The ability to re-run the exact same experiment with the same method on the same or similar system and obtain the same or very similar result.

Page 5: Repeatable and Reproducible Evaluation

Reproducibility

Independent confirmation of qualitative results by a third party, using the description of experiment design in the report/paper.

Page 6: Repeatable and Reproducible Evaluation

Six degrees of reproducibility

5: The results can be easily reproduced by an independent researcher with at most 15 min of user effort, requiring only standard, freely available tools (C compiler, etc.).

Source: P. Vandewalle, J. Kovacevic, and M. Vetterli. "Reproducible research in signal processing - what, why, and how." IEEE Signal Processing Magazine, 26(3):37–47, May 2009. http://infoscience.epfl.ch/record/136640/files/VandewalleKV09.pdf

Page 7: Repeatable and Reproducible Evaluation

Six degrees of reproducibility

4: The results can be easily reproduced by an independent researcher with at most 15 minutes of user effort, requiring some proprietary source packages (MATLAB, etc.).


Page 8: Repeatable and Reproducible Evaluation

Six degrees of reproducibility

3: The results can be reproduced by an independent researcher, requiring considerable effort.


Page 9: Repeatable and Reproducible Evaluation

Six degrees of reproducibility

2: The results could be reproduced by an independent researcher, requiring extreme effort.


Page 10: Repeatable and Reproducible Evaluation

Six degrees of reproducibility

1: The results cannot seem to be reproduced by an independent researcher.


Page 11: Repeatable and Reproducible Evaluation

Six degrees of reproducibility

0: The results cannot be reproduced by an independent researcher.


Page 12: Repeatable and Reproducible Evaluation

How reproducible is CS systems research?

Page 13: Repeatable and Reproducible Evaluation

Versioning problems

Page 14: Repeatable and Reproducible Evaluation

We’ll give you code… soon

Page 15: Repeatable and Reproducible Evaluation

No plans to release the code

Page 16: Repeatable and Reproducible Evaluation

Only one student knew how to use the code, and that student left

Page 17: Repeatable and Reproducible Evaluation

Proprietary code

Page 18: Repeatable and Reproducible Evaluation

Depends on proprietary/obsolete systems

Page 19: Repeatable and Reproducible Evaluation

Poor design

Page 20: Repeatable and Reproducible Evaluation

Build errors

Page 21: Repeatable and Reproducible Evaluation
Page 22: Repeatable and Reproducible Evaluation

How to create a reproducible experiment

Page 23: Repeatable and Reproducible Evaluation

Experiment design

❏ Is there a clear mapping between your experiment goal and experiment design?

❏ Does your experiment achieve your goal with the minimum amount of work possible?

❏ Is it clear what the “result” of your evaluation is?

❏ Are there as few manual steps in your experiment as possible? (see the runner sketch after this checklist)

❏ Are the tools used in your experiment open and widely available?
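
One way to satisfy the "few manual steps" item is to make the whole experiment a single script that sweeps parameters, repeats trials, and saves raw output together with its metadata. A minimal sketch; the ./measure command, the parameter being swept, and the file layout are hypothetical stand-ins for whatever your experiment actually runs:

    import json
    import subprocess
    import time
    from pathlib import Path

    PARAMS = [1, 2, 4, 8]   # hypothetical parameter sweep (e.g., number of flows)
    TRIALS = 5              # repeat each setting to expose run-to-run variation

    raw_dir = Path("data/raw")
    raw_dir.mkdir(parents=True, exist_ok=True)

    for p in PARAMS:
        for trial in range(TRIALS):
            # Hypothetical measurement command; replace with your own tool.
            result = subprocess.run(["./measure", "--load", str(p)],
                                    capture_output=True, text=True, check=True)
            record = {"param": p, "trial": trial, "timestamp": time.time(),
                      "stdout": result.stdout}   # keep the raw output untouched
            (raw_dir / f"load{p}_trial{trial}.json").write_text(json.dumps(record))

    print(f"raw results written to {raw_dir}/")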

Page 24: Repeatable and Reproducible Evaluation

Data analysis and visualization

❏ Did you separate raw and processed data?
❏ Do you have a data analysis and visualization script? (No manual calculations or interactive image generation! See the sketch after this checklist.)
❏ Did you share the raw and processed data and the script used to generate any images in your report?
❏ Are you using version control?
❏ Do you follow good statistics and data integrity practices?
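
A sketch of the kind of analysis and visualization script this checklist asks for, assuming (hypothetically) that raw results sit in data/raw/*.json and that each record carries a numeric "value" field. Everything is regenerated from the raw data by one command, and nothing is produced interactively:

    import json
    from pathlib import Path

    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")            # render to files, never interactively
    import matplotlib.pyplot as plt

    # Raw data is read-only input; processed data goes to its own directory.
    rows = [json.loads(p.read_text()) for p in Path("data/raw").glob("*.json")]
    df = pd.DataFrame(rows)

    Path("data/processed").mkdir(parents=True, exist_ok=True)
    summary = df.groupby("param")["value"].agg(["mean", "std", "count"])
    summary.to_csv("data/processed/summary.csv")

    # Every figure in the report is regenerated by this script.
    Path("figures").mkdir(exist_ok=True)
    fig, ax = plt.subplots()
    ax.errorbar(summary.index, summary["mean"], yerr=summary["std"], marker="o")
    ax.set_xlabel("parameter setting")
    ax.set_ylabel("measured value")
    fig.savefig("figures/result.png", dpi=150)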

Page 25: Repeatable and Reproducible Evaluation

Documentation

❏ Is it clear where to begin? (e.g., can someone picking up the project see where to start running it?)

❏ Are there instructions for setting up the experiment and executing it?

❏ Do you explain non-obvious steps in the instructions?

❏ Have you noted the exact version of every external application used in the process? (see the sketch after this checklist)

❏ Are you using version control?
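
For the version-noting item, record versions mechanically rather than from memory. A minimal sketch; the package names are hypothetical examples, and the flag for querying an external tool varies by tool (iperf3 accepts --version):

    import platform
    import subprocess
    import sys
    from importlib import metadata

    lines = [f"python {sys.version.split()[0]}",
             f"os {platform.platform()}"]

    # Python packages the analysis depends on (hypothetical examples).
    for pkg in ["pandas", "matplotlib"]:
        lines.append(f"{pkg} {metadata.version(pkg)}")

    # An external tool, asked for its own version string.
    try:
        out = subprocess.run(["iperf3", "--version"],
                             capture_output=True, text=True).stdout
        lines.append(out.splitlines()[0] if out else "iperf3: version unknown")
    except FileNotFoundError:
        lines.append("iperf3: not installed")

    with open("VERSIONS.txt", "w") as f:
        f.write("\n".join(lines) + "\n")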

Page 26: Repeatable and Reproducible Evaluation

Lab exercises

Page 27: Repeatable and Reproducible Evaluation

Final lab exercises

Routing (repeatable and reproducible):
● Dijkstra’s algorithm (sketched below)
● OSPF

Software defined networks:
● Just to give you another tool to use in potential projects
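
For reference, a minimal textbook sketch of Dijkstra's shortest-path algorithm, the first routing lab topic; the toy graph at the bottom is made up:

    import heapq

    def dijkstra(graph, source):
        """Shortest-path distances from source; graph maps
        node -> list of (neighbor, edge_weight) pairs."""
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale queue entry
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(dijkstra(g, "a"))   # {'a': 0, 'b': 1, 'c': 3}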

Page 28: Repeatable and Reproducible Evaluation

Projects

● Form groups of 3 or 4
● Project will run on GENI
  ○ Lab exercises give you some software tools to use: iperf, netem, tinyhttpd, OSPF setup, SDN, others (see the iperf3 sketch after this list)
  ○ May use these or other software
● Must use good experiment design practices
● Must use good practices for communicating quantitative results
● Must use good practices for creating reproducible experiments
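
As one example of wiring a lab tool into a reproducible experiment: a sketch of a single iperf3 trial that stores the complete raw JSON report before anything is summarized. The server host name and file path are hypothetical, and an iperf3 server (iperf3 -s) is assumed to be running on the target node:

    import json
    import subprocess
    from pathlib import Path

    SERVER = "server.example.org"   # hypothetical GENI node running `iperf3 -s`
    DURATION = 10                   # seconds per trial

    result = subprocess.run(
        ["iperf3", "-c", SERVER, "-t", str(DURATION), "--json"],
        capture_output=True, text=True, check=True)
    report = json.loads(result.stdout)

    # Save the full raw report; summarize later in the analysis script.
    Path("data/raw").mkdir(parents=True, exist_ok=True)
    Path("data/raw/iperf3_trial0.json").write_text(json.dumps(report))

    bps = report["end"]["sum_received"]["bits_per_second"]
    print(f"mean receive rate: {bps / 1e6:.1f} Mbps")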

Page 29: Repeatable and Reproducible Evaluation

Projects

The labs are meant to help you, so you can use them as a jumping-off point for projects

Topics can include:
● Data center networks
● Congestion and flow control
● Routing and resiliency
● SDN
● Other topics related to HSN

Page 30: Repeatable and Reproducible Evaluation

Projects

Start thinking about your project:
● Work in groups of 3-4
● Must have a reasonable division of labor (every student takes responsibility for a part of the project)
● Must apply lessons from the lab lectures
● Will give you specific instructions for the proposal before spring break
● Project proposals are due at the midterm

Page 31: Repeatable and Reproducible Evaluation

Lab coverage on midterm

Lab topics are included on the midterm:
● Using networking testbeds
● Experiment design
● Communicating results
● Reproducible experiments

Will give some example problems for you to work on.

Page 32: Repeatable and Reproducible Evaluation

Getting help

● Office hours on lab website
● Asking for help on the Internet

○ For e.g. Git Bash or R usage, there’s a lot of information online

○ GENI Users Group: https://groups.google.com/forum/#!forum/geni-users

○ If you ask a question, cite it in your report

Page 33: Repeatable and Reproducible Evaluation

References

1. Raj Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. Wiley-Interscience, New York, NY, April 1991. ISBN 0471503361.

2. Moraila, G., Shankaran, A., Shi, Z., and Warren, A. M. "Measuring Reproducibility in Computer Systems Research." Tech report (2014). http://reproducibility.cs.arizona.edu/tr.pdf

3. Vitek, Jan, and Tomas Kalibera. "R3: Repeatability, reproducibility and rigor." ACM SIGPLAN Notices 47, no. 4a (2012): 30-36. http://janvitek.github.io/pubs/r3.pdf

4. P. Vandewalle, J. Kovacevic, and M. Vetterli. "Reproducible research in signal processing - what, why, and how." IEEE Signal Processing Magazine, 26(3):37–47, May 2009. http://infoscience.epfl.ch/record/136640/files/VandewalleKV09.pdf

5. Edwards, Sarah, Xuan Liu, and Niky Riga. "Creating Repeatable Computer Science and Networking Experiments on Shared, Public Testbeds." ACM SIGOPS Operating Systems Review 49, no. 1 (2015): 90-99. http://mescal.imag.fr/membres/arnaud.legrand/research/readings/acm_sigops_si_rsea/p90-edwards.pdf and http://groups.geni.net/geni/wiki/PaperOSRMethodology

6. Leek, Jeff. The Elements of Data Analytic Style. 2015.

7. Handigol, Nikhil, Brandon Heller, Vimalkumar Jeyakumar, Bob Lantz, and Nick McKeown. "Reproducible network experiments using container-based emulation." In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, pp. 253-264. ACM, 2012. http://tiny-tera.stanford.edu/~nickm/papers/p253.pdf and https://reproducingnetworkresearch.wordpress.com/