Reliable Research: Towards Experimental Standards for Computer Science [by Justin Zobel] Presented by: Osama Alsaadoun.

Outline

Introduction

Standards for Experimentation

Motivations for Record Keeping

Experiments in Computer Science

Practice in Computer Science

Summary

Introduction

What is the basic definition of a standard?
◦ A required or agreed level of quality or attainment
◦ An idea or matter used as a measure, norm, or model in comparative evaluations

Why would we need standards for experimental research? Standards in conducting research generally exist to address objectives like:
◦ Material authorship
◦ Abuse of power
◦ Scientific fraud
◦ Objectivity
◦ Human & animal rights

Example of Research Standards

Research guidelines: the Joint National Health & Medical Research Council / Australian Vice-Chancellors' Committee (AVCC) Statement and Guidelines on Research Practice (last update: Nov 2014)

The AVCC Statement and Guidelines exist to guide institutions in developing their own procedures and guidelines, by providing a comprehensive framework of minimum acceptable standards.

AVCC Structure

The AVCC Guidelines Categories
◦ 1. General Principles
◦ 2. Data storage and retention
◦ 3. Authorship
◦ 4. Publication
◦ 5. Supervision of students / research trainees
◦ 6. Disclosure of potential conflicts of interest
◦ 7. Research misconduct

AVCC Excerpts

1. General Principles
◦ 1.1 Institutions must establish procedures and guidelines on good research practice, and on steps to be followed if suspicions or allegations exist regarding research misconduct. Those procedures and guidelines must meet the standards set out in this document.

2. Data Storage & Retention
◦ 2.1 Data (including electronic data) must be recorded in a durable and appropriately referenced form.

3. Authorship
◦ 3.1 Each institution must establish a written policy on the criteria for authorship of a research output.
◦ 3.2 Authorship of a research output is a matter that should be discussed between researchers at an early stage in a research project, and reviewed whenever there are changes in participation.

AVCC Excerpts

4. Publication
◦ 4.2 An author who submits substantially similar work to more than one publisher must disclose this to the publishers at the time of submission.

5. Supervision of students / research trainees
◦ 5.5 The supervisor must ensure, as far as possible, the validity of research data obtained by a student under his/her supervision.

6. Disclosure of potential conflicts of interest
◦ 6.1 Institutions must have clearly formulated policies regarding potential conflicts of interest.

7. Research misconduct
◦ “Misconduct” or “Scientific Misconduct” is taken here to mean fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the scientific community…

Research in Computer Science

Does computer science have research standards? Implications?

Research Standards in Computer Science

Computer science does not have accepted standards for conducting research or recording experiments.

While many researchers in computer science keep some record of their work, many others appear not to.

Computer scientists do not, as undergraduates, receive the kind of training in scientific method that is compulsory in other disciplines (Really??)
◦ Students of other disciplines will use the research methods in their professional careers, but why not computer science students?

Failure to keep such records is unlikely to be seen as poor practice by the CS community.

Research Practice in Computer Science

Justification:
◦ Is recording not as crucial as in medicine?
◦ Other methods exist to verify the results of computing research experiments
◦ Records are kept, in effect, by the automatic mechanisms of dumps and backups!

Implications:
◦ Having no generally accepted standards can mean implicit approval of fraudulent research
◦ It allows researchers to deliberately publish claims not supported by their experiments

Issues with CS Research Standards

The current lack of standards is inconsistent with the expectations of the wider scientific community, in breach of published guidelines, and encourages publication of poor research.

Record-keeping practices for computer science research should be designed to meet the needs of sound research practice (proof, rigor, verifiability, etc.)
◦ They should be reasonable and endurable, so they can gain wider acceptance by the research community.
◦ Guidelines must have consensual support if they are to have any authority.
◦ They should also be appropriate and comprehensive enough to support various research activities, since different kinds of work require differently structured records.

Q: Is AVCC for Computer Science?

The main issue with the AVCC guidelines is that they were written for the general scientific community and may not, for instance, apply to computer science research.
◦ The key difference is in the data to be retained: in CS, data is usually the subject of the experiment, and everything else is outcome.
◦ In this context, the guidelines do not require recording of “subjects”.

Motivations for Record Keeping

Why do we need to keep records?

Motivations for Record Keeping

Proof: Research records may be the only evidence of an effectively irreproducible experiment; in that case, the records are the only source of the data on which the published research is based. Such records can also be reused in the future to draw new inferences about the original experiment.

Originality: Records provide evidence of precedence: they establish prior work (or inventions) and can be used in the event of plagiarism accusations.

Rigor: Record keeping drives researchers to take a certain degree of care while conducting experiments. Good record keeping also helps to detect errors in the setting of experiment parameters when expected outcomes fail to result.

Motivations for Record Keeping

Elucidation: Record keeping enforces a research practice of gathering notes and discussions that clarify vague ideas and explain the intents, thoughts, descriptions, and expected outcomes of an experiment. These notes are an excellent resource when the researcher is assembling material for publication.

Reproduction: Reproduction means gathering new data about the same phenomena; the new data can differ from the original yet still be good evidence of the properties in question.

Motivations for Record Keeping

Verification: Checking whether the experiments were conducted and analyzed with appropriate care, or indeed whether they were conducted to begin with; whether the claims are justified by the results; and whether the published results are a fair reflection of the experimental outcomes.
◦ A given set of records provides a measure of certainty that the work was conducted as described and with adequate care, but it is not absolute or firm evidence.

Motivations for Record Keeping

Misconduct: Record keeping is valuable for detecting and inhibiting misconduct; if record keeping is expected, a scientist must plan to be unethical if research results are to be forged.
◦ The more thorough the records are expected to be, the harder it is to falsify them.
◦ With low standards for records, a researcher can decide on the spur of the moment to report a falsehood; with better standards, reporting a falsehood requires considerable effort or risk.

Recommendation Summary

◦ Categorizing Computer Science Research
◦ Experimental Records
◦ Computer Science Publications

Categories of Computer Science Research

For illustrative purposes, the author suggests the following categorization for computer science research:
◦ Evaluating whether an algorithm (or more generally a system) behaves as predicted. Behavior might involve resource requirements or correctness, for example.
◦ Comparing algorithms with regard to particular properties.
◦ Identifying appropriate parameters or typical resource requirements for an algorithm.
◦ Demonstrating that a concept is feasible in practice.
◦ Testing of human factors, such as reaction to an interface or ability of a retrieval system to identify relevant information.

Record Requirements in Computer Science Research

The kinds of records needed in computer science generally depend on the kind of research conducted.

The proceedings of the 1997 Australasian Computer Science Conference [Ramamohanarao and Zobel, 1997] include, for example, several instances of experiments:
◦ comparing different algorithms
◦ compiler optimizations
◦ demonstrating the convergence of a method for eigenvalue computation
◦ demonstration of learning in a neural network
◦ evaluation of a browsing interface
◦ evaluation of schedulers
◦ comparative measurements of system performance

Types of Records in Computer Science Research

In computer science research, there should be careful records of:

◦ Subjects and responses, together with descriptions of the apparatus and experimental environment.

◦ In the cases involving algorithms, the apparatus in some sense documents itself. The code embodies a great deal of the matter of the experiment

◦ version numbers

◦ configuration of parameters applied to the subject
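As a minimal sketch of how version numbers and parameter configurations might be captured alongside each run, the Python below writes them to a small JSON file. The function names, the field names, and the idea of passing the code version as a string are assumptions made for illustration; they are not taken from the paper.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def make_run_record(code_version, parameters, input_files):
    """Build a dictionary describing one experimental run.

    code_version is whatever identifier the researcher uses for the code
    (a release number or revision id); it is supplied by the caller.
    """
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,
        "python_version": sys.version.split()[0],  # part of the apparatus
        "platform": platform.platform(),           # experimental environment
        "parameters": dict(parameters),            # configuration applied
        "input_files": list(input_files),          # the subjects of the run
    }


def save_run_record(record, path):
    """Write the record as sorted, indented JSON so it is easy to read and diff."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A call such as `save_run_record(make_run_record("v1.2.0", {"block_size": 4096}, ["corpus.txt"]), "run.json")` would leave one such file per run; the version string, parameter name, and file names here are hypothetical.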

Recommendation on Computer Science Publications

Publication of computer science research results should usually be based on three separate, mutually supporting elements: notebooks, code, and logs.

Notebooks provide a guidebook to the experiments. They should contain descriptions of ideas and show the progress of the research. They can be used to record:
◦ dates; daily notes; names and locations of code, scripts, input, and other files
◦ important references and web addresses
◦ minutes of discussions; bug reports; locations and identifying marks of paper records
◦ experimental parameters
◦ intent, outcomes, and interpretation of experiments

Recommendation on Computer Science Publications

Code is required if the experiments are to be run again. At an absolute minimum, researchers should preserve the exact code used to yield any published results and, if possible, the exact input.
◦ Notebooks should capture versions and the kinds of changes made to the code.
◦ Unsuccessful variations of the code should also be kept, even with known bugs, as such bugs may have been a factor in experimental results.

Logs should be complete transcripts of the output of each experiment.
◦ They should be in a human-readable format: table, list of averaged values, etc.
◦ To avoid selection bias, the logs should cover multiple runs over different sets of input data.
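One way to keep a complete transcript, sketched here under stated assumptions, is to log every run on every data set and only then compute the averaged values, so that no result is selected by hand. The name `run_and_log`, the log line layout, and the assumption that the experiment returns a number are all illustrative.

```python
import statistics
import time


def run_and_log(experiment, datasets, repeats, log_path):
    """Run experiment(data) on every data set and log every raw result.

    experiment is any callable returning a number (an assumption of this
    sketch); datasets maps a data set name to its input data.
    Returns the mean result for each data set.
    """
    summary = {}
    with open(log_path, "a", encoding="utf-8") as log:
        for name, data in datasets.items():
            results = []
            for run in range(repeats):
                start = time.perf_counter()
                value = experiment(data)
                elapsed = time.perf_counter() - start
                results.append(value)
                # One line per run: nothing is dropped from the transcript.
                log.write(
                    f"dataset={name} run={run} result={value!r} "
                    f"seconds={elapsed:.6f}\n"
                )
            mean = statistics.mean(results)
            summary[name] = mean
            # The averaged value is logged after the raw runs it is based on.
            log.write(f"dataset={name} runs={repeats} mean_result={mean!r}\n")
    return summary
```

The log is opened in append mode so that earlier transcripts, including unsuccessful runs, are never overwritten.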

Research Elements Maintenance Requirements

General maintenance requirements of these elements:

◦ Storage in a centralized trusted repository (online or offline)

◦ Use of verifiable timestamps or secure digital signatures

◦ Mechanism to verify matches between code and corresponding changes in notebooks

◦ Public availability to researchers in the community, as applicable
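The matching between code and notebook entries could, for instance, be checked with content digests: the notebook stores a digest of each file, and any later change to the file shows up as a mismatch. The sketch below is one possible approach, not the paper's method. All function names are hypothetical, the timestamp comes from the local clock (so it is not verifiable on its own without a trusted timestamping service), and the keyed digest in `seal` only stands in for a real digital signature.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(paths):
    """Record a digest for each file plus the time the manifest was made."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),  # local clock
        "files": {str(p): file_digest(p) for p in paths},
    }


def seal(manifest, key):
    """Keyed digest over the manifest; a stand-in for a digital signature."""
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def changed_files(manifest):
    """List recorded files that are missing or whose contents have changed."""
    return [
        path
        for path, digest in manifest["files"].items()
        if not Path(path).exists() or file_digest(path) != digest
    ]
```

Copying the manifest and its seal into the notebook entry for an experiment would let a later reader check that the preserved code is the code the entry describes.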

Summary

Standards for recording research are not high in computer science.

As a discipline, the computer science community needs to develop agreed standards and practices for conducting and recording research.

Record-keeping need not be a burden: it encourages experimental rigor and can reduce the effort of producing finished research.

Final Quote

“At conferences such as ACSC I have often heard researchers debate the question of whether computer science is a science; a question which as it stands is probably meaningless, as it depends on highly individual interpretations of what science is. However, given that "much suggests that the paradigms of modern science are appropriate for computer science" [Stewart, 1995], a more pertinent question is whether computer science research adheres to scientific standards. Too often, the answer to that question is "no". Adoption of better experimental practice will help to change that answer.”

Thanks!
