Towards Understanding the Replication of SE Experiments Natalia Juristo Universidad Politecnica de Madrid (Spain) & University of Oulu (Finland) ESEM Conference Baltimore (USA) October 11th, 2013

Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Jul 01, 2015


Natalia Juristo

To consolidate a body of knowledge built upon evidence, experimental results have to be extensively verified. Experiments need replication at other times and under other conditions before they can produce an established piece of knowledge. Several replications need to be run to strengthen the evidence.

Most SE experiments have not been replicated. If an experiment is not replicated, there is no way to distinguish whether results were produced by chance (the observed event occurred accidentally), results are artifactual (the event occurred because of the experimental configuration but does not exist in reality) or results conform to a pattern existing in reality.

The immaturity of experimental SE knowledge has been an obstacle to replication. Context differences usually oblige SE experimenters to adapt experiments for replication. As key experimental conditions are still unknown, slight changes in replications have led to differences in the results that prevent verification.

There are still many uncertainties about how to proceed with replications of SE experiments. Should replicators reuse the baseline experiment materials? How much liaison should there be between the original and replicating experimenters, if any? What elements of the experimental configuration can be changed for the experiment to be considered a replication rather than a new experiment?

The aim of replication is to verify results, but different types of replication serve special verification purposes and afford different degrees of change. Each replication type helps to discover particular experimental conditions that might influence the results. We need to learn which types of replications are feasible in SE as well as the acceptable changes for each type and the level of verification provided.
Transcript
Page 1: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Towards Understanding

the Replication of

SE Experiments

Natalia Juristo

Universidad Politecnica de Madrid (Spain)

& University of Oulu (Finland)

ESEM Conference Baltimore (USA) October 11th, 2013

Page 2: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Scope & Terminology

This talk focuses on the replication of

experiments

I will refer to the study whose results we

want to check as the baseline experiment

Page 3: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Replication Intuitive Definition

Deliberate repetition of research

procedures in a second investigation for

the purpose of determining if earlier

results can be reproduced

Page 4: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Content

Replication & the experimental paradigm

State of replication in ESE: practice & theory

Shedding a bit of light
    Purposes for replicating
    Replication functions

Some answers
    Replication limits
    Baseline and replication minimum degree of similarity
    Admissible changes
    Reproduction of results
    Threats to reuse materials

Summary

Page 5: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Role of Replication in

Experimental Paradigm

Page 6: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Searching for Regularities

Science does not settle for anecdotes. A

scientific law or theory describes a regular

occurrence in the world

Regularities existing in reality are identified by

reproducing the same event in different

replications

The result of one experiment is an isolated

event

Page 7: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

One Result, Three Meanings

Without reproduction of results it is impossible to distinguish whether they

occurred by chance

are artifactual (the event occurs only in the experiment, not in reality)

really correspond to a regularity

Page 8: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

State of Replication in ESE

Practice

Page 9: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Not Enough Replications

Most SE experiments have not yet been

replicated

Two reviews provide empirical data to

support this point

Let us look at their results

Page 10: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Experiments in Leading Journals

& Conferences (1993-2002)

5,453 articles published from 1993 to 2002

in major SE journals and conference

proceedings

113 experiments

20 (17.7%) described as replications

Sjøberg et al. “A survey of

controlled experiments in SE” TSE, 2005

Page 11: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

All Publications

96 papers reporting replications

133 replications of 72 baseline studies

Any type of empirical study

Study type | Number | Share
Quasi-Experiments | 35 | 49%
Controlled Experiments | 21 | 29%
Case Study | 15 | 21%
Survey | 1 | 1%

da Silva et al. “Replication of Empirical Studies in

Software Engineering Research: A Systematic

Mapping Study” EMSE 2013

Page 12: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Mostly Internal & Not Stand-Alone

Publications [Da Silva et al. 2013]

Internal – Original-Included reports

Internal – Replication-only reports

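[Figure: bar chart of the Number of Replications per year, 1994 to 2010, split into the two internal report types listed above and external replications]

Year | Internal (both report types) | External | Total
1994 | 0 | 1 | 1
1995 | 3 | 2 | 5
1996 | 0 | 0 | 0
1997 | 2 | 4 | 6
1998 | 1 | 3 | 4
1999 | 4 | 1 | 5
2000 | 2 | 4 | 6
2001 | 5 | 1 | 6
2002 | 3 | 1 | 4
2003 | 4 | 0 | 4
2004 | 9 | 0 | 9
2005 | 12 | 3 | 15
2006 | 10 | 1 | 11
2007 | 10 | 3 | 13
2008 | 4 | 5 | 9
2009 | 10 | 3 | 13
2010 | 15 | 7 | 22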

Page 13: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

First Paper Published on a

Replication [Da Silva et al. 2013]

In SE, the first article that explicitly reported a replication of an empirical study was published in 1994

Daly, Brooks, Miller, Roper & Wood. Verification of Results in Sw Maintenance Through External Replication. Intl Conf on Software Maintenance

Page 14: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

EMSE Special Issue on

Replication

The large number of submissions was

admittedly more than we expected

We received a total of 16 submissions

Encouraging the publication of replications

will foster researchers to replicate more

studies

Page 15: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

State of Replication in ESE

Theory

Page 16: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

First Theoretical Publication on

Replication

In 1999, a paper discussed a framework to organize sets of related experiments (families) and the generation of knowledge from such sets

Basili, Shull & Lanubile. Building knowledge through families of experiments. TSE

Is a family exactly a set of replications?

…experiments can be viewed as part of common families of studies, rather than being isolated events…

Page 17: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

More Activity in the Last 10 Years

Shull, Basili, Carver, Maldonado, Travassos, Mendonça & Fabbri. Replicating software engineering experiments: Addressing the tacit knowledge problem. ISESE 2002

Vegas, Juristo, Moreno, Solari & Letelier. Analysis of the influence of communication between researchers on experiment replication. ISESE 2006

Brooks, Roper, Wood, Daly & Miller. Replication’s role in software engineering. Guide to Advanced Empirical SE. Springer 2008

Juristo & Vegas. Using differences among replications of software engineering experiments to gain knowledge. ESEM 2009 [Juristo & Vegas. The Role of Non-Exact Replications in SE. EMSE Journal 2011]

Krein & Knutson. A Case for replication: Synthesizing research methodologies in SE. RESER 2010

Gómez, Juristo & Vegas. Replications types in experimental disciplines. ESEM 2010

Juristo, Vegas, Solari, Abrahao & Ramos. A Process for Managing Interaction between Experimenters to Get Useful Similar Replications. IST 2013

Page 18: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

State of the Theory

There is no agreement yet on terminology,

typology, purposes, operation and other

replication issues

There is not even agreement on what a

replication is!!

Different authors consider different types of

changes to the baseline experiment as

admissible

Page 19: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Example of Divergent Views

Some researchers advise using different protocols and materials, to preserve independence and to prevent the propagation of errors that comes with reusing the same configuration

Kitchenham. The role of replications in ESE - a word of warning. EMSE 2008

Other researchers recommend the reuse of materials to ensure that replications are similar enough for results to be comparable

Shull, Carver, Vegas & Juristo. The role of replications in ESE. EMSE 2008

Page 20: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Shedding some Light on

Replication

Role of Replication

Page 21: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

The Two Roles of Replication

Validation

Learning

Page 22: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Learning Relevant Conditions

As more replications of Thompson and McConnell’s baseline experiment were run, different conditions influencing the results of this experiment were identified

After several hundred experiments had been run, experimenters managed to identify around 70 conditions influencing the behavior of this type of invertebrate

Page 23: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Which are the Important

Variables?

“…In fact, the principle of Transversely

Excited Atmospheric (TEA) lasers,

scientists did not know that the inductance

of the top was important”

A physicist quoted in

Changing Order: Replication and Induction in Scientific Practice

Harry Collins 1992

Page 24: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

First Learn, Then Validate

“In the early stages, failure to get the expected

results is not falsification but a step in the

discovery of some interfering factor.

For immature experimental knowledge, the first

step is … to find out which experimental

conditions should be controlled”

Validity and the Research Process

Brinberg and McGrath 1985

Page 25: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

SE Problems with Identical

Replications

SE has tried to repeat experiments identically, but no exact replications have yet been achieved

The complexity of the software development setting prevents the many experimental conditions from being reproduced identically

Yet this is a regular rather than an exceptional situation

Page 26: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

In the Beginning Most is Unknown

“Most aspects are unknown when we start

to study a phenomenon experimentally.

Even the tiniest change in a replication

can lead to inexplicable differences in the

results”

Validity and the Research Process

Brinberg and McGrath 1985

Page 27: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Start with Similar Replications

“The less that is known about an area the more

power a very similar experiment has ... This is

because, in the absence of a well worked out set

of crucial variables, any change in the

experiment configuration, however trivial in

appearance, may well entail invisible but

significant changes in conditions”

Changing Order:

Replication and Induction in Scientific Practice

Harry Collins 1992

Page 28: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Learning & Validation Process

1. Start with identical replications
    At the beginning of experimental research, equality, even if targeted, will not happen
    There will be either invisible but significant changes in conditions or induced changes due to context adaptation or both
    Failure to get the expected results should not be construed as falsification, but as a step towards the discovery of some new factor

2. Later on, both knowledge discovery and testing can be more systematic
    Changes in the configuration will be made purposely to learn more variables and rule out artifactual results

Page 29: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Learning is Even More Important

Replication is needed not merely to

validate one’s findings, but more

importantly, to establish the increasing

range of radically different conditions

under which the findings hold, and the

predictable exceptions

The design of replicated studies. American Statistician

Lindsay and Ehrenberg 1993

Page 30: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Shedding some Light on

Replication

Functions of Replication

Page 31: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Reminder of Experimental Setting

Operationalization
    Treatments
    Response variable

Protocol
    Experimental design
    Experimental objects
    Guides
    Measuring instruments
    Data analysis techniques

Population
    Objects
    Subjects

Page 32: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Verification Functions

Control experimental errors

Control protocol independence

Understand operationalization limits

Understand population limits

Page 33: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Control Experimental Errors

Verify that the results of the baseline experiment

are not a chance product of an error

All elements of the experiment must resemble

the baseline experiment as closely as possible

Collateral benefit

Provides an understanding of the natural (random) variation of the observed results, which is critical for deciding whether or not results hold in dissimilar replications

Page 34: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Control Protocol Independence

Verify that the results of the baseline experiment

are not artifactual

An artifactual result is due to the experimental

configuration and cannot be guaranteed to exist in

reality

The experimental protocol needs to be changed

for this purpose

If an experiment is replicated several times using the

same materials, the observed results may occur due

to the materials

The same applies for all protocol elements

Page 35: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Understanding Operationalization

Limits

Learn how sensitive results are to different

operationalizations

Treatment operationalizations

treatment application procedures, treatment

instructions, resources, treatment

transmission …

Effect operationalizations

Metrics, measurement procedures …

Page 36: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Understand Population Limits

Learn the extent to which results hold for

other subject types or other types of

experimental objects

Learn to which specific population the

experimental sample belongs and what

the characteristics of such population are

Page 37: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Changes & Replication Functions

Function of Replication

Experimental Configuration | Control Experimental Error | Control Protocol Independence | Understand Operationalization Limits | Understand Population Limits
Operationalization | = | = | ≠ | =
Population | = | = | = | ≠
Protocol | = | ≠ | = | =

LEGEND: = the element is equal to, or as similar as possible to, the baseline experiment

≠ the element varies with respect to the baseline experiment
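
The table on this slide can be read as a lookup from the element a replication varies to the verification function it serves. The following is a minimal Python sketch of that reading. The function name `replication_function` and the string labels are illustrative assumptions, not part of the talk, and combinations with more than one varied element are deliberately left unclassified because the table does not cover them.

```python
from typing import Optional

# Elements of the experimental configuration named on the slide.
ELEMENTS = ("operationalization", "population", "protocol")

# Which single varied element corresponds to which function (from the table).
# The key None stands for "nothing varies" (hypothetical encoding).
FUNCTION_BY_VARIED_ELEMENT = {
    None: "control experimental error",
    "protocol": "control protocol independence",
    "operationalization": "understand operationalization limits",
    "population": "understand population limits",
}


def replication_function(varied: set) -> Optional[str]:
    """Return the verification function for a set of varied elements.

    Returns None when more than one element varies, since the table
    only describes replications that vary at most one element.
    """
    unknown = varied - set(ELEMENTS)
    if unknown:
        raise ValueError(f"unknown elements: {sorted(unknown)}")
    if len(varied) == 0:
        return FUNCTION_BY_VARIED_ELEMENT[None]
    if len(varied) == 1:
        return FUNCTION_BY_VARIED_ELEMENT[next(iter(varied))]
    return None
```

For example, `replication_function({"protocol"})` yields the protocol-independence function, while a replication that varies both protocol and population falls outside the table and returns `None`.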

Page 38: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Some Answers

Page 39: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Question 1

What exactly is a replication

study?

Page 40: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Establishing Limits

Amount of Changes

Type | Run? | What changes
RE-ANALYSIS | Not Run | Same data, different models or statistical methods
REPETITION | Run | Identical, same site & researchers
REPLICATION | Run | + Protocol changes, Operationalization changes, Population changes
REPRODUCTION | Run | Just the hypothesis kept

Page 41: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Question 2

What level of similarity should an

experiment have to be

considered a replication rather

than a new experiment?

Page 42: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Unchanged elements of a

replication

A replication must share the hypothesis

with the baseline experiment

Same response variable

although not same metric

Same treatments

although not same operationalization

at least two treatments in common
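
One way to make these conditions concrete is a small check over two experiment descriptions. The sketch below is only an illustration under stated assumptions: the `Experiment` record, its field names, and the example values are hypothetical, and hypotheses are compared as plain strings, which a real comparison would need to handle more carefully.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical, simplified description of an experiment."""
    hypothesis: str          # statement being tested
    response_variable: str   # construct measured (not the metric used)
    treatments: frozenset    # treatments compared (not their operationalization)


def shares_minimum_with(baseline: Experiment, candidate: Experiment) -> bool:
    """True if the candidate keeps the baseline's hypothesis, its response
    variable, and at least two of its treatments."""
    common_treatments = baseline.treatments & candidate.treatments
    return (
        candidate.hypothesis == baseline.hypothesis
        and candidate.response_variable == baseline.response_variable
        and len(common_treatments) >= 2
    )


# Made-up example values, for illustration only.
baseline = Experiment("T1 is more effective than T2", "effectiveness",
                      frozenset({"T1", "T2", "T3"}))
partial = Experiment("T1 is more effective than T2", "effectiveness",
                     frozenset({"T1", "T2", "T4"}))
too_different = Experiment("T1 is more effective than T2", "effectiveness",
                           frozenset({"T1", "T5"}))
```

Here `partial` shares two treatments with `baseline` and passes the check, while `too_different` shares only one and does not.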

Page 43: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Partial Replications

[Figure: diagram relating Exp. A (Baseline OR Replication), Exp. B (Replication OR Baseline), Replication C and Exp. D through the response variables (RV1, RV2, RV3) and treatments (T1 to T6) they cover]

Page 44: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Question 3

What changes are acceptable?

Page 45: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Levels of Verification

The similarity between the baseline experiment and a replication serves different verification purposes depending on the changes made

Replicating an experiment as closely as possible: verifies results are not accidental

Varying the experimental protocol: verifies results are not artifactual

Varying the population properties: verifies types of populations for which the results hold

Varying the operationalization: verifies range of operationalizations for which the results hold

Page 46: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Can I Change Everything?

It is better not to change everything at the

same time

We can understand the source of differences

in results better if only one change is made at

a time

But a replication with a lot of changes is

not rendered useless or doomed to failure

We will just need to wait until other

replications are run
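
The reasoning on this slide can be illustrated by comparing two configurations element by element. In the sketch below, the function names, dictionary keys and values are hypothetical examples. The point is only that a difference in results can be tied to a specific change when exactly one element differs, whereas several simultaneous changes leave several candidates open.

```python
def changed_elements(baseline: dict, replication: dict) -> list:
    """Names of configuration elements whose values differ."""
    keys = set(baseline) | set(replication)
    return sorted(k for k in keys if baseline.get(k) != replication.get(k))


def candidate_sources(baseline: dict, replication: dict, results_differ: bool) -> list:
    """Elements that could explain a difference in results.

    With one change the list has a single candidate. With several changes
    every changed element remains a candidate until further replications
    narrow the list down.
    """
    return changed_elements(baseline, replication) if results_differ else []


# Made-up configurations, for illustration only.
baseline = {"protocol": "design A", "operationalization": "metric M1",
            "population": "students"}
one_change = dict(baseline, population="professionals")
many_changes = {"protocol": "design B", "operationalization": "metric M2",
                "population": "professionals"}
```

With `one_change`, a differing result points at the population alone. With `many_changes`, all three elements stay on the list, which matches the slide's remark that such a replication is still useful but has to wait for other replications.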

Page 47: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Everything Different

[Figure: experiment elements plotted against the amount of change, on a scale from Identical to Different]

Page 48: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

One Change at a Time

[Figure: experiment elements plotted against the amount of change, on a scale from Equal to Different]

Page 49: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Increasing Changes

[Figure: experiment elements plotted against the amount of change, on a scale from Equal to Different]

Page 50: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Question 4

What is the level of similarity that

results must have to be

considered as reproduced?

Page 51: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Understanding the Natural

Variation

Identical replications are useful for

understanding the range of variability of

results

This provides an estimate for other

experimenters to use as a baseline when

they replicate the experiment
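
As a rough illustration of how such an estimate might be used, the sketch below summarizes the results of several identical replications and checks whether a new result falls inside the observed spread. Everything here is an assumption made for illustration: the effect values are made up, the function names are hypothetical, and the interval of the mean plus or minus two sample standard deviations is one simple convention, not a method proposed in the talk.

```python
from statistics import mean, stdev


def variation_range(results, k=2.0):
    """Return (low, high) as the mean +/- k sample standard deviations."""
    m, s = mean(results), stdev(results)
    return m - k * s, m + k * s


def within_natural_variation(new_result, results, k=2.0):
    """True if new_result lies inside the range estimated from results."""
    low, high = variation_range(results, k)
    return low <= new_result <= high


# Hypothetical effect estimates from identical replications (made-up numbers).
identical_runs = [0.42, 0.38, 0.45, 0.40, 0.35]
```

A later replication reporting a value close to these runs would count as falling within the natural variation under this convention, while a value far outside the range would call for an explanation in terms of changed conditions.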

Page 52: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Question 5

Should replications reuse

materials?

Page 53: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Accommodating Opposing Views

The possible threat of errors being propagated

by experimenters exchanging materials does not

mean discarding replications sharing materials

Replications with identical materials and protocols

(and possibly the same errors) are a necessary step

for verifying that an exact replication run by others

reproduces the same results

Other replications that alter the design and other

protocol details should be performed in order to

assure that the results are not induced by the protocol

Page 54: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Accommodating Opposing Views

Replication functions accommodate

opposing views within a broader

framework

Contrary stances are really tantamount to

different types of replication conducted for

different purposes

Different ways of running replications are

useful for gradually advancing towards

verified experimental results

Page 55: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Summarizing

Page 56: Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Main Ideas

Replication in ESE

Replication plays an essential role in the experimental paradigm

Replication is not a regular practice in ESE today

More methodological research on the adoption and tailoring of

replication in ESE is still necessary

Clarifying conceptions

Replication is necessary not merely to validate findings, but,

more importantly, to discover the range of conditions under

which the findings hold

Replications provide different knowledge depending on the

changes to the baseline experiment

Knowledge gained from a replication needs to relate changes

and findings
