Statistical Methods in Computer Science Hypothesis Life-cycle Ido Dagan

Dec 17, 2015


Colin Lane

Statistical Methods in Computer Science

Hypothesis Life-cycle

Ido Dagan


Statistical Methods in Computer Science © 2006-now Gal Kaminka / Ido Dagan. Portions © Carla Ellis at Duke University


Why experiment?

W. Tichy, “Should Computer Scientists Experiment More?” (on course web page)

• System/model/theory testing
  – Identify incorrectness or incompleteness in your “theory”/assumptions
    • This can save money and lives!
  – E.g. underlying assumptions that are violated by reality
  – Can lead to revising the model and/or system

• Exploration
  – Find new phenomena
  – E.g. unknown user behaviors in using systems


Empirical Research Cycle

• Established methodology, with a very long tradition
  – Natural sciences, social sciences

• Cycle:
  – Form theory/model
    • E.g. search engine ranking function
  – Hypothesize based on theory
    • E.g. more relevant pages are ranked higher than less relevant ones
  – Experiment (when possible)
    • E.g. ask people to judge relevance (binary, score, relative, …)
  – Observe results
  – Find discrepancies between hypothesized predictions and results
  – Revise theory (and publish results)

• This course especially covers the stages from hypothesis to discrepancy
• Heavy use of statistics and analytical skills (a bit of art)


Common Practice

• Vague idea

• No preliminary investigation

• No articulation of precise hypothesis

• Bad experimental design

• No iterations


Lots of Ways to Attack Experimentation

• Not general: only applies to the “system/setting under test”
  – E.g. general claims on user behavior that are true only for one system

• Not forward-looking: motivations and observations are based on the past

• Lack of representative comparison
  – Inadequate benchmarks (“users are happy with my system…”)
  – Comparisons are difficult/costly to implement

• Not enabling independent replication of experiments

• Real data can be messy: difficult to choose which data to gather
  – E.g. which aspects of user behavior (speed, satisfaction, success, …)


Experimental Lifecycle

[Figure: lifecycle diagram with the stages: Vague idea; “groping around” experiences; Initial observations; Model/Theory; Hypothesis; Experiment; Data, analysis, interpretation; Results & final presentation]

Highlighted step:
1. Understand the problem, frame the questions, articulate the goals. A problem well-stated is half-solved.


A Systematic Approach

1. Understand the problem, frame the questions, articulate the goals. A problem well-stated is half-solved.
   • Be able to answer “why” as well as “what”
   • E.g. why do people search? To find a website? To find information?

2. Select metrics that will help answer the questions.
   • E.g. rank of the correct website / percentage of relevant pages in the top 10

3. Identify the parameters that affect behavior
   • System parameters (e.g., HW config, search speed)
   • Workload parameters (e.g., user request patterns)
   • Data parameters (e.g. long/short documents)

4. Decide which parameters to study (vary in experiment)
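
The two example metrics in step 2 can be made concrete. The following is a minimal sketch, assuming a ranking is a list of document ids and relevance judgments are a set of ids; the function names and the sample data are hypothetical and do not come from the course material.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked_ids: Sequence[str], correct_id: str) -> float:
    """Return 1/rank of the correct item (ranks start at 1), or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == correct_id:
            return 1.0 / position
    return 0.0


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Return the fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranking and relevance judgments for a single query.
ranking = ["d3", "d7", "d1", "d9", "d4"]
print(reciprocal_rank(ranking, "d1"))              # the correct site sits at rank 3
print(precision_at_k(ranking, {"d3", "d1"}, k=5))  # 2 of the top 5 are relevant
```

Which of the two fits depends on the "why" from step 1: rank of the correct website suits "find a website" queries, while precision in the top 10 suits "find information" queries.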


Experimental Lifecycle

[Figure: lifecycle diagram with the stages: Vague idea; “groping around” experiences; Initial observations; Model/Theory; Hypothesis; Experiment; Data, analysis, interpretation; Results & final presentation]

Highlighted steps:
2. Select metrics that will help answer the questions.
3. Identify the parameters that affect behavior


Behavior Parameters/Variables
Example: software performance

• Hardware parameters
  – CPU model and organization, cache organization, latencies in the system (these will affect running time)

• System parameters
  – Memory availability, usage
  – CPU running time (sometimes approximated by wall-clock time)
  – Communication bandwidth, usage

• Program characteristics
  – Requires floating-point, heavy disk usage, integer math, graphics
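
To illustrate why wall-clock time is only an approximation of CPU running time, here is a small sketch using Python's standard `time` module. The helper `measure` and the two toy tasks are assumptions made for illustration, not part of the original slides.

```python
import time
from typing import Callable, Tuple


def measure(task: Callable[[], object]) -> Tuple[float, float]:
    """Run task once and return (cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def busy_task() -> int:
    """CPU-bound work (integer math only)."""
    return sum(i * i for i in range(200_000))


def idle_task() -> None:
    """Mostly waiting: wall-clock time passes while little CPU time is used."""
    time.sleep(0.05)


for name, task in [("busy", busy_task), ("idle", idle_task)]:
    cpu, wall = measure(task)
    print(f"{name}: cpu={cpu:.4f}s wall={wall:.4f}s")
```

For the CPU-bound task the two numbers should be close; for the waiting task they diverge, which is the case where the approximation misleads.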


Now build a model (theory)

• Mathematically precise
  – Memory = 2*sizeof(input) + 3
  – Runtime = 500 + 30*sizeof(input) + 20

• Asymptotically correct
  – Memory = O(sizeof(input)) in worst case
  – Runtime = O(log(sizeof(input))) in best case
  – Accuracy is proportional to run-time

• Qualitative
  – User performance is increased with reduced cognitive load
  – Number of bugs discovered is monotonically decreasing if the same programmer is used, otherwise it increases
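
A mathematically precise model can be checked directly against observations. The sketch below encodes the two example formulas as functions and reports the relative discrepancy; the observations are hypothetical numbers chosen only to show the mechanics, and the units are left unspecified, as in the slide.

```python
def predicted_memory(input_size: int) -> int:
    """Memory model from the slide: 2*sizeof(input) + 3."""
    return 2 * input_size + 3


def predicted_runtime(input_size: int) -> int:
    """Runtime model from the slide: 500 + 30*sizeof(input) + 20."""
    return 500 + 30 * input_size + 20


def relative_error(observed: float, predicted: float) -> float:
    """Signed discrepancy between observation and prediction, relative to the prediction."""
    return (observed - predicted) / predicted


# Hypothetical observations as (input_size, observed_runtime) pairs.
observations = [(10, 830), (100, 3500), (1000, 31000)]
for size, observed in observations:
    expected = predicted_runtime(size)
    print(size, expected, f"{relative_error(observed, expected):+.3f}")
```

Asymptotic and qualitative models cannot be checked point by point like this; they only constrain the shape or direction of the observed trend.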


Now form hypothesis

• Translate qualitative into quantitative
  – Use of the new system will (these are different hypotheses):
    + Increase operator accuracy (compared to not using it) by X
    - Decrease failures by Y
    - Decrease performance time by Z
  – Introducing link information to the relevance score will increase ranking quality by 10%
  – ......

• Operationalize the hypothesis
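
One possible way to operationalize the "increase ranking quality by 10%" hypothesis is to fix a quality score and compare it with and without link information. This is a sketch under that assumption; the function names, the 10% default, and the sample scores are illustrative only.

```python
def relative_improvement(baseline: float, treatment: float) -> float:
    """Relative change of the treatment score over the baseline score."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline


def meets_target(baseline: float, treatment: float, target: float = 0.10) -> bool:
    """True if the observed relative improvement reaches the hypothesized target."""
    return relative_improvement(baseline, treatment) >= target


# Hypothetical ranking-quality scores (e.g. mean precision in the top 10).
print(meets_target(baseline=0.50, treatment=0.56))  # about 12% relative improvement
print(meets_target(baseline=0.50, treatment=0.52))  # about 4% relative improvement
```

A single comparison like this ignores variability between queries and users; deciding whether the difference is real belongs to the data analysis stage of the lifecycle.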


What can go wrong at this stage?

• Wrong metrics (they don’t address the questions at hand)
  – E.g., ad click-through rather than purchases

• Bad metrics: too difficult to measure, too costly

• Overlooking significant parameters that affect the system

• Not clear about where the “system under test” boundaries are
  – E.g. poor ad content rather than poor ad matching

• Unrepresentative test setting
  – Not predictive of real usage
  – Just what everyone else uses (adopted blindly)
  – NOT what anyone else uses (no comparison possible)


Experimental Lifecycle

[Figure: lifecycle diagram with the stages: Vague idea; “groping around” experiences; Initial observations; Model/Theory; Hypothesis; Experiment; Data, analysis, interpretation; Results & final presentation]

Highlighted steps:
1. Decide which parameters to vary
2. Select technique
3. Select measurements


A Systematic Approach

1. Decide which parameters to study (vary)

2. Select measurement technique:
   • Can we directly measure what we want?
   • Intrusive (invasive) versus unobtrusive measurement
     – How invasive? Can we quantify the interference of monitoring?
     – E.g. should the user mark relevance, or do we just follow clicks?
   • Simulation: how detailed? Validated against what?
   • Benchmarks
   • Repeatability

3. Experiment design
   – Lesion studies / ablation tests (with and without component)
   – Iron-man (e.g. human performance), straw-man
   – Baseline, ceilings and floors
   – Factorial design
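
A full factorial design runs every combination of the chosen parameter levels, and an ablation test is the special case of a two-level factor (component off / on). The sketch below enumerates such a design with `itertools.product`; the factor names and levels are hypothetical examples in the spirit of the search-engine scenario.

```python
from itertools import product
from typing import Dict, List, Sequence


def full_factorial(factors: Dict[str, Sequence]) -> List[dict]:
    """Return one configuration per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Hypothetical factors for a search-engine experiment.
factors = {
    "use_link_info": [False, True],  # ablation: without / with the component
    "document_length": ["short", "long"],
    "query_type": ["find website", "find information"],
}

configs = full_factorial(factors)
print(len(configs))  # 2 * 2 * 2 combinations
for config in configs:
    print(config)
```

The number of configurations grows multiplicatively with each added factor, which is why deciding what to vary comes before the design.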


Experimental Lifecycle

[Figure: lifecycle diagram with the stages: Vague idea; “groping around” experiences; Initial observations; Model; Hypothesis; Experiment; Data, analysis, interpretation; Results & final presentation]

Highlighted steps:
1. Run experiments
2. Analyze and interpret data
3. Data presentation


A Systematic Approach

1. Run experiments
   • How many trials? How many combinations of parameter settings? (e.g. user age groups)
   • Practically limited

2. Analyze and interpret data
   • Descriptive statistics
   • Dealing with variability, outliers
   • Hypothesis testing: sample vs. population
     – Potentially infinite population (e.g. software runs)
     – Claims on variable values for the population based on sample variables
     – Statistical significance

3. Data presentation
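
The analysis step can be sketched with the standard library alone: descriptive statistics for each sample, followed by a test of whether the difference between two samples is larger than chance would explain. The permutation test used here is one possible choice, not necessarily the test the course uses, and the run times are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def describe(sample: Sequence[float]) -> dict:
    """Basic descriptive statistics; the median is less sensitive to outliers than the mean."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),
    }


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        rounds: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled: List[float] = list(a) + list(b)
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (rounds + 1)


# Hypothetical run times (seconds) of two system variants.
old_system = [12.1, 11.8, 12.5, 12.9, 12.2, 12.4]
new_system = [11.2, 11.0, 11.9, 11.4, 11.6, 11.1]
print(describe(old_system))
print(describe(new_system))
print(permutation_p_value(old_system, new_system))
```

A small p-value says that a difference this large would rarely arise if both samples came from the same population; it does not say how large the difference is in the population.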


A Systematic Approach

1. Run experiments
   • How many trials? How many combinations of parameter settings?
   • Sensitivity analysis on other parameter values.

2. Analyze and interpret data
   • Statistics, dealing with variability, outliers

3. Data presentation

4. Where does it lead us next?
   • New hypotheses, new questions, a new round of experiments
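
A simple form of sensitivity analysis varies one parameter at a time while holding the others at default values. The sketch below assumes the experiment can be wrapped in a function that takes parameters as keyword arguments; `toy_runtime` is a stand-in invented for illustration, not a measured model.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def one_at_a_time(run: Callable[..., float],
                  defaults: Dict[str, float],
                  parameter: str,
                  values: Sequence[float]) -> List[Tuple[float, float]]:
    """Vary one parameter over `values`, keeping all others at their defaults."""
    results = []
    for value in values:
        settings = dict(defaults, **{parameter: value})
        results.append((value, run(**settings)))
    return results


def toy_runtime(input_size: float, cache_mb: float) -> float:
    """Stand-in for a real experiment run; NOT a measured model."""
    return 500 + 30 * input_size + 200 / cache_mb


defaults = {"input_size": 100, "cache_mb": 4}
for value, outcome in one_at_a_time(toy_runtime, defaults, "cache_mb", [1, 2, 4, 8]):
    print(f"cache_mb={value}: runtime={outcome:.1f}")
```

If the outcome barely changes across the tested values, the parameter can reasonably stay fixed in the next round; if it changes a lot, it is a candidate factor for the next experiment.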


Experimental Lifecycle

[Figure: lifecycle diagram with the stages: Vague idea; “groping around” experiences; Initial observations; Model/Theory; Hypothesis; Experiment; Data, analysis, interpretation; Results & final presentation]