Empirical Research Methods in Computer Science
Lecture 4, November 2, 2005
Noah Smith

Today

Review bootstrap estimate of se (from homework).

Review sign and permutation tests for paired samples.

Lots of examples of hypothesis tests.

Recall ...

There is a true value of the statistic. But we don’t know it.

We can compute the sample statistic.

We know sample means are normally distributed (as n gets big), with standard error:

$\mathrm{se}(\bar{x}) = \sigma_x / \sqrt{n}$

But in general we don’t know the sampling distribution of other sample statistics (medians, correlations, etc.)!

Bootstrap world

unknown distribution F → observed random sample X → statistic of interest $\hat\theta = s(X)$

empirical distribution $\hat F$ → bootstrap random sample $X^*$ → bootstrap replication $\hat\theta^* = s(X^*)$ → statistics about the estimate (e.g., standard error)

Bootstrap estimate of se

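Run B bootstrap replicates, and compute the statistic each time: $\hat\theta^*[1], \hat\theta^*[2], \hat\theta^*[3], \dots, \hat\theta^*[B]$

$\hat\theta^*[\cdot] = \frac{1}{B}\sum_{i=1}^{B} \hat\theta^*[i]$ (mean of $\hat\theta^*$ across replications)

$\widehat{\mathrm{se}}_B = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat\theta^*[i] - \hat\theta^*[\cdot]\right)^2}$ (sample standard deviation of $\hat\theta^*$ across replications)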

Paired-Sample Design

pairs $(x_i, y_i)$
$x \sim$ distribution F
$y \sim$ distribution G
How do F and G differ?

Sign Test

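H0: F and G have the same median, i.e., median(F) - median(G) = 0

Under H0, Pr(x > y) = 0.5, so sign(x - y) follows a binomial distribution. With N pairs, of which $N^+$ have x > y, compute the tail probability under Binomial(N, 0.5):

$p = \sum_{n=N^+}^{N} \mathrm{bin}(n; N, 0.5)$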

Sign Test

nonparametric (no assumptions about the data)

closed form (no random sampling)

Example: gzip speed

build gzip with -O2 or with -O0

on about 650 files out of 1000, gzip -O2 was faster

binomial distribution with success probability 0.5, n = 1000: p-value $< 3 \times 10^{-24}$

Permutation Test

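H0: F = G

Suppose the difference in sample means is d. How likely is this difference (or a greater one) under H0?

For i = 1 to P:
    Randomly permute each pair $(x_i, y_i)$, i.e., swap $x_i$ and $y_i$ with probability 1/2
    Compute the difference in sample means

The p-value is the fraction of the P differences at least as extreme as d.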
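
A sketch of the paired permutation test in C. Swapping $x_i$ and $y_i$ flips the sign of $d_i = x_i - y_i$, so each iteration flips every sign with probability 1/2 and recomputes the mean difference. Two choices here are assumptions rather than details from the slide: the p-value is two-sided (it compares absolute differences), and randomness comes from `rand()`. `paired_permutation_p` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Paired permutation test for H0: F = G, run for P random iterations.
   With d_i = x_i - y_i, swapping the pair (x_i, y_i) flips the sign of d_i,
   so each iteration flips every sign with probability 1/2 and recomputes the
   difference in sample means (mean of the signed d_i). Returns the fraction
   of iterations whose |difference| is at least the observed |difference|
   (a two-sided p-value estimate). Uses rand(); call srand() first. */
double paired_permutation_p(const double *x, const double *y, size_t n, size_t P) {
    assert(n > 0 && P > 0);
    double observed = 0.0;
    for (size_t i = 0; i < n; i++)
        observed += x[i] - y[i];
    observed /= (double)n;

    size_t extreme = 0;
    for (size_t r = 0; r < P; r++) {
        double diff = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = x[i] - y[i];
            /* swap x_i and y_i with probability 1/2 */
            diff += (rand() <= RAND_MAX / 2) ? -d : d;
        }
        diff /= (double)n;
        if (fabs(diff) >= fabs(observed))
            extreme++;
    }
    return (double)extreme / (double)P;
}
```

With P iterations the smallest nonzero p-value this can report is 1/P, which is why an extreme observed difference is reported as p ≈ 0.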

Permutation Test

nonparametric (no assumptions about the data)

randomized test

Example: gzip speed

1000 permutations: the difference of sample means under H0 is centered on 0

-1579 is very extreme; p ≈ 0

Comparing speed is tricky!

It is very difficult to control for everything that could affect runtime.

Solution 1: do the best you can.

Solution 2: many runs, and then do ANOVA tests (or their nonparametric equivalents).

    “Is there more variance between conditions than within conditions?”
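
To make “more variance between conditions than within conditions” concrete, here is a sketch of the one-way ANOVA F statistic in C. This is the classical parametric statistic, not one of its nonparametric equivalents; the groups are assumed to be stored back to back in one array, and `anova_f` is a hypothetical name. Turning F into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom, which the sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* One-way ANOVA F statistic for k groups (conditions) stored back to back
   in x; sizes[g] is the number of observations in group g.
   F = [SSB / (k - 1)] / [SSW / (N - k)], where SSB is the between-group
   sum of squares and SSW the within-group sum of squares.
   A large F means more variance between conditions than within them. */
double anova_f(const double *x, const size_t *sizes, size_t k) {
    assert(k >= 2);
    size_t N = 0;
    for (size_t g = 0; g < k; g++)
        N += sizes[g];
    assert(N > k);

    double grand = 0.0;
    for (size_t i = 0; i < N; i++)
        grand += x[i];
    grand /= (double)N;

    double ssb = 0.0, ssw = 0.0;
    size_t pos = 0;
    for (size_t g = 0; g < k; g++) {
        assert(sizes[g] > 0);
        double m = 0.0;                      /* mean of group g */
        for (size_t j = 0; j < sizes[g]; j++)
            m += x[pos + j];
        m /= (double)sizes[g];
        ssb += (double)sizes[g] * (m - grand) * (m - grand);
        for (size_t j = 0; j < sizes[g]; j++)
            ssw += (x[pos + j] - m) * (x[pos + j] - m);
        pos += sizes[g];
    }
    return (ssb / (double)(k - 1)) / (ssw / (double)(N - k));
}
```

For two groups {1, 2, 3} and {4, 5, 6}, SSB = 13.5 and SSW = 4, so F = (13.5 / 1) / (4 / 4) = 13.5.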

Sampling method 1

for r = 1 to 10
    for each file f
        for each program p
            time p on f

Result (gzip first)

student 2’s program is faster than gzip!

Result (student first)

student 2’s program is slower than gzip!

Sampling method 1

for r = 1 to 10
    for each file f
        for each program p
            time p on f

Order effects

Well-known in psychology: what the subject does at time t will affect what she does at time t+1.

Sampling method 2

for r = 1 to 10
    for each program p
        for each file f
            time p on f
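
The two sampling methods differ only in loop nesting. The sketch below records the order in which (repetition, program, file) timings would be taken, without running or timing anything; `struct job` and `schedule` are hypothetical names. In method 1, consecutive timings alternate between programs, so each program runs right after the other one; in method 2, one program is timed on every file before the next program starts.

```c
#include <assert.h>
#include <stddef.h>

/* One timing measurement: repetition r, program p, file f. */
struct job {
    size_t rep, prog, file;
};

/* Writes, in order, the timings that sampling method 1 or 2 would take.
   out must have room for reps * nprogs * nfiles entries; returns the
   number written.
   Method 1: for r, for each file, for each program (programs alternate).
   Method 2: for r, for each program, for each file. */
size_t schedule(int method, size_t reps, size_t nprogs, size_t nfiles,
                struct job *out) {
    assert(method == 1 || method == 2);
    size_t n = 0;
    for (size_t r = 0; r < reps; r++) {
        if (method == 1) {
            for (size_t f = 0; f < nfiles; f++)
                for (size_t p = 0; p < nprogs; p++)
                    out[n++] = (struct job){ r, p, f };
        } else {
            for (size_t p = 0; p < nprogs; p++)
                for (size_t f = 0; f < nfiles; f++)
                    out[n++] = (struct job){ r, p, f };
        }
    }
    return n;
}
```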

Result

gzip wins

Sign and Permutation Tests

[Figure: the space of all distribution pairs (F, G), with regions for median(F) vs. median(G) and for F vs. G, marking where the sign test rejects H0 and where the permutation test rejects H0]

There are other tests!

We have chosen two that are:
    nonparametric
    easy to implement

Others include:
    Wilcoxon signed-rank test
    Kruskal-Wallis (nonparametric “ANOVA”)

Pre-increment?

Conventional wisdom:

“Better to use ++x than to use x++.”

Really, with a modern compiler?

Two (toy) programs

for (i = 0; i < (1 << 30); ++i) j = ++k;

for (i = 0; i < (1 << 30); i++) j = k++;

ran each 200 times (interleaved)

mean runtimes were 2.835 and 2.735

the difference was significant, with p well below .05
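
A self-contained C sketch of the two toy loops, wrapped as functions so they can be timed in-process with `clock()`. The `long` declarations and the function names are assumptions, since the slide shows only the loop bodies; the lecture's measurement ran each whole program 200 times, interleaved, which this sketch does not reproduce. With an optimizing compiler the loops may be simplified or removed entirely.

```c
#include <assert.h>
#include <time.h>

/* Toy loop 1 from the slide (pre-increment); n was 1 << 30 there.
   Returns j so the result is used. */
long pre_increment_loop(long n) {
    long i, j = 0, k = 0;
    for (i = 0; i < n; ++i)
        j = ++k;
    return j;
}

/* Toy loop 2 from the slide (post-increment). */
long post_increment_loop(long n) {
    long i, j = 0, k = 0;
    for (i = 0; i < n; i++)
        j = k++;
    return j;
}

/* CPU seconds for one call of loop(n), measured with clock(). */
double time_loop(long (*loop)(long), long n) {
    assert(loop != NULL && n >= 0);
    clock_t start = clock();
    volatile long sink = loop(n);  /* volatile store: the result is kept */
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Note that the two loops leave different values in j: after n iterations, `j = ++k` gives n while `j = k++` gives n - 1.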

What?

j = ++k:
    leal -8(%ebp), %eax
    incl (%eax)
    movl -8(%ebp), %eax

j = k++:
    movl -8(%ebp), %eax
    leal -8(%ebp), %edx
    incl (%edx)
    (%edx is not used anywhere else)

Conclusion

Compile with -O and the assembly code is identical!

Why was this a dumb experiment?

Pre-increment, take 2

Take gzip source code. Replace all post-increments with pre-increments, in places where semantics won’t change.

Run on 1000 files, 10 times each. Compare average runtime by file.

Sign test

$p = 8.5 \times 10^{-8}$

Permutation test

Conclusion

Preincrementing is faster!

... but what about -O?
    sign test: p = 0.197
    permutation test: p = 0.672

Preincrement matters without an optimizing compiler.

Joke.

Your programs ...

8 students had a working program both weeks.

6 people changed their code.
1 person changed nothing.
1 person changed to -O3.
3 people lossy in week 1.
Everyone lossy in week 2!

Your programs!

Was there an improvement in compression between the two versions?

H0: No. Find the sampling distribution of the difference in means, using permutations.

Student 1 (lossless week 1)

Compression < 1?

Student 2: worse compression

Compression < 1?

Student 3

Student 4 (lossless week 1)

Student 5 (lossless week 1)

Student 6

Student 7

Student 8

Homework Assignment 2

6 experiments:
1. Does your program compress text or images better?
2. What about variance of compression?
3. What about gzip’s compression?
4. Variance of gzip’s compression?
5. Was there a change in the compression of your program from week 1 to week 2?
6. In the runtime?

Remainder of the course

11/9: EDA (exploratory data analysis)
11/16: Regression and learning
11/23: Happy Thanksgiving!
11/30: Statistical debugging
12/7: Review, Q&A
Saturday 12/17, 2-5pm: Exam