Empirical Research Methods in Computer Science Lecture 4 November 2, 2005 Noah Smith.
Today
Review bootstrap estimate of se (from homework).
Review sign and permutation tests for paired samples.
Lots of examples of hypothesis tests.
Recall ...
There is a true value of the statistic. But we don’t know it.
We can compute the sample statistic.
We know sample means are normally distributed (as n gets big), with standard error

    se(x̄) ≈ σ̂ / √n
But we don’t know anything about the distribution of other sample statistics (medians, correlations, etc.)!
Bootstrap world
real world:       unknown distribution F → observed random sample X → statistic of interest θ̂ = s(X)
bootstrap world:  empirical distribution F̂ → bootstrap random sample X* → bootstrap replication θ̂* = s(X*)
From the replications we compute statistics about the estimate (e.g., the standard error).
Bootstrap estimate of se
Run B bootstrap replicates, and compute the statistic each time: θ*[1], θ*[2], θ*[3], ..., θ*[B]

    θ̄* = (1/B) Σ_{i=1..B} θ*[i]                      (mean of θ* across replications)

    se_B = sqrt( (1/(B−1)) Σ_{i=1..B} (θ*[i] − θ̄*)² )   (sample standard deviation of θ* across replications)
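The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the data and the choice of the median as the statistic are illustrative.

```python
import random
import statistics

def bootstrap_se(sample, stat, B=1000, rng=random):
    """Bootstrap estimate of the standard error of a statistic.

    Draws B bootstrap samples (resampling with replacement from the
    observed sample), computes the statistic on each, and returns the
    sample standard deviation of the B replications (1/(B-1) divisor).
    """
    n = len(sample)
    replications = []
    for _ in range(B):
        boot = [rng.choice(sample) for _ in range(n)]  # X*: resample with replacement
        replications.append(stat(boot))                # theta*[i] = s(X*)
    return statistics.stdev(replications)              # se_B

# Usage: standard error of the median, a statistic with no simple formula.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)]
se_median = bootstrap_se(data, statistics.median, B=500)
```

Note that the same function works for any statistic (correlation, trimmed mean, ...), which is the point of the bootstrap.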
Sign Test
H0: F and G have the same median: median(F) − median(G) = 0.
Under H0, Pr(x > y) = 0.5, so sign(x − y) follows a binomial distribution: compute bin(N₊, 0.5).
With N₊ positive signs out of N pairs, the p-value is the binomial tail:

    p = Σ_{n=N₊..N} bin(n; N, 0.5)
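The binomial tail above is easy to compute exactly in Python (a sketch, not the lecture's code; the 650-of-1000 numbers echo the gzip example that follows):

```python
from math import comb

def sign_test_p(n_plus, n):
    """One-sided sign-test p-value: Pr(X >= n_plus) for X ~ bin(n, 0.5)."""
    return sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n

# If one condition wins on 650 of 1000 paired trials, the p-value is tiny.
p = sign_test_p(650, 1000)
```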
Example: gzip speed
Build gzip with -O2 or with -O0.
On about 650 files out of 1000, gzip -O2 was faster.
Binomial distribution, p = 0.5, n = 1000: p < 3 × 10⁻²⁴.
Permutation Test
H0: F = G. Suppose the difference in sample means is d. How likely is this difference (or a greater one) under H0?

For i = 1 to P:
    randomly permute each pair (xᵢ, yᵢ)
    compute the difference in sample means
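A minimal Python sketch of this paired permutation test (the data are made up for illustration; the test returns a two-sided p-value):

```python
import random

def permutation_test(x, y, P=1000, rng=random):
    """Paired permutation test for H0: F = G.

    Observed statistic: difference in sample means. Under H0 the labels
    within each pair are exchangeable, so we randomly swap each (x_i, y_i)
    pair and recompute the difference, building up the null distribution.
    """
    n = len(x)
    observed = (sum(x) - sum(y)) / n
    extreme = 0
    for _ in range(P):
        d = 0.0
        for xi, yi in zip(x, y):
            if rng.random() < 0.5:      # flip a coin: swap the pair's labels
                xi, yi = yi, xi
            d += xi - yi
        if abs(d / n) >= abs(observed):  # as extreme as what we observed?
            extreme += 1
    return extreme / P

# Usage: y is x shifted by about 1.0, so the test should reject H0.
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(50)]
y = [xi + random.gauss(1.0, 0.2) for xi in x]
p = permutation_test(x, y, P=2000)
```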
Example: gzip speed
1000 permutations: the difference of sample means under H0 is centered on 0.
The observed difference, −1579, is very extreme; p ≈ 0.
Comparing speed is tricky!
It is very difficult to control for everything that could affect runtime.
Solution 1: do the best you can.
Solution 2: many runs, and then do ANOVA tests (or their nonparametric equivalents).
“Is there more variance between conditions than within conditions?”
Order effects
Well-known in psychology: what the subject does at time t will affect what she does at time t+1.
Sign and Permutation Tests
[Venn diagram over all distribution pairs (F, G): the region with F ≠ G is where the permutation test rejects H0; inside it, the region with median(F) ≠ median(G) is where the sign test rejects H0.]
There are other tests!
We have chosen two that are nonparametric and easy to implement.
Others include: the Wilcoxon Signed Rank Test and Kruskal-Wallis (a nonparametric "ANOVA").
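Both of these tests are available off the shelf. A sketch using SciPy (assumed available; the sample data are illustrative, not from the lecture):

```python
import random
from scipy import stats

random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(40)]
y = [xi + 0.8 for xi in x]                        # paired, shifted sample
z = [random.gauss(2.0, 1.0) for _ in range(40)]   # a third, separate group

# Wilcoxon signed-rank test: paired, nonparametric alternative to the t-test.
w_stat, w_p = stats.wilcoxon(x, y)

# Kruskal-Wallis: nonparametric one-way "ANOVA" across several groups.
h_stat, h_p = stats.kruskal(x, y, z)
```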
Pre-increment?
Conventional wisdom:
“Better to use ++x than to use x++.”
Really, with a modern compiler?
Two (toy) programs
for (i = 0; i < (1 << 30); ++i)
    j = ++k;

for (i = 0; i < (1 << 30); i++)
    j = k++;
Ran each 200 times (interleaved); mean runtimes were 2.835 and 2.735; the difference is significant well below .05.
What?
    leal -8(%ebp), %eax
    incl (%eax)
    movl -8(%ebp), %eax

    movl -8(%ebp), %eax
    leal -8(%ebp), %edx
    incl (%edx)

%edx is not used anywhere else.
Pre-increment, take 2
Take the gzip source code. Replace all post-increments with pre-increments, in places where the semantics won't change.
Run on 1000 files, 10 times each. Compare average runtime by file.
Conclusion
Preincrementing is faster!
... but what about -O? Sign test: p = 0.197; permutation test: p = 0.672.
Pre-increment only matters without an optimizing compiler.
Your programs ...
8 students had a working program both weeks.
6 people changed their code. 1 person changed nothing. 1 person changed to -O3.
3 people had lossy programs in week 1. Everyone was lossy in week 2!
Your programs!
Was there an improvement on compression between the two versions?
H0: No. Find the sampling distribution of the difference in means, using permutations.
Homework Assignment 2
6 experiments:
1. Does your program compress text or images better?
2. What about the variance of compression?
3. What about gzip's compression?
4. The variance of gzip's compression?
5. Was there a change in the compression of your program from week 1 to week 2?
6. In the runtime?