-
Experiments and Observational Studies
Applied Statistics and Experimental Design — Fritz Scholz — Fall 2006
In observational studies we obtain measurements on several variables.
Sampling may or may not be random. We observe what is in the sample;
there is no manipulation of factors by an experimenter.
Factor levels may be chosen by hidden agendas.
If we observe any correlations, it is not clear which variables have
an effect on which other variables. Cause and effect remain unclear.
There may be unmeasured factors that affect seemingly correlated variables.
In a “controlled” experiment we control certain input variables
and determine their effect on response variables.
We have to guard against subconscious effects when “controlling” inputs.
⇒ randomization!
-
Steps in the Design of Experiments (DOE)
1. Be clear on the goal of the experiment. Which questions are to be addressed?
   Set up hypotheses about treatment/factor effects a priori.
   Don't go fishing afterwards! Fishing can only point to future experiments.
   If you torture the data long enough, they will confess to anything.
2. Understand the experimental units over which treatments will be randomized.
   Where do they come from? How do they vary? Are they well defined?
3. Define the appropriate response variable to be measured.
4. Define potential sources of response variation:
   a) factors of interest (to be manipulated)
   b) nuisance factors (to be randomized)
5. Decide on treatment and blocking variables.
6. Define clearly the experimental process and what is randomized.
-
Three Basic Principles in Experimental Design
Replication: repeat experimental runs under the same values for the control variables.
⇒ understanding of the inherent variability
⇒ a better response estimate via averaging.
Repeat all variation aspects of an experimental run, not just a repeat
measurement of the response after all aspects of an experimental run are done.
Randomization: makes systematic confounding between treatment and other factors
(hidden or not) unlikely. Removes sources of bias arising from factor/unit
interaction. Disperses biases randomly among all units ⇒ error or background noise.
Provides a logical/probability basis for inference about treatment effects.
Blocking: effective when (natural within-block variation)/(between-block variation)
is small. Randomized treatment assignment within ≈ homogeneous blocks.
The treatment effect is more clearly visible against the lower within-block
variation. Separates variation between blocks from the treatment effect.
-
Flux Experiment
18 boards are available for the experiment, not necessarily a random sample
from all boards (present, past, and future).
Test flux brands X and Y: randomly assign 9 boards each to X and Y (FLUX).
The boards are soldered and cleaned; the order is randomized (SC.ORDER).
Then the boards are coated and cured to avoid handling contamination;
the order is randomized (CT.ORDER).
Then the boards are placed in a humidity chamber and measured for SIR;
the position in the chamber is randomized (SLOT).
The randomization at the various process steps avoids unknown biases.
When in doubt, randomize!
Randomization of the flux assignment gives us a mathematical basis
for judging flux differences with respect to the response SIR.
-
DOE Steps Recapitulated
1. Goal of the experiment. Answer the question: Is flux X different from flux Y?
   If not, we can use them interchangeably. One may be cheaper than the other.
   Test the null hypothesis H0: no difference in fluxes.
2. Understand the experimental units: boards with all processing steps
   up to measuring the response.
3. Define the appropriate response variable to be measured: SIR.
4. Define potential sources of response variation:
   a) factors of interest: flux type
   b) nuisance factors: boards, processing steps, testing.
5. Decide on treatment and blocking variables. Treatment = flux type, no blocking.
   With 2 humidity chambers we might have wanted to block on those.
6. Define clearly the experimental process and what is randomized.
   Treatments and all nuisance factors are randomized.
-
Flux Data
BOARD  FLUX  SC.ORDER  CT.ORDER  SLOT   SIR
    1     Y        13        14     5   8.6
    2     Y        16         8     6   7.5
    3     X        18         9    15  11.5
    4     Y        11        11    11  10.6
    5     X        15        18     9  11.6
    6     X         9        15    18  10.3
    7     X         6         1    16  10.1
    8     Y        17        12    17   8.2
    9     Y         5        10    13  10.0
   10     Y        10        13    14   9.3
   11     Y        14         5    10  11.5
   12     X        12        17    12   9.0
   13     X         4         7     3  10.7
   14     X         8         6     1   9.9
   15     Y         3         2     4   7.7
   16     X         7         3     2   9.7
   17     Y         1        16     8   8.8
   18     X         2         4     7  12.6
see Flux.csv or flux
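The observed difference of flux means can be recomputed from the table above. The course works in R; the following is just an illustrative Python translation, with made-up variable names:

```python
# SIR values from the flux data table, grouped by flux brand
sir = {
    "X": [11.5, 11.6, 10.3, 10.1, 9.0, 10.7, 9.9, 9.7, 12.6],
    "Y": [8.6, 7.5, 10.6, 8.2, 10.0, 9.3, 11.5, 7.7, 8.8],
}

xbar = sum(sir["X"]) / len(sir["X"])   # mean SIR under flux X
ybar = sum(sir["Y"]) / len(sir["Y"])   # mean SIR under flux Y

d_obs = ybar - xbar                    # FLUXY - FLUXX
print(round(d_obs, 3))                 # -1.467
```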
-
Flux Experiment: First Boxplot Look at SIR Data
[Figure: side-by-side boxplots of SIR (log10(Ohm)) for flux X and flux Y,
with the individual data points overlaid; annotated with FLUXY − FLUXX = −1.467]
-
Flux Experiment: QQ-Plot of SIR Data
[Figure: QQ-plot of SIR with flux Y (log10(Ohm)) against SIR with flux X
(log10(Ohm)), both axes ranging from 8 to 12]
-
QQ-Plot of SIR Data (Higher Perspective?)
[Figure: the same QQ-plot of SIR with flux Y against SIR with flux X,
redrawn with both axes ranging from 0 to 20, which makes the X–Y
differences look much less pronounced]
-
Some QQ-Plots from N(0,1) Samples (m = 9, n = 9)
[Figure: 20 QQ-plots of pairs of simulated N(0,1) samples of size 9 each,
annotated with the respective differences of means ȳ − x̄ = −0.792, 0.926,
−0.57, −0.394, −0.62, 0.115, −0.625, −1.12, 0.584, −0.647, 0.41, −1.33,
−0.667, −0.31, −0.757, 0.845, 0.165, 1.07, −0.194, −0.253]
-
Is the Difference Ȳ − X̄ = −1.467 Significant?
In comparing SIR for the two fluxes let us focus on the difference of means
FLUXY − FLUXX = Ȳ − X̄.
If the use of flux X or flux Y made no difference, then we should have seen
the same results for these 18 boards, no matter which got flux X or Y.
X or Y is just an artificial “distinguishing” label with no consequence.
For other random assignments of fluxes, or random splittings of the 18 boards
into two groups of 9 and 9, we would have seen other differences of means.
There are (18 choose 9) = 48620 such possible splits. For each split we could
obtain Ȳ − X̄.
We need the reference distribution of Ȳ − X̄ over all 48620 splits to judge how
unusual a random split we had when we got Ȳ − X̄ = −1.467. It was based on a
random split by our randomization, i.e., it is one of the 48620 equally likely ones.
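The full set of splits is small enough to enumerate directly. The slides do this with R's combn; the sketch below is an illustrative Python version (the board data are those from the flux data table; all variable names are my own):

```python
from itertools import combinations

# SIR values for the 18 boards, in board order (from the flux data table)
sir = [8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0,
       9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6]
y_boards = {0, 1, 3, 7, 8, 9, 10, 14, 16}   # boards that actually got flux Y

total = sum(sir)

def diff(idx):
    """Ybar - Xbar when the boards in idx play the role of flux Y."""
    s_y = sum(sir[i] for i in idx)
    return s_y / 9 - (total - s_y) / 9

d_obs = diff(y_boards)   # the observed difference, about -1.467

# reference distribution over all (18 choose 9) = 48620 splits
ref = [diff(c) for c in combinations(range(18), 9)]
assert len(ref) == 48620

# two-sided randomization p-value (ties counted; small rounding tolerance)
p = sum(abs(d) >= abs(d_obs) - 1e-9 for d in ref) / len(ref)
print(round(d_obs, 3), round(p, 5))   # p should be near the .023 reported ahead
```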
-
Some Randomization Examples of Ȳ − X̄
[Table: seven example random reassignments of the 18 SIR values
(8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0, 9.3, 11.5, 9.0,
10.7, 9.9, 7.7, 9.7, 8.8, 12.6) into two groups of 9, with the resulting
differences of means
Ȳ − X̄ = 1.1778, 0.4222, −0.0889, −0.4000, 0.5778, 0.7778, 0.2000]
-
Reference Distribution of Ȳ − X̄
Compute Ȳ − X̄ for each of the 48620 possible splits and determine how unusual
the observed difference of −1.467 is.
This seems like a lot of computing work, but it takes just a few seconds in R
using the function combn of the package combinat.
Download and install that package first from the contributed packages on CRAN or
from R packages under the STAT 421 site, and invoke library(combinat) prior to
using combn.
randomization.ref.dist = combn(1:18, 9, fun=mean.fun, y=SIR)
gives the vector of all 48620 such average differences Ȳ − X̄, where mean.fun
is the user-supplied function whose action is described on the next slide.
-
p-Value of Ȳ − X̄ = −1.467
The function combn goes through all choices of index combinations of 9 values
taken from 1:18 (referred to as ind in mean.fun).
For each such index combination it evaluates the mean Ȳ of the SIR values for
those chosen indices and the mean X̄ of the remaining SIR values.
It then takes the difference Ȳ − X̄ and outputs all these differences as a vector.
We find a (two-sided) p-value of .02344 for our observed Ȳ − X̄ = −1.467, i.e.,
mean(abs(randomization.ref.dist) >= 1.467) = .02344
That is the probability of seeing a |Ȳ − X̄| value as or more extreme than the
observed |ȳ − x̄| = 1.467 when in fact the hypothesis H0 holds true, i.e., under
the randomization reference distribution.
Randomization of fluxes is the logical basis for any such probability statements,
i.e., for the calculation of p-values!
-
Randomization Reference Distribution of Ȳ − X̄
[Figure: histogram (density scale) of the randomization reference distribution
of SIRY − SIRX over the range −3 to 3, annotated with the tail probabilities
P(Ȳ − X̄ ≤ −1.467) = 0.01172 and P(Ȳ − X̄ ≥ 1.467) = 0.01172]
-
The p-Value: What It Is Not!
The p-value based on some sample or experimental data
is not the probability that the hypothesis is true.
The hypothesis is not the outcome of some chance experiment ⇒ no probability!
The calculation of the p-value assumes that the hypothesis is true!
It is doubly hypothetical!
The calculated chance is that of seeing stronger “contradictory evidence” against
the assumed hypothesis than what was obtained in the observed sample/experiment.
“Contradictory evidence” ⇔ a test statistic that measures strong discrepancy from H0.
p-values vary from sample to sample and tend to be uniformly distributed under H0.
A small p-value makes H0 implausible and some alternative more attractive.
-
Approximation to Randomization Reference Distribution
For moderate to large m and n the number of combinations (m+n choose m) becomes
so large that it taxes the computing power or storage capacity of the average
computer.
A simple way out is to generate a sufficiently large sample, say M = 10,000 or
M = 100,000, of combinations from this set of all (m+n choose m) combinations.
Compute the statistic of interest, s(Xi, Yi) = Ȳi − X̄i, i = 1, . . . , M, for each
sampled combination and approximate the randomization reference distribution
F(z) = P(s(X, Y) ≤ z)  by  F̂M(z),
where F̂M(z) is the proportion of s(Xi, Yi) = Ȳi − X̄i values that are ≤ z.
By the law of large numbers (LLN) we have, for any z,
F̂M(z) → F(z) as M → ∞, i.e., F̂M(z) ≈ F(z) for large M.
-
Sample Simulation Program
This can be done in a loop using the sample function in R.
simulated.reference.distribution = function(M=10000){
  D.star = NULL
  for(i in 1:M){
    SIR.star = sample(SIR)
    D.star = c(D.star, mean(SIR.star[1:9]) - mean(SIR.star[10:18]))
  }
  D.star
}
The following slide shows the QQ-plot comparison with the full randomization
reference distribution, together with the respective p-values.
This approach should suffice for practical purposes.
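The same Monte Carlo idea can be sketched in Python, where random.shuffle plays the role of R's sample (data from the flux table; the seed and helper name are my own choices):

```python
import random

# SIR values for the 18 boards (from the flux data table)
sir = [8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0,
       9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6]

def simulated_reference_distribution(m=10000, seed=421):
    """m random splits into 9 + 9 boards; returns the Ybar - Xbar values."""
    rng = random.Random(seed)
    vals = sir[:]
    d_star = []
    for _ in range(m):
        rng.shuffle(vals)                 # like sample(SIR) in R
        d_star.append(sum(vals[:9]) / 9 - sum(vals[9:]) / 9)
    return d_star

ref = simulated_reference_distribution()
# two-sided p-value estimate (1.4666 keeps the observed value itself counted)
p_hat = sum(abs(d) >= 1.4666 for d in ref) / len(ref)
print(round(p_hat, 4))   # should land in the vicinity of .02
```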
-
QQ-Plot of Ȳ − X̄ for Simulated & Full Randomization Reference Distribution
[Figure: QQ-plot of ȳ − x̄ for the 10000 sampled combinations against ȳ − x̄
for all 48620 combinations, both axes from −2 to 2; annotated with
p̂1 = 0.0099, p̂2 = 0.0117 (simulated) and p1 = 0.01172, p2 = 0.01172 (full)]
-
Randomization Distribution of the 2-Sample t-Test
t(X, Y) = (Ȳ − X̄) / ( s √(1/m + 1/n) ),
where  s² = [ ∑_{i=1}^{n} (Yi − Ȳ)² + ∑_{j=1}^{m} (Xj − X̄)² ] / (m + n − 2);
it expresses the difference in averages relative to a measure of sample variability.
The randomization reference distribution of the t(X, Y) values is in one-to-one
correspondence with the randomization reference distribution of the Ȳ − X̄ values.
Theory ⇒ the randomization reference distribution of t(X, Y) is very well
approximated by a t-distribution with 16 = 18 − 1 − 1 degrees of freedom.
The test based on t(X, Y) and its t-distribution under H0 also shows up
in a normal-population-based approach to this problem.
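As a numerical check, the pooled two-sample t-statistic can be computed directly from the flux data; a small Python sketch (data from the flux table, m = n = 9):

```python
from math import sqrt

x = [11.5, 11.6, 10.3, 10.1, 9.0, 10.7, 9.9, 9.7, 12.6]  # SIR, flux X
y = [8.6, 7.5, 10.6, 8.2, 10.0, 9.3, 11.5, 7.7, 8.8]     # SIR, flux Y
m, n = len(x), len(y)

xbar, ybar = sum(x) / m, sum(y) / n
ss_x = sum((v - xbar) ** 2 for v in x)   # sums of squared deviations
ss_y = sum((v - ybar) ** 2 for v in y)

s_pooled = sqrt((ss_x + ss_y) / (m + n - 2))   # pooled standard deviation
t = (ybar - xbar) / (s_pooled * sqrt(1 / m + 1 / n))
print(round(t, 2))   # about -2.51, matching the t(x,y) value quoted later
```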
-
QQ-Plot of t(X, Y) Randomization Reference Distribution
[Figure: QQ-plot of the ordered randomization t-statistics against
t16 quantiles, both axes from −6 to 6]
-
t-Approximation for the t(X, Y) Randomization Reference Distribution
[Figure: histogram (density scale) of the 2-sample t-statistic randomization
reference distribution over −6 to 6, with the t16 density overlaid]
-
The Randomization Test
We have obtained the full or simulated randomization reference distribution.
Thus any extreme value of |Ȳ − X̄| could come about either through a rare chance
event during our randomization step or because H0 is actually wrong.
We have to make a decision: reject H0 or not?
We may decide to reject H0 when |Ȳ − X̄| ≥ C, where C is some critical value.
To determine C one usually sets a significance level α, which limits the
probability of rejecting H0 when in fact H0 is true (type I error). The requirement
α = P(reject H0 | H0) = P(|Ȳ − X̄| ≥ C | H0)  then determines C = Cα.
-
Significance Levels and p-Values
When we reject H0 we say that the results were significant
at the (previously chosen) level α.
Commonly used values of α are α = .05 or α = .01.
Rejecting at a smaller α than these would be even stronger evidence against H0.
Our chance of making a wrong decision (rejecting H0 when true) would be smaller.
For how small an α would we still have rejected?
This leads us to the observed significance level or p-value of the test for the
given data, i.e., for the observed discrepancy value |ȳ − x̄|:
p-value = P(|Ȳ − X̄| ≥ |ȳ − x̄| | H0)
-
How to Determine the p-Value
We have stated p-values obtained from the full and the simulated (M = 10000)
reference distributions. How are they obtained? Note the following:
> x = 1:10
> x > 3
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> sum(x > 3)
[1] 7
> mean(x > 3)
[1] 0.7
Note that x > 3 produced a logical vector with the same length as x.
The logical values FALSE and TRUE are also interpreted numerically as 0 and 1,
respectively, in arithmetic expressions.
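The same trick carries over to Python, where True and False likewise behave as 1 and 0 in arithmetic; a small cross-language aside:

```python
x = list(range(1, 11))          # like x = 1:10 in R
flags = [xi > 3 for xi in x]    # like x > 3: a list of booleans

print(sum(flags))               # 7   -- booleans count as 0/1
print(sum(flags) / len(flags))  # 0.7 -- the mean, i.e., a proportion
```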
-
How to Determine the p-Value (continued)
We view the reference distribution as a vector x of numbers for all the
differences of means, Ȳ − X̄, obtained either for all 48620 possible splits or
for the M = 10000 simulated splits.
mean(x <= -1.467) and mean(x >= 1.467) would give us the respective p-values
p1 = .01172 and p2 = .01172 for the full reference distribution,
and p̂1 = .0099 and p̂2 = .0117 for the simulated reference distribution.
The simulated distribution is obviously not quite symmetric.
Rather than adding these 2 p-values to get a 2-sided p-value we can also do this
directly via mean(abs(x) >= 1.467) = .02344 for all 48620 splits or
mean(abs(x) >= 1.467) = .0216 for the M = 10000 simulated splits.
Here abs(x) gives the vector of absolute values of all components in x.
-
How to Determine the Critical Value C.crit for the Level α Test
For α = .05 we want to find C.crit such that mean(abs(x) >= C.crit) = .05.
Equivalently, find the .95-quantile of abs(x) via C.crit = quantile(abs(x), .95).
From the full reference distribution we get C.crit(α = .05) = 1.288889
and C.crit(α = .01) = 1.644444.
From the simulated reference distribution we get C.crit(α = .05) = 1.311111
and C.crit(α = .01) = 1.666667.
-
What Does the t-Distribution Give Us?
What 2-sided p-value does the observed t-statistic t(x, y) = −2.513 give?
We find P(|t(X, Y)| ≥ 2.513) = 2 * (1 − pt(2.513, 16)) = .02306, pretty close
to the .02344 from the full randomization reference distribution.
What are the critical values tcrit(α) for |t(X, Y)| for level α = .05, .01 tests?
We find tcrit(α = .05) = qt(.975, 16) = 2.1199 and
tcrit(α = .01) = qt(.995, 16) = 2.9208, respectively.
With |t(x, y)| = 2.513 we would reject H0 at α = .05, since |t(x, y)| ≥ 2.1199,
but not at α = .01, since |t(x, y)| < 2.9208.
-
Hypothesis Testing
We have addressed the question: Does the type of flux affect SIR?
Formally we have tested the
null hypothesis H0: the type of flux does not affect SIR
against the
alternative hypothesis H1: the type of flux does affect SIR.
While H0 seems fairly specific, H1 is open ended. H1 can be anything but H0.
There may be many ways for SIR to be affected by flux differences,
e.g., a change in mean, median, or scatter.
Such differences may show up in data Z through an appropriate test statistic s(Z).
Here Z = (X1, . . . , X9, Y1, . . . , Y9).
-
Test Criteria or Test Statistics
In the flux analysis we chose the absolute difference of sample means,
s(Z) = |Ȳ − X̄|, as our test criterion or test statistic for testing the
null hypothesis.
A test statistic is a value calculated from data and other known entities,
e.g., assumed parameter values.
We could have worked with the absolute difference in sample medians, or with
the ratio of sample standard deviations and compared that ratio with 1, etc.
Different test statistics are sensitive to different deviations from the
null hypothesis.
A test statistic, when viewed as a function of random input data, is itself a
random variable and has a distribution, its sampling distribution.
-
Sampling Distributions
For a test statistic s(Z) to be effective in deciding between H0 and H1 it is
desirable that the sampling distributions of s(Z) under H0 and H1 are somewhat
different.
[Figure: two histograms (relative frequency) over the range 90 to 120:
the sampling distribution of s(Z) under H0 and the sampling distribution
under H1, the latter shifted toward higher values]
-
When to Reject H0
The previous illustration shows a specific sampling distribution for s(Z) under H1.
Typically H1 consists of many different possible distributional models, leading to
many possible sampling distributions under H1.
Under H0 we often have just a single sampling distribution, the null distribution.
If under H1 the test statistic s(Z) tends to have mostly higher values than under
H0, we would want to reject H0 when s(Z) is large.
How large is too large? We need a critical value Ccrit and reject H0 when
s(Z) ≥ Ccrit.
Choose Ccrit such that P(s(Z) ≥ Ccrit | H0) = α, a pre-chosen significance level.
Typically α = .05 or .01. It is the probability of the type I error.
The previous illustration also shows that there may be values of s(Z) in the
overlap of both distributions. Decisions are not clear cut ⇒ type I or type II error.
-
Decision Table
                     Truth
Decision     H0 is true          H0 is false
accept H0    correct decision    type II error
reject H0    type I error        correct decision

Testing hypotheses (like estimation) is a branch of a more general concept,
namely decision theory. Decisions are optimized with respect to penalties
for wrong decisions, i.e., P(type I error) and P(type II error), or
the mean squared error of an estimate θ̂ of θ, namely E((θ̂ − θ)²).
-
The Null Distribution and Critical Values
[Figure: two histograms over the range 90 to 120. Top: the sampling
distribution under H0 with critical value = 104.9 at significance level
α = 0.05; the region beyond 104.9 is "reject H0" (type I error), below it
"accept H0". Bottom: the sampling distribution under H1 with the same cutoff;
the region below 104.9 gives the type II error]
-
Critical Values and p-Values
Note that p-value(s(z)) ≤ α is equivalent to rejecting H0 at level α.
[Figure: as on the previous slide, two histograms over 90 to 120.
Top: the sampling distribution under H0 with critical value = 104.9
(significance level α = 0.05) and the observed value 107.1 marked; the tail
beyond 107.1 gives the p-value = 0.0097. Bottom: the sampling distribution
under H1 with the type II error region marked]
-
p-Values and Significance Levels
We just saw that knowing the p-value allows us to accept or reject H0 at level α.
However, the p-value is more informative than saying that we reject at level α.
It is the smallest level α at which we would still have rejected H0.
It is also called the observed significance level.
Working with a predefined α made it possible to choose the best level α test.
Best: having the highest probability of rejecting H0 when H1 is true.
This makes for nice mathematical theory, but p-values should be the preferred way
of judging and reporting test results.
-
Randomization Reference Distribution of Ȳ − X̄
[Figure: histogram (density scale) of the randomization reference distribution
of SIRY − SIRX over −3 to 3, with the critical values Dcrit = −1.289 and
Dcrit = 1.289 for α = 0.05 marked and the observed D = Ȳ − X̄ = −1.467 indicated]
-
The Power Function
The probability of rejecting H0 is denoted by β. It is a function of the
distributional model F governing Z, i.e., β = β(F). It is called the power
function of the test.
When the hypothesis H0 is composite, i.e., when s(Z) has more than one possible
distribution under H0, one defines the highest probability of type I error as the
significance level of the test. Hence α = sup{β(F) : F ∈ H0}.
For various F ∈ H1 the power function gives us the corresponding probabilities of
type II error as 1 − β(F).
Montgomery unfortunately uses β = β(F) as the symbol for the probability of
type II error. This is not standard.
-
Samples and Populations
So far we have covered inference based on a randomization test. This relied
heavily on our randomized assignment of flux X and flux Y to the 18 circuit boards.
Such inference can logically only say something about flux differences
in the context of those 18 boards.
To generalize any conclusions to other boards would require some assumptions,
judgment, and ultimately a step of faith.
Namely, assume that these 18 boards and their processing represent a
representative sample from a conceptual population of such processed boards.
For samples to be representative they should be random samples.
-
Conceptual Populations
Clearly the 18 boards happened to be available at the time of the experiment.
They could have been a random sample of all boards available at the time.
However, they may also have been taken sequentially in the order of production.
They certainly could not be a sample from future boards, yet to be produced.
The processing aspects were to some extent made to look like a random sample
by the various randomization steps.
Thus we could regard the 9 + 9 SIR values as two random samples from two
very large or infinite conceptual populations of SIR values.
2 populations: all potential boards/processes with flux X, or all the same
boards/processes with flux Y. We can't have it both ways ⇒ further
conceptualization.
-
Population Distributions and Densities
Such infinite populations of Z-values are conveniently described by densities
f(z), with the properties f(z) ≥ 0 and ∫_{−∞}^{∞} f(z) dz = 1.
The probability of observing a randomly chosen element Z that is ≤ some
specified value x is then given by
F(x) = P(Z ≤ x) = ∫_{−∞}^{x} f(z) dz = ∫_{−∞}^{x} f(t) dt
(z and t are just dummy variables).
F(x) as a function of x is also called the cumulative distribution function (CDF)
of the random variable Z.
F(x) increases from 0 to 1 as x goes from −∞ to ∞.
-
Means, Expectations and Variances
The mean or expectation of Z or its population is defined by
µ = µZ = E(Z) = ∫_{−∞}^{∞} z f(z) dz ≈ ∑ z f(z) ∆(z) = ∑ z p(z),
a probability-weighted average of z values.
It is the center of probability mass balance.
By extension the mean or expectation of g(Z) is defined by
E(g(Z)) = ∫_{−∞}^{∞} g(z) f(z) dz
The variance of Z is defined by
σ² = var(Z) = E((Z − µ)²) = ∫_{−∞}^{∞} (z − µ)² f(z) dz
σ = σZ = √var(Z) is called the standard deviation of Z or its population.
It is a measure of distribution spread.
-
Multivariate Densities or Populations
f(z1, . . . , zn) is a multivariate density if it has the following properties:
f(z1, . . . , zn) ≥ 0 for all z1, . . . , zn  and
∫_{−∞}^{∞} . . . ∫_{−∞}^{∞} f(z1, . . . , zn) dz1 . . . dzn = 1.
It describes the behavior of the infinite population of such n-tuples
(z1, . . . , zn).
A random element (Z1, . . . , Zn) drawn from such a population is a random vector.
We say that Z1, . . . , Zn in such a random vector are (statistically) independent
when the following property holds:
f(z1, . . . , zn) = f1(z1) × ··· × fn(zn)
Here fi(zi) is the marginal density of Zi. It is obtainable from the multivariate
density by integrating out all other variables, e.g.,
f2(z2) = ∫_{−∞}^{∞} . . . ∫_{−∞}^{∞} f(z1, z2, z3, . . . , zn) dz1 dz3 . . . dzn.
-
Random Sample
When we repeatedly draw values Z1, . . . , Zn from a common infinite population
with density f(z) we get a multivariate random vector (Z1, . . . , Zn).
If the drawings are physically unrelated or “independent,” we may consider
Z1, . . . , Zn as statistically independent, i.e., the random vector has density
h(z1, . . . , zn) = f(z1) × ··· × f(zn).
Z1, . . . , Zn is then also referred to as a random sample.
We also express this as Z1, . . . , Zn ∼ i.i.d. f.
Here i.i.d. = independent and identically distributed.
-
Rules of Expectations & Variances (Review)
For any set of random variables X1, . . . , Xn and constants a0, a1, . . . , an
we have
E(a0 + a1 X1 + . . . + an Xn) = a0 + a1 E(X1) + . . . + an E(Xn),
provided the expectations E(X1), . . . , E(Xn) exist and are finite.
This holds whether X1, . . . , Xn are independent or not.
For any set of independent random variables X1, . . . , Xn and constants
a0, a1, . . . , an we have
var(a0 + a1 X1 + . . . + an Xn) = a1² var(X1) + . . . + an² var(Xn),
provided the variances var(X1), . . . , var(Xn) exist and are finite. var(a0) = 0.
This is also true under the weaker (than independence) condition
cov(Xi, Xj) = E(Xi Xj) − E(Xi) E(Xj) = 0 for i ≠ j. In that case X1, . . . , Xn
are uncorrelated.
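These rules can be checked exactly on small discrete distributions by enumerating the joint outcomes; a Python sketch (the two example distributions and all names are made up for illustration):

```python
from itertools import product

# two independent discrete distributions: value -> probability
X1 = {0: 0.5, 1: 0.3, 4: 0.2}
X2 = {-1: 0.4, 2: 0.6}

def mean(d):
    return sum(v * p for v, p in d.items())

def var(d):
    mu = mean(d)
    return sum((v - mu) ** 2 * p for v, p in d.items())

a0, a1, a2 = 3.0, 2.0, -1.0

# distribution of a0 + a1*X1 + a2*X2, built from the independent joint
lin = {}
for (v1, p1), (v2, p2) in product(X1.items(), X2.items()):
    v = a0 + a1 * v1 + a2 * v2
    lin[v] = lin.get(v, 0.0) + p1 * p2

# E(a0 + a1 X1 + a2 X2) = a0 + a1 E(X1) + a2 E(X2)
assert abs(mean(lin) - (a0 + a1 * mean(X1) + a2 * mean(X2))) < 1e-12
# var(a0 + a1 X1 + a2 X2) = a1^2 var(X1) + a2^2 var(X2), by independence
assert abs(var(lin) - (a1 ** 2 * var(X1) + a2 ** 2 * var(X2))) < 1e-12
print("rules verified")
```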
-
Rules for Averages
E(X̄) = E( (1/n) ∑_{i=1}^{n} Xi ) = (1/n) ∑_{i=1}^{n} E(Xi)
      = (1/n) ∑_{i=1}^{n} µi = µ̄,
whether X1, . . . , Xn are independent or not.
If µ1 = . . . = µn = µ then E(X̄) = µ.
If X1, . . . , Xn are independent we also have
var(X̄) = var( (1/n) ∑_{i=1}^{n} Xi ) = (1/n²) ∑_{i=1}^{n} var(Xi)
        = (1/n²) ∑_{i=1}^{n} σi² = σ̄²/n ↘ 0 as n → ∞,
where σ̄² = (1/n) ∑_{i=1}^{n} σi², and σ̄² = σ² when σ1² = . . . = σn² = σ².
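The σ²/n behavior of var(X̄) can likewise be verified exactly by enumerating all n-tuples of i.i.d. draws from a small discrete population; a Python sketch (the three-point population is made up for illustration):

```python
from itertools import product

# a small discrete population: value -> probability
pop = {0: 0.2, 1: 0.5, 3: 0.3}

mu = sum(v * p for v, p in pop.items())
sigma2 = sum((v - mu) ** 2 * p for v, p in pop.items())

n = 3
# enumerate all n-tuples of i.i.d. draws with their probabilities
e_xbar = 0.0
e_xbar2 = 0.0
for tup in product(pop, repeat=n):
    prob = 1.0
    for v in tup:
        prob *= pop[v]
    xbar = sum(tup) / n
    e_xbar += prob * xbar
    e_xbar2 += prob * xbar ** 2

var_xbar = e_xbar2 - e_xbar ** 2
assert abs(e_xbar - mu) < 1e-9         # E(Xbar) = mu
assert abs(var_xbar - sigma2 / n) < 1e-9   # var(Xbar) = sigma^2 / n
print("E(Xbar) =", round(e_xbar, 4), " var(Xbar) =", round(var_xbar, 4))
```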
-
A Normal Random Sample
X1, . . . , Xn is called a normal random sample when the common density of the
Xi is a normal density of the following form:
f(x) = (1 / (√(2π) σ)) exp( −(x − µ)² / (2σ²) )
This density or population has mean µ and standard deviation σ.
When µ = 0 and σ = 1 one calls it the standard normal density
ϕ(x) = (1 / √(2π)) exp( −x²/2 )  with CDF  Φ(x) = ∫_{−∞}^{x} ϕ(z) dz.
If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1).
⇒ P(X ≤ x) = P( (X − µ)/σ ≤ (x − µ)/σ ) = Φ( (x − µ)/σ ).
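The standard normal CDF has no closed form, but it can be evaluated through the error function, which makes the standardization identity P(X ≤ x) = Φ((x − µ)/σ) easy to sketch in Python (function names are my own):

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardizing."""
    return phi_cdf((x - mu) / sigma)

print(phi_cdf(0.0))              # 0.5, by symmetry
print(round(phi_cdf(1.96), 3))   # 0.975, the familiar two-sided 5% cutoff
print(normal_cdf(10.6, 10.6, 1.2))   # 0.5: x at the mean
```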
-
The CLT & the Normal Population Model
The normal population model is motivated by the Central Limit Theorem (CLT).
This comes about because many physical or natural measured phenomena can be
viewed as the addition of several independent source inputs or factors:
Y = X1 + . . . + Xk  or  Y = a0 + a1 X1 + . . . + ak Xk
for constants a0, a1, . . . , ak.
More generally, but also approximately, extend this via a 1-term Taylor expansion
Y = g(X1, . . . , Xk) ≈ g(µ1, . . . , µk)
    + ∑_{i=1}^{k} (Xi − µi) ∂g(µ1, . . . , µk)/∂µi = a0 + a1 X1 + . . . + ak Xk,
provided the linearization provides a good approximation to g.
-
Central Limit Theorem (CLT) I
• Suppose we randomly and independently draw random variables X1, . . . , Xn
  from n possibly different populations with respective means µ1, . . . , µn and
  standard deviations σ1, . . . , σn.
• Suppose further that
  max_i ( σi² / (σ1² + . . . + σn²) ) → 0 as n → ∞,
  i.e., none of the variances dominates among all variances.
• Then Yn = X1 + . . . + Xn has an approximate normal distribution with mean
  and variance given by
  µY = µ1 + . . . + µn  and  σY² = σ1² + . . . + σn².
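A quick Monte Carlo illustration of the CLT's mean and variance statements, using sums of uniforms (a uniform on (0,1) has mean 1/2 and variance 1/12; the seed and sample sizes are arbitrary choices):

```python
import random

rng = random.Random(2006)
n_terms = 12        # Y = X1 + ... + X12, each Xi ~ uniform(0, 1)
n_reps = 20000

sums = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_reps)]

mean_y = sum(sums) / n_reps
var_y = sum((s - mean_y) ** 2 for s in sums) / (n_reps - 1)

# CLT: mu_Y = 12 * 1/2 = 6 and sigma_Y^2 = 12 * 1/12 = 1
print(round(mean_y, 2), round(var_y, 2))   # should be close to 6 and 1
```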
-
Central Limit Theorem (CLT) II
[Figure: the four population densities from which x1, x2, x3, x4 are drawn:
a standard normal population, a uniform population on (0,1), a log-normal
population, and a Weibull population]
-
Central Limit Theorem (CLT) III
[Figure: "Central Limit Theorem at Work": histogram of x1 + x2 + x3 + x4
over the range −2 to 6, already quite normal in shape]
-
Central Limit Theorem (CLT) IV
[Figure: five population densities: a standard normal population, a uniform
population on (0,1), a log-normal population, and two Weibull populations
(x4 and x5)]
-
Central Limit Theorem (CLT) V
[Figure: "Central Limit Theorem at Work": histograms of x1 + x2 + x3 + x4
and of x2 + x3 + x4 + x5 (the latter without the normal component), both
close to normal in shape]
-
Central Limit Theorem (CLT) VI
[Figure: four population densities as before, but with a much more strongly
skewed and dispersed log-normal population (range 0 to 15)]
-
Central Limit Theorem (CLT) VII
[Figure: "Central Limit Theorem at Work (not so good)": histogram of
x1 + x2 + x3 + x4 over 0 to 40, visibly skewed because the log-normal
component dominates the variance]
-
Central Limit Theorem (CLT) VIII
[Figure: four population densities; here the uniform population is spread over
a much wider range (axis 0 to 20) and dominates the variance]
-
Central Limit Theorem (CLT) IX
[Figure: "Central Limit Theorem at Work (not so good)": histogram of
x1 + x2 + x3 + x4 over −20 to 40, not close to normal because one
component's variance dominates]
57
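The R code behind these histograms is not shown on the slides; the following is a minimal Python sketch of the same experiment, where the population parameters (log-normal and Weibull shapes) are my own assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # number of simulated sums

# Populations mirroring the slide panels; parameters are illustrative assumptions
x1 = rng.normal(0.0, 1.0, N)       # standard normal population
x2 = rng.uniform(0.0, 1.0, N)      # uniform population on (0,1)
x3 = rng.lognormal(0.0, 0.5, N)    # a log-normal population
x4 = rng.weibull(2.0, N)           # Weibull population

s = x1 + x2 + x3 + x4              # the sum whose histogram the slides show

# Standardize; by the CLT this should look close to N(0,1)
z = (s - s.mean()) / s.std()
print(np.mean(np.abs(z) < 1))      # roughly the normal value 0.68
```

A histogram of `z` reproduces the "Central Limit Theorem at Work" panels; replacing the mildly skewed populations with a heavily skewed one reproduces the "not so good" panels.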
-
Derived Distributions from the Normal Model
Since the normal model will be our assumed model throughout, it is worthwhile to characterize some distributions that are derived from it. They will play a significant role later on:
the chi-square distribution, the Student t-distribution, and the F-distribution.
These distributions come about as sampling distributions of certain test statistics based on normal random samples.
58
-
Properties of Normal Random Variables
Assume that X1, …, Xn are independent normal random variables with respective means and variances given by µ1, …, µn and σ1², …, σn². Then

    Y = X1 + … + Xn ∼ N(µ1 + … + µn, σ1² + … + σn²)

If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1), or more generally, for constants a and b,

    a + bX ∼ N(a + bµ, b²σ²)

Caution: Some people write X ∼ N(µ, σ) when I would write X ∼ N(µ, σ²).
59
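Both facts can be checked numerically; a quick sketch (the particular parameter values are arbitrary illustrations):

```python
import numpy as np
from scipy import stats

# a + bX ~ N(a + b*mu, b^2 * sigma^2): check via matching quantiles (b > 0)
mu, sigma, a, b = 2.0, 3.0, 1.0, 0.5
q_direct = stats.norm.ppf(0.9, loc=a + b * mu, scale=b * sigma)
q_transf = a + b * stats.norm.ppf(0.9, loc=mu, scale=sigma)
print(q_direct, q_transf)   # the two agree

# Sums: X1 ~ N(1, 4) independent of X2 ~ N(2, 9) gives X1 + X2 ~ N(3, 13)
rng = np.random.default_rng(0)
y = rng.normal(1, 2, 100_000) + rng.normal(2, 3, 100_000)
print(y.mean(), y.var())    # near 3 and 13
```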
-
The Chi-Square Distribution
When Z1, …, Zf i.i.d. ∼ N(0,1) we say that

    Cf = ∑_{i=1}^f Zi²    (memorize this definition!)

has a chi-square distribution with f degrees of freedom; we also write Cf ∼ χ²f.
It has mean f and variance 2f, worth memorizing.
Density, CDF, quantiles, and random samples of or from the chi-square distribution can be obtained in R via dchisq(x,f), pchisq(x,f), qchisq(p,f), and rchisq(N,f), respectively.
If Cf1 ∼ χ²f1 and Cf2 ∼ χ²f2 are independent, then Cf1 + Cf2 ∼ χ²f1+f2. Why? Think definition!
60
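For readers working outside R, scipy offers direct counterparts of these calls; a small sketch (the evaluation points are arbitrary):

```python
from scipy import stats

f = 5
# scipy analogues of the R calls on the slide:
# dchisq(x,f) -> chi2.pdf, pchisq(x,f) -> chi2.cdf,
# qchisq(p,f) -> chi2.ppf, rchisq(N,f) -> chi2.rvs
pdf_val = stats.chi2.pdf(3.0, f)
cdf_val = stats.chi2.cdf(3.0, f)
q95 = stats.chi2.ppf(0.95, f)
print(pdf_val, cdf_val, q95)

# mean f and variance 2f, worth memorizing
print(stats.chi2.mean(f), stats.chi2.var(f))
```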
-
χ² Densities
[Figure: chi-square densities for df = 1, 2, 5, 10, 20]
61
-
The Student t-Distribution
When Z ∼ N(0,1) is independent of Cf ∼ χ²f we say that

    t = Z/√(Cf/f)    (memorize this definition!)

has a Student t-distribution with f degrees of freedom. We also write t ∼ tf.
It has mean 0 (for f > 1) and variance f/(f − 2) if f > 2.
For large f (say f ≥ 30) the t-distribution is approximately standard normal.
Density, CDF, quantiles, and random samples of or from the Student t-distribution can be obtained in R via dt(x,f), pt(x,f), qt(p,f), and rt(N,f), respectively.
62
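The variance formula and the normal approximation for large f are easy to confirm; a short sketch (df values chosen for illustration):

```python
from scipy import stats

# Variance f/(f - 2) for f > 2
f = 10
print(stats.t.var(f))        # 10/8 = 1.25

# For large f the t-distribution is approximately standard normal:
# compare the 0.975 quantiles, qt(0.975, 30) vs qnorm(0.975)
print(stats.t.ppf(0.975, 30), stats.norm.ppf(0.975))
```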
-
Densities of the Student t-Distribution
[Figure: Student t densities for df = 1, 2, 5, 10, 20, 30, ∞]
63
-
The Noncentral Student t-Distribution
When X ∼ N(δ,1) is independent of Cf ∼ χ²f we say that

    t = X/√(Cf/f)    (memorize this definition!)

has a noncentral Student t-distribution with f degrees of freedom and noncentrality parameter ncp = δ. We also write t ∼ tf,δ.
Density and CDF of the noncentral Student t-distribution can be obtained in R via dt(x,f,ncp) and pt(x,f,ncp), respectively.
The corresponding quantile function qnct(p,f,ncp) can be downloaded from my web site for use in R.
Random samples from tf,δ: (rnorm(N)+ncp)/sqrt(rchisq(N,f)/f)
64
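The sampling recipe on the slide translates directly; a sketch checking the simulated samples against the noncentral t CDF (the values of f, ncp, and the check point are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, f, ncp = 200_000, 6, 2.0

# Direct translation of the slide's R recipe:
# (rnorm(N) + ncp) / sqrt(rchisq(N, f) / f)
t_samples = (rng.normal(size=N) + ncp) / np.sqrt(rng.chisquare(f, size=N) / f)

# The empirical CDF should match the noncentral t CDF
x = 2.5
print(np.mean(t_samples <= x), stats.nct.cdf(x, f, ncp))
```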
-
Densities of the Noncentral Student t-Distribution
[Figure: noncentral t densities for df = 6 with ncp = 0, 1, 2, 4]
These densities march to the left for negative ncp.
65
-
The F-Distribution
When Cf1 ∼ χ²f1 and Cf2 ∼ χ²f2 are independent χ² random variables with f1 and f2 degrees of freedom, respectively, we say that

    F = (Cf1/f1)/(Cf2/f2)    (memorize this definition!)

has an F-distribution with f1 and f2 degrees of freedom. We also write F ∼ Ff1,f2.
Density, CDF, quantiles, and random samples of or from the Ff1,f2-distribution can be obtained in R via df(x,f1,f2), pf(x,f1,f2), qf(p,f1,f2), and rf(N,f1,f2), respectively.
If t ∼ tf then t² ∼ F1,f. Why? Because t² = Z²/(Cf/f) = (C1/1)/(Cf/f), with the required independence of C1 and Cf.
Also 1/F ∼ Ff2,f1. Just look at the above definition!
66
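Both identities can be verified through quantiles; a brief sketch (the degrees of freedom are arbitrary choices):

```python
from scipy import stats

f = 8
# t^2 ~ F(1, f): qf(0.95, 1, f) equals qt(0.975, f)^2
print(stats.f.ppf(0.95, 1, f), stats.t.ppf(0.975, f) ** 2)

# 1/F ~ F(f2, f1): quantiles satisfy qf(p, f1, f2) = 1 / qf(1 - p, f2, f1)
p, f1, f2 = 0.9, 3, 12
print(stats.f.ppf(p, f1, f2), 1 / stats.f.ppf(1 - p, f2, f1))
```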
-
F Densities
[Figure: F densities for (df1, df2) = (1,3), (2,5), (5,5), (10,20), (20,20), (50,100)]
67
-
Decomposition of Sum of Squares (SS)
We illustrate here an early example of the SS decomposition. With X̄ = ∑_{i=1}^n Xi/n,

    ∑_{i=1}^n Xi² = ∑_{i=1}^n (Xi − X̄ + X̄)²
                  = ∑_{i=1}^n (Xi − X̄)² + 2 ∑_{i=1}^n (Xi − X̄)X̄ + nX̄²
                  = ∑_{i=1}^n (Xi − X̄)² + nX̄² .

We used the fact that ∑(Xi − X̄) = ∑Xi − nX̄ = ∑Xi − ∑Xi = 0, i.e., the residuals sum to zero.
Such decompositions are a recurring theme in the Analysis of Variance (ANOVA).
68
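The identity is purely algebraic and holds for any data vector; a quick numerical check (the sample is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10)
n, xbar = len(x), x.mean()

# sum Xi^2 = sum (Xi - Xbar)^2 + n * Xbar^2
lhs = np.sum(x ** 2)
rhs = np.sum((x - xbar) ** 2) + n * xbar ** 2
print(lhs, rhs)            # equal up to rounding

print(np.sum(x - xbar))    # residuals sum to zero
```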
-
Distribution of X̄ and ∑(Xi − X̄)²
Assume that (X1, …, Xn) i.i.d. ∼ N(µ, σ²). Then X̄ ∼ N(µ, σ²/n) and

    ∑_{i=1}^n (Xi − X̄)² has the same distribution as σ²Cn−1, where Cn−1 ∼ χ²n−1.

We also express this with the symbol ∼ as

    ∑_{i=1}^n (Xi − X̄)² ∼ σ²Cn−1   or   ∑_{i=1}^n (Xi − X̄)²/σ² ∼ Cn−1 .

Further, ∑_{i=1}^n (Xi − X̄)² and X̄ are statistically independent, in spite of the fact that X̄ appears in both expressions.
69
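A simulation makes both claims concrete; a sketch with arbitrary µ, σ, and n (independence is checked only through its consequence, zero correlation):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 8, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
ss = ((x - xbar[:, None]) ** 2).sum(axis=1)

# ss / sigma^2 should behave like chi-square with n-1 = 7 degrees of freedom
print((ss / sigma ** 2).mean())        # near 7

# Independence implies zero correlation between Xbar and ss
print(np.corrcoef(xbar, ss)[0, 1])     # near 0
```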
-
One-Sample t-Test
Assume that X = (X1, …, Xn) i.i.d. ∼ N(µ, σ²).
We want to test the hypothesis H0: µ = µ0 against the alternatives H1: µ ≠ µ0. σ is left unspecified and is unknown.
X̄ is a good indicator for µ since its mean is µ and its variance is σ²(X̄) = σ²/n.
Thus a reasonable test statistic may be X̄ − µ0 ∼ N(µ − µ0, σ²/n) = N(0, σ²/n) when H0 is true. Unfortunately we do not know σ.
√n(X̄ − µ0)/σ = (X̄ − µ0)/(σ/√n) ∼ N(0,1) suggests replacing the unknown σ by a suitable estimate to get a single reference distribution under H0.
From the previous slide: s² = ∑_{i=1}^n (Xi − X̄)²/(n−1) ∼ σ²Cn−1/(n−1) is independent of X̄. Note E(s²) = σ², i.e., s² is an unbiased estimate of σ².
70
-
One-Sample t-Statistic
Replacing σ by s in the standardization √n(X̄ − µ0)/σ gives the one-sample t-statistic

    t(X) = √n(X̄ − µ0)/s
         = [√n(X̄ − µ0)/σ] / √(s²/σ²)
         = [√n(X̄ − µ0)/σ] / √(Cn−1/(n−1))
         = Z/√(Cn−1/(n−1)) ∼ tn−1

since under H0 we have that Z = √n(X̄ − µ0)/σ ∼ N(0,1) and Cn−1 ∼ χ²n−1 are independent of each other. We thus satisfy the definition of the t-distribution.
Hence we can use t(X) in conjunction with the known reference distribution tn−1 under H0 and reject H0 for large values of |t(X)|.
The 2-sided level α test has critical value tcrit = tn−1,1−α/2 = qt(1−α/2, n−1), and we reject H0 when |t(X)| ≥ tcrit.
The 2-sided p-value for the observed t-statistic tobs(x) is
P(|tn−1| ≥ |tobs(x)|) = 2P(tn−1 ≤ −|tobs(x)|) = 2*pt(−|tobs(x)|, n−1).
71
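The t-statistic and its two-sided p-value can be computed from these formulas directly and checked against a library routine; a sketch on simulated data (the true mean 0.4 is an illustrative choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(0.4, 1.0, size=20)    # illustrative data with true mean 0.4
mu0, n = 0.0, len(x)

# t(X) = sqrt(n) * (Xbar - mu0) / s, with s the sample standard deviation
tobs = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
pval = 2 * stats.t.cdf(-abs(tobs), n - 1)    # 2*pt(-|tobs|, n-1)
print(tobs, pval)

# Agreement with the library routine
res = stats.ttest_1samp(x, mu0)
print(res.statistic, res.pvalue)
```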
-
The t.test in R
R has a function, t.test, that performs 1- and 2-sample t-tests.
See ?t.test for documentation. We focus here on the 1-sample test.

> t.test(rnorm(20)+.4)

        One Sample t-test

data:  rnorm(20) + 0.4
t = 2.2076, df = 19, p-value = 0.03976
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.02248992 0.84390488
sample estimates:
mean of x
0.4331974
72
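A Python analogue of this call, including the 95 percent confidence interval, can be sketched as follows (the `confidence_interval` method assumes scipy ≥ 1.10; the random draw will of course differ from the R run above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=20) + 0.4            # analogue of rnorm(20) + .4

res = stats.ttest_1samp(x, 0.0)
lo, hi = res.confidence_interval(0.95)   # 95% CI, as in the R output (scipy >= 1.10)
print(res.statistic, res.pvalue, lo, hi)
```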
-
Calculation of the Power Function of the Two-Sided t-Test
The power function of this two-sided t-test is given by

    β(µ,σ) = P(|t| ≥ tcrit) = P(t ≤ −tcrit) + P(t ≥ tcrit) = P(t ≤ −tcrit) + 1 − P(t < tcrit)

    t = t(X) = √n(X̄ − µ0)/s
             = [√n(X̄ − µ + (µ − µ0))/σ] / (s/σ)
             = [√n(X̄ − µ)/σ + √n(µ − µ0)/σ] / (s/σ)
             = (Z + δ)/√(Cn−1/(n−1)) ∼ tn−1,δ ,

the noncentral t-distribution with noncentrality parameter δ = √n(µ − µ0)/σ.
Thus the power function depends on µ and σ only through δ and we write

    β(δ) = P(tn−1,δ ≤ −tcrit) + 1 − P(tn−1,δ < tcrit)
         = pt(−tcrit, n−1, δ) + 1 − pt(tcrit, n−1, δ)   ↗ as |δ| ↗

73
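The last line turns into a one-line function; a sketch using scipy's noncentral t in place of R's pt(x, f, ncp):

```python
from scipy import stats

def power_two_sided_t(delta, n, alpha=0.05):
    """beta(delta) = pt(-tcrit, n-1, delta) + 1 - pt(tcrit, n-1, delta)."""
    tcrit = stats.t.ppf(1 - alpha / 2, n - 1)   # qt(1 - alpha/2, n-1)
    return (stats.nct.cdf(-tcrit, n - 1, delta)
            + 1 - stats.nct.cdf(tcrit, n - 1, delta))

# Power increases in |delta|; the slide reads off power ~ .6 at delta ~ 2.5 for n = 10
for d in (0.5, 1.0, 2.5):
    print(d, power_two_sided_t(d, 10))
```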
-
Power Function of the Two-Sided t-Test
[Figure: β(δ) plotted against δ = √n(µ − µ0)/σ for sample size n = 10, at levels α = 0.05 and α = 0.01]
74
-
How to Use the Power Function
From the previous plot we can read off, for the level α = .05 test,

    β(δ) ≈ .6 for δ = ±√n(µ0 − µ)/σ ≈ ±2.5, or |µ0 − µ| ≈ 2.5σ/√n.

The smaller the natural variability σ, the smaller the difference |µ0 − µ| we can detect with probability .6.
Similarly, the larger the sample size n, the smaller the difference |µ0 − µ| we can detect with probability .6; note however the effect of √n.
Both of these conclusions are intuitive because σ(X̄) = σ/√n.
Given a required detection difference |µ − µ0| and with some upper bound knowledge σu ≥ σ, we can plan the appropriate minimum sample size n to achieve the desired power .6: 2.5×σ/|µ − µ0| ≤ 2.5×σu/|µ − µ0| = √n.
For power ≠ .6, replace 2.5 by the appropriate value from the previous plot.
75
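This sample-size recipe is a one-liner; a sketch where the factor 2.5 corresponds to power ≈ .6 at level α = .05 as read off the plot (other powers need a different factor):

```python
import math

def min_sample_size(diff, sigma_u, factor=2.5):
    """Smallest n with sqrt(n) >= factor * sigma_u / |mu - mu0|.

    factor = 2.5 corresponds to power ~ .6 at level alpha = .05,
    read off the power plot on the previous slide.
    """
    return math.ceil((factor * sigma_u / diff) ** 2)

# Detect |mu - mu0| = 1.0 when sigma is known to be at most 2.0
print(min_sample_size(1.0, 2.0))
```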
-
Where is the Flaw in the Previous Argument?
We tacitly assumed that the power curve plot would not change with n.
Both tcrit = qt(1−α/2, n−1) and P(tn−1,δ ≤ ±tcrit) depend on n.
See the next 3 plots.
Thus it does not suffice to consider the n in δ alone.
However, typically the sample size requirements will ask for large values of n.
In that case the power functions stay more or less stable. Compare n = 100 and n = 1000.
Will provide a function that gets us out of this dilemma.
76
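The stabilization for large n is easy to see numerically; a self-contained sketch holding δ fixed while n varies (δ = 2.5 is an illustrative choice):

```python
from scipy import stats

def power_two_sided_t(delta, n, alpha=0.05):
    # beta(delta) = pt(-tcrit, n-1, delta) + 1 - pt(tcrit, n-1, delta)
    tcrit = stats.t.ppf(1 - alpha / 2, n - 1)
    return (stats.nct.cdf(-tcrit, n - 1, delta)
            + 1 - stats.nct.cdf(tcrit, n - 1, delta))

# Fixed delta, varying n: the power shifts noticeably for small n,
# but hardly at all between n = 100 and n = 1000
for n in (5, 10, 100, 1000):
    print(n, power_two_sided_t(2.5, n))
```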