Sampling Distribution of the Mean
Central Limit Theorem
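Given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean will have:
- A mean: $\mu_{\bar{x}} = \mu$
- A variance: $\sigma^2_{\bar{x}} = \sigma^2 / N$
- A standard error (of the mean): $\sigma_{\bar{x}} = \sigma / \sqrt{N}$
As N increases, the shape of the sampling distribution approaches normal (whatever the shape of the population).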
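The Central Limit Theorem claims can be checked with a small simulation. The sketch below is an illustration only: the exponential population (mean 1, variance 1), the sample size of 25, and the function name `simulate_sample_means` are assumptions chosen for the demonstration, not part of the lecture.

```python
import random

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from a skewed (exponential) population
    with mean 1 and variance 1, and return the list of sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def population_variance(values):
    """Variance of a list of values, dividing by the number of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

means = simulate_sample_means(n=25, reps=20000)
mean_of_means = sum(means) / len(means)      # should be close to mu = 1
var_of_means = population_variance(means)    # should be close to sigma^2 / N = 1/25
print(round(mean_of_means, 3), round(var_of_means, 4))
```

Even though the population is strongly skewed, the mean of the sample means should sit near $\mu$ and their variance near $\sigma^2 / N$.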
Testing a Hypothesis: $\mu$ and $\sigma$ Known
Remember:
We could test a hypothesis concerning a population and a single score by
\[ z = \frac{x - \mu}{\sigma} \]
Obtain $p(z)$ from the z table.
We will continue the same logic.
Given: Behavior Problem Scores of 10-year-olds, $\mu = 50$, $\sigma = 10$
Sample ($N = 5$) of 10-year-olds under stress, $\bar{x} = 56$
$H_0: \mu = 50$    $H_1: \mu \neq 50$
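Because we know $\mu$ and $\sigma$, we can use the Central Limit Theorem to obtain the sampling distribution when $H_0$ is true.
The sampling distribution will have
\[ \mu_{\bar{x}} = \mu = 50 \]
\[ \sigma^2_{\bar{x}} = \frac{\sigma^2}{N} = \frac{10^2}{5} = 20 \]
\[ \sigma_{\bar{x}} \text{ (standard error)} = \frac{\sigma}{\sqrt{N}} = \frac{10}{\sqrt{5}} = 4.47 \]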
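We can find areas under the distribution by referring to the Z table.
We need to know $p(\bar{x} \geq 56)$.
Minor change from the z score:
\[ z = \frac{x - \mu}{\sigma} \quad \text{NOW} \quad z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \quad \text{or} \quad z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}} \]
The formula changes because we are dealing with a distribution of means, NOT individual scores.
With our data
\[ z = \frac{56 - 50}{4.47} = \frac{6}{4.47} = 1.34 \]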
From the Z table we find that $p(z \geq 1.34)$ is 0.0901.
Because we want a two-tailed test, we double 0.0901:
2(0.0901) = 0.1802
$0.1802 > 0.05$, so DO NOT REJECT $H_0$
or, equivalently, 0.0901 is $> 0.025$.
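The z test on a sample mean can be sketched in a few lines. This is an illustration under stated assumptions: the function name `z_test_mean` is hypothetical, and the tail area comes from Python's `statistics.NormalDist` rather than a printed Z table, so the probabilities differ slightly from the table values because z is not rounded to 1.34.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu, sigma, n):
    """Return (z, one-tailed p, two-tailed p) for a sample mean
    when the population mean and standard deviation are known."""
    standard_error = sigma / sqrt(n)            # sigma / sqrt(N)
    z = (sample_mean - mu) / standard_error
    p_one = 1.0 - NormalDist().cdf(abs(z))      # area beyond |z| in one tail
    return z, p_one, 2.0 * p_one

# Behavior problem example: mu = 50, sigma = 10, N = 5, sample mean = 56
z, p_one, p_two = z_test_mean(56, 50, 10, 5)
print(round(z, 2), p_two > 0.05)  # two-tailed p above .05 means H0 is not rejected
```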
One-Sample t test
Pop’n $\mu$ = known & $\sigma$ unknown: we must estimate $\sigma^2$ with $S^2$.
Because we use S, we can no longer declare the answer to be a Z; now it is a t.
Why?
Sampling Distribution of t
- $S^2$ is an unbiased estimator of $\sigma^2$
- The problem is the shape of the $S^2$ distribution: it is positively skewed
thus: any single $S^2$ is more likely to UNDERESTIMATE $\sigma^2$ than to overestimate it (especially with small N)
thus: t is likely to be larger than Z ($S^2$ is in the denominator)
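The two claims about the sample variance (unbiased on average, yet more often too small) can be illustrated by simulation. The sketch below assumes a normal population with $\sigma^2 = 1$ and samples of size 5; these values and the name `sample_variances` are chosen only for the demonstration.

```python
import random
import statistics

def sample_variances(n, reps, sigma=1.0, seed=1):
    """Return `reps` sample variances (N - 1 in the denominator), each from
    a sample of size `n` drawn from a normal population with SD `sigma`."""
    rng = random.Random(seed)
    return [statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])
            for _ in range(reps)]

s2_values = sample_variances(n=5, reps=20000)
mean_s2 = sum(s2_values) / len(s2_values)                         # near 1: unbiased
prop_under = sum(v < 1.0 for v in s2_values) / len(s2_values)     # above 0.5: skew
print(round(mean_s2, 2), round(prop_under, 2))
```

The average of the sample variances stays close to the population variance, while clearly more than half of the individual values fall below it.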
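t-statistic
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}} = \frac{\bar{x} - \mu}{\sqrt{\sigma^2 / N}} \]
and substitute $S^2$ for $\sigma^2$
\[ t = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{\bar{x} - \mu}{S / \sqrt{N}} = \frac{\bar{x} - \mu}{\sqrt{S^2 / N}} \]
To treat t as a Z would give us too many significant results.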
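A simulation shows how many "significant" results appear when a t statistic is compared against the Z cutoff. This is a sketch under assumptions: a true null hypothesis, a normal population with mean 50 and standard deviation 10, samples of size 5, and the two-tailed Z cutoff of 1.96; the function names are hypothetical.

```python
import random
from math import sqrt

def t_statistic(sample, mu):
    """t = (sample mean - mu) / sqrt(S^2 / N), with S^2 using N - 1."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu) / sqrt(s2 / n)

def rejection_rate_using_z(n, reps, z_crit=1.96, seed=2):
    """Proportion of samples drawn under a TRUE H0 whose t exceeds the Z cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        if abs(t_statistic(sample, 50.0)) > z_crit:
            hits += 1
    return hits / reps

rate = rejection_rate_using_z(n=5, reps=20000)
print(round(rate, 3))  # noticeably above the nominal 0.05
```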
William Gosset of the Guinness Brewing Company published under the pseudonym "Student".
Student’s t distribution: we switch to the t table when we use $S^2$.
Go to Table
Unlike Z, the t distribution is a function of df; as $N \to \infty$, $t \to z$.
Degrees of Freedom
For one-sample cases, $df = N - 1$
1 df is lost because we used $\bar{x}$ (the sample mean) to calculate $S^2$
$\sum (x - \bar{x}) = 0$, so all x can vary save for 1
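The lost degree of freedom can be seen directly: once the mean is fixed, the deviations must sum to zero, so the last one is determined by the others. The scores below are hypothetical values chosen for illustration.

```python
scores = [4, 8, 6, 10]                       # hypothetical scores
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# The deviations from the sample mean always sum to zero ...
total = sum(deviations)

# ... so the last deviation is fixed once the other N - 1 are known.
last_from_others = -sum(deviations[:-1])
print(total, last_from_others, deviations[-1])
```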
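Example: One-Sample, Unknown $\sigma^2$
Effect of statistics tutorials:
Last 100 years (no tutorials): $\mu = 76.0$
This year (tutorials): $\bar{x} = 79.3$, N = 20, S = 6.4
$H_0: \mu = 76$    $H_1: \mu \neq 76$
\[ t = \frac{\text{sample mean} - \text{pop'n mean}}{\text{standard error}} = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{\bar{x} - \mu}{S / \sqrt{N}} \]
\[ t = \frac{79.3 - 76}{6.4 / \sqrt{20}} = \frac{3.3}{1.43} = 2.31 \]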
Go to t-Table
t-Table
- does not give the area (p) above or below a value of t
- gives t values that cut off critical areas, e.g., 0.05
- t is also defined for each df
N = 20, df = (N - 1) = 20 - 1 = 19
Go to Table
$t_{.05}(19) = 2.093$ (critical value)
$2.31 > 2.093$
reject $H_0$
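The one-sample t test from summary statistics can be sketched as follows. The function name `one_sample_t` is hypothetical, and the critical value 2.093 is simply the tabled value quoted in the example for df = 19, not something the code computes.

```python
from math import sqrt

def one_sample_t(sample_mean, mu, s, n):
    """Return (t, df) for a one-sample t test from summary statistics."""
    standard_error = s / sqrt(n)                 # S / sqrt(N)
    return (sample_mean - mu) / standard_error, n - 1

# Tutorial example: sample mean 79.3, H0 mean 76, S = 6.4, N = 20
t, df = one_sample_t(79.3, 76.0, 6.4, 20)

T_CRIT_05_DF19 = 2.093    # two-tailed .05 critical value taken from the t table
reject_h0 = abs(t) > T_CRIT_05_DF19
print(round(t, 2), df, reject_h0)
```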
Factors Affecting Magnitude of t & Decision
1. Difference between $\bar{x}$ and $\mu$, $(\bar{x} - \mu)$: the larger the numerator, the larger the t value
2. Size of $S^2$: as $S^2$ decreases, t increases
3. Size of N: as N increases, the denominator decreases, t increases
4. $\alpha$ level
5. One- or two-tailed test
Confidence Limits on Mean
Point estimate
Specific value taken as estimator of a parameter
Interval estimates
A range of values estimated to include parameter
Confidence limits
Range of values that has a specific probability (p) of bracketing the parameter. The end points are the confidence limits.
How large or small $\mu$ could be without rejecting $H_0$ if we ran a t-test on the obtained sample mean.
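Confidence Limits (C.I.)
\[ t = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{\bar{x} - \mu}{S / \sqrt{N}} \]
We already know $\bar{x}$, S and N.
We know the critical value for t at $\alpha = .05$:
\[ t_{.05}(19) = \pm 2.093 \]
We solve for $\mu$:
\[ \pm 2.093 = \frac{79.3 - \mu}{6.4 / \sqrt{20}} = \frac{79.3 - \mu}{1.43} \]
Rearranging
\[ \mu = \pm 2.093(1.43) + 79.3 = \pm 2.993 + 79.3 \]
Using +2.993 and -2.993
\[ \mu_{upper} = 2.993 + 79.3 = 82.29 \qquad \mu_{lower} = -2.993 + 79.3 = 76.31 \]
\[ C.I._{.95} = 76.31 \leq \mu \leq 82.29 \]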
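The confidence limits can be computed directly from the rearranged formula. In this sketch the function name `confidence_limits` is hypothetical and the critical value 2.093 is the tabled value for df = 19. Because the standard error is not rounded to 1.43 first, the limits may differ from the hand calculation in the second decimal place.

```python
from math import sqrt

def confidence_limits(sample_mean, s, n, t_crit):
    """Return (lower, upper) limits: sample mean -/+ t_crit * S / sqrt(N)."""
    half_width = t_crit * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Tutorial example: sample mean 79.3, S = 6.4, N = 20, t.05(19) = 2.093
lower, upper = confidence_limits(79.3, 6.4, 20, 2.093)
print(round(lower, 2), round(upper, 2))
```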
Two Related Samples t
Related Samples
Design in which the same subject is observed under more than one condition (repeated measures, matched samples)
Each subject will have 2 measures, $x_1$ and $x_2$, that will be correlated. This must be taken into account.
Promoting social skills in adolescents
Before and after intervention
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$, where $\mu_1$ = before and $\mu_2$ = after
Difference Scores
Set of scores representing the difference between the subject’s performance on two occasions ($x_1$ and $x_2$)

x1 (before)   x2 (after)   D = (x1 - x2)
15             9             6
 8            10            -2
10            11            -1
18            14             4
15            12             3
20            14             6
 1             3            -2
 3             3             0
26            27            -1
17            12             5
12             8             4
13            11             2
19            17             2
 5             4             1
18            12             6
Mean   13.333   11.133   2.200
S       6.914    5.998   2.933

Our data can be the D column: $H_0: \mu_D = 0$ follows from $H_0: \mu_1 - \mu_2 = 0$
We are testing a hypothesis using ONE sample.
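Related Samples t
remember
\[ t = \frac{\bar{x} - \mu}{S_{\bar{x}}} \]
now
\[ t = \frac{\bar{D} - 0}{S_{\bar{D}}} = \frac{\bar{D} - 0}{S_D / \sqrt{N}} \]
N = # of D scores
Degrees of Freedom: same as for one-sample case
df = (N - 1) = (15 - 1) = 14
our data
\[ t = \frac{2.20 - 0}{2.933 / \sqrt{15}} = \frac{2.20}{0.757} = 2.91 \]
Go to table
$t_{.05}(14) = 2.145$, $t = 2.91 > 2.145$
reject $H_0$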
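The related-samples t can be reproduced from the raw scores. The two lists below are the before and after scores as read from the table of difference scores, with the signs of the negative differences inferred from the column values (an assumption about the original table, which reproduces the reported means and standard deviations). The function name `related_samples_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def related_samples_t(x1, x2):
    """Return (mean D, SD of D, t, df) for paired scores, testing H0: mu_D = 0."""
    d = [a - b for a, b in zip(x1, x2)]     # difference scores D = x1 - x2
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                          # sample SD (N - 1 in the denominator)
    t = (d_bar - 0) / (s_d / sqrt(n))
    return d_bar, s_d, t, n - 1

before = [15, 8, 10, 18, 15, 20, 1, 3, 26, 17, 12, 13, 19, 5, 18]
after = [9, 10, 11, 14, 12, 14, 3, 3, 27, 12, 8, 11, 17, 4, 12]

d_bar, s_d, t, df = related_samples_t(before, after)
print(round(d_bar, 3), round(s_d, 3), round(t, 2), df)
```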
Advantages of Related Samples
1. Control of extraneous variables
2. Avoids problems that come with subject-to-subject variability.
The difference between ($x_1$) 26 and ($x_2$) 24 is the same as between ($x_1$) 6 and ($x_2$) 4
(less variance, lower denominator, greater t)
(increases power)
3. Requires fewer subjects
Disadvantages
1. Order effects
2. Carry-over effects
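Two Independent Samples t
$H_0: \mu_1 - \mu_2 = 0$
Sampling distribution of differences between means
Suppose: 2 pop’ns, $x_1$ and $x_2$, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$
draw pairs of samples: sizes $N_1$ and $N_2$
record means $\bar{x}_1$ and $\bar{x}_2$ and the difference between $\bar{x}_1$ and $\bar{x}_2$, $(\bar{x}_1 - \bar{x}_2)$, for each pair of samples
repeat an infinite number of times

                 $\bar{x}_1$               $\bar{x}_2$               Mean Difference $(\bar{x}_1 - \bar{x}_2)$
                 $\bar{x}_{11}$            $\bar{x}_{21}$            $\bar{x}_{11} - \bar{x}_{21}$
                 $\bar{x}_{12}$            $\bar{x}_{22}$            $\bar{x}_{12} - \bar{x}_{22}$
                 ...                       ...                       ...
Mean             $\mu_1$                   $\mu_2$                   $\mu_1 - \mu_2$
Variance         $\sigma_1^2 / N_1$        $\sigma_2^2 / N_2$        $\sigma_1^2 / N_1 + \sigma_2^2 / N_2$
Standard Error   $\sigma_1 / \sqrt{N_1}$   $\sigma_2 / \sqrt{N_2}$   $\sqrt{\sigma_1^2 / N_1 + \sigma_2^2 / N_2}$

Variance Sum Law
Variance of a sum or difference of two INDEPENDENT variables = sum of their variances
\[ \sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} \]
The distribution of the differences is also normal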
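The variance sum law for the difference between two independent sample means can be checked by simulation. The sketch below is illustrative only: the normal populations, the standard deviations 3 and 2, the sample sizes 10 and 20, and the function names are all assumptions made for the demonstration.

```python
import random

def simulate_mean_differences(n1, n2, sigma1, sigma2, reps, seed=3):
    """Draw `reps` pairs of independent samples and return the list of
    differences between the two sample means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean1 = sum(rng.gauss(0.0, sigma1) for _ in range(n1)) / n1
        mean2 = sum(rng.gauss(0.0, sigma2) for _ in range(n2)) / n2
        diffs.append(mean1 - mean2)
    return diffs

def variance_of(values):
    """Variance of a list of values, dividing by the number of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

diffs = simulate_mean_differences(n1=10, n2=20, sigma1=3.0, sigma2=2.0, reps=20000)
observed = variance_of(diffs)
expected = 3.0 ** 2 / 10 + 2.0 ** 2 / 20     # sigma1^2/N1 + sigma2^2/N2
print(round(observed, 2), round(expected, 2))
```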
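We must estimate $\sigma^2$ with $s^2$
t Difference Between Means
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}} \]
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} \]
Because $H_0: \mu_1 - \mu_2 = 0$
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} \quad \text{or} \quad t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}} \]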
When $n_1 \neq n_2$ we need a better estimate of $\sigma^2$
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
is O.K. only when the N’s are the same size
We must assume homogeneity of variance ($\sigma_1^2 = \sigma_2^2$)
Rather than using $s_1^2$ or $s_2^2$ to estimate $\sigma^2$, we use their average.
Because $n_1 \neq n_2$ we need a Weighted Average, weighted by their degrees of freedom ($s_p^2$)
Pooled Variance
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
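Now
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
becomes, with $s_p^2$ in place of $s_1^2$ and $s_2^2$,
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
The $\frac{1}{n_1} + \frac{1}{n_2}$ terms come from the formula for the Standard Error.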
Degrees of Freedom
two means have been used to calculate $s_p^2$
\[ df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2 \]
Example:
Group 1 ($n_1 = 15$): 20 16 15 16 21 20 18 15 16 18 22 18 21 17 17
Group 2 ($n_2 = 20$): 15 17 17 13 15 15 16 14 15 13 14 16 19 18 13 14 13 17 18 13

              Group 1   Group 2
$\bar{x}$     18.00     15.25
$s^2$         5.286     3.671
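Example:
We have the numerator: 18.00 - 15.25
We need the denominator: ?
Pooled Variance because $n_1 \neq n_2$
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{14(5.286) + 19(3.671)}{15 + 20 - 2} = \frac{74.004 + 69.749}{33} = 4.356 \]
Denominator becomes
\[ \sqrt{\frac{4.356}{15} + \frac{4.356}{20}} = \sqrt{4.356 \left( \frac{1}{15} + \frac{1}{20} \right)} \]
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(18.00 - 15.25)}{\sqrt{\frac{4.356}{15} + \frac{4.356}{20}}} = \frac{2.75}{\sqrt{0.5082}} = \frac{2.75}{0.713} = 3.86 \]
df = (15 + 20 - 2) = 33
Go to Table
$t_{.05}(33) = 2.04$, $t = 3.86 > 2.04$
reject $H_0$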
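The pooled-variance t for the two-group example can be reproduced from raw scores. The assignment of the 35 scores to the two groups below is a reconstruction from the example's data listing (an assumption), chosen because it reproduces the reported group means (18.00, 15.25) and variances (5.286, 3.671). The function name `pooled_variance_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance_t(group1, group2):
    """Return (pooled variance, t, df) for two independent samples,
    assuming homogeneity of variance."""
    n1, n2 = len(group1), len(group2)
    # Weighted average of the two sample variances, weighted by their df
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return sp2, t, n1 + n2 - 2

group1 = [20, 16, 15, 16, 21, 20, 18, 15, 16, 18, 22, 18, 21, 17, 17]
group2 = [15, 17, 17, 13, 15, 15, 16, 14, 15, 13,
          14, 16, 19, 18, 13, 14, 13, 17, 18, 13]

sp2, t, df = pooled_variance_t(group1, group2)
print(round(sp2, 3), round(t, 2), df)
```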
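Summary
If $\mu$ and $\sigma$ are known, then treat $\bar{x}$ as $x$ in the Z score formula, with $\sigma_{\bar{x}}$ in place of $\sigma$:
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
If $\mu$ is known and $\sigma$ is unknown, then $s$ replaces $\sigma$ in the Z score formula; $t$ replaces $z$:
\[ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
If two related samples, then $\bar{D}$ replaces $\bar{x}$ and $s_{\bar{D}}$ replaces $s_{\bar{x}}$:
\[ t = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{\bar{D} - 0}{s_D / \sqrt{n}} \]
If two independent samples, and Ns are of equal size, then $\bar{x}$ is replaced by $\bar{x}_1 - \bar{x}_2$ and $s_{\bar{x}}$ is replaced by $s_{\bar{x}_1 - \bar{x}_2}$:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
If two independent samples, and Ns are NOT equal, then $s_1^2$ and $s_2^2$ are replaced by $s_p^2$:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]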