Sampling Distribution of the Mean. Central Limit Theorem: given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean will have a mean, a variance, and a standard error.

Sampling Distribution of the Mean

Central Limit Theorem

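Given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean will have:

A mean: $\mu_{\bar{x}} = \mu$

A variance: $\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{N}$

A standard error (of the mean): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{N}}$

As N increases, the shape of the sampling distribution becomes normal (whatever the shape of the population).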
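The three properties above can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population (with $\mu = 1$, $\sigma^2 = 1$), the sample size of 25, the 20,000 repetitions, and the function name `simulate_sample_means` are all choices made here, not part of the slides.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from a skewed population; return their means."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means = simulate_sample_means(n=25, reps=20000)

mean_of_means = statistics.fmean(means)      # expected to be near mu = 1
var_of_means = statistics.pvariance(means)   # expected to be near sigma^2 / N = 0.04
std_error = var_of_means ** 0.5              # expected to be near sigma / sqrt(N) = 0.2

print(f"mean of sample means:     {mean_of_means:.3f}")
print(f"variance of sample means: {var_of_means:.4f}")
print(f"standard error:           {std_error:.3f}")
```

Even though the assumed population is strongly skewed, a histogram of `means` would look roughly normal at this sample size.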

Testing Hypotheses: $\mu$ and $\sigma$ Known

Remember:

We could test a hypothesis concerning a population and a single score by

$z = \dfrac{x - \mu}{\sigma}$

Obtain $p(z)$ using the z table

We will continue the same logic

Given: Behavior Problem Scores of 10-year-olds: $\mu = 50$, $\sigma = 10$

Sample of 10-year-olds under stress ($N = 5$): $\bar{x} = 56$

$H_0: \mu = 50$

$H_1: \mu \neq 50$

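Because we know $\mu$ and $\sigma$, we can use the Central Limit Theorem to obtain the sampling distribution when $H_0$ is true.

Sampling distribution will have

$\mu_{\bar{x}} = \mu = 50$

$\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{N} = \dfrac{10^2}{5} = 20$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{N}} = 4.47$ (standard error)

We can find areas under the distribution by referring to the Z table

We need to know $p(\bar{x} \geq 56)$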

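Minor change from the z score

$z = \dfrac{x - \mu}{\sigma}$

NOW

$z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ or $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{N}}$

With our data

$z = \dfrac{56 - 50}{4.47} = \dfrac{6}{4.47} = 1.34$

The formula changes because we are dealing with a distribution of means, NOT individual scores.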

From the Z table we find

$p(z \geq 1.34)$ is 0.0901

Because we want a two-tailed test we double 0.0901

$2(0.0901) = 0.1802$

$0.1802 > 0.05$: do NOT REJECT $H_0$

or, equivalently,

$0.0901 > 0.025$
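The z test on the stress sample can be reproduced in a few lines. This is a sketch that stands in for the table lookup by computing the normal tail area with `math.erfc`; the helper name `upper_tail_p` is made up for illustration. Because the code keeps z unrounded, the tail area differs slightly from the table value for z = 1.34.

```python
import math


def upper_tail_p(z):
    """Area under the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


mu, sigma = 50, 10          # known population parameters
n, sample_mean = 5, 56      # sample of 10-year-olds under stress

std_error = sigma / math.sqrt(n)         # about 4.47
z = (sample_mean - mu) / std_error       # about 1.34
p_one_tailed = upper_tail_p(z)           # close to the table value 0.0901
p_two_tailed = 2 * p_one_tailed          # close to 0.1802

reject_h0 = p_two_tailed < 0.05          # two-tailed test at alpha = .05
print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}, reject H0: {reject_h0}")
```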

One-Sample t test

Population $\mu$ is known and $\sigma^2$ is unknown: we must estimate $\sigma^2$ with $S^2$

Because we use S, we can no longer declare the answer to be a Z; now it is a t

Why?

Sampling Distribution of t

- $S^2$ is an unbiased estimator of $\sigma^2$

- The problem is the shape of the $S^2$ distribution: it is positively skewed

thus: $S^2$ is more likely to UNDERESTIMATE $\sigma^2$ (especially with small N)

thus: t is likely to be larger than Z ($S^2$ is in the denominator)
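The claim that $S^2$ is unbiased yet usually too small can be illustrated by simulation. The sketch below assumes a standard normal population ($\sigma^2 = 1$), samples of size 5, and 20,000 repetitions; these settings and the function name `simulate_sample_variances` are illustrative choices, not values from the slides.

```python
import random


def simulate_sample_variances(n, reps, seed=1):
    """Return the unbiased sample variance S^2 for `reps` normal samples of size `n`."""
    rng = random.Random(seed)
    variances = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumed population: sigma^2 = 1
        m = sum(xs) / n
        variances.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    return variances


variances = simulate_sample_variances(n=5, reps=20000)

mean_s2 = sum(variances) / len(variances)                       # near 1: unbiased on average
share_below = sum(v < 1.0 for v in variances) / len(variances)  # above 0.5: positive skew

print(f"average S^2:               {mean_s2:.3f}")
print(f"share of samples S^2 < 1:  {share_below:.3f}")
```

The average of the simulated variances sits near 1, but well over half of the individual samples fall below 1, which is the sense in which $S^2$ tends to underestimate $\sigma^2$ for small N.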

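t-statistic

$z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{N}} = \dfrac{\bar{x} - \mu}{\sqrt{\sigma^2 / N}}$

and substitute $S^2$ for $\sigma^2$

$t = \dfrac{\bar{x} - \mu}{S_{\bar{x}}} = \dfrac{\bar{x} - \mu}{S / \sqrt{N}} = \dfrac{\bar{x} - \mu}{\sqrt{S^2 / N}}$

To treat t as a Z would give us too many significant results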

Guinness Brewing Company ("Student", the pen name of W. S. Gosset)

Student's t distribution: we switch to the t table when we use $S^2$

Go to Table

Unlike Z, the t distribution is a function of df; as $N \to \infty$, $t \to z$

Degrees of Freedom

For one-sample cases, $df = N - 1$

1 df is lost because we used $\bar{x}$ (the sample mean) to calculate $S^2$

$\sum (x - \bar{x}) = 0$, so all x can vary save for 1

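Example: One-Sample, Unknown $\sigma^2$

Effect of statistics tutorials:

Last 100 years: $\mu = 76.0$ (no tutorials)

This year: $\bar{x} = 79.3$ (tutorials), N = 20, S = 6.4

$H_0: \mu = 76$

$H_1: \mu \neq 76$

$t = \dfrac{\text{sample mean} - \text{population mean}}{\text{standard error}} = \dfrac{\bar{x} - \mu}{S_{\bar{x}}} = \dfrac{\bar{x} - \mu}{S / \sqrt{N}}$

$t = \dfrac{79.3 - 76}{6.4 / \sqrt{20}} = \dfrac{3.3}{1.43} = 2.31$

Go to t-Table

t-Table

- does not give the area (p) above or below a value of t

- gives t values that cut off critical areas, e.g., 0.05

- t is also defined for each df

N = 20, df = (N - 1) = 20 - 1 = 19

Go to Table

$t_{.05}(19) = 2.093$ is the critical value

$2.31 > 2.093$

reject $H_0$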
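The tutorial example can be recomputed as follows. This is a minimal sketch: the function name `one_sample_t` is made up here, and the critical value 2.093 is copied from the t table quoted in the slides rather than computed.

```python
import math


def one_sample_t(sample_mean, pop_mean, s, n):
    """Return (t, df) for a one-sample t test from summary statistics."""
    std_error = s / math.sqrt(n)             # S / sqrt(N)
    t = (sample_mean - pop_mean) / std_error
    return t, n - 1


t, df = one_sample_t(sample_mean=79.3, pop_mean=76.0, s=6.4, n=20)

T_CRIT_05_DF19 = 2.093   # two-tailed critical value for df = 19, taken from the slides
reject_h0 = abs(t) > T_CRIT_05_DF19

print(f"t({df}) = {t:.2f}, reject H0: {reject_h0}")
```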

Factors Affecting Magnitude of t & Decision

1. Difference between $\bar{x}$ and $\mu$: the larger the numerator $(\bar{x} - \mu)$, the larger the t value

2. Size of $S^2$: as $S^2$ decreases, t increases

3. Size of N: as N increases, the denominator decreases, t increases

4. $\alpha$ level

5. One- or two-tailed test

Confidence Limits on Mean

Point estimate

Specific value taken as estimator of a parameter

Interval estimates

A range of values estimated to include parameter

Confidence limits

Range of values that has a specific probability (p) of bracketing the parameter. End points = confidence limits.

How large or small $\mu$ could be without rejecting $H_0$ if we ran a t-test on the obtained sample mean.

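Confidence Limits (C.I.)

$t = \dfrac{\bar{x} - \mu}{S_{\bar{x}}} = \dfrac{\bar{x} - \mu}{S / \sqrt{N}}$

We already know $\bar{x}$, S and N

We know the critical value for t at $\alpha = .05$

$t_{.05}(19) = \pm 2.093$

We solve for $\mu$

$\pm 2.093 = \dfrac{79.3 - \mu}{6.4 / \sqrt{20}} = \dfrac{79.3 - \mu}{1.43}$

Rearranging

$\mu = \pm 2.093(1.43) + 79.3 = \pm 2.993 + 79.3$

Using +2.993 and -2.993

$\mu_{upper} = 2.993 + 79.3 = 82.29$

$\mu_{lower} = -2.993 + 79.3 = 76.31$

$CI_{.95} = 76.31 \leq \mu \leq 82.29$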
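The same limits can be computed directly from the sample statistics. A small sketch follows, assuming the critical value 2.093 from the slides' t table; the function name `confidence_limits` is illustrative. The code keeps the standard error unrounded, so the limits can differ from the hand calculation in the second decimal place.

```python
import math


def confidence_limits(sample_mean, s, n, t_crit):
    """Return (lower, upper) limits: sample_mean -/+ t_crit * S / sqrt(N)."""
    margin = t_crit * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# t_crit is the two-tailed .05 critical value for df = 19, taken from the slides.
lower, upper = confidence_limits(sample_mean=79.3, s=6.4, n=20, t_crit=2.093)

print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

The hypothesized mean of 76 falls just outside these limits, which agrees with the earlier decision to reject $H_0$.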

Two Related Samples t

Related Samples

Design in which the same subject is observed under more than one condition (repeated measures, matched samples)

Each subject will have 2 measures, $x_1$ and $x_2$, that will be correlated. This must be taken into account.

Promoting social skills in adolescents

Before and after intervention

$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ (1 = before, 2 = after)

Difference Scores

Set of scores representing the difference between the subject's performance on two occasions ($x_1$ and $x_2$)

          x1        x2        Difference (D) = x1 - x2

          15         9         6
           8        10        -2
          10        11        -1
          18        14         4
          15        12         3
          20        14         6
           1         3        -2
           3         3         0
          26        27        -1
          17        12         5
          12         8         4
          13        11         2
          19        17         2
           5         4         1
          18        12         6

Mean      13.333    11.133     2.200
S          6.914     5.998     2.933

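Our data can be the D column

$H_0: \mu_D = 0$, from $H_0: \mu_1 - \mu_2 = 0$

We are testing a hypothesis using ONE sample

Related Samples t

Remember

$t = \dfrac{\bar{x} - \mu}{S_{\bar{x}}}$

Now

$t = \dfrac{\bar{D} - 0}{S_{\bar{D}}} = \dfrac{\bar{D} - 0}{S_D / \sqrt{N}}$

N = # of D scores

Degrees of Freedom: same as for the one-sample case

df = (N - 1) = (15 - 1) = 14

Our data

$t = \dfrac{2.20 - 0}{2.933 / \sqrt{15}} = \dfrac{2.20}{0.757} = 2.91$

Go to table

$t_{.05}(14) = 2.145$

$t = 2.91 > 2.145$

reject $H_0$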
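The related-samples t can be recomputed from the raw scores. The sketch below enters the fifteen score pairs from the example, read so that the column means match the reported 13.333 and 11.133; the variable names are illustrative, and the critical value 2.145 is copied from the slides' t table.

```python
import math
import statistics

# Score pairs from the social-skills example (x1 and x2 for each of 15 subjects).
x1 = [15, 8, 10, 18, 15, 20, 1, 3, 26, 17, 12, 13, 19, 5, 18]
x2 = [9, 10, 11, 14, 12, 14, 3, 3, 27, 12, 8, 11, 17, 4, 12]

# Reduce the two related samples to ONE sample of difference scores.
d = [a - b for a, b in zip(x1, x2)]

n = len(d)
d_mean = statistics.mean(d)            # about 2.20
d_sd = statistics.stdev(d)             # about 2.933 (unbiased, divides by n - 1)
std_error = d_sd / math.sqrt(n)        # about 0.757
t = (d_mean - 0) / std_error           # about 2.91
df = n - 1

T_CRIT_05_DF14 = 2.145   # two-tailed critical value for df = 14, taken from the slides
reject_h0 = abs(t) > T_CRIT_05_DF14

print(f"mean D = {d_mean:.3f}, S_D = {d_sd:.3f}, t({df}) = {t:.2f}, reject H0: {reject_h0}")
```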

Advantages of Related Samples

1. Avoids problems that come with subject-to-subject variability.

The difference between ($x_1$) 26 and ($x_2$) 24 is the same as between ($x_1$) 6 and ($x_2$) 4

(less variance, lower denominator, greater t)

(increases power)

2. Control of extraneous variables

3. Requires fewer subjects

Disadvantages

1. Order effects

2. Carry-over effects

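Two Independent Samples t

$H_0: \mu_1 - \mu_2 = 0$

Sampling distribution of differences between means

Suppose: 2 populations, $x_1$ and $x_2$, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$

Draw pairs of samples of sizes $N_1$ and $N_2$

Record the means $\bar{x}_1$ and $\bar{x}_2$ and the difference between them, $(\bar{x}_1 - \bar{x}_2)$, for each pair of samples

Repeat $\infty$ times

Each pair of samples gives one row: $\bar{x}_{11}$, $\bar{x}_{21}$, $\bar{x}_{11} - \bar{x}_{21}$; then $\bar{x}_{12}$, $\bar{x}_{22}$, $\bar{x}_{12} - \bar{x}_{22}$; and so on.

Distribution of $\bar{x}_1$: mean $\mu_1$, variance $\dfrac{\sigma_1^2}{N_1}$, standard error $\dfrac{\sigma_1}{\sqrt{N_1}}$

Distribution of $\bar{x}_2$: mean $\mu_2$, variance $\dfrac{\sigma_2^2}{N_2}$, standard error $\dfrac{\sigma_2}{\sqrt{N_2}}$

Distribution of $\bar{x}_1 - \bar{x}_2$ (mean difference): mean $\mu_1 - \mu_2$, variance $\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}$, standard error $\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}$

Variance Sum Law

Variance of a sum or difference of two INDEPENDENT variables = sum of their variances

The distribution of the differences is also normal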

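t Difference Between Means

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$

We must estimate $\sigma^2$ with $s^2$

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$

Because $H_0: \mu_1 - \mu_2 = 0$

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$ or $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ is O.K. only when the N's are the same size

When $n_1 \neq n_2$ we need a better estimate of $\sigma^2$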

We must assume homogeneity of variance ($\sigma_1^2 = \sigma_2^2$)

Rather than using $s_1^2$ or $s_2^2$ to estimate $\sigma^2$, we use their average.

Because $n_1 \neq n_2$ we need a Weighted Average, weighted by their degrees of freedom ($s_p^2$)

Pooled Variance

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

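Now

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$

The terms $\dfrac{1}{n_1}$ and $\dfrac{1}{n_2}$ come from the formula for the Standard Error

Degrees of Freedom

Two means have been used to calculate $s_p^2$

$df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$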

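Example:

Group 1 ($n_1 = 15$): 20, 16, 15, 16, 21, 20, 18, 15, 16, 18, 22, 18, 21, 17, 17

Group 2 ($n_2 = 20$): 15, 17, 17, 13, 15, 15, 16, 14, 15, 13, 14, 16, 19, 18, 13, 14, 13, 17, 18, 13

          Group 1    Group 2
Mean      18.00      15.25
$s^2$      5.286      3.671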

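Example:

We have the numerator: $18.00 - 15.25$

We need the denominator

Pooled Variance because $n_1 \neq n_2$

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{14(5.286) + 19(3.671)}{15 + 20 - 2} = \dfrac{74.004 + 69.749}{33} = 4.356$

Denominator becomes

$\sqrt{\dfrac{4.356}{15} + \dfrac{4.356}{20}} = \sqrt{4.356 \left( \dfrac{1}{15} + \dfrac{1}{20} \right)}$

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \dfrac{18.00 - 15.25}{\sqrt{\dfrac{4.356}{15} + \dfrac{4.356}{20}}} = \dfrac{2.75}{\sqrt{0.5082}} = \dfrac{2.75}{0.713} = 3.86$

$df = (15 + 20 - 2) = 33$

Go to Table

$t_{.05}(33) = 2.04$

$t = 3.86 > 2.04$

reject $H_0$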
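The pooled-variance calculation can be checked from the summary statistics alone. This is a sketch: the function name `pooled_t` is made up for illustration, and the critical value 2.04 is copied from the slides' t table rather than computed.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Independent-samples t with pooled variance; returns (t, df, pooled variance)."""
    df = n1 + n2 - 2
    # Weighted average of the two variances, weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df, sp2


t, df, sp2 = pooled_t(mean1=18.00, var1=5.286, n1=15,
                      mean2=15.25, var2=3.671, n2=20)

T_CRIT_05_DF33 = 2.04    # two-tailed critical value for df = 33, taken from the slides
reject_h0 = abs(t) > T_CRIT_05_DF33

print(f"pooled variance = {sp2:.3f}, t({df}) = {t:.2f}, reject H0: {reject_h0}")
```

When the two sample sizes are equal, the pooled variance is the simple average of the two variances, so this function gives the same t as the unpooled formula in that case.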

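Summary

If $\mu$ and $\sigma^2$ are known, then treat $\bar{x}$ as $x$ in the Z score formula:

$z = \dfrac{\bar{x} - \mu}{\sqrt{\sigma^2 / n}}$

If $\mu$ is known and $\sigma$ is unknown, then $s$ replaces $\sigma$ in the Z score formula; $t$ replaces $z$:

$t = \dfrac{\bar{x} - \mu}{s_{\bar{x}}} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

If two related samples, then $\bar{D}$ replaces $\bar{x}$ and $s_D$ replaces $s$ in $s_{\bar{x}}$:

$t = \dfrac{\bar{D} - 0}{s_{\bar{D}}} = \dfrac{\bar{D} - 0}{s_D / \sqrt{n}}$

If two independent samples, and Ns are of equal size, then $s_{\bar{D}}$ is replaced by $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

If two independent samples, and Ns are NOT equal, then $s_1^2$ and $s_2^2$ are replaced by $s_p^2$:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$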
