gchang.people.ysu.edu/class/s3717/L3717_7_SamplingDistCI...
Sampling Distribution & Confidence Interval
CI - 1
1
A Normal Distribution
Example: The distribution of serum cholesterol levels for 40- to 70-year-old males living in community A has a mean of 211 mg/100 ml and a standard deviation of 46 mg/100 ml. If an individual is selected at random from this population, what is the probability that his serum cholesterol level is higher than 225?
2
$P(X > 225) = ?$
$X \sim N(\mu = 211, \sigma = 46)$
$z = \frac{225 - 211}{46} = 0.30$
$P(X > 225) = P(Z > 0.30) = .382$
Statistical Inference
• Estimation
• Testing Hypothesis
3 4
Statistics Used to Estimate Population Parameters
Parameters:
$\mu$ population mean
$\sigma^2$ population variance
$p$ population proportion
Statistics (Estimators):
Sample Mean, $\bar{x}$
Sample Variance, $s^2$
Sample Proportion, $\hat{p}$
…
5
Sampling Distribution
A sampling distribution is the probability distribution of a sample statistic.
What is the sampling distribution of the mean?
• Shape: Normal
• Parameters: Mean, Standard Deviation
In many situations, once the shape of a distribution is specified, its mean and standard deviation completely determine the distribution.
6
Sampling Distribution of Mean (Parameters)
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
7
Sampling Distribution
Population Distribution of $x$: $\mu = 8$, $\sigma = 2$
Sampling Distribution of Mean $\bar{x}$ (Sample size n = 25): $\mu_{\bar{x}} = 8$, $\sigma_{\bar{x}} = \frac{2}{\sqrt{25}} = 0.4$
8
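Standard Error of Mean
1. Formula: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
2. Standard Deviation of the Sampling Distribution of the Sample Mean, $\bar{X}$
3. Less Than Pop. Standard Deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} < \sigma$ (for n > 1)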
9
Sampling Distribution of Mean (Distribution shape)
Normal distribution theorem: If a random sample is taken from a normally distributed population, then the sampling distribution of the mean is normal.
Central Limit Theorem: When a relatively large random sample is taken from any population, regardless of the distribution of the population, the sampling distribution of the mean is approximately normal.
10
Central Limit Theorem
As sample size gets large enough ($n \ge 30$), the sampling distribution of $\bar{X}$ becomes almost normal, with
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
11
A Random Sample from Population
Population mean = 19.9, standard deviation = 12.6
[Histogram: Random Sample of Size 400 from Population. Mean = 20.7, Std. Dev = 12.92, N = 400]
12
Simulated Sampling Distribution of Means
[Histograms: simulated sampling distributions of the mean for six sample sizes, N = 400 sample means per panel]
n = 2: Mean = 20.3, Std. Dev = 8.88
n = 4: Mean = 19.4, Std. Dev = 5.40
n = 10: Mean = 19.9, Std. Dev = 4.32
n = 25: Mean = 19.84, Std. Dev = 2.23
n = 50: Mean = 19.75, Std. Dev = 1.64
n = 100: Mean = 19.81, Std. Dev = 1.20
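A simulation like the one summarized on this slide can be sketched in a few lines of Python. This is only an illustration: the population used for the slides is not given, so the code assumes a hypothetical right-skewed population (exponential with mean about 20), and the function name `simulate_sample_means` is made up for this sketch.

```python
import random
import statistics

def simulate_sample_means(population, n, reps=400, seed=0):
    """Draw `reps` random samples of size n (with replacement); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical right-skewed population (assumption: exponential, mean about 20).
_rng = random.Random(1)
population = [_rng.expovariate(1 / 20) for _ in range(10_000)]

for n in (2, 4, 10, 25, 50, 100):
    means = simulate_sample_means(population, n)
    print(f"n={n:>3}  mean={statistics.fmean(means):6.2f}  "
          f"sd={statistics.stdev(means):5.2f}")
```

As n grows, the mean of the sample means should stay near the population mean while their standard deviation shrinks roughly like $\sigma/\sqrt{n}$.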
13
Probability Related to Mean
Example: The distribution of serum cholesterol levels for 40- to 70-year-old males living in community A has a mean of 211 mg/100 ml and a standard deviation of 46 mg/100 ml. If a random sample of 100 individuals is taken from this population, what is the probability that the average serum cholesterol level of these 100 individuals is higher than 225?
$\bar{X} \sim N\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\right)$
14
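$P(\bar{X} > 225) = ?$ Cholesterol level has a mean of 211 and a s.d. of 46.
The sampling distribution of the mean is approximately normal (Central Limit Theorem, n = 100).
Parameters of the sampling distribution of the mean:
$\mu_{\bar{x}} = \mu = 211$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{46}{\sqrt{100}} = 4.6$
$\bar{X} \sim N(\mu_{\bar{x}} = 211, \sigma_{\bar{x}} = 4.6)$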
15
$P(\bar{X} > 225) = ?$
Cholesterol level has a mean of 211 and a s.d. of 46; n = 100.
$\bar{X} \sim N(\mu_{\bar{x}} = 211, \sigma_{\bar{x}} = 4.6)$
$z = \frac{225 - 211}{4.6} = 3.04$
$P(\bar{X} > 225) = P(Z > 3.04) = 0.001$
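The two cholesterol calculations (one individual versus the mean of 100 individuals) can be compared side by side. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are chosen for this illustration only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 211, 46

# One individual: X ~ N(211, 46)
p_single = 1 - NormalDist(mu, sigma).cdf(225)

# Mean of n = 100 individuals: X-bar ~ N(211, 46 / sqrt(100))
n = 100
se = sigma / sqrt(n)                      # standard error of the mean
p_mean = 1 - NormalDist(mu, se).cdf(225)

print(round(se, 1), round(p_single, 3), round(p_mean, 4))
```

The first probability comes out near 0.38 (the slide's .382 uses z rounded to 0.30) and the second near 0.001, which shows how much less variable a sample mean is than a single observation.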
16
Introduction to Estimation
Confidence Intervals
&
Sample Size
HT - 17
Sampling Distribution of Sample Proportion
Parameter: Population Proportion $p$ (or $\pi$)
(e.g., percentage of people who have no health insurance)
Statistic: Sample Proportion $\hat{p} = \frac{x}{n}$
x is the number of successes
n is the sample size
Data: 1, 0, 1, 0, 0
$\hat{p} = \frac{2}{5} = .4$
$\bar{x} = \frac{1 + 0 + 1 + 0 + 0}{5} = .4 = \hat{p}$
HT - 18
Sampling Distribution of Sample Proportion, $\hat{p}$
For a large random sample of size n from a population with proportion of successes $p$, and therefore proportion of failures $1 - p$, the sampling distribution of the sample proportion, $\hat{p} = x/n$, where x is the number of successes in the sample, is approximately normal with
a mean $= p$
and standard deviation $= \sqrt{\frac{p(1 - p)}{n}}$
under the following two conditions.
19
Sample Size Condition: The sampling distribution of the sample proportion is approximately normal under the assumptions that
• np > 10 and n(1 − p) > 10, i.e., at least 10 successes and 10 failures are expected in the sample [Central Limit Theorem], and
• p is not too close to 0 or 1.
20
Population Size Condition: The standard deviation of the sampling distribution is $\sqrt{\frac{p(1 - p)}{n}}$ when either
• the population is infinitely large, or
• the sample is from a finite population and the size of the sample is no more than 10% of this population.
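The two conditions can be turned into a small check before the normal approximation is used. This is a minimal sketch: the function name `proportion_sampling_sd` and the population size of 50,000 are hypothetical, and the thresholds are the ones stated on the slides.

```python
from math import sqrt

def proportion_sampling_sd(p, n, population_size=None):
    """Return sqrt(p(1-p)/n) after checking the two slide conditions."""
    # Sample size condition: np > 10 and n(1 - p) > 10
    if not (n * p > 10 and n * (1 - p) > 10):
        raise ValueError("sample size condition fails: need np and n(1-p) > 10")
    # Population size condition: sample is at most 10% of a finite population
    if population_size is not None and n > 0.10 * population_size:
        raise ValueError("population size condition fails: n exceeds 10% of population")
    return sqrt(p * (1 - p) / n)

sd = proportion_sampling_sd(p=0.32, n=400, population_size=50_000)
print(round(sd, 4))
```

Passing `population_size=None` treats the population as effectively infinite, which matches the first bullet of the population size condition.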
21
Disadvantage of Point Estimation
1. Provides a Single Value Based on Observations from 1 Sample.
* Sample Proportion $\hat{p}$ = .32 or 32% is a Point Estimate of the Unknown Population Proportion.
2. Gives No Information about How Close the Value Is to the Unknown Population Parameter
Which of the following statistics do you prefer?
a. 32%
b. 32% with a margin of error of 3%
22
Estimation
You’re interested in finding the percentage of people in favor of candidate A.
How can we estimate this percentage with a measure of reliability?
32% ± 30%, 32% ± 15%, 32% ± 3%
23
Interval Estimation for Proportion
Margin of Error Gives Information about How Close the Estimated Value Is to the Unknown Population Proportion.
24
Sampling Error
Sample statistic (point estimate): $\hat{p}$
Population parameter: $p$
Sampling Error = $|p - \hat{p}|$
25
Key Elements of Interval Estimation
[Figure: a confidence interval running from the lower confidence limit to the upper confidence limit, with the sample statistic (point estimate) $\hat{p}$ at its center]
Confidence Level: A probability that the population parameter falls somewhere within the interval.
Example: 32% ± 3%, where 3% is the Margin of Error.
26
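$z_{\alpha/2}$ Notation
[Figure: standard normal curve centered at 0; central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, area $\alpha/2$ in each tail]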
27
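$z_{\alpha/2}$ Notation
[Figure: standard normal curve centered at 0; central area .95 between $-z_{.025}$ and $z_{.025}$, area .025 in each tail]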
28
A Special Notation
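Upper-tail areas of the standard normal distribution:
Z     .05    .06    .07
1.8   .032   .031   .031
1.9   .026   .025   .024
2.0   .020   .020   .019
2.1   .016   .015   .015
$z_\alpha$ = the z score such that the proportion of the standard normal distribution to the right of it is $\alpha$.
$z_{.025} = ?$
Row 1.9, column .06 gives area .025, so $z_{.025} = 1.96$.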
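Instead of reading the table, the same critical values can be computed from the inverse of the standard normal CDF. A minimal sketch using the Python standard library follows; the helper name `z_upper` is an assumption made for this example.

```python
from statistics import NormalDist

def z_upper(alpha):
    """z_alpha: the z score with area alpha to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%} confidence: z_(alpha/2) = {z_upper(alpha / 2):.3f}")
```

The 95% value agrees with the table lookup of 1.96; the 90% and 99% values are the ones used later for other confidence levels.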
29 30
Sampling Distribution of Proportion
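Within how many standard deviations of the population proportion, $p$, does 95% of the sampling distribution of $\hat{p}$ lie?
[Figure: sampling distribution of $\hat{p}$ centered at $p$ with standard deviation $\sigma_{\hat{p}}$; central area .95 between $p - ?\,\sigma_{\hat{p}}$ and $p + ?\,\sigma_{\hat{p}}$, area .025 in each tail]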
31
The Confidence Interval
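95% of Sample Proportions lie between $p - 1.96\sigma_{\hat{p}}$ and $p + 1.96\sigma_{\hat{p}}$
Confidence Level: $1 - \alpha = .95$, $\alpha/2 = .025$, $z_{.025} = 1.96$
Confidence Interval => $\hat{p} - 1.96\sigma_{\hat{p}}$ to $\hat{p} + 1.96\sigma_{\hat{p}}$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$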
32
Confidence Interval
Proportion
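1. Assumptions
Normal Approximation Can Be Used If np and n(1 − p) are both greater than 10.
2. Confidence Interval Estimate (for large sample)
$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$
or
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$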
33
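The Confidence Interval
[Figure: sampling distribution of $\hat{p}$ centered at $p$ with standard deviation $\sigma_{\hat{p}}$; 95% of samples fall in the central region, 2.5% in each tail]
95% of intervals contain $p$.
5% do not.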
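The statement that about 95% of intervals contain $p$ can be checked by simulation. The sketch below assumes a hypothetical true proportion p = 0.32 and sample size n = 400, builds the interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1 - \hat{p})/n}$ for each simulated sample, and counts how often it covers p.

```python
import random
from math import sqrt

def coverage(p=0.32, n=400, reps=2000, z=1.96, seed=0):
    """Fraction of simulated intervals p_hat +/- z*SE that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))   # successes in one sample
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        hits += (p_hat - margin <= p <= p_hat + margin)
    return hits / reps

rate = coverage()
print(rate)
```

The printed rate should land close to 0.95, though not exactly on it, since both the simulation and the normal approximation introduce some error.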
34
Factors Affecting Interval Width
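• Data Dispersion (Affects standard error)
• Sample Size (Affects standard error)
• Level of Confidence, $1 - \alpha$ (Affects $z_{\alpha/2}$)
Standard Error: $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$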
35
Size of Interval
[Figure: sampling distribution of $\hat{p}$ with nested intervals about its center: 90% of samples within $\pm 1.65\sigma_{\hat{p}}$, 95% within $\pm 1.96\sigma_{\hat{p}}$, 99% within $\pm 2.58\sigma_{\hat{p}}$]
36
Estimation Example Proportion
A random sample of 400 from a large community showed that 32 have diabetes. Set up a 95% confidence interval estimate for p, the percentage of people that have diabetes.
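$n = 400$, $\hat{p} = \frac{32}{400} = .08$, $z_{\alpha/2} = z_{.025} = 1.96$
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = .08 \pm 1.96\sqrt{\frac{.08(1 - .08)}{400}} = .08 \pm .027$
$8\% \pm 2.7\%$, i.e., (5.3%, 10.7%)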
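The large-sample interval for a proportion is short enough to wrap in a helper. This is a sketch under the slide's normal-approximation assumptions; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Large-sample CI for p: p_hat +/- z_(alpha/2) * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(32, 400)   # diabetes example: about (0.053, 0.107)
print(f"({low:.3f}, {high:.3f})")
```

Calling `proportion_ci(x, n, level=0.90)` switches to a 90% interval without looking up a new z value.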
37
Thinking Challenge
A member of a health department wishes to see what percentage of people in a community will support an environmental policy. Of 200 survey forms sent and received, 35 responded that they support the policy and the rest do not support the policy.
Find a 90% confidence interval estimate of the percentage of the population in this community that supports the policy.
38
Confidence Interval Solution*
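$n = 200$, $\hat{p} = \frac{35}{200} = .175$, $z_{\alpha/2} = 1.645$
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = .175 \pm 1.645\sqrt{\frac{.175(.825)}{200}} = .175 \pm .0442$
$17.5\% \pm 4.42\%$, i.e., (13.08%, 21.92%)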
39
Example:
Researchers wish to estimate the percentage of hospital employees infected by SARS in a certain country. Out of 500 randomly chosen hospital employees, 14 were infected. Find the 95% confidence interval estimate for the percentage of hospital employees infected by SARS in this country.
40
Sample Size
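C.I.: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Margin of Error: $B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$n = \frac{z_{\alpha/2}^2}{B^2}\,\hat{p}(1 - \hat{p})$ if pilot study is done,
or
$n = \frac{z_{\alpha/2}^2}{B^2} \times 0.25$ to get the largest sample to achieve the goal.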
41
Sample Size (No prior information on p)
Sample Size Example: If one wishes to do a survey to estimate the population proportion with 95% confidence and a margin of error of 3%, how large a sample is needed?
$z_{\alpha/2} = 1.96$; $B = .03$
$n = \frac{1.96^2}{.03^2} \times .25 = 1067.11$
A sample of size 1068 is needed.
42
Sample Size (With prior information on p)
Sample Size Example: If one wishes to estimate the percentage of people infected with West Nile in a population with 95% confidence and a margin of error of 3%, how large a sample is needed? (A pilot study has been done, and the sample proportion was 6%.)
$z_{\alpha/2} = 1.96$; $B = .03$
$n = \frac{1.96^2}{.03^2} \times .06 \times (1 - .06) = 240.7$
A sample of size 241 is needed.
How large a sample was used for pilot study?
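Both sample-size cases (with and without a pilot estimate of p) follow the same formula, so one helper can cover them. This is a minimal sketch; the function name `sample_size_proportion` and the parameter name `p_guess` are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(margin, level=0.95, p_guess=None):
    """n = z^2 / B^2 * p(1 - p), rounded up; uses p(1 - p) = 0.25 if no pilot value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return ceil(z**2 / margin**2 * pq)

print(sample_size_proportion(0.03))                 # no prior information on p
print(sample_size_proportion(0.03, p_guess=0.06))   # pilot proportion of 6%
```

The result is always rounded up, since rounding down would give a margin of error slightly larger than requested.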
43
Interval Estimation for Mean
Margin of Error Gives Information about How Close the Estimated Value Is to the Unknown Population Mean.
44
Sampling Error in Estimating Mean
Sample statistic (point estimate): $\bar{x}$
Population parameter: $\mu$
Sampling Error = $|\mu - \bar{x}|$
45
Key Elements of Interval Estimation
[Figure: a confidence interval running from the lower confidence limit to the upper confidence limit, with the sample statistic (point estimate) $\bar{x}$ at its center]
Confidence Level: A probability that the population parameter falls somewhere within the interval.
$\bar{x}$ ± Margin of Error, e.g., 98 ± 1 °F
46
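Confidence Interval for Mean ($\sigma$ Known)
$(1 - \alpha) \cdot 100\%$ Confidence Interval Estimate for the mean of a normal population:
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
or
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is the Margin of Error
“$\sigma$ Known” may mean that we have a very good estimate of $\sigma$.
It is not practical to assume that we know $\sigma$.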
47
The Confidence Interval
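95% of Sample Means lie between $\mu - 1.96\sigma_{\bar{x}}$ and $\mu + 1.96\sigma_{\bar{x}}$
Confidence Level: $1 - \alpha = .95$, $\alpha/2 = .025$, $z_{.025} = 1.96$
Confidence Interval => $\bar{x} - 1.96\sigma_{\bar{x}}$ to $\bar{x} + 1.96\sigma_{\bar{x}}$
48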
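Confidence Interval of Mean
($\sigma$ Unknown and $n \ge 30$)
$(1 - \alpha) \cdot 100\%$ Confidence Interval Estimate for the mean of a population when the sample size is relatively large:
$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
or
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$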
49
The Confidence Interval
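[Figure: sampling distribution of $\bar{X}$ centered at $\mu$ with standard deviation $\sigma_{\bar{x}}$; 95% of samples give $\bar{x}$ between $\mu - 1.96\sigma_{\bar{x}}$ and $\mu + 1.96\sigma_{\bar{x}}$]
95% Confidence Interval => $\bar{x} - 1.96\sigma_{\bar{x}}$ to $\bar{x} + 1.96\sigma_{\bar{x}}$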
50
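The Confidence Interval
[Figure: sampling distribution of $\bar{X}$ centered at $\mu$ with standard deviation $\sigma_{\bar{x}}$; 95% of samples fall in the central region, 2.5% in each tail]
95% of intervals contain $\mu$.
5% do not.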
51
Factors Affecting Interval Width
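1. Data Dispersion, Measured by $\sigma$
2. Sample Size Affects Standard Error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. Level of Confidence $(1 - \alpha)$ Affects $z_{\alpha/2}$
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$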
52
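Size of Interval
[Figure: sampling distribution of $\bar{X}$ centered at $\mu$ with standard deviation $\sigma_{\bar{x}}$; 90% of samples within $\mu \pm 1.65\sigma_{\bar{x}}$, 95% within $\mu \pm 1.96\sigma_{\bar{x}}$, 99% within $\mu \pm 2.58\sigma_{\bar{x}}$]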
53
Estimation Example Mean ($\sigma$ Known)
The average weight of a random sample of n = 25 subjects is $\bar{X} = 140$. Set up a 95% confidence interval estimate for $\mu$ if $\sigma = 10$. (Assume Normal population.)
$1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$, $z_{.025} = 1.96$
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 140 \pm 1.96\frac{10}{\sqrt{25}} = 140 \pm 3.92$
$\left(140 - 1.96\frac{10}{\sqrt{25}},\ 140 + 1.96\frac{10}{\sqrt{25}}\right) = (136.08, 143.92)$
54
Interpretation
We can be 95% confident that the population mean is in (136.08, 143.92).
We can be 95% confident that the sampling error in using the sample mean, 140, to estimate the population mean is at most 3.92.
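The z-based interval for a mean can be computed the same way as the proportion interval. The sketch below is an illustration only; the name `mean_ci_z` is hypothetical, and `sd` stands for $\sigma$ when it is known, or for $s$ when the sample is large.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_z(xbar, sd, n, level=0.95):
    """CI for a mean: xbar +/- z_(alpha/2) * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_ci_z(140, 10, 25)   # weight example: about (136.08, 143.92)
print(f"({low:.2f}, {high:.2f})")
```

Half the width of the returned interval is the margin of error, 3.92 in the weight example.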
55
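Confidence Interval of Mean
($\sigma$ Unknown and $n \ge 30$)
$(1 - \alpha) \cdot 100\%$ Confidence Interval Estimate for the mean of a population when the sample size is relatively large:
$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
or
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$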
56
Thinking Challenge
Example: A city uses a certain noise index to monitor the noise pollution at a certain area of the city. A random sample of 100 observations from randomly selected days around noon showed an average index value of $\bar{x} = 1.99$ and standard deviation s = 0.05. Find the 90% confidence interval estimate of the average noise index at noon.
57
Confidence Interval Solution*
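$1 - \alpha = .90$, $\alpha = .1$, $\alpha/2 = .05$, $z_{\alpha/2} = z_{.05} = 1.64$
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 1.99 \pm 1.64\frac{.05}{\sqrt{100}} = 1.99 \pm 0.008$
$(1.982, 1.998)$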
58
Interval Estimation for Mean
In a survey, the BMI was measured for a random sample of 64 individuals living in a community. The sample mean was 26.5, and the sample standard deviation was 3.5. Find the 95% confidence interval estimate for the average BMI of people living in this community.
59
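Finding Sample Sizes for Estimating $\mu$
B = Margin of Error or Bound
C.I.: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Margin of Error: $B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2}$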
60
Sample Size Example
What sample size is needed to be 90% confident of being correct within 5? A pilot study suggested that the standard deviation is 45.
$n = \frac{z_{.05}^2\,\sigma^2}{B^2} = \frac{1.645^2 \times 45^2}{5^2} = 219.2$, rounded up to 220
61
Thinking Challenge
You plan to survey residents in your county to find the average health insurance premium that they are paying. You want to be 95% confident that the sample mean is within ± $50. A pilot study showed that s was about $400. What sample size should you use?
62
Sample Size Solution*
$n = \frac{z_{.025}^2\,\sigma^2}{B^2} = \frac{1.96^2 \times 400^2}{50^2} = 245.86$, rounded up to 246
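The sample-size formula for a mean, $n = z_{\alpha/2}^2\sigma^2/B^2$, can be sketched as a small helper. The function name `sample_size_mean` is an assumption for this example, and `sd` is whatever planning value of $\sigma$ is available (for instance from a pilot study).

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sd, margin, level=0.95):
    """n = (z_(alpha/2) * sd / B)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sd / margin) ** 2)

print(sample_size_mean(45, 5, level=0.90))   # sample size example
print(sample_size_mean(400, 50))             # insurance premium example
```

Because n grows with the square of `sd / margin`, halving the margin of error roughly quadruples the required sample.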
63
Confidence Interval Mean ($\sigma$ Unknown & n < 30)
1. Assumptions
Population Standard Deviation Is Unknown
Population Must Be Normally Distributed
2. Use Student’s t Distribution
3. Confidence Interval Estimate
$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$
or
$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
64
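Student’s t Distribution
[Figure: standard normal (Z) curve compared with t curves for df = 5 and df = 13, all centered at 0; the t curves are bell-shaped and symmetric, with ‘fatter’ tails]
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$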
65
Student’s t Table
For a 90% C.I.:
n = 3
df = n − 1 = 2
$\alpha = .10$
$\alpha/2 = .05$
$t_{\alpha/2} = ?$
From the t table (df = 2, upper-tail area .05): $t_{.05,\,2} = 2.920$
66
67
Estimation Example Mean ($\sigma$ Unknown)
A random sample of weights of 25 subjects has a sample mean of 140 and a sample standard deviation of 8. Set up a 95% confidence interval estimate for $\mu$.
$1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$, df = 24, $t_{.025,\,24} = 2.064$
$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 140 \pm 2.064\frac{8}{\sqrt{25}} = 140 \pm 3.30$
$(136.70, 143.30)$
68
Thinking Challenge
The number of community hospital beds per 1000 population available in the different regions of the country is normally distributed. A random sample of 6 regions was selected, and the rates of beds per 1000 were recorded as
3.6, 4.2, 4.0, 3.5, 3.8, 3.1.
Find the 90% confidence interval estimate of the mean bed-rate in the country.
69
Confidence Interval Solution*
$\bar{x} = 3.7$
$s = 0.38987$
$\frac{s}{\sqrt{n}} = \frac{.38987}{\sqrt{6}} = .1592$
(use 90% confidence level)
n = 6, df = n − 1 = 6 − 1 = 5
$t_{.05,\,5} = 2.015$
$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
( 3.7 − (2.015)(0.1592), 3.7 + (2.015)(0.1592) )
3.7 ± 0.321 => ( 3.379, 4.021 )
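The bed-rate solution can be reproduced from the raw data. The Python standard library has no t quantile function, so this sketch takes the critical value from the t table as an argument; the function name `mean_ci_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_t(data, t_crit):
    """CI for a mean: xbar +/- t_crit * s / sqrt(n), with t_crit read from a t table."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

beds = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]
low, high = mean_ci_t(beds, t_crit=2.015)   # t_{.05, 5} from the t table
print(f"({low:.3f}, {high:.3f})")
```

A statistics package with a t distribution could supply `t_crit` directly; it is passed in here to keep the sketch free of third-party dependencies.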
70
Confidence interval with z-score:
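The $(1 - \alpha) \cdot 100\%$ confidence interval estimate for the population mean:
Assumption: If sampled from a normal population with known variance (standard deviation $\sigma$),
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Assumption: If the sample is large and the variance is unknown, $s$ replaces $\sigma$,
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$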
71
Confidence interval with t-score:
The $(1 - \alpha) \cdot 100\%$ confidence interval estimate for the population mean:
Assumption: If sampled from a normal population with unknown variance, using the sample standard deviation $s$,
$\bar{x} \pm t_{\alpha/2,\,df = n-1}\frac{s}{\sqrt{n}}$
(If the sample size is large, the normality assumption matters little.) $t \to z$ as the sample becomes large
72
Average Weight for Ten-Year-Old Female Children in the US
Info. from a random sample: n = 10, $\bar{x}$ = 80 lb, s = 18.05 lb. Assuming weight is normally distributed, find the 95% confidence interval estimate for the average weight.
How do we know whether normality assumption is OK?
73
Tests of Normality: weight (pounds) of participant
Test                     Statistic   df   Sig.
Kolmogorov-Smirnov (a)   .171        10   .200*
Shapiro-Wilk             .930        10   .452
* This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Both significance values are greater than 0.05, so the normality assumption is acceptable.
74
Average Weight for Ten-Year-Old Female Children in the US
Info. from a random sample: n = 10, $\bar{x}$ = 80 lb, s = 18.05 lb. Assuming weight is normally distributed, find the 95% confidence interval estimate for the average weight.
In a survey, the BMI was measured for a random sample of 16 individuals living in a community. The sample mean was 26.5, and the sample standard deviation was 3.5. Find the 95% confidence interval estimate for the average BMI of people living in this community.