Source: gchang.people.ysu.edu/class/s5817/L/L5817_1_1_OneProp-First-SamplingDist.pdf
Hypothesis Testing for Proportions
HT - 1
Statistical Inference
Sampling Distribution
2
Review Normal Distribution
Example: Suppose serum cholesterol levels for 40- to 70-year-old males living in community A have a mean of 211 mg/100 ml and a standard deviation of 46 mg/100 ml. If an individual is selected at random from this population, what is the probability that his serum cholesterol level is higher than 225 mg/100 ml?
3
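P(X > 225) = ?
$X \sim N(\mu = 211, \sigma = 46)$
$z = \frac{225 - 211}{46} = .30$
$P(X > 225) = P(Z > .30) = .382$
[Figure: normal curve for x centered at 211 with the area to the right of 225 shaded (.382); matching z axis centered at 0 with the cutoff at .30]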
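The table lookup above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the slides. Because the code keeps the unrounded z score (about 0.304 rather than .30), its tail area comes out slightly below the table value of .382.

```python
from statistics import NormalDist

# Population model from the slide: X ~ N(mu = 211, sigma = 46)
mu, sigma = 211, 46
cutoff = 225

z = (cutoff - mu) / sigma              # standardized cutoff
p_upper = 1 - NormalDist().cdf(z)      # P(Z > z), the upper-tail area

print(f"z = {z:.2f}")
print(f"P(X > {cutoff}) = {p_upper:.3f}")
```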
4
Statistical Inference
1. Type of Inference:
– Estimation
– Hypothesis Testing
2. Purpose
– Make Decisions about Population Characteristics
5
Statistics Used to Estimate Population Parameters
Statistics (Estimators) and the Parameters they estimate:
– Sample Mean, $\bar{x}$: estimates $\mu$, population mean
– Sample Variance, $s^2$: estimates $\sigma^2$, population variance
– Sample Proportion, $\hat{p}$: estimates p, population proportion
…
Theoretical Basis Is Sampling Distribution.
6
Sampling Distribution
Theoretical Probability Distribution of the Sample Statistic.
What is the shape of this distribution?
What are the values of the parameters such as mean and standard deviation?
7
Sampling Distribution of The Mean (Normal Theorem)
If a random sample is taken from a Normally distributed population that has a mean $\mu$ and a standard deviation $\sigma$, the sampling distribution of the sample mean, $\bar{x}$, will be Normal with a mean that is the same as the population mean, and will have a standard deviation that is equal to the standard deviation of the population divided by the square root of the sample size.
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
8
Sampling Distribution of The Mean (Central Limit Theorem)
If a relatively large random sample is taken from a population that has a mean $\mu$ and a standard deviation $\sigma$, the sampling distribution of the sample mean, $\bar{x}$, will be approximately a normal distribution with a mean that is the same as the population mean, and a standard deviation that is equal to the standard deviation of the population divided by the square root of the sample size.
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
9
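Standard Error of Mean
1. Formula: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
2. Standard deviation of the sampling distribution of the sample means, $\bar{X}$
3. Less than the population standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} < \sigma$ (for n > 1)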
10
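Sampling Distribution
[Figure: population distribution of x with $\mu = 8$ and $\sigma = 2$, shown with the sampling distribution of the mean for a sample of size 25, which has $\mu_{\bar{x}} = 8$ and $\sigma_{\bar{x}} = \frac{2}{\sqrt{25}} = .4$]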
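The standard error in this figure follows directly from the formula $\sigma/\sqrt{n}$. As a small sketch (the function name `standard_error_of_mean` is an illustrative choice, not something defined in the slides), the calculation and its dependence on the sample size could look like this:

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

# Values from the figure: population sigma = 2, sample size n = 25
se = standard_error_of_mean(sigma=2, n=25)
print(f"n = 25: standard error = {se}")

# The standard error shrinks as the sample size grows.
for n in (1, 4, 25, 100):
    print(n, standard_error_of_mean(2, n))
```

Quadrupling the sample size halves the standard error, which is why large gains in precision need much larger samples.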
11
Probability Related to Mean
Example: Suppose serum cholesterol levels for all 20- to 74-year-old males living in the United States have a mean of 211 mg/100 ml and a standard deviation of 46 mg/100 ml. If a random sample of 100 individuals is taken from the population, what is the probability that the average serum cholesterol level of these 100 individuals is higher than 225?
12
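$P(\bar{X} > 225) = ?$
Cholesterol level has a mean of 211 and an s.d. of 46; n = 100.
$\bar{X} \sim N(\mu_{\bar{x}} = 211, \sigma_{\bar{x}} = 4.6)$
$z = \frac{225 - 211}{4.6} = 3.04$
$P(\bar{X} > 225) = P(Z > 3.04) = .001$
[Figure: normal curve for $\bar{x}$ centered at 211 with the area to the right of 225 shaded (.001); matching z axis centered at 0 with the cutoff at 3.04]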
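To see how much averaging 100 observations changes the answer, the same upper-tail calculation can be repeated with the standard error in place of the population standard deviation. This is a sketch using the standard library only; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 211, 46, 100            # values from the example
cutoff = 225

se = sigma / math.sqrt(n)              # standard error of the mean
z = (cutoff - mu) / se                 # z score of the cutoff for the sample mean
p_upper = 1 - NormalDist(mu, se).cdf(cutoff)   # P(sample mean > 225)

print(f"standard error = {se:.1f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > {cutoff}) = {p_upper:.4f}")
```

A single individual exceeds 225 fairly often, but a mean of 100 individuals almost never does, because the sampling distribution is ten times narrower.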
Inference on Proportions
HT - 13 HT - 14
Inference on Proportion
Parameter: Population Proportion p (or $\pi$)
(Percentage of people who have no health insurance)
Statistic: Sample Proportion $\hat{p} = \frac{x}{n}$
x is the number of successes
n is the sample size
Data: 1, 0, 1, 0, 0
$\hat{p} = \frac{2}{5} = .4$
$\bar{x} = \frac{1 + 0 + 1 + 0 + 0}{5} = .4 = \hat{p}$
HT - 15
Sampling Distribution of Sample Proportion
For a random sample of size n from a large population with proportion of successes p (successes usually represented by a value 1) and therefore proportion of failures 1 – p (failures usually represented by a value 0), the sampling distribution of the sample proportion $\hat{p} = x/n$, where x is the number of successes in the sample, is asymptotically normal with mean p and standard deviation $\sqrt{\frac{p(1-p)}{n}}$.
HT - 16
Sampling Distribution
When sampling from a population that has a proportion of successes p, the distribution of $\hat{p}$ is approximately normal if n is large, with
$\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
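A short simulation can make the sampling distribution of the sample proportion concrete. The sketch below assumes illustrative values (p = 0.3, n = 100, 5000 repeated samples, a fixed random seed) that are not taken from this slide; it draws repeated 0/1 samples and compares the simulated mean and standard deviation of the sample proportions with the theoretical values p and $\sqrt{p(1-p)/n}$.

```python
import math
import random
import statistics

random.seed(0)                      # fixed seed so the run is repeatable

p, n, reps = 0.3, 100, 5000         # illustrative values (assumptions)

# Each repetition draws n Bernoulli(p) outcomes and records the sample proportion.
p_hats = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(reps)
]

sim_mean = statistics.mean(p_hats)          # should be close to p
sim_sd = statistics.stdev(p_hats)           # should be close to theory_sd
theory_sd = math.sqrt(p * (1 - p) / n)      # sqrt(p(1-p)/n)

print(f"simulated mean = {sim_mean:.4f}, theoretical mean = {p}")
print(f"simulated sd   = {sim_sd:.4f}, theoretical sd   = {theory_sd:.4f}")
```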
17
Sampling Error
Sample statistic (point estimate): $\hat{\theta}$
Sampling Error = $|\theta - \hat{\theta}|$, where $\theta$ is the population parameter
18
Key Elements of Interval Estimation
[Figure: a confidence interval centered on the sample statistic (point estimate), running from the lower confidence limit to the upper confidence limit]
Confidence Level: A probability that the population parameter falls somewhere within the interval.
Point estimate ± Margin of Error
HT - 19
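Sampling Distribution
[Figure: sampling distribution of $\hat{p}$ with $\mu_{\hat{p}} = p$; the central 95% lies within $z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ on either side, with $z_{\alpha/2} = 1.96$]
$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$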
HT - 20
Confidence Interval
Confidence interval: The $(1 - \alpha) \times 100\%$ confidence interval estimate for the population proportion is
$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Large Sample Assumption:
Both np and n(1-p) are greater than 5, that is, it is expected that there are at least 5 counts in each category.
HT - 21
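$\hat{p} = \frac{495}{1026} = .48$, n = 1026
$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .48 \pm 1.96 \sqrt{\frac{.48(1 - .48)}{1026}} = .48 \pm .03$
48% ± 3%, that is, (45%, 51%)
Margin of Error: .03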
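The interval above can be reproduced in a few lines. This is a sketch of the large-sample interval from the previous slide; the function name `proportion_ci` and its return layout are illustrative choices, and the critical value comes from the standard library's `NormalDist.inv_cdf` rather than a printed table.

```python
import math
from statistics import NormalDist

def proportion_ci(x: int, n: int, conf: float = 0.95):
    """Large-sample confidence interval for a population proportion.

    Returns (p_hat, margin_of_error, (lower, upper)).
    """
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{alpha/2}, about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Example from the slide: 495 successes out of 1026
p_hat, margin, (lower, upper) = proportion_ci(495, 1026)
print(f"p_hat = {p_hat:.2f}, margin of error = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```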
22
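Sample Size
C.I.: $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Margin of Error: $E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$n = \frac{z_{\alpha/2}^2}{E^2} \hat{p}(1-\hat{p})$ if pilot study is done.
$n = \frac{z_{\alpha/2}^2}{E^2} \times 0.25$ to get the largest sample to achieve the goal.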
23
Sample Size (No prior information on p)
Sample Size Example: If one wishes to do a
survey to estimate the population proportion
with 95% confidence and a margin of error of
3%, how large a sample is needed?
$Z_{\alpha/2} = 1.96$; E = .03
$n = \frac{1.96^2}{.03^2} \times .25 = 1067.11$
A sample of size 1068 is needed.
24
Sample Size (With prior information on p)
Sample Size Example: If one wishes to estimate the percentage of people infected with West Nile in a population with 95% confidence and a margin of error of 3%, how large a sample is needed? (A pilot study has been done, and the sample proportion was 6%.)
$Z_{\alpha/2} = 1.96$; E = .03
$n = \frac{1.96^2}{.03^2} \times .06 \times (1 - .06) = 240.7$
A sample of size 241 is needed.
How large a sample was used for pilot study?
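Both sample-size examples use the same formula and differ only in the value plugged in for the proportion. A minimal sketch follows; the function name `sample_size` and the default `p_guess=0.5` (the conservative choice that gives p(1 − p) = 0.25) are illustrative assumptions, and the result is rounded up because a fractional respondent is not possible.

```python
import math
from statistics import NormalDist

def sample_size(margin: float, conf: float = 0.95, p_guess: float = 0.5) -> int:
    """Smallest n whose margin of error is at most `margin`.

    p_guess = 0.5 gives the largest (most conservative) sample size;
    pass a pilot-study proportion when one is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)       # z_{alpha/2}
    n = (z / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)                                # always round up

n_no_prior = sample_size(0.03)                 # no prior information on p
n_pilot = sample_size(0.03, p_guess=0.06)      # pilot study gave 6%
print(n_no_prior, n_pilot)
```

Using the pilot estimate cuts the required sample to under a quarter of the conservative figure, because p(1 − p) is far below 0.25 when p is near 0.06.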
HT - 25
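An Alternative Method
By solving for p in
$\frac{|y/n - p|}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}$, with $\hat{p} = \frac{y}{n}$,
the $(1 - \alpha) \times 100\%$ Confidence Interval for p is
$\frac{\hat{p} + z_{\alpha/2}^2/(2n) \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$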
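This alternative interval is commonly known as the Wilson score interval (the slides do not name it). The sketch below implements the closed-form endpoints and applies them, as an illustration only, to the 495 out of 1026 data used earlier; the function name `wilson_interval` is an assumption.

```python
import math
from statistics import NormalDist

def wilson_interval(y: int, n: int, conf: float = 0.95):
    """Confidence interval for p obtained by solving
    |y/n - p| / sqrt(p(1-p)/n) <= z_{alpha/2} for p."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    center = p_hat + z**2 / (2 * n)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - half_width) / denom, (center + half_width) / denom

lower, upper = wilson_interval(495, 1026)
print(f"Wilson 95% CI: ({lower:.4f}, {upper:.4f})")
```

For a sample this large the result is nearly identical to the simpler interval; the two methods differ more noticeably when n is small or the sample proportion is close to 0 or 1.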
HT - 26
Hypothesis Testing
1. State research hypotheses or questions.
2. Gather data or evidence (observational or experimental) to answer the question.
3. Summarize data and test the hypothesis.
4. Draw a conclusion.
p = 30% ?
$\hat{p} = .25 = 25\%$
HT - 27
Statistical Hypothesis
Null hypothesis (H0):
Hypothesis of no difference or no relation, often has =, ≥, or ≤ notation when testing value of parameters.
Example:
H0: p = 30% or
H0: Percentage of votes for A is 30%.
HT - 28
Statistical Hypothesis
Alternative hypothesis (H1 or Ha):
Usually corresponds to the research hypothesis and is opposite to the null hypothesis, often has >, <, or ≠ notation when testing value of parameters.
Example:
Ha: p ≠ 30% or
Ha: Percentage of votes for A is not 30%.
HT - 29
Hypotheses Statements Example
• A researcher is interested in finding out
whether percentage of people in favor of
policy A is different from 60%.
H0: p = 60%
Ha: p ≠ 60%
[Two-tailed test]
HT - 30
Hypotheses Statements Example
• A researcher is interested in finding out whether the percentage of people in a community who have health insurance is more than 77%.
H0: p = 77% ( or p ≤ 77% )
Ha: p > 77%
[Right-tailed test]
HT - 31
Hypotheses Statements Example
• A researcher is interested in finding out whether the percentage of bad products is less than 10%.
H0: p = 10% ( or p ≥ 10% )
Ha: p < 10%
[Left-tailed test]
HT - 32
Evidence
Test Statistic (Evidence): A sample
statistic used to decide whether to reject
the null hypothesis.
HT - 33
Logic Behind
Hypothesis Testing
In testing a statistical hypothesis, the null hypothesis is first assumed to be true.
We collect evidence to see if the evidence
is strong enough to disprove (reject) the null
hypothesis and therefore support the
alternative hypothesis.
HT - 34
One Sample Z-Test for Proportion
(Large sample test)
Two-Sided Test
HT - 35
I. Hypothesis
One wishes to test whether the percentage
of votes for A is different from 30%
H0: p = 30% vs. Ha: p ≠ 30%
HT - 36
What will be the key statistic (evidence) to use for testing the hypothesis about population proportion?
Evidence
Sample Proportion: $\hat{p}$
A random sample of 100 subjects is chosen and the sample proportion is 25% or .25.
HT - 37
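Sampling Distribution
If H0: p = 30% is true, the sampling distribution of the sample proportion will be approximately normally distributed with mean .3 and standard deviation (or standard error)
$\sigma_{\hat{p}} = \sqrt{\frac{.3(1 - .3)}{100}} = 0.0458$
[Figure: normal curve for $\hat{p}$ centered at .30 with $\sigma_{\hat{p}} = 0.0458$]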
HT - 38
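II. Test Statistic
$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.25 - .3}{\sqrt{\frac{.3(1 - .3)}{100}}} = -1.09$
This implies that the statistic $\hat{p}$ is 1.09 standard deviations away from the mean .3 under H0, and is to the left of .3 (or less than .3).
[Figure: normal curve with the $\hat{p}$ axis marked at .25 and .30 and the matching Z axis marked at -1.09 and 0]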
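The test statistic is a one-line calculation once the standard error under H0 is known. The following sketch uses an illustrative helper name, `one_proportion_z`; note that the standard error is computed from the hypothesized value p0, not from the sample proportion.

```python
import math

def one_proportion_z(p_hat: float, p0: float, n: int):
    """Return (z, se0): the z statistic and the standard error under H0."""
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error assuming H0: p = p0
    z = (p_hat - p0) / se0
    return z, se0

# Values from the example: p_hat = .25, p0 = .30, n = 100
z, se0 = one_proportion_z(0.25, 0.30, 100)
print(f"standard error under H0 = {se0:.4f}")
print(f"z = {z:.2f}")
```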
HT - 39
Level of Significance
Level of significance for the test ($\alpha$)
A probability level selected by the researcher at the beginning of the analysis that defines unlikely values of the sample statistic if the null hypothesis is true.
[Figure: standard normal curve centered at 0 with a critical value (c.v.) in each tail; total tail area = $\alpha$]
c.v. = critical value
HT - 40
III. Decision Rule
Critical value approach: Compare the test statistic with the critical values defined by significance level $\alpha$, usually $\alpha = 0.05$.
We reject the null hypothesis if the test statistic
$z < -z_{\alpha/2} = -z_{0.025} = -1.96$, or $z > z_{\alpha/2} = z_{0.025} = 1.96$
(i.e., $|z| > z_{\alpha/2}$).
[Figure: two-sided test; standard normal curve with critical values –1.96 and 1.96, a rejection region of area $\alpha/2 = 0.025$ in each tail, and the test statistic –1.09 between the critical values]
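The critical-value rule can be written as a single comparison. In this sketch the critical value is obtained from the standard normal quantile function instead of a table, and the test statistic is the rounded value –1.09 from the example; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = -1.09                                        # test statistic from the example

z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # upper critical value z_{alpha/2}
reject = abs(z) > z_crit                         # two-sided rejection rule

print(f"critical values: -{z_crit:.2f} and {z_crit:.2f}")
print("reject H0" if reject else "do not reject H0")
```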
HT - 41
III. Decision Rule
p-value approach: Compute the probability of observing the evidence, or more extreme evidence, when the null hypothesis is true (the p-value), and compare it with the level of significance of the test, $\alpha$. If this probability is less than $\alpha$, then we reject the null hypothesis. (Reject H0 if p-value < $\alpha$)
p-value = $P(Z \le -1.09 \text{ or } Z \ge 1.09) = 2 \times P(Z \le -1.09) = 2 \times .1379 = .2758$
[Figure: two-sided test; standard normal curve with left tail area .1379 below –1.09 and right tail area .1379 above 1.09]
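The two-sided p-value doubles one tail area, which is straightforward to compute from the standard normal CDF. This sketch starts from the rounded statistic –1.09 used in the slides; the result can differ from .2758 in the fourth decimal place because the table value .1379 is itself rounded.

```python
from statistics import NormalDist

alpha = 0.05
z = -1.09                                    # test statistic from the example

tail = NormalDist().cdf(-abs(z))             # area in one tail beyond |z|
p_value = 2 * tail                           # two-sided p-value

print(f"one-tail area = {tail:.4f}")
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```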
HT - 42
p-value
The probability of obtaining a test statistic that is as extreme or more extreme than the actual sample statistic value, given that the null hypothesis is true. It is a probability that indicates the extremeness of the evidence against H0.
The smaller the p-value, the stronger the evidence for supporting Ha and rejecting H0.
HT - 43
IV. Draw conclusion
Since from either
critical value approach $z = -1.09 > -z_{\alpha/2} = -1.96$
or p-value approach p-value = .2758 > $\alpha$ = .05,
we do not reject the null hypothesis.
Therefore we conclude that there is not sufficient evidence to support the alternative hypothesis that the percentage of votes would be different from 30%.
HT - 44
Steps in Hypothesis Testing
1. State hypotheses: H0 and Ha.
2. Choose a proper test statistic, collect data, check the assumptions, and compute the value of the statistic.
3. Make decision rule based on level of significance ($\alpha$).
4. Draw conclusion. (Reject or not reject null hypothesis)
(Support or not support alternative hypothesis)
HT - 45
When do we use this z-test for
testing the proportion of a
population?
• Large random sample.
HT - 46
One-Sided Test
Example with the same data:
A random sample of 100 subjects is chosen
and the sample proportion is 25% .
HT - 47
I. Hypothesis
One wishes to test whether the percentage
of votes for A is less than 30%
H0: p = 30% vs. Ha: p < 30%
HT - 48
What will be the key statistic (evidence) to use for testing the hypothesis about population proportion?
Evidence
Sample Proportion: $\hat{p}$
A random sample of 100 subjects is chosen and the sample proportion is 25% or .25.
HT - 49
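Sampling Distribution
If H0: p = 30% is true, the sampling distribution of the sample proportion will be approximately normally distributed with mean .3 and standard deviation (or standard error)
$\sigma_{\hat{p}} = \sqrt{\frac{.3(1 - .3)}{100}} = 0.0458$
[Figure: normal curve for $\hat{p}$ centered at .30 with $\sigma_{\hat{p}} = 0.0458$]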
HT - 50
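II. Test Statistic
$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.25 - .3}{\sqrt{\frac{.3(1 - .3)}{100}}} = -1.09$
This implies that the statistic $\hat{p}$ is 1.09 standard deviations away from the mean .3 under H0, and is to the left of .3 (or less than .3).
[Figure: normal curve with the $\hat{p}$ axis marked at .25 and .30 and the matching Z axis marked at -1.09 and 0]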
HT - 51
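III. Decision Rule
Critical value approach: Compare the test statistic with the critical value defined by significance level $\alpha$, usually $\alpha = 0.05$.
We reject the null hypothesis if the test statistic
$z < -z_{\alpha} = -z_{0.05} = -1.645$.
[Figure: left-sided test; standard normal curve with critical value –1.645, a rejection region of area $\alpha = .05$ in the left tail, and the test statistic –1.09 to the right of the critical value]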
HT - 52
III. Decision Rule
p-value approach: Compute the probability of observing the evidence, or more extreme evidence, when the null hypothesis is true (the p-value), and compare it with the level of significance of the test, $\alpha$. If this probability is less than $\alpha$, then we reject the null hypothesis.
p-value = $P(Z \le -1.09) = .1379$
[Figure: left-sided test; standard normal curve with left tail area .1379 below –1.09]
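The one-sided and two-sided versions of the test differ only in which tail area counts as "more extreme". A compact sketch that handles all three alternatives is shown below; the function name `one_proportion_z_test` and the `alternative` labels ("less", "greater", "two-sided") are illustrative conventions, not notation from the slides. The code keeps the unrounded z, so its p-values are slightly smaller than the table-based .1379 and .2758.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(p_hat: float, p0: float, n: int,
                          alternative: str = "two-sided"):
    """Large-sample z-test for one proportion. Returns (z, p_value)."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "less":            # Ha: p < p0, left-tail area
        p_value = std_normal.cdf(z)
    elif alternative == "greater":       # Ha: p > p0, right-tail area
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":     # Ha: p != p0, both tails
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

# Same data, two different alternatives
z_left, p_left = one_proportion_z_test(0.25, 0.30, 100, alternative="less")
z_two, p_two = one_proportion_z_test(0.25, 0.30, 100)
print(f"left-sided: z = {z_left:.2f}, p-value = {p_left:.4f}")
print(f"two-sided:  z = {z_two:.2f}, p-value = {p_two:.4f}")
```

With the same data, the left-sided p-value is half the two-sided one, yet both exceed .05, so neither test rejects H0.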
HT - 53
IV. Draw conclusion
Since from either
critical value approach $z = -1.09 > -z_{\alpha} = -1.645$
or p-value approach p-value = .1379 > $\alpha$ = .05,
we do not reject the null hypothesis.
Therefore we conclude that there is not sufficient evidence to support the alternative hypothesis that the percentage of votes is less than 30%.
HT - 54
Can we see the data first and then make the hypotheses?
1. Choose a test statistic, collect data, check the assumptions, and compute the value of the statistic.
2. State hypotheses: H0 and Ha.
3. Make decision rule based on level of significance ($\alpha$).
4. Draw conclusion. (Reject null hypothesis or not)