Medical Statistics (full English class)
Shaoqi Rao, PhD
School of Public Health
Sun Yat-Sen University
Slides adapted from those of Professor Fang Ji-Qian
Chapter 3
Sampling Error and Confidence Interval
For several samples drawn from the same population:
the sample means are usually not equal to the population mean, and the sample means differ from one another
---- sampling error
3.1 The Distribution of Sample Mean
3.1.1 Distribution of sample mean from a population of normal distribution
Experiment 3.1 Sampling from a normal distribution. Assume the red cell counts of healthy males follow a normal distribution $N(4.6602, 0.5746^2)$. 100 samples of size $n = 5$ are drawn. The sample means are shown in the second column of Table 3.1.
Features of the sample mean as a random variable
(1) A sample mean is not necessarily equal to the population mean;
(2) The sample means differ from one another;
(3) The distribution of the sample means follows a regular pattern: many values near the center, few at the two ends, and symmetric about the center;
(4) The range of variation of the sample mean is much narrower than that of the original variable.
If random samples of n individuals are drawn from a normal distribution $X \sim N(\mu, \sigma^2)$, then the sample mean follows a normal distribution
$$\bar X \sim N(\mu, \sigma_{\bar x}^2) \qquad (3.1)$$
(5) The range of variation of the sample means narrows as the sample size increases.
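$\sigma$: standard deviation of the original variable
$\sigma_{\bar x}$: standard deviation of the sample mean, called the standard error of the sample mean, or simply the standard error
$$\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}$$
For a sample, the standard error is estimated by
$$S_{\bar x} = \frac{S}{\sqrt{n}}$$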
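The relation $\sigma_{\bar x} = \sigma/\sqrt{n}$ and feature (4) can be checked by simulation. The sketch below is only an illustration: it assumes the population $N(4.6602, 0.5746^2)$ and $n = 5$ of Experiment 3.1, but it uses its own pseudo-random draws (with an arbitrary seed), so it does not reproduce the numbers in Table 3.1. The helper names `standard_error` and `simulate_sample_means` are hypothetical.

```python
import math
import random
import statistics

# Population and sample size of Experiment 3.1: X ~ N(4.6602, 0.5746^2), n = 5
MU, SIGMA, N = 4.6602, 0.5746, 5

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def simulate_sample_means(mu, sigma, n, n_samples, seed=1):
    """Draw n_samples random samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(n_samples)]

means = simulate_sample_means(MU, SIGMA, N, n_samples=100)
print(f"SD of the original variable:  {SIGMA:.4f}")
print(f"theoretical standard error:   {standard_error(SIGMA, N):.4f}")
print(f"SD of 100 simulated means:    {statistics.stdev(means):.4f}")
print(f"range of the simulated means: {min(means):.3f} to {max(means):.3f}")
```

With only 100 samples the SD of the simulated means scatters around the theoretical value; increasing `n_samples` brings the two closer.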
3.1.2 Distribution of sample mean from a population with non-normal distribution
Experiment 3.2 Sampling from a positively skewed distribution
(1) The distribution of the sample means tends to become symmetric as the sample size increases; when n = 30, it looks similar to a normal distribution.
(2) The range of variation of the sample means also narrows as the sample size increases.
Experiment 3.3 Sampling from an asymmetric hook-like distribution
For a population with a non-normal distribution, the distribution of the sample means is not exactly normal, but it is close to a normal distribution when the sample size is large (say, $n \ge 30$). Approximately, we still have
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
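Features (1) and (2) of Experiment 3.2 can be seen numerically. The population used in Experiment 3.2 is not specified here, so the sketch below assumes an exponential population with mean 1 and SD 1 as a stand-in for "a positively skewed distribution"; for this population the skewness of the sample mean is $2/\sqrt{n}$ in theory. For each sample size the sketch reports the skewness of the simulated means (0 for a symmetric distribution) and compares their SD with $\sigma/\sqrt{n}$. Function names are hypothetical.

```python
import math
import random
import statistics

def skewness(values):
    """Moment estimator of skewness: mean cubed deviation divided by SD^3 (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean((v - m) ** 3 for v in values) / sd ** 3

def exp_sample_means(n, n_samples, seed=2):
    """Means of n_samples samples of size n from an exponential population
    with mean 1 and SD 1 (an assumed, strongly right-skewed stand-in population)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(n_samples)]

for n in (2, 5, 30):
    means = exp_sample_means(n, n_samples=5000)
    print(f"n={n:2d}  skewness of means={skewness(means):.2f}  "
          f"SD of means={statistics.stdev(means):.3f}  sigma/sqrt(n)={1 / math.sqrt(n):.3f}")
```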
3.2 t Distribution
3.2.1 Standard t deviate
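When $\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$,
$$\frac{\bar X - \mu}{\sigma_{\bar x}} \sim N(0, 1)$$
If $\sigma$ is unknown and $\sigma_{\bar x}$ is replaced by $S_{\bar x}$, do we still have
$$\frac{\bar X - \mu}{S_{\bar x}} \sim N(0, 1)\,?$$
W.S. Gosset (1908) explored this distribution: when $X \sim N(\mu, \sigma^2)$,
$$\frac{\bar X - \mu}{S_{\bar x}} \sim t \text{ distribution}, \quad \nu = n - 1$$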
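The question above can be explored by simulation. The sketch draws repeated samples of size 5 from a normal population (the parameters of Experiment 3.1 are reused only for concreteness) and counts how often the standardized mean falls outside $\pm 1.96$, first with the known $\sigma$ and then with the sample SD $S$. With $\sigma$ known the rate should be close to 5%; with $S$ it is clearly higher, which is why the t distribution with $\nu = n - 1$ is needed instead of $N(0,1)$. The function name `tail_rates` is hypothetical.

```python
import math
import random
import statistics

def tail_rates(mu, sigma, n, reps, seed=3):
    """Share of samples whose standardized mean exceeds 1.96 in absolute value,
    using the known sigma (z) and using the sample SD S (t)."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        z = (xbar - mu) / (sigma / math.sqrt(n))                 # sigma known
        t = (xbar - mu) / (statistics.stdev(x) / math.sqrt(n))   # sigma replaced by S
        z_hits += abs(z) > 1.96
        t_hits += abs(t) > 1.96
    return z_hits / reps, t_hits / reps

p_z, p_t = tail_rates(mu=4.6602, sigma=0.5746, n=5, reps=20000)
print(f"P(|z| > 1.96) ~ {p_z:.3f}   P(|t| > 1.96) ~ {p_t:.3f}")
```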
3.2.2 The probability density and critical values of t distribution
The two-sided probabilities and the corresponding critical values of the t distribution are given in Table 5 of Appendix 2.
For instance, when $\nu = 20$, the critical value corresponding to a two-sided probability of 0.05 is
$$t_{0.05/2,20} = 2.086 > 1.96$$
and the critical value corresponding to a one-sided probability of 0.05 is
$$t_{0.05,20} = 1.725 > 1.64$$
In general,
$$t_{\alpha/2,\nu} > Z_{\alpha/2}, \qquad t_{\alpha,\nu} > Z_{\alpha}$$
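Entries of Table 5 can be recomputed from the t density. In practice one would call a statistics library (for example SciPy's `scipy.stats.t.ppf`); the sketch below is a dependency-free stand-in under stated assumptions: it integrates the t density numerically with Simpson's rule and finds the critical value by bisection, and it is intended only for $\nu \ge 2$ and one-sided tail probabilities $\alpha \ge 0.005$. Function names are hypothetical.

```python
import math

def t_density(df):
    """Return the density function of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of math.gamma for large df
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df):
    """P(T > x) for x >= 0: 0.5 minus the integral of the density over [0, x] (Simpson's rule)."""
    f = t_density(df)
    steps = max(200, 2 * int(100 * x))   # even number of subintervals, about 200 per unit of x
    h = x / steps
    s = f(0.0) + f(x) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    return 0.5 - s * h / 3

def t_critical(alpha, df):
    """Upper critical value t_{alpha, df}, i.e. the point with P(T > t) = alpha, by bisection.
    Assumes the answer lies in [0, 50], which holds for df >= 2 and alpha >= 0.005."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid   # tail beyond mid is still larger than alpha: move outward
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t(0.05/2, 20) = {t_critical(0.05 / 2, 20):.3f}")   # Table 5 gives 2.086
print(f"t(0.05, 20)   = {t_critical(0.05, 20):.3f}")       # Table 5 gives 1.725
```

A two-sided probability $\alpha$ is passed as `alpha / 2`, matching the notation $t_{\alpha/2,\nu}$.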
3.3 The Confidence Interval for Population Mean of Normal Distribution $X \sim N(\mu, \sigma^2)$
Since $\frac{\bar X - \mu}{\sigma_{\bar x}} \sim N(0, 1)$, 95% of the sample means (but not all) satisfy
$$\mu - 1.96\sigma_{\bar x} < \bar X < \mu + 1.96\sigma_{\bar x}, \quad \text{equivalently} \quad \bar X - 1.96\sigma_{\bar x} < \mu < \bar X + 1.96\sigma_{\bar x}$$
For any sample, if we claim that $\mu$ lies in such an interval, then in theory we would be right about 95 times out of 100.
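In practice $\mu$ and $\sigma$ are unknown. A sample is drawn, and $\bar X$ and $S_{\bar x}$ are computed. How can $\mu$ be estimated?
Since
$$\frac{\bar X - \mu}{S_{\bar x}} \sim t \text{ distribution}, \quad \nu = n - 1,$$
with probability 0.95
$$-t_{0.05/2,\nu} < \frac{\bar X - \mu}{S_{\bar x}} < t_{0.05/2,\nu}$$
that is,
$$\bar X - t_{0.05/2,\nu}\, S_{\bar x} < \mu < \bar X + t_{0.05/2,\nu}\, S_{\bar x}$$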
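The "95 times out of 100" interpretation refers to repeated sampling. The sketch below assumes a normal population (the parameters of Experiment 3.1 are reused only for concreteness) and samples of size $n = 5$, builds the interval $\bar X \pm t_{0.05/2,4} S_{\bar x}$ for every sample, and counts how often it contains $\mu$. It hard-codes $t_{0.05/2,4} = 2.776$ from a standard t table and, for contrast, also tries 1.96. The function name `coverage` is hypothetical.

```python
import math
import random
import statistics

T_025_DF4 = 2.776   # two-sided 0.05 critical value of t for df = 4 (standard t table)

def coverage(mu, sigma, n, crit, reps, seed=4):
    """Fraction of intervals xbar -/+ crit * S / sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        se = statistics.stdev(x) / math.sqrt(n)   # S_xbar
        hits += xbar - crit * se < mu < xbar + crit * se
    return hits / reps

print(f"t-based 95% CI coverage: {coverage(4.6602, 0.5746, 5, T_025_DF4, 10000):.3f}")
print(f"1.96-based coverage:     {coverage(4.6602, 0.5746, 5, 1.96, 10000):.3f}")
```

The t-based intervals cover $\mu$ in roughly 95% of samples, while using 1.96 with $S_{\bar x}$ and so small a sample falls noticeably short.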
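In general, given a random sample from the population $X \sim N(\mu, \sigma^2)$, if the sample size, sample mean and sample standard deviation are denoted as $n$, $\bar x$ and $s$, and $s_{\bar x} = s/\sqrt{n}$, then
$$\left(\bar x - t_{\alpha/2,\nu}\, s_{\bar x},\ \bar x + t_{\alpha/2,\nu}\, s_{\bar x}\right), \quad \nu = n - 1$$
is called the $(1-\alpha)$ confidence interval of the population mean $\mu$.
$(1-\alpha)$: confidence level
$t_{\alpha/2,\nu}\, s_{\bar x}$: precision of the confidence interval (its half-width)
When the sample size is large enough, $t_{\alpha/2,\nu}$ can be replaced by $Z_{\alpha/2}$:
$$\left(\bar x - Z_{\alpha/2}\, s_{\bar x},\ \bar x + Z_{\alpha/2}\, s_{\bar x}\right)$$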
Example 3.1 Randomly select 20 cases from patients with a certain disease. The sample mean of the erythrocyte sedimentation rate (ESR, mm/h) is 9.15 and the sample standard deviation is 2.13. Estimate the 95% and 99% confidence intervals of the population mean (assume the ESR of patients with this disease follows a normal distribution).
Question: If both a higher confidence level and better precision are wanted, what should we do?
$\bar x = 9.15,\ s = 2.13,\ n = 20$
95% CI:
$$\bar x \pm t_{0.05/2,19}\frac{s}{\sqrt{n}} = 9.15 \pm 2.093 \times \frac{2.13}{\sqrt{20}}, \quad \text{i.e. } 8.15 \text{ and } 10.15 \text{ mm/h}$$
99% CI:
$$\bar x \pm t_{0.01/2,19}\frac{s}{\sqrt{n}} = 9.15 \pm 2.861 \times \frac{2.13}{\sqrt{20}}, \quad \text{i.e. } 7.79 \text{ and } 10.51 \text{ mm/h}$$
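The arithmetic of Example 3.1 can be reproduced with a few lines. This is a minimal sketch: the hypothetical helper `mean_ci` takes the critical value as an argument (here $t_{0.05/2,19} = 2.093$ and $t_{0.01/2,19} = 2.861$, as read from Table 5), so the same function also covers the large-sample version with $Z_{\alpha/2}$.

```python
import math

def mean_ci(xbar, s, n, crit):
    """Two-sided CI for a population mean: xbar -/+ crit * s / sqrt(n).
    crit is t_{alpha/2, n-1} from a t table (or Z_{alpha/2} when n is large)."""
    half = crit * s / math.sqrt(n)   # crit * standard error = half-width
    return xbar - half, xbar + half

# Example 3.1: n = 20, mean ESR 9.15 mm/h, SD 2.13 mm/h; critical values for df = 19
lo95, hi95 = mean_ci(9.15, 2.13, 20, crit=2.093)
lo99, hi99 = mean_ci(9.15, 2.13, 20, crit=2.861)
print(f"95% CI: ({lo95:.2f}, {hi95:.2f})")   # (8.15, 10.15)
print(f"99% CI: ({lo99:.2f}, {hi99:.2f})")   # (7.79, 10.51)
```

The 99% interval is wider than the 95% one for the same data, which is the trade-off raised in the Question above.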
3.4 Confidence Interval for the Difference between Two Population Means
$X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, with $\mu_1$, $\mu_2$ and $\sigma$ unknown. Two samples are drawn, with $n_1, \bar x_1, s_1$ and $n_2, \bar x_2, s_2$. What is the confidence interval for $\mu_1 - \mu_2$?
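$$\bar X_1 \sim N\left(\mu_1, \frac{\sigma^2}{n_1}\right), \qquad \bar X_2 \sim N\left(\mu_2, \frac{\sigma^2}{n_2}\right)$$
$$\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}\right)$$
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$$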
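Since $\sigma^2$ is unknown, it is replaced by the pooled variance $S_c^2$:
$$S_c^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}, \quad \nu = n_1 + n_2 - 2$$
Do we still have
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{S_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)\,?$$
No; it follows a t distribution:
$$\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{S_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t \text{ distribution}, \quad \nu = n_1 + n_2 - 2$$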
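The $(1-\alpha)$ confidence interval of $\mu_1 - \mu_2$ is
$$\left[(\bar x_1 - \bar x_2) - t_{\alpha/2,\nu}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},\ (\bar x_1 - \bar x_2) + t_{\alpha/2,\nu}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right], \quad \nu = n_1 + n_2 - 2$$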
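Example 3.2 Assume the red cell counts of healthy male residents and healthy female residents of a certain city follow two normal distributions, $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$ respectively. Given the samples below, what is the 95% CI for the difference between the male and female means?
$n_1 = 20,\ n_2 = 15;\quad \bar x_1 = 4.66,\ \bar x_2 = 4.18;\quad s_1 = 0.47,\ s_2 = 0.45$
$s_1^2 = 0.47^2 = 0.2209, \qquad s_2^2 = 0.45^2 = 0.2025$
$$s_c^2 = \frac{(20 - 1)(0.47)^2 + (15 - 1)(0.45)^2}{20 + 15 - 2} = 0.2131, \qquad \nu = 20 + 15 - 2 = 33$$
From Table 5, $t_{0.05/2,30} = 2.042$ and $t_{0.05/2,40} = 2.021$; interpolating between them (linearly in $1/\nu$) gives
$$t_{0.05/2,33} \approx 2.034$$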
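$$(\bar x_1 - \bar x_2) \pm t_{0.05/2,33}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (4.66 - 4.18) \pm 2.034\sqrt{0.2131\left(\frac{1}{20} + \frac{1}{15}\right)}$$
$$= 0.48 \pm 2.034 \times 0.1577, \quad \text{i.e. } 0.16 \text{ and } 0.80$$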
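The pooled-variance interval of Section 3.4 can be written as a short function and checked against Example 3.2. This is a sketch assuming equal population variances, as the section does; the critical value $t_{0.05/2,33} \approx 2.034$ is passed in rather than computed, and the helper names are hypothetical.

```python
import math

def pooled_variance(n1, s1, n2, s2):
    """Pooled variance s_c^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def diff_means_ci(n1, x1, s1, n2, x2, s2, crit):
    """CI for mu1 - mu2 assuming equal population variances;
    crit is t_{alpha/2, n1+n2-2} read from a t table."""
    sc2 = pooled_variance(n1, s1, n2, s2)
    se = math.sqrt(sc2 * (1 / n1 + 1 / n2))   # standard error of xbar1 - xbar2
    d = x1 - x2
    return d - crit * se, d + crit * se

# Example 3.2: males (n=20, mean 4.66, SD 0.47) vs females (n=15, mean 4.18, SD 0.45)
lo, hi = diff_means_ci(20, 4.66, 0.47, 15, 4.18, 0.45, crit=2.034)
print(f"s_c^2 = {pooled_variance(20, 0.47, 15, 0.45):.4f}")   # 0.2131
print(f"95% CI: ({lo:.2f}, {hi:.2f})")                       # (0.16, 0.80)
```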
3.5 Confidence Intervals for Probability and the Difference between Two Probabilities
3.5.1 Confidence interval for population probability
When the sample size is small, given $X$ and $n$, the 95% and 99% confidence intervals of $\pi$ can be obtained from Table 3 of Appendix 2. When the sample size is large enough, $\pi$ can be estimated by the normal approximation:
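$$X \sim B(n, \pi), \qquad P = \frac{X}{n}, \qquad P \sim N\left(\pi, \frac{\pi(1-\pi)}{n}\right) \text{ approximately}$$
$$\pi:\ \left(p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right)$$
Compare with the confidence interval of $\mu$:
$$\mu:\ \left(\bar x - Z_{\alpha/2}\, s_{\bar x},\ \bar x + Z_{\alpha/2}\, s_{\bar x}\right)$$
Here $p$ plays the role of $\bar x$, and $\sqrt{\frac{p(1-p)}{n}}$ plays the role of $s_{\bar x}$.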
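The normal-approximation interval for $\pi$ translates directly into code. In the sketch below, the helper `proportion_ci` is hypothetical, and the input is a hypothetical outcome chosen to match the figures of Example 3.6: suppose the planned survey of 400 patients observed 40 relapses, so $p = 0.10$. The resulting half-width of about 0.03 is consistent with that example.

```python
import math

Z_025 = 1.96  # Z_{alpha/2} for a 95% interval

def proportion_ci(x, n, z=Z_025):
    """Normal-approximation CI for a binomial probability: p -/+ z * sqrt(p(1-p)/n), with p = x/n."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)   # plays the role of s_xbar
    return p - z * se, p + z * se

# Hypothetical outcome: 40 relapses among 400 patients (p = 0.10)
lo, hi = proportion_ci(40, 400)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")   # (0.071, 0.129)
```

As the text notes, this approximation is meant for large samples; for small samples the interval is read from Table 3 of Appendix 2 instead.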
3.5.2 Confidence intervals for two population probabilities
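$$X_1 \sim B(n_1, \pi_1), \qquad X_2 \sim B(n_2, \pi_2)$$
$$P_1 = \frac{X_1}{n_1}, \qquad P_2 = \frac{X_2}{n_2}$$
$$P_1 \sim N\left(\pi_1, \frac{\pi_1(1-\pi_1)}{n_1}\right), \qquad P_2 \sim N\left(\pi_2, \frac{\pi_2(1-\pi_2)}{n_2}\right) \text{ approximately}$$
$$\pi_1 - \pi_2:\ (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$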
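Compare with the confidence interval of $\mu_1 - \mu_2$ (large samples):
$$\mu_1 - \mu_2:\ (\bar x_1 - \bar x_2) \pm Z_{\alpha/2}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Here $p_1 - p_2$ plays the role of $\bar x_1 - \bar x_2$, and $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$ plays the role of $\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.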
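The interval for $\pi_1 - \pi_2$ can be sketched the same way. The helper `diff_proportions_ci` is hypothetical, and the counts in the usage line are hypothetical values for illustration only; they are not the data of Example 3.4.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation CI for pi1 - pi2:
    (p1 - p2) -/+ z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # role of sqrt(s_c^2 (1/n1 + 1/n2))
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical counts (not from Example 3.4): 60/100 responders on drug A, 45/100 on drug B
lo, hi = diff_proportions_ci(60, 100, 45, 100)
print(f"95% CI for pi1 - pi2: ({lo:.3f}, {hi:.3f})")   # (0.013, 0.287)
```

If the interval excludes 0, the data are compatible only with a nonzero difference between the two probabilities at the chosen confidence level.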
Example 3.4 Comparison between two drugs.
3.6 The Sample Size for Estimation of Confidence Interval
3.6.1 Sample size for confidence interval of the mean of normal population
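Given (1) the confidence level $(1-\alpha)$,
(2) the half-width of the confidence interval $\delta$,
(3) an estimate of the standard deviation $s$.
Let
$$\delta = t_{\alpha/2,\nu}\frac{s}{\sqrt{n}}$$
Replacing $t_{\alpha/2,\nu}$ with $Z_{\alpha/2}$, approximately
$$\delta = Z_{\alpha/2}\frac{s}{\sqrt{n}}, \qquad n = \left(\frac{Z_{\alpha/2}\, s}{\delta}\right)^2$$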
Example 3.5 A pilot study showed that the standard deviation of a biochemical index is about 10 units. We want a 95% confidence interval of the population mean whose half-width equals 2.5 units. What sample size is needed?
Since $s = 10$, $\delta = 2.5$, $Z_{\alpha/2} \approx 2$,
$$n = \left(\frac{Z_{\alpha/2}\, s}{\delta}\right)^2 = \left(\frac{2 \times 10}{2.5}\right)^2 = 64$$
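The sample-size formula can be wrapped in a small function. This sketch (the name `n_for_mean` is hypothetical) rounds up to a whole subject so the half-width does not exceed $\delta$, and subtracts a tiny tolerance first so that floating-point noise on an exact integer such as 64 does not push the result up by one.

```python
import math

def n_for_mean(s, delta, z=1.96):
    """Sample size so that the CI half-width Z_{alpha/2} * s / sqrt(n) is at most delta:
    n = (Z_{alpha/2} * s / delta)^2, rounded up to a whole subject."""
    raw = (z * s / delta) ** 2
    return math.ceil(raw - 1e-9)   # tolerance guards against floating-point noise

# Example 3.5: s = 10 units, half-width delta = 2.5 units, 95% confidence
print(n_for_mean(10, 2.5, z=2))   # 64, using Z_{0.05/2} rounded to 2 as in the text
print(n_for_mean(10, 2.5))        # 62, using 1.96 directly
```

Because $t_{\alpha/2,\nu} > Z_{\alpha/2}$, using $Z$ slightly underestimates the required $n$ when the resulting sample is small; rounding 1.96 up to 2 partly compensates.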
3.6.2 Sample size for confidence interval of the probability of binomial population
Given (1) the confidence level $(1-\alpha)$,
(2) the half-width of the confidence interval $\delta$,
(3) an estimate of the probability $p$.
Let
$$\delta = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \qquad n = \left(\frac{Z_{\alpha/2}}{\delta}\right)^2 p(1-p)$$
This formula shows that a large sample size is needed when the population probability is close to 0.5, because $p(1-p)$, and hence the variation, is largest at $p = 0.5$.
Example 3.6 A pilot study showed that the one-year relapse probability of a disease is about 10%. A survey is planned to further estimate the 95% confidence interval for the one-year relapse probability, with a required half-width of 3%. What sample size is needed?
Since $p = 10\%$, $\delta = 3\%$, $Z_{\alpha/2} \approx 2$,
$$n = \left(\frac{Z_{\alpha/2}}{\delta}\right)^2 p(1-p) = \left(\frac{2}{0.03}\right)^2 \times 0.1 \times (1 - 0.1) = 400$$
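The same approach gives the sample size for a probability. In this sketch (the name `n_for_proportion` is hypothetical) the result is rounded up after subtracting a tiny tolerance, because $(2/0.03)^2 \times 0.1 \times 0.9$ evaluates in floating point to a value a hair away from 400. The last line illustrates the remark that $p = 0.5$ demands the most subjects.

```python
import math

def n_for_proportion(p, delta, z=1.96):
    """Sample size so that Z_{alpha/2} * sqrt(p(1-p)/n) is at most delta:
    n = (Z_{alpha/2} / delta)^2 * p * (1 - p), rounded up to a whole subject."""
    raw = (z / delta) ** 2 * p * (1 - p)
    return math.ceil(raw - 1e-9)   # tolerance guards against floating-point noise

print(n_for_proportion(0.10, 0.03, z=2))   # 400, as in Example 3.6
print(n_for_proportion(0.10, 0.03))        # 385 with Z = 1.96
print(n_for_proportion(0.50, 0.03, z=2))   # 1112: p = 0.5 needs the most subjects
```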
Summary
1. Sampling error: The sample means are not equal to the population mean, and the sample means differ from one another.
2. Distribution of sample mean: If random samples of n individuals are drawn from a normal distribution, then the sample mean follows a normal distribution.
If random samples of n individuals are drawn from a non-normal distribution, the distribution of the sample means is not exactly normal, but it is approximately normal when the sample size is large.
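3. Confidence interval
When $X \sim N(\mu, \sigma^2)$,
$$\frac{\bar X - \mu}{S_{\bar x}} \sim t \text{ distribution}, \quad \nu = n - 1$$
Given a random sample of $X \sim N(\mu, \sigma^2)$, if the sample size, sample mean and sample standard deviation are denoted as $n$, $\bar x$ and $s$, and $s_{\bar x} = s/\sqrt{n}$, then the $(1-\alpha)$ confidence interval of the population mean is
$$\mu:\ \left(\bar x - t_{\alpha/2,\nu}\, s_{\bar x},\ \bar x + t_{\alpha/2,\nu}\, s_{\bar x}\right)$$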
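Given a random sample of $X \sim B(n, \pi)$ with $P = X/n$, the $(1-\alpha)$ confidence interval of the population probability is
$$\pi:\ \left(p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right)$$
Given two random samples of $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, the confidence interval of the difference (large samples) is
$$\mu_1 - \mu_2:\ (\bar x_1 - \bar x_2) \pm Z_{\alpha/2}\sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
For small samples, $Z_{\alpha/2}$ is replaced by $t_{\alpha/2,\nu}$ with $\nu = n_1 + n_2 - 2$.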
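Given two random samples of $X_1 \sim B(n_1, \pi_1)$ and $X_2 \sim B(n_2, \pi_2)$, the confidence interval of the difference is
$$\pi_1 - \pi_2:\ (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
4. Sample size
Sample size for confidence interval of the mean of normal population:
$$n = \left(\frac{Z_{\alpha/2}\, s}{\delta}\right)^2$$
Sample size for confidence interval of the probability of binomial population:
$$n = \left(\frac{Z_{\alpha/2}}{\delta}\right)^2 p(1-p)$$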
Thank you