
Introduction to Biostatistics (ZJU 2008)

Jan 04, 2016


curran-horn

Introduction to Biostatistics (ZJU 2008). Wenjiang Fu, Ph.D, Associate Professor, Division of Biostatistics, Department of Epidemiology, Michigan State University, East Lansing, Michigan 48824, USA. Email: [email protected]. www: http://www.msu.edu/~fuw. PowerPoint presentation.
Transcript
Page 1: Introduction to Biostatistics (ZJU 2008)

Introduction to Biostatistics (ZJU 2008)

Wenjiang Fu, Ph.D
Associate Professor
Division of Biostatistics, Department of Epidemiology
Michigan State University
East Lansing, Michigan 48824, USA
Email: [email protected]
www: http://www.msu.edu/~fuw

Page 2: Introduction to Biostatistics (ZJU 2008)

Homework 1

Correction: 2. Referring to Table 3.4

The three people are: a 77-year-old man, a 76-year-old woman, and an 82-year-old woman.

Page 3: Introduction to Biostatistics (ZJU 2008)

Parameter estimation

What we have learned so far:
Random variables.
Distributions of random variables (Bin, Pois, Gaussian).
Calculation of probability based on distributions, including approximation methods.
Application of probability theory (small probability events).

All the above are based on a known distribution: known type of distribution and known parameters of the distribution.

Page 4: Introduction to Biostatistics (ZJU 2008)

Parameter estimation

Distributions of random variables (Bin, Pois, Gaussian).
Calculation of probability based on distributions, including approximation methods.

Examples:
    D.B.P. $\sim N(80, 12.5^2)$
    # cases of cancer $\sim \mathrm{Pois}(\mu)$, $\mu = 6$
    # lymphocytes $\sim B(100, .34)$

Calculate probability based on assumptions about the distribution and the parameters of the distribution.
Application of probability theory (small probability events).

All the above are based on a known distribution: known type of distribution and known parameters of the distribution.

Where do we find the information for the parameters? The only answer is from the data (or samples)!
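As a rough illustration of calculating probabilities once the distribution and its parameters are assumed known, the R sketch below uses the three example distributions from this slide. The particular events (DBP above 100, exactly 10 cancer cases, at most 30 lymphocytes) are hypothetical choices made here for illustration and do not come from the slides.

```r
## Probabilities under fully specified distributions.
## The events themselves are illustrative assumptions.
p.dbp   <- 1 - pnorm(100, mean = 80, sd = 12.5)  # Pr(DBP > 100), DBP ~ N(80, 12.5^2)
p.canc  <- dpois(10, lambda = 6)                 # Pr(# cases = 10), Pois(6)
p.lymph <- pbinom(30, size = 100, prob = 0.34)   # Pr(# lymphocytes <= 30), B(100, .34)
round(c(dbp = p.dbp, cancer = p.canc, lymph = p.lymph), 4)
```

None of these calculations is possible until the parameters (80 and 12.5, 6, .34) are supplied, which is why the next step is estimating them from data.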

Page 5: Introduction to Biostatistics (ZJU 2008)

Parameter estimation

The only answer is from the data (or samples).

Data set
Statistical inference: estimation of parameters and hypothesis testing.
Estimation:
    Point estimation.
    Interval estimation: CI.

Page 6: Introduction to Biostatistics (ZJU 2008)

Relation between population and sample

Random sample: a selection of some members of the population such that each member is independently chosen and has a known non-zero probability of being selected.
Example 1: 10 birth weights $x_1, \ldots, x_{10}$ are a sample from the entire population of birth weights.
Example 2: WBC of 30 students independently selected from MSU; $x_1, \ldots, x_{30}$ is a sample from the population of WBC of all MSU students.

Simple random sample: a random sample in which each member has the same probability of being selected. Here, "random sample" refers to a simple random sample.
Some non-simple random samples: cluster sampling.
    Within a state, choose clusters (geographic locations, regions, sparse populations).
    Take random samples within the selected clusters.

The reference, target, or study population is the group we wish to study (to make inferences about). The random sample is selected from the study population and is hoped to be a good representation of it from which to draw conclusions.

Page 7: Introduction to Biostatistics (ZJU 2008)

Estimation of the Mean

Estimation of the mean of a distribution: $\mu = E(X)$.
A random sample $x_1, \ldots, x_n$ is drawn from the distribution of $X$.
The natural estimate of $\mu$ is the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Let $x_1, \ldots, x_n$ be a random sample drawn from the same population with mean $\mu$. Then the sample mean satisfies $E(\bar{x}) = \mu$, i.e. it is an unbiased estimator.
An estimator $e$ of a parameter $\theta$ is unbiased if $E(e) = \theta$.

$$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$$

Then we know $\bar{x}$ is an unbiased estimator of $\mu$.
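Unbiasedness of the sample mean can also be checked numerically. The following is a minimal simulation sketch, assuming a normal population with illustrative values of the mean, standard deviation, and sample size; the seed and the number of replicates are arbitrary choices, not part of the slides.

```r
## Simulation sketch: the average of many sample means should be close to mu.
set.seed(1)                          # arbitrary seed, for reproducibility only
mu <- 80; sigma <- 12.5; n <- 10     # illustrative values (assumptions)
xbar <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(xbar)                           # close to mu = 80
```

Individual sample means scatter around 80, but their long-run average does not drift away from it, which is what $E(\bar{x}) = \mu$ says.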

Page 8: Introduction to Biostatistics (ZJU 2008)

Estimation of the Mean

For a normal distribution $N(\mu, \sigma^2)$, $\bar{x}$ is the "best" unbiased estimator: it has the smallest variance and no bias.

Standard Error of the Mean
$x_1, \ldots, x_n$ is a random sample from an underlying distribution with mean $\mu$ and variance $\sigma^2$. Then

$$\mathrm{var}(\bar{x}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(x_i) = \frac{\sigma^2}{n}$$

The standard error of the mean (SEM) is the standard deviation of the sample mean $\bar{x}$, which is equal to $\sigma/\sqrt{n}$. The standard error of the mean is estimated by $S/\sqrt{n}$, where $S^2$ is the sample variance.

I.I.D. (or i.i.d.): independent and identically distributed.
$x_1, \ldots, x_n$ iid r.v.'s: $x_1, \ldots, x_n$ are independent r.v.'s with the same distribution (same mean, variance, quantiles, etc.).

A random sample from a population is iid.
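The relation between the standard deviation of the sample mean and $\sigma/\sqrt{n}$ can be illustrated by simulation. This is only a sketch under assumed values (normal population, $\sigma = 12.5$, $n = 25$, arbitrary seed).

```r
## Simulation sketch: s.d. of the sample mean versus sigma / sqrt(n).
set.seed(2)                      # arbitrary seed
sigma <- 12.5; n <- 25           # illustrative values (assumptions)
xbar   <- replicate(10000, mean(rnorm(n, mean = 80, sd = sigma)))
sd.sim <- sd(xbar)               # empirical s.d. of the 10000 sample means
sem    <- sigma / sqrt(n)        # theoretical SEM = 12.5 / 5 = 2.5
c(simulated = sd.sim, theory = sem)
```

The individual observations have standard deviation 12.5, while their means of 25 vary only about one fifth as much.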

Page 9: Introduction to Biostatistics (ZJU 2008)

Estimation of the standard error

Estimation of the standard error (s.e.): $se(\bar{x}) = \sigma/\sqrt{n}$.
$n$: sample size, usually known.
$\sigma$: may be known or unknown.

If $\sigma^2$ is unknown, use the sample variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Use $S$ to estimate $\sigma$ and use $S/\sqrt{n}$ to estimate $se(\bar{x})$.
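A small sketch of the plug-in estimate of the standard error in R. The data vector is hypothetical, made up only to show the arithmetic; `var()` and `sd()` are the built-in functions that use the $(n-1)$ denominator.

```r
## Estimated standard error of the mean from a sample (hypothetical data).
x  <- c(97, 117, 140, 78, 99, 148, 108, 135, 126, 121)
n  <- length(x)
S2 <- sum((x - mean(x))^2) / (n - 1)   # sample variance, same as var(x)
S  <- sqrt(S2)                         # estimate of sigma
se <- S / sqrt(n)                      # estimate of se(xbar)
c(mean = mean(x), S = S, se = se)
```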

Page 10: Introduction to Biostatistics (ZJU 2008)

Central Limit Theorem

Notation: ^ denotes an estimate of a certain parameter.
$\hat{\mu}$: the estimate of the mean $\mu$; $\hat{\sigma}^2$: the estimate of the variance $\sigma^2$.

Central Limit Theorem for normal r.v.'s
If $x_1, \ldots, x_n$ are iid $N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$.

In fact, for large $n$, even for iid r.v.'s $x_1, \ldots, x_n$ that are not normally distributed, the central limit theorem still holds.

Central limit theorem
If $x_1, \ldots, x_n$ are independent r.v.'s with the same mean $\mu$ and variance $\sigma^2$, then for large $n$ the mean $\bar{x}$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$:

$$\bar{x} \mathrel{\dot\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$$

Point estimation: $\bar{x}$, $se(\bar{x})$.
Interval estimation: confidence interval (C.I.), an estimate of precision.
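The central limit theorem for non-normal data can be seen in a quick simulation. The sketch below assumes an Exponential(1) population (mean 1, variance 1), a sample size of 50, and an arbitrary seed; these choices are illustrative and not from the slides.

```r
## CLT sketch: means of skewed (exponential) samples behave roughly like
## N(mu, sigma^2 / n) when n is moderately large.
set.seed(3)                                   # arbitrary seed
n    <- 50
xbar <- replicate(10000, mean(rexp(n, rate = 1)))
c(mean = mean(xbar), var = var(xbar), theory.var = 1 / n)
## proportion of sample means within 1.96 standard errors of the true mean 1
cover <- mean(abs(xbar - 1) < 1.96 / sqrt(n))
cover                                         # roughly .95 if the normal approximation is good
```

A histogram of `xbar` (for example `hist(xbar)`) looks close to bell-shaped even though the underlying exponential distribution is strongly right-skewed.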

Page 11: Introduction to Biostatistics (ZJU 2008)

Interval Estimation

Interval estimation, known variance
Assume the population follows a normal distribution $N(\mu, \sigma^2)$. A random sample $x_1, \ldots, x_n$ has mean $\bar{x} \sim N(\mu, \sigma^2/n)$.

If $\mu$ and $\sigma^2$ are known, then we have

$$\Pr\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95$$

or equivalently $\Pr(\mu - 1.96\,\sigma/\sqrt{n} < \bar{x} < \mu + 1.96\,\sigma/\sqrt{n}) = .95$,
or equivalently $\Pr(\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}) = .95$.

Definition (Confidence Interval)
A 95% confidence interval (C.I.) for $\mu$ when $\sigma^2$ is known is defined by the interval
$(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n})$.

Interpretation: we are 95% confident that the population mean $\mu$ is in the CI $(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n})$.

Page 12: Introduction to Biostatistics (ZJU 2008)

Confidence Interval

Note: the CI $(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n})$ is random, since it depends on $\bar{x}$, which is random and depends on the random sample. If many samples are drawn from the population, and the CI is calculated for each sample, then over the collection of all 95% CIs that could be constructed from the repeated random samples of size $n$, 95% will contain the population parameter $\mu$.

Page 13: Introduction to Biostatistics (ZJU 2008)

95% Confidence Intervals

Page 14: Introduction to Biostatistics (ZJU 2008)

95% Confidence Intervals

## Plot the standard normal N(0,1) density and construct 95% CIs
## for random samples of size 20 from N(0,1)

#### Plot of density N(0,1)
a <- c(-100:100)/25
plot(a, dnorm(a), type = "l", ylim = c(-1, .5))
abline(v = 0, col = 2)

## Construct a 95% CI for a random sample of size 20
## and repeat 1000 times
B <- 1000
size <- 20
sigma <- 1    # known population s.d. of N(0,1)
CImat <- matrix(NA, B, 3)

for (i in 1:B) {
  samp <- rnorm(size, mean = 0, sd = 1)
  samp.mean <- mean(samp)

  ## normal N(0,1) distribution with known variance 1
  CImat[i, 1:2] <- c(samp.mean - 1.96 * sigma / sqrt(size),
                     samp.mean + 1.96 * sigma / sqrt(size))

  ## normal distribution with unknown variance
  ## CImat[i, 1:2] <- c(samp.mean - qt(p = .975, df = size - 1) * sd(samp) / sqrt(size),
  ##                    samp.mean + qt(p = .975, df = size - 1) * sd(samp) / sqrt(size))

  ## indicator: 1 if the CI contains the true mean 0
  CImat[i, 3] <- 1 * (CImat[i, 1] * CImat[i, 2] <= 0)

  ## plot a segment for the CI at a random height, cycling through 5 colors
  lines(CImat[i, 1:2], rep(runif(1, min = -1, max = 0), 2), col = i %% 5 + 1)
}

## proportion of the B intervals that contain the true mean
sum(CImat[, 3]) / B

Page 15: Introduction to Biostatistics (ZJU 2008)

Confidence Intervals of Mean

Length of CI: the larger the CI, the less precise the estimate.
CI: a safeguard against making mistakes in estimation.
Large CI: mistakes are infrequent, but the interval is useless.

Example. SBP. If $\sigma_1^2 = 100$, $\sigma_2^2 = 400$, $n = 9$, and $\bar{x}_1 = \bar{x}_2 = 150$, then 95% CI = ?
Sample 1. $\bar{x}_1 \pm 1.96\,\sigma_1/\sqrt{9} = 150 \pm 1.96 \times 10/3 = 150 \pm 6.53 = (143.47, 156.53)$
Sample 2. $\bar{x}_2 \pm 1.96\,\sigma_2/\sqrt{9} = 150 \pm 1.96 \times 20/3 = 150 \pm 13.07 = (136.93, 163.07)$

CI at any $\alpha$-level: factors affecting the length (width) of the CI
1). $n$, sample size: as $n$ increases, the length of the CI decreases (narrower);
2). $\sigma$, standard deviation of the population: as $\sigma$ increases, the length of the CI increases (wider);
3). $\alpha$, with $(1-\alpha)$ the level of confidence: as $(1-\alpha)$ increases, the length of the CI increases (wider).
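The SBP example and the three factors affecting CI width can be reproduced with a few lines of R. The helper name `ci.z` is hypothetical, introduced here only for illustration; the third and fourth calls use values ($n = 36$, 99% level) that are assumptions chosen to show the direction of each effect.

```r
## z-based CI for the mean from summary statistics (sigma known).
## ci.z is a hypothetical helper, not a built-in R function.
ci.z <- function(xbar, sigma, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)        # about 1.96 for level = .95
  xbar + c(-1, 1) * z * sigma / sqrt(n)
}
ci1 <- ci.z(150, sigma = 10, n = 9)      # sample 1: sigma^2 = 100
ci2 <- ci.z(150, sigma = 20, n = 9)      # sample 2: sigma^2 = 400 (wider)
ci3 <- ci.z(150, sigma = 10, n = 36)     # larger n (narrower)
ci4 <- ci.z(150, sigma = 10, n = 9, level = 0.99)  # higher confidence (wider)
round(rbind(ci1, ci2, ci3, ci4), 2)
```

The first two rows agree with the slide's (143.47, 156.53) and (136.93, 163.07) up to rounding of 1.96.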

Page 16: Introduction to Biostatistics (ZJU 2008)

Confidence Intervals of Mean

CI at any $\alpha$-level: using the percentile $Z_u$, defined by $\Pr(X < Z_u) = u$ for $X \sim N(0, 1)$.
$\Pr(X < -Z_{1-\alpha/2}) = \Pr(X > Z_{1-\alpha/2}) = \alpha/2$: left-tail and right-tail probabilities.
Tail probability $= \Pr(|X| > Z_{1-\alpha/2}) = \alpha$.
The $(1-\alpha) \times 100\%$ CI for $\mu$ is $(\bar{x} - Z_{1-\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + Z_{1-\alpha/2}\,\sigma/\sqrt{n})$.
$\alpha$ can be any level; the most frequently used are .01, .05, .1.

Interval estimation, $\sigma^2$ unknown
Use the estimate $S^2$ to estimate $\sigma^2$ for the CI.
For $\sigma^2$ known: $\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. For $\sigma^2$ unknown: $\dfrac{\bar{x} - \mu}{S/\sqrt{n}} \sim t_{n-1}$.

t-distribution (Student's t-distribution) (W. Gosset)
$t_{n-1}$: a Student's t-distribution with $(n-1)$ degrees of freedom (df).
Percentile of the t-distribution: $t_{d,u}$ is the $(100 \times u)$th percentile, i.e. $\Pr(t_d < t_{d,u}) = u$; see the t-distribution table.

Page 17: Introduction to Biostatistics (ZJU 2008)

Confidence Intervals of Mean

Note that $t_d \to N(0, 1)$ for very large $d$:
When $d < 30$, we see the difference between $t_{d,u}$ and $Z_u$.
When $d > 30$, the difference is small.

CI of $\mu$ with $\sigma^2$ unknown:
Estimate $\sigma^2$ by $S^2$ and change $Z_u$ to $t_{n-1,u}$, then follow the same procedure as for $\sigma^2$ known.

The $(1-\alpha) \times 100\%$ C.I. for $\mu$ when $\sigma^2$ is unknown is

$(\bar{x} - t_{n-1,1-\alpha/2}\,S/\sqrt{n},\ \bar{x} + t_{n-1,1-\alpha/2}\,S/\sqrt{n})$
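In place of a printed t-table, the percentiles can be looked up with R's `qt()` and compared with the normal percentile from `qnorm()`. The particular degrees of freedom listed below are an illustrative selection.

```r
## 97.5th percentiles of t_d for several d, compared with Z_.975.
df  <- c(5, 10, 26, 30, 100, 1000)    # illustrative degrees of freedom
t.q <- qt(.975, df)                   # t_{d, .975}
z.q <- qnorm(.975)                    # Z_.975, about 1.96
round(cbind(df = df, t = t.q, diff = t.q - z.q), 3)
```

The `diff` column shrinks steadily as `df` grows, so the t-based interval is always somewhat wider than the z-based one, and noticeably so for small samples.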

Page 18: Introduction to Biostatistics (ZJU 2008)

Confidence Interval of Mean

Example. Table 6.9, 27 rats with LVEF.
It is known that $\sum x_i = 6.05$ and $\sum x_i^2 = 1.522$.
Assume a normal distribution. Calculate the mean, $S^2$, s.e., and 95% CI.

$\bar{x} = \sum x_i / n = 6.05/27 = .224$
$S^2 = .0064$, $S = .08$, s.e.$(\bar{x}) = S/\sqrt{27} = .0154$

95% CI: $\bar{x} \pm t_{26,.975}\,S/\sqrt{27} = .224 \pm 2.056 \times .0154 = .224 \pm .0317 = (.1923, .2557)$
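The LVEF calculation can be retraced from the two sums alone. The sketch below assumes the usual shortcut $S^2 = (\sum x_i^2 - n\bar{x}^2)/(n-1)$, which is algebraically the same as the definition of the sample variance; variable names are chosen here for illustration.

```r
## 95% t-based CI for the mean from summary sums (27 rats, LVEF).
n <- 27; sum.x <- 6.05; sum.x2 <- 1.522
xbar <- sum.x / n
S2   <- (sum.x2 - n * xbar^2) / (n - 1)   # shortcut form of the sample variance
S    <- sqrt(S2)
se   <- S / sqrt(n)
ci   <- xbar + c(-1, 1) * qt(.975, df = n - 1) * se
round(c(xbar = xbar, S2 = S2, S = S, se = se, lower = ci[1], upper = ci[2]), 4)
```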

Page 19: Introduction to Biostatistics (ZJU 2008)

Estimation of Variance

Point estimation
Natural estimate: the sample variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Is $E(S^2) = \sigma^2$?

Theorem. If $x_1, \ldots, x_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$, then $E(S^2) = \sigma^2$, i.e. $S^2$ is an unbiased estimator of $\sigma^2$.

If we use denominator $n$ rather than $(n-1)$ in $S^2$ to estimate $\sigma^2$, call the result $\tilde{\sigma}^2$. Then

$E\{\tilde{\sigma}^2\} = E\{[(n-1)/n]\,S^2\} = [(n-1)/n]\,E\{S^2\} = [(n-1)/n]\,\sigma^2 < \sigma^2$

i.e. the average of the squared distances from the sample mean is a biased estimator of $\sigma^2$.
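The bias factor $(n-1)/n$ shows up clearly in a simulation with a small sample size. This is a sketch under assumed values ($n = 5$, $\sigma^2 = 4$, normal data, arbitrary seed), so the expected averages are 4 for $S^2$ and $(4/5) \times 4 = 3.2$ for the $n$-denominator version.

```r
## Simulation sketch: denominator (n - 1) versus denominator n.
set.seed(4)                          # arbitrary seed
n <- 5; sigma2 <- 4                  # illustrative values (assumptions)
sims <- replicate(20000, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  c(S2 = var(x),                           # denominator n - 1
    tilde = sum((x - mean(x))^2) / n)      # denominator n
})
m <- rowMeans(sims)
m                                    # S2 near 4, tilde near 3.2
```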

Page 20: Introduction to Biostatistics (ZJU 2008)

Interval estimation of variance

Chi-square distribution
If $G = X_1^2 + \cdots + X_n^2$, where $X_1, \ldots, X_n$ are iid $N(0, 1)$, then $G$ is said to follow a chi-square distribution with $n$ degrees of freedom.
Denote $G \sim \chi^2_n$; it only takes positive values, with mean $E(\chi^2_n) = n$.
The $u$-th percentile of $\chi^2_n$, denoted by $\chi^2_{n,u}$, satisfies $\Pr(\chi^2_n < \chi^2_{n,u}) = u$ and can be obtained from the $\chi^2_n$ table.

Distribution of $S^2$
$x_1, \ldots, x_n$ is an iid random sample from $N(\mu, \sigma^2)$. Then

$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \bar{x})^2 \sim \chi^2_{n-1}$$

or equivalently

$$S^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}$$

Page 21: Introduction to Biostatistics (ZJU 2008)

Confidence Interval of Variance

Similar to the derivation of the CI for $\mu$, we have

$$\Pr\left(\frac{\sigma^2}{n-1}\,\chi^2_{n-1,\alpha/2} < S^2 < \frac{\sigma^2}{n-1}\,\chi^2_{n-1,1-\alpha/2}\right) = 1-\alpha$$

$$\Pr\left(\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}\right) = 1-\alpha$$

The $(1-\alpha) \times 100\%$ CI for $\sigma^2$ is

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}\right)$$

Example. S.B.P. $\sim N(\mu, \sigma^2)$
sample 1: $\bar{x}_1 = 150$, $S_1^2 = 250$, $n = 5$
sample 2: $\bar{x}_2 = 150$, $S_2^2 = 1700$, $n = 5$

Page 22: Introduction to Biostatistics (ZJU 2008)

95% CI for $\sigma^2$

95% CI: $(1-\alpha) = .95$, $\alpha = .05$

95% CI $= [\,(n-1)S^2/\chi^2_{4,.975},\ (n-1)S^2/\chi^2_{4,.025}\,]$

with $\chi^2_{5-1,1-\alpha/2} = \chi^2_{4,.975} = 11.14$ and $\chi^2_{5-1,\alpha/2} = \chi^2_{4,.025} = .484$.

sample 1: $[4 \times 250 / 11.14,\ 4 \times 250 / .484] = [89.77, 2066.12]$

sample 2: $[4 \times 1700 / 11.14,\ 4 \times 1700 / .484] = [610.4, 14049.6]$
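The same intervals can be computed with `qchisq()` instead of table values. The helper name `ci.var` is hypothetical, introduced only for this sketch; because `qchisq()` returns unrounded percentiles (about 11.143 and 0.4844), the results differ slightly from the table-based numbers on the slide.

```r
## Chi-square CI for a normal variance from S^2 and n.
## ci.var is a hypothetical helper, not a built-in R function.
ci.var <- function(S2, n, level = 0.95) {
  alpha <- 1 - level
  ## the upper percentile gives the lower limit and vice versa
  (n - 1) * S2 / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)
}
ci.s1 <- ci.var(250, n = 5)     # sample 1
ci.s2 <- ci.var(1700, n = 5)    # sample 2
round(rbind(ci.s1, ci.s2), 2)
```

With only 5 observations both intervals are very wide and strongly asymmetric around $S^2$, reflecting how little a small sample says about a variance.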

Page 23: Introduction to Biostatistics (ZJU 2008)

Estimation for Bin(n, p)

Example. A random sample of 1000 adults. Among them 30 had heart attack(s). How to estimate $p$?
$\hat{p}$ = Pr(having heart attack(s) before) = relative freq. = 30/1000 = .03

Q: Is this a good estimator?
A: $X \sim B(n, p)$. Let $X_1, \ldots, X_n$ be independent Bernoulli trials with
$\Pr(X_i = 1) = p$ and $\Pr(X_i = 0) = 1 - p$, $1 \le i \le n$.
Then $X = \sum_{i=1}^{n} X_i$ and $X/n = \sum_{i=1}^{n} X_i / n = \bar{X}$,
the sample mean, with expected value $E(X_i) = p$:
$E(\bar{X}) = E(\sum X_i / n) = p$
So $\bar{X} = X/n$ is an unbiased estimator for $p$, or $\hat{p} = X/n$. s.e.$(\hat{p})$ = ?

Page 24: Introduction to Biostatistics (ZJU 2008)

Estimation for Bin(n, p)

$\mathrm{var}(\hat{p}) = \mathrm{var}(X/n) = \mathrm{var}(X)/n^2 = npq/n^2 = pq/n$

s.e.$(\hat{p}) = \sqrt{pq/n}$ with $q = (1 - p)$
Estimate s.e.$(\hat{p})$: replace $p$ with $\hat{p}$, giving s.e.$(\hat{p}) = (\hat{p}\hat{q}/n)^{1/2}$

Example: $n = 1000$, $X = 30$.
$\hat{p} = X/n = .03$, s.e.$(\hat{p}) = .00539$
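The heart-attack example amounts to two lines of arithmetic, sketched here in R with variable names chosen for illustration.

```r
## Point estimate and estimated standard error for a binomial proportion.
n <- 1000; X <- 30
p.hat  <- X / n                              # .03
se.hat <- sqrt(p.hat * (1 - p.hat) / n)      # (p.hat * q.hat / n)^(1/2)
round(c(p.hat = p.hat, se = se.hat), 5)
```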

Page 25: Introduction to Biostatistics (ZJU 2008)

Interval estimation of Binomial p

Normal theory method
$X \sim B(n, p)$, then $X = \sum_{i=1}^{n} X_i$ with independent Bernoulli trials $X_1, \ldots, X_n$.
$\hat{p} = X/n$, the sample mean of the Bernoulli trials.
By the central limit theorem (CLT), $\hat{p} \mathrel{\dot\sim} N(p, pq/n)$; then use the normal distribution:

$$\Pr\left(-1.96 < \frac{\hat{p} - p}{\sqrt{pq/n}} < 1.96\right) \approx .95$$

Condition: $npq \ge 5$.

Page 26: Introduction to Biostatistics (ZJU 2008)

Confidence Interval for p

The 95% CI for $p$ with normal theory ($n\hat{p}\hat{q} \ge 5$) is

$(\hat{p} - 1.96\,(\hat{p}\hat{q}/n)^{1/2},\ \hat{p} + 1.96\,(\hat{p}\hat{q}/n)^{1/2})$

The $(1-\alpha) \times 100\%$ CI for $p$ with normal theory ($n\hat{p}\hat{q} \ge 5$) is

$(\hat{p} - z_{1-\alpha/2}\,(\hat{p}\hat{q}/n)^{1/2},\ \hat{p} + z_{1-\alpha/2}\,(\hat{p}\hat{q}/n)^{1/2})$

Example. Eosinophils: $\hat{p} = 2/100 = .02$, $n\hat{p}(1-\hat{p}) = 100 \times .02 \times .98 = 1.96 < 5$.
Normal approximation does not work!
Exact method: use Table 7 for the 95% CI.
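Without the printed table, one way to obtain exact limits in R is `binom.test()`, which returns Clopper-Pearson limits. Whether the table referred to on the slide uses exactly this construction is an assumption made here. The sketch also computes the normal-theory interval to show why it fails for the eosinophil data.

```r
## Eosinophil example: exact (Clopper-Pearson) versus normal-theory 95% CI.
x <- 2; n <- 100
p.hat <- x / n
n * p.hat * (1 - p.hat)                       # 1.96 < 5: approximation not justified
exact <- binom.test(x, n, conf.level = 0.95)$conf.int
wald  <- p.hat + c(-1, 1) * qnorm(.975) * sqrt(p.hat * (1 - p.hat) / n)
round(rbind(exact = as.numeric(exact), wald = wald), 4)
```

The normal-theory lower limit comes out negative, which is impossible for a probability, while the exact interval stays inside (0, 1) and is asymmetric around .02.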

Page 27: Introduction to Biostatistics (ZJU 2008)

One-sided CI

Example: hypertensive treatment to lower BP. Comparing standard vs. new drug.
Suppose out of 100 hypertensives, the new drug brings 40 subjects' BP down to normal while the standard has 30% efficacy.
Q: 1). Is the new drug different from the standard?
   2). Is the new drug better than the standard?
A: 1). Two-sided. Can be better or worse if different.
   2). One-sided. Can be better or no better.

Upper one-sided $(1-\alpha) \times 100\%$ CI for $p$ of $B(n; p)$:
$p > \hat{p} - Z_{1-\alpha}\,(\hat{p}\hat{q}/n)^{1/2}$ for $n\hat{p}\hat{q} \ge 5$
Lower one-sided $(1-\alpha) \times 100\%$ CI for $p$ of $B(n; p)$:
$p < \hat{p} + Z_{1-\alpha}\,(\hat{p}\hat{q}/n)^{1/2}$ for $n\hat{p}\hat{q} \ge 5$

Page 28: Introduction to Biostatistics (ZJU 2008)

CI: one-sided vs two-sided

Example. Hypertension study
100 people receive a drug for treatment of high BP. 20 of them got BP lowered by the drug. By reference, BP is also lowered by placebo in 10% of people. Q1: any drug effect? Q2: drug better than placebo?

A: $\hat{p}$ = Pr(lowering BP in drug group) = 20/100 = .2 > .1 of placebo.
Use $\hat{p} = .2$ to calculate $n\hat{p}(1-\hat{p}) = 100 \times .2 \times .8 = 16 > 5$.
Normal approximation valid: $\hat{p} \mathrel{\dot\sim} N(p, pq/n)$. The 95% CI of $p$ is
$\hat{p} \pm 1.96\,(\hat{p}\hat{q}/n)^{1/2} = .2 \pm 1.96 \times .04 = .2 \pm .0784 = (.1216, .2784)$

Since $p = .1$ for placebo and .1 is not in the 95% CI for $p$, we are 95% confident that the drug effect is different from the placebo effect ($p = .1$).

Page 29: Introduction to Biostatistics (ZJU 2008)

CI: one-sided vs two-sided

Example. Hypertension study
100 people receive a drug for treatment of high BP. 20 of them got BP lowered by the drug. By reference, BP is also lowered by placebo in 10% of people. Q1: any drug effect? Q2: drug better than placebo?

Q2. Is the drug better than placebo: $p > .1$?
A: The one-sided 95% CI for $p$ is
$p > \hat{p} - Z_{1-.05}\,(\hat{p}\hat{q}/n)^{1/2} = .2 - 1.645 \times .04 = .2 - .0658 = .1342$

One-sided 95% CI: $(0.1342, +\infty)$. Since .1 lies below the lower limit 0.1342, the placebo rate is outside the interval.
Compare with the two-sided CI $(.1216, .2784)$: the one-sided lower limit 0.1342 is higher than the two-sided lower limit 0.1216.
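Both intervals for the hypertension example can be computed side by side. The sketch below follows the normal-theory formulas of the preceding slides; variable names are chosen here for illustration.

```r
## Hypertension example: two-sided versus one-sided 95% CI for p.
n <- 100; X <- 20
p.hat <- X / n
se    <- sqrt(p.hat * (1 - p.hat) / n)                # 0.04
two.sided       <- p.hat + c(-1, 1) * qnorm(.975) * se   # uses Z_.975, about 1.96
one.sided.lower <- p.hat - qnorm(.95) * se               # uses Z_.95, about 1.645
round(c(two.sided, one.sided.lower), 4)
0.1 < one.sided.lower        # TRUE: the placebo rate .1 lies below the one-sided limit
```

The one-sided limit uses $Z_{.95}$ rather than $Z_{.975}$, so all of the 5% error is placed on one side and the lower bound moves closer to $\hat{p}$.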

Page 30: Introduction to Biostatistics (ZJU 2008)

Estimation for Poisson Distribution

$\mathrm{Pois}(\mu)$, $\mu = \lambda t$, with $\lambda$ the intensity.
Estimator: $\hat{\lambda} = \hat{\mu}/t$, estimated by $X/t$.
Instead of estimating the mean, estimate the intensity,
where $t$ can be area, time duration, etc.
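As a small illustration of estimating the intensity rather than the mean, the sketch below uses hypothetical numbers (12 events observed over 4 time units) that are not from the slides. R's `poisson.test()` reports the same rate estimate $X/t$.

```r
## Poisson intensity estimate: events per unit of exposure (hypothetical data).
X     <- 12      # observed number of events (assumption)
t.obs <- 4       # length of observation, e.g. years (assumption)
lambda.hat <- X / t.obs                       # estimated intensity: 3 per unit time
rate.r <- unname(poisson.test(X, T = t.obs)$estimate)
c(lambda.hat = lambda.hat, poisson.test = rate.r)
```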