Biostatistics Lecture 5 (3/23 & 3/24/2015) Last modified 3/24/2015 1:14:21 PM

Jan 19, 2016


Fay Lane

Biostatistics

Lecture 5 (3/23 & 3/24/2015)

Last modified 3/24/2015 1:14:21 PM


Chapter 6 Probability and Diagnostic Tests - II


Outline

• 6.1 Operations on Events and Probability

• 6.2 Conditional Probability

• 6.3 Bayes’ Theorem

• 6.4 Diagnostic Tests

– 6.4.1 Sensitivity and Specificity

– 6.4.2 Application of Bayes’ Theorem

– 6.4.3 ROC Curves

– 6.4.4 Prevalence evaluation

• 6.5 The Relative Risk (RR) and the Odds Ratio (OR)


6.4.4 Prevalence evaluation

• Prevalence or prevalence proportion, in epidemiology, is the proportion of a population found to have a condition (typically a disease or a risk factor such as smoking or seat-belt use).

• It is usually expressed as a fraction, as a percentage or as the number of cases per 10,000 or 100,000 people. 


Example #1

• A program was conducted to screen for HIV infection in mothers. (The purpose is to know whether a mother is infected or not.)

• Since maternal antibodies cross the placenta, the presence of antibodies in an infant signals infection in the mother.

• Because the tests were performed anonymously, no verification of the results is possible. (No mother is tested!!!)


Defining various events

• H : the event that a mother is infected with HIV.

• H^C : the event that a mother is NOT infected with HIV.

• n : total number of infants tested

• n+ : number of infants with positive results

• T+ : the event of a positive test result for an infant

• T- : the event of a negative test result for an infant


Taking Manhattan for example

• n = 50,364 infants were tested and n+ = 799 were positive, that is:

$$P(T^+) = \frac{n^+}{n} = \frac{799}{50{,}364} \approx 0.0159$$

• In other words, P(T+) = 0.0159 (from infants)

• We want to know P(H) : the prevalence of infection among mothers.

• Is it true that P(H) = P(T+) = 0.0159?


A test is not perfect…

• If the screen tests were perfect, then P(H) = P(T+) = 0.0159.

• However, T+ can contain both true positive and false positive cases.

• Similarly, T- can contain both true negative and false negative cases.

• Note that the “screening tests” and “cases” here refer to infection in the mother.


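Infants tested positive came from two sources:

$$
\begin{aligned}
P(T^+) &= P(T^+ \cap H) + P(T^+ \cap H^C) \\
       &= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,P(H^C) \\
       &= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,[1 - P(H)]
\end{aligned}
$$

• P(T+ ∩ H) : mother infected, and infant tested positive (true positive)

• P(T+ ∩ H^C) : mother not infected, and infant tested positive (false positive)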
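The decomposition into true and false positives can be sketched in a few lines of Python. This is only an illustration: the function name `prob_positive` and the numeric inputs are hypothetical and do not come from the lecture.

```python
def prob_positive(p_h, p_pos_given_h, p_pos_given_not_h):
    """Total probability: P(T+) = P(T+|H) P(H) + P(T+|H^C) [1 - P(H)]."""
    true_positive_part = p_pos_given_h * p_h               # mother infected, infant positive
    false_positive_part = p_pos_given_not_h * (1.0 - p_h)  # mother not infected, infant positive
    return true_positive_part + false_positive_part


# Hypothetical inputs, chosen only to illustrate the formula.
p_pos = prob_positive(p_h=0.01, p_pos_given_h=0.9, p_pos_given_not_h=0.05)
print(f"P(T+) = {p_pos:.4f}")

# With a perfect test (no false negatives, no false positives), P(T+) equals P(H).
p_perfect = prob_positive(p_h=0.3, p_pos_given_h=1.0, p_pos_given_not_h=0.0)
```

The second call shows why P(T+) would equal P(H) only if the test never erred.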


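From the previous slide:

$$P(T^+) = P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,[1 - P(H)]$$

Expanding and collecting the terms in P(H):

$$
\begin{aligned}
P(T^+) &= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C) - P(T^+ \mid H^C)\,P(H) \\
P(T^+) &= P(H)\,[P(T^+ \mid H) - P(T^+ \mid H^C)] + P(T^+ \mid H^C) \\
P(T^+) - P(T^+ \mid H^C) &= P(H)\,[P(T^+ \mid H) - P(T^+ \mid H^C)]
\end{aligned}
$$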


• Solving the previous equation for P(H) leads to:

$$P(H) = \frac{P(T^+) - P(T^+ \mid H^C)}{P(T^+ \mid H) - P(T^+ \mid H^C)}$$

• P(T+ | H) : the probability that an infected mother tests positive (in her infant). This is the sensitivity of the test.

• P(T+ | H^C) = 1 – P(T- | H^C). The last term, P(T- | H^C), is the probability that a healthy mother tests negative (in her infant). This is the specificity of the test.


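• Assuming that this test has 0.99 sensitivity and 0.998 specificity, so that P(T+ | H^C) = 1 – 0.998 = 0.002:

$$P(H) = \frac{P(T^+) - P(T^+ \mid H^C)}{P(T^+ \mid H) - P(T^+ \mid H^C)} = \frac{0.0159 - 0.002}{0.99 - 0.002} = \frac{0.0139}{0.988} \approx 0.0141$$

• The estimate scales down from 0.0159 to 0.0141.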


• Considering the upstate urban region of New York, in which we have:

$$P(T^+) = 0.0014$$

• By using the same formula, with 0.99 sensitivity and 0.998 specificity, we have:

$$P(H) = \frac{0.0014 - 0.002}{0.99 - 0.002} = \frac{-0.0006}{0.988} \approx -0.0006$$

• This, however, turns out to be a negative prevalence? [Making no sense!!!]


A brief summary

• Note that P(T+) = 0.0014 in the second case, which is very small compared with 0.0159 in the first case. It is even smaller than the false positive rate of the test, 1 – 0.998 = 0.002, so the corrected estimate becomes negative.

• The testing procedure is not accurate enough to measure the very low prevalence of HIV in the second case.
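Both cases can be reproduced with a short Python sketch of the correction P(H) = [P(T+) – P(T+ | H^C)] / [P(T+ | H) – P(T+ | H^C)]. The function name `corrected_prevalence` is hypothetical; the inputs are the rounded values quoted in the slides (0.0159, 0.0014, sensitivity 0.99, specificity 0.998).

```python
def corrected_prevalence(p_positive, sensitivity, specificity):
    """Solve P(T+) = sens * P(H) + (1 - spec) * (1 - P(H)) for P(H)."""
    false_positive_rate = 1.0 - specificity  # P(T+ | H^C)
    return (p_positive - false_positive_rate) / (sensitivity - false_positive_rate)


# Rounded positive rates as quoted in the slides.
manhattan = corrected_prevalence(0.0159, sensitivity=0.99, specificity=0.998)
upstate = corrected_prevalence(0.0014, sensitivity=0.99, specificity=0.998)

print(f"Manhattan:     {manhattan:.4f}")
print(f"Upstate urban: {upstate:.4f}")

# A negative estimate signals that P(T+) is below the false positive rate.
if upstate < 0:
    print("Estimate is negative: the test is too inaccurate for this prevalence.")
```

The estimate turns negative whenever the observed positive rate falls below the false positive rate, which is exactly what happens in the second case.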


6.5.1 The Relative Risk

• Relative risk (RR) is the risk of an event (or risk of developing a disease) relative to exposure. (For example, exposure to second-hand smoke…)


Cont’d

• Often useful to compare the probabilities of disease in two different groups or situations.

• Relative risk is the ratio of the probability of the event occurring in the exposed group to the probability of it occurring in a non-exposed (often called a control) group.


The Relative Risk (RR)

$$RR = \frac{P(D \mid E)}{P(D \mid \sim E)} = \frac{a/(a+c)}{b/(b+d)}$$

The numerator is the risk to the exposed; the denominator is the risk to the unexposed.

                   Exposure (E)    No Exposure (~E)
Disease (D)        a               b
No Disease (~D)    c               d
Total              a+c             b+d


Example #2

• It has been proposed that women who first gave birth at an older age are more susceptible to breast cancer.


Example #2 – cont’d

• Two groups are considered:

– A woman is “exposed” if she first gave birth at 25 or older. Out of 1,628 women in this group, 31 were diagnosed with cancer.

– The “unexposed” group first gave birth younger than 25. Out of 4,540 women in this group, 65 developed cancer.


                   Exposure (E)    No Exposure (~E)
Disease (D)        a = 31          b = 65
No Disease (~D)    c = 1,597       d = 4,475
Total              a+c = 1,628     b+d = 4,540

$$RR = \frac{a/(a+c)}{b/(b+d)} = \frac{31/1628}{65/4540} = 1.33$$

- Out of 1,628 women who first gave birth at 25 or older, 31 were diagnosed with cancer.

- Out of 4,540 women who first gave birth younger than 25, 65 developed cancer.
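As a quick check of the arithmetic, the relative risk for Example #2 can be computed from the four cell counts. This is a minimal sketch; the function name `relative_risk` is hypothetical, and the counts a, b, c, d follow the table layout used in the slides.

```python
def relative_risk(a, b, c, d):
    """RR = [a / (a + c)] / [b / (b + d)] for a 2x2 table.

    a: exposed with disease,   b: unexposed with disease,
    c: exposed without disease, d: unexposed without disease.
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return risk_exposed / risk_unexposed


# Counts from Example #2 (first birth at 25 or older vs. younger than 25).
rr = relative_risk(a=31, b=65, c=1597, d=4475)
print(f"RR = {rr:.2f}")  # RR = 1.33
```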


6.5.2 The Odds (wiki)

• The odds in favor of an event or a proposition is the ratio of the probability that the event will happen to the probability that the event will not happen.

• Often 'odds' are quoted as odds against, rather than as odds in favor.


The odds ‘in favor of’

• If an event takes place with probability p, the odds in favor of the event are p/(1 – p) to 1. If p = 0.5, for example, the odds are 0.5/0.5 = 1 to 1. (Or we call it 50/50.)

• For example, if you chose a random day of the week (7 days), then the odds that you would choose a Sunday would be

$$\frac{1/7}{1 - 1/7} = \frac{1/7}{6/7} = \frac{1}{6}$$

Note that the probability of picking Sunday would be 1/7.


The odds ‘against’

• The odds against you choosing Sunday are 6/1 = 6, meaning that it's 6 times more likely that you don't choose Sunday.

• These 'odds' are actually relative probabilities.
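The Sunday example can be checked with exact fractions. The sketch below uses Python's standard `fractions` module; the helper names `odds_in_favor` and `odds_against` are hypothetical.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    return (1 - p) / p


p_sunday = Fraction(1, 7)  # probability of picking Sunday among 7 days

print(odds_in_favor(p_sunday))  # 1/6
print(odds_against(p_sunday))   # 6
```

Using `Fraction` keeps the results exact, so the odds in favor and the odds against are visibly reciprocals of each other.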


The Odds Ratio (OR)

• As the name suggests, the odds ratio (OR) is the ratio between two odds: the odds for a disease to occur in an exposed group to the odds for a disease to occur in an unexposed (control) group:

$$OR = \frac{P(\text{disease} \mid \text{exposed}) \,/\, [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) \,/\, [1 - P(\text{disease} \mid \text{unexposed})]}$$


Cont’d

• OR can also be defined as the odds of exposure among diseased individuals divided by the odds of exposure among those who are not diseased, as

$$OR = \frac{P(\text{exposure} \mid \text{diseased}) \,/\, [1 - P(\text{exposure} \mid \text{diseased})]}{P(\text{exposure} \mid \text{nondiseased}) \,/\, [1 - P(\text{exposure} \mid \text{nondiseased})]}$$

• Odds ratio is also known as “relative odds”.


Example #3

• Among 989 women who had breast cancer, 273 had previously used oral contraceptives and 716 had not.

• Of 9,901 women who did not have breast cancer, 2,641 had previously used oral contraceptives and 7,260 had not.

• We’d like to know the OR for women who previously used oral contraceptives to have breast cancer.


• In this case, ‘exposure’ represents previously using oral contraceptives, ‘diseased’ means having breast cancer and ‘nondiseased’ means not having breast cancer.

• Among 989 women who had breast cancer, 273 had previously used oral contraceptives and 716 had not.

• Of 9,901 women who did not have breast cancer, 2,641 had previously used oral contraceptives and 7,260 had not.

$$
\begin{aligned}
OR &= \frac{P(\text{exposure} \mid \text{diseased}) \,/\, [1 - P(\text{exposure} \mid \text{diseased})]}{P(\text{exposure} \mid \text{nondiseased}) \,/\, [1 - P(\text{exposure} \mid \text{nondiseased})]} \\
   &= \frac{(273/989) \,/\, [1 - (273/989)]}{(2641/9901) \,/\, [1 - (2641/9901)]} = 1.0481
\end{aligned}
$$


Building a frequency table for these statistics

                          Cancer    No Cancer    Total
Oral contraceptives       273       2,641        2,914
No oral contraceptives    716       7,260        7,976
Total                     989       9,901


• Rephrase the statistics:

– Among 2,914 women who had previously used oral contraceptives, 273 had breast cancer.

– Among 7,976 women who had not previously used oral contraceptives, 716 had breast cancer.

$$
\begin{aligned}
OR &= \frac{P(\text{disease} \mid \text{exposed}) \,/\, [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) \,/\, [1 - P(\text{disease} \mid \text{unexposed})]} \\
   &= \frac{(273/2914) \,/\, [1 - (273/2914)]}{(716/7976) \,/\, [1 - (716/7976)]} = 1.0481
\end{aligned}
$$
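That the two definitions of the odds ratio agree can be verified numerically with the counts of Example #3. The sketch below is illustrative only; the helper `odds` and the variable names are hypothetical. Both forms reduce to the cross-product of the 2x2 table, (273 × 7260) / (716 × 2641).

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


# Cell counts from Example #3.
cancer_users, cancer_nonusers = 273, 716
healthy_users, healthy_nonusers = 2641, 7260

# Definition 1: odds of exposure among diseased vs. non-diseased.
or_exposure = odds(cancer_users / (cancer_users + cancer_nonusers)) / odds(
    healthy_users / (healthy_users + healthy_nonusers)
)

# Definition 2: odds of disease among exposed vs. unexposed.
or_disease = odds(cancer_users / (cancer_users + healthy_users)) / odds(
    cancer_nonusers / (cancer_nonusers + healthy_nonusers)
)

# Cross-product form of the same quantity.
or_cross = (cancer_users * healthy_nonusers) / (cancer_nonusers * healthy_users)

print(f"{or_exposure:.4f} {or_disease:.4f} {or_cross:.4f}")
```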


Conclusion

• Women who have used oral contraceptives have odds of developing breast cancer that are only 1.0481 times the odds of non-users.

• This is not significant, meaning that one cannot conclude that women who use oral contraceptives are more susceptible to breast cancer.


Summary

• Some studies use relative risks (RRs) to describe results; others use odds ratios (ORs). Both are calculated from simple 2x2 tables. The question of which statistic to use is subtle but very important.

• OR and RR are usually comparable in magnitude when the disease studied is rare (e.g., most cancers). However, an OR can overestimate and magnify risk, especially when the disease is more common (e.g., hypertension) and should be avoided in such cases if RR can be used.
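The gap between OR and RR for common diseases can be illustrated with made-up risks. In the sketch below, all four risk values are hypothetical and chosen so that RR = 2 in both scenarios; only the baseline frequency of the disease differs.

```python
def rr_and_or(risk_exposed, risk_unexposed):
    """Return (relative risk, odds ratio) for two disease probabilities."""
    rr = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return rr, odds_exposed / odds_unexposed


# Hypothetical risks, not taken from the lecture.
rr_rare, or_rare = rr_and_or(0.002, 0.001)    # rare disease
rr_common, or_common = rr_and_or(0.4, 0.2)    # common disease

print(f"rare:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
print(f"common: RR = {rr_common:.3f}, OR = {or_common:.3f}")
```

With these inputs the OR stays close to 2 for the rare disease but rises to about 2.67 for the common one, even though the RR is 2 in both.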


Reminder

• Next Monday (3/30/2015) we will have the first mid-term exam.

• This test covers up to Chapter 6 (inclusive).

• A closed-book test (a hand calculator, Excel or MATLAB may be used).


APPENDIX – MATLAB’S STATISTICS TOOLBOX – AN OVERVIEW


Useful commands to explore these examples

• MATLAB command “who” displays what variables are loaded in your memory.

• “clear” clears all variables from memory.

• “size(var)” displays the size of the variable “var”.


Exercise problem - 1

• Suppose that you have just come back from a winter vacation in Fukushima, Japan. Because of the threat of a radiation leak from the nuclear power plant damaged by the recent tsunami, you thought it would be safer to have a medical exam in a hospital. After the check, the doctor told you that your test for radiation is positive. The radiation test you received has the following facts:

– (1) Among every 1,000 people who got radiated, 983 test positive and 17 negative.

– (2) Among every 1,000 healthy people (who did not get the radiation), 975 test negative and 25 positive.

– (3) For every 1,000 tourists who visited the same area during the same period of time, only one would be infected from a statistical point of view.


Cont’d

• Based on the provided information, estimate the probability that you were actually infected. [Write down your variable definitions, formula, and computation steps in detail. Write your answer to 3 digits after the decimal point, e.g., 45.667%.]


Solution

• Let “D+” be the event that a person is diseased and “D–” the event that a person is non-diseased. Again let “+” denote a positive test and “–” a negative test. From the three statements given by the doctor, we have the following facts:

– (1) P(+|D+) = 0.983, so P(–|D+) = 0.017.

– (2) P(–|D–) = 0.975, so P(+|D–) = 0.025.

– (3) P(D+) = 0.001, so P(D–) = 0.999.

• According to Bayes’ theorem, we wish to know P(D+|+), which is given by the formula (note that D– and D+ are mutually exclusive)

$$
\begin{aligned}
P(D^+ \mid +) &= \frac{P(+ \mid D^+)\,P(D^+)}{P(+ \mid D^+)\,P(D^+) + P(+ \mid D^-)\,P(D^-)} \\
&= \frac{0.983 \times 0.001}{0.983 \times 0.001 + 0.025 \times 0.999} \\
&= \frac{983}{983 + 25 \times 999} = \frac{983}{983 + 24975} = \frac{983}{25958} \approx 3.787\%
\end{aligned}
$$
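The computation can be double-checked with a small Python sketch of Bayes' theorem for a single positive test. The function name `posterior_given_positive` is hypothetical; the inputs are the three facts from the exercise.

```python
def posterior_given_positive(prior, p_pos_given_d, p_pos_given_not_d):
    """P(D+ | +) by Bayes' theorem for one positive test result."""
    numerator = p_pos_given_d * prior
    denominator = numerator + p_pos_given_not_d * (1 - prior)
    return numerator / denominator


# P(D+) = 0.001, P(+|D+) = 0.983, P(+|D-) = 0.025, as given in the exercise.
p_one_test = posterior_given_positive(0.001, 0.983, 0.025)
print(f"{p_one_test:.3%}")  # 3.787%
```

Despite the positive result, the posterior probability stays below 4% because the prior P(D+) = 0.001 is so small relative to the false positive rate.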


Exercise problem - 2

• Continuing from the previous question, assume that you were worried about the test accuracy and decided to take the same test a second time (following your first positive test). The result, unfortunately, was positive again. Now, please estimate the probability that you actually got infected, given that you have tested positive 2 consecutive times. (Write your answer to 3 digits after the decimal point, e.g., 45.667%.)


Solution

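• Let “++” be the event of testing positive on the second test given that the first test was also positive, i.e., two positive results in a row. Assuming the two test results are independent given the true disease status, P(++|D+) = P(+|D+)^2 and P(++|D–) = P(+|D–)^2. We wish to know P(D+|++), which is given by the formula (note that D– and D+ are still mutually exclusive)

$$
\begin{aligned}
P(D^+ \mid ++) &= \frac{P(++ \mid D^+)\,P(D^+)}{P(++ \mid D^+)\,P(D^+) + P(++ \mid D^-)\,P(D^-)} \\
&= \frac{(0.983)^2 \times 0.001}{(0.983)^2 \times 0.001 + (0.025)^2 \times 0.999} \\
&= \frac{983^2}{983^2 + 25^2 \times 999} = \frac{966289}{966289 + 624375} = \frac{966289}{1590664} \approx 60.748\%
\end{aligned}
$$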
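The same answer is obtained by applying Bayes' theorem twice, using the posterior after the first test as the prior for the second. The sketch below assumes, as the squared terms in the solution do, that the two test results are independent given the true disease status; the function name `update_on_positive` is hypothetical.

```python
def update_on_positive(prior, p_pos_given_d, p_pos_given_not_d):
    """One Bayesian update of P(D+) after a positive test result."""
    numerator = p_pos_given_d * prior
    return numerator / (numerator + p_pos_given_not_d * (1 - prior))


prior = 0.001                     # P(D+) before any test
sens, false_pos = 0.983, 0.025    # P(+|D+) and P(+|D-)

after_first = update_on_positive(prior, sens, false_pos)
after_second = update_on_positive(after_first, sens, false_pos)

# Direct formula with squared likelihoods, for comparison.
direct = (sens**2 * prior) / (sens**2 * prior + false_pos**2 * (1 - prior))

print(f"after 1st positive: {after_first:.3%}")
print(f"after 2nd positive: {after_second:.3%}")  # 60.748%
```

Updating sequentially and using the squared likelihoods give the same result, which is why a second positive test raises the probability from about 3.8% to about 60.7%.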