Biostatistics Lecture 5 (3/23 & 3/24/2015) Last modified 3/24/2015 1:14:21 PM.

Chapter 6 Probability and Diagnostic Tests - II

Outline

• 6.1 Operations on Events and Probability

• 6.2 Conditional Probability

• 6.3 Bayes’ Theorem

• 6.4 Diagnostic Tests

– 6.4.1 Sensitivity and Specificity

– 6.4.2 Application of Bayes’ Theorem

– 6.4.3 ROC Curves

– 6.4.4 Prevalence evaluation

• 6.5 The Relative Risk (RR) and the Odds Ratio (OR)

6.4.4 Prevalence evaluation

• Prevalence or prevalence proportion, in epidemiology, is the proportion of a population found to have a condition (typically a disease or a risk factor such as smoking or seat-belt use).

• It is usually expressed as a fraction, as a percentage or as the number of cases per 10,000 or 100,000 people.

Example #1

• A program was conducted to screen for HIV infection in mothers. (The purpose is to know whether a mother is infected or not.)

• Since maternal antibodies cross the placenta, the presence of antibodies in an infant signals infection in the mother.

• Because the tests were performed anonymously, no verification of the results was possible. (No mother was tested directly; only the infants were.)

Defining various events

• H : the event that a mother is infected with HIV.

• H^C : the event that a mother is NOT infected with HIV.

• n : total number of infants tested

• n+ : number of infants with positive results

• T+ : the event of a positive test result for an infant

• T- : the event of a negative test result for an infant

Taking Manhattan for example

• n = 50,364 infants were tested and n+ = 799 were positive, that is:

$$P(T^+) = \frac{n^+}{n} = \frac{799}{50{,}364} \approx 0.0159$$

• In other words, P(T+) = 0.0159 (from infants).

• We want to know P(H): the prevalence of infection among mothers.

• Is it true that P(H) = P(T+) = 0.0159?

A test is not perfect…

• If the screen tests were perfect, then P(H) = P(T+) = 0.0159.

• However, T+ can contain both true positive and false positive cases.

• Similarly, T- can contain both true negative and false negative cases.

• Note that the “screening tests” and “cases” here refer to infection in the mothers.

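Infants who tested positive came from two sources:

$$
\begin{aligned}
P(T^+) &= P(T^+ \cap H) + P(T^+ \cap H^C) \\
       &= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,P(H^C) \\
       &= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,[1 - P(H)]
\end{aligned}
$$

• P(T+ ∩ H): mother infected, and infant tested positive (true positive)

• P(T+ ∩ H^C): mother not infected, and infant tested positive (false positive)

Rearranging the last line to isolate P(H):

$$
\begin{aligned}
P(T^+) &= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C) - P(T^+ \mid H^C)\,P(H) \\
P(T^+) &= P(H)\,[P(T^+ \mid H) - P(T^+ \mid H^C)] + P(T^+ \mid H^C) \\
P(T^+) - P(T^+ \mid H^C) &= P(H)\,[P(T^+ \mid H) - P(T^+ \mid H^C)]
\end{aligned}
$$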

• Solving the previous equation for P(H) leads to:

$$P(H) = \frac{P(T^+) - P(T^+ \mid H^C)}{P(T^+ \mid H) - P(T^+ \mid H^C)}$$

• P(T+ | H): the probability that an infected mother yields a positive test (in her infant). This is the sensitivity of the test.

• P(T+ | H^C) = 1 – P(T- | H^C). The last term, P(T- | H^C), is the probability that an uninfected mother yields a negative test (in her infant). This is the specificity of the test.

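• Assuming that this test has 0.99 sensitivity and 0.998 specificity, so that P(T+ | H^C) = 1 – 0.998 = 0.002:

$$P(H) = \frac{0.0159 - 0.002}{0.99 - 0.002} = \frac{0.0139}{0.988} \approx 0.0141$$

• The estimate is scaled down from P(T+) = 0.0159 to P(H) = 0.0141.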
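The Manhattan calculation above can be sketched in Python. This is a minimal illustration, not part of the original lecture; the function name `corrected_prevalence` is an assumption made for this example. It follows the slides in rounding P(T+) to 0.0159 before applying the formula.

```python
def corrected_prevalence(p_pos, sensitivity, specificity):
    """Solve P(T+) = Se * P(H) + (1 - Sp) * (1 - P(H)) for P(H)."""
    false_positive_rate = 1.0 - specificity  # P(T+ | H^C)
    return (p_pos - false_positive_rate) / (sensitivity - false_positive_rate)


n, n_pos = 50_364, 799           # infants tested, infants testing positive
p_pos = round(n_pos / n, 4)      # rounded to 0.0159, as on the slide
prevalence = corrected_prevalence(p_pos, sensitivity=0.99, specificity=0.998)

print(f"P(T+) = {p_pos:.4f}")
print(f"P(H)  = {prevalence:.4f}")
```

Without the intermediate rounding of P(T+), the estimate comes out slightly lower, at about 0.0140.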

• Now consider the upstate urban region of New York, in which P(T+) = 0.0014.

• Using the same formula, we have:

$$P(H) = \frac{0.0014 - 0.002}{0.99 - 0.002} = \frac{-0.0006}{0.988} \approx -0.0006$$

• This, however, yields a negative prevalence, which makes no sense.

A brief summary

• Note that P(T+) = 0.0014 in the second case, which is very small compared with 0.0159 in the first case. It is even smaller than the false positive rate of the test, 1 – 0.998 = 0.002.

• The testing procedure is not accurate enough to measure the very low prevalence of HIV in the second case.
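The breakdown in the second case can be checked numerically. The sketch below (names such as `corrected_prevalence` and `regions` are assumptions for illustration) applies the same correction to both regions and flags when P(T+) falls below the false positive rate, which is exactly when the estimate turns negative.

```python
SENSITIVITY = 0.99
SPECIFICITY = 0.998
FALSE_POSITIVE_RATE = 1.0 - SPECIFICITY  # P(T+ | H^C) = 0.002


def corrected_prevalence(p_pos):
    """Estimate P(H) from the observed positive rate P(T+)."""
    return (p_pos - FALSE_POSITIVE_RATE) / (SENSITIVITY - FALSE_POSITIVE_RATE)


# Observed P(T+) values quoted in the lecture.
regions = {"Manhattan": 0.0159, "Upstate urban": 0.0014}

for name, p_pos in regions.items():
    estimate = corrected_prevalence(p_pos)
    usable = p_pos > FALSE_POSITIVE_RATE  # otherwise the estimate is negative
    print(f"{name}: P(T+) = {p_pos:.4f}, P(H) = {estimate:.4f}, usable = {usable}")
```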

6.5.1 The Relative Risk

• Relative risk (RR) is the risk of an event (or the risk of developing a disease) relative to exposure. (For example, exposure to second-hand smoke.)

Cont’d

• Often useful to compare the probabilities of disease in two different groups or situations.

• Relative risk is the ratio of the probability of the event occurring in the exposed group to the probability of it occurring in a non-exposed (often called a control) group.

The Relative Risk (RR)

$$RR = \frac{P(D \mid E)}{P(D \mid {\sim}E)} = \frac{a/(a+c)}{b/(b+d)}$$

The numerator is the risk to the exposed; the denominator is the risk to the unexposed.

                   Exposure (E)   No Exposure (~E)
Disease (D)             a                b
No Disease (~D)         c                d
Total                  a+c              b+d

Example #2

• It has been proposed that women who first gave birth at an older age are more susceptible to breast cancer.

Example #2 – cont’d

• Two groups are considered:

– A woman is “exposed” if she first gave birth at 25 or older. Out of 1,628 women in this group, 31 were diagnosed with cancer.

– A woman is “unexposed” if she first gave birth younger than 25. Out of 4,540 women in this group, 65 developed cancer.

                   Exposure (E)   No Exposure (~E)
Disease (D)           a=31            b=65
No Disease (~D)       c=1,597         d=4,475
Total                 a+c=1,628       b+d=4,540

$$RR = \frac{a/(a+c)}{b/(b+d)} = \frac{31/1628}{65/4540} = 1.33$$

- Out of 1,628 women who first gave birth at 25 or older, 31 were diagnosed with cancer.

- Out of 4,540 women who first gave birth younger than 25, 65 developed cancer.
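As a quick check of the arithmetic, the relative risk for Example #2 can be computed from the four cell counts. This is a minimal sketch; the function name `relative_risk` is an assumption, and the arguments follow the a, b, c, d layout of the table above.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table: a, c = exposed with/without disease;
    b, d = unexposed with/without disease."""
    risk_exposed = a / (a + c)      # P(D | E)
    risk_unexposed = b / (b + d)    # P(D | ~E)
    return risk_exposed / risk_unexposed


rr = relative_risk(a=31, b=65, c=1_597, d=4_475)
print(f"RR = {rr:.2f}")
```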

6.5.2 The Odds (wiki)

• The odds in favor of an event or a proposition is the ratio of the probability that the event will happen to the probability that the event will not happen.

• Often 'odds' are quoted as odds against, rather than as odds in favor.

http://en.wikipedia.org/wiki/Event_(probability_theory)

http://en.wikipedia.org/wiki/Proposition

The odds ‘in favor of’

• If an event takes place with probability p, the odds in favor of the event are p/(1 – p) to 1. If p = 0.5, for example, the odds are 0.5/0.5 = 1 to 1. (Or we call it 50/50.)

• For example, if you chose a random day of the week (7 days), then the odds that you would choose a Sunday would be

$$\frac{1/7}{1 - 1/7} = \frac{1/7}{6/7} = \frac{1}{6}$$

Note that the probability of picking Sunday would be 1/7.

The odds ‘against’

• The odds against you choosing Sunday are 6/1 = 6, meaning that it is 6 times more likely that you don't choose Sunday than that you do.

• These 'odds' are actually relative probabilities.
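The conversion between a probability and the two kinds of odds is short enough to sketch directly. The helper names `odds_in_favor` and `odds_against` are assumptions for illustration; `Fraction` from the standard library keeps the Sunday example exact.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    return (1 - p) / p


p_sunday = Fraction(1, 7)            # probability of picking Sunday
print(odds_in_favor(p_sunday))       # 1/6
print(odds_against(p_sunday))        # 6
print(odds_in_favor(Fraction(1, 2))) # 1, i.e. "50/50"
```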

The Odds Ratio (OR)

• As the name suggests, the odds ratio (OR) is the ratio between two odds: the odds for a disease to occur in an exposed group to the odds for a disease to occur in an unexposed (control) group:

$$OR = \frac{P(\text{disease} \mid \text{exposed}) \,/\, [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) \,/\, [1 - P(\text{disease} \mid \text{unexposed})]}$$

Cont’d

• OR can also be defined as the odds of exposure among diseased individuals divided by the odds of exposure among those who are not diseased, as

$$OR = \frac{P(\text{exposure} \mid \text{diseased}) \,/\, [1 - P(\text{exposure} \mid \text{diseased})]}{P(\text{exposure} \mid \text{nondiseased}) \,/\, [1 - P(\text{exposure} \mid \text{nondiseased})]}$$

• Odds ratio is also known as “relative odds”.

Example #3

• Among 989 women who had breast cancer, 273 had previously used oral contraceptives and 716 had not.

• Of 9,901 women who did not have breast cancer, 2,641 had previously used oral contraceptives and 7,260 had not.

• We’d like to know the OR for women who previously used oral contraceptives to have breast cancer.

• In this case, ‘exposure’ represents previously using oral contraceptives, ‘diseased’ means having breast cancer and ‘nondiseased’ means not having breast cancer.


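$$
\begin{aligned}
OR &= \frac{P(\text{exposure} \mid \text{diseased}) \,/\, [1 - P(\text{exposure} \mid \text{diseased})]}{P(\text{exposure} \mid \text{nondiseased}) \,/\, [1 - P(\text{exposure} \mid \text{nondiseased})]} \\
   &= \frac{(273/989) \,/\, [1 - (273/989)]}{(2641/9901) \,/\, [1 - (2641/9901)]} = 1.0481
\end{aligned}
$$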

Building a frequency table for these statistics

                          Cancer   No Cancer   Total
Oral contraceptives         273      2,641     2,914
No oral contraceptives      716      7,260     7,976
Total                       989      9,901

• Rephrase the statistics:

– Among 2,914 women who had previously used oral contraceptives, 273 had breast cancer.

– Among 7,976 women who had not previously used oral contraceptives, 716 had breast cancer.

$$
\begin{aligned}
OR &= \frac{P(\text{disease} \mid \text{exposed}) \,/\, [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) \,/\, [1 - P(\text{disease} \mid \text{unexposed})]} \\
   &= \frac{(273/2914) \,/\, [1 - (273/2914)]}{(716/7976) \,/\, [1 - (716/7976)]} = 1.0481
\end{aligned}
$$
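Both definitions of the OR give the same number because each reduces algebraically to the cross-product ratio ad/(bc) of the 2x2 table. The sketch below checks this for Example #3; the variable names and the helper `odds` are assumptions for illustration, with a, b, c, d laid out as in the relative risk table (rows by disease, columns by exposure).

```python
import math


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


# Example #3 counts: exposure = previous oral contraceptive use.
a, b = 273, 716        # cancer: exposed, unexposed
c, d = 2_641, 7_260    # no cancer: exposed, unexposed

# Odds of exposure among diseased vs. nondiseased.
or_exposure = odds(a / (a + b)) / odds(c / (c + d))
# Odds of disease among exposed vs. unexposed.
or_disease = odds(a / (a + c)) / odds(b / (b + d))
# Cross-product ratio.
or_cross = (a * d) / (b * c)

print(f"{or_exposure:.4f} {or_disease:.4f} {or_cross:.4f}")
```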

Conclusion

• Women who have used oral contraceptives have odds of developing breast cancer that are only 1.0481 times the odds of non-users.

• This OR is close to 1 and is not significant, meaning that one cannot conclude that women who use oral contraceptives are more susceptible to breast cancer.

Summary

• Some studies use relative risks (RRs) to describe results; others use odds ratios (ORs). Both are calculated from simple 2x2 tables. The question of which statistic to use is subtle but very important.

• OR and RR are usually comparable in magnitude when the disease studied is rare (e.g., most cancers). However, an OR can overestimate and magnify risk, especially when the disease is more common (e.g., hypertension), and should be avoided in such cases if RR can be used.
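A small numerical sketch illustrates the rare versus common contrast. The risks used here (0.02 vs. 0.01 and 0.60 vs. 0.30) are hypothetical values chosen for illustration, not data from the lecture, and the function name is likewise an assumption. Both scenarios have RR = 2, but the OR stays near 2 only when the disease is rare.

```python
def rr_and_or(risk_exposed, risk_unexposed):
    """Return (relative risk, odds ratio) for two disease risks."""
    rr = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return rr, odds_exposed / odds_unexposed


# Hypothetical rare disease: risks of 2% (exposed) and 1% (unexposed).
rr_rare, or_rare = rr_and_or(0.02, 0.01)
# Hypothetical common disease: risks of 60% (exposed) and 30% (unexposed).
rr_common, or_common = rr_and_or(0.60, 0.30)

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

In the rare case the OR is about 2.02, while in the common case it is about 3.5, well above the RR of 2.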

Reminder

• Next Monday (3/30/2015) we will have the first mid-term exam.

• This test covers up to Chapter 6 (inclusive).

• This is a closed-book test (only a hand calculator, Excel, or MATLAB may be used).

APPENDIX – MATLAB’S STATISTICS TOOLBOX – AN OVERVIEW

Useful commands to explore these examples

• MATLAB command “who” displays what variables are loaded in your memory.

• “clear” clears all variables from memory.

• “size(var)” displays the size of the variable “var”.

Exercise problem - 1

• Suppose that you have just come back from a winter vacation in Fukushima, Japan. Because of the threat of a radiation leak from the nuclear power plant damaged by the recent tsunami, you thought it would be safer to have a medical exam in a hospital. After the check, the doctor told you that your test for radiation is positive. The radiation test you received has the following properties:

– (1) Among every 1,000 people who were irradiated, 983 test positive and 17 negative.

– (2) Among every 1,000 healthy people (who were not irradiated), 975 test negative and 25 positive.

– (3) Of every 1,000 tourists who visited the same area during the same period of time, only one would be irradiated, statistically speaking.

Cont’d

• Based on the provided information, estimate the probability that you were actually irradiated. [Write down your variable definitions, formula, and computation steps in detail. Write your answer to 3 digits after the decimal point, e.g., 45.667%.]

Solution

• Let “D+” denote diseased (irradiated) people and “D–” non-diseased people. Again, let “+” denote people who test positive and “–” those who test negative. From the three statements given by the doctor, we have the following facts:

– (1) P(+|D+) = 0.983, or P(–|D+) = 0.017.

– (2) P(–|D–) = 0.975, or P(+|D–) = 0.025.

– (3) P(D+) = 0.001, or P(D–) = 0.999.

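• According to Bayes’ theorem, we wish to know P(D+|+), which is given by the formula (note that D– and D+ are mutually exclusive and exhaustive)

$$
\begin{aligned}
P(D^+ \mid +) &= \frac{P(+ \mid D^+)\,P(D^+)}{P(+ \mid D^+)\,P(D^+) + P(+ \mid D^-)\,P(D^-)} \\
&= \frac{0.983 \times 0.001}{0.983 \times 0.001 + 0.025 \times 0.999} \\
&= \frac{983}{983 + 25 \times 999} = \frac{983}{983 + 24975} = \frac{983}{25958} = 3.787\%
\end{aligned}
$$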
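The same computation can be sketched in Python as a check. The function name `posterior` and its parameter names are assumptions for illustration; the inputs are the three facts stated in the problem.

```python
def posterior(prior, sensitivity, specificity):
    """P(D+ | +) by Bayes' theorem for a single positive test."""
    true_positive = sensitivity * prior                # P(+ | D+) P(D+)
    false_positive = (1 - specificity) * (1 - prior)   # P(+ | D-) P(D-)
    return true_positive / (true_positive + false_positive)


p_first = posterior(prior=0.001, sensitivity=0.983, specificity=0.975)
print(f"P(D+ | +) = {p_first:.3%}")
```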

Exercise problem - 2

• Continuing from the previous question, suppose that you were worried about the test accuracy and decided to take the same test a second time (following your first positive test). The result, unfortunately, was positive again. Now estimate the probability that you were actually irradiated, given that you have tested positive 2 consecutive times. (Write your answer to 3 digits after the decimal point, e.g., 45.667%.)

Solution

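• Let “++” denote people who test positive on their second test given that their first test was also positive. Assuming the two test results are independent given the true disease status, P(++|D+) = P(+|D+)^2 and P(++|D–) = P(+|D–)^2. We wish to know P(D+|++), which is given by the formula (note that D– and D+ are still mutually exclusive and exhaustive)

$$
\begin{aligned}
P(D^+ \mid ++) &= \frac{P(+ \mid D^+)^2\,P(D^+)}{P(+ \mid D^+)^2\,P(D^+) + P(+ \mid D^-)^2\,P(D^-)} \\
&= \frac{(0.983)^2 \times 0.001}{(0.983)^2 \times 0.001 + (0.025)^2 \times 0.999} \\
&= \frac{983^2}{983^2 + 25^2 \times 999} = \frac{966289}{966289 + 624375} = \frac{966289}{1590664} = 60.748\%
\end{aligned}
$$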
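An equivalent way to reach this answer is to apply Bayes' theorem twice, using the posterior after the first positive test as the prior for the second. The sketch below assumes, as the formula above does, that the two test results are independent given the true disease status; the function name `posterior` is an assumption for illustration.

```python
def posterior(prior, sensitivity=0.983, specificity=0.975):
    """P(D+ | +) after one positive test, starting from the given prior."""
    true_positive = sensitivity * prior
    false_positive = (1 - specificity) * (1 - prior)
    return true_positive / (true_positive + false_positive)


p_after_one = posterior(0.001)        # after the first positive test
p_after_two = posterior(p_after_one)  # first posterior becomes the new prior

# Closed form from the solution: squared likelihoods with the original prior.
closed_form = (0.983**2 * 0.001) / (0.983**2 * 0.001 + 0.025**2 * 0.999)

print(f"after one positive:  {p_after_one:.3%}")
print(f"after two positives: {p_after_two:.3%}")
```

The sequential update and the closed form agree up to floating-point rounding, which is why a second positive result raises the probability from under 4% to about 61%.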
