Introduction to Distributions and Probability Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.

Dec 17, 2015

Page 1: Introduction to Distributions and Probability Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.

Introduction to Distributions and Probability

Peter T. Donnan

Professor of Epidemiology and Biostatistics

Statistics for Health Research

Page 2:

Overview

• Distributions

• History of probability

• Definitions of probability

• Random variable

• Probability density function

• Normal, Binomial and Poisson distributions

Page 3:

Introduction to Probability Density Functions

• Normal Distribution / Gaussian / Bell curve

• Poisson named after French Mathematician

• Binomial related to binary factors (Bernoulli Trials)

Page 4:

Early use of Normal Distribution

• Gauss was a German mathematician who solved the mystery of where Ceres would appear after it disappeared behind the Sun.

• He assumed the errors formed a Normal distribution and managed to accurately predict the orbit of Ceres

Page 5:

What is the relationship between the Normal or Gaussian distribution and probability?

Page 6:

Probability

“The probable is what usually happens”

Aristotle

“I cannot believe that God plays dice with the cosmos”

Albert Einstein

Page 7:

Origins of Probability

• Early interest in permutations: Vedic literature 400 BC

• Distinguished origins in betting and gambling!

• Pascal and Fermat studied division of stakes in gambling (1654)

• Enlightenment – seen as helping public policy, social equity

• Astronomy – Gauss (1801)

• Social and genetic – Galton (1885)

• Experimental design – Fisher (1936)

Page 8:

Types of Probability

Two basic definitions:

1) Frequentist

Classical

Proportion of times an event occurs in a long series of ‘trials’

2) Subjectivist

Bayesian

Strength of belief in event happening

Page 9:

Frequentists vs. Bayesians

• Two entrenched camps

• Scientists tend to use the frequentist approach

• Bayesians gaining ground

• Most scientists use frequentist methods but incorrectly interpret results in a Bayesian way!

Page 10:

Frequentists

• Consider tossing a fair coin

• In any trial, event may be a ‘head’ or ‘tail’ i.e. binary

• Repeated tossing gives series of ‘events’

• In long run prob of heads = 0.5

THTTHHHHTHHHTHHHTTHTTTHHTTHTTHHHTTTHHTHHHTTTTTHHH

Running proportion of heads: 0.6 0.56 0.52
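The long-run idea can be tried out with a small simulation. This is only a sketch: the function name, the seed and the number of tosses are illustrative choices, not part of the slides.

```python
import random

def running_proportion_of_heads(n_tosses, seed=0):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 (a head)
        proportions.append(heads / toss)
    return proportions

props = running_proportion_of_heads(100_000)
# The proportion wanders early on and settles near 0.5 as tosses accumulate.
for n in (10, 25, 50, 1_000, 100_000):
    print(n, round(props[n - 1], 3))
```

Any single toss is still just a head or a tail; only the proportion over a long series is expected to approach 0.5.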

Page 11:

Frequentist Probability

• Note the difference between ‘long run’ probability and an individual trial

• In an individual trial a head either occurs (X=1) or does not occur (X=0)

• Patient either survives or dies following an MI

• Prob of dying after MI ≈ 30% based on a previous long series from a population of individuals who experienced MI

Page 12:

Subjective Probability

• Based on strength of belief

• But more akin to thinking of clinician making a diagnosis

• Faced with patient with chest pain, based on past experience, believes prob of heart disease is 20%

• Person tossing coin believes prob of head is 1/2

Page 13:

Comparison of definitions of Probability

• Problems of subjective probability

• Probability for same patient can vary even with same clinician

• Person can believe prob of head is 0.1 even if it is a fair coin

• Subjectivists argue they are more realistic

• This course sticks to ‘frequentist’ and ‘model-based’ methods of probability

Page 14:

Random Variable

• Consider rolling 2 dice and we want to summarise the probabilities of all possible outcomes

• We call the outcome a random variable X which can have any value in this case from 2 to 12

• Enumerate all probabilities in sample space S

• P(2) = 1/6 x 1/6 = 1/36, P(3) = 2/36, P(4) = 3/36, etc.
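The enumeration described above can be sketched in a few lines. The variable names are illustrative; the only assumption is two fair six-sided dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely (die1, die2) pairs.
sample_space = list(product(range(1, 7), repeat=2))

# Count how many pairs give each total, then divide by the size of S.
counts = Counter(a + b for a, b in sample_space)
pmf = {total: Fraction(n, len(sample_space)) for total, n in sorted(counts.items())}

for total in pmf:
    print(total, f"{counts[total]}/36")
```

Exact fractions are used so that the probabilities can be checked to sum to exactly 1.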

Page 15:

Probability Density Function for rolling two dice

Sum of two dice   2     3     4     5     6     7     8     9     10    11    12
Probability       1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

[Figure: bar chart of probability (1/36 to 6/36) against the sum of two dice (2 to 12)]

Page 16:

Probability Density Function for rolling two dice

[Figure: bar chart of probability (1/36 to 6/36) against the sum of two dice (2 to 12)]

What is probability of getting 12? Answer 1/36

What is probability of getting more than 8? Answer 10/36
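Both answers can be checked by adding up bars of the chart. The sketch below assumes two fair dice and uses the triangular shape of the distribution, P(sum = s) = (6 − |s − 7|)/36; the helper name `p_sum` is hypothetical.

```python
from fractions import Fraction

def p_sum(total):
    """P(sum of two fair dice == total), from the triangular shape of the chart."""
    if not 2 <= total <= 12:
        return Fraction(0)
    return Fraction(6 - abs(total - 7), 36)

p_twelve = p_sum(12)
# "More than 8" means a total of 9, 10, 11 or 12: add those four bars.
p_more_than_8 = sum(p_sum(t) for t in range(9, 13))

print(p_twelve, p_more_than_8)  # Fraction reduces 10/36 to 5/18
```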

Page 17:

Probability Density Function for continuous variable

[Figure: bar chart of the two-dice probabilities, sums 2 to 12, probabilities 1/36 to 6/36]

Page 18:

Consider distribution of weight in kg; all values possible not just discrete

[Figure: probability curve; x-axis: Weight in kilograms (20 to 120); y-axis: Probability]

Page 19:

Probability Density Function in SPSS

Use Analyze / Descriptive Statistics / Frequencies and select no table and the Charts box, as below

Page 20:

Probability Density Function in SPSS

Data from ‘LDL Data.sav’ of baseline LDL cholesterol

Page 21:

Normal Distribution

Note that a Normal or Gaussian curve is defined by two parameters:

Mean µ and Standard Deviation σ

And often written as N(µ, σ)

Hence any Normal distribution has mathematical form

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

This cannot be integrated in closed form, so the area under the curve is obtained by numerical integration and tabulated!
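A rough sketch of what "numerical integration" means here: slice the area under the curve into thin trapezoids and add them up. The function names and the number of steps are illustrative choices; the comparison value uses Python's `math.erf`, which is itself evaluated numerically.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the Normal curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_trapezoid(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the area under the curve between a and b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

def area_erf(a, b, mu=0.0, sigma=1.0):
    """Same area via the error function, for comparison."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

within_1sd = area_trapezoid(-1.0, 1.0)  # area within one SD of the mean
within_2sd = area_trapezoid(-2.0, 2.0)  # area within two SDs of the mean
print(round(within_1sd, 4), round(within_2sd, 4))
```

Printed tables of the Normal distribution are, in effect, the output of this kind of calculation done once and stored.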

Page 22:

Normal Distribution

As noted earlier the curve is symmetrical about the mean and so P(x > mean) = 0.5 or 50%

And P(x < mean) = 0.5 or 50%

And P(a < x < b) = P(x < b) – P(x < a)

[Figure: Normal curve with 50% of the area on each side of the mean]

Page 23:

Normal Distribution and Probabilities

So we now have a way of working out the probability of any range of values of a variable IF a Normal distribution is a reasonable fit to the data

P(a < x < b) = P(x < b) – P(x < a), which is the area under the curve between a and b

[Figure: Normal curve with 50% of the area on each side of the mean]
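As a minimal sketch, the range calculation can be done with `statistics.NormalDist` from the Python standard library. The mean and SD are rounded from the Baseline LDL summary that appears later in this transcript; the limits 3.0 and 4.0 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Mean and SD rounded from the Baseline LDL summary (assumes a Normal fit is reasonable).
ldl = NormalDist(mu=3.454, sigma=0.989)

def prob_between(dist, a, b):
    """P(a < X < b) as the difference of two cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)

p_3_to_4 = prob_between(ldl, 3.0, 4.0)  # hypothetical limits
p_above_mean = 1.0 - ldl.cdf(ldl.mean)  # symmetry: half the area lies above the mean

print(round(p_3_to_4, 3), p_above_mean)
```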

Page 24:

Normal Distribution

Most of area lies between +1 and -1 SD (about 68%)

The large majority lie between +2 and -2 SDs (about 95%)

Page 25:

Probability Density Function (PDF) = Normal Distribution

Page 26:

How well does my data fit a Normal Distribution?

Note median and mean virtually the same

Skewness = 0.039, close to zero

Skewness is measure of symmetry (0 = perfect symmetry)

Eyeball test - fitted normal curve looks good!

Statistics: Baseline LDL

N Valid                  1383
N Missing                0
Mean                     3.454363
Median                   3.506214
Std. Deviation           .9889157
Skewness                 .039
Std. Error of Skewness   .066
Minimum                  .3345
Maximum                  7.5650
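For readers working outside SPSS, the skewness and its standard error can be sketched as below. The adjusted Fisher-Pearson formula is assumed here to be the one behind the SPSS figures; the standard-error formula depends only on the sample size, so it can be checked against the table directly. The function names and the toy data are illustrative.

```python
import math
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed to match the SPSS statistic)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

print(round(se_skewness(1383), 3))           # sample size of the LDL data
print(sample_skewness([1, 2, 3, 4, 5]))      # symmetric toy data
print(sample_skewness([1, 1, 1, 2, 10]))     # toy data with a long right tail
```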

Page 27:

Try Q-Q plot in Analyze / Descriptive Statistics / Q-Q plot

Plot compares Expected Normal distribution with real data and if data lies on line y = x then the Normal Distribution is a good fit

Note still an eyeball test!

Is this a good fit?
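The points of a Normal Q-Q plot can be computed by hand, which may make the plot easier to read. This is a sketch only: the plotting positions (i − 0.5)/n are one common convention and may differ from the SPSS default, and the simulated sample stands in for real data.

```python
import random
from statistics import NormalDist, fmean, stdev

def normal_qq_points(values):
    """Return (expected, observed) pairs for a Normal Q-Q plot."""
    observed = sorted(values)
    n = len(observed)
    fitted = NormalDist(fmean(observed), stdev(observed))
    # Expected value of the i-th smallest observation under the fitted Normal.
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

rng = random.Random(1)
sample = [rng.gauss(3.45, 0.99) for _ in range(200)]  # simulated, not the LDL data
points = normal_qq_points(sample)

# If the Normal fits well, each observed value sits close to its expected value,
# i.e. the points lie near the line y = x.
worst_gap = max(abs(e - o) for e, o in points)
print(round(worst_gap, 3))
```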

Page 28:

I used to be Normal until I discovered Kolmogorov-Smirnov!

Eyeball Test indicates distribution is approximately Normal but K-S test is significant indicating discrepancy compared to Normal

WARNING: DO NOT RELY ON THIS TEST

One-Sample Kolmogorov-Smirnov Test: Baseline LDL

N                                        1383
Normal Parameters (a, b)   Mean          3.454363
                           Std. Deviation .9889157
Most Extreme Differences   Absolute      .043
                           Positive      .043
                           Negative      -.043
Kolmogorov-Smirnov Z                     1.617
Asymp. Sig. (2-tailed)                   .011

a. Test distribution is Normal.
b. Calculated from data.
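One plausible reading of the warning is that the test statistic grows with the sample size: Z = √n × D, so in a large sample even a small departure from the Normal curve becomes "significant". The sketch below computes D from scratch and the asymptotic two-sided p-value from the Kolmogorov series. The function names are hypothetical, and this is not claimed to be the exact SPSS procedure.

```python
import math
from statistics import NormalDist

def ks_statistic(values, mu, sigma):
    """Largest gap between the empirical CDF and the Normal(mu, sigma) CDF."""
    dist = NormalDist(mu, sigma)
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_asymptotic_p(z, terms=100):
    """Two-sided asymptotic p-value for Z = sqrt(n) * D (Kolmogorov series)."""
    if z <= 0:
        return 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, p))

# Same maximum difference D as in the output above, two different sample sizes.
d = 0.043
for n in (100, 1383):
    z = math.sqrt(n) * d
    print(n, round(z, 3), round(ks_asymptotic_p(z), 3))
```

With the Z value reported in the table (1.617), the series reproduces the reported significance of about .011.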

Page 29:

Consider the distribution of survival times following surgery for colorectal cancer

Note median = 835 days and mean = 848

Skewness = 2.081, very skewed (> 1.0)

Strong tail to right! Approximately Normal?

Statistics: Time from Surgery

N Valid                  476
N Missing                0
Mean                     848.3908
Median                   835.5000
Std. Deviation           582.39657
Skewness                 2.081
Std. Error of Skewness   .112
Minimum                  14.00
Maximum                  5763.00

Page 30:

Try a log transformation for right (positive) skewed data?

Better but now slightly skewed to left!

Statistics: logtime

N Valid                  476
N Missing                0
Mean                     6.4346
Median                   6.7286
Std. Deviation           .95059
Skewness                 -1.504
Std. Error of Skewness   .112
Minimum                  2.67
Maximum                  8.66
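The effect of a log transformation on right-skewed data can be illustrated with simulated values. The data below are hypothetical lognormal "times", not the colorectal cancer data, and the skewness formula is the adjusted Fisher-Pearson version, assumed to match the SPSS statistic.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)

rng = random.Random(42)
# Hypothetical right-skewed times in days (lognormal), same sample size as the slide.
times = [rng.lognormvariate(6.5, 0.9) for _ in range(476)]
log_times = [math.log(t) for t in times]  # natural log transformation

skew_raw = sample_skewness(times)
skew_log = sample_skewness(log_times)
print(round(skew_raw, 2), round(skew_log, 2))
```

Simulated lognormal data become symmetric after taking logs by construction; real data, as on the slide, may overshoot and end up skewed the other way.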

Page 31:

Examples of skewed distributions in Health Research

Discrete random variables – hospital admissions, cigarettes smoked, alcohol consumption, costs

Continuous RV – BMI, cholesterol, BP

Page 32:

The Binomial Distribution

• ‘Binomial’ means ‘two numbers’.

• Outcomes of health research are often measured by whether they have occurred or not.

• For example, recovered from disease, admitted to hospital, died, etc.

• May be modelled by assuming that the number of events n has a binomial distribution with a fixed probability of event p

Page 33:

The Binomial Distribution

• Based on work of Jakob Bernoulli, a Swiss mathematician

• Refused a church appointment and instead studied mathematics

• Early use was for games of chance but now used in every human endeavour

• When n = 1 this is called a Bernoulli trial

• Binomial distribution is distribution for a series of Bernoulli trials

Page 34:

The Binomial Distribution

• Binomial distribution written as B(n, p) where n is the total number of trials and p = prob of an event in each trial

• This is a Binomial Distribution with p = 0.25 and n = 20

[Figure: bar chart of Probability of R Successes (y-axis, 0.00 to 0.20) against Successes (x-axis, 0 to 20)]
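The bars of such a chart can be recomputed from the standard binomial formula, P(R = r) = C(n, r) p^r (1 − p)^(n − r). A minimal sketch for n = 20 and p = 0.25 follows; the function name is illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(exactly r events in n independent trials, each with probability p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 20, 0.25
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# The tallest bar sits at the most likely number of successes.
most_likely = max(range(n + 1), key=lambda r: pmf[r])
for r in range(n + 1):
    print(r, round(pmf[r], 4))
```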

Page 35:

The Binomial Distribution

Binomial distributions used for binary factors and so used to assess percentages or proportions

Utilised in Cross-tabulation and logistic regression

Note as n gets larger, particularly when p is close to 0.5, the Binomial is approximately equal to the Normal Distribution:

B(n, p) ~ N(np, np(1-p))

Here the second parameter is the variance, so the SD is √(np(1-p))
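How close the Normal approximation is can be checked numerically. The sketch below compares cumulative probabilities and uses a continuity correction (evaluating the Normal at r + 0.5), which is a common refinement and not something stated on the slide. The two (n, p) pairs are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest |Binomial CDF - Normal CDF| over r = 0..n, with continuity correction."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))  # NormalDist takes the SD
    cumulative, gap = 0.0, 0.0
    for r in range(n + 1):
        cumulative += comb(n, r) * p**r * (1 - p) ** (n - r)
        gap = max(gap, abs(cumulative - approx.cdf(r + 0.5)))
    return gap

gap_small = max_cdf_gap(20, 0.25)   # small n, p away from 0.5
gap_large = max_cdf_gap(100, 0.5)   # larger n, p = 0.5
print(round(gap_small, 4), round(gap_large, 4))
```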

Page 36:

The Poisson Distribution

Poisson distribution (1838), named after its inventor Simeon Poisson who was a French mathematician. He found that if we have a rare event (i.e. p is small) and we know the expected or mean (µ) number of occurrences, the probabilities of R = 0, 1, 2 ... events are given by:

$$P(R) = \frac{e^{-\mu}\,\mu^{R}}{R!}$$
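The Poisson formula, P(R) = e^(−µ) µ^R / R!, translates directly into code. The mean µ = 2 below is a hypothetical value chosen only to show the calculation.

```python
from math import exp, factorial

def poisson_pmf(r, mu):
    """P(exactly r events) when the mean number of events is mu."""
    return exp(-mu) * mu**r / factorial(r)

mu = 2.0  # hypothetical mean number of occurrences
probs = [poisson_pmf(r, mu) for r in range(6)]
for r, prob in enumerate(probs):
    print(r, round(prob, 4))
```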

Page 37:

The Poisson Distribution

Note similarity to Binomial

In fact when p is small and n is large B(n, p) ~ P(µ = np)

Also for large values of µ: P(µ) ~ N(µ, µ), i.e. Normal with mean µ and variance µ

Hence if n and p not known could use Poisson instead
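The Binomial-to-Poisson approximation can be checked for one illustrative case. The values n = 1000 and p = 0.005 are hypothetical, picked so that p is small, n is large and µ = np = 5.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(r, mu):
    return exp(-mu) * mu**r / factorial(r)

n, p = 1000, 0.005  # hypothetical: rare event, many trials
mu = n * p

# Compare the two distributions bar by bar over the range that carries the probability.
max_gap = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, mu)) for r in range(31))
for r in range(0, 11):
    print(r, round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, mu), 4))
print("largest difference:", max_gap)
```

Only µ is needed for the Poisson column, which is why it is convenient when n and p are not known separately.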

Page 38:

The Poisson Distribution

In health research often used to model the number of events assumed to be random:

Number of hip replacement failures,

Number of cases of C. diff infection,

Diagnoses of leukaemia around nuclear power stations,

Number of H1N1 cases in Scotland,

Etc.

Page 39:

Summary

• Many of the variables measured in Health Research form distributions which approximate to common distributions with known mathematical properties

• Normal, Poisson, Binomial, etc.

• Note a relationship for all, centred around the exponential function, where e ≈ 2.718

• All belong to the Exponential Family of distributions

• These probability distributions are critical to applying statistical methods

[Figure: histogram of RANNORM (x-axis from -2.95 to 2.05, frequency from 0 to 40); Std. Dev = .96, Mean = -.04, N = 501]

Page 40:

SPSS Practical

• Read in data file ‘LDL Data.sav’

• Consider adherence to statins, baseline LDL, min Chol achieved, BMI, duration of statin use

• Assess distributions for normality

• If non-normal consider a transformation

• Try to carry out Q-Q plots