Statistical inference: Its application for health science research
Bandit Thinkhamrop, Ph.D. (Statistics)
Department of Biostatistics and Demography, Faculty of Public Health, Khon Kaen University

Jan 06, 2018


Edwin Bailey

Transcript
Page 1:

Statistical inference: Its application for health science research

Bandit Thinkhamrop, Ph.D. (Statistics)
Department of Biostatistics and Demography
Faculty of Public Health, Khon Kaen University

Page 2:

Begin at the conclusion

Page 3:

Type of the study outcome: Key for selecting appropriate statistical methods

Study outcome
– Dependent variable or response variable
– Focus on the primary study outcome if there is more than one

Type of the study outcome
– Continuous
– Categorical (dichotomous, polytomous, ordinal)
– Numerical (Poisson) count
– Event-free duration

Page 4:

The outcome determines the statistics

– Continuous: mean, median; linear regression
– Categorical: proportion (prevalence or risk); logistic regression
– Count: rate per “space”; Poisson regression
– Survival: median survival, risk of events at T(t); Cox regression

Page 5:

Statistics quantify errors for judgments

– Parameter estimation [95%CI]
– Hypothesis testing [P-value]

Page 6:

Common types of the statistical goals

– Single measurements (no comparison)
– Difference (compared by subtraction)
– Ratio (compared by division)
– Prediction (diagnostic test or predictive model)
– Correlation (examine a joint distribution)
– Agreement (examine concordance or similarity between pairs of observations)

Page 7:
Page 8:
Page 9:

Dependency of the study outcome requires special statistical methods to handle it

– Continuous: mean, median; linear regression
– Categorical: proportion (prevalence or risk); logistic regression
– Count: rate per “space”; Poisson regression
– Survival: median survival, risk of events at T(t); Cox regression

Mixed model, multilevel model, GEE

Page 10:

Back to the conclusion

– Continuous: mean, median
– Categorical: proportion (prevalence or risk)
– Count: rate per “space”
– Survival: median survival, risk of events at T(t)

Appropriate statistical methods
Magnitude of effect, 95% CI, P-value
Answer the research question based on the lower or upper limit of the CI

Page 11:

Always report the magnitude of effect and its confidence interval

Absolute effects:
– Mean, mean difference
– Proportion or prevalence, rate or risk, rate or risk difference
– Median survival time

Relative effects:
– Relative risk, rate ratio, hazard ratio
– Odds ratio

Other magnitudes of effect:
– Correlation coefficient (r), intra-class correlation (ICC)
– Kappa
– Diagnostic performance
– Etc.

Page 12:

Touch the variability (uncertainty) to understand statistical inference

id          A     (X - X̄)    (X - X̄)²
1           2       -2          4
2           2       -2          4
3           0       -4         16
4           2       -2          4
5          14       10        100
Sum (Σ)    20        0        128
Mean        4        0         32.0   (128 divided by n - 1 = 4)

SD = 5.66
Median = 2

X̄ = (2+2+0+2+14) / 5 = 20 / 5 = 4

Ordered data: 0 2 2 2 14

Variance = SD²
Standard deviation = SD
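
The table above can be reproduced step by step. The following is a minimal Python sketch, not part of the original slides; the variable names are chosen here for illustration, and the sample SD is assumed to divide by n - 1, which is what the slide's value of 5.66 implies.

```python
import math

data = [2, 2, 0, 2, 14]                    # data set A from the slide
n = len(data)

mean = sum(data) / n                       # 20 / 5 = 4
deviations = [x - mean for x in data]      # (X - mean); these sum to 0
squared = [d ** 2 for d in deviations]     # (X - mean)^2
sum_sq = sum(squared)                      # 128

variance = sum_sq / (n - 1)                # divide by the degrees of freedom
sd = math.sqrt(variance)                   # SD is the square root of the variance

ordered = sorted(data)
median = ordered[n // 2]                   # middle value; valid because n is odd

print(mean, sum_sq, variance, round(sd, 2), median)
# prints: 4.0 128.0 32.0 5.66 2
```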

Page 13:

Touch the variability (uncertainty) to understand statistical inference

For data set A (2, 2, 0, 2, 14):
– Measure of central tendency: mean = 4, median = 2
– Measure of variation: SD = 5.66

Page 14:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

n - 1: degrees of freedom

Standard deviation (SD) = the average distance between each data item and their mean

Page 15:

Same mean BUT different variation

id          A       B       C
1           2       0       4
2           2       3       3
3           0       4       5
4           2       5       4
5          14       8       4
Sum (Σ)    20      20      20
Mean        4       4       4
SD          5.66    2.92    0.71
Median      2       4       4

– A: heterogeneous data, skewed distribution
– B: heterogeneous data, symmetric distribution
– C: homogeneous data, symmetric distribution
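
As a quick check of the comparison above, here is a small Python sketch (an illustration added here, not taken from the slides) that summarizes the three data sets with the standard-library `statistics` module. `statistics.stdev` uses the n - 1 divisor; values rounded to two decimals may differ in the last digit from hand-truncated figures.

```python
import statistics

datasets = {
    "A": [2, 2, 0, 2, 14],
    "B": [0, 3, 4, 5, 8],
    "C": [4, 3, 5, 4, 4],
}

summary = {}
for name, values in datasets.items():
    summary[name] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),    # sample SD, divides by n - 1
    }

# Same mean in every data set, but very different spread.
for name, s in summary.items():
    print(name, s["mean"], s["median"], round(s["sd"], 2))
```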

Page 16:

Facts about Variation

Because of variability, repeated samples will NOT obtain the same statistic, such as a mean or proportion:
– Statistics vary from study to study because of the role of chance
– Hard to believe that the statistic is the parameter
– Thus we need statistical inference to estimate the parameter based on the statistics obtained from a study

Data that vary widely = heterogeneous data
Heterogeneous data require a large sample size to achieve a conclusive finding

Page 17:

The Histogram

[Figure: histograms of data set A (2, 2, 0, 2, 14) and data set B (4, 3, 5, 4, 4); x-axis from 0 to 14]

Page 18:

The Frequency Curve

[Figure: frequency curves of data set A (2, 2, 0, 2, 14) and data set B (4, 3, 5, 4, 4); x-axis from 0 to 14]

Page 19:

Area Under The Frequency Curve

[Figure: area under the frequency curves of data set A (2, 2, 0, 2, 14) and data set B (4, 3, 5, 4, 4); x-axis from 0 to 14]

Page 20:

Central Limit Theorem

[Figure: three raw-data distributions (right skew, symmetry, left skew) with samples X1, X2, X3; the distribution built from X1 to Xn is labeled normally distributed]

Page 21:

Central Limit Theorem

[Figure: distribution of the raw data (samples X1, X2, X3) and distribution of the sampling mean (X1 to Xn)]

Page 22:

Central Limit Theorem

[Figure: distribution of the raw data; with a large sample, the distribution of the sampling mean (X1 to Xn) follows the (theoretical) normal distribution]

Page 23:

Central Limit Theorem

– Distribution of the raw data: many X, with a mean and an SD
– Distribution of the sampling mean (large sample): many X̄, with a mean and an SE
– Standardized, for whatever n: mean = 0, standard deviation = 1
– Standard deviation of the sampling mean = standard error (SE), estimated by

$$SE = \frac{SD}{\sqrt{n}}$$
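
The relationship between the spread of the sampling mean and SD divided by the square root of n can be checked by simulation. The sketch below is an illustration only: the exponential raw data (mean 1, SD 1), the sample size of 25, the 5000 repetitions, and the seed are all assumptions made here, not values from the slides.

```python
import math
import random
import statistics

rng = random.Random(0)             # fixed seed so the run is repeatable

def draw_sample(n):
    # Right-skewed raw data: exponential with mean 1 and SD 1 (assumed example).
    return [rng.expovariate(1.0) for _ in range(n)]

n = 25                             # size of each sample
n_samples = 5000                   # number of repeated samples

# Each repeated sample gives a different mean: that is sampling variation.
sample_means = [statistics.mean(draw_sample(n)) for _ in range(n_samples)]

sd_of_means = statistics.stdev(sample_means)   # empirical standard error
theoretical_se = 1.0 / math.sqrt(n)            # SD / sqrt(n) = 1 / 5

print(round(statistics.mean(sample_means), 3))
print(round(sd_of_means, 3), round(theoretical_se, 3))
```

Although the raw data are strongly skewed, the simulated sample means cluster around the population mean, and their standard deviation should come out close to the theoretical SE of 0.2.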

Page 24:

(Theoretical) Normal Distribution

Page 25:

(Theoretical) Normal Distribution

Page 26:

Mean ± 3SD: 99.73% of AUC

Page 27:

Mean ± 2SD: 95.45% of AUC

Page 28:

Mean ± 1SD: 68.27% of AUC
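
The three areas under the normal curve can be computed directly from the normal cumulative distribution function. This is a short sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is introduced here for illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

def area_within(k):
    """Area under the normal curve between mean - k*SD and mean + k*SD."""
    return z.cdf(k) - z.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, a in areas.items():
    print(f"mean ± {k} SD: {100 * a:.2f}% of AUC")
# prints:
# mean ± 1 SD: 68.27% of AUC
# mean ± 2 SD: 95.45% of AUC
# mean ± 3 SD: 99.73% of AUC
```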

Page 29:

Sample: n = 25, X̄ = 52, SD = 5

From the sample to the population:
– Parameter estimation [95%CI]
– Hypothesis testing [P-value]

Page 30:

$$SE = \frac{SD}{\sqrt{n}} = \frac{5}{\sqrt{25}} = \frac{5}{5} = 1$$

Z = 2.58 (99%), Z = 1.96 (95%), Z = 1.64 (90%)

Page 31:

Sample: n = 25, X̄ = 52, SD = 5, SE = 1

Parameter estimation for the population

[95%CI]: 52 - 1.96(1) to 52 + 1.96(1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.

Z = 2.58 (99%), Z = 1.96 (95%), Z = 1.64 (90%)
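
The standard error and the confidence interval above can be recomputed as follows. This is a minimal sketch that follows the slide in using normal (Z) critical values; the function name `confidence_interval` is an illustrative choice, not something from the original.

```python
import math
from statistics import NormalDist

n, mean, sd = 25, 52, 5            # sample summary from the slide

se = sd / math.sqrt(n)             # 5 / 5 = 1

def confidence_interval(level):
    # Two-sided critical value, about 1.96 when level = 0.95
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return mean - z * se, mean + z * se

for level in (0.90, 0.95, 0.99):
    lo, hi = confidence_interval(level)
    print(f"{level:.0%} CI: {lo:.2f} to {hi:.2f}")
```

For a sample as small as 25, a t-based critical value would give a slightly wider interval; the Z value is kept here only to match the slide's arithmetic.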

Page 32:

Sample: n = 25, X̄ = 52, SD = 5, SE = 1

Hypothesis testing about the population

H0: μ = 55
HA: μ ≠ 55

$$Z = \frac{55 - 52}{1} = 3$$

Page 33:

Hypothesis testing

H0: μ = 55
HA: μ ≠ 55

If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is 0.0027.

$$Z = \frac{55 - 52}{1} = 3$$

P-value = 1 - 0.9973 = 0.0027

[Figure: sampling distribution under H0 centered at 55; the observed mean 52 lies at -3SE, with +3SE marked on the other side]
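
The P-value above is the area in both tails beyond 3 standard errors. A short Python sketch of that calculation, using `statistics.NormalDist` (variable names are chosen here for illustration):

```python
from statistics import NormalDist

sample_mean, null_mean, se = 52, 55, 1

# The sign of z depends on the order of subtraction; the two-sided
# P-value only uses its absolute value.
z = (sample_mean - null_mean) / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails beyond |z|

print(z, round(p_two_sided, 4))
# prints: -3.0 0.0027
```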

Page 34:

P-value vs. 95%CI (1)

An example of a study with a dichotomous outcome

A study compared the cure rate between Drug A and Drug B

Setting:
– Drug A = Alternative treatment
– Drug B = Conventional treatment

Results:
– Drug A: n1 = 50, Pa = 80%
– Drug B: n2 = 50, Pb = 50%

Pa - Pb = 30% (95%CI: 26% to 34%; P = 0.001)
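
For readers who want to see how an interval and P-value for a difference in proportions can be obtained, here is a hedged Python sketch. It assumes a Wald (normal-approximation) interval and a pooled two-sample z-test; the slides do not say which method was used. With n = 50 per group this approach gives a wider interval than the one printed on the slide, so the slide's figures may be illustrative or based on a different method.

```python
import math
from statistics import NormalDist

n_a, p_a = 50, 0.80     # Drug A
n_b, p_b = 50, 0.50     # Drug B

diff = p_a - p_b

# Unpooled (Wald) standard error of the difference, used for the interval
se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_crit * se_diff, diff + z_crit * se_diff

# Pooled standard error, used for the z-test of H0: Pa = Pb
p_pool = (n_a * p_a + n_b * p_b) / (n_a + n_b)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z_stat = diff / se_pool
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"difference = {diff:.0%}")
print(f"95% CI: {ci_low:.1%} to {ci_high:.1%}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```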

Page 35:

P-value vs. 95%CI (2)

Pa - Pb = 30% (95%CI: 26% to 34%; P < 0.05)

[Figure: axis of the difference, with regions labeled Pa > Pb and Pb > Pa]

Page 36:

P-value vs. 95%CI (3)

Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99

Page 37:

Tips #6 (b) P-value vs. 95%CI (4)

Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99

There was a statistically significant difference between the two groups.

Page 38:

Tips #6 (b) P-value vs. 95%CI (5)

Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99

There was no statistically significant difference between the two groups.

Page 39:

P-value vs. 95%CI (4)

Save tips:
– Always report the 95%CI with the p-value; do NOT report the p-value alone
– Always interpret based on the lower or upper limit of the confidence interval; the p-value can be optional
– Never interpret a p-value > 0.05 as an indication of no difference or no association; only the CI can provide this message

Page 40:

Additional Notes

Page 41:

Alpha (α) and Beta (β)

Alpha (α)
– Type I error
– The probability that a statistical test will reject the null hypothesis when the null hypothesis is true
– Making a false positive decision

Beta (β)
– Type II error
– The probability that a statistical test will NOT reject the null hypothesis when the null hypothesis is false
– Making a false negative decision

Power (1 - β)
– The probability that a statistical test will reject the null hypothesis when the null hypothesis is false
– That is, the probability of NOT committing a Type II error or a false negative decision
– The higher the power, the lower the Type II error
– Also known as the sensitivity
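
To make the link between α, β, and power concrete, the sketch below computes the power of a two-sided one-sample z-test. It is an illustration under stated assumptions: the null mean of 55 and SE of 1 are borrowed from the earlier worked example, the true population mean of 52 is assumed purely for demonstration, and the function name `power_two_sided_z` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def power_two_sided_z(true_mean, null_mean, se, alpha=0.05):
    """Probability of rejecting H0 when the population mean is true_mean."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    shift = (true_mean - null_mean) / se            # true effect in SE units
    # H0 is rejected when the test statistic falls in either tail.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

power = power_two_sided_z(true_mean=52, null_mean=55, se=1)
beta = 1 - power                                    # Type II error probability
print(round(power, 3), round(beta, 3))

# When H0 is true (no shift), the rejection probability equals alpha.
print(round(power_two_sided_z(55, 55, 1), 3))
```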

Page 42:

Alpha (α) and Beta (β) cont.

Significance level
– A statement of how unlikely a result must be, if the null hypothesis is true, to be considered significant
– Needs to be declared in advance, before looking at the data, preferably before data collection
– Three most commonly used criteria of probabilities: 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000)

P-value
– The probability of obtaining a result as extreme as, or more extreme than, the one observed, given that the null hypothesis is true
– Quantify it based on the data and the hypothesis
– This is the evidence for making the decision whether to reject or not reject the null hypothesis
– Reject the null hypothesis if the p-value is less than the predefined level of significance, and do not reject otherwise

Page 43:

Alpha (α) and Beta (β) cont.

Page 44:

Q & A
Thank you