1 © 2006 Inferential statistics

© 2006 1 Inferential statistics. © 2006 2 Evidence-based Chiropractic In inferential statistics Data from samples are used to make inferences about populations.

Dec 18, 2015


Inferential statistics


Evidence-based Chiropractic © 2006

In inferential statistics

• Data from samples are used to make inferences about populations

• Researchers can make generalizations about an entire population based on a smaller number of observations

• However, the sample means will not all be the same when repeated random samples are taken from a population


Sampling distributions

• If many different samples were taken from a population, it would produce a distribution of sample means

• If repeated enough times, the distribution would take on a normal shape
  – Even if the underlying population is not normal

• If repeated an infinite number of times, it would be called a sampling distribution
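
The idea can be sketched with a small simulation. This is only an illustration: the skewed (exponential) population, the sample size of 30, and the 2,000 repetitions are assumptions chosen for the example, not values from the slides.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Take n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def draw_skewed(rng):
    """One observation from a skewed population (exponential, mean 1, SD 1)."""
    return rng.expovariate(1.0)


means = sample_means(draw_skewed, sample_size=30, n_samples=2000)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# The sample means cluster around the population mean (1.0), and their
# spread is close to the population SD divided by sqrt(30).
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.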


Sampling distributions (cont.)

• Which of the sample means is truly the population mean?
  – It would be useful to know, but an exact figure is not possible

• The population mean can be inferred from the sample
  – The sample mean is an estimate
  – Referred to as the point estimate


Sampling distributions (cont.)

• Because sampling distributions are normal, the properties of the normal distribution can be used
  – e.g., 68.3%, 95.5%, and 99.7% of the area under the curve lies within ±1, ±2, and ±3 standard deviations of the mean


Standard error of the mean (SEm)

• The spread of means around the mean of a sampling distribution

• Can be estimated from the sample
  – SEm is calculated by dividing the SD of the sample by the square root of the number of units in the sample

$SE_m = \dfrac{S}{\sqrt{n}}$
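
A minimal sketch of the calculation, using made-up scores (the data are hypothetical and the function name is only illustrative):

```python
import math
import statistics


def standard_error_of_mean(sample):
    """SEm = sample SD divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample of 8 scores
sem = standard_error_of_mean(scores)
print(round(sem, 3))
```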


SEm (cont.)

• SEm is higher when
  – The sample's SD is large or
  – The sample size is small

• Lower when
  – The SD is small or
  – The sample size is large

• A small SEm is preferable because generalizations are more precise


Confidence Intervals (CIs)

• A CI is a range of values that is likely to contain the population parameter that is being estimated (e.g., the mean)

• The probability that this range of values contains the population parameter is typically 95%
  – Thus, the 95% confidence interval


Confidence Intervals (CIs)

[Figure: normal curve with the horizontal axis marked from -3 to +3]


CIs (cont.)

• One can have 95% confidence that the value of the true mean lies within the calculated interval (i.e., 95% CI)


Calculating a 95% CI

1. Find the z-score (using a z-table) that corresponds to the area under the distribution that includes 95% of all values (e.g., z = ±1.96 for a 95% CI)

2. Multiply the z-scores by the SEm

3. Add the product to the sample mean to find the upper limit of the CI and subtract to find the lower limit
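
The three steps can be sketched as follows. The scores are hypothetical, and the z-score is taken from Python's `statistics.NormalDist` instead of a printed z-table; with a sample this small a t-based interval would normally be preferred, so treat this purely as an illustration of the steps.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) limits of a z-based CI for the mean."""
    # Step 1: z-score that encloses the requested area (1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Step 2: multiply the z-score by the SEm
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = z * sem
    # Step 3: add to and subtract from the sample mean
    mean = statistics.fmean(sample)
    return mean - margin, mean + margin


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
lower, upper = confidence_interval(scores)
print(round(lower, 2), round(upper, 2))
```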


Size (width) of CIs

• The size of the CI is related to the size of the sample and the size of the data variation
  – Small samples & large variation = larger CIs
  – Large samples & small variation = smaller CIs


Hypothesis testing

• A hypothesis is an assumption that appears to explain certain events, which must be tested to see whether it is true

• Research hypothesis
  – a.k.a., alternative hypothesis
  – Denoted H1
  – The research hypothesis is not tested directly

• Instead the null hypothesis (H0) is tested


Hypothesis tests

• Depending on the outcome of the test of H0, there is evidence either for or against the research hypothesis

• Hypothesis testing involves the comparison of the means of groups in an experiment
  – The objective is to find out whether they are significantly different from each other


Hypothesis tests (cont.)

• When comparing the means of an active treatment group and a control group, one looks for a difference
  – The treatment may produce a better outcome, leading to a higher mean than the control group
  – The difference may appear real, but it may be due to chance
  – Statistical tests help determine whether it is likely to be real


The null hypothesis

• H0 states that there is no difference between the group means

• H1 is accepted only if the null hypothesis proves to be unlikely
  – Typically it must be at least 95% unlikely
  – If H0 is unlikely, it is rejected

• Not unlike the innocent until proven guilty concept in our legal system


A hypothetical neck pain study

• Patients are treated with chiropractic vs. usual medical care
  – Outcome measure is the Neck Disability Index (NDI)
  – H1
    • Chiropractic patients will have lower mean NDI scores after treatment
  – H0
    • There is no difference between mean NDI scores


Hypothetical study (cont.)

• Results
  – Mean NDI scores of chiropractic patients
    • 28 before, 10 after treatment
  – Mean NDI scores of medical patients
    • 29 before, 15 after treatment

• Chiropractic care appears to be better
  – But is there enough difference to rule out chance?
  – Must perform statistical tests to find out


Hypothetical study (cont.)

[Figure: NDI score (vertical axis, 0 to 30) at baseline and at outcome for the chiropractic and medical groups]

Is this difference enough to be meaningful?


Statistical significance

• The results of a study (i.e., the difference between groups) are unlikely to be due to chance
  – At a specified probability level, referred to as alpha (α)
  – α is the probability of incorrectly rejecting a null hypothesis

• If the results are unlikely to be due to chance, H0 is rejected and H1 is accepted


Statistical significance (cont.)

• It must be at least 95% unlikely that H0 is true before it can be rejected

• There is still a 5% chance that H0 would be rejected, when it was actually true

• Accordingly, P values must be equal to or less than 5% in order for the results of a study to reach a level of statistical significance


Statistical significance (cont.)

• The level of significance (alpha level) is not the same as the P value
  – The alpha level must be set before the study begins
  – The P value is calculated at the completion of the study and must be ≤ the alpha level in order to reach statistical significance


Statistical significance (cont.)

• Even when there is no real difference between groups, about 1 in 20 repetitions of a study would be expected to reach statistical significance by chance alone (at α = 0.05)

• Fishing
  – When researchers perform a lot of statistical tests on their data
  – Increases the chance that at least one of the tests will wrongly reach statistical significance
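
A rough sketch of why fishing is a problem. It assumes the tests are independent and that every null hypothesis is true, which is a simplification not stated in the slides:

```python
def familywise_error_rate(alpha, n_tests):
    """Chance that at least one of n independent tests is falsely significant."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(n_tests, "tests:", round(rate, 3))
```

Under these assumptions the chance of at least one false positive grows quickly as more tests are run at α = 0.05.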


Type I & II errors

• Type I error (a.k.a., alpha error)
  – Rejecting a true null hypothesis
  – The probability of making a Type I error is equal to the value of α

• Type II error (a.k.a., beta error)
  – Failure to reject a false null hypothesis
  – The probability of making a Type II error is equal to the value of beta (β)


Type I & II errors (cont.)

[Table: Consequences of accepting or rejecting true and false null hypotheses]


Type I & II errors (cont.)

• There is a trade-off between the likelihood of a study resulting in a Type I error versus a Type II error

• As alpha becomes smaller, the chance of making a Type I error decreases

• Whereas the chance of making a Type II error increases
  – Because it is more likely that a false H0 will not be rejected


Type I & II errors (cont.)

The 0.05 alpha level is a compromise between Type I and Type II errors


Power

• The probability of correctly rejecting a false H0
  – Related to β error
  – Power is equal to 1 − β

• Power depends on sample size, the magnitude of the difference between group means, and the value of α


Power (cont.)

• Power increases as
  – Sample size increases
    • Only to a certain extent, then it becomes a waste of resources
  – The difference between group means increases
  – α increases

• A power value of 0.80 is often sought by researchers
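
These relationships can be illustrated with an approximate power calculation. The sketch assumes two equal-sized groups, a known common SD, and a two-tailed z-test (ignoring the negligible far tail); the function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, mean_difference, sd, alpha=0.05):
    """Approximate power of a two-tailed, two-group z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means
    se_diff = sd * math.sqrt(2 / n_per_group)
    return std_normal.cdf(abs(mean_difference) / se_diff - z_crit)


baseline = approx_power(64, 0.5, 1.0)
print(round(baseline, 2))                            # roughly 0.80
print(approx_power(100, 0.5, 1.0) > baseline)        # larger sample
print(approx_power(64, 0.8, 1.0) > baseline)         # larger difference
print(approx_power(64, 0.5, 1.0, alpha=0.10) > baseline)  # larger alpha
```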


Power (cont.)

• Power may be calculated after a study has been completed (post hoc)
  – If low power is detected during post hoc power analysis and H0 was not rejected, it may be grounds to repeat the study using a larger sample


Confidence intervals and hypothesis testing

• If the value specified as the difference between group means in the null hypothesis is included in the 95% CI, then H0 should not be rejected

– The test is not statistically significant

• H0 states there is no difference between group means, so the specified no difference value is always zero


CIs and hypothesis testing (cont.)

• If zero is not included in the 95% CI, the null hypothesis should be rejected
  – The test is statistically significant

• CIs are becoming more prevalent in the health care literature because they convey more information than P values alone


CIs and hypothesis testing (cont.)

• Example study
  – Brinkhaus et al.
  – Acupuncture was more effective in improving pain on VAS* than no acupuncture in chronic low back pain patients
    • Difference, 21.7 mm (95% CI 13.9 to 30.0)
  – But no statistical difference between acupuncture and minimal acupuncture
    • Difference, 5.1 mm (95% CI -3.7 to 13.9)

* Visual analog scale
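
The rule applied to the two intervals above can be written as a short check. The function name is only illustrative:

```python
def ci_excludes_null(ci_lower, ci_upper, null_value=0.0):
    """True when the null value lies outside the CI (statistically significant)."""
    return not (ci_lower <= null_value <= ci_upper)


# Acupuncture vs. no acupuncture: 95% CI 13.9 to 30.0
print(ci_excludes_null(13.9, 30.0))
# Acupuncture vs. minimal acupuncture: 95% CI -3.7 to 13.9
print(ci_excludes_null(-3.7, 13.9))
```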


Clinical significance (a.k.a., practical significance)

• Do the findings of a study really matter in clinical situations?

• Sometimes a study is statistically significant, but the findings are not important in clinical terms
  – Large studies with small differences between groups can generate statistically significant findings that are not meaningful to practitioners


Clinical significance (cont.)

• For example
  – A study found a statistically significant difference between mean Headache Disability Inventory (HDI) scores of only 10 points
  – Yet at least a 29-point change must occur from test to retest before the changes can be attributed to a patient's treatment

• The HDI is not very responsive to change


Commonly encountered statistical tests

• Statistical tests determine the probabilities associated with relationships in studies
  – Are the results real or merely due to chance?

• t-test, ANOVA, and chi-square are common in journal articles
  – Familiarity with these tests is helpful in the appraisal of articles


t-test

• Used to find out whether the means of two groups are statistically different

• Results are not entirely black-and-white
  – Only indicates that the means are probably different
  – Or, that they are probably the same, if the study fails to find a difference

• The t-test can be used for a single group by comparing the mean with known values


t-test (cont.)

• The actual difference between means is considered

• Also the amount of variability of the scores
  – A high degree of variability of group scores can obscure the differences between means


t-test (cont.)

• The differences between means are the same in both examples, but the variability of group scores differs

• The lower example would be much more likely to reach statistical significance because of the narrow spread


Assumptions of the t-test

• The data should be normal and involve interval or ratio measurement

• Groups should be independent

• The variances of groups should be equal

• When the sample size is large enough (about 30 subjects) violations of these assumptions are less important


Alternatives to the t-test

• The t-test for unequal variances

• Non-parametric tests for use with skewed data
  – Mann-Whitney U test
  – Wilcoxon test


The t-score

• The t-score (a.k.a., t-ratio) is similar to the z-score
  – However, the t-distribution and a t-table are used
  – This is because the SD of the population is estimated from the sample, whereas it is known in the z-distribution

• P values are found using the calculated t-score and a t-table


The t-score (cont.)

• t-tables consider the number of subjects in the groups

• Referred to as degrees of freedom (df)
  – Signifies the number of subjects in each group minus 1
  – Minus 2 when there are two groups
  – Thus, a study that compares the means of 2 groups that involve 30 subjects in total has 28 df


The t-table

• t-distributions eventually become nearly normal when many subjects are included
  – As a result, t-tables usually only go to 100 df

• Alpha levels are shown for
  – When α is all in one tail (α1 or one-tailed test)
  – When α is split between the tails (α2 or two-tailed test)


t-table showing critical values for t (only to 15 df)

α1    0.10    0.05    0.025   0.01    0.005   0.001    0.0005
α2    0.20    0.10    0.05    0.02    0.01    0.002    0.001
df
1     3.078   6.314   12.71   31.82   63.66   318.30   636.62
2     1.886   2.920   4.303   6.965   9.925   22.330   31.60
3     1.638   2.353   3.182   4.541   5.841   10.210   12.92
4     1.533   2.132   2.776   3.747   4.604   7.173    8.610
5     1.476   2.015   2.571   3.365   4.032   5.893    6.869
6     1.440   1.943   2.447   3.143   3.707   5.208    5.959
7     1.415   1.895   2.365   2.998   3.499   4.785    5.408
8     1.397   1.860   2.306   2.896   3.355   4.501    5.041
9     1.383   1.833   2.262   2.821   3.250   4.297    4.781
10    1.372   1.812   2.228   2.764   3.169   4.144    4.587
11    1.363   1.796   2.201   2.718   3.106   4.025    4.437
12    1.356   1.782   2.179   2.681   3.055   3.930    4.318
13    1.350   1.771   2.160   2.650   3.012   3.852    4.221
14    1.345   1.761   2.145   2.624   2.977   3.787    4.140
15    1.341   1.753   2.131   2.602   2.947   3.733    4.073
… Etc., to 100 df

Critical value for 10 df and α2 = 0.05: 2.228


One-tailed test vs. two-tailed test

• One-tailed test (a.k.a., directional test)
  – Alpha is all in one tail
  – The researcher specifies the direction the test results will go before the data analysis
    • Either higher or lower

• Two-tailed test (a.k.a., non-directional test)
  – Alpha is split between the tails
  – The study's results could go either way


One-tailed test vs. two-tailed test (cont.)

• In a non-directional test, the researcher wants to know if the means are different
  – For example, in a study comparing manipulation with acupuncture for tension headaches, the results could go either way
  – That is the case with almost all studies that compare treatments


One-tailed test vs. two-tailed test (cont.)

• It is easier to reach statistical significance using a directional test
  – Consequently it is tempting for researchers to use directional hypotheses

• The opposite direction must be of no interest to the researcher
  – But it is almost always possible for the test to go either way when comparing treatments


Calculating the t-score

• Is a ratio of the difference between group means and the variability of the data

• Variability is represented by the standard error of the difference ($S_{\bar{X}_1 - \bar{X}_2}$) rather than the SD

• Thus

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$

or

$t = \dfrac{\text{The difference between group means}}{\text{Variability of the data}}$


The t-score

• For the t-test result to be statistically significant
  – The difference between the means must be large (the numerator)
  – And the variability of the data must be small (the denominator)

• This results in a t-score that is larger than the critical value of t in the t-table


The t-score (cont.)

• Remember: Big t-value → Small P value → Statistical significance


Steps involved in the t-test

• Calculate the means and standard deviations of the groups' outcomes

• Calculate the t-ratio

• Check to see if the calculated t is statistically significant using a t-table

• It is significant if t is greater than the critical value of t at the 0.05 level

• If so, the group means are considered different
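
The steps above can be sketched as follows. The outcome scores are hypothetical, and the pooled-variance formula for the standard error of the difference is an assumption of this sketch (the slides do not spell it out); it applies when group variances are roughly equal.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Return (t, df) for an equal-variance independent-groups t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Step 1: means and variances (squared SDs) of each group
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Step 2: t-ratio = difference between means / SE of the difference
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


control = [15, 17, 14, 16, 18, 15]  # hypothetical outcome scores
treated = [10, 12, 9, 11, 13, 11]   # hypothetical outcome scores
t, df = independent_t(control, treated)

# Steps 3-4: compare with the critical value from the t-table
T_CRIT = 2.228  # 10 df, two-tailed alpha of 0.05
significant = abs(t) > T_CRIT
print(round(t, 2), df, significant)
```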


Reporting t-test


Paired t-test

• Groups are dependent
  – The same subjects are in each of the groups
    • e.g., repeated measures studies
  – Or subjects are matched
    • e.g., twins or when subjects are very much alike


Analysis of variance (ANOVA)

• Used to compare means when more than two groups are involved

• Repeating t-tests increases the probability of producing a Type I error

• ANOVA can only compare one outcome variable
  – Univariate


ANOVA (cont.)

• ANOVA provides information about
  – Whether there are any significant differences among the group means
  – Whether any of the particular groups differ from each other
  – Whether the differences are relatively big or small


The F-ratio

• Not unlike the t-ratio, the F-test compares the variance between the groups with the variance within the groups

$F = \dfrac{\text{variance between groups}}{\text{variance within the groups}}$


The F-ratio (cont.)

• Within-group variance is related to sampling error and ordinary differences between subjects
  – For instance, many physical characteristics vary normally (e.g., cortisol levels, pulse rate, and blood pressure)

• Between-group variance is related to the differences between the means


Between and within-group variance

The means of 3 groups are compared


The F-ratio (cont.)

• If the F-ratio is small, the groups are probably not significantly different

• If it is big, at least two of the groups are significantly different

• The F-test does not identify which of the groups are different
  – Comparison tests are necessary


Comparison tests

• Compare the group pairs (pairwise)

• Common comparison tests include
  – Tukey
    • Used if the groups are of equal size
  – Bonferroni
    • For both equal and unequal group sizes
  – Scheffé
    • Is very conservative to minimize the risk of Type I error


Microsoft Excel has several built-in statistical functions

• Excel can be used to calculate ANOVA, t-ratio, and others

• Select Data Analysis from the Tools menu and input the data


ANOVA table

• There is not much difference between the means of the medical and PT groups (6.75 and 6.38)

• But the mean of the chiropractic group (13.63) appears to be different

Anova: Single Factor

SUMMARY

Groups               Count   Sum   Mean     Variance
Chiropractic care    8       109   13.625   8.55357
Medical care         8       54    6.75     7.92857
PT care              8       51    6.375    11.125


ANOVA table (cont.)

• The calculated value of F (14.5) exceeds the critical value of F (3.5), so the group means are different overall (P < 0.001)

ANOVA

Source of Variation   SS         df   MS         F         P value   F crit
Between Groups        266.5833   2    133.2917   14.4845   0.00011   3.4668
Within Groups         193.25     21   9.2024
Total                 459.8333   23

SS = sum of squares; MS = mean squares; F crit = the critical value of F
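
As a check on how the table is built, the sums of squares, mean squares, and F can be recomputed from the group counts, means, and variances in the SUMMARY output. This is a sketch of the standard one-way ANOVA arithmetic, not of Excel's internal routine; the function and key names are illustrative.

```python
def one_way_anova(counts, means, variances):
    """Build the main ANOVA table entries from per-group summary statistics."""
    n_total = sum(counts)
    grand_mean = sum(n * m for n, m in zip(counts, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(counts, means))
    ss_within = sum((n - 1) * v for n, v in zip(counts, variances))
    df_between = len(counts) - 1
    df_within = n_total - len(counts)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Chiropractic, medical, and PT groups from the SUMMARY table
result = one_way_anova(
    counts=[8, 8, 8],
    means=[13.625, 6.75, 6.375],
    variances=[8.55357, 7.92857, 11.125],
)
print({key: round(value, 4) for key, value in result.items()})
```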


Comparison test results

Tukey HSD

(I) Type of care   (J) Type of care   Difference (I-J)   Std. Error   P value   95% Confidence Interval
Chiro              MD                 6.87500*           1.51677      .001      3.0519 to 10.6981
                   PT                 7.25000*           1.51677      .000      3.4269 to 11.0731
MD                 Chiro              -6.87500*          1.51677      .001      -10.6981 to -3.0519
                   PT                 .37500             1.51677      .967      -3.4481 to 4.1981
PT                 Chiro              -7.25000*          1.51677      .000      -11.0731 to -3.4269
                   MD                 -.37500            1.51677      .967      -4.1981 to 3.4481

* The mean difference is significant at the .05 level.
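
The Difference and Std. Error columns can be reproduced from the ANOVA output. The sketch assumes the standard error of a pairwise difference is computed from the within-groups mean square (9.2024) and the two group sizes, which matches the reported 1.51677; the P values and CI limits need the studentized range distribution and are not computed here.

```python
import math


def se_of_difference(ms_within, n_i, n_j):
    """Standard error of the difference between two group means."""
    return math.sqrt(ms_within * (1 / n_i + 1 / n_j))


se = se_of_difference(9.2024, 8, 8)
chiro_vs_md = 13.625 - 6.75  # difference between group means (I - J)
print(round(se, 5), chiro_vs_md)
```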


Assumptions of ANOVA test

• Normally distributed data

• Groups should be independent

• Variances of groups should be equal

• If not, a nonparametric test should be used
  – Kruskal-Wallis test
    • When variances are unequal
  – Friedman test
    • When paired groups are involved


Chi-square test

• Used to test hypotheses involving categorical data

• There are 2 versions
– Chi-square goodness of fit
  • Determines if observed frequencies of occurrence differ from what would be expected by chance
– Chi-square test of independence
  • Tests to see if frequencies for one category differ significantly from those of another category


Chi-square goodness of fit

• Called the goodness of fit test because it tests whether observed frequencies “fit” against the expected frequencies

• For example
– If a sample of Americans found 60 males and 40 females, would that be statistically significantly different from what would normally be expected (50/50)?


Goodness of fit example (cont.)

• Chi-square (Χ²) takes the difference between the observed and expected frequencies in each category, squares it, divides by the expected frequency, and then sums the results across categories to generate the Χ² statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequencies and E = expected frequencies


Goodness of fit example (cont.)

• Following are the calculations for the 100 Americans example

$$\chi^2 = \frac{(60 - 50)^2}{50} + \frac{(40 - 50)^2}{50} = 2.0 + 2.0 = 4.0$$

Here 60 and 40 are the observed frequencies, and 50 is the expected frequency in each category.


Goodness of fit example (cont.)

• A chi-square table is used to see if the results are statistically significant
– Only if the critical value is exceeded (3.84 in this case)
• df is the number of categories minus 1
• The calculated Χ² is 4
– So, the sample is different from what was expected
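The goodness of fit calculation for the 100 Americans example can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the P value shortcut using `math.erfc` is valid only for 1 degree of freedom, which is the case here (2 categories minus 1).

```python
import math


def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_df1(chi2):
    """Upper-tail P value of the chi-square distribution, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2.0))


observed = [60, 40]   # males, females found in the sample
expected = [50, 50]   # 50/50 split expected by chance

chi2 = chi_square_gof(observed, expected)
p_value = chi_square_p_value_df1(chi2)
critical_value = 3.84  # 0.05 level, df = 1, from a chi-square table

print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
print("significant" if chi2 > critical_value else "not significant")
```

Comparing the statistic with the tabled critical value and comparing the P value with 0.05 lead to the same decision.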


Chi-square test of independence

• Frequencies of one variable are compared with another to see if they differ significantly

• A 2 X 2 contingency table (a.k.a., cross-tabulation table) is used


A 2 X 2 contingency table

| Variable 1 | Variable 2: Yes | Variable 2: No | Row Total |
|---|---|---|---|
| Yes | a | b | a+b |
| No | c | d | c+d |
| Column Total | a+c | b+d | a+b+c+d (Grand Total) |


Example hypothetical study

• Two groups of patients are treated using different spinal manipulation techniques
– Gonstead vs. Diversified
• The presence or absence of pain after treatment is the outcome measure
• Two categorical variables
– Technique used
– Pain after treatment


Gonstead vs. Diversified example - Results

| Technique | Pain after treatment: Yes | Pain after treatment: No | Row Total |
|---|---|---|---|
| Gonstead | 9 | 21 | 30 |
| Diversified | 11 | 29 | 40 |
| Column Total | 20 | 50 | 70 (Grand Total) |

9 out of 30 (30%) still had pain after Gonstead treatment and 11 out of 40 (27.5%) still had pain after Diversified, but is this difference statistically significant?


Gonstead vs. Diversified example (cont.)

• Calculating Χ²
• First find the expected values for each cell

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

• Then calculate the Χ² statistic using the cells’ expected (E) values and the previously provided Χ² formula


Gonstead vs. Diversified example (cont.)

• To find E for cell a (and similarly for the rest)

| Technique | Pain after treatment: Yes | Pain after treatment: No | Row Total |
|---|---|---|---|
| Gonstead | 9 | 21 | 30 |
| Diversified | 11 | 29 | 40 |
| Column Total | 20 | 50 | 70 (Grand Total) |

Multiply the row total times the column total, then divide by the grand total


Gonstead vs. Diversified example (cont.)

• Find E for all cells

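| Technique | Pain after treatment: Yes | Pain after treatment: No | Row Total |
|---|---|---|---|
| Gonstead | 9 (E = 30*20/70 = 8.6) | 21 (E = 30*50/70 = 21.4) | 30 |
| Diversified | 11 (E = 40*20/70 = 11.4) | 29 (E = 40*50/70 = 28.6) | 40 |
| Column Total | 20 | 50 | 70 (Grand Total) |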


Gonstead vs. Diversified example (cont.)

• Use the Χ² formula with each cell and then add them together

$$\chi^2 = \frac{(9 - 8.6)^2}{8.6} + \frac{(21 - 21.4)^2}{21.4} + \frac{(11 - 11.4)^2}{11.4} + \frac{(29 - 28.6)^2}{28.6}$$

Χ² = 0.0186 + 0.0075 + 0.0140 + 0.0056 = 0.0457

(With unrounded expected values, Χ² = 0.0525.)


Gonstead vs. Diversified example (cont.)

• Find df and then consult a Χ² table to see if the result is statistically significant
– df = (number of categories for variable 1 − 1) × (number of categories for variable 2 − 1)
• There are two categories for each variable in this case, so df = (2 − 1) × (2 − 1) = 1
• Critical value at the 0.05 level and one df is 3.84
– Therefore, Χ² is not statistically significant
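The full test of independence for the Gonstead vs. Diversified table can be sketched as follows. This is a minimal illustration with a hypothetical function name; it keeps the expected values unrounded, so the statistic can differ slightly from a hand calculation that rounds E to one decimal place, although the conclusion is the same.

```python
def chi_square_independence(table):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count for each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Rows: Gonstead, Diversified. Columns: pain after treatment (Yes, No).
observed = [[9, 21], [11, 29]]

chi2, df, expected = chi_square_independence(observed)
critical_value = 3.84  # 0.05 level, df = 1, from a chi-square table

print(f"chi-square = {chi2:.4f}, df = {df}")
print("significant" if chi2 > critical_value else "not significant")
```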


Χ² required conditions

• Observations must be independent
– The total number of observed frequencies should not be higher than the number of subjects in the study
• No small expected frequencies
– Expected frequencies less than one, or less than five in more than 20 percent of cells, are too small


Χ² requirements (cont.)

– Fisher's exact test
  • An alternative to the chi-square test that is used when expected frequencies are too small
  • All that is needed is at least one data value in each row and one data value in each column
• No extremely small or extremely large samples
– Extremely small samples may overlook obvious false null hypotheses and extremely large samples may identify trivial differences


Correlation

• A measure of mathematical relationships that may exist between two or more variables
– i.e., if one variable increases or decreases, the other one will also increase or decrease a specific amount

• Pearson’s correlation coefficient (r) is commonly used


Correlation (cont.)

• Correlation coefficient values range from -1 to +1
– +1 = perfect positive correlation
– -1 = perfect negative correlation

• The closer r is to +1 or -1, the more closely variables are related


Correlation (cont.)

• Positive r values
– Variables tend to go up or down together
• Negative r values
– Variables tend to go up and down in opposition
• An r value of 0
– There is no linear relationship between variables


Correlation (cont.)

• The units of measurement that are used do not affect correlation coefficient calculations
– e.g., height and weight results will be the same whether inches and pounds or centimeters and kilograms are used in the calculation
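The unit independence of Pearson's r can be checked directly. The sketch below uses made-up height and weight values purely for illustration (they do not come from any study), and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in inches, weights in pounds.
heights_in = [62, 65, 68, 70, 73]
weights_lb = [120, 140, 155, 170, 190]

# The same people measured in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)

print(f"r (in, lb) = {r_imperial:.4f}")
print(f"r (cm, kg) = {r_metric:.4f}")
```

Both calls give the same r (apart from floating-point rounding) because converting units only rescales each variable, and r is unchanged by rescaling.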


No cause-and-effect

• A strong relationship between two variables does not mean that one caused the other to change

• For instance, there is a strong relationship between coffee drinking and developing lung cancer
– Actually, heavy coffee drinkers tend to be heavy smokers
– Smoking is the actual cause


Scatterplots

• An X-Y graph with symbols that represent the values of two variables

[Figure: scatterplot with the regression line labeled]


Examples

[Figure: two scatterplots. Positive correlation slopes upward; negative correlation slopes downward]


Examples (cont.)

[Figure: scatterplot showing no correlation]


Scatterplots (cont.)

• Show the form, direction, and strength of the relationship between variables

• Its form may be linear, but can also be curvilinear or nonlinear

• A correlation weakens after a certain point when data is curvilinear


Curvilinear example

• As people age they get stronger to a certain point, but as they continue to age, they eventually begin to weaken


Outliers

• Extreme values that are located far away from the group of data on a scatterplot

• Outliers can strongly influence the slope of the regression line
– And the value of the correlation coefficient
• Authors should adequately discuss outliers
– Why they occurred
– How they were dealt with


Outliers (cont.)

• Outliers are obvious on a scatterplot

[Figure: scatterplot with an outlier labeled]


Coefficient of determination

• Is the correlation coefficient squared
– Symbolized as r²
• Only non-negative values are possible (because it is squared)
– Ranging from 0 to 1

• Denotes how much of the variation in one variable can be explained by the other variable


Coefficient of determination

• Example
– If a study on the relationship between the amount lifted at work and the incidence of low-back pain reported r² as 0.65
– One could say that 65% of the variability in the incidence of low-back pain was explained by the amount workers lifted
– Other factors are responsible for the remaining 35% variability


Regression

• Regression analysis
– Calculation of the line of best fit passing through a set of data
– An equation is generated that describes the line of best fit (a.k.a., least squares line)

• Using the equation, predictions can be made about the direction and amount variables change


Regression (cont.)

• A regression line is fitted by minimizing the sum of squared deviations of the data points from the least squares line

• The regression equation is Y = a + bX, where
– a is the Y intercept
– b is the slope of the line
– X is the value of the (predictor) variable
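A least squares fit of Y = a + bX can be sketched with the standard closed-form formulas for the slope and intercept. The data below are invented so that the points lie exactly on a known line, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of X and Y divided by the variation of X.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the line passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
predicted = a + b * 6  # predict Y for a new value X = 6

print(f"a = {a}, b = {b}, predicted Y at X = 6: {predicted}")
```

With real data the points scatter around the line, and the same formulas give the line that minimizes the sum of squared deviations.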


Regression (cont.)

• The value of Y can be calculated from a given value of X

[Figure: regression line plotted on X and Y axes, with a and b labeled]


The regression line

• The line is positioned so that the distances of all deviations are as short as possible


Multiple regression

• Frequently outcomes are affected by more than one predictor variable

• The multiple regression equation is similar to simple regression, but with more than one value for b. Thus, the equation is

$$Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$$

where $X_1$ is the first predictor variable, $X_2$ is the second, and so on through $X_k$ for as many predictor variables as are involved
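Once the intercept and the b values are known, using the multiple regression equation is a matter of summing the products. The sketch below only evaluates the equation; it does not estimate the coefficients, and the numbers and the function name `predict` are hypothetical.

```python
def predict(a, b, x):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... + bk*Xk."""
    if len(b) != len(x):
        raise ValueError("need one b value per predictor variable")
    return a + sum(bi * xi for bi, xi in zip(b, x))


# Hypothetical intercept, slopes, and predictor values (k = 3).
a = 2.0
b = [0.5, -1.0, 3.0]
x = [4.0, 1.5, 2.0]

y_hat = predict(a, b, x)
print(f"predicted Y = {y_hat}")
```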