
# Statistical Inference for more than two groups

Peter T. Donnan, Professor of Epidemiology and Biostatistics. Statistics for Health Research.

Jan 04, 2016

• Tests to be covered

Chi-squared test

One-way ANOVA

Logrank test

• Significance testing general overview

Define the null and alternative hypotheses under study

Acquire data

Calculate the value of the test statistic

Compare the value of the test statistic to values from a known probability distribution

Interpret the p-value and draw conclusion

• Categorical data > 2 groups

Unordered categories (nominal): chi-squared test for association

Ordered categories (ordinal): chi-squared test for trend

• Example

Does the proportion of mothers developing pre-eclampsia vary by parity (birth order)?

• Contingency table (r x c)

| Birth order | No pre-eclampsia | Pre-eclampsia |
|---|---|---|
| 1st | 1170 (79.4%) | 304 (20.6%) |
| 2nd | 278 (84.8%) | 50 (15.2%) |
| 3rd | 83 (86.5%) | 13 (13.5%) |
| 4th | 86 (92.4%) | 7 (7.5%) |

• Null hypotheses

Test of association: there is no association between pre-eclampsia and birth order

Test for trend: there is no trend in pre-eclampsia with parity

• Test of association and test of linear trend

• Conclusions

Strong evidence of an association between pre-eclampsia and birth order (χ² = 15.42, p = 0.001)

Significant linear trend in incidence of pre-eclampsia with parity (χ² = 15.03, p < 0.001)

Note 3 degrees of freedom for the association test and 1 df for the test for trend
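As a rough check on the two statistics quoted above, the following minimal sketch recomputes them from the counts in the birth-order table using only the Python standard library. It assumes the trend statistic is the linear-by-linear form (N - 1)r² with scores 1 to 4 for birth order; the function names are illustrative and not part of SPSS, and the p-value helper only covers 1, 2 or 3 degrees of freedom.

```python
import math

# Counts from the birth-order table; rows are birth order 1st to 4th,
# columns are (no pre-eclampsia, pre-eclampsia).
table = [(1170, 304), (278, 50), (83, 13), (86, 7)]


def chi2_association(rows):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_trend(rows, scores):
    """Linear-by-linear statistic (N - 1) * r^2 for an r x 2 table (1 df)."""
    n = sum(sum(r) for r in rows)
    sx = sum(s * sum(r) for s, r in zip(scores, rows))
    sxx = sum(s * s * sum(r) for s, r in zip(scores, rows))
    sy = sum(r[1] for r in rows)  # outcome coded 0/1, so sum(y) equals sum(y^2)
    sxy = sum(s * r[1] for s, r in zip(scores, rows))
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = sy - sy * sy / n
    return (n - 1) * cov * cov / (var_x * var_y)


def chi2_sf(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1, 2, 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("closed form implemented only for df = 1, 2 or 3")


assoc, assoc_df = chi2_association(table)
trend = chi2_trend(table, scores=[1, 2, 3, 4])
print(round(assoc, 2), assoc_df, chi2_sf(assoc, assoc_df))
print(round(trend, 2), 1, chi2_sf(trend, 1))
```

The association test uses all 3 degrees of freedom, while the trend test concentrates on a single ordered contrast, which is why it has 1 df.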

• Contingency tables (r x c)

Tables can be any size. For example, SIMD deciles by parity would be a 10 x 4 table

But with very large tables it is difficult to interpret tests of association

Crosstabulations in SPSS can give odds ratios as an option when the row or column variable has two categories

• Numerical data > 2 groups

Compare means from several groups

Single global test of difference in means

Also test for linear trend

1-way analysis of variance (ANOVA)

• Extend t-test to >2 groups, i.e. Analysis of Variance (ANOVA)

Consider scores for contribution to energy intake from fat groups, milk groups and alcohol groups

Does the mean score differ across the three categories of intake groups?

Koh ET, Owen WL. Introduction to Nutrition and Health Research. Kluwer, Boston, 2000

• One-Way ANOVA of scores

[Figure: scores by contributor to energy intake (Fat, Milk, Alcohol); n = 6 in each group; group means 4.22, 2.01 and 0.167]

• One-Way ANOVA of scores

The null hypothesis (H0) is that there are no differences in mean score across the three groups

Use SPSS One-Way ANOVA to carry out this test

• Assumptions of 1-Way ANOVA

1. Standard deviations are similar

2. Test variable (scores) is approximately Normally distributed

If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test

• Results of ANOVA

ANOVA partitions variation into Within and Between group components

Results in an F-statistic compared with values in F-tables

F = 108.6, with 2 and 15 df, p < 0.001
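The partition into between-group and within-group variation can be sketched in a few lines. This is a minimal illustration rather than the SPSS procedure: the scores are hypothetical (the raw data behind the slides are not given), the function names are made up for this example, and the tail-probability shortcut holds only when the numerator has 2 degrees of freedom, i.e. for three groups.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_sf_two_numerator_df(f_stat, df_within):
    """Upper-tail F probability, valid ONLY when the numerator df is 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


# Hypothetical scores for three groups, NOT the data behind the slides.
example = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova(example)
print(f_stat, df1, df2, f_sf_two_numerator_df(f_stat, df2))

# Tail probability for the F statistic quoted on the slide (2 and 15 df).
print(f_sf_two_numerator_df(108.6, 15))
```

With 3 groups of 6 observations the degrees of freedom are 3 - 1 = 2 between groups and 18 - 3 = 15 within groups, matching the slide.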

• Results of ANOVA

The groups differ significantly and it is clear the Fat group contributes most to energy score, with a mean of 4.22

Further pair-wise comparisons can be made (3 possible) using a multiple comparisons test, e.g. Bonferroni
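To illustrate why there are 3 possible comparisons and what a Bonferroni adjustment does, here is a small sketch. The unadjusted p-values are hypothetical and for illustration only; the adjustment shown is the simple version that multiplies each p-value by the number of comparisons and caps the result at 1.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


groups = ["Fat", "Milk", "Alcohol"]
pairs = list(combinations(groups, 2))  # k(k - 1)/2 pairs for k groups
print(len(pairs))

# Hypothetical unadjusted p-values, one per pair.
raw_p = [0.001, 0.02, 0.4]
for pair, p_adj in zip(pairs, bonferroni(raw_p)):
    print(pair, p_adj)
```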

• Example 2

Does income vary by highest level of education achieved?

• Null hypothesis and alternative

H0: no difference in mean income by education level achieved

H1: mean income varies with education level achieved

• Assumptions of 1-Way ANOVA

Standard deviations or variances are similar

Test variable (income) is approximately Normally distributed

If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test
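For readers who want to see what the Kruskal-Wallis test computes, the sketch below ranks the pooled observations, compares rank sums between groups, and applies the usual tie correction. The income values are hypothetical, the function names are illustrative, and the p-value line relies on the large-sample chi-squared approximation with k - 1 df, written out here only for three groups (2 df).

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df), with H corrected for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


# Hypothetical incomes (in thousands) for three education levels.
samples = [[12, 15, 11, 18], [22, 25, 19, 24], [30, 28, 35, 27]]
h, df = kruskal_wallis(samples)
p_value = math.exp(-h / 2)  # chi-squared upper tail, valid for df = 2 only
print(h, df, p_value)
```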

• Table of Mean income for each level of educational achievement

• Analysis of Variance Table

The F-test gives p < 0.001, showing a significant difference in mean income between levels of education

• Table of each pairwise comparison

Note the lower income for those who did not complete school compared with all other groups

All p-values are adjusted for multiple comparisons

• Summary of ANOVA

ANOVA is useful when there are a number of groups with a continuous measure summarised in each

SPSS does all pairwise group comparisons adjusted for multiple testing

Note that ANOVA is just a form of linear regression (see later)

• Extending Kaplan-Meier and logrank test in SPSS

You need to specify:

Survival time: time from surgery (tfsurg)

Status: Dead = 1, censored = 0 (dead)

Factor: Dukes stage at baseline (A, B, C, D, Unknown)

Select Compare Factor and logrank. Optionally select a plot of survival

• Implementing Logrank test in SPSS

• Select options to obtain plot and median survival

Select Compare Factor to obtain the logrank test

Select linear trend for this test

• Overall Comparisons

| Test | Chi-Square | df | Sig. |
|---|---|---|---|
| Log Rank (Mantel-Cox) | 80.534 | 1 | .000 |

The vector of trend weights is -2, -1, 0, 1, 2. This is the default.

The test for trend in survival across Dukes stage is highly significant
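The idea behind a logrank test for trend can be sketched as follows: at each event time, observed deaths in each group are compared with the number expected if all groups shared the same hazard, and the differences are combined using the trend weights. This is a minimal textbook-style sketch on hypothetical data, not the SPSS implementation, which may differ in details such as the handling of ties; the function name and data layout are assumptions made for this example.

```python
import math


def logrank_trend(times, events, groups, weights):
    """Logrank test for trend; returns a chi-squared statistic on 1 df.

    times   : follow-up time for each subject
    events  : 1 if the subject died, 0 if censored
    groups  : group index (0, 1, 2, ...) for each subject
    weights : one trend weight per group, e.g. [-2, -1, 0, 1, 2]
    """
    k = len(weights)
    score = 0.0     # sum over event times of sum_j w_j * (O_j - E_j)
    variance = 0.0  # variance of the score under the null hypothesis
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [0] * k
        deaths = [0] * k
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei == 1:
                    deaths[gi] += 1
        n, d = sum(at_risk), sum(deaths)
        if n < 2:
            continue  # a single subject at risk carries no information
        score += sum(w * (dj - d * nj / n)
                     for w, dj, nj in zip(weights, deaths, at_risk))
        mean_w = sum(w * nj for w, nj in zip(weights, at_risk)) / n
        mean_w2 = sum(w * w * nj for w, nj in zip(weights, at_risk)) / n
        variance += d * (n - d) / (n - 1) * (mean_w2 - mean_w ** 2)
    return score ** 2 / variance


# Hypothetical data: three ordered groups, survival shortening with group index.
times = [5, 6, 3, 4, 1, 2]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 1, 1, 2, 2]
stat = logrank_trend(times, events, groups, weights=[-1, 0, 1])
p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared upper tail on 1 df
print(stat, p_value)
```

Because the weights reduce the comparison to a single ordered contrast, the statistic has 1 degree of freedom regardless of the number of groups, as in the SPSS output above.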

• Interpret SPSS output

Note the logrank statistic, degrees of freedom and statistical significance (p-value).

Note in which direction survival is worst or best, and back up visual information from the Kaplan-Meier plot with median survival and 95% confidence intervals from the output.

Finally, interpret the results!

• Interpret test result in relation to median survival

| Dukes stage | Median survival (days) | Mean survival (days) |
|---|---|---|
| A | 2770 | 1978 |
| B | 1749 | 1866 |
| C | 1120 | 1304 |
| D | 375 | 646 |
| Unknown | 581 | 1297 |
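Median survival is read off the Kaplan-Meier curve as the first time at which the estimated survival probability drops to 0.5 or below. The sketch below shows one way to compute this for a single group; the follow-up times are hypothetical and the function names are illustrative, not SPSS output.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Collect everyone whose follow-up ends at time t (deaths and censored).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up in days; event = 1 for death, 0 for censored.
days = [200, 300, 400, 500, 600]
dead = [1, 0, 1, 0, 1]
curve = kaplan_meier(days, dead)
print(curve, median_survival(curve))
```

When fewer than half of a group have died by the end of follow-up, the median is not reached and the function returns `None`.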

• Output from Kaplan-Meier in SPSS

Note that SPSS gives three possible tests: Logrank, Tarone-Ware and Breslow

In general, logrank gives greater weight to later events compared to the other two tests. If all are similar, quote the logrank test.

If the results differ, quote more than one test result

• Editing SPSS output

Note that everything in the SPSS output window can be copied and pasted into Word and PowerPoint.

Double-clicking on plots also allows editing of the plot, such as changing axes, colours, fonts, etc.

• Diabetic patients LDL data

Try carrying out extended Crosstabulations and ANOVA where appropriate in the LDL data, e.g. APOE genotype

• Colorectal cancer patients: survival following surgery

Try carrying out Kaplan-Meier plots and logrank tests for other factors such as WHO Functional Performance, smoking, etc.

• Summary: extending tests to more than 2 groups

Define H0 and H1

Choose the appropriate test according to the type of variables

Interpret output carefully
