
Statistical Writing*

Tables and Figures

Sven Sandin, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm

Scope

Tables and figures - General comments

The primary table: table 1

The work flow

Figure presentations to use and to avoid

Presentations: Figures & Tables

Summarize and focus results

Facilitate reproducing results

Help the reader interpret the RESULTS - avoid busy tables; not all data are interesting

Tables and figures must be able to stand by themselves

Title - short and clear

Footnotes explaining ALL abbreviations

The underlying model must be clear

Categorical covariates

p-value: what is the hypothesis?

Presentations: "Primary" table

Allow comparison of treatments (exposures)

Ideally (randomized) these should be "similar" ....

One column for each treatment

One row for each covariate

Confounding ...

Effect modification - sub-tables

[Diagram labels: Treatment, Outcome, Confounding covariate; Table column, Table row]

EXAMPLE: "Primary" table

Trolle-Lagerros, Y., Mucci, L. A., Kumle, M., Braaten, T., Weiderpass, E., Hsieh, C.-C., Sandin, S. … Adami, H.-O. (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785.

EXAMPLE: Table summarizing results

Trolle-Lagerros, Y., Mucci, L. A., Kumle, M., Braaten, T., Weiderpass, E., Hsieh, C.-C., Sandin, S. … Adami, H.-O. (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785.

Presentations: "Primary" table

Generally, don't test for baseline differences!

If important ----> already in the model ----> no need to test!

If not important ----> p-value not important ----> no need to test!

Not known ----> no need to test!

p-values vs estimates ----> no need to test! Estimate!

Risk of confusing strength of association with importance

Inflation of the overall significance level ...
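The inflation of the overall significance level can be illustrated with a small calculation. This is a minimal sketch that assumes the baseline tests are independent and each run at the 5% level; the function name familywise_error is made up for the illustration.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one 'significant' result among k independent tests
    when there is no true difference at all (independence is an assumption)."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:>2} baseline tests: P(at least one p < 0.05) = {familywise_error(k):.2f}")
```

With ten baseline covariates tested this way, a "significant" difference somewhere in the table is close to a coin flip even under perfect randomization.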

Presentations: Table work process

One-to-one relation

Data ----> Computer program ---> Table results

Method

Don't point-and-click (choice of software)

Rerun all results each time ... or use a log book

In your draft: make notes about source, date ...

Reproducibility !
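The chain Data ----> Computer program ----> Table results can be sketched as a script that rebuilds the whole table from the raw data on every run and stamps it with its source. The data, the file name, the run date and the function name build_table are all hypothetical and only illustrate the idea.

```python
import hashlib

# Hypothetical raw data: (treatment, age) records standing in for a real data file.
RAW_ROWS = [("A", 54), ("A", 61), ("B", 58), ("B", 49), ("B", 66)]

def build_table(rows, source, run_date):
    """Regenerate the whole table from the raw rows; nothing is typed in by hand."""
    groups = {}
    for treatment, age in rows:
        groups.setdefault(treatment, []).append(age)
    lines = ["treatment  n  mean_age"]
    for treatment in sorted(groups):
        ages = groups[treatment]
        lines.append(f"{treatment:<9}  {len(ages)}  {sum(ages) / len(ages):.1f}")
    # Fingerprint of the input, so the table can be traced back to the exact data used.
    digest = hashlib.sha256(repr(sorted(rows)).encode()).hexdigest()[:8]
    lines.append(f"Source: {source}; data fingerprint: {digest}; run date: {run_date}")
    return "\n".join(lines)

table = build_table(RAW_ROWS, source="cohort_v1.csv (hypothetical)", run_date="2024-01-31")
print(table)
```

Because the table is a pure function of the data, rerunning the script is the log book: the same input always gives the same table, and a changed fingerprint signals that the data changed.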

Presentations: Tables

Layout

Decimals

Avoid using shading and colors

Measures

The number of missing values must be clear

Survival-type analysis: person-years is the relevant measure

Binary data: show one of the proportions, e.g. males

Continuous

Mean or median (both, to show symmetry)

Q1 and Q3, or P10 and P90 etc., instead of Min and Max

SD is not useful for asymmetric data
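The measures listed for a continuous covariate can be computed in a few lines. This is a minimal sketch using only the Python standard library; the data are made up, None is assumed to mark a missing value, and the "inclusive" quartile definition is one of several in use, so other software may give slightly different Q1 and Q3.

```python
import statistics

def describe(values):
    """Summarize one continuous covariate; None marks a missing value."""
    observed = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": median,
        "q1": q1,
        "q3": q3,
    }

# Hypothetical right-skewed measurements with two missing values.
example = [1, 2, 2, 3, 3, 4, 5, 8, 20, None, None]
summary = describe(example)
print(summary)
```

Here the mean lies well above the median, which is the signal of asymmetry that reporting both is meant to reveal.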

Presentations: Figures

Figures - examples

Continuous - Box plots

Ordinal - Segmented bar charts

Agreement - Altman Bland

Interactions

Confidence intervals

Bar charts with SD errors and other things to avoid

Figures - Box plot

Qualities

Meaningful for any continuous data

Efficient when comparing several groups

Minimizes data reduction

Interpretation

Half of the data between Q1 and Q3

Half above and half below the median

Difference between mean and median indicates lack of symmetry

Whiskers extend to ... ? Tukey or percentiles

Outliers
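The numbers behind a box plot with Tukey whiskers can be sketched as follows. The 1.5 x IQR rule used here is the common Tukey convention and is an assumption about what the slide means by "Tukey"; the data and the function name tukey_box are made up, and quartile definitions differ slightly between software packages.

```python
import statistics

def tukey_box(values):
    """Box-plot statistics with Tukey whiskers (1.5 x IQR rule)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),    # most extreme observations within the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

box = tukey_box([1, 2, 2, 3, 3, 4, 5, 8, 20])
print(box)
```

With percentile whiskers (for example P10 and P90) the box is unchanged and only the whisker ends and the set of points drawn individually differ, which is why the figure legend should say which rule was used.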

Figures - Box plot

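# Simulated data: 10 groups of 100 observations with increasing means
# (n is not defined on the slide; n = 10 is assumed to match gl(10, ...))
n <- 10
g <- gl(n, 100, n * 100)
x <- rnorm(n * 100) + sqrt(as.numeric(g))
boxplot(split(x, g), notch = TRUE)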

Figures - Box plot

Wilcoxon rank sum test

Figures - Bar chart ± SD

t-test, assuming symmetric data

Bar chart with SD errors

Often misinterpreted: groups are read as "different" or "not different" depending on whether the error bars overlap

Why ± 1 SD? It is 1.96 (or 2) times the SD that is relevant

A lot of ink to represent one (two) numbers: mean and SD

Assumes symmetry and a normal distribution

Use the box plot instead !
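A small numerical illustration of the points above, using made-up right-skewed data. It is only a sketch of why mean ± SD can mislead; the values are hypothetical.

```python
import statistics

# Hypothetical right-skewed, strictly positive measurements.
values = [1, 2, 2, 3, 3, 4, 5, 8, 20]

mean = statistics.mean(values)
sd = statistics.stdev(values)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(f"bar chart: mean {mean:.1f}, mean - SD {mean - sd:.1f}, mean + SD {mean + sd:.1f}")
print(f"box plot : median {median:.1f}, Q1 {q1:.1f}, Q3 {q3:.1f}")

# Dropping the single largest value changes the SD a lot, the quartiles only a little.
trimmed = values[:-1]
sd_trimmed = statistics.stdev(trimmed)
q1_t, median_t, q3_t = statistics.quantiles(trimmed, n=4, method="inclusive")
print(f"without the largest value: SD {sd_trimmed:.1f}, median {median_t:.1f}")
```

The lower error bar falls below zero although every observation is positive, and one observation accounts for more than half of the SD, while the median and quartiles barely react.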

Bar chart vs Box plot

Box plots

Qualities

Meaningful for any continuous data

Efficient when comparing several groups

Minimizes data reduction

Interpretation

Half of the data between Q1 and Q3

Half above and half below the median

Difference between mean and median indicates lack of symmetry

Outliers

Bar chart ± SD

Qualities

NOT for any continuous data

NOT efficient when comparing several groups

BIG data reduction

Interpretation

?

?

?

Can't evaluate lack of symmetry

Extremely sensitive to single outliers

Figures - Ordinal Scale

What is an ordinal scale?

Binary?

Nominal?

What do we want to achieve?

Summarize data without reducing it

Evaluate the distribution - also cumulative

Change in distributions

Avoid problems with scattered tables

Integrated part of the statistical analysis - test
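The distribution and cumulative distribution of an ordinal variable, which a segmented bar chart displays, can be tabulated as in the sketch below. The four-level scale and the responses are hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ordinal scale, listed in its natural order.
LEVELS = ["none", "mild", "moderate", "severe"]

def ordinal_distribution(responses):
    """Proportion and cumulative proportion per category, in scale order."""
    counts = Counter(responses)
    n = len(responses)
    proportions = [counts[level] / n for level in LEVELS]
    cumulative = list(accumulate(proportions))
    return list(zip(LEVELS, proportions, cumulative))

group_a = ["none"] * 5 + ["mild"] * 3 + ["moderate"] * 1 + ["severe"] * 1
for level, p, cum in ordinal_distribution(group_a):
    print(f"{level:<9} {p:.2f} {cum:.2f}")
```

Every category keeps its own proportion, so nothing is reduced to a single number, and the cumulative column is what one reads off the segment boundaries of a segmented bar.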

Figures - Ordinal Scale

[Figure. Groups: IVF fresh; IVF frozen; ICSI fresh; ICSI frozen; ICSI fresh, surgery; ICSI frozen, surgery. Group sizes as extracted: N=12,775; N=9,457; N=142; N=1,699; N=6,886]

Wilcoxon rank sum test
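For two groups measured on an ordinal scale, the rank sum test named above can be sketched by hand. This version uses midranks for ties and a normal approximation with tie correction but no continuity correction, so its p-value may differ slightly from what a statistics package reports. The scores are hypothetical, and the sketch does not handle the degenerate case where all values are tied.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney) test, normal approximation with tie correction."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        t = j - i                          # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return u, z, p

# Hypothetical ordinal scores (1 = lowest category) in two groups.
u, z, p = rank_sum_test([1, 1, 2, 2, 3], [2, 3, 3, 4, 4])
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```

The test uses only the ordering of the categories, which is why it fits an ordinal scale where a t-test on the category codes would not.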

Figures - Ordinal Scale

Trolle-Lagerros, Y., Mucci, L. A., Kumle, M., Braaten, T., Weiderpass, E., Hsieh, C.-C., Sandin, S. … Adami, H.-O. (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785.

Figures - Interaction

Trolle-Lagerros, Y, Mucci, LA, Kumle, M, Braaten, T, Weiderpass, E, Hsieh, CC, Sandin, S … Adami, HO (2005). Physical activity as a determinant of mortality in women. Epidemiology, 16(6), 780–785

Figure - Confidence intervals

Figure - Confidence intervals on log scale

Sandin, S, Nygren, KG, Iliadou, A, Hultman, CM, Reichenberg, A (2013). Autism and mental retardation among offspring born after in vitro fertilization. JAMA, 310(1), 75–84

Figure - Confidence intervals on log scale

Knight, A, Sandin, S, Askling, J (2010). Occupational risk factors for Wegener’s granulomatosis: a case-control study. Annals of the Rheumatic Diseases, 69(4), 737–740

Figure - Confidence intervals

Yang, L, Lof, M, Veierød, MB, Sandin, S, Adami, HO, Weiderpass, E (2011). Ultraviolet exposure and mortality among women in Sweden. Cancer Epidemiology, Biomarkers & Prevention, 20(4), 683–690

Figure - Confidence intervals

Knight, A, Sandin, S, & Askling, J (2010). Increased risk of autoimmune disease in families with Wegener’s granulomatosis. The Journal of Rheumatology, 37(12), 2553–2558

Figure - Confidence intervals

Estimates with overlapping CIs can still be statistically significantly different

Scale: ratio (log) vs absolute (linear)

Tables with several comparisons can be hard to digest

Efficient in picking out single effects

Efficient in picking out statistically significant results
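Two of the points above can be checked with a short calculation: overlapping intervals do not imply a non-significant difference, and an interval for a ratio is symmetric on the log scale only. The sketch assumes two independent, approximately normal estimates on the log scale; the ratios and standard errors are made up.

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal distribution

def ci(estimate, se):
    """95% confidence interval on the scale of the estimate."""
    return estimate - Z * se, estimate + Z * se

# Hypothetical log ratios (for example log hazard ratios) with standard errors.
log_ratio_a, se_a = math.log(1.00), 0.10
log_ratio_b, se_b = math.log(1.40), 0.10

ci_a, ci_b = ci(log_ratio_a, se_a), ci(log_ratio_b, se_b)
overlap = ci_a[1] > ci_b[0]

# Test of the difference between independent estimates: variances add.
diff = log_ratio_b - log_ratio_a
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = math.erfc(abs(diff / se_diff) / math.sqrt(2))

# Back-transformed interval: symmetric on the log scale, asymmetric on the linear scale.
ratio_b_ci = tuple(math.exp(limit) for limit in ci_b)

print(f"intervals overlap: {overlap}, p for the difference: {p_diff:.3f}")
print(f"ratio B: 1.40 ({ratio_b_ci[0]:.2f} to {ratio_b_ci[1]:.2f})")
```

The intervals overlap because each one uses its own standard error, whereas the test of the difference uses the smaller combined standard error. Plotting ratios on a log axis keeps the interval visually centred on the estimate.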

Figures - Altman Bland

The problem

In a lab we have just bought a new robot. It is expected to be a lot more accurate than the old one.

Can we just start using it, or do we need to evaluate it? How?

There are two variables measuring the effect of disease.

Can they be used interchangeably?

Figures - Altman Bland

The problem

Compare two methods

What is our best guess of the truth ?

X and X-Y correlated

Y and X-Y correlated

The Figure

Calculate the mean of X and Y

Calculate the difference X-Y

Plot Mean vs Difference

Draw reference line at D=0

Mean and Difference uncorrelated
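The quantities behind the plot can be computed as in the sketch below. The 95% limits of agreement (mean difference ± 1.96 SD of the differences) are not on the slide; they are the usual companion to this plot and are added here as an assumption. The paired measurements and the function name bland_altman are made up.

```python
import statistics

def bland_altman(x, y):
    """Per-pair mean and difference, plus bias and 95% limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired measurements from an old (x) and a new (y) method.
old = [10.1, 12.3, 9.8, 15.2, 11.0, 13.4]
new = [9.9, 12.0, 10.1, 14.8, 10.7, 13.5]
means, diffs, bias, limits = bland_altman(old, new)

print(f"bias = {bias:.2f}, limits of agreement = ({limits[0]:.2f}, {limits[1]:.2f})")
# The figure is a scatter plot of `means` (horizontal) against `diffs` (vertical),
# with a reference line at D = 0.
```

Plotting the difference against the mean, rather than against X or Y alone, is what avoids the built-in correlation between X and X-Y noted on the slide.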

EXAMPLE : Altman-Bland

Bexelius, C, Löf, M, Sandin, S, Trolle Lagerros, Y, Forsum, E, Litton JE (2010). Measures of physical activity using cell phones: validation using criterion methods. Journal of Medical Internet Research, 12(1)
