Working with Statisticians

Feb 24, 2016

Dian

Page 1: Working with Statisticians

Working with Statisticians

At some point, a statistician is likely to be asked to analyze your data. This can lead to much unhappiness.

Page 2: Working with Statisticians

STATISTICIANS COME IN MANY SHAPES AND SIZES

Page 3: Working with Statisticians
Page 4: Working with Statisticians

BUT

Page 5: Working with Statisticians
Page 6: Working with Statisticians

Data formats

• Ideally, use a normalized database with validated data entry as part of LIMS…

• But 99% of the time => Excel spreadsheet
• Some statisticians prefer to work with raw data (i.e. FCS files), but this is not common
– Scott will cover consistent annotation for raw data at another lecture

Page 7: Working with Statisticians

Basic principle #1

• Statisticians do not like Excel
– The first thing they will try to do is export to a CSV or delimited file, for import into SAS or R
– If this is difficult to do, they will not like you

Page 8: Working with Statisticians

Excel rules for happy statisticians

• 1 worksheet = 1 table
• 1 cell = 1 value
• Data?
• Metadata?
• Formatting?
• Validation?

Page 9: Working with Statisticians

1 worksheet = 1 table

• A table has column headers and a number of rows and nothing else – it is RECTANGULAR

• Do not put more than 1 table in a worksheet
• Do not use non-rectangular tables
• Example of good worksheet

Page 10: Working with Statisticians

1 worksheet = 1 table
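
The rectangularity rule can be checked mechanically once a worksheet has been exported to CSV. The following is a minimal sketch, not part of the original slides: the function name and the sample data are made up for illustration, and it assumes the first row of the export is the header.

```python
import csv
import io


def non_rectangular_rows(csv_text):
    """Return (row_number, field_count) for each data row whose width differs from the header's."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    width = len(rows[0])  # the header row defines the expected width
    return [
        (row_number, len(row))
        for row_number, row in enumerate(rows[1:], start=2)
        if len(row) != width
    ]


# Hypothetical exports: one clean table, and one worksheet holding two stacked tables.
good = "subject,tube,count\nS01,1,5321\nS02,1,4870\n"
bad = "subject,tube,count\nS01,1,5321\n\nTable 2\nS02,1\n"

print(non_rectangular_rows(good))  # no offending rows
print(non_rectangular_rows(bad))   # the blank line, the stray title and the short row
```

An empty result means every row has the same number of fields as the header, which is what a statistician's import script will expect.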

Page 11: Working with Statisticians

1 cell = 1 value

• Easy to filter by tube, sample or subject
• Easy to write validation rules or lookup table

Page 12: Working with Statisticians

1 cell = 1 value

• ID column has 3 different values
• Need to do text parsing to recover information – very error prone
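
To illustrate why this is error prone, here is a sketch of the kind of parsing a statistician ends up writing. The slides do not show the actual ID layout, so the `<subject>_<sample>_<tube>` format, the pattern and the function name below are all assumptions made for illustration.

```python
import re

# Hypothetical composite ID layout: <subject>_<sample>_<tube>, e.g. "S01_PBMC_T3".
ID_PATTERN = re.compile(r"^(?P<subject>[^_]+)_(?P<sample>[^_]+)_(?P<tube>[^_]+)$")


def split_id(composite_id):
    """Split a composite ID into its three parts, failing loudly on anything unexpected."""
    match = ID_PATTERN.match(composite_id.strip())
    if match is None:
        raise ValueError(f"Unparseable ID: {composite_id!r}")
    return match.groupdict()


print(split_id("S01_PBMC_T3"))
```

Any ID typed with a space, a hyphen or a missing part raises an error here, and each such case needs a manual decision. Three separate columns avoid the problem entirely.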

Page 13: Working with Statisticians

Data: column names

• Consistent column names across worksheets; avoid variants of the same name such as:
– Singlets/Lymphocytes
– Singlet/Lymphs
– Singlets / Lymphocytes
– Singlets/Lymphoctyes
• Use full gating path for column name
– Singlets/Lymphocytes/Viable/CD4+/CM/IFN+
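
A simple way to catch inconsistent column names before sending the file is to compare each worksheet's header against one agreed list. This is a sketch only: the function name and the worksheet names are hypothetical, and the column names are the variants listed on this slide.

```python
def unexpected_columns(expected, sheets):
    """Map each worksheet name to the column names that are not in the agreed list."""
    allowed = set(expected)
    problems = {}
    for sheet_name, columns in sheets.items():
        extra = [name for name in columns if name not in allowed]
        if extra:
            problems[sheet_name] = extra
    return problems


# The agreed spelling, plus headers as they might appear in three hypothetical worksheets.
expected = ["Singlets/Lymphocytes"]
sheets = {
    "Plate1": ["Singlets/Lymphocytes"],
    "Plate2": ["Singlet/Lymphs"],
    "Plate3": ["Singlets / Lymphocytes", "Singlets/Lymphoctyes"],
}

print(unexpected_columns(expected, sheets))
```

The comparison is deliberately exact: a stray space or a transposed letter counts as a different column, just as it would after import into SAS or R.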

Page 14: Working with Statisticians

Data: What to record

• Better to have more data than less data
– Sample type (PBMC, whole blood)
– Recovery
– Viability
• Better to have basic than derived data
– Counts better than relative frequencies
• Keep link to raw data for reproducibility
– Path to FCS file on server
• Use special indicator for missing data (e.g. NAN), not zero
• Can have extra column for notes
– Ideally codified so Error 23 rather than “Sample sat > 8 hours before processing”
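
The two points about counts and missing values fit together: with raw counts and an explicit missing-data marker, relative frequencies can always be derived later, and a missing measurement is never mistaken for a true zero. A minimal sketch follows; the function names and the set of accepted markers are assumptions, and the real markers should come from your own metadata SOP.

```python
def parse_count(cell):
    """Convert a CSV cell to an int, treating 'NA', 'NAN' and blank as missing (None)."""
    text = cell.strip()
    if text.upper() in {"", "NA", "NAN"}:
        return None
    return int(text)


def frequency(child_count, parent_count):
    """Relative frequency of a subset within its parent gate, or None if it cannot be computed."""
    if child_count is None or parent_count is None:
        return None  # a missing value stays missing; it is not a zero
    if parent_count == 0:
        return None  # avoid dividing by an empty parent gate
    return child_count / parent_count


print(frequency(parse_count("250"), parse_count("1000")))  # 0.25
print(frequency(parse_count("NAN"), parse_count("1000")))  # None
print(frequency(parse_count("0"), parse_count("1000")))    # 0.0
```

Going the other way is not possible: a spreadsheet that records only 25% cannot tell the statistician whether that was 1 cell out of 4 or 250 out of 1000.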

Page 15: Working with Statisticians

Data: Versioning

• Do not change the data in the worksheet once it has been handed to the statistician.
• If there are errors that must be corrected, make a new copy, label the filename with date and version, and send that to the statistician
– ArcticRatExperiment_07May2013_Version01.xlsx
– ArcticRatExperiment_17May2013_Version02.xlsx
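
If the filenames are generated rather than typed, the date and version format stays consistent. The sketch below reproduces the naming pattern shown in the two example filenames; the function name is hypothetical, and the month abbreviations are spelled out so the result does not depend on the computer's locale.

```python
from datetime import date

# English month abbreviations, listed explicitly to avoid locale-dependent formatting.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def versioned_filename(experiment, when, version, extension="xlsx"):
    """Build a name like ArcticRatExperiment_07May2013_Version01.xlsx."""
    stamp = f"{when.day:02d}{MONTHS[when.month - 1]}{when.year}"
    return f"{experiment}_{stamp}_Version{version:02d}.{extension}"


print(versioned_filename("ArcticRatExperiment", date(2013, 5, 7), 1))
print(versioned_filename("ArcticRatExperiment", date(2013, 5, 17), 2))
```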

Page 16: Working with Statisticians

Metadata

• Should have SOP document for metadata
– How missing data is represented (e.g. NA or blank)
– Keys for interpretation – e.g. Table of error codes
– Contact person: phone #, email
– Metadata can be in 2nd worksheet or separate document
• Gating scheme with labeled gates matching cell subsets used in column names (PDF or PPT)
• Panel information
– Antibodies, clones, batches, fluorochromes, peptide mixes
• Path to FlowJo .jo or .xml analysis file

Page 17: Working with Statisticians

Metadata

• There are minimal information standards that should be followed
– MIFlowCyt
– MIATA

• Google for them if you’re not familiar with them – increasingly these are required by journals for publication, so worth making it an SOP for documentation of results

Page 18: Working with Statisticians

Formatting

• Don’t do it.
• Avoid conveying information via:
– Highlighting
– Fancy spacing
– Different fonts and font effects
– Merging cells
– Comments

• Will it survive a round-trip from Excel to CSV and back again?

Page 19: Working with Statisticians

Formatting - Before

Page 20: Working with Statisticians

Formatting - After

• Comments are lost
• Highlighting is lost
• Bad cell formatting is lost
• Merged cells become missing information

Page 21: Working with Statisticians

Validation

• Can set up validation rules in Excel to minimize data entry errors:
– Number range (0, 10000000)
• Can use lookup tables for codes to use
– E.g. Error codes with explanation

• If possible, once format for data is decided, get local Excel wizard to create template and lookup rules
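
The same two kinds of rule, a number range and a lookup table of codes, can also be re-checked after export as a safety net. This is a sketch under stated assumptions: the column names `count` and `error_code` and the function name are hypothetical, the range bounds are treated as inclusive, and the only error code listed is the one example given earlier in these slides.

```python
# Range taken from the slide; treating both bounds as inclusive is an assumption.
COUNT_RANGE = (0, 10_000_000)

# Lookup table of error codes; only the example from the slides is filled in.
ERROR_CODES = {23: "Sample sat > 8 hours before processing"}


def validate_row(row):
    """Return a list of human-readable problems found in one row (empty if the row is fine)."""
    problems = []
    low, high = COUNT_RANGE
    count = row.get("count")
    if count is not None and not (low <= count <= high):
        problems.append(f"count {count} outside range {low}..{high}")
    code = row.get("error_code")
    if code is not None and code not in ERROR_CODES:
        problems.append(f"unknown error code {code}")
    return problems


print(validate_row({"count": 5321, "error_code": 23}))
print(validate_row({"count": -4, "error_code": 99}))
```

Missing values (`None`) pass both checks on purpose, since a missing measurement is recorded with its own indicator rather than rejected.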

Page 22: Working with Statisticians

Questions?

• If no questions and need to kill time, watch Biologist talks to Statistician video
– http://www.youtube.com/watch?v=Hz1fyhVOjr4