Post-Genomics Experimental Design
CSC8309 - Gene Expression and Proteomics
Simon Cockell & Cedric Simillion
Outline
• Introduction
  – Post-Genomic Technologies
  – The Importance of Design
• Experimental Design
  – When Design Goes Bad
  – More Commonly Made Mistakes
  – Things Done Right
  – Types of Experiment
Post-Genomic Technologies
• Set of technologies that have become prevalent since the advent of genome sequencing
• Also referred to as ‘functional genomics’ technologies
  – Transcriptomics
  – Proteomics
  – Metabolomics
• ‘High-throughput’ techniques: they generate lots of data, fast
Importance of Design
• Functional Genomics experiments are expensive
• The sheer quantity of data, and the noise within it, can mask interesting biological variation
• Bad design can increase noise
• Or at least fail to minimise it
When Design Goes Wrong: A trivial example
• Bill and Ben want to identify proteins upregulated in response to water starvation in a drought resistant plant
• So, Bill went away and grew some plants, and so did Ben
When Design Goes Wrong (continued)
• Bill chose 3 plants, and Ben chose 4
• Bill grew his at home in normal conditions, and Ben grew his in the lab with minimal water
• Then, after a few days of growth, they each took samples from their plants and ran 2D-PAGE
When Design Goes Wrong: Analysis
• They used average gels of the 2 groups of plants to find differentially expressed proteins
• They did t-tests for every spot on the gels, and found 400 of 2500 proteins (95% level) with significantly altered expression in drought conditions
• What now? They only wanted 10-20
When Design Goes Wrong: What did they do wrong?
• Confounding
  – Experiment can’t distinguish between a number of factors:
    • Drought
    • Experimenter effects
    • Difference between home and lab
• Selection
  – Bill or Ben could be biased in how they selected plants, even unconsciously
  – Randomised selection is preferred
• Unbalanced
  – Better to have equal numbers in each group for many statistical analyses
When Design Goes Wrong: How to improve
• Grow plants together under same conditions
• Select an equal number randomly for both Bill and Ben
• Both halve their plants, and grow normal and drought plants to the same protocol
• Better still, either Bill or Ben should do the whole experiment
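The randomised, balanced selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plant labels, the group names and the `allocate` helper are hypothetical and not part of the original experiment.

```python
import random

def allocate(plant_ids, groups=("normal", "drought"), seed=None):
    """Randomly split plants into equally sized treatment groups.

    Hypothetical helper: shuffles the plants, then deals them out in
    equal blocks so that the design is both randomised and balanced.
    """
    if len(plant_ids) % len(groups) != 0:
        raise ValueError("number of plants must divide evenly between groups")
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    shuffled = list(plant_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(groups)
    return {
        group: sorted(shuffled[i * size:(i + 1) * size])
        for i, group in enumerate(groups)
    }

# Eight plants grown together under the same conditions (labels are made up).
plants = [f"plant_{i:02d}" for i in range(1, 9)]
design = allocate(plants, seed=42)
for group, members in design.items():
    print(group, members)
```

Recording the seed alongside the results lets someone else reproduce exactly which plant went into which group.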
When Design Goes Wrong: Post mortem
• Even with a rigorously designed experiment, Bill and Ben may still have obtained confusing results
  – It is common to identify many differentially expressed genes/proteins
  – This can be a true reflection of the biology
  – The number of false positives is necessarily high in post-genomic experiments, because of the number of hypotheses being tested
• Good experimental design could have reduced the complexity of their output
  – providing a base for a robust statistical analysis of the data
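To put a number on the multiple-testing problem: with 2500 spots each tested at the 5% level, roughly 125 "significant" spots are expected by chance alone, even if nothing changes. One common way to control this is the Benjamini-Hochberg step-up procedure. The sketch below is a plain-Python illustration of that procedure, not the analysis Bill and Ben used; the function name and the example p-values are assumptions made for the demonstration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up procedure: sort the p-values, find the
    largest rank k with p_(k) <= k * q / m, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank     # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Expected false positives if no protein truly changes (figures from the slides).
n_tests, alpha = 2500, 0.05
expected_false_positives = n_tests * alpha   # about 125

# Made-up p-values, for illustration only.
example = [0.04, 0.001, 0.03, 0.5]
print(benjamini_hochberg(example))
```

In this made-up example three p-values fall below 0.05, but only the smallest survives the correction.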
Choice of Technology
• Microarray or proteomics?
• Affy or two-colour arrays?
  – Reference sample?
• 2D gels or LC-MS?
• Single stain or DIGE?
  – Reference sample?
• No easy (or correct) answers
  – Depends very much on the individual experiment
Further Pitfalls
• Fahrenheit and the Cow
• Based on urban myth
• Still an important message
  – No individual is typical
  – Biological, as well as technical, replicates required
Further Pitfalls
• The pester problem
  – Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a
puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad can I have a puppy , Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy? Dad, can I have a puppy?
• Ask a question often enough, eventually you’ll get the answer you’re after
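The pester problem can be made concrete with a short calculation. Assuming the repeated tests are independent and each uses the same significance level (both simplifying assumptions; the function name is made up for this sketch), the chance of at least one false positive in n tests is 1 - (1 - alpha)^n.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive in n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 14, 50):
    print(n, round(chance_of_false_positive(n), 3))
```

Under these assumptions, asking the same question 14 times at the 5% level already gives a better-than-even chance of a spurious "yes".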
Further Pitfalls
• The universe doesn’t exist -- on average
  – Pooling samples makes little sense: it leaves no information about the distribution, and a standard deviation is needed for a significance test
• “My machine/technique is so accurate, I don’t need replicates”
  – Accuracy has little effect on biological variance
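A small sketch of why pooling defeats a significance test. The measurements below are invented for illustration, and Welch's t statistic is used simply as one example of a test that needs a per-group variance; it is not necessarily the test used in any particular study.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples.

    Needs at least two values per group, because each group's
    variance has to be estimated from its replicates.
    """
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Invented spot intensities from individual biological replicates.
control = [10.1, 9.8, 10.4, 9.9]
drought = [12.0, 11.1, 12.6, 11.9]
t_stat = welch_t(control, drought)

# Pooling reduces each group to a single averaged measurement.
pooled_control = [statistics.mean(control)]
pooled_drought = [statistics.mean(drought)]
try:
    welch_t(pooled_control, pooled_drought)
    pooled_testable = True
except statistics.StatisticsError:
    # One value per group: no variance estimate, so no test statistic.
    pooled_testable = False

print(t_stat, pooled_testable)
```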
Doing Things Right: Calculating power
[Figure: probability density curves under the null hypothesis and under the alternative hypothesis]
α = probability of false positive (Type I Error)
1 − Power = probability of false negative (Type II Error)
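As a rough illustration of a power calculation, the sketch below uses a normal approximation for a two-sided comparison of two equally sized groups with a known, common standard deviation. These are simplifying assumptions, and the function names are made up; a real analysis would more likely use a t-distribution and a dedicated statistics package.

```python
import math
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation: the difference in group means has standard
    error sd * sqrt(2 / n), and the null is rejected when the
    standardised difference exceeds the critical z value.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / (sd * math.sqrt(2 / n_per_group))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(effect, sd, target=0.8, alpha=0.05):
    """Smallest number of replicates per group that reaches the target power."""
    n = 2
    while approx_power(effect, sd, n, alpha) < target:
        n += 1
    return n

# Replicates needed to detect a difference of one standard deviation.
print(n_for_power(effect=1.0, sd=1.0))
```

Doing this calculation before the experiment shows how many biological replicates are needed to have a reasonable chance of detecting an effect of the size of interest.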
Types of Experiment
• Time course
  – Cell cycle
  – Following drug challenge
  – Following external stimulus
  – Following release of mutant
• Mutant vs Wild-Type
• Normal vs Diseased
• Developmental Changes
• Different Tissues
• Within cell differences
Types of Experiment
• Novel microarray techniques
  – Genotyping
  – SNP detection
  – Copy Number Assessment
• Novel proteomics techniques
  – High-throughput interaction detection
  – Phospho-proteomics
• Also…
  – Protein binding arrays
  – Ligand binding arrays
A couple of quotes
• You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!
  – Richard P. Feynman
• To consult a statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.
  – R. A. Fisher, 1938
Summary
• Post-genomics technologies are powerful, but expensive
• Good design gives maximum return for minimum effort