Experimental Design

EPP 245 Statistical Analysis of Laboratory Data

October 4, 2007 EPP 245 Statistical Analysis of Laboratory Data

Basic Principles of Experimental Investigation

• Sequential Experimentation

• Comparison

• Manipulation

• Randomization

• Blocking

• Simultaneous variation of factors

• Main effects and interactions

• Sources of variability

• Issues with two-color arrays

Sequential Experimentation

• No single experiment is definitive

• Each experimental result suggests other experiments

• Scientific investigation is iterative.

• “No experiment can do everything; every experiment should do something,” George Box.

Plan Experiment

Perform Experiment

Analyze Data from Experiment

Comparison

• Usually absolute data are meaningless; only comparative data are meaningful

• The level of mRNA in a sample of liver cells is not meaningful

• The comparison of the mRNA levels in samples from normal and diseased liver cells is meaningful

Internal vs. External Comparison

• Comparison of experimental results with historical results is likely to mislead

• Many factors other than the intended treatment can influence results

• Best to include controls or other comparisons in each experiment

Manipulation

• Different experimental conditions need to be imposed by the experimenters, not just observed, if at all possible

• The rate of complications in coronary artery bypass graft surgery may depend on many factors that are not controlled and may be hard to measure

Randomization

• Randomization limits the differences between groups that are due to irrelevant factors

• Such differences will still exist, but can be quantified by analyzing the randomization

• This is a method of controlling for unknown confounding factors

• Suppose that 50% of a patient population is female

• A sample of 100 patients will not generally have exactly 50% females

• Numbers of females between 40 and 60 would not be surprising

• In two groups of 100, the disparity between the number of females in the two groups can be as big as 20% simply by chance

• This also holds for factors we don’t know about
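The claim that 40 to 60 females out of 100 "would not be surprising" can be checked with a short binomial calculation. This is only a sketch, assuming each of the 100 patients is independently female with probability 0.5; the function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.5  # sample size and assumed fraction of females

# Probability that the number of females falls between 40 and 60 inclusive
prob_40_to_60 = sum(binom_pmf(k, n, p) for k in range(40, 61))
print(f"P(40 <= females <= 60) = {prob_40_to_60:.3f}")
```

Under these assumptions the great majority of samples land in the 40 to 60 range, so a count anywhere in that range is ordinary chance variation.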

• Randomization does not exactly balance against any specific factor

• To do that one should employ blocking

• Instead it provides a way of quantifying possible imbalance even of unknown factors

• Randomization even provides an automatic method of analysis that depends on the design and randomization technique.

The Farmer from Whidbey Island

• Visited the University of Washington with a whalebone water dowser

• 10 Dixie cups, 5 with water, 5 empty, covered with plywood

• If he gets all 10 right, is chance a reasonable explanation?

• The randomness is produced by the process of randomly choosing which 5 of the 10 are to contain water

• There are no other assumptions

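Number of ways to choose which 5 of the 10 cups contain water: $\binom{10}{5} = 252$

Chance of getting all 10 right by guessing: $\frac{1}{252} \approx .004$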

• If the randomization had been to flip a coin for each of the 10 cups, then the probability of getting all 10 right by chance is different

• There are $2^{10} = 1024$ ways for the randomization to come out, only one of which is right, so the chance is 1/1024 ≈ .001

• The method of randomization matters
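The two counts behind the dowser example can be reproduced directly. This is a small sketch; the variable names are illustrative.

```python
from math import comb

# Design 1: exactly 5 of the 10 cups are chosen at random to hold water
ways_fixed_five = comb(10, 5)
p_fixed_five = 1 / ways_fixed_five

# Design 2: a separate coin flip decides each of the 10 cups
ways_coin_flips = 2**10
p_coin_flips = 1 / ways_coin_flips

print(f"5 of 10 chosen: {ways_fixed_five} arrangements, p = {p_fixed_five:.4f}")
print(f"10 coin flips:  {ways_coin_flips} arrangements, p = {p_coin_flips:.4f}")
```

The same perfect score is about four times less likely under the coin-flip design, which is why the chance calculation has to follow the randomization that was actually used.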

Randomization Inference

• 20 tomato plants are divided into 10 groups of 2 placed next to each other in the greenhouse

• In each group of 2, one is chosen to receive fertilizer A and one to receive fertilizer B

• The yield of each plant is measured

Pair             1    2    3    4    5    6    7    8    9   10
A              132   82  109  143  107   66   95  108   88  133
B              140   88  112  142  118   64   98  113   93  136
diff (B - A)     8    6    3   -1   11   -2    3    5    5    3

• Average difference is 4.1

• Could this have happened by chance?

• Is it statistically significant?

• If A and B do not differ in their effects (null hypothesis is true), then each plant’s yield would have been the same whether A or B was applied

• The difference would be the negative of what it was if the coin flip had come out the other way

• In pair 1, the yields were 132 and 140.

• The difference was 8, but it could have been -8

• With 10 coin flips, there are $2^{10} = 1024$ possible outcomes of + or – on the difference

• These outcomes are possible outcomes from our action of randomization, and carry no assumptions

• Of the 1024 possible outcomes that are all equally likely under the null hypothesis, only 3 had greater values of the average difference, and only four (including the one observed) had the same value of the average difference

• The probability of this happening by chance, counting the ties at half weight, is (3 + 4/2)/1024 = 5/1024 ≈ .005

• This does not depend on any assumptions other than that the randomization was correctly done
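The counts of 3 greater and 4 equal outcomes can be reproduced by enumerating all 1024 sign patterns. The sketch below uses the ten differences from the tomato table and works with sums rather than averages so that all comparisons are exact integer comparisons; the variable names are illustrative.

```python
from itertools import product

# B - A yield differences for the 10 pairs, taken from the table
diffs = [8, 6, 3, -1, 11, -2, 3, 5, 5, 3]
observed_total = sum(diffs)  # 41, i.e. an average difference of 4.1

greater = 0  # sign patterns giving a larger total than observed
equal = 0    # sign patterns giving exactly the observed total
for signs in product((1, -1), repeat=len(diffs)):
    # Under the null hypothesis each difference could equally well be + or -
    total = sum(s * abs(d) for s, d in zip(signs, diffs))
    if total > observed_total:
        greater += 1
    elif total == observed_total:
        equal += 1

n_outcomes = 2 ** len(diffs)
p_value = (greater + equal / 2) / n_outcomes  # ties counted at half weight
print(greater, equal, round(p_value, 4))
```

Counting ties at half weight follows the calculation in the text; counting them in full would give (3 + 4)/1024 instead.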

$\bar{d} = 4.1$

$s_d = 3.872$

$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{4.1}{3.872/\sqrt{10}} = \frac{4.1}{1.224} = 3.35$ (9 degrees of freedom)

$p = .0043$ by t-test

$p = .0049$ by true randomization distribution

$p$ is in the same range for simulation randomization distributions

The t-test can be thought of as an approximation to the randomization distribution.
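The paired t statistic can be recomputed from the ten differences with the Python standard library. This is a minimal sketch; it stops at the t statistic because the standard library has no t-distribution function, so the p-value would need a table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

# B - A yield differences for the 10 pairs, taken from the tomato table
diffs = [8, 6, 3, -1, 11, -2, 3, 5, 5, 3]

n = len(diffs)
d_bar = mean(diffs)        # average difference
s_d = stdev(diffs)         # sample standard deviation (n - 1 denominator)
std_err = s_d / sqrt(n)    # standard error of the average difference
t_stat = d_bar / std_err   # paired t statistic on n - 1 degrees of freedom

print(f"d_bar={d_bar:.1f}  s_d={s_d:.3f}  se={std_err:.3f}  t={t_stat:.2f}")
```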

Randomization in Practice

• Whenever there is a choice, it should be made using a formal randomization procedure, such as Excel’s rand() function.

• This protects against unexpected sources of variability such as day, time of day, operator, reagent, etc.

Pair Number   First Sample Treatment
1             A or B?
2             A or B?
3             A or B?
4             A or B?
5             A or B?
6             A or B?
7             A or B?
8             A or B?
9             A or B?
10            A or B?

Pair Number   First Sample Treatment   Random Number
1             A or B?                  0.871413
2             A or B?                  0.786036
3             A or B?                  0.889785
4             A or B?                  0.081120
5             A or B?                  0.297614
6             A or B?                  0.540483
7             A or B?                  0.824491
8             A or B?                  0.624133
9             A or B?                  0.913187
10            A or B?                  0.001599

• =rand() in first cell

• Copy down the column

• Highlight entire column

• ^c (Edit/Copy)

• Edit/Paste Special/Values

• This fixes the random numbers so they do not recompute each time

• =IF(C3<0.5,"A","B") goes in the treatment cell beside the random number in C3 (cell B3 if the treatment column is B), then copy down the column

Plant Pair   First Plant Treatment   Random Number
1            B                       0.871413
2            B                       0.786036
3            B                       0.889785
4            A                       0.081120
5            A                       0.297614
6            B                       0.540483
7            B                       0.824491
8            B                       0.624133
9            B                       0.913187
10           A                       0.001599
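The spreadsheet rule (random number below 0.5 gives A, otherwise B) can be checked against the table above, and the same rule works outside Excel. The sketch below first applies the rule to the ten random numbers listed in the table, then draws a fresh assignment; the seed 2007 is an arbitrary choice made only so the example is repeatable.

```python
import random

# Random numbers copied from the table
table_numbers = [0.871413, 0.786036, 0.889785, 0.081120, 0.297614,
                 0.540483, 0.824491, 0.624133, 0.913187, 0.001599]

# Same rule as the spreadsheet formula: below 0.5 gives A, otherwise B
table_treatments = ["A" if u < 0.5 else "B" for u in table_numbers]
for pair, (trt, u) in enumerate(zip(table_treatments, table_numbers), start=1):
    print(f"pair {pair:2d}: first plant gets {trt}  (random number {u:.6f})")

# A fresh assignment drawn in Python instead of Excel (arbitrary seed)
rng = random.Random(2007)
fresh_treatments = ["A" if rng.random() < 0.5 else "B" for _ in range(10)]
```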

• To randomize run order, insert a column of random numbers, then sort on that column

• More complex randomizations require more care, but this is quite important and worth the trouble

• Randomization can be done in Excel, R, or anything that can generate random numbers
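The "random column, then sort" recipe for run order translates directly to other tools. Here is a minimal Python sketch; the run labels and the seed are hypothetical placeholders.

```python
import random

rng = random.Random(245)  # arbitrary seed so the order can be reproduced

# Hypothetical list of runs in their original (non-random) order
runs = [f"sample_{i}" for i in range(1, 9)]

# Attach a random number to each run, then sort on that number
keyed = [(rng.random(), run) for run in runs]
run_order = [run for _, run in sorted(keyed)]

print(run_order)
```

`random.shuffle` would do the same job in one step; the two-step version is shown because it mirrors the spreadsheet procedure.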

Blocking

• If some factor may interfere with the experimental results by introducing unwanted variability, one can block on that factor

• In agricultural field trials, soil and other location effects can be important, so plots of land are subdivided to test the different treatments. This is the origin of the idea

• If we are comparing treatments, the more alike the units are to which we apply the treatment, the more sensitive the comparison.

• Within blocks, treatments should be randomized

• Paired comparisons are a simple example of randomized blocks as in the tomato plant example

Simultaneous Variation of Factors

• The simplistic idea of “science” is to hold all things constant except for one experimental factor, and then vary that one thing

• This misses interactions and can be statistically inefficient

• Multi-factor designs are often preferable

Interactions

• Sometimes (often) the effect of one variable depends on the levels of another one

• This cannot be detected by one-factor-at-a-time experiments

• These interactions are often scientifically the most important

• Experiment 1. I compare the room before and after I drop a liter of gasoline on the desk. Result: we all leave because of the odor.

• Experiment 2. I compare the room before and after I drop a lighted match on the desk. Result: no effect other than a small scorch mark.

• Experiment 3. I compare all four combinations of ±gasoline and ±match. Result: we are all killed.

• Large interaction effect

Statistical Efficiency

• Suppose I compare the expression of a gene in a cell culture of either keratinocytes or fibroblasts, confluent and nonconfluent, with or without a possibly stimulating hormone, with 2 cultures in each condition, requiring 16 cultures

• I can compare the cell types as an average of 8 cultures vs. 8 cultures

• I can do the same with the other two factors

• This is more efficient than 3 separate one-factor experiments, which would need 16 cultures each (48 in all) to give the same 8 vs. 8 comparisons

• Can also see if cell types react differently to hormone application (interaction)
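The 16-culture layout can be enumerated to confirm that every factor is compared as 8 cultures against 8. This is only a sketch; the level labels are illustrative names for the conditions described above.

```python
from collections import Counter
from itertools import product

# Three two-level factors from the example (labels are illustrative)
factors = {
    "cell_type": ("keratinocyte", "fibroblast"),
    "state": ("confluent", "nonconfluent"),
    "hormone": ("with", "without"),
}
replicates = 2  # cultures per condition

# One record per culture: every combination of levels, replicated twice
cultures = [
    dict(zip(factors, levels), replicate=rep)
    for levels in product(*factors.values())
    for rep in range(1, replicates + 1)
]

print(len(cultures), "cultures in total")
for name in factors:
    print(name, dict(Counter(c[name] for c in cultures)))
```

Every culture contributes to all three comparisons at once, which is where the saving over separate one-factor experiments comes from.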

Fractional Factorial Designs

• When it is not known which of many factors may be important, fractional factorial designs can be helpful

• With 7 factors each at 2 levels, ordinarily this would require $2^7 = 128$ experiments

• This can be done in 8 experiments instead

Run   F1  F2  F3  F4  F5  F6  F7
1     H   H   H   H   H   H   H
2     H   H   L   H   L   L   L
3     H   L   H   L   H   L   L
4     H   L   L   L   L   H   H
5     L   H   H   L   L   H   L
6     L   H   L   L   H   L   H
7     L   L   H   H   L   L   H
8     L   L   L   H   H   H   L
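The 8-run table above can be regenerated from a full two-level design in F1, F2, and F3. Coding H as +1 and L as -1, the remaining columns in the table match the products F4 = F1·F2, F5 = F1·F3, F6 = F2·F3, and F7 = F1·F2·F3. That reading of the table is an observation checked against its rows, not something stated in the slides, so treat the sketch below as one possible construction.

```python
from itertools import combinations, product

# Full factorial in F1, F2, F3 (high level first), then four product columns
design = []
for f1, f2, f3 in product((1, -1), repeat=3):
    design.append((f1, f2, f3, f1 * f2, f1 * f3, f2 * f3, f1 * f2 * f3))

# Convert +1 / -1 coding back to the H / L letters used in the table
letters = [" ".join("H" if x > 0 else "L" for x in row) for row in design]
for run, row in enumerate(letters, start=1):
    print(run, row)

# Balance check: every pair of columns is orthogonal (products sum to zero)
columns = list(zip(*design))
orthogonal = all(
    sum(a * b for a, b in zip(columns[i], columns[j])) == 0
    for i, j in combinations(range(7), 2)
)
print("all column pairs orthogonal:", orthogonal)
```

Orthogonal columns mean each factor's H versus L comparison is balanced over the levels of every other factor, which is what lets 8 runs screen 7 factors for main effects.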

Main Effects and Interactions

• Factors: Cell Type (C), State (S), Hormone (H)

• Response is expression of a gene

• The main effect C of cell type is the difference in average gene expression level between cell types

• For the interaction between cell type and state, compute the difference in average gene expression between cell types separately for confluent and nonconfluent cultures. The difference of these differences is the interaction.

• The three-way interaction CSH is the difference between the two-way CS interactions with and without the hormone stimulant.
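The definitions of main effect, two-way interaction, and three-way interaction can be traced through a small numeric example. The expression values below are invented purely for illustration (they are not data from the course), and the effects are computed as plain differences of differences as described above; some textbooks divide interactions by a further factor of 2.

```python
from statistics import mean

# Hypothetical mean expression for each (cell type, state, hormone) condition
y = {
    ("K", "conf", "-"): 10, ("F", "conf", "-"): 12,
    ("K", "non", "-"): 11,  ("F", "non", "-"): 13,
    ("K", "conf", "+"): 10, ("F", "conf", "+"): 18,
    ("K", "non", "+"): 11,  ("F", "non", "+"): 15,
}
states, hormones = ("conf", "non"), ("-", "+")


def cell_diff(state: str, hormone: str) -> float:
    """Fibroblast minus keratinocyte expression in one state/hormone condition."""
    return y[("F", state, hormone)] - y[("K", state, hormone)]


# Main effect C: cell-type difference averaged over all other conditions
main_c = mean(cell_diff(s, h) for s in states for h in hormones)

# CS interaction: cell-type difference in confluent minus nonconfluent cultures
cs = mean(cell_diff("conf", h) for h in hormones) - mean(
    cell_diff("non", h) for h in hormones
)


def cs_at(hormone: str) -> float:
    """CS interaction computed at a single hormone level."""
    return cell_diff("conf", hormone) - cell_diff("non", hormone)


# CSH interaction: CS interaction with hormone minus CS interaction without
csh = cs_at("+") - cs_at("-")

print(main_c, cs, csh)
```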

Sources of Variability in Laboratory Analysis

• Intentional sources of variability are treatments and blocks

• There are many other sources of variability

• Biological variability between organisms or within an organism

• Technical variability of procedures like RNA extraction, labeling, hybridization, chips, etc.

Replication

• Almost always, biological variability is larger than technical variability, so most replicates should be biologically different, not just replicate analyses of the same samples (technical replicates)

• However, this can depend on the cost of the experiment vs. the cost of the sample

• 2D gels are so variable that replication is required

Quality Control

• It is usually a good idea to identify factors that contribute to unwanted variability

• A study can be done in a given lab that examines the effects of day, time of day, operator, reagents, etc.

• This is almost always useful when starting with a new technology or in a new lab

Possible QC Design

• Possible factors: day, time of day, operator, reagent batch

• At two levels each, this is 16 experiments to be done over two days, with 4 each in morning and afternoon, with two operators and two reagent batches

• Analysis determines contributions to overall variability from each factor
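One way to lay out the 16 QC runs is to list every combination of the four factors and randomize the order of the 4 operator/batch runs inside each day and time slot. This is a sketch under that assumption; the slides do not specify how runs are ordered within a slot, and the labels and seed are placeholders.

```python
import random
from itertools import product

rng = random.Random(46)  # arbitrary seed for a repeatable schedule

days = ("day 1", "day 2")
times = ("morning", "afternoon")
operators = ("operator 1", "operator 2")
batches = ("batch 1", "batch 2")

schedule = []
for day, time in product(days, times):
    # The 4 operator/batch combinations to be run in this slot
    slot_runs = list(product(operators, batches))
    rng.shuffle(slot_runs)  # randomize run order within the slot
    schedule.extend((day, time, op, batch) for op, batch in slot_runs)

for run in schedule:
    print(run)
```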

References

• Statistics for Experimenters, Box, Hunter, and Hunter, John Wiley

Exercise

• You have a clinical study in which 10 patients will get either the standard treatment or a new treatment

• Randomize which 5 of the 10 get the new treatment so that all possible combinations can result. Use Excel or another formal randomization method.

• Randomize so that in each pair of patients entered by date, one has the standard and one the new treatment.

• What are the advantages of each method?

• Why is randomization important?
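As a starting point for the exercise, the two randomization schemes could be carried out in Python rather than Excel. This is only one possible sketch: patients are numbered 1 to 10 in order of entry, consecutive patients are taken as the date pairs, and the seed is arbitrary.

```python
import random

rng = random.Random(10)  # arbitrary seed; omit it for a real study
patients = list(range(1, 11))  # hypothetical IDs, in order of entry

# Scheme 1: complete randomization, any 5 of the 10 can get the new treatment
new_group = set(rng.sample(patients, 5))
complete = {p: "new" if p in new_group else "standard" for p in patients}

# Scheme 2: within each consecutive pair, one gets new and one gets standard
paired = {}
for first, second in zip(patients[0::2], patients[1::2]):
    arms = ["new", "standard"]
    rng.shuffle(arms)  # coin flip for which member of the pair gets which arm
    paired[first], paired[second] = arms

print("complete:", complete)
print("paired:  ", paired)
```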