Statistics in Science Statistical Analysis & Design in Research Structure in the Experimental Material PGRM 10.

Dec 22, 2015

Troy Tartt
Page 1

Statistical Analysis & Design in Research

Structure in the Experimental Material

PGRM 10

Page 2

Blocking – the idea

Detecting differences between treatments depends on the background noise (BN)

• BN is:
  – caused by inherent differences between the experimental units
  – measured by the residual (error) mean square, RMS (alternatively, MSE)

• Comparing treatments on similar units would reduce background noise

• Grouping units into blocks that differ in characteristics affecting the response lets us measure the variation due to blocks and so reduce the residual variation

Page 3

Blocking – the benefit

Reducing background noise:

• Gives more precise estimates

• Allows a reduction in replication, without loss of power (the probability of detecting an effect of a specified size)

• Reduces cost!

Page 4

Blocking and experimental material: Examples

1. A field with fertility increasing from top to bottom. With 3 treatments, group plots into BLOCKS of 3, starting at the top and continuing to the bottom. Randomise treatments within each block.
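The randomisation step in this example can be sketched in a few lines of Python. This is only an illustration, not the course's own procedure: the function name `randomise_within_blocks`, the choice of 6 blocks and the seed are assumptions made here.

```python
import random

def randomise_within_blocks(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)  # seeded only so the layout can be reproduced
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)     # a fresh random order within each block
        layout.append(order)
    return layout

# Hypothetical field: 6 blocks of 3 plots, laid out from top to bottom.
layout = randomise_within_blocks(["T1", "T2", "T3"], n_blocks=6, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(block_number, *order)
```

Each block contains every treatment exactly once, so differences between blocks (here, the fertility gradient) cannot be confused with differences between treatments.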

Page 5

Block Design

How many replicates per treatment?

What is the experimental unit?

       Treat
Blk    A    B    C
1      T1   T3   T2
2      T3   T2   T1
3      T2   T1   T3
4      T1   T2   T3
5      T3   T1   T2
6      T1   T2   T3

What is the block?

Page 6

Example

• 2 drugs (A, B) to control blood pressure

• 100 subjects – randomly assign 50 each to A and B

• Valid - but is it efficient?

• If subjects are heterogeneous, there is likely to be a large variance (σ²) in the responses within each group.

• Design may not be very efficient.

Page 7

Factors affecting BP variation

Page 8

Blocking and experimental material

1. 100 subjects are selected to compare a new drug for controlling blood pressure (BP) with a Control

Block into pairs by age & weight (believed to affect BP)

In each pair one is selected at random to receive the new drug, the other receives Control

Alternatively – see next slide

Page 9

Groups (Blocks)

Age   Sex      Weight   #     T1   T2
>50   Male     H        15
>50   Male     N        11
>50   Male     L        12
>50   Female   H        11
>50   Female   N         9
>50   Female   L        13
<50   Male     H         7
<50   Male     N         2
<50   Male     L         5
<50   Female   H         4
<50   Female   N         8
<50   Female   L         3
Total                  100

Page 10

Groups (Blocks)

Age   Sex      Weight   #     T1   T2
>50   Male     H        15     8    7
>50   Male     N        11     5    6
>50   Male     L        12     6    6
>50   Female   H        11     5    6
>50   Female   N         9     5    4
>50   Female   L        13     6    7
<50   Male     H         7     4    3
<50   Male     N         2     1    1
<50   Male     L         5     2    3
<50   Female   H         4     2    2
<50   Female   N         8     4    4
<50   Female   L         3     2    1
Total                  100    50   50
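One way to arrive at a split like the one in this table is sketched below. The slides do not say how the odd-sized groups were divided, so the rule used here (give the spare subject to T1 in a random half of the odd-sized groups and to T2 in the rest) is an assumption, as are the names `STRATA` and `split_within_strata`. Which subjects within a group receive T1 would still be chosen at random.

```python
import random

# Group sizes from the table above: (age, sex, weight) -> number of subjects.
STRATA = {
    (">50", "Male", "H"): 15, (">50", "Male", "N"): 11, (">50", "Male", "L"): 12,
    (">50", "Female", "H"): 11, (">50", "Female", "N"): 9, (">50", "Female", "L"): 13,
    ("<50", "Male", "H"): 7, ("<50", "Male", "N"): 2, ("<50", "Male", "L"): 5,
    ("<50", "Female", "H"): 4, ("<50", "Female", "N"): 8, ("<50", "Female", "L"): 3,
}

def split_within_strata(sizes, seed=None):
    """Split each group as evenly as possible between T1 and T2."""
    rng = random.Random(seed)
    odd = [key for key, n in sizes.items() if n % 2 == 1]
    rng.shuffle(odd)
    # Assumed rule: a random half of the odd-sized groups give their spare
    # subject to T1, the other half to T2, so the overall totals stay balanced.
    extra_to_t1 = set(odd[: len(odd) // 2])
    split = {}
    for key, n in sizes.items():
        n_t1 = n // 2 + (1 if key in extra_to_t1 else 0)
        split[key] = (n_t1, n - n_t1)
    return split

split = split_within_strata(STRATA, seed=2015)
for key, (n_t1, n_t2) in split.items():
    print(*key, STRATA[key], n_t1, n_t2)
print("Total", sum(a for a, _ in split.values()), sum(b for _, b in split.values()))
```

The individual rows will generally differ from the table because the choice is random, but every group is split within one subject of evenly and the totals are 50 and 50.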

Page 11

Blocking and experimental material: Examples

1. A field with fertility increasing from top to bottom. With 3 treatments, group plots into BLOCKS of 3, starting at the top and continuing to the bottom. Randomise treatments within each block.

2. 100 subjects are selected to compare a new drug for controlling blood pressure (BP) with a Control. Block into pairs by age & weight (believed to affect BP). In each pair one is selected at random to receive the new drug, the other receives Control.

3. 3 products to be compared in 15 supermarkets. All 3 are compared in each supermarket, and the supermarkets are regarded as BLOCKS.

Page 12

Blocking and experimental material: Examples (contd)

4. A crop experiment will take 5 days to harvest. The material is blocked into 5 sets of plots, and treatments are assigned at random within each set. A BLOCK of plots is harvested each day.

Here, day effects such as rain will be allowed for in the ANOVA table, so they do not cloud the estimation of treatment effects, and residual variation is reduced.

Page 13

Blocking factors in your work area?

Page 14

Reasons to BLOCK

1. Reduce BN (as above)

2. Material is naturally blocked (e.g. identical twins), so using this as part of the design may reduce BN

3. To protect against factors that may influence the experimental outcomes, and so cloud comparison of treatments

4. To assess block variation itself, e.g. large day-to-day variation may indicate a process that is not well controlled.

Page 15

Typical Randomised Block Design (RBD) Layout

4 treatments T1 – T4, BLOCKS of size 4

Example of random allocation within blocks:

Block
1       T3   T1   T2   T4
2       T2   T3   T1   T4
3       T1   T2   T3   T4
4       T2   T4   T1   T3
5       T4   T2   T3   T1
6       T3   T1   T4   T2

Page 16

ANOVA table

Source       DF            SS    MS    F         Pr > F
Treatments   t – 1         TSS   TMS   TMS/RMS   Small?
Blocks       b – 1         BSS   BMS   BMS/RMS   Small?
Residual     (t–1)(b–1)    RSS   RMS
Total        tb – 1

t treatments, b blocks, tb experimental units; each treatment occurs once in each block

MS = SS/DF

Page 17

Example (PGRM pg 10-2)

Compare effect of washing solution used in retarding bacterial growth in food processing containers.

Only 3 trials can be run each day, and temperature is not controlled so day to day variability is expected.

BLOCKS: day

Treatments: 2%, 4%, 6% of active ingredient

Randomisation: 3 containers randomly allocated to 3 treatments on each of 4 days.

Response: bacterial count on each container each day (low score = cleaner)

Page 18

Example (contd)

Excel:

Day   Solution (%)   Count
1     2              13
1     4              10
1     6               5
2     2              18
2     4              20
2     6               6
3     2              18
3     4              17
3     6               7
4     2              30
4     4              31
4     6              10

csv:

Day,Solution(%),Count
1,2,13
1,4,10
1,6,5
2,2,18
2,4,20
...

Note: Response values in a single column. Extra columns to identify BLOCK (day) and TREATMENT (solution).

Page 19

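SAS GLM code

proc glm data = randb;
  class solution day;                      /* treatment and block are classification factors */
  model score = solution day;              /* score = bacterial count */
  lsmeans solution;
  lsmeans day;
  estimate '2-6' solution 1 0 -1;          /* 2% minus 6% */
  estimate 'linear ok?' solution 1 -2 1;   /* curvature: near zero if the trend is linear */
run;
quit;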

Page 20

GLM OUTPUT: ANOVA

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5           748.08         149.6     11.68   0.0048
Error              6            76.8           12.8
Corrected Total   11           824.9

Source     DF   Type I SS   Mean Square   F Value   Pr > F
solution    2      425.17        212.58     16.60   0.0036
Day         3      322.92        107.64      8.41   0.0144

425.17 + 322.92 = 748.09 (the Model SS of 748.08, up to rounding)

So the Model SS has been partitioned into TREATMENT (solution) and BLOCK (Day)
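The partition can be checked by hand from the 12 counts in the example. The sketch below uses plain Python rather than SAS and the textbook sums-of-squares formulas for a randomised block design; the variable names are illustrative. It does not compute p-values.

```python
# Bacterial counts from the example: counts[day][solution %].
counts = {
    1: {2: 13, 4: 10, 6: 5},
    2: {2: 18, 4: 20, 6: 6},
    3: {2: 18, 4: 17, 6: 7},
    4: {2: 30, 4: 31, 6: 10},
}

days = sorted(counts)
solutions = sorted(counts[days[0]])
b, t = len(days), len(solutions)  # b blocks (days), t treatments (solutions)

values = [counts[d][s] for d in days for s in solutions]
grand_mean = sum(values) / (b * t)
solution_means = {s: sum(counts[d][s] for d in days) / b for s in solutions}
day_means = {d: sum(counts[d][s] for s in solutions) / t for d in days}

# Sums of squares: total = treatments + blocks + residual.
total_ss = sum((y - grand_mean) ** 2 for y in values)
solution_ss = b * sum((m - grand_mean) ** 2 for m in solution_means.values())
day_ss = t * sum((m - grand_mean) ** 2 for m in day_means.values())
residual_ss = total_ss - solution_ss - day_ss

rms = residual_ss / ((t - 1) * (b - 1))       # residual mean square
f_solution = (solution_ss / (t - 1)) / rms
f_day = (day_ss / (b - 1)) / rms

# Residual mean square if day were ignored (no blocking in the analysis).
rms_without_blocks = (day_ss + residual_ss) / (b * t - t)

print(f"solution SS {solution_ss:.2f}, day SS {day_ss:.2f}, residual SS {residual_ss:.2f}")
print(f"F solution {f_solution:.2f}, F day {f_day:.2f}")
print(f"RMS with blocks {rms:.1f}, without blocks {rms_without_blocks:.1f}")
```

The last line illustrates the point of blocking: leaving day out of the model would push the day-to-day variation into the residual and inflate the background noise against which the solutions are compared.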

Page 21

GLM OUTPUT: means

solution   score LSMEAN
2          19.75
4          19.5
6          7.0

Parameter    Estimate   Standard Error   t Value   Pr > |t|
2-6             12.75            2.530      5.04     0.0024
linear ok?     -12.25            4.383     -2.80     0.0314
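The two estimates and their standard errors can be reproduced from the means and the error mean square. This is a sketch using the standard formula for a contrast among means that each average 4 observations, SE = sqrt(Error MS × Σc² / 4), with the rounded Error MS of 12.8 from the ANOVA output, so the last digit of a standard error may differ slightly from SAS. The function name `contrast` is illustrative.

```python
import math

solution_means = {2: 19.75, 4: 19.5, 6: 7.0}  # LSMEANs from the output above
error_ms = 12.8   # Error mean square from the ANOVA output (rounded)
n_blocks = 4      # each solution mean is an average over 4 days

def contrast(coefficients):
    """Return (estimate, standard error, t value) for a contrast of solution means."""
    means = [solution_means[s] for s in sorted(solution_means)]
    estimate = sum(c * m for c, m in zip(coefficients, means))
    std_error = math.sqrt(error_ms * sum(c * c for c in coefficients) / n_blocks)
    return estimate, std_error, estimate / std_error

for label, coefficients in [("2-6", (1, 0, -1)), ("linear ok?", (1, -2, 1))]:
    estimate, std_error, t_value = contrast(coefficients)
    print(f"{label:<11} {estimate:7.2f} {std_error:6.3f} {t_value:6.2f}")
```

The contrast 1 -2 1 compares the 4% mean with the average of the 2% and 6% means, so a value far from zero suggests the response is not linear in concentration.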

Page 22

ANOVA table

Source     SS     df   MS     F       P
Solution   425    ?    213    16.60   0.004
Days       323    ?    108    8.41    0.014
Residual   76.8   ?    12.8

Solution   2      4      6      SED
           19.8   19.5   7.0    2.53

Day        1      2      3      4      SED
           9.3    14.7   14.0   23.7   2.92

Page 23

More Blocking – Latin square designs

Page 24

Latin Square design – blocking by 2 Sources of variation

Variation in milk yield among cows is large (CV% = 25)

Variation in Yield across lactation is large

Use different treatments in sequence on each cow

Need to allow for a standardisation period (1–2 weeks) between treatments

[Figure: Lactation yield pattern. Yield (kg), axis from 0 to 800, plotted against Month, axis from 0 to 10.]

Page 25

Data

Treatment layout

         Cow
Period   1     2     3     4
1        T2    T1    T3    T4
2        T4    T2    T1    T3
3        T3    T4    T2    T1
4        T1    T3    T4    T2

Milk yield (kg/day)

         Cow
Period   1      2      3      4
1        9.7    14.0   20.2   20.9
2        15.1   20.3   17.8   24.3
3        16.4   20.1   21.3   21.5
4        11.8   19.1   21.3   20.6

Columns for period, cow and treatment codes:

Period   Cow   Treat   yield
1        1     2        9.7
2        1     4       15.1
3        1     3       16.4
4        1     1       11.8
1        2     1       14.0
2        2     2       20.3
….
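The step from the two square tables to the one-row-per-measurement layout can be sketched as follows. The check that the layout really is a Latin square (each treatment once per period and once per cow) is an addition made here for illustration, and the names `layout`, `yields`, `is_latin_square` and `records` are hypothetical.

```python
# Rows are periods 1-4, columns are cows 1-4 (values copied from the tables above).
layout = [
    [2, 1, 3, 4],
    [4, 2, 1, 3],
    [3, 4, 2, 1],
    [1, 3, 4, 2],
]
yields = [
    [9.7, 14.0, 20.2, 20.9],
    [15.1, 20.3, 17.8, 24.3],
    [16.4, 20.1, 21.3, 21.5],
    [11.8, 19.1, 21.3, 20.6],
]

def is_latin_square(square):
    """True if every row and every column contains each of 1..n exactly once."""
    n = len(square)
    symbols = set(range(1, n + 1))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

# Long format, ordered by cow and then period as on the slide:
# one (period, cow, treat, yield) record per measurement.
records = [
    (period + 1, cow + 1, layout[period][cow], yields[period][cow])
    for cow in range(4)
    for period in range(4)
]

print("Latin square:", is_latin_square(layout))
print("Period Cow Treat yield")
for record in records[:6]:
    print(*record)
```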

Page 26

SAS GLM code

proc glm data = latinsq;
  class period cow treat;            /* two blocking factors plus treatment */
  model yield = period cow treat;
  lsmeans treat;
  lsmeans period;
  lsmeans cow;
  estimate '1v2' treat 1 -1 0 0;     /* treatment 1 minus treatment 2 */
run;

Page 27

Results

Means

        Treat   Period   Cow
1       16.28   16.21    13.24
2       17.98   19.37    18.38
3       20.01   19.82    20.16
4       19.33   18.18    21.82

SED     0.775   0.775    0.775

Source   DF   SS      MS      F       P
Period    3   31.2    10.41    8.68   0.013
Cow       3   165.8   55.28   46.06   0.000
Treat     3   32.5    10.82    9.01   0.012
Error     6   7.2     1.20

Cow and Period removed much variation
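The sums of squares can be recomputed from the yields shown on the data slide. The sketch below uses the standard Latin square partition (total = period + cow + treatment + error); the names are illustrative. Because the displayed yields are rounded to one decimal, the results agree with the table only approximately (for example, a cow SS of roughly 165.3 against 165.8), presumably because the original analysis used unrounded data.

```python
# Rows are periods 1-4, columns are cows 1-4 (values from the data slide).
layout = [[2, 1, 3, 4], [4, 2, 1, 3], [3, 4, 2, 1], [1, 3, 4, 2]]
yields = [
    [9.7, 14.0, 20.2, 20.9],
    [15.1, 20.3, 17.8, 24.3],
    [16.4, 20.1, 21.3, 21.5],
    [11.8, 19.1, 21.3, 20.6],
]
n = 4
cells = [(p, c) for p in range(n) for c in range(n)]
grand_mean = sum(yields[p][c] for p, c in cells) / (n * n)

def factor_ss(level_of):
    """Sum of squares for a factor, given a function mapping a cell to its level."""
    totals = {}
    for p, c in cells:
        totals.setdefault(level_of(p, c), []).append(yields[p][c])
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in totals.values())

total_ss = sum((yields[p][c] - grand_mean) ** 2 for p, c in cells)
period_ss = factor_ss(lambda p, c: p)
cow_ss = factor_ss(lambda p, c: c)
treat_ss = factor_ss(lambda p, c: layout[p][c])
error_ss = total_ss - period_ss - cow_ss - treat_ss

error_ms = error_ss / ((n - 1) * (n - 2))   # 6 error degrees of freedom
sed = (2 * error_ms / n) ** 0.5             # SED between two means of n values each

for name, ss in [("Period", period_ss), ("Cow", cow_ss), ("Treat", treat_ss)]:
    print(f"{name:<7} SS {ss:7.2f}  MS {ss / (n - 1):6.2f}  F {ss / (n - 1) / error_ms:6.2f}")
print(f"Error   SS {error_ss:7.2f}  MS {error_ms:6.2f}  SED {sed:.3f}")
```

Cows account for by far the largest share of the total variation, which is what the two-way blocking is designed to remove from the treatment comparison.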

Page 28

Conclusions on Latin square design

CV greatly reduced to 6%: when the effect of period is allowed for, repeated measurements within a cow are not very variable.

Periods and cows are nuisance variables here. Sometimes the row and column variables are of interest in themselves, and then the design is very efficient, giving information on 3 factors (e.g. treatments, machines, operators).

Useful for screening but questionable whether short term results would apply for the long term.
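The 6% CV quoted above can be checked from the Results slide, assuming the usual definition CV = 100 × sqrt(Error MS) / overall mean. The overall mean is taken here as the average of the four treatment means, which is valid because every treatment has the same number of observations.

```python
import math

treatment_means = [16.28, 17.98, 20.01, 19.33]  # Treat column of the means table
error_ms = 1.20                                 # Error MS from the ANOVA table

grand_mean = sum(treatment_means) / len(treatment_means)
cv_percent = 100 * math.sqrt(error_ms) / grand_mean

print(f"overall mean {grand_mean:.1f} kg/day, CV {cv_percent:.1f}%")
```

This compares with the CV of 25% quoted earlier for variation in milk yield among cows.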