Page 1: DOE - Design Of Experiments

Design Of Experiments

Notes for Lecture 1, 8/25/03

I. Class Logistics

The first part of the class will be devoted to discussing the logistics of the class as shown on the syllabus and course schedule.

II. Introduction to Design of Experiments

Why Experiment?

To gain information.

Why design the experiment?

To gain the most information with the least effort. To collect information from which valid conclusions may be drawn.

Page 2: DOE - Design Of Experiments

What is the best way to design an experiment?

It depends, thus the need for this class.

Page 3: DOE - Design Of Experiments

A. What is an experiment (from a mathematical / statistical perspective)?

Purpose:

1. To determine if various factors affect some response (Y)

2. To build models relating the response to the factors.

Page 4: DOE - Design Of Experiments

Examples to clarify:

Example 1.

Response Variable Y = yield from a chemical process. (Perhaps measured in percent conversion of the raw materials to final product.)

Factors studied:

Factor A: Temperature at which process is run
Factor B: Amount of catalyst used in the process

The response variable (Y) is also called the DEPENDENT variable because its value is assumed to depend on the values of the factors.

Similarly, factors are also sometimes called independent variables.

Factors are also commonly known as "treatments", a term which comes from agriculture where design of experiments was first developed.

Page 5: DOE - Design Of Experiments

Definitions: Factor or Treatment LEVELS:

The values assigned to the factors for the various runs of the experiment:

                 Amount of Catalyst (Lbs.)
                    200         300
           20    84%, 87%    92%, 91%
 Temp (C)  30    75%, 79%    87%, 88%
           45    97%, 23%    96%, 93%

Thus in this experiment Factor A (temperature) is at three levels, 20, 30, and 45 degrees. Factor B is at two levels, 200 and 300 lbs.

OBSERVATIONS of the response variable Y are listed in the table.

Note that for each combination of treatment levels, we have 2 observations on the response Y.
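As a quick sketch (Python is my choice here; the original notes contain no code), the table above can be stored and summarized cell by cell:

```python
# The yield table above, stored as (temperature C, catalyst lbs) ->
# list of replicate observations (percent conversion). Values are
# transcribed from the table.
yields = {
    (20, 200): [84, 87], (20, 300): [92, 91],
    (30, 200): [75, 79], (30, 300): [87, 88],
    (45, 200): [97, 23], (45, 300): [96, 93],
}

def cell_mean(obs):
    # Average of the replicate observations in one treatment cell.
    return sum(obs) / len(obs)

means = {cell: cell_mean(obs) for cell, obs in yields.items()}
print(means[(20, 200)])  # 85.5
```
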

Page 6: DOE - Design Of Experiments

Natural Questions:

- Is 2 enough? Is 1 sufficient?
- Do we really need observations for each combination of treatment levels?
- Must we have the same number of observations for each treatment level combination?
- Why 3 and 2 levels respectively?
- Does either of the factors affect the response?

These are questions you will be able to answer by the end of the course.

Page 7: DOE - Design Of Experiments

Example 2.

Response Y = Number of defects in a silicon wafer

Factor A = Operator who made wafer (Joe, Sue, Dinsdale, Sally)

Factor B = Raw material supplier (3 different companies)

Factor C = temperature at time of manufacture

Factor D = Class of clean room (1, 2 or 3)

Page 8: DOE - Design Of Experiments

Note that factors can be purely categorical, such as A and B; other examples are gender and drug (e.g. aspirin, ibuprofen, acetaminophen, placebo).

They can be categorical but with a natural order, such as D; another example is education (grade completed).

They can be real numbers, like C; other examples are pressure, age, income, etc.

Page 9: DOE - Design Of Experiments

Note that all factors can be TREATED as if they are categories.

We begin this class with experiments involving a single factor.

However, it is important to remember that the importance of design of experiments really kicks in for multi-factor experiments. Many of the complications kick in there as well. We will discuss some of these pitfalls later today.

Page 10: DOE - Design Of Experiments

B. How does one analyze data from an experiment?

We begin with single factor experiments.

Example: Experiment to determine if temperature affects yield

          Temperature (degrees C)
            20     30     40
 ---------------------------
 Obs 1      93     90     90
     2      94     97     91
     3      92     92     92
     4      91     93     91

Page 11: DOE - Design Of Experiments

A Mathematical Model for the Response:

Let i index temperature levels (e.g. i=1 implies temp=20)

Let j index observation number within treatment level (j=1,...,4 there are 4 replicates)

Y_ij = μ_i + ε_ij

Page 12: DOE - Design Of Experiments

Where:

μ_i = Mean(Y), or expected value E(Y), at temperature level i

ε_ij = A random variable or "error term"

We assume E[ε_ij] = 0.

Page 13: DOE - Design Of Experiments

When we repeated the experiment 4 times at the same temperature above, we did not get the same yields.

Why not? A whole bunch of reasons.

ε_ij represents the net effect of countless unmeasured variables and causes which, in our model, are chalked up to "random error".
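The model can be illustrated with a short simulation (not in the original notes; the cell means and error spread below are made-up numbers):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical cell means mu_i for temperatures 20, 30, 40 C.
# These numbers are chosen for illustration; the true means are unknown.
mu = {20: 92.5, 30: 93.0, 40: 91.0}

def simulate_yield(temp, sigma=1.5):
    # Y_ij = mu_i + eps_ij, where eps_ij ~ Normal(0, sigma) stands in
    # for the net effect of countless unmeasured variables.
    return mu[temp] + random.gauss(0.0, sigma)

# Four replicates at the same temperature give four different yields,
# just as in the real experiment.
replicates = [simulate_yield(20) for _ in range(4)]
print(replicates)
```
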

Page 14: DOE - Design Of Experiments

We observe variation in yield from two causes:

1. Temperature
2. The sum total of many factors, called "error"

Page 15: DOE - Design Of Experiments

Statistical Analysis (ANOVA)

Does temperature affect yield Y? Or, perhaps more accurately, do the data give us significant reason to conclude that temperature affects Y?

How would you answer this question? (The class will actively participate at this point, if not earlier.)

Page 16: DOE - Design Of Experiments

Consider the following two examples of how the data could turn out.

The data values are represented by "o"s and the treatment averages are represented by "-"s

Page 17: DOE - Design Of Experiments

[ASCII scatter plot: yield Y (vertical axis) vs. temperature 20, 30, 40 (horizontal axis). Within each temperature, the observations "o" are widely scattered around that treatment's average "-".]

Page 18: DOE - Design Of Experiments

Given this graph, will you bet the farm that temperature affects yield?

Now consider the following graph. Note that the AVERAGE yield is the same for each temperature as it was in the previous graph.

Page 19: DOE - Design Of Experiments

[ASCII scatter plot: same treatment averages "-" as before, but now the observations "o" cluster tightly around each average.]

In the first graph, the averages varied. However, the averages will always vary because there is randomness involved. Thus the BIG question is:

Is the variation in averages at the different temperatures more than we would expect from randomness?

We compare the variation in treatment averages to the variation observed inside or "within" each treatment.
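That between/within comparison can be sketched in code (Python here, not part of the original notes), using the yield data from the earlier table:

```python
# One-way "ANOVA by hand": compare variation BETWEEN treatment
# averages to variation WITHIN treatments.
data = {20: [93, 94, 92, 91], 30: [90, 97, 92, 93], 40: [90, 91, 92, 91]}

n = sum(len(v) for v in data.values())            # 12 observations
grand_mean = sum(sum(v) for v in data.values()) / n

# Between-treatment (treatment) sum of squares
ss_trt = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
             for v in data.values())
# Within-treatment (error) sum of squares
ss_err = sum((y - sum(v) / len(v)) ** 2
             for v in data.values() for y in v)

df_trt, df_err = len(data) - 1, n - len(data)     # 2 and 9
f_stat = (ss_trt / df_trt) / (ss_err / df_err)
print(round(ss_trt, 3), round(ss_err, 3), round(f_stat, 3))  # 8.667 33.0 1.182
```

The small F ratio here hints that the between-treatment variation is about what randomness alone would produce.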

Page 20: DOE - Design Of Experiments

*Notes for the rest of this section must be taken by hand*

Page 21: DOE - Design Of Experiments

We can show algebraically that SS_TOT = SS_TRT + SS_ERR.

This is "an orthogonal decomposition of SS", and SS_TRT and SS_ERR are statistically independent.

This turns out to be quite important in the analysis of experiments.

We can basically compare how big SS_TRT is to how big SS_ERR is to decide if there is a significant effect.
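A quick numerical check of the decomposition (illustrative only; the data are random made-up yields):

```python
import random

random.seed(0)  # arbitrary seed; any grouped data works

# Three treatments, four replicates each.
groups = [[random.uniform(80, 100) for _ in range(4)] for _ in range(3)]

all_y = [y for g in groups for y in g]
grand = sum(all_y) / len(all_y)

ss_tot = sum((y - grand) ** 2 for y in all_y)
ss_trt = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_err = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

# The identity SS_TOT = SS_TRT + SS_ERR holds (up to rounding) for any data.
assert abs(ss_tot - (ss_trt + ss_err)) < 1e-9
print(ss_tot, ss_trt + ss_err)
```
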

Page 22: DOE - Design Of Experiments

C. Why is it important to design experiments?

(Illustrated by examples of how you can mess up if you don't design the experiment!)

Page 23: DOE - Design Of Experiments

Pitfall number one:

Consider our previous one-factor experiment. The numbers in the table show the order in which the observations were run/collected.

          Temperature (degrees C)
            20     30     40
 ---------------------------
 Obs 1       1      5      9
     2       2      6     10
     3       3      7     11
     4       4      8     12

Page 24: DOE - Design Of Experiments

Suppose our analysis of variance indicated a difference in response averages between treatments. Can we conclude temperature was the cause?

How do we remedy this problem? (Again lively discussion at this point)

Solution: run the experiments in __________ order.

Hopefully, factors not included in the experiment will "balance out through _______ization".
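A minimal sketch of shuffling the run order (the language choice is mine, not the notes'):

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility only

# Twelve runs, four at each of three temperatures. Running them in the
# systematic order of the table (all 20s first, all 40s last) mixes
# temperature up with anything that drifts over time; shuffling the
# order lets such factors tend to balance out.
run_order = [temp for temp in (20, 30, 40) for _ in range(4)]
random.shuffle(run_order)
print(run_order)  # a random interleaving of the three temperatures
```
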

Page 25: DOE - Design Of Experiments

Example 2: pitfall #1 continued

        Temperature
          20     30
 --------------------
 obs 1   Joe    Bob
     2   Joe    Bob
     3   Joe    Bob

Our chemical process requires an operator to run the expt. We assign Joe to run the three observations at 20 degrees, and Bob to the 3 observations at 30 degrees. What is wrong with this? How could we remedy it? (More lively discussion)

Solution 1: Randomize

Solution 2: Only use Bob

- May not be possible physically
- Experimental inference extends (at least theoretically) only to situations where Bob is running the process

Solution 3: Include "operator" as factor B in a two-factor experiment.

Page 26: DOE - Design Of Experiments

Pitfall #2: failure to consider interaction between factors:

The world's most common experimental design is the "vary one factor at a time," or OFAT, design. For example, to determine the effects of temperature and pressure on yield you might do this:

Set temperature at 30 degrees. While temp=30, take observations at pressure levels of 1, 2, and 3 atmospheres.

Next set Pressure = 2 atmospheres, and observe the process at 20, 30, and 40 degrees.

That is, vary one factor at a time.

This is the world's most commonly used experimental design.

Page 27: DOE - Design Of Experiments

It is also the world's worst experimental design.

Example: Does the presence of men and women affect birth rate? Response = birth rate per 100 people.

Factor A = the presence or absence of women Factor B = the presence or absence of men

Page 28: DOE - Design Of Experiments

We vary one factor at a time and get the following results:

               No Men      Men
 -----------------------------
 No Women        0          0
 Women           0      (No Obs)

These data clearly show that the two factors (presence of men, presence of women) have absolutely nothing to do with birth rate.

The lesson is to not forget "interaction" which means that the effects of one factor may depend on the levels of another.
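The coverage gap in the OFAT design above can be made explicit in code (a sketch of mine, not from the notes):

```python
# The birth-rate example as cells of a 2x2 factorial:
# (women present, men present). The OFAT runs vary one factor at a
# time from the (False, False) baseline, so they cover only three
# of the four cells.
full_factorial = {(w, m) for w in (False, True) for m in (False, True)}
ofat_runs = {(False, False), (True, False), (False, True)}

missing = full_factorial - ofat_runs
print(missing)  # {(True, True)}: women AND men present, never observed
```

The interaction lives precisely in the cell OFAT never visits.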

Example 2 of what can happen if you ignore interaction. (Refer to class notes for this example)

Page 29: DOE - Design Of Experiments

Pitfall #3: Confounding

Consider the following data, from an experiment meant to determine if temperature and pressure affect the response.

                        Pressure
                    1 atm          2 atm
 ------------------------------------------
         20   0.1, 0.3, 0.2,
 Temp         0.3, 0.2, 0.1
         30                    15, 15, 15

Page 30: DOE - Design Of Experiments

Do we observe a significant effect on response? Apparently.

What caused the effect? Temperature? Pressure?

In this experiment, temperature and pressure are "completely confounded"

This means that because of the terrible design, we cannot separate out the contributions of the two factors to the effect on the response.

In a well-designed ("orthogonal") experiment, we can completely separate the effects of the factors.

If the design is not orthogonal, the effects of the factors cannot be completely separated.
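One way to see this numerically (my sketch, assuming two of the 2x2 designs tabulated below): compute the correlation between the factor-level columns of the design.

```python
# Factor-level settings (A, B) for each run under two designs:
# 100 runs in each diagonal cell vs. 50 runs in every cell.
confounded = [(1, 1)] * 100 + [(2, 2)] * 100
orthogonal = [(a, b) for a in (1, 2) for b in (1, 2)] * 50

def correlation(pairs):
    # Plain Pearson correlation between the two factor columns.
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / (va * vb) ** 0.5

print(correlation(confounded))  # 1.0 -- A and B move in lockstep
print(correlation(orthogonal))  # 0.0 -- effects can be separated
```
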

In the following tables are examples of three designs. The numbers in the tables are the number of observations made in each cell:

Page 31: DOE - Design Of Experiments

                A
              1     2
 ----------------------
       1    100     0
 B                        "Completely Confounded"
       2      0   100

                A
              1     2
 ----------------------
       1     50    50
 B                        "Orthogonal"
       2     50    50

                A
              1     2
 ----------------------
       1     75    25
 B                        "Partially Confounded"
       2     25    75

In the next lecture we will review the following concepts from your basic statistics class pertaining to hypothesis testing:

- The basic one-sample t test, as an example
- The decision procedure
- Type I and Type II error
- OC curves and sample size selection
- Practical vs. statistical significance
- The relationship between confidence intervals and hypothesis tests