Curso Bosch Diseño de experimentos


Edition 08.1993

© 1993 Robert Bosch GmbH



Table of Contents:

1. System-Analytical Approach
1.1 One-Factor-at-a-Time Method
1.2 Two-Factor Method
1.3 General Case (Numerous Influence Factors)
2. Industrial Experimentation Methodology and System Theory
2.1 Hints on System Analysis
2.2 Short Description of the System-Theoretical Procedure
2.2.1 Global System Matrix (i.e. without quoting any levels)
2.2.2 Local System Consideration
2.2.3 Local System Matrix
2.3 Summary
3. Probability Plot
3.1 Probability Plot of Small-Size Samples
3.2 Probability Paper
4. Comparison of Sample Means
4.1 t Test
4.2 Minimum Sample Size
5. F Test
6. Analysis of Variance (ANOVA)
6.1 Deriving the Test Statistic
6.2 Equality Test of Several Variances (According to Levene)
7. Design of Experiments with Orthogonal Arrays and Evaluating such Experiments
7.1 Representing the Results of Measurement
7.2 Calculating the Effects
7.3 Regression Analysis
7.4 Factorial Designs
7.4.1 Design Matrix
7.4.2 Evaluation Matrix
7.4.3 Confounding
7.4.4 Fractional Factorial Designs
7.5 Designs for Three-Level Factors
7.6 Central Composite Designs
7.7 Screening Designs According to Plackett and Burman
8. Statistical Evaluation Procedures for Factorial Designs
8.1 One-Way Analysis of Variance
8.2 Factorial Analysis of Variance
8.3 Factorial Analysis of Variance with Respect to Variation
8.4 Computer Support
8.4.1 Evaluation of an Experiment using the FKM Program
8.4.2 Evaluation with the Help of the SAV Program
9. Hints on Practical Design of Experiments
9.1 Task and Target Formulation
9.2 System Analysis
9.3 Stipulating an Experimental Strategy
9.4 Executing and Documenting an Experiment
10. Shainin Method
11. List of References
12. Tables
Index

Within the framework of quality assurance and for the effective new and further development of Bosch products, careful design of experiments is not only indispensable but is also required by our customers.

In this connection, the commonly used term "Statistical Experimental Design" is not exactly defined, and labels such as "Design of Experiments (DOE)", "Industrial Experimentation Methodology", "Taguchi Method" and "Shainin Method(s)" are often used interchangeably.

This pamphlet is based on a seminar manuscript on "Industrielle Versuchsmethodik 1" (Industrial Experimentation Methodology) and is intended to clarify vital terms and processes of statistical experimental design for the interested user.



1. System-Analytical Approach

Investigation of a system must often begin with the description of a particular system state. A basic requirement that we impose on an experiment is reproducibility, i.e. under defined conditions the result of an experiment must always be the same. Since there cannot be absolute equality (one cannot step into the same river twice), the reproducibility of an experiment is a relative term. One can use statistical terms to define the term "reproducibility". The term "self-control" can be interpreted as a generalization of the term "reproducibility". It is also possible to limit oneself to the statement that a (quantitative) result of an experiment must always lie within a specific bandwidth.

Variation in the results of repeated experiments (e.g. process variation) can, in certain situations, be a vital parameter. Under certain circumstances, the standard deviation can serve as a measure of the variation. If one wishes to evaluate the variation quantitatively, one needs a sufficiently large sample size, i.e. sufficiently many repetitions of the experiment (see Chapter 4.2). Similar statements are valid for the mean position.
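Why a quantitative statement about variation needs many repetitions can be illustrated by simulation. The following is only a sketch under assumptions that are not part of the original text: the function name `spread_of_stdev`, a standard normal population and the sample sizes 5 and 50 are chosen purely for illustration.

```python
import random
from statistics import stdev

def spread_of_stdev(n, repetitions=1000, seed=0):
    """Repeat an experiment series of n runs many times and return the
    smallest and largest standard deviation that was estimated."""
    rng = random.Random(seed)
    estimates = [
        stdev([rng.gauss(0.0, 1.0) for _ in range(n)])  # true sigma is 1
        for _ in range(repetitions)
    ]
    return min(estimates), max(estimates)

small = spread_of_stdev(5)    # few repetitions per series
large = spread_of_stdev(50)   # many repetitions per series

print("n = 5 :", small)
print("n = 50:", large)
```

With only five repetitions the estimated standard deviation scatters over a much wider range around the true value than with fifty repetitions.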

1.1 One-Factor-at-a-Time Method

If one wishes to investigate the influence of a factor within a system, one varies this factor but leaves the other factors in the system unchanged. In general, this ensures that other factors, which are not the subject of the investigation, neither falsify the results nor restrict the statements deduced from them. (That is obviously easier said than done.) This approach is convenient and logical, and should be regarded as a fundamental experimental strategy. The only restriction: the nature of the influence depends (possibly very strongly) upon the position of the other factors.

Strangely, numerous textbook authors consider the one-factor-at-a-time method to be inefficient. A practical person can, nonetheless, confidently ignore these objections. It is evident that one-factor-at-a-time experiments must be carefully designed, executed and evaluated.

Systematic Approach

We differentiate between variable and discrete influence parameters. Before one specifies experiments or experimental series, one must think about what type of influence the variable factor has. When preparing one-factor-at-a-time experiments we will become acquainted with terms which are later of prime importance when investigating the general n-dimensional case.

a) The simplest type of influence is the linear influence.

Increasing the influence parameter by a fixed amount always brings about the same effect, independent of the chosen levels (see Chapter 7). Many known natural laws of physics or chemistry are linear (examples?).



Differential calculus linearizes (nearly) arbitrary functions. However, one should not assume, merely for the sake of simplicity, that a relationship to be investigated can be linearized.

The statement that every problem can be linearized when the difference between the steps of the influence parameters is small enough may be correct, but it is of little practical value, since what is considered "small enough" must then be clarified.

For instance, a temperature difference of 1 °C can be small in many problems, but in other problems this increment may be large.

Extrapolation beyond the investigated region is only permissible if the function is known. The same restriction applies to interpolation.

Because a system generally exhibits significant background noise, erroneous interpretations of experimental results easily occur even when linearity is ensured.

b) A generalization of the linear influence is the monotonic influence (synonyms: tendency, directional factor).

A monotonic influence exists when the input quantity can influence the output quantity in only one direction.

More precisely: a monotonic influence exists when an increase of the input quantity invariably causes either an increase or a decrease of the output quantity.

Figure: Monotonic influence parameter. The choice of the steps influences the size of the effect.
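The dependence of the effect on the chosen steps can be made concrete with a small numerical sketch. The square-root response used here is a hypothetical example of a monotonic, non-linear influence and is not taken from the original text; the names `response` and `effect` are likewise illustrative.

```python
import math

def response(x):
    # Hypothetical monotonic (but non-linear) response, assumed for illustration.
    return math.sqrt(x)

def effect(low, high):
    # Effect = change of the output quantity when the input moves from low to high.
    return response(high) - response(low)

# Same step width (3 units), but different positions of the two levels.
effect_low_region = effect(1.0, 4.0)
effect_high_region = effect(9.0, 12.0)

print(effect_low_region, effect_high_region)
```

Both effects have the same sign, as monotonicity demands, but their sizes differ. For a linear influence the two values would be identical.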



Task 1

When carrying out a one-factor-at-a-time experiment, one should make sure that factors which are not the object of the investigation neither falsify the results nor restrict the statements deduced from them. Discuss, using concrete cases, how this basic principle can be realized (e.g. by randomization).

Task 2

A glass of water is put inside a freezer and the time required for the water to freeze is recorded.

The initial temperature of the water (between 10 °C and 100 °C) should be determined so that the time interval up to freezing is as long as possible (optimization problem).

How do you investigate the process empirically?

Task 3

a) Assume that a process is definitely linear.
How many supporting points does one need to represent the natural law explicitly?
How should the system noise be considered?
How does one select the supporting points?

b) How can one refute linearity empirically?

Task 4

Electron-impact experiment done by Franck and Hertz: the anode current flowing to the exciting anode as a function of the anode voltage.

a) How can the process represented in the adjacent figure be investigated empirically?

b) What could a physicist conclude who only performed tests at 5 V, 10 V and 15 V?



Summary:

From the considerations discussed in this chapter, it is clear that when investigating the influence of a single factor, the given situation is very important: how many measurements must be made, how many repetitions must be undertaken and where they have to be placed.

There is therefore no strict recipe for conducting empirical investigations. Thus, it is not appropriate to teach recipes.

Scheme

Target quantity(ies):

Influence variable(s):

Other factors influencing the target quantity (which are nonetheless not the objective of the investigation):

How are the quantities considered?

Prior knowledge:

Number of the steps:

Reason:

Number of repetitions:

Reason:

Additional points to be considered:

1.2 Two-Factor Method

If one wants to investigate the influence of two factors within a system, phenomena have to be observed that do not arise during one-factor-at-a-time investigations. Since these phenomena are symptomatic of the general n-dimensional case, a thorough investigation is beneficial.

Before one specifies an experimental arrangement (including the size of the experiment), what is known and unknown regarding the two factors (and what is then to be investigated empirically) must be systematically established.

At first, a cognitive investigation takes place in principle, i.e. a knowledge-based description of the two-factor system. As in the one-factor-at-a-time method, it is helpful to differentiate between discrete and variable influence parameters.

There are 3 cases to be differentiated.



Case 1: Both influence factors are discrete

Influence factor A with k levels: A1, A2, ..., Ak

Influence factor B with l levels: B1, B2, ..., Bl

There are k · l system states.

Example:

Target quantity: Yield
Plants: A1, A2
Pesticides: B1, B2, B3

Remark:

It is clear that in general one cannot derive other system states Ai Bj from the knowledge of an empirical result.
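The number of system states in the discrete-discrete case can be enumerated directly. The following sketch uses the plant and pesticide example above; the variable names are chosen for illustration only.

```python
from itertools import product

plants = ["A1", "A2"]             # k = 2 levels of factor A
pesticides = ["B1", "B2", "B3"]   # l = 3 levels of factor B

# Every combination (Ai, Bj) is a separate system state that has to be
# investigated on its own, since it cannot be derived from the others.
states = list(product(plants, pesticides))

print(len(states))  # k * l
for a, b in states:
    print(a, b)
```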

Case 2:

One discrete influence factor A: A1, ..., Ak
One variable influence factor B

Example:

System: Solution
Target quantity: Solubility
A: Chemical substance
B: Temperature

In principle, it is possible to describe the system by k characteristic lines (family of characteristics). In general, one deals with k different one-factor problems. Is it possible to draw a general conclusion from one characteristic line about another characteristic line?

When the characteristic curves are shifted upwards in parallel (depending on the discrete factor), one speaks of an interaction-free system.

Figure: Solubility of several inorganic substances as a function of temperature
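The criterion of parallel characteristic curves can be checked numerically. The sketch below is an illustration only: the function name `is_interaction_free`, the tolerance and the numbers are hypothetical, and with real measurements the tolerance would have to be chosen in relation to the system noise.

```python
def is_interaction_free(curves, tol=1e-9):
    """Return True if all characteristic curves are parallel shifts of the first.

    curves: dict mapping each level of the discrete factor to a list of
    target values measured at the same settings of the variable factor.
    """
    levels = list(curves)
    reference = curves[levels[0]]
    for level in levels[1:]:
        # For a parallel shift, the vertical distance to the reference
        # curve is the same at every setting of the variable factor.
        diffs = [y - r for y, r in zip(curves[level], reference)]
        if max(diffs) - min(diffs) > tol:
            return False
    return True

# Hypothetical target values at three temperatures for two substances.
parallel = {"A1": [10.0, 12.0, 15.0], "A2": [13.0, 15.0, 18.0]}
crossing = {"A1": [10.0, 12.0, 15.0], "A2": [13.0, 12.5, 11.0]}

print(is_interaction_free(parallel), is_interaction_free(crossing))
```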



Case 3: Both influence parameters are variable.

The information can be represented in three dimensions (see Figure).

Figure: Complex Motronic ignition map (ignition angle as a function of load and engine speed)

Here the values of both variable influence parameters (in this example, load and engine speed) constitute the coordinates of points in a plane. The function is then represented by a "mountain" above this plane (see Figure 7.1).

The experimenter must now specify the region in which empirical investigations are to be performed. How many experimental points should be planned depends entirely upon the physical question.

The idea that the experimental scope can be reduced by means of some combinatorial magic is simply erroneous. The scope can only be reduced through precise task formulation and use of the knowledge already verified.

Task:

System: Cake
Target quantity: Height of cake
Influence parameters: Yeast, water

How can one investigate the system empirically?
What does an array of characteristic curves look like in principle?




Summary:

Two influence factors are handled just as in the one-dimensional case: the combinatorial arrangement of experimental points, the number of repetitions per point etc. depend entirely upon how the question is formulated. Generally binding rules, in an algorithmic sense, cannot exist.

The case differentiation

discrete - discrete
discrete - variable
variable - variable

is helpful.

1.3 General Case (Numerous Influence Factors)

A complex system, with numerous influence factors, poses a challenge. It is clear that the time needed for an investigation increases with the number of factors to be considered. It would be good if one could reduce the time needed for an experiment through some combinatorial magic.

Unfortunately, this is not so. The only way to reduce experimental expenses is by applying the existing knowledge in a systematic way. This systematic approach must help the practitioner but must not force him to employ terminology he does not understand (nor is expected to know). The practitioner must be able to present his knowledge or his presumptions in a simple and rational manner. (Furthermore, the assumptions must be system-based and plausible!)

We differentiate between variable and discrete influence parameters.

The description of the type of influence of the individual quantities belongs to the description of the system in principle. In view of the fact that we may have to differentiate among numerous input quantities, a careful description of the influence of the individual input quantities is especially important.

Naturally, the influence of an individual quantity depends upon the position of the other input quantities. For this very reason, it must be determined whether the physical-chemical character of the individual quantities permits making statements in principle about the type of influence, independent of the other quantities.

With the systematic approach, it is preferable to begin by considering the discrete influence parameters. If, for instance, A is a discrete influence parameter with the levels A1, A2 (e.g. metal type), then the following should be asked:

Is one of the two levels, in respect of the target quantity, better in principle than the other or not?

If this is not the case, then the answer to the question depends upon the position of the other factors.



Example:

(Figure panels: A definite; A ambiguous)

Remark:
It is usually preferable to begin by investigating the discrete influence parameters, basically because the different steps often represent the system states to be differentiated. (Don't compare apples with oranges!)

Variable Influence
Before determining the experiments or experimental series, the overall influence should be described (i.e. without determining the levels). The relevant terminology is known to us (see 1.1 and 1.2).

Black Box
If, after a careful analysis, all of the influence parameters are ambiguous or if the character of the influence is unknown, then the matter cannot be investigated empirically. If one nevertheless wants to conduct the experiment, all strategies become nearly equivalent (all cats are grey at night).

Trial Task:

System: A board fixed on one side.

Target quantity: Lowering of the free end.

Influence quantities: Types of wood H1, H2, H3, length, breadth, height, force F



I. a) Perform a global system analysis with the help of the system matrix!

b) If appropriate, draw an array of characteristic lines!

                 Length   Breadth   Height   Force
Linear
Monotonic
Non-monotonic
Unknown

II. Given are

Length: 1.5 m
Breadth: 20 cm
Height: 4 cm
Force: 20 N

All 4 quantities can be reduced by up to 10%. The target is a board which is lowered as little as possible. Perform a local analysis!

Which experiments or experimental series would you perform?

Factors          Length   Breadth   Height   Force
Definite
Ambiguous
Unknown

Trial Task:

System: Greenhouse

Target quantity: Yield of useful plants

Influence quantities:
Types of plants P1, P2, P3
Types of soil B1, B2
Chemicals C1, C2, C3
Water quantity (irrigation)
Light
Temperature



1. Perform a detailed system analysis (with system matrix)!

2. Draw arrays of characteristic lines!

What can be said about interactions?

3. Which experimental strategy is recommendable?

Trial Task:

a) What does the optimization strategy of a monotonic system look like?

Global System Matrix:

Factor A B C D

Monotonic

Non-monotonic

Unknown

b) What does the optimization strategy of the following system look like?

Factors A B C D E F

Levels A1 A2 B1 B2 C1 C2 D1 D2 E1 E2 F1 F2

Definite X X

Ambiguous X

Unknown X X X



2. Industrial Experimentation Methodology and System Theory

The terms and key words summarized under D.O.E., Statistical Experimental Design, Taguchi and Shainin methods are, as mentioned earlier, either required or initiated by customers and are also used in the specialized literature.

With respect to the practical relevance of the methods mentioned above, the following should be noted:

Taguchi Method

The Taguchi method is characterized by, among other things, the use of so-called orthogonal arrays to reduce the required extent of the experiment. The use of the method depends upon the negligibility of interactions or, in exceptional cases, the predictability of interactions. These assumptions are controversial; nevertheless, successful examples are often quoted in the literature. These successes are not verifiable and usually not rationally comprehensible. What has been confirmed is that orthogonal arrays can demonstrably lead to substantial misstatements.

D.O.E. (= Design of Experiments)

Anybody who has ever thought about performing an experiment has practiced experimental design. Thus, the question of whether one is for or against experimental design can never arise. With regard to the contents of textbooks on the subject of D.O.E., however, there are some reservations, for instance:

All algorithmic approaches are based on models, i.e. a mathematical, quantitative model is proposed to represent the reality to be investigated. All subsequent procedures (experimental designs, evaluations etc.) are only reasonable if the model adequately describes the reality.

The difficulty of selecting the right model is fundamental in nature.

From the structure of the results, it is not possible to recognize whether the model is adequate (i.e. verification is possible neither a priori nor a posteriori).

A way out of this difficulty is only possible via a system-theoretical approach.

Shainin Method

For the Shainin method, see Chapter 10 and [11].



2.1 Hints on System Analysis

The prerequisite for a reasonable experimental design is a system analysis. The purpose of a system analysis is, among other things, to present existing knowledge or lack of knowledge about the system to be investigated with the help of elementary terms. Theoretical DOE terms are to be avoided at this stage for various reasons. After the system analysis has been carried out, the appropriate experimental strategy can be decided upon and, to some extent, deduced. Automation in the sense of a strict recipe is not appropriate and is therefore not to be pursued. The terminology of General Systems Theory is used.

Generally it may be assumed that the system to be investigated does not represent a "black box". (It is self-evident that a real black box cannot be investigated with formal procedures.) Hence the specialist will be able to make statements in principle about the input-output situation of the system. Explanations in principle, i.e. qualitatively correct explanations, are preferred to precise quantitative statements that are, for various reasons, often false (better to be approximately right than exactly wrong).

2.2 Short Description of the System-Theoretical Procedure

System analysis begins with system definition. This includes listing all relevant target quantities (output) as well as all relevant influence parameters (input).

Here, for instance, flow charts and cause-and-effect diagrams can be helpful. When dealing with input quantities, care should be taken, for example, regarding their independence, their susceptibility and the possibility of setting them definitely.

After completion of the system definition, the system characteristics are to be described. System analysis is a recursive process. In the ideal case, all relevant system characteristics are known and investigating the system via experiments becomes unnecessary.

A statement about system noise belongs to the description of the system characteristics, i.e. the description of the behaviour of the output quantities when the given input quantities are kept constant.

The knowledge of system noise has vital consequences for the type and scope of the impending investigations. Describing the functional input-output situation is important within the scope of information about system characteristics. In view of the fact that normally several input quantities exist, describing the influence of the individual input quantity is especially important. Naturally, the influence of an individual quantity depends upon the position of the other input quantities, and for this reason it is especially important whether the physical-chemical character of the individual quantity permits making statements in principle about its type of influence, independent of the other quantities.

Here the following terms can help further:

Global description



Linear influence (as a special case of the monotonic influence):

A linear influence exists if the functions $f(A, \ldots)$ are always linear (linear influence factors are certainly exceptional cases).

Figure: Characteristics of a linear influence factor

Monotonic influence:

A monotonic influence exists if the input quantity can only influence the output quantity in one direction.

Figure: Characteristics of a monotonic influence factor

Non-monotonic influence factor:

A non-monotonic (dichotomous) influence exists if the input quantity can influence the output quantity in both directions (i.e. both upwards and downwards). Here also the characteristic of the influence factor depends upon the position of the other influence factors. It is generally assumed, however, that the type of the dichotomy is an invariant of the influence factor, i.e. the dichotomy is independent of the position of the other factors.

Figure: Characteristics of a dichotomous influence factor



2.2.1 Global System Matrix (i.e. without quoting any levels)

Taking the special role of discrete input quantities into account, every single quantity is then specified according to how someone conversant with the system judges the character of its influence (without quantification). Here reference is made to the type classification above.

The results are summarized in the global system matrix:

Factors        A   B   C   ...   Z
Linear
Monotonic
Dichotomous
Unknown

A completed global system matrix can already indicate a sensible experimental strategy.

Example:

If all influence factors are monotonic, then it is simple to optimize the system, and the only question that needs to be asked is which influence factors are decisive for the optimum. Here reference can be made to the Shainin method.
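Why a purely monotonic system is simple to optimize can be sketched in a few lines. The following is an illustration under assumptions not found in the original text: the target function, the factor ranges and the function name `best_corner` are hypothetical, system noise is ignored, and the direction of each influence is assumed not to depend on the position of the other factors.

```python
def best_corner(f, ranges):
    """For a system in which every factor is monotonic, choose for each
    factor the end of its range that increases the target quantity."""
    # Start with every factor at the lower end of its range.
    setting = {name: low for name, (low, high) in ranges.items()}
    for name, (low, high) in ranges.items():
        trial = dict(setting)
        trial[name] = high
        # One comparison per factor is enough to find its direction.
        if f(**trial) > f(**setting):
            setting = trial
    return setting

def target(a, b, c):
    # Hypothetical target quantity: increasing in a and c, decreasing in b
    # everywhere inside the ranges given below.
    return 2.0 * a - b + a * c

ranges = {"a": (1.0, 2.0), "b": (0.0, 5.0), "c": (1.0, 3.0)}
optimum = best_corner(target, ranges)
print(optimum, target(**optimum))
```

For n monotonic factors the sketch needs only n + 1 evaluations, since the optimum must lie at a corner of the investigated region. With real, noisy measurements each comparison would require repetitions.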

2.2.2 Local System Consideration

Often, an experimental strategy follows directly from the global system consideration. Because the global characteristics array, especially that of the dichotomous influence factors, is often very complex, the system consideration must be localized; i.e., the levels of the influence factors must be prescribed and the properties of the system considered relative to the prescribed levels.

For the special case of two levels, the following case differentiation is to be made:

1. Univalent Influence Factor (univalent = definite)

If the target quantity is only ever moved in one direction by a change from A1 to A2, i.e. always $f(A_1) - f(A_2) > 0$ or always $f(A_1) - f(A_2) < 0$, then a univalent factor exists.

Hint: Because of localization, a dichotomous factor can be univalent. To some extent, however, there exists some correspondence between univalent and monotonic factors.



2. Bivalent Influence Factors (bivalent = ambiguous)

Bivalent factors are, by definition, factors which are not univalent. That means that the factor, depending on the position of the other factors, influences the target quantity both upwards and downwards when the level of the influence factor is changed as prescribed. The behaviour of a bivalent factor is, as such, synergetic or antagonistic. It is of special importance to find out which of the other factors cause the changes.
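The distinction between univalent and bivalent factors can be expressed as a sign check over all prescribed positions of the other factors. The sketch below assumes that the target quantity is available as a noise-free function, which is not the case in a real experiment; the function name `classify_factor` and both example targets are hypothetical.

```python
from itertools import product

def classify_factor(f, factor, levels, others):
    """Classify `factor` as 'univalent' or 'bivalent' for two prescribed levels.

    others: dict mapping each remaining factor to its prescribed levels.
    """
    a1, a2 = levels
    names = list(others)
    signs = set()
    for combo in product(*(others[name] for name in names)):
        fixed = dict(zip(names, combo))
        diff = f(**{factor: a1}, **fixed) - f(**{factor: a2}, **fixed)
        # Record the direction of the change; a difference of exactly zero
        # is counted as "not positive" in this simple sketch.
        signs.add(diff > 0)
    return "univalent" if len(signs) == 1 else "bivalent"

def additive(a, b):
    return a + b      # the effect of a does not depend on b

def interacting(a, b):
    return a * b      # the sign of the effect of a depends on b

kind_additive = classify_factor(additive, "a", (1.0, 2.0), {"b": (-1.0, 1.0)})
kind_interacting = classify_factor(interacting, "a", (1.0, 2.0), {"b": (-1.0, 1.0)})
print(kind_additive, kind_interacting)
```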

2.2.3 Local System Matrix

(depending upon the selected levels, i.e. there exists not only one local system matrix).

The results of the local system consideration are summarized in the local system matrix.

Example:

Factors     A         B         C         ...   Z
Levels      A1  A2    B1  B2    C1  C2    ...   Z1  Z2
Univalent
Bivalent
Unknown

A completed local system matrix gives an indication of the complexity of the localized problem. The simplest case exists when all influence factors are univalent. Then the experimental strategy is obvious. The most difficult case exists when all influence factors are bivalent or when the character of the influence is unknown.

In this case, a simple experimental strategy is impossible without further information. In particular, a reasonable optimization with a small experimental series is not attainable.

2.3 Summary

The statement made in QS-Info 1/90 that "there is no alternative to statistical design of experiments" is only correct if by "statistical design of experiments" one understands the systematic, i.e. system-theoretical, design of experiments, taking statistical points of view into account.

If, however, by "statistical design of experiments" one understands the contents of the textbooks on statistical design of experiments (from Fisher via Box to Taguchi), then it must be assumed that these contents are not, or are only seldom, transferable to real life. Similar reservations apply to commercial software packages. In particular, every polemic against the so-called conventional methods is uncalled for. A consistent application of the system-theoretical approach will often lead to conventional types of investigation being required; in other cases, however, it can lead to the formal approaches being seen as promising. Holding stubbornly to schools of thought is certainly detrimental in the long term.



3. Probability Plot

When one speaks of a normal distribution, one mostly associates this concept with the Gaussian bell-shaped curve. The Gaussian bell-shaped curve is a representation of the probability density function $f(x)$ of the normal distribution:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$$

This function and its graphic representation are printed on the 10 DM bank note, beside the portrait, in honour of the mathematician Carl Friedrich Gauß.

The distribution function of the normal distribution assigns to every value $x$ the probability that a random variable $X$ takes a value between $-\infty$ and $x$. One obtains the distribution function $F(x)$ of the normal distribution by integrating the density function given above:

$$F(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} \left( \frac{v - \mu}{\sigma} \right)^2} \, dv$$

$F(x)$ corresponds to the area under the Gaussian bell-shaped curve up to the value $x$.

The graphical representation of this function is S-shaped. Thus, strictly speaking, one must always think of this curve whenever a normal distribution is concerned.
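The density and the distribution function can be evaluated with the Python standard library. This is only a sketch: the function names `normal_pdf` and `normal_cdf` are chosen for illustration, and the integral is expressed through the error function `math.erf` rather than integrated numerically.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of the normal distribution (the bell-shaped curve).
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Area under the density from minus infinity up to x,
    # written in closed form with the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# The S-shaped course of F(x) for the standard normal distribution.
for x in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(x, round(normal_cdf(x), 4))
```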

If the y-axis in this representation is now distorted such that the S-shaped curve becomes a straight line, a new coordinate system emerges: the probability paper. The x-axis remains unchanged.

Because of this construction, a normal distribution is always portrayed as a straight line on the probability paper.
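The distortion of the y-axis corresponds to applying the inverse of the standard normal distribution function to the cumulative probabilities. The sketch below illustrates this with `statistics.NormalDist` from the Python standard library; the normal distribution with mean 5 and standard deviation 2 is a hypothetical example.

```python
from statistics import NormalDist

std_normal = NormalDist()                    # mu = 0, sigma = 1
population = NormalDist(mu=5.0, sigma=2.0)   # hypothetical normal distribution

xs = [1.0, 3.0, 5.0, 7.0, 9.0]

# Distorted y-coordinate: the standard normal quantile belonging to F(x).
zs = [std_normal.inv_cdf(population.cdf(x)) for x in xs]

# On this scale the points lie on the straight line z = (x - mu) / sigma,
# so the slope between neighbouring points is constant (1 / sigma).
slopes = [(zs[i + 1] - zs[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print(slopes)
```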

One uses this fact in order to check graphically whether a given data set is normally distributed. Provided the number of measured values is large enough, one creates a histogram of these values, thus determining the relative frequencies of the values within the classes of a grouping. If the cumulative relative frequencies found are now plotted over the right class limits on the probability paper and a series of points lying approximately on a straight line is obtained, then it can be inferred that the values of the data set are approximately normally distributed.

Remark:The recording of measurement values or groups of measurement values ordered accordingto the factor levels on probability paper is a component of the SAV-program (see Chapter8.4 Computer aid and [9]).

Hint: In German, two different denotions are used in this context. Wahrscheinlichkeits-netz stands for the coordinate system in which the data are plotted and Wahrscheinlich-keitspapier denotes the form (sheet) with the pre-printed coordinate system (see chapter

3.2), whereas in English textbooks the denotion probability paper is used for both.



3.1 Probability Plot of Small-Size Samples

The size of a sample is often not sufficient for creating a histogram or calculating relative frequencies, so that a representation on probability paper according to the method described above is not possible. There is a way out of this dilemma, which is explained below.

The procedure can be understood easily by means of a computer simulation.

One takes a sample of size $n$, $x_1, x_2, \ldots, x_n$, from a standard normally distributed population ($\mu = 0$, $\sigma = 1$) and arranges the values in order of magnitude:

$$x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}.$$

The number assigned to each sample value in this increasing sequence is called its rank. The smallest value $x_{(1)}$ therefore has rank 1, the greatest value $x_{(n)}$ rank $n$.

Then one determines the value $F_i = F(x_{(i)})$ from the table of the standard normal distribution for every $x_{(i)}$ ($i = 1, 2, \ldots, n$).

If this process is repeated many times, the cumulative frequency $H_i(n)$ for every rank $i$ results as a mean value of the $F_i$ (strictly speaking, the median is used).

For every sample size $6 \le n \le 50$, these cumulative frequencies $H_i(n)$ are given for each rank $i$ in Table 1 (Section 12).
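The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the number of repetitions and the random seed are chosen freely here and do not come from the booklet, and the simulated values will be close to, but not necessarily identical with, the entries of Table 1.

```python
import random
from statistics import NormalDist, median

def simulated_rank_frequencies(n, repetitions=2000, seed=1):
    """Estimate a cumulative frequency for each rank by simulation.

    Hypothetical helper: draws `repetitions` samples of size n from N(0, 1),
    sorts each sample, evaluates the standard normal distribution function
    F at every ordered value and returns the median of F per rank.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mu = 0, sigma = 1
    per_rank = [[] for _ in range(n)]
    for _ in range(repetitions):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for rank_index, x in enumerate(sample):
            per_rank[rank_index].append(std_normal.cdf(x))
    return [median(values) for values in per_rank]

H = simulated_rank_frequencies(10)
print([round(100 * h, 1) for h in H])  # cumulative frequencies in percent
```

Increasing `repetitions` reduces the random scatter of the estimates at the cost of run time.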

We now consider, as an example, a sample of size 10 that is to be tested for normal distribution:

2.1 2.9 2.4 2.5 2.5 2.8 1.9 2.7 2.7 2.3.

The values are sorted according to magnitude:

1.9 2.1 2.3 2.4 2.5 2.5 2.7 2.7 2.8 2.9.

The value 1.9 has rank 1, the value 2.9 rank 10. In the table in the appendix (sample size $n = 10$) one finds the cumulative frequencies (in percent) for every rank $i$:

6.2 15.9 25.5 35.2 45.2 54.8 64.8 74.5 84.1 93.8.

Finally, one chooses a suitable scale for the x-axis of the probability paper covering the values 1.9 to 2.9 and plots the cumulative frequencies against the corresponding sorted sample values. In the example considered above, one therefore marks the following points:

(1.9; 6.2), (2.1; 15.9), (2.3; 25.5), ...

..., (2.7; 74.5), (2.8; 84.1), (2.9; 93.8).

Because these points are well approximated by a straight line fitted by eye, it can be assumed that the sample values are approximately normally distributed.
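As a numerical companion to the fit by eye, the following sketch maps the cumulative frequencies of the example onto the distorted y-axis of the probability paper (the quantiles of the standard normal distribution) and computes the correlation with the sorted sample values. Using a correlation coefficient as a measure of straightness is an assumption of this sketch, not part of the booklet's method; a value close to 1 indicates that the points lie nearly on a straight line.

```python
from statistics import NormalDist, mean

# Sample values and cumulative frequencies (percent) from the example above.
values = sorted([2.1, 2.9, 2.4, 2.5, 2.5, 2.8, 1.9, 2.7, 2.7, 2.3])
cum_freq_percent = [6.2, 15.9, 25.5, 35.2, 45.2, 54.8, 64.8, 74.5, 84.1, 93.8]

# Distorted y-axis of the probability paper: standard normal quantiles.
z = [NormalDist().inv_cdf(p / 100.0) for p in cum_freq_percent]

def correlation(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = correlation(values, z)
print(f"correlation on the probability-paper scale: {r:.3f}")
```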



3.2 Probability Paper

Plotting the points described above is simplified if the so-called probability paper is used. This is a special form on which horizontal lines are drawn at the positions of the cumulative relative frequencies that correspond to the ranks $i$.

The probability paper for the sample size $n = 10$ therefore exhibits horizontal lines for the values:

6.2% 15.9% 25.5% ... 74.5% 84.1% 93.8%.

Hint:

The cumulative frequency $H_i(n)$ for rank $i$ can also be calculated with the following approximation formulas:

$$H_i(n) = \frac{i - 0.5}{n} \quad \text{and} \quad H_i(n) = \frac{i - 0.3}{n + 0.4}.$$

The deviation from the exact value in the table is insignificant.

Approximate values for $n = 10$ (first formula):

5% 15% 25% 35% 45% 55% 65% 75% 85% 95%
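The two approximation formulas, (i - 0.5)/n and (i - 0.3)/(n + 0.4), can be evaluated directly. The short sketch below does this for n = 10; the function names are chosen here for illustration only.

```python
def h_simple(i, n):
    """First approximation: (i - 0.5) / n."""
    return (i - 0.5) / n

def h_refined(i, n):
    """Second approximation: (i - 0.3) / (n + 0.4)."""
    return (i - 0.3) / (n + 0.4)

n = 10
simple = [round(100 * h_simple(i, n), 1) for i in range(1, n + 1)]
refined = [round(100 * h_refined(i, n), 1) for i in range(1, n + 1)]

print("simple :", simple)   # 5.0, 15.0, ..., 95.0 percent
print("refined:", refined)
```

The first list reproduces the approximate values quoted above; the second can be compared rank by rank with the tabulated values for n = 10.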



4. Comparison of Samples Means

4.1 t Test

The t test is a statistical method for deciding whether the mean values of two samples are significantly different. To clarify how the t test works, we perform the following thought experiment:

We take two samples, each of size $n$, from a normally distributed population $N(\mu, \sigma)$, calculate the mean values $\bar{y}_1$ and $\bar{y}_2$ as well as the standard deviations $s_1$ and $s_2$ (or the variances $s_1^2$ and $s_2^2$), and finally compute the value

$$t = \sqrt{n} \cdot \frac{|\bar{y}_1 - \bar{y}_2|}{\sqrt{s_1^2 + s_2^2}}.$$

$t$ can take values between 0 and $+\infty$. If we repeat this process very often, we expect that mainly values near zero occur and that very large values are found only rarely.

This thought experiment was performed by computer simulation. For $n = 10$ and 3,000 sample pairs ($t$ values), the result was the histogram shown in Fig. 4.1.

Fig. 4.1



If one lets the number of samples approach infinity and, at the same time, the class width approach zero, the histogram approaches more and more closely the line that represents the density function of the t distribution.

The upper limit of the 99% random variation range (percentage point) is, in this example, $t_{18;\,0.99} = 2.88$, i.e. values greater than 2.88 occur randomly in only 1% of all cases. Percentage points of the t distribution are tabulated for different error probabilities as a function of the number of degrees of freedom $f = 2 \cdot (n - 1)$ (Table 2). The t test is based on the relationship described above.
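The thought experiment can be repeated with a few lines of Python. The sketch below is an illustration, not the simulation program used for Fig. 4.1: the function name and the seed are assumptions, and the share of values above 2.88 will only be roughly 1% because of random scatter.

```python
import random
from statistics import mean, variance

def simulate_t(n=10, pairs=3000, seed=42):
    """Draw `pairs` sample pairs of size n from N(0, 1) and return the t values."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(pairs):
        y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # t = sqrt(n) * |mean difference| / sqrt(s1^2 + s2^2)
        t = n ** 0.5 * abs(mean(y1) - mean(y2)) / (variance(y1) + variance(y2)) ** 0.5
        t_values.append(t)
    return t_values

t_values = simulate_t()
share_above = sum(t > 2.88 for t in t_values) / len(t_values)
print(f"share of t values above 2.88: {share_above:.3%}")
```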

A decision is to be made as to whether the arithmetic mean values of two existing series of measurements (each of size $n$) can belong to one and the same population or not. As the so-called null hypothesis, it is therefore assumed that the mean values of the respective populations are equal.

The test statistic is then calculated from the two mean values $\bar{y}_1$ and $\bar{y}_2$ and the variances $s_1^2$ and $s_2^2$:

$$t = \sqrt{n} \cdot \frac{|\bar{y}_1 - \bar{y}_2|}{\sqrt{s_1^2 + s_2^2}} \quad \text{for } n_1 = n_2 = n.$$

If the result is $t > t_{2(n-1);\,0.99}$, i.e. $t$ lies outside the 99% random variation range, the null hypothesis is rejected.

Hint: The expression for the test statistic $t$ is applicable in this simplest form only when both the variances of the populations and the sample sizes are assumed to be equal ($\sigma_1^2 = \sigma_2^2$ and $n_1 = n_2 = n$). The prerequisite of equal variances can be tested with the help of an F test (see Section 5).

The t test, in the form presented here, tests the null hypothesis $\mu_1 = \mu_2$ against the alternative $\mu_1 \ne \mu_2$. The question is therefore two-sided. For this reason, the expression for $t$ contains the absolute value of the difference of the means.

$t$ can hence only assume values $\ge 0$, which results in the distribution depicted in Figure 4.1.

Table 2 in Section 12 gives the 95%, 99%, and 99.9% percentage points of the t distribution for the two-sided question. They correspond to the one-sided percentage points 97.5%, 99.5% and 99.95%.
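A minimal sketch of the test for two series of equal size follows. The measurement values are hypothetical and serve only as an illustration. The comparison value 3.355 is the standard two-sided 99% percentage point of the t distribution for f = 2(n - 1) = 8 degrees of freedom; it is taken from general t tables and should be checked against Table 2.

```python
from statistics import mean, variance

def t_statistic(y1, y2):
    """Test statistic t = sqrt(n) * |mean(y1) - mean(y2)| / sqrt(s1^2 + s2^2).

    Valid only in the simple form discussed above (equal sample sizes,
    equal population variances assumed).
    """
    if len(y1) != len(y2):
        raise ValueError("this simple form assumes equal sample sizes")
    n = len(y1)
    return n ** 0.5 * abs(mean(y1) - mean(y2)) / (variance(y1) + variance(y2)) ** 0.5

# Hypothetical measurement series, n = 5 each.
series_1 = [10.1, 9.9, 10.0, 10.2, 9.8]
series_2 = [10.6, 10.4, 10.5, 10.7, 10.3]

t = t_statistic(series_1, series_2)
T_TABLE_F8_99 = 3.355  # two-sided 99% point for f = 8 (standard table value)
reject_null_hypothesis = t > T_TABLE_F8_99
print(f"t = {t:.3f}, reject null hypothesis: {reject_null_hypothesis}")
```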



4.2 Minimum Sample Size

Section 4.1 explained how one can decide, by means of a t test, whether or not the mean values of two samples are significantly different.

This decision is frequently the goal of experiments in which the change of a target characteristic as a function of two system states or two settings of an influence factor is to be determined. With a view to the system optimization pursued, the subsequent intention is to choose the better of the two selected settings.

This applies especially to experiments which use orthogonal arrays, in which several influence factors are varied concurrently on two levels (see Chapter 7).

The factorial analysis of variance (see 8.2) carried out when evaluating such experiments is, in principle, nothing other than a comparison of the mean values of all experimental results obtained for the two settings (levels) of an influence factor, taking the experimental noise into account.

In the preparatory phase of such experimental investigations, the experimenter often asks which minimum mean value difference is actually of interest in view of the target (system optimization, production simplification, cost reduction), and which minimum sample size $n$ must be chosen so that this minimum mean value difference, if it actually exists, is found to be significant in the evaluation of the experiment.

From the expression for the test statistic $t$ (see Section 4.1)

$$t = \sqrt{n} \cdot \frac{|\bar{y}_1 - \bar{y}_2|}{\sqrt{s_1^2 + s_2^2}}$$

it is apparent that, for a significant test result, $n$ must be the greater, the smaller the mean value difference $|\bar{y}_1 - \bar{y}_2|$ is and the greater the variances $s_1^2$ and $s_2^2$ of the two series to be compared are. Note that the table value $t_{Table}$ becomes smaller as the number of degrees of freedom $f = 2 \cdot (n - 1)$ increases.

Graphically, a small difference of the mean values combined with a large variance of the distributions means that the two groups of values are indistinguishable or hardly distinguishable in a plot of the two measurement series.

Based on the previous discussion, it is possible to estimate the minimum sample size $n$ roughly by specifying the mean value difference as a multiple of the mean standard deviation $\sqrt{(s_1^2 + s_2^2)/2}$ and comparing, for different $n$, the calculated test statistic $t$ with $t_{Table}$ (observe the degrees of freedom and the significance level!).
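The trial method can be sketched as follows. If the mean value difference is written as d times the mean standard deviation, the test statistic simplifies to t = d * sqrt(n / 2), which is then compared with the table value for f = 2(n - 1). The table values below are the standard two-sided 99% percentage points of the t distribution, hard-coded here as an assumption in place of Table 2; the function name is hypothetical. Note that this rough estimate only asks whether an observed difference of exactly d would be significant; it does not account for the type II error.

```python
# Two-sided 99% percentage points of the t distribution, keyed by f = 2 * (n - 1).
# Standard table values, assumed here in place of the booklet's Table 2.
T_TABLE_99 = {
    2: 9.925, 4: 4.604, 6: 3.707, 8: 3.355, 10: 3.169,
    12: 3.055, 14: 2.977, 16: 2.921, 18: 2.878, 20: 2.845,
    22: 2.819, 24: 2.797, 26: 2.779, 28: 2.763, 30: 2.750,
}

def smallest_significant_n(d):
    """Smallest n (within the table) for which t = d * sqrt(n / 2) exceeds t_Table.

    d is the mean value difference in units of the mean standard deviation.
    Returns None if no n covered by the table is sufficient.
    """
    for f in sorted(T_TABLE_99):
        n = f // 2 + 1            # because f = 2 * (n - 1)
        t = d * (n / 2) ** 0.5
        if t > T_TABLE_99[f]:
            return n
    return None

for d in (2.0, 1.0, 0.5):
    print(f"difference of {d} standard deviations -> n = {smallest_significant_n(d)}")
```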



Besides this trial method, however, there is an exact derivation of the minimum sample size from the statistical point of view, which we only sketch roughly at this point (derivation in [1] and [7]).

When comparing the mean values of two series of measurements and making the corresponding test decision, two types of error are possible.

In the first case, both series of measurements originate from the same population, i.e. there is no real difference. If one decides here, on the basis of a t test, that a difference between the two mean values exists, then an error of the first kind ($\alpha$) is made. Its probability corresponds to the significance level of the t test (for example $\alpha = 1\%$).

If, in the second case, a difference of the mean values actually exists, i.e. the measured series originate from two different populations, then the test will not indicate this with absolute certainty. The test result can indicate by chance that this difference does not exist. In this case one speaks of an error of the second kind ($\beta$).

For the person performing the experiment, both of these error types are unpleasant: either because further expensive investigations, or even changes in the production process, may be initiated due to the apparently significant effect of an influence factor (error of the first kind; type I error), or because the actually significant effect is not identified and the chance to make possible process improvements is missed (error of the second kind; type II error).

The minimum sample size $n$ required to identify a real mean value difference depends both upon the distance $\frac{\mu_2 - \mu_1}{\sigma} = D$ of the mean values, given in units of the standard deviation in accordance with the plausibility consideration above, and upon the error probabilities $\alpha$ and $\beta$.

$$n = 2 \cdot \left( \frac{u_\alpha + u_\beta}{D} \right)^2$$

In the concrete case of comparing two series of measurements, the mean values $\mu_1$ and $\mu_2$ as well as the standard deviation $\sigma$ of the population (and consequently also $D$) are not known. They are estimated by the empirical values $\bar{y}_1$, $\bar{y}_2$ and $s$. For this reason, the t distribution must be taken as a basis when calculating $n$ according to the given formula.

Accordingly, $u_\alpha$ and $u_\beta$ are the abscissa values $u$ at which the t distribution assumes the values $\alpha$ (two-sided) or $\beta$ (one-sided).

Smaller error probabilities, i.e. smaller type I ($\alpha$) and type II ($\beta$) errors, mean that the two distributions to be compared, and thus also the distributions of the mean values, may overlap only marginally. For a given mean value distance $D$, the sample size $n$ must therefore be chosen adequately large.


[Figure panels: Stronger effect, Medium effect, Weaker effect]

5. F Test

The F test is a statistical method for deciding whether the variances of two samples are significantly different.

How the test works can be explained, just as in the case of the t test, using the result of a computer simulation.

We take two samples of sizes $n_1$ and $n_2$ from a normally distributed population $N(\mu, \sigma)$, calculate the sample variances $s_1^2$ and $s_2^2$, and from these finally calculate the quantity

$$F = \frac{s_1^2}{s_2^2}.$$

$F$ can take values between 0 and $+\infty$. It is plausible that, when this procedure is repeated frequently, values near zero and very large values result only very rarely.

The results of a computer simulation, in which the $F$ values were determined for $N = 3{,}000$ sample pairs with sample sizes $n_1 = n_2 = n = 9$, are shown as a histogram in the following figure.

Figure 5.1



If one lets the number of samples approach infinity and, at the same time, the class width approach zero, the histogram approaches the line in Fig. 5.1 (density function of the F distribution).

The shape of the histogram depends upon the sample sizes $n_1$ and $n_2$ of the investigated sample pairs; the shape of the density function of the F distribution correspondingly depends upon the degrees of freedom $f_1 = n_1 - 1$ and $f_2 = n_2 - 1$.

The upper limit of the 99% random variation range (percentage point) in the calculated example is $F_{8;\,8;\,0.99} = 6.03$, i.e. only in 1% of all cases (error probability) does $s_1^2 / s_2^2 \ge 6.03$ occur randomly.

The percentage points of the F distribution are tabulated in the appendix for different error probabilities as a function of the degrees of freedom $f_1$ and $f_2$.
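The simulation behind Fig. 5.1 can be imitated with the sketch below. It is an illustration only; the function name and the seed are assumptions, and the share of F values above 6.03 will scatter around 1%.

```python
import random
from statistics import variance

def simulate_f(n1=9, n2=9, pairs=3000, seed=7):
    """Draw sample pairs from N(0, 1) and return the variance ratios F = s1^2 / s2^2."""
    rng = random.Random(seed)
    f_values = []
    for _ in range(pairs):
        s1_sq = variance([rng.gauss(0.0, 1.0) for _ in range(n1)])
        s2_sq = variance([rng.gauss(0.0, 1.0) for _ in range(n2)])
        f_values.append(s1_sq / s2_sq)
    return f_values

f_values = simulate_f()
share_above = sum(f >= 6.03 for f in f_values) / len(f_values)
print(f"share of F values of at least 6.03: {share_above:.3%}")
```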

The relationship described above makes the F test approach understandable.

A decision is to be made as to whether or not two series of measurements, of sizes $n_1$ and $n_2$, originate from two normally distributed populations with the same variance (the mean values do not need to be known).

As the null hypothesis, it is assumed that the variances of the respective populations are equal: $\sigma_1^2 = \sigma_2^2$.

The test statistic $F = \frac{s_1^2}{s_2^2}$ is then calculated from the variances $s_1^2$ and $s_2^2$ of the two measurement series and compared with the percentage point of the F distribution. If the result is $F > F_{n_1 - 1;\, n_2 - 1;\, 0.99}$, i.e. $F$ lies outside the 99% random variation range, the null hypothesis is rejected.

Remark:

The alternative hypothesis is $\sigma_1^2 > \sigma_2^2$; the question is one-sided.

If, as a matter of principle, the greater of the two variances $s_1^2$ and $s_2^2$ is written above the fraction line, then $F$ can only assume values greater than 1; the question is then two-sided. If an error probability of $\alpha = 1\%$ is chosen, the 99.5% percentage point must be used.



6. Analysis of Variance (ANOVA)

With the help of the t test (Section 4.1) one determines whether the mean values of two series of measurements are significantly different. The series of measurements to be compared can be regarded formally as experimental results for the two levels 1 (e.g. material A) and 2 (material B) of a single influence factor (material).

If one expands the one-factor experiment to more than two levels (in general: $k$ levels), it is no longer possible to compare the mean values using the t test. In this case, the evaluation can be carried out by means of the analysis of variance.

If the factor A has no influence upon the measurement results, then all individual results $y_{ij}$ can be regarded as originating from the same population. The $y_{ij}$, and thus also the mean values $\bar{y}_i$, are then subject only to random deviations (experimental noise) from the common mean value $\mu$.

In the other case, in which the factor A has a significant influence upon the measurement result, the mean values $\mu_1, \ldots, \mu_k$ of the distributions belonging to the levels $A_1, \ldots, A_k$ of the factor A are different.

Within the analysis of variance, one presupposes $k$ independent, normally distributed populations with the same variance and formulates the null hypothesis:

All measured values originate from populations with the same mean value $\mu_1 = \mu_2 = \ldots = \mu_k = \mu$ (Remark: since identical variances were a prerequisite, the null hypothesis means that all measured values originate from one and the same population).

One therefore calculates the mean variance within the experimental rows (levels of A)

$$s_2^2 = \overline{s_y^2} = \frac{1}{k} \sum_{i=1}^{k} s_i^2$$

as well as the variance between the experimental rows (levels of A), $s_1^2 = s_{\bar{y}}^2$.

$\overline{s_y^2}$ is a measure of the experimental noise. $s_{\bar{y}}^2$ is the variance of the mean values $\bar{y}_i$.

If the null hypothesis is correct, both quantities yield estimates of the variance of the underlying population:

$$\hat{\sigma}_1^2 = n \cdot s_{\bar{y}}^2, \qquad \hat{\sigma}_2^2 = \overline{s_y^2}.$$

The factor $n$ has to be taken into account because of the relationship

$$\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}}.$$



Finally, one conducts an F test with the test statistic

$$F = \frac{n \cdot s_{\bar{y}}^2}{\overline{s_y^2}}$$

(comparison of the two estimates), and rejects the null hypothesis formulated above if

$$F > F_{k-1;\,(n-1) \cdot k;\,0.99}$$

(percentage points for $F$ in the appendix).

Rejection of the null hypothesis means: a significant difference exists between the mean values $\bar{y}_i$ of the measurement results for the levels of factor A, or: factor A has a significant influence upon the measurement result.
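The calculation can be sketched for a small, hypothetical data set with k = 3 levels and n = 3 values per level. The function name and the data are illustrative assumptions. The comparison value 10.92 is the standard 99% percentage point of the F distribution for 2 and 6 degrees of freedom, taken from general F tables rather than from the appendix.

```python
from statistics import mean, variance

def one_way_anova_f(rows):
    """F = n * (variance of the row means) / (mean of the row variances).

    Assumes every row (factor level) holds the same number n of values.
    """
    n = len(rows[0])
    if any(len(row) != n for row in rows):
        raise ValueError("all rows must have the same number of values")
    row_means = [mean(row) for row in rows]
    mean_within_variance = mean([variance(row) for row in rows])  # experimental noise
    variance_of_means = variance(row_means)
    return n * variance_of_means / mean_within_variance

# Hypothetical results for three levels of a factor A.
rows = [
    [9.0, 10.0, 11.0],
    [11.0, 12.0, 13.0],
    [13.0, 14.0, 15.0],
]

F = one_way_anova_f(rows)
F_TABLE_2_6_99 = 10.92  # standard table value for f1 = k - 1 = 2, f2 = (n - 1) * k = 6
print(f"F = {F:.2f}, reject null hypothesis: {F > F_TABLE_2_6_99}")
```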

Figure 6.1

Figure 6.2



Figures 6.1 and 6.2 illustrate the significance of this fact. Along the diagonals, the density functions of normal distributions with equal variance are shown. In the corners of the figures, the density functions of the mixture of distributions (top left) and of the distribution of the mean values (bottom right) are shown.

The distributions in Figure 6.1 are subject only to small fluctuations of the mean value; the mixture of distributions is nearly normally distributed.

The variances of the distribution of mean values and of the original distributions hardly differ, so that an F test does not reject the null hypothesis (identical mean values). In comparison, the mean values of the seven distributions in Figure 6.2 show greater fluctuations; the variance of the mean-value distribution is substantially (significantly) greater than that of the single distributions.

Accordingly, the null hypothesis, i.e. the assumption of identical mean values, is rejected in this case within the scope of an analysis of variance.

6.1 Deriving the Test Statistic

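The term analysis of variance is based on the decomposition of the variation of all measured values into two parts, random variation (experimental noise) and systematic deviation of the mean values, in line with the formal approach presented above.

This decomposition is described as follows. When $k$ represents the number of rows and $n$ the number of measured values (experiments) per row, then the overall variance of all $n \cdot k$ measured values is given by

$$s^2 = \frac{1}{n \cdot k - 1} \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y})^2.$$

The quantity $Q = (n \cdot k - 1) \cdot s^2$ is called the sum of squares (SS).

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y})^2$$

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y}_j + \bar{y}_j - \bar{y})^2 \quad \text{(expansion with zero)}$$

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{k} \left[ (y_{ij} - \bar{y}_j)^2 + 2 \cdot (y_{ij} - \bar{y}_j) \cdot (\bar{y}_j - \bar{y}) + (\bar{y}_j - \bar{y})^2 \right]$$

We first consider the middle term:

$$\sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y}_j) \cdot (\bar{y}_j - \bar{y}) = \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y}_j) \cdot \bar{y}_j - \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y}_j) \cdot \bar{y}$$

$$= \sum_{j=1}^{k} \sum_{i=1}^{n} \bar{y}_j \cdot (y_{ij} - \bar{y}_j) - \bar{y} \cdot \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} + \bar{y} \cdot \sum_{i=1}^{n} \sum_{j=1}^{k} \bar{y}_j$$

$$= \sum_{j=1}^{k} \sum_{i=1}^{n} \bar{y}_j \cdot (y_{ij} - \bar{y}_j) - n \cdot k \cdot \bar{y}^2 + n \cdot k \cdot \bar{y}^2$$

$$= \sum_{j=1}^{k} \bar{y}_j \cdot (n \cdot \bar{y}_j - n \cdot \bar{y}_j) = 0$$

Therefore:

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - \bar{y}_j)^2 + \sum_{i=1}^{n} \sum_{j=1}^{k} (\bar{y}_j - \bar{y})^2$$

$$Q = \sum_{j=1}^{k} (n - 1) \cdot s_j^2 + \sum_{i=1}^{n} (k - 1) \cdot s_{\bar{y}}^2$$

$$(n \cdot k - 1) \cdot s^2 = k \cdot (n - 1) \cdot \overline{s_y^2} + n \cdot (k - 1) \cdot s_{\bar{y}}^2$$

$$Q = Q_1 + Q_2$$

Overall variation = experimental noise + variation of mean values

Degrees of freedom of $Q_1$: $f_1 = k \cdot (n - 1)$
Degrees of freedom of $Q_2$: $f_2 = k - 1$
Degrees of freedom of $Q$: $f = n \cdot k - 1$

Equation of the number of degrees of freedom:

$$f = f_2 + f_1$$

$$n \cdot k - 1 = k - 1 + k \cdot (n - 1)$$

$$n \cdot k - 1 = n \cdot k - 1$$

Test statistic:

$$F = \frac{Q_2 / f_2}{Q_1 / f_1} = \frac{\dfrac{n \cdot (k - 1) \cdot s_{\bar{y}}^2}{k - 1}}{\dfrac{k \cdot (n - 1) \cdot \overline{s_y^2}}{k \cdot (n - 1)}} = \frac{n \cdot s_{\bar{y}}^2}{\overline{s_y^2}}$$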
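The decomposition Q = Q1 + Q2 and the resulting test statistic can be checked numerically. The sketch below uses a small hypothetical data set with k = 3 rows and n = 3 values per row; variable names are chosen for illustration.

```python
from statistics import mean

# Hypothetical data: k rows (factor levels) with n measured values each.
rows = [
    [9.0, 10.0, 11.0],
    [11.0, 12.0, 13.0],
    [13.0, 14.0, 15.0],
]
k, n = len(rows), len(rows[0])

all_values = [value for row in rows for value in row]
grand_mean = mean(all_values)
row_means = [mean(row) for row in rows]

# Overall sum of squares Q.
q_total = sum((value - grand_mean) ** 2 for value in all_values)
# Q1: variation within the rows (experimental noise).
q1 = sum((value - m) ** 2 for row, m in zip(rows, row_means) for value in row)
# Q2: variation of the row means around the overall mean.
q2 = n * sum((m - grand_mean) ** 2 for m in row_means)

f1 = k * (n - 1)   # degrees of freedom of Q1
f2 = k - 1         # degrees of freedom of Q2
F = (q2 / f2) / (q1 / f1)

print(f"Q = {q_total}, Q1 = {q1}, Q2 = {q2}, F = {F}")
```

For balanced data such as these, the value of F agrees with n times the variance of the row means divided by the mean row variance.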



6.2 Equality Test of Several Variances (According to Levene)

With the one-way analysis of variance, it is investigated whether a factor A has a significant influence upon the measurement result. One thus determines whether the mean values $\mu_1, \ldots, \mu_k$ of the measurement results belonging to the levels $A_1, \ldots, A_k$ are significantly different.

Frequently, the aim of the experiments in this case is to maximise or to minimise a target quantity.

In connection with the investigation of disturbance-insensitive (robust) designs, it can be of interest to find parameter settings at which the experimental results exhibit as little variation (variance) as possible.

For this reason, it is sensible to check first whether the variances of the results in the individual experimental rows are significantly different.

Experiment No.   Results                              Mean          Variance
1                $x_{11}, x_{12}, \ldots, x_{1n}$     $\bar{x}_1$   $s_1^2$
2                $x_{21}, x_{22}, \ldots, x_{2n}$     $\bar{x}_2$   $s_2^2$
...
k                $x_{k1}, x_{k2}, \ldots, x_{kn}$     $\bar{x}_k$   $s_k^2$

Deviating from our notation to date, we designate the measured values with $x$ and calculate the row mean values $\bar{x}_i$ as well as the variances within the rows $s_i^2$.

To test the equality of these variances $s_i^2$, Levene proposes the following method:

0. Formulate the null hypothesis: All measurement results originate from populations with equal variance: $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_k^2$.

1. Calculate the absolute deviations of the measurement results $x_{ij}$ from the mean values $\bar{x}_i$. This corresponds to a transformation according to the equation

$$y_{ij} = |x_{ij} - \bar{x}_i|.$$

The transformed values $y_{ij}$ are entered in the evaluation scheme. Further calculation is done exclusively with the transformed values $y_{ij}$.

Experiment No.   Results                              Mean          Variance
1                $y_{11}, y_{12}, \ldots, y_{1n}$     $\bar{y}_1$   $s_1^2$
2                $y_{21}, y_{22}, \ldots, y_{2n}$     $\bar{y}_2$   $s_2^2$
...
k                $y_{k1}, y_{k2}, \ldots, y_{kn}$     $\bar{y}_k$   $s_k^2$

2. Calculate the mean values $\bar{y}_i$ and the variances $s_i^2$.

3. Calculate the mean value $\overline{s_y^2}$ of the variances.

4. Calculate the variance $s_{\bar{y}}^2$ of the mean values $\bar{y}_i$.

5. Conduct an F test with the test statistic

$$F = \frac{n \cdot s_{\bar{y}}^2}{\overline{s_y^2}}$$

Degrees of freedom: $f_1 = k - 1$, $f_2 = (n - 1) \cdot k$.

If $F$, for example, is greater than the percentage point $F_{k-1;\,(n-1) \cdot k;\,0.99}$, then the null hypothesis is rejected with an error probability of 1%.
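Steps 1 to 5 can be sketched as follows. The measurement values are hypothetical and the function name is an assumption; the resulting F would have to be compared with the percentage point for f1 = k - 1 and f2 = (n - 1) * k degrees of freedom from the appendix table.

```python
from statistics import mean, variance

def levene_f(rows):
    """Levene's procedure as described in steps 1 to 5 above.

    Step 1: transform every value to its absolute deviation from the row mean.
    Steps 2 to 5: one-way analysis of variance on the transformed values.
    Assumes the same number n of values in every row.
    """
    n = len(rows[0])
    if any(len(row) != n for row in rows):
        raise ValueError("all rows must have the same number of values")
    transformed = []
    for row in rows:
        row_mean = mean(row)
        transformed.append([abs(x - row_mean) for x in row])
    row_means = [mean(row) for row in transformed]                    # step 2
    mean_of_variances = mean([variance(row) for row in transformed])  # steps 2 and 3
    variance_of_means = variance(row_means)                           # step 4
    return n * variance_of_means / mean_of_variances                  # step 5

# Hypothetical results of k = 3 experimental rows with n = 3 values each.
rows = [
    [1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0],
    [-1.0, 2.0, 5.0],
]
F = levene_f(rows)
print(f"F = {F:.3f} with f1 = 2 and f2 = 6 degrees of freedom")
```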



7. Design of Experiments with Orthogonal Arrays and Evaluating such Experiments

In this section, two simple examples are used to show how orthogonal arrays are applied:

Example 1: One-factor-at-a-time method

The change in length of an alloy is to be determined by experiment. Two experiments are performed.

1. Experiment: length at $T_1 = 25$ °C: $L_1(25\ \text{°C}) = 100.04$ cm
2. Experiment: length at $T_2 = 100$ °C: $L_2(100\ \text{°C}) = 100.16$ cm

One assumes that a linear relationship exists between expansion and temperature and therefore wants to calculate the equation of the straight line in order to determine arbitrary intermediate values.

Equation of the straight line: $L = A_0 + A_1 \cdot T$

Through a coordinate transformation, as represented schematically in Figure 7.0.1 by the second x-axis, the pair of values ($T_1$, $T_2$) is formally transformed into ($-1$, $+1$).

Figure 7.0.1



The transformation equation is

$$x = \frac{T - \frac{T_2 + T_1}{2}}{\frac{T_2 - T_1}{2}}.$$

Remark:

Through a simple rearrangement, this equation can be written in the form given in 7.1:

$$x = \frac{2 \cdot T - (T_2 + T_1)}{T_2 - T_1}.$$

Substituting the values $T_1 = 25$ °C and $T_2 = 100$ °C gives:

$$x = \frac{T - 62.5}{37.5}.$$

For $T = T_2$ follows: $x = +1$.
For $T = T_1$ follows: $x = -1$.

In the transformed coordinate system, the equation of the straight line is: $L = a_0 + a_1 \cdot x$.

From this follows for $x = +1$: $100.16 = a_0 + a_1$,
for $x = -1$: $100.04 = a_0 - a_1$.

At this point the reason for the coordinate transformation becomes clear: the coefficients $a_0$ and $a_1$ are easy to calculate by adding or subtracting the two equations:

$$a_0 = \frac{100.16 + 100.04}{2} = 100.1, \qquad a_1 = \frac{100.16 - 100.04}{2} = 0.06.$$

The coefficient $a_0$ is the mean value of both lengths: $a_0 = \frac{L_2 + L_1}{2}$.

The coefficient $a_1$ is half the effect (see Figure 7.0.1): $a_1 = \frac{L_2 - L_1}{2}$.

Thus, in the transformed system the equation of the straight line is:

$$L = \frac{L_2 + L_1}{2} + \frac{L_2 - L_1}{2} \cdot x, \qquad L = 100.1 + 0.06 \cdot x.$$

The equation of the straight line in the original system is found by reverse transformation:

$$L = 100.1 + 0.06 \cdot \frac{T - 62.5}{37.5}, \qquad L = 100 + 0.0016 \cdot T.$$
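The calculation of Example 1 can be retraced with the short sketch below, which uses the measured values from the text. The function and variable names are chosen here for illustration.

```python
# Measured values from Example 1.
T1, T2 = 25.0, 100.0        # temperatures in degrees Celsius
L1, L2 = 100.04, 100.16     # lengths in cm

def to_coded(T):
    """Coordinate transformation: maps T1 to -1 and T2 to +1."""
    return (T - (T2 + T1) / 2) / ((T2 - T1) / 2)

# Coefficients in the transformed system.
a0 = (L2 + L1) / 2          # mean value of both lengths
a1 = (L2 - L1) / 2          # half the effect

def length(T):
    """Straight line L = a0 + a1 * x, evaluated via the coded variable x."""
    return a0 + a1 * to_coded(T)

# Reverse transformation to L = intercept + slope * T.
slope = a1 / ((T2 - T1) / 2)
intercept = a0 - slope * (T2 + T1) / 2

print(f"a0 = {a0:.2f}, a1 = {a1:.2f}")
print(f"L = {intercept:.2f} + {slope:.4f} * T")
```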



Example 2: Two-Factor Design

This example is intended to clarify the mathematical procedure followed when evaluating experiments with orthogonal arrays, using a known and analytically exact physical fact: Ohm's law.

We put ourselves in the position of an experimenter who does not know the relationship between voltage, current and resistance and wants to investigate it with the help of a simple experiment.

We assume that he has conducted four individual experiments according to Figure 7.0.2, and we ignore experimental repetitions and measuring errors.

$R_1 = 20\ \Omega$, $R_2 = 60\ \Omega$, $I_1 = 4$ A, $I_2 = 12$ A

Searched: $U = f(R, I)$

Transformation:

$$x_1 = \frac{R - \frac{R_2 + R_1}{2}}{\frac{R_2 - R_1}{2}} = \frac{R - 40}{20}, \qquad x_2 = \frac{I - \frac{I_2 + I_1}{2}}{\frac{I_2 - I_1}{2}} = \frac{I - 8}{4}$$

Figure 7.0.2



Multilinear formulation of the solution:

$$U = a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2$$

1. $x_1 = -1$, $x_2 = -1$: $a_0 - a_1 - a_2 + a_{12} = 80$

2. $x_1 = +1$, $x_2 = -1$: $a_0 + a_1 - a_2 - a_{12} = 240$

3. $x_1 = -1$, $x_2 = +1$: $a_0 - a_1 + a_2 - a_{12} = 240$

4. $x_1 = +1$, $x_2 = +1$: $a_0 + a_1 + a_2 + a_{12} = 720$

On the right-hand side are the voltages $U$ determined for the individual experimental combinations.

$$a_0 = \frac{80 + 240 + 240 + 720}{4} = 320$$

$$a_1 = \frac{720 + 240}{4} - \frac{240 + 80}{4} = 160$$

$$a_2 = \frac{720 + 240}{4} - \frac{240 + 80}{4} = 160$$

$$a_{12} = \frac{720 + 80}{4} - \frac{240 + 240}{4} = 80$$

Substituted into the formulation of the solution, one obtains: $U = 320 + 160 \cdot x_1 + 160 \cdot x_2 + 80 \cdot x_1 x_2$.

Reverse transformation:

$$U = 320 + 160 \cdot \frac{R - 40}{20} + 160 \cdot \frac{I - 8}{4} + 80 \cdot \frac{R - 40}{20} \cdot \frac{I - 8}{4}$$

$$U = R \cdot I$$

Remark:

In this example, the right solution (Ohm's law) is bound to come out because the multilinear form $U = a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2$ was exactly the right formulation. A more complex functional relationship with quotients or exponentials of the influence factors would be described by this formulation only approximately, or not at all (see 7.3).
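The evaluation of Example 2 can be reproduced with the sketch below. The function `measure` merely stands in for the four experiments (it returns R times I, as the measured voltages in the example do); all names are chosen for illustration.

```python
R_LEVELS = (20.0, 60.0)   # resistance in ohms: lower and upper level
I_LEVELS = (4.0, 12.0)    # current in amperes: lower and upper level

def measure(resistance, current):
    """Stand-in for the experiment; yields the voltages 80, 240, 240, 720."""
    return resistance * current

def decode(x, levels):
    """Map a coded level (-1 or +1) back to the physical setting."""
    low, high = levels
    return (high + low) / 2 + x * (high - low) / 2

# Coded settings (x1, x2) of the four experiments, in the order used in the text.
runs = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]
U = [measure(decode(x1, R_LEVELS), decode(x2, I_LEVELS)) for x1, x2 in runs]

# Coefficients of the multilinear form: signed sums of the results divided by 4.
a0 = sum(U) / 4
a1 = sum(x1 * u for (x1, _), u in zip(runs, U)) / 4
a2 = sum(x2 * u for (_, x2), u in zip(runs, U)) / 4
a12 = sum(x1 * x2 * u for (x1, x2), u in zip(runs, U)) / 4

def model(resistance, current):
    """Multilinear model evaluated via the coded variables x1 and x2."""
    x1 = (resistance - 40.0) / 20.0
    x2 = (current - 8.0) / 4.0
    return a0 + a1 * x1 + a2 * x2 + a12 * x1 * x2

print(a0, a1, a2, a12)      # 320.0 160.0 160.0 80.0
print(model(30.0, 5.0))     # 150.0, equal to 30 * 5
```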



Generalization:

For two factors and two levels, the equation of the multilinear form in the transformed system is:

$$y = a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2.$$

The coefficients can easily be determined with the following matrix. This matrix is called an orthogonal arrangement or an orthogonal array (see 7.4). Put simply, the term orthogonality means in this connection that both levels (-) and (+) appear equally often in each column (see also the general formulation scheme in 7.4.1). Orthogonality is explained in [1] through mathematical orthogonality conditions.

I    $x_1$   $x_2$   $x_1 x_2$   $y$
+    -       -       +           $y_1$
+    +       -       -           $y_2$
+    -       +       -           $y_3$
+    +       +       +           $y_4$

$$a_0 = \frac{y_1 + y_2 + y_3 + y_4}{4}$$

$$a_1 = \frac{(y_2 + y_4) - (y_3 + y_1)}{4}$$

$$a_2 = \frac{(y_3 + y_4) - (y_1 + y_2)}{4}$$

$$a_{12} = \frac{(y_1 + y_4) - (y_2 + y_3)}{4}$$

The coefficient $a_0$ is the mean value of all measurement results. The coefficient $a_1$ is half the mean effect of a change of $x_1$ from -1 to +1.

a Effect x Effect x

1

2 21

2

1

2= = + =

( ) ( )



The representation in Figure 7.1.3 shows the contours of a hill (see Figure 7.1.2), as found on topographic maps. In the example shown, a step from one line to the neighbouring line corresponds to a height difference of 10 m.

Closely neighbouring contours represent a steep ascent in the direction perpendicular to the contours. If one remains on a closed contour, then one moves, pictorially speaking, at a constant height around the hill.

We now leave the picture of a hill and consider, instead of the height, a general function $y$ that depends upon the parameters $A$ and $B$: $y = f(A, B)$.

Figure 7.1.2

Figure 7.1.3



$y$ is a target characteristic whose value is determined by the setting of the factors $A$ and $B$. Each setting ($A$, $B$) then corresponds to a point in the A-B plane, and this in turn to a value $y = f(A, B)$.

One finds, for instance, the following results:

A     B     y
6     12    43
12    12    62
6     20    78
12    20    113

The four points (A, B) form a rectangle in Figure 7.1.3.

In Figure 7.1.2 they are drawn over the A-B plane with $y$ as the third coordinate, which corresponds to the height above this plane.

From this representation it is just as apparent as in Figure 7.1.1 that factorial designs at two levels are based on a linear model (straight line, plane) in order to approximate the unknown and, in general, curved response surface.

Figure 7.1.4 shows a further way to represent these results. The target characteristic $y$ is plotted as a function of $A$ with $B$ as a fixed parameter.

In Figure 7.1.3, an attempt is made to illustrate the three-dimensional surface $y = f(A, B)$ (it corresponds to the surface of the hill) two-dimensionally as a function of both factors $A$ and $B$.

The dotted curves in Figure 7.1.4, in contrast, represent the function $y$ for fixed $B$: $y = f(A, B = \text{const.})$

They are thus the intersection lines of a vertical cut through the surface of the hill in which $B$ is constant (see Figure 7.1.2).

Figure 7.1.4



Analogously, the dotted lines in Figure 7.1.5 represent the function $y$ when $A$ is constant.

These facts are illustrated by the following figures as further examples.

Figure 7.1.5

Figure 7.1.6



In principle, one can also use these methods for representing results of experiments.

The above scheme can be simplified by transforming the factor levels $A_1 = 6$, $A_2 = 12$, $B_1 = 12$, $B_2 = 20$ according to the following rule:

$$X^* = \frac{2 \cdot X - (X_2 + X_1)}{X_2 - X_1}.$$

Example:

$$A_1^* = \frac{2 \cdot A_1 - (A_2 + A_1)}{A_2 - A_1} = -1, \qquad A_2^* = \frac{2 \cdot A_2 - (A_2 + A_1)}{A_2 - A_1} = +1$$

$$B_1^* = \frac{2 \cdot B_1 - (B_2 + B_1)}{B_2 - B_1} = -1, \qquad B_2^* = \frac{2 \cdot B_2 - (B_2 + B_1)}{B_2 - B_1} = +1$$

If one considers only the resulting signs, then after the coordinate transformation one obtains, instead of the above scheme, the following design matrix for the two-factor design with two levels.

No.   A   B   y
1     -   -   $y_1$
2     +   -   $y_2$
3     -   +   $y_3$
4     +   +   $y_4$

The second row accordingly corresponds to an experiment in which factor A is set to the upper level (+) and factor B to the lower level (-). Instead of the form $A_1$, $A_2$ for the settings of factor A, one frequently uses $A_-$ and $A_+$.

The column $y$ contains the results $y_1, \ldots, y_4$ of the four experimental rows. They can be represented in the following form.



This form of representation is also applicable when one (or several) of the investigated factors is not a quantitative, adjustable variable but a qualitative variable with fixed levels (e.g. material 1 - material 2). Naturally, an interpolation of intermediate values is not meaningful in this case.

The results for three influence factors can be represented graphically by expanding Figure 7.1.10 into the form of a cube. Each corner point then corresponds to a combination of levels of the factors A, B and C. When dealing with more than three factors, only two- or three-dimensional projections of the n-dimensional experimental space can be represented.

Figure 7.1.10

Figure 7.1.11



Fig. 7.1.13

Fig. 7.1.14

Fig. 7.1.15



These representations show clearly the principle appearance of a surface described by amultilinear form. The linearity with respect to both coordinates is obvious. In addition, itis seen that the minimum or maximum of every considered straight line respectively lieson the boundary of the experimental space.

7.2 Calculating the Effects

The effect of a factor gives the change of the target characteristic y, when a change takesplace from - level to + level, as an average over the settings of all the other factors. Natu-rally, the effect depends upon the explicit choice of the levels.

A graph of the effects, for the example of the two-factor design, is shown in Fig 7.2.1.

As long as the factors behave additively, both lines are parallel (see Figure 7.1.11). If, on the contrary, the effect of one factor depends on the setting (level) of another, then the factors do not behave additively and an interaction between them exists.

The evaluation matrix of the two-factor design contains, in addition to the columns for the factors A and B, a column AB for the interaction of these factors.

No.  A  B  AB  y
1    -  -  +   y1
2    +  -  -   y2
3    -  +  -   y3
4    +  +  +   y4

Fig. 7.2.1



The effect of a factor X is calculated as the difference between the mean of all y obtained with X at the + level and the mean of all y obtained with X at the - level. The same rule applies to interactions and can be used generally for orthogonal designs with m factors.

For this example the following is valid:

$$\text{Effect}(A) = \bar{y}(A_+) - \bar{y}(A_-) = \frac{y_2 + y_4}{2} - \frac{y_1 + y_3}{2}$$

$$\text{Effect}(B) = \bar{y}(B_+) - \bar{y}(B_-) = \frac{y_3 + y_4}{2} - \frac{y_1 + y_2}{2}$$

Fig. 7.2.2

Fig. 7.2.3



$$\text{Effect}(AB) = \bar{y}(AB_+) - \bar{y}(AB_-) = \frac{y_1 + y_4}{2} - \frac{y_2 + y_3}{2}$$

Here the designation of the factor levels by + and -, as opposed to the frequently used notation 1 and 2, proves advantageous: the signs of the $y_i$ on the right-hand side of these equations can be read directly from the columns A, B and AB of the evaluation matrix. Furthermore, the column AB of the evaluation matrix can be obtained element-wise as the product of the columns A and B, e.g. (-1) · (-1) = +1.
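The sign rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `effect` and the four result values are hypothetical and do not come from the text.

```python
# Minimal sketch: effects of a 2^2 design read off the evaluation matrix.
# The y values are made-up illustration data, not measurements.

A = [-1, +1, -1, +1]                 # column A of the design matrix
B = [-1, -1, +1, +1]                 # column B of the design matrix
AB = [a * b for a, b in zip(A, B)]   # interaction column: element-wise product


def effect(signs, y):
    """Mean of y at the + level minus mean of y at the - level."""
    plus = [yi for s, yi in zip(signs, y) if s > 0]
    minus = [yi for s, yi in zip(signs, y) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


y = [10.0, 14.0, 11.0, 19.0]         # hypothetical results y1..y4

effect_A = effect(A, y)              # (y2 + y4)/2 - (y1 + y3)/2
effect_B = effect(B, y)              # (y3 + y4)/2 - (y1 + y2)/2
effect_AB = effect(AB, y)            # (y1 + y4)/2 - (y2 + y3)/2
print(effect_A, effect_B, effect_AB)
```

The same `effect` helper works for any column of -1/+1 signs, which mirrors the statement that the rule carries over to interactions.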

In fractional factorial designs, confounding of factors with interactions can occur. The effects of confounded quantities can then no longer be calculated separately.

Hint:

The calculation of mean effects is given here only for the sake of completeness. Figures 7.1.6 - 7.1.9 make it easy to see that, if a strong interaction AB exists, the mean effects of both factors A and B can become zero, although each factor individually exhibits large effects.
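A small numerical illustration of this hint, using made-up results that are not taken from Figures 7.1.6 - 7.1.9: assume y1 = 10, y2 = 20, y3 = 20, y4 = 10. The formulas of this section then give

```latex
\begin{aligned}
\text{Effect}(A)  &= \frac{y_2 + y_4}{2} - \frac{y_1 + y_3}{2} = \frac{20 + 10}{2} - \frac{10 + 20}{2} = 0 \\
\text{Effect}(B)  &= \frac{y_3 + y_4}{2} - \frac{y_1 + y_2}{2} = \frac{20 + 10}{2} - \frac{10 + 20}{2} = 0 \\
\text{Effect}(AB) &= \frac{y_1 + y_4}{2} - \frac{y_2 + y_3}{2} = \frac{10 + 10}{2} - \frac{20 + 20}{2} = -10
\end{aligned}
```

Both mean effects vanish, yet changing A raises y by y2 - y1 = 10 when B is at the - level and lowers it by y4 - y3 = -10 when B is at the + level.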



7.3 Regression Analysis

From the effects of the factors, the coefficients of the multilinear form (regression polynomial) can be calculated using the coordinate transformation that maps the setting values of the factors to the coded form (+ level, - level). In coded coordinates, the coefficients sought are equal to half of the effects.

Consider, as an example, the function

$$y = 3 + 4 x_1 - 2 x_2 + 5 x_1 x_2 .$$

The four experiments with the settings

A- = 5, A+ = 10
B- = 6, B+ = 12

would accordingly deliver the following results if experimental noise is disregarded:

$$y_1 = 3 + 4 \cdot 5 - 2 \cdot 6 + 5 \cdot 5 \cdot 6 = 161$$

$$y_2 = 3 + 4 \cdot 10 - 2 \cdot 6 + 5 \cdot 10 \cdot 6 = 331$$

$$y_3 = 3 + 4 \cdot 5 - 2 \cdot 12 + 5 \cdot 5 \cdot 12 = 299$$

$$y_4 = 3 + 4 \cdot 10 - 2 \cdot 12 + 5 \cdot 10 \cdot 12 = 619 .$$

We now proceed as though the initial polynomial above were unknown and try to derive the coefficients from the experimental data (see 7.2).

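$$\text{Effect}(A) = \frac{y_2 + y_4}{2} - \frac{y_1 + y_3}{2} = 245$$

$$\text{Effect}(B) = \frac{y_3 + y_4}{2} - \frac{y_1 + y_2}{2} = 213$$

$$\text{Effect}(AB) = \frac{y_1 + y_4}{2} - \frac{y_2 + y_3}{2} = 75$$

$$\text{Constant term} = \frac{y_1 + y_2 + y_3 + y_4}{4} = 352.5 .$$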

If one now substitutes half of the effects as coefficients into the polynomial (model)

$$y = a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2$$

and considers the coordinate transformation (see Section 7.1)

$$X^* = 2 \, \frac{X - X_2}{X_2 - X_1} + 1 ,$$

where $X_1$ denotes the lower and $X_2$ the upper level of the factor,



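then this results in

$$y = 352.5 + \frac{245}{2}\left(2\,\frac{A - 10}{10 - 5} + 1\right) + \frac{213}{2}\left(2\,\frac{B - 12}{12 - 6} + 1\right) + \frac{75}{2}\left(2\,\frac{A - 10}{10 - 5} + 1\right)\left(2\,\frac{B - 12}{12 - 6} + 1\right)$$

and, after multiplying out and collecting terms,

$$y = 3 + 4A - 2B + 5AB .$$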
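The back-substitution can be checked numerically. The following sketch, which is not part of the original text, recomputes the worked example with exact fractions; all variable names are chosen for illustration, and the coded variable is written as a = pA*A + qA, an algebraically equivalent form of the coordinate transformation.

```python
from fractions import Fraction as F


def model(x1, x2):
    # The "unknown" polynomial of the worked example, used only to generate data.
    return 3 + 4 * x1 - 2 * x2 + 5 * x1 * x2


A_lo, A_hi = 5, 10
B_lo, B_hi = 6, 12

# Runs in standard order: (A-,B-), (A+,B-), (A-,B+), (A+,B+)
y1, y2, y3, y4 = (model(a, b) for b in (B_lo, B_hi) for a in (A_lo, A_hi))

eff_A = F(y2 + y4, 2) - F(y1 + y3, 2)
eff_B = F(y3 + y4, 2) - F(y1 + y2, 2)
eff_AB = F(y1 + y4, 2) - F(y2 + y3, 2)
mean = F(y1 + y2 + y3 + y4, 4)

# Coded variables as linear functions of the settings: a = pA*A + qA, b = pB*B + qB
pA, qA = F(2, A_hi - A_lo), -F(A_hi + A_lo, A_hi - A_lo)
pB, qB = F(2, B_hi - B_lo), -F(B_hi + B_lo, B_hi - B_lo)

# Half effects are the coefficients in coded coordinates; expand back to A and B.
half_A, half_B, half_AB = eff_A / 2, eff_B / 2, eff_AB / 2
c0 = mean + half_A * qA + half_B * qB + half_AB * qA * qB
cA = half_A * pA + half_AB * pA * qB
cB = half_B * pB + half_AB * qA * pB
cAB = half_AB * pA * pB
print(c0, cA, cB, cAB)  # 3 4 -2 5
```

Exact fractions are used so that the recovered coefficients can be compared with the original polynomial without rounding concerns.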

It is therefore possible to calculate, from the results of the experiment, the coefficients of the regression polynomial that was chosen as the model underlying the experimental design.

Interpolated values within the experimental space can therefore be determined. If one or several additional experiments are conducted at the center of the experimental space (e.g. the rectangle in Figure 7.1.3; design with center point), comparing the results at this point with the corresponding interpolated values provides information about the adequacy of the underlying model, i.e. about the quality of the fit.

If larger deviations occur between the results of the additional experiments and the values interpolated with the regression polynomial, this shows that the chosen model describes reality inadequately, or is even entirely wrong.
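A center-point check of this kind might look as follows. This is a sketch under assumptions: the fitted coefficients are those of the worked example in this section, while the measured center-point value `y_center_measured` is purely hypothetical. Whether a deviation counts as large has to be judged against the experimental noise, which the sketch does not model.

```python
def coded(x, lo, hi):
    """Map a setting x in [lo, hi] to the coded scale [-1, +1]."""
    return (2 * x - (hi + lo)) / (hi - lo)


def predict(A, B):
    # Fitted model of the worked example in coded coordinates:
    # constant term plus half effects times coded variables.
    a, b = coded(A, 5, 10), coded(B, 6, 12)
    return 352.5 + 122.5 * a + 106.5 * b + 37.5 * a * b


center_prediction = predict(7.5, 9.0)   # both coded coordinates are 0 here
y_center_measured = 340.0               # hypothetical center-point result
deviation = y_center_measured - center_prediction
print(center_prediction, deviation)
```

At the center point all coded coordinates are zero, so the interpolated value is simply the mean of the four corner results.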

Here the whole crux of DOE with orthogonal arrays becomes apparent: correct results can only be obtained with the correct model.

7.4 Factorial Designs

7.4.1 Design Matrix

In Section 7.1, a simple scheme for a $2^2$ design is derived by considering a coordinate transformation:

No. A B

1 - -

2 + -

3 - +

4 + +

Strictly speaking, one can interpret the first two rows of the design as a one-factor-at-a-time experiment in which factor A is set to the lower (-) and the upper (+) level while factor B is held at the - level.



In rows 3 and 4, A is again set to the - and the + level, but B is held fixed at the + level.

This scheme is the basis for a general rule of factorial designs that is made clear by means

of the following representation.

Experiment  A  B  C  D  E
 1          -  -  -  -  -
 2          +  -  -  -  -
 3          -  +  -  -  -
 4          +  +  -  -  -
 5          -  -  +  -  -
 6          +  -  +  -  -
 7          -  +  +  -  -
 8          +  +  +  -  -
 9          -  -  -  +  -
10          +  -  -  +  -
11          -  +  -  +  -
12          +  +  -  +  -
13          -  -  +  +  -
14          +  -  +  +  -
15          -  +  +  +  -
16          +  +  +  +  -
17          -  -  -  -  +
18          +  -  -  -  +
19          -  +  -  -  +
20          +  +  -  -  +
21          -  -  +  -  +
22          +  -  +  -  +
23          -  +  +  -  +
24          +  +  +  -  +
25          -  -  -  +  +
26          +  -  -  +  +
27          -  +  -  +  +
28          +  +  -  +  +
29          -  -  +  +  +
30          +  -  +  +  +
31          -  +  +  +  +
32          +  +  +  +  +

$2^2$ design: experiments 1-4, columns A and B
$2^3$ design: experiments 1-8, columns A to C
$2^4$ design: experiments 1-16, columns A to D
$2^5$ design: experiments 1-32, columns A to E

Scheme for illustrating the general rule for factorial designs (see [1] p. 53).
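The construction rule behind this scheme can be sketched in Python. The function name `full_factorial` is hypothetical, and the bit-shift construction is one possible way to generate the pattern, not a method prescribed by the text.

```python
def full_factorial(k):
    """Return the 2^k design in standard order as rows of -1/+1.

    Column j (0-based) changes sign every 2**j rows, so the first 2**m rows
    restricted to the first m columns form the 2^m design.
    """
    return [[+1 if (run >> j) & 1 else -1 for j in range(k)]
            for run in range(2 ** k)]


design5 = full_factorial(5)
for no, row in enumerate(design5, start=1):
    print(f"{no:2d}", " ".join("+" if s > 0 else "-" for s in row))
```

Because each smaller design is nested in the top-left corner of the larger one, extending an investigation by one factor only appends rows and one column to the existing scheme.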



Remark:

In this model, the designations $x_1$, $x_2$ and $x_3$ are used instead of the names A, B and C for the three factors. Correspondingly, $a_{12}$, for example, is the coefficient of the interaction AB.

The columns of the evaluation matrix assigned to the interactions can be calculated element-wise as products of the columns of the factors involved, e.g. (-1) · (-1) = +1. For example, the column for the interaction AC results from multiplying the columns of the factors A and C with each other.
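As a sketch of this product rule, the following Python snippet builds every factor and interaction column of a full three-factor, two-level design in standard order. The dictionary name `evaluation` and the construction of `design` are illustrative choices, not taken from the text.

```python
from itertools import combinations
from math import prod

factors = "ABC"

# Full 2^3 design in standard order: A alternates fastest, C slowest.
design = [[+1 if (run >> j) & 1 else -1 for j in range(3)] for run in range(8)]

# One column per factor and per interaction (AB, AC, BC, ABC),
# each obtained as the element-wise product of the factor columns involved.
evaluation = {}
for size in range(1, len(factors) + 1):
    for idx in combinations(range(len(factors)), size):
        name = "".join(factors[i] for i in idx)
        evaluation[name] = [prod(row[i] for i in idx) for row in design]

for name, column in evaluation.items():
    print(f"{name:>3}", " ".join("+" if s > 0 else "-" for s in column))
```

Every resulting column contains as many plus signs as minus signs, which is the balance property the calculation of effects relies on.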

7.4.3 Confounding

If all 8 experiments of a $2^3$ design are conducted, the effects, and thus the coefficients of the model, can be calculated separately for all factors and interactions. Mathematically, calculating the coefficients means solving a system of 8 equations with 8 unknowns (see model, design and evaluation matrix).

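$$\begin{aligned}
y_1 &= a_0 - a_1 - a_2 + a_{12} - a_3 + a_{13} + a_{23} - a_{123} \\
y_2 &= a_0 + a_1 - a_2 - a_{12} - a_3 - a_{13} + a_{23} + a_{123} \\
y_3 &= a_0 - a_1 + a_2 - a_{12} - a_3 + a_{13} - a_{23} + a_{123} \\
y_4 &= a_0 + a_1 + a_2 + a_{12} - a_3 - a_{13} - a_{23} - a_{123} \\
y_5 &= a_0 - a_1 - a_2 + a_{12} + a_3 - a_{13} - a_{23} + a_{123} \\
y_6 &= a_0 + a_1 - a_2 - a_{12} + a_3 + a_{13} - a_{23} - a_{123} \\
y_7 &= a_0 - a_1 + a_2 - a_{12} + a_3 - a_{13} + a_{23} - a_{123} \\
y_8 &= a_0 + a_1 + a_2 + a_{12} + a_3 + a_{13} + a_{23} + a_{123}
\end{aligned}$$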

Owing to its simple structure, this system of equations is easy to solve for the coefficients. For example, the constant $a_0$ can be determined by adding all rows and dividing the sum by 8 (mean of all results $y_i$, see 7.3, Regression Analysis).

Owing to the balanced nature of the system of equations, a plus sign appears in front of every coefficient other than $a_0$ as often as a minus sign. On addition, therefore, all terms on the right-hand side except $a_0$ cancel out. In order to calculate $a_1$, rows 1, 3, 5 and 7 are each multiplied by -1 and then all 8 rows are added together. Again, all terms on the right-hand side apart from $a_1$ cancel out, and dividing the sum by 8 yields $a_1$. The calculation of the remaining coefficients is analogous. If one compares this procedure with the equations in Section 7.2, it becomes evident that calculating the coefficients of the system of equations and calculating half of the effects of the factors are identical processes.

Because a plus sign appears in front of $a_0$ in every row of the system of equations, the evaluation matrix is often given a leading column consisting exclusively of plus signs, which is designated by I (for identity) or 0.
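The solution by signed sums can be sketched as follows. The coefficient values in `true_a` are hypothetical and serve only to generate noise-free results; the helper `sign` is likewise an illustrative name.

```python
from math import prod

# Full 2^3 design in standard order; columns are x1, x2, x3 (A, B, C).
design = [[+1 if (run >> j) & 1 else -1 for j in range(3)] for run in range(8)]

# Coefficient labels in the order used by the model: a0, a1, a2, a12, ...
names = ["0", "1", "2", "12", "3", "13", "23", "123"]

# Hypothetical coefficients, used only to generate noise-free results.
true_a = dict(zip(names, [50.0, 4.0, -3.0, 1.5, 2.0, 0.0, -0.5, 0.25]))


def sign(name, row):
    """Entry of evaluation-matrix column `name` for one design row."""
    if name == "0":
        return 1  # identity column I: always +
    return prod(row[int(digit) - 1] for digit in name)


# Left-hand sides y1..y8 of the system of 8 equations.
y = [sum(true_a[n] * sign(n, row) for n in names) for row in design]

# Solve by signed sums: weight each row by the column sign, add, divide by 8.
recovered = {n: sum(sign(n, row) * yi for row, yi in zip(design, y)) / 8
             for n in names}
print(recovered)
```

No matrix inversion is needed: orthogonality of the columns means each coefficient is isolated by a single weighted sum of the results.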



If fewer than 8 experiments are conducted, it is clearly no longer possible to determine the coefficients separately. So-called confounding occurs. This is explained using the example of the $2^{3-1}$ fractional factorial design. Here three factors are to be investigated, but only 4 experiments are conducted.

Design matrix of the $2^{3-1}$ design (see [9]):

A B C

1 - - +

2 + - -

3 - + -

4 + + +

We now consider what the interaction columns AB, AC and BC of the associated evaluation matrix look like. They can be calculated as products of the corresponding columns of the design matrix.

No.  BC  AB  AC
1    -   +   -
2    +   -   -
3    -   -   +
4    +   +   +

If one compares these columns with the columns of the design matrix, it is evident that AB is identical to C, AC to B and BC to A. The columns A and BC, B and AC, and C and AB of the evaluation matrix are therefore indistinguishable. One says that factor A is confounded with the interaction BC, factor B with the interaction AC and factor C with the interaction AB.

No.  A   B   C
     BC  AC  AB
1    -   -   +
2    +   -   -
3    -   +   -
4    +   +   +

Evaluation matrix of the $2^{3-1}$ fractional factorial design
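The identity of these columns can be verified with a short sketch. It relies on the observation that, in the design matrix given, column C equals the element-wise product of columns A and B; the helper name `times` is hypothetical.

```python
# Half fraction with three factors in 4 runs: A and B form a full
# two-factor design, and the C column equals the product A*B.
A = [-1, +1, -1, +1]
B = [-1, -1, +1, +1]
C = [a * b for a, b in zip(A, B)]


def times(u, v):
    """Element-wise product of two sign columns."""
    return [p * q for p, q in zip(u, v)]


# Interaction columns and the factor column each of them coincides with.
confounded = {
    "BC": ("A", times(B, C) == A),
    "AC": ("B", times(A, C) == B),
    "AB": ("C", times(A, B) == C),
}
for interaction, (factor, identical) in confounded.items():
    print(f"{interaction} identical to {factor}: {identical}")
```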



The occurrence of confounded factors becomes somewhat clearer still if one directly considers the incomplete system of equations corresponding to the $2^{3-1}$ design:

$$\begin{aligned}
y_1 &= a_0 - a_1 - a_2 + a_{12} + a_3 - a_{13} - a_{23} + a_{123} \\
y_2 &= a_0 + a_1 - a_2 - a_{12} - a_3 - a_{13} + a_{23} + a_{123} \\
y_3 &= a_0 - a_1 + a_2 - a_{12} - a_3 + a_{13} - a_{23} + a_{123} \\
y_4 &= a_0 + a_1 + a_2 + a_{12} + a_3 + a_{13} + a_{23} + a_{123} .
\end{aligned}$$

If, in this case, the first and third equations are multiplied by -1 and subsequently all four equations are added together, then all terms on the right-hand side apart from $a_1$ and $a_{23}$ cancel out. These are the coefficients assigned to the factor A and to the interaction BC, respectively. Therefore A and BC are confounded. The remaining confoundings follow analogously.
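A numerical sketch of this cancellation, using hypothetical coefficient values (none of them come from the text) and the four runs of the half fraction discussed in this section:

```python
from math import prod

# The four runs of the half fraction, as rows of (A, B, C).
half = [(-1, -1, +1), (+1, -1, -1), (-1, +1, -1), (+1, +1, +1)]

names = ["0", "1", "2", "12", "3", "13", "23", "123"]
# Hypothetical coefficients a0, a1, a2, a12, a3, a13, a23, a123.
a = dict(zip(names, [50.0, 4.0, -3.0, 1.5, 2.0, 0.0, -0.5, 0.25]))


def sign(name, row):
    """Sign in front of coefficient a_name in the equation for one run."""
    return 1 if name == "0" else prod(row[int(d) - 1] for d in name)


# Noise-free results y1..y4 of the incomplete system of equations.
y = [sum(a[n] * sign(n, row) for n in names) for row in half]

# Rows 1 and 3 (A at the - level) are multiplied by -1, then all rows are added.
signed_sum = sum(row[0] * yi for row, yi in zip(half, y))
estimate_A = signed_sum / 4
print(estimate_A, a["1"] + a["23"])
```

The value obtained for A is the sum of the two confounded coefficients; the data alone give no way to split it into its two parts.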

Remark:

Strictly speaking, the evaluation matrix should contain additional columns for the identity (the column for the constant term $a_0$) and for the three-factor interaction ABC. They are omitted here for the sake of simplicity.

In the preceding example it is therefore not possible, for instance, to calculate the effect of factor A separately from the effect of the interaction BC.

Here a rather strange logic, found in most of the literature on DOE, comes into play. The effect of factor A can be determined if one assumes that the interaction BC does not exist. This means that one must be sure that the factors B and C behave purely additively. But if that is known, it is sufficient to investigate B and C with a one-factor-at-a-time experiment.

In textbooks on DOE, it is often assumed that three-factor and higher interactions are improbable, and this assumption is exploited in order to formulate fractional factorial designs of the type $2^{m-1}$.



As an example, we investigate the evaluation matrix of the $2^{4-1}$ design.

No.  A  B  AB  C  AC  BC  D
           CD     BD  AD  ABC
1    -  -  +   -  +   +   -
2    +  -  -   -  -   +   +
3    -  +  -   -  +   -   +
4    +  +  +   -  -   -   -
5    -  -  +   +  -   -   +
6    +  -  -   +  +   -   -
7    -  +  -   +  -   +   -
8    +  +  +   +  +   +   +

Instead of the $2^4 = 16$ experiments that would be necessary to investigate four factors on two levels each with the full factorial design, only 8 experiments are conducted here. If one determines the column of the interaction ABC, one sees that it coincides with the column of factor D. Factor D is therefore confounded with a three-factor interaction. When applying this design, it is assumed that the three-factor interaction ABC does not exist. If this assumption is false, a false effect results for D.

In addition, the effects of the two-factor interactions cannot be calculated separately. If, for instance, the third column turns out to be highly significant in the column-wise evaluation (factorial analysis of variance), it cannot be determined whether this is due to the interaction AB or to CD. On the other hand, AB and CD can compensate each other (equal, counteracting effects), and the evaluation cannot reveal this. The reduction in the extent of experimentation is therefore bought at the risk of a faulty result as well as a loss of information.
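Which quantities share a column can be listed mechanically. The sketch below assumes the half fraction is generated by setting D equal to the product ABC, as described for this design; the variable names and the grouping approach are illustrative only.

```python
from itertools import combinations
from math import prod

# Full three-factor design in A, B, C (standard order), extended by D = A*B*C.
base = [[+1 if (run >> j) & 1 else -1 for j in range(3)] for run in range(8)]
design = [row + [prod(row)] for row in base]   # columns A, B, C, D

factors = "ABCD"
groups = {}   # maps each distinct sign column to the names that share it
for size in (1, 2, 3):
    for idx in combinations(range(4), size):
        column = tuple(prod(row[i] for i in idx) for row in design)
        name = "".join(factors[i] for i in idx)
        groups.setdefault(column, []).append(name)

for shared in groups.values():
    print(" = ".join(shared))
```

Only seven distinct columns remain for fourteen factors and interactions, which shows directly why AB cannot be separated from CD in 8 runs.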

This statement holds especially for fractional factorial designs in which the extent of experimentation is reduced by more than a factor of 0.5 (Taguchi method, see [10]).

Rows 1-8 of the $2^{4-1}$ design correspond to rows 1, 10, 11, 4, 13, 6, 7, 16 of the complete $2^4$ design (see [9], Appendix). An experiment based on the $2^{4-1}$ design can still be rescued, if necessary, by adding the missing (complementary) eight rows. The c
