CHAPTER 3
DESIGN OF EXPERIMENTS
3.1 INTRODUCTION
Design of Experiments (DoE) is a useful method for identifying the significant parameters and for studying the possible effects of the variables during machining trials. The method accommodates factors ranging from uncontrollable ones, which are introduced randomly, to carefully controlled parameters. Factors may be either quantitative or qualitative. For quantitative factors, the range of values, the way they will be measured, and the levels at which they will be controlled during the trials must be decided. Qualitative factors, meanwhile, are parameters that take discrete values.
The advantages of design of experiments are as follows:
- The number of trials is significantly reduced.
- Important decision variables that control and improve the performance of the product or the process can be identified.
- Optimal settings of the parameters can be found.
- Qualitative estimates of the parameters can be made.
- Experimental error can be estimated.
- Inferences regarding the effect of parameters on the characteristics of the process can be made.
Experiments are performed by investigators in virtually all fields of inquiry, usually to discover something about a particular process or system. Generally, an experiment is a test or series of tests in which purposeful changes are made to the input variables of a process or system so that changes in the output response can be observed and identified, and the resulting data analyzed so that valid and objective conclusions are obtained.
The objectives of the experiments may include the following:
I. Determine which variables are most influential on the performance measures or responses.
II. Determine where to set the influential controllable process variables so that the response is almost always near the desired nominal value.
III. Determine where to set the controllable process variables so that variability in the response is small.
IV. Determine where to set the influential process variables so that the effects of the uncontrollable variables are minimized.
Statistical design of experiments refers to the process of planning the experiment so that
appropriate data that can be analyzed by statistical methods will be collected, resulting in
valid and objective conclusions.
The statistical approach to experimental design is necessary if we wish to draw a
meaningful conclusion from the data [62].
Figure 3.1: General model of a process or system [63]
3.2 APPROACHES TO DESIGN OF EXPERIMENTS
I. Multiple Regression Analysis
II. Mathematical Modeling
III. Orthogonal Array
IV. ANOVA
3.2.1 MULTIPLE REGRESSION ANALYSIS
The main purpose of multiple regression analysis (first used by Pearson in 1908) is to learn about the relationship between several independent (predictor) variables and a dependent (response) variable [64].
Multiple regression is a statistical technique that allows one to predict a score on one variable on the basis of scores on several other variables. Multiple linear regression examines the linear relationships between one continuous response and two or more predictors. The independent variables are used to predict values of the dependent, or response, variable in the regression analysis. If the number of predictors is large, then before fitting a regression model with all the predictors, stepwise techniques can be used to screen out predictors not associated with the response [65].
A current trend in statistics is to emphasize the similarity between multiple regression and ANOVA, and between correlation and the t-test. All of these statistical techniques are basically seeking to do the same thing: explain the variance in the level of one variable on the basis of the level of one or more other variables. These other variables might be manipulated directly in the case of controlled experiments, or observed in the case of surveys or observational studies, but the underlying principle is the same. Thus, although these procedures are usually presented separately, they are fundamentally the same procedure. This underlying single approach is called the General Linear Model [64].
Multiple linear regression attempts to model the relationship between two or more
explanatory variables and a response variable by fitting a linear equation to observed data
[66].
Every value of the independent variable x is associated with a value of the dependent
variable y.
60
The regression line for p explanatory variables x1, x2, ..., xp is defined to be
𝝁𝒚 = 𝜷𝟎 + 𝜷𝟏𝑿𝟏 + 𝜷𝟐𝑿𝟐 + ⋯ + 𝜷𝒑𝑿𝒑
The mean response 𝝁𝒚 described by the regression line changes with the explanatory variables. The observed values of y vary about their means 𝝁𝒚 and are assumed to have the same standard deviation σ. The fitted values b0, b1, ..., bp estimate the parameters 𝜷𝟎, 𝜷𝟏, ..., 𝜷𝒑 of the regression line. Since the observed values of y vary about their means 𝝁𝒚, the multiple regression model includes a term for this variation.
The regression model is expressed as Data = Fit + Residual [66], where the "Fit" term represents the expression 𝜷𝟎 + 𝜷𝟏𝑿𝟏 + 𝜷𝟐𝑿𝟐 + ⋯ + 𝜷𝒑𝑿𝒑. The "Residual" term represents the deviations of the observed values y from their means 𝝁𝒚 and is assumed to be normally distributed with mean 0 and variance σ². The notation for the model deviations is ε.
Formally, the model for multiple linear regression, given ‘n’ observations, is
𝒚𝒊 = 𝜷𝟎 + 𝜷𝟏𝑿𝒊𝟏 + 𝜷𝟐𝑿𝒊𝟐 + ⋯ + 𝜷𝒑𝑿𝒊𝒑 + 𝜺𝒊 𝒇𝒐𝒓 𝒊 = 𝟏, 𝟐, …𝒏
In the least-squares model, the best-fitting line for the observed data is calculated by
minimizing the sum of the squares of the vertical deviations from each data point to the
line. Since the deviations are first squared, then summed, there are no cancellations
between positive and negative values. The least-squares estimates b0, b1, ... bp are usually
computed by statistical software.
The values fitted by the equation b0 + b1xi1 + ⋯ + bpxip are denoted ŷi, and the residuals ei are equal to yi − ŷi, the difference between the observed and fitted values.
The sum of the residuals is equal to zero.
The variance σ² may be estimated by

$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p - 1},$$

also known as the mean squared error (MSE).
The estimate of the standard error s is the square root of the MSE [66].
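As a minimal sketch of the calculations above, the least-squares estimates b0, b1, ..., bp, the fitted values, the residuals, and the MSE can be computed with NumPy. The data here are hypothetical and purely illustrative:

```python
import numpy as np

# Hypothetical data: n = 9 runs of p = 2 predictors (illustrative only)
X = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 30.0],
              [2.0, 10.0], [2.0, 20.0], [2.0, 30.0],
              [3.0, 10.0], [3.0, 20.0], [3.0, 30.0]])
y = np.array([2.1, 2.9, 4.2, 3.0, 4.1, 5.2, 4.2, 5.1, 6.3])
n, p = X.shape

# Prepend a column of ones so that b[0] is the intercept b0
Xa = np.column_stack([np.ones(n), X])

# Least-squares estimates minimize the sum of squared vertical deviations
b, *_ = np.linalg.lstsq(Xa, y, rcond=None)

y_hat = Xa @ b        # fitted values ŷi
e = y - y_hat         # residuals ei = yi − ŷi; they sum to (numerically) zero

# s² = Σ ei² / (n − p − 1), the mean squared error, and its square root s
s2 = np.sum(e**2) / (n - p - 1)
print("b:", b, " MSE:", s2, " s:", np.sqrt(s2))
```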
3.2.2 MATHEMATICAL MODELING
Once the experimental design is finalized, the next step is to fit the data to a mathematical model using regression analysis.
A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Engineers, statisticians, research analysts, and economists use mathematical models extensively. In general, mathematical models may include logical models, insofar as logic is taken to be a part of mathematics. In many cases, the quality of a scientific field depends on how well the mathematical models developed on the theoretical side agree with the results of repeatable experiments. Lack of agreement between theoretical mathematical models and experimental measurements often leads to important advances as better theories are developed [67].
3.2.3 ORTHOGONAL ARRAY
An orthogonal array (OA) represents a simplified method of putting together an experiment. Taguchi's orthogonal arrays are selected on the basis of the condition that the total degrees of freedom of the selected OA must be greater than or equal to the total degrees of freedom required for the experiment [68].
An orthogonal array provides a set of well-balanced experiments (with a minimum number of experimental runs); it is used to design experiments and to describe the trial conditions. Experiments designed using orthogonal arrays yield results that are more reproducible.
The standard notation for orthogonal arrays is [69]

L_N(X^Y)

where N = number of experiments, X = number of levels, and Y = number of factors.
For example:
2-level arrays: L4(2^3), L12(2^11), L16(2^15)
3-level arrays: L9(3^4), L18(2^1 × 3^7), L27(3^13)
4-level arrays: L16(4^5), L32(2^1 × 4^9)
Example: in L9(3^4), 9 = number of experiments, 3 = number of levels, and 4 = number of factors.
Taguchi's orthogonal arrays are experimental designs that usually require only a fraction of the full-factorial combinations. The columns of the arrays are balanced and orthogonal, i.e., in each pair of columns all factor combinations occur the same number of times. Orthogonal designs therefore allow the effect of each factor on the response to be estimated independently of all other factors.
There are 18 basic types of standard orthogonal array (OA) in the Taguchi parameter design []. Since four factors were studied in the present work, each at three levels, an L9(3^4) orthogonal array was selected for the multi-performance optimisation, as shown in Table 3.1.
Table 3.1: L9(3^4) orthogonal array

Runs   A   B   C   D
  1    1   1   1   1
  2    1   2   2   2
  3    1   3   3   3
  4    2   1   2   3
  5    2   2   3   1
  6    2   3   1   2
  7    3   1   3   2
  8    3   2   1   3
  9    3   3   2   1
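Since Table 3.1 defines the experimental runs, its defining properties can be verified directly. The following sketch (self-contained Python, not part of any particular DoE package) checks that every column of the L9 array is balanced (each level appears equally often) and that every pair of columns is orthogonal (each ordered level pair occurs the same number of times):

```python
import numpy as np
from itertools import combinations

# L9(3^4) orthogonal array from Table 3.1 (levels coded 1-3)
L9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
               [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
               [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])

# Balance: each of the 3 levels appears exactly 3 times in every column
for j in range(L9.shape[1]):
    counts = np.bincount(L9[:, j])[1:]          # occurrences of levels 1..3
    assert np.all(counts == 3), f"column {j} is not balanced"

# Orthogonality: each pair of columns contains all 9 ordered level pairs once
for a, b in combinations(range(L9.shape[1]), 2):
    pairs = set(zip(L9[:, a], L9[:, b]))
    assert len(pairs) == 9, f"columns {a} and {b} are not orthogonal"

print("L9(3^4) is balanced and pairwise orthogonal")
```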
3.2.4 ANOVA
The purpose of the ANOVA is to investigate which wire EDM process parameters significantly affect the quality characteristics. This is accomplished by separating the total variability of the S/N ratios, measured as the sum of the squared deviations from the total mean of the S/N ratio, into contributions from each wire EDM process parameter and the error. The percentage contribution of each process parameter to the total sum of squared deviations can be used to evaluate the importance of the process parameter change on the quality characteristic. In addition, the F-test can be used to determine which wire EDM process parameters have a significant effect on the quality characteristic; an effect is significant when the F value is large. The fundamental technique is a partitioning of the total sum of squares S into components related to the effects used in the model. For example, the model for a simplified ANOVA with one type of treatment at different levels is
S_Total = S_Error + S_Treatments
The number of degrees of freedom f can be partitioned in a similar way; these specify the chi-squared distributions that describe the associated sums of squares:

f_Total = f_Error + f_Treatments
The F-test is used for comparisons of the components of the total deviation. For example, in one-way or single-factor ANOVA, statistical significance is tested for by comparing the F test statistic with its critical value [70].
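As an illustrative sketch of this partitioning (the S/N values below are made up, not results of the present study), the treatment and error sums of squares, the percentage contribution, and the F value for one parameter studied at three levels can be computed as follows:

```python
import numpy as np

# Hypothetical S/N ratios grouped by the three levels of one process parameter
groups = [np.array([30.1, 31.2, 29.8]),    # level 1
          np.array([33.0, 32.4, 33.5]),    # level 2
          np.array([35.2, 34.8, 35.9])]    # level 3

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

# Total SS: squared deviations of all S/N ratios from the total mean
ss_total = np.sum((all_y - grand_mean) ** 2)

# Treatment SS: between-level variation; the remainder is error,
# so that S_Total = S_Error + S_Treatments
ss_treat = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = ss_total - ss_treat

df_treat = len(groups) - 1             # f_Treatments
df_error = len(all_y) - len(groups)    # f_Error

F = (ss_treat / df_treat) / (ss_error / df_error)
contribution = 100 * ss_treat / ss_total   # percentage contribution
print(f"F = {F:.2f}, contribution = {contribution:.1f}%")
```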
ANOVA for Multiple Linear Regression:
Multiple linear regression tries to fit a regression line for a response variable by using more
than one explanatory variable. ANOVA calculations for multiple regression are nearly
identical to the calculations for simple linear regression, except that the degrees of freedom
are adjusted to reflect the number of explanatory variables included in the model [71].
3.3 TEST FOR SIGNIFICANCE OF REGRESSION [72]
The test for significance of regression is a test to determine whether there is a linear relationship between the response variable y and a subset of the regressor variables x1, x2, ..., xk.
Once the coefficients have been estimated and tested for their significance, the estimated regression equation is tested for adequacy of fit.
The appropriate hypotheses are
𝑯𝑶: 𝜷𝟏 = 𝜷𝟐 = ⋯ = 𝜷𝒌 = 𝟎
𝑯𝟏: 𝜷𝒋 ≠ 𝟎 𝑓𝑜𝑟 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝑗
Rejection of 𝑯𝑶 indicates that at least one of the regressor variables x1, x2, ..., xk contributes significantly to the model. The test procedure involves an analysis-of-variance partitioning of the total sum of squares into a sum of squares due to the model (or regression) and a sum of squares due to the residual (or error):
𝑺𝑺𝑻 = 𝑺𝑺𝑹 + 𝑺𝑺𝑬
Now, if the null hypothesis 𝑯𝑶: 𝜷𝟏 = 𝜷𝟐 = ⋯ = 𝜷𝒌 = 𝟎 is true, then SSR/σ² follows a χ²_k distribution, where the number of degrees of freedom of the χ² equals the number of regressor variables in the model.
The computational formula for the error sum of squares SSE is

$$SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}.$$

The regression sum of squares is

$$SS_R = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n},$$

and the total sum of squares is

$$SS_T = \mathbf{y}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}.$$
Table 3.2: Analysis of variance (ANOVA) for significance of regression in multiple regression

Source of variation     df          SS     Mean square (MS)          F0
Due to regression       k           SSR    MSR = SSR/k               MSR/MSE
Due to residual (error) n − k − 1   SSE    MSE = SSE/(n − k − 1)
Total                   n − 1       SST
The F-test is applied to test the adequacy of fit as follows:

$$F = \frac{\text{mean square of regression}}{\text{mean square of residual}} = \frac{MS_R}{MS_E}$$
The estimated regression equation fits the data adequately if P < 0.05 at the 95% confidence level, or if P < 0.01 at the 99% confidence level.
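A short sketch of this test on hypothetical data, using the computational formulas for SSR, SSE, and SST given above (the design matrix and responses are illustrative only; SciPy supplies the F distribution for the p-value):

```python
import numpy as np
from scipy import stats

# Hypothetical data: design matrix X (first column of ones = intercept)
X = np.column_stack([np.ones(9),
                     [1, 1, 1, 2, 2, 2, 3, 3, 3],
                     [10, 20, 30, 10, 20, 30, 10, 20, 30]])
y = np.array([2.1, 2.9, 4.2, 3.0, 4.1, 5.2, 4.2, 5.1, 6.3])
n, k = len(y), X.shape[1] - 1               # k regressors (intercept excluded)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimates

# Computational formulas from the text
ss_t = y @ y - y.sum() ** 2 / n                # SST = y'y − (Σy)²/n
ss_r = beta_hat @ X.T @ y - y.sum() ** 2 / n   # SSR = β̂'X'y − (Σy)²/n
ss_e = y @ y - beta_hat @ X.T @ y              # SSE = y'y − β̂'X'y

F0 = (ss_r / k) / (ss_e / (n - k - 1))         # MSR / MSE from Table 3.2
p_value = stats.f.sf(F0, k, n - k - 1)
print(f"F0 = {F0:.2f}, p = {p_value:.4f}")     # reject H0 if p < 0.05
```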
The coefficient of multiple determination R² is a measure of the amount of reduction in the variability of y obtained by using the regressor variables x1, x2, ..., xk in the model:

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$
Adjusted R-squared is a measure of the amount of variation around the mean explained by the model, adjusted for the number of terms in the model. The adjusted R-squared decreases as the number of terms in the model increases, if those additional terms do not add value to the model:

$$R^2_{adj} = 1 - \frac{SS_E/(n-p)}{SS_T/(n-1)} = 1 - \frac{n-1}{n-p}\,(1 - R^2)$$
PRESS: the prediction sum of squares (PRESS) provides a useful residual scaling:

$$PRESS = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2$$

where h_ii is the i-th diagonal element of the hat matrix H = X(X′X)⁻¹X′.
Predicted R-squared is a measure of the amount of variation in new data explained by the model:

$$R^2_{pred} = 1 - \frac{PRESS}{S_{yy}}$$
The predicted R² and adjusted R² should be within 0.20 of each other; otherwise, there may be a problem with either the data or the model. In addition to the adequacy tests mentioned above, the validity of the developed models is checked by drawing a scatter diagram showing the relationship between the observed and predicted values of the response.
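The adequacy measures of this section can be computed together; the sketch below reuses the hypothetical data of the significance-of-regression example. The hat-matrix diagonal h_ii yields PRESS, and S_yy is taken here to be the corrected total sum of squares:

```python
import numpy as np

# Same hypothetical data as in the significance-of-regression sketch
X = np.column_stack([np.ones(9),
                     [1, 1, 1, 2, 2, 2, 3, 3, 3],
                     [10, 20, 30, 10, 20, 30, 10, 20, 30]])
y = np.array([2.1, 2.9, 4.2, 3.0, 4.1, 5.2, 4.2, 5.1, 6.3])
n, p = X.shape                                  # p parameters incl. intercept

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                            # residuals
ss_e = e @ e
ss_t = y @ y - y.sum() ** 2 / n                 # corrected total SS (S_yy)

r2 = 1 - ss_e / ss_t
r2_adj = 1 - (n - 1) / (n - p) * (1 - r2)

# PRESS from the hat-matrix diagonal, H = X(X'X)⁻¹X'
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))
press = np.sum((e / (1 - h)) ** 2)
r2_pred = 1 - press / ss_t

print(f"R² = {r2:.3f}, adj R² = {r2_adj:.3f}, pred R² = {r2_pred:.3f}")
```

If the predicted R² and adjusted R² printed here differed by more than 0.20, the data or the model would be suspect, as noted above.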
3.4 TESTING FOR LACK OF FIT [73]
A procedure for checking the adequacy of the fitted model is called the lack-of-fit test. In general, to say that the fitted model is inadequate, or lacking in fit, is to imply that the proposed model does not contain a sufficient number of terms. The inadequacy of the model can be due to:
1. Factors (other than those in the proposed model) that are omitted from the proposed model but which affect the response.
2. The omission of higher-order terms involving the factors in the proposed model which are needed to adequately explain the behaviour of the response.
Suppose that we have n observations such that

$y_{11}, y_{12}, \ldots, y_{1n_1}$ are repeated observations at $x_1$,
⋮
$y_{m1}, y_{m2}, \ldots, y_{mn_m}$ are repeated observations at $x_m$,

so that there are m distinct levels of x.
The pure-error sum of squares is

$$SS_{PE} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,$$

and the degrees of freedom associated with SS_PE are n − m.
The sum of squares for lack of fit is

$$SS_{LOF} = SS_E - SS_{PE} = \sum_{i=1}^{m} n_i (\bar{y}_i - \hat{y}_i)^2.$$
The degrees of freedom associated with SS_LOF are m − p, because there are m levels of x and p degrees of freedom are lost since p parameters must be estimated for the model.
The test statistic for lack of fit is

$$F_0 = \frac{SS_{LOF}/(m - p)}{SS_{PE}/(n - m)} = \frac{MS_{LOF}}{MS_{PE}}.$$
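A minimal sketch of this test on hypothetical data with repeats at m = 3 distinct levels of x, fitting a straight line (p = 2 parameters); SciPy supplies the F distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 9 observations, repeated at m = 3 distinct x levels
x = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)
y = np.array([2.0, 2.2, 2.1, 3.9, 4.1, 4.0, 5.2, 5.0, 5.4])

# Fit a straight line y = b0 + b1·x (p = 2 parameters)
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_e = np.sum((y - X @ beta_hat) ** 2)          # residual SS of fitted model

# Pure-error SS: deviations of the repeats from their own level means
levels = np.unique(x)
ss_pe = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)

n, m, p = len(y), len(levels), X.shape[1]
ss_lof = ss_e - ss_pe                            # SS_LOF = SS_E − SS_PE

F0 = (ss_lof / (m - p)) / (ss_pe / (n - m))      # MS_LOF / MS_PE
p_value = stats.f.sf(F0, m - p, n - m)
print(f"lack-of-fit F0 = {F0:.2f}, p = {p_value:.3f}")  # large F0 => lack of fit
```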
The present study has utilized multiple linear regression analysis to develop predictive models and to find the optimal parameter settings.