Statistics and Acceptance Sampling in TQM

UNIT 5 APPLICATION OF STATISTICS IN QUALITY ENGINEERING

Structure

5.1 Introduction

Objectives

5.2 Importance of Statistics

5.3 Measures of Central Tendency and Dispersion

5.4 Confidence Interval

5.5 Testing of Hypothesis

5.5.1 Test of Significance of Large Samples

5.5.2 Test of Significance of Small Samples

5.6 Probability Theory

5.7 Probability Distributions

5.7.1 The Normal Distribution

5.7.2 The Exponential Distribution

5.7.3 The Weibull Distribution

5.8 Acceptance Sampling

5.8.1 Acceptance Sampling by Attributes

5.8.2 Acceptance Sampling by Variables

5.9 Summary

5.10 Key Words

5.11 Answers to SAQs

5.1 INTRODUCTION

The tools for quality attainment can be broadly classified into two main categories: management tools and statistical tools. In a TQM culture, both kinds of tools should be used by everybody, from the top manager to the shop floor engineer. However, the first category will be more useful to managers, whereas the second will be more useful to the people concerned with the technical side of the process. A number of management tools have been described in Block 1. Notable among them are the principles and management rules advocated by Deming. Statistical tools include Statistical Process Control and Taguchi methods.

A fundamental need of a quality system is the reduction of variation. One of the early workers to use statistical tools was Walter Shewhart. Shewhart's work in the 1920s was initially focused on the reduction of variability in the performance of telephones at Bell Laboratories. In order to study the trend in variability with time, Shewhart calculated upper and lower limits, which are also called control limits. The departure of a trend outside these limits signaled the need for corrective action. Thus, it became possible to predict the behaviour of a process. In 1935, L.H.C. Tippett introduced the concepts of sampling techniques for inspection of goods in bulk with the introduction of sampling plans. The US Military in 1940 was the first to adopt the statistical tools proposed by Shewhart and Tippett. The statistical quality control movement gathered momentum and found acceptance in industry also. In this unit, the basic concepts of statistics will be presented. The application of statistics in quality engineering will be highlighted.

Quality Tools - Statistical

Objectives

After studying this unit, you should be able to

know the concepts of statistics,

understand the importance of Hypothesis Testing,

describe the various theorems on probability and probability distribution, and

explain acceptance sampling and construction of OC curve.

5.2 IMPORTANCE OF STATISTICS

Statistics is the science of collecting, organizing, and interpreting numerical facts called data. Statistics is largely associated with the bits of data that appear in news reports: average rainfall, percentage of population below the poverty line, expected life of a member of a particular society etc. In advertisements also, data are often used to show the superiority of the advertiser's product. The usefulness of statistics goes far beyond these everyday examples. The study and collection of data are important in the work of many professions. Each month, for example, government statistical offices release the latest numerical information on unemployment and inflation. Doctors must understand the origin and trustworthiness of the data that appear in medical journals if they are to offer their patients the most effective treatment. Politicians rely on data from polls of public opinion. Market research data that reveal consumer tastes influence business decisions. Farmers study data from field trials of new crop varieties. Engineers gather data on the quality and reliability of manufactured products.

The study of statistics is essential to a sound education. It helps to read data critically and with comprehension. Statistics helps produce data that provide clear answers to important questions and helps reach valid conclusions. Statistics teaches how to gather, organize, and analyze data, and then to infer the underlying reality from these data. Persons in industry and government make decisions that are increasingly dependent upon the collection and interpretation of data.

Statistics is also a theory about decision-making: it helps us to make "judgment under uncertainty". Statistics provides us with methods (mathematical procedures) for organizing, summarizing, presenting and interpreting data. These methods provide standardized techniques for communicating and interpreting information. The statistical tools can be interpreted as vocabulary and symbols for communicating data. Statistics could be descriptive or inferential. The following examples depict descriptive statistics :

Average rainfall in Shillong last year.

Number of new phone connections in 2004.

Inferential statistics are used to draw inferences about a population from a sample. One example is as follows. Consider an experiment in which two groups of 10 students are asked to perform a task. The first group had a normal night's sleep whereas the second group was deprived of sleep for 24 hours. The second group got 30 marks lower than the first group. Then, inferential statistics helps us to answer the following questions :

(i) Is the difference real or could it be due to chance?

(ii) How much larger could the real difference be than the 30 marks found in the sample?

Both descriptive and inferential statistics find application in quality engineering.

In order to control quality, a manager often obtains information through sampling. While studying statistics we often encounter the terms population and sample.

A population is the set of all individuals in a study, whereas a sample is a set of individuals selected from a population. A sample is intended to represent the population and should be identified in terms of the population from which it was selected. One could collect data from the whole population for study purposes, but often this is not feasible. For example, inspection of all items may be very costly. If destructive testing is required, there is no alternative to conducting the test on a sample and inferring the results for the population. The relationship between the population and sample is depicted in Figure 5.1.

[Figure 5.1 shows two boxes: THE POPULATION (all the individuals of interest) and THE SAMPLE (the individuals selected to participate in the research study). The sample is selected from the population, and the results from the sample are generalized to the population.]

Figure 5.1 : Relationship between the Population and Sample

Parameter Versus Statistic

A parameter is a value (usually numerical) that describes a population. It may be derived from a single measurement or a set of measurements from the population. For example, the population mean is a parameter. A statistic describes a sample and may be derived from a single measurement or a set of measurements from the sample. For example, the sample mean is a statistic.

5.3 MEASURES OF CENTRAL TENDENCY AND DISPERSION

In quality engineering, we often have to deal with data. Data has to be interpreted properly. To know the behaviour of particular data sets, there are certain statistical measures. Two important measures are Central Tendency and Dispersion of Data. There are various measures of central tendency and dispersion of data, the most commonly used being as follows.

Mode

The mode is the most commonly occurring observation. It is possible for more than one mode to exist in the population.

Mid-range

The mid-range is the mid-point between the highest and the lowest observation.

Median

The median is the middle observation when all observations are arranged in order of magnitude. In other words, this is the value for which it can be said that half of the observations lie above the value and other half lie below. The median is less sensitive to extreme scores than the mean and this makes it a better measure than the mean for highly skewed distributions. The median income is usually more informative than the mean income, for example.

Arithmetic Mean

The arithmetic mean, also referred to as the mean, is the arithmetic average of all observations. For a sample of size n represented by $X_1, X_2, X_3, \ldots, X_n$, the sample mean is given by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

The sample mean provides an excellent estimate of the population mean except when the population is highly skewed, that is highly asymmetric.

Geometric Mean

It is the n-th root of the product of n given observations. The geometric mean g is given by

$$g = (X_1 X_2 X_3 \cdots X_n)^{1/n}$$ . . . (5.2)

This type of mean is suitable if the observations increase exponentially or in geometric progression. For example, for data 1,4,16,64,256, the arithmetic mean is 68.2, whereas the geometric mean is 16. We observe that arithmetic mean does not fall in the center. On the other hand, geometric mean is exactly at the center for this data. Hence, for this set of data, the geometric mean is an appropriate measure of central tendency.

Harmonic Mean

The harmonic mean H of n numbers $X_i$ (where i = 1, . . . , n) is

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$

Sometimes the harmonic mean gives a better measure of central tendency. For example, consider this problem. A truck goes from one city to another at a speed of 40 km/h and returns at a speed of 60 km/h. It is desired to find the average speed of the truck. This problem can be solved as follows :

Let the distance between the cities be S. Time taken in going from one city to the other is

$$t_1 = \frac{S}{40}$$

Time taken to return,

$$t_2 = \frac{S}{60}$$

$$\text{Average speed} = \frac{\text{Total distance}}{\text{Total time}} = \frac{2S}{t_1 + t_2} = \frac{2}{\frac{1}{40} + \frac{1}{60}} = 48 \text{ km/h}$$

This is the harmonic mean of the speeds 40 km/h and 60 km/h. The arithmetic mean is 50 km/h, which is an inferior measure to the harmonic mean in this case.
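The truck example can be checked numerically. The following is a minimal Python sketch using the standard `statistics` module; the variable names are illustrative and are not part of the original text.

```python
import statistics

# Outbound and return speeds (km/h) from the truck example.
speeds_kmh = [40, 60]

# Harmonic mean: suitable here because equal distances are covered at each speed.
average_speed = statistics.harmonic_mean(speeds_kmh)

# Arithmetic mean, shown only for comparison.
naive_average = statistics.mean(speeds_kmh)

print(f"harmonic mean   = {average_speed:.1f} km/h")
print(f"arithmetic mean = {naive_average:.1f} km/h")
```

The harmonic mean is the right choice because the truck spends more time travelling at the lower speed, so the lower speed carries more weight in the true average.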

In order to explain the various concepts in central tendency, another example is presented. The marks obtained by 10 students in an examination, arranged in ascending order, are as follows :

37, 38, 39, 41, 42, 43, 43, 44, 46, 47.

Values of various measures of central tendency are :

The mode is 43 because it appears twice.

The highest and lowest marks are 47 and 37 respectively. Therefore the mid-range is :

$$\frac{47 + 37}{2} = 42$$

There are two values lying in the middle, i.e. 42 and 43, because the values 37, 38, 39, 41 lie on the left hand side of 42 and the remaining values 43, 44, 46, 47 lie on the right hand side of 43. The median is given by the average of the two middle values, i.e. (42 + 43) / 2 = 42.5. If the total number of observations is odd, then there will be only one middle value, which itself is equal to the median.

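The sample size (n) is 10 and the sample arithmetic mean ($\bar{X}$) is calculated as:

$$\bar{X} = \frac{37 + 38 + 39 + 41 + 42 + 43 + 43 + 44 + 46 + 47}{10} = \frac{420}{10} = 42$$

Geometric mean g is calculated as

$$g = (37 \times 38 \times 39 \times 41 \times 42 \times 43 \times 43 \times 44 \times 46 \times 47)^{1/10} \approx 41.88$$

Harmonic mean H is calculated as

$$H = \frac{10}{\frac{1}{37} + \frac{1}{38} + \frac{1}{39} + \frac{1}{41} + \frac{1}{42} + \frac{1}{43} + \frac{1}{43} + \frac{1}{44} + \frac{1}{46} + \frac{1}{47}}$$

which gives H = 41.76.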

Since the sample mean (42) is close to the median (42.5), it can be inferred that the observations are distributed more or less symmetrically about the sample mean.
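The measures of central tendency for the marks example can be reproduced with a short script. This is only a sketch using Python's standard `statistics` module (the `geometric_mean` function needs Python 3.8 or later); the variable names are illustrative.

```python
import statistics

# Marks of the 10 students, as listed in ascending order in the text.
marks = [37, 38, 39, 41, 42, 43, 43, 44, 46, 47]

mode = statistics.mode(marks)                      # most frequent value
mid_range = (max(marks) + min(marks)) / 2          # mid-point of the extremes
median = statistics.median(marks)                  # average of the two middle values
mean = statistics.mean(marks)                      # arithmetic mean
geometric_mean = statistics.geometric_mean(marks)  # n-th root of the product
harmonic_mean = statistics.harmonic_mean(marks)    # n / sum of reciprocals

for name, value in [("mode", mode), ("mid-range", mid_range), ("median", median),
                    ("mean", mean), ("geometric mean", geometric_mean),
                    ("harmonic mean", harmonic_mean)]:
    print(f"{name:15s} {value:.2f}")
```

For any set of positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean; the output of the script follows this ordering.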

Measures of Dispersion

The measures of location do not bring out the entire characteristics of a dataset. For instance, two distributions may have equal means but differ with respect to the scatter about the mean. In one distribution, most of the values may be very close to the mean while in the other distribution they may be widely scattered. Assume that the surface roughness of a cylindrical component is measured at four different points. In one piece, the surface roughness values were found to be 1.97, 2.0, 2.03 and 2.04 µm. The arithmetic mean of these is 2.01 µm. Assume that for some other piece, the surface roughness values were found to be 1.75, 1.85, 2.15 and 2.29 µm. In this case also, the mean comes out to be 2.01 µm. However, in the second set there is a large variation among the data. No doubt, the quality of the second piece will be considered inferior to that of the first piece, even though the mean roughness value of both pieces is the same. Thus, apart from the mean, the dispersion of the data should also be seen. The most useful measure of dispersion is the standard deviation. In order to understand it, the concepts of degrees of freedom and variance are to be understood first.

Degrees of Freedom

The degrees of freedom are the number of observations that can be varied independently of each other without affecting the mean. The degrees of freedom are denoted by dof and calculated as dof = n - 1, where n is the sample size. In the case of marks obtained by students in an examination, there are 9 degrees of freedom.

Variance

The variance is the sum of squared deviations from the mean, divided by the degrees of freedom. The sample variance is denoted by $s^2$.

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Here, $X_i$ is the i-th member in a sample and n is the sample size. This sample variance is an excellent estimate for the population variance denoted by $\sigma^2$. The sample variance in the above example (marks obtained by students in an examination, with mean 42) is

$$s^2 = \frac{2^2 + (-4)^2 + 4^2 + (-3)^2 + 1^2 + (-5)^2 + 1^2 + 5^2 + (-1)^2 + 0^2}{9} = \frac{98}{9} = 10.89$$

Standard Deviation

The standard deviation is defined as the square root of variance.

$$s = \sqrt{s^2}$$

Therefore, in the above example the standard deviation is $\sqrt{10.89} = 3.30$.

Range

The range is the difference between the highest and lowest value in a sample. The range in marks obtained by 10 students in the exam is 47 - 37 = 10.
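The measures of dispersion can be computed directly from the data given in the text. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1 (the degrees of freedom), as in the definition above. The helper name `describe` is illustrative only.

```python
import statistics

marks = [37, 38, 39, 41, 42, 43, 43, 44, 46, 47]   # marks example
roughness_piece_1 = [1.97, 2.0, 2.03, 2.04]        # surface roughness, µm
roughness_piece_2 = [1.75, 1.85, 2.15, 2.29]       # surface roughness, µm

def describe(values):
    """Return (mean, sample variance, sample standard deviation, range)."""
    return (
        statistics.mean(values),
        statistics.variance(values),   # sum of squared deviations / (n - 1)
        statistics.stdev(values),      # square root of the sample variance
        max(values) - min(values),
    )

for label, data in [("marks", marks),
                    ("piece 1", roughness_piece_1),
                    ("piece 2", roughness_piece_2)]:
    mean, var, sd, rng = describe(data)
    print(f"{label:8s} mean={mean:.3f} variance={var:.4f} sd={sd:.4f} range={rng:.2f}")
```

Both roughness samples have the same mean, but the second has a much larger standard deviation, which is the point made in the surface roughness example.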

5.4 CONFIDENCE INTERVAL

Sample statistics can be used to estimate the population parameters. However, these statistics are only point estimators, in that they provide a single numerical estimate but no measure of its probable accuracy. Some idea of probable accuracy can be obtained by calculating interval estimators associated with a certain prescribed degree of confidence: the confidence intervals (CI). Confidence intervals can be constructed for any individual population parameter such as the mean or variance, or even for a combination of parameters from more than one population, such as a difference of means.

Suppose we wish to estimate an interval for a parameter (such as the population mean) from the sample statistics. Choose a probability $\gamma$ close to 1 (for example $\gamma$ = 0.95). Then determine two quantities $M_1$ and $M_2$ such that the probability that the interval from $M_1$ to $M_2$ includes the exact unknown value of the parameter is equal to $\gamma$. We can never be 100% sure about the population mean based on observing one or a few samples. Therefore, we use probability in estimating the interval. The values $M_1$ and $M_2$ are called the lower and upper confidence limits for the parameter. The number $\gamma$ is called the confidence level and is often expressed as a percentage. Commonly used confidence levels are 95% and 99%.

As an example, if we choose $\gamma$ = 95%, then we can expect that about 95% of the samples that we may obtain will yield confidence intervals that include the value of the parameter, whereas the remaining 5% do not. Hence, the statement "the confidence interval includes the parameter value" will be correct in about 19 out of 20 cases, while in the remaining case it will be false. The following method is adopted for finding the confidence interval for the mean of the normal distribution when the standard deviation is known:

Step 1

Choose a confidence level y.

Step 2

Determine the corresponding C from the following table

Table 5.1 : Confidence Level and Value of Statistic

Step 3

Compute the mean $\bar{X}$ of the sample.

Step 4

Compute $k = \frac{C \sigma}{\sqrt{n}}$, where $\sigma$ is the standard deviation of the distribution and n is the sample size. The confidence interval for the population mean $\mu$ is

$$\bar{X} - k \le \mu \le \bar{X} + k$$
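The four steps above can be sketched in Python. This is an illustration under stated assumptions rather than a definitive implementation: the function name `confidence_interval_mean` and the numerical inputs (sample mean 50, standard deviation 4, sample size 64) are hypothetical, and the value C is obtained from the standard normal distribution with `statistics.NormalDist` instead of being read from Table 5.1.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval_mean(sample_mean, sigma, n, gamma=0.95):
    """Confidence interval for the mean of a normal population with known sigma."""
    # Step 2: C is the standard normal value leaving (1 - gamma)/2 in each tail.
    c = NormalDist().inv_cdf((1 + gamma) / 2)
    # Step 4: half-width k = C * sigma / sqrt(n).
    k = c * sigma / sqrt(n)
    return sample_mean - k, sample_mean + k

# Hypothetical illustration: sample mean 50, sigma 4, n = 64, 95% confidence.
lower, upper = confidence_interval_mean(sample_mean=50.0, sigma=4.0, n=64, gamma=0.95)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")

# Values of C for some commonly used confidence levels.
for gamma in (0.90, 0.95, 0.99):
    print(gamma, round(NormalDist().inv_cdf((1 + gamma) / 2), 3))
```

A higher confidence level gives a larger C and therefore a wider interval, while a larger sample size n narrows it.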

5.5 TESTING OF HYPOTHESIS

Many a time, we strongly believe some result to be true. But after taking a sample, we notice that the data of the sample do not wholly support the result. The difference may be due to (i) the original belief being wrong or (ii) the sample being slightly one sided. Tests are, therefore, needed to distinguish between the two possibilities. These tests tell about the likely possibilities and reveal whether or not the difference can be due to chance elements. If the difference is not due to chance elements, it is significant and, therefore, these tests are called tests of significance. The whole procedure is known as Testing of Hypothesis.

A hypothesis is a statement supposed to be true till it is proved false. It may be based on previous experience or derived theoretically. First a statistician forms a research hypothesis. Then he derives a statement which is opposite to the research hypothesis (denoted as $H_0$). The approach here is to set up an assumption that there is no contradiction between the believed result and the sample result and that the difference can be ascribed solely to chance. Such a hypothesis is called a null hypothesis ($H_0$). For example, if the mean of the weights of 100 students in a college is 60 kg then we want to test the null hypothesis that the entire population (all college students) has a mean of 60 kg. This is written as $H_0 : \mu = 60$. This statement in effect gives rise to the following alternative hypotheses : $H_1 : \mu \ne 60$, $H_1 : \mu > 60$ and $H_1 : \mu < 60$.

The next step is putting the null hypothesis to test. Depending on the result of the test, there are two options :

(i) we do not reject the hypothesis $H_0 : \mu = 60$, or

(ii) we reject the hypothesis, i.e. we accept the alternative $H_1 : \mu \ne 60$

Once the null hypothesis is set up, the next job is to set limits within which we expect the sample mean to lie. If the sample mean does not cross these limits then the sample supports null hypothesis. If it crosses the limits, the sample does not support the hypothesis and it is rejected.

5.5.1 Test of Significance for Large Samples

Suppose we want to test whether a given sample of size n has been drawn from a population with mean $\mu$. In other words, we want to test whether the difference between the sample mean and the population mean is significant or not. The following example helps to illustrate the process of testing the significance of a mean.

The average mark obtained by a large number of boys in a spelling test is 62 with a standard deviation of 10. The same test was given to a group of 400 boys who scored an average of 60. Is the difference significant?

Step 1

Calculate a test statistic $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, where $\bar{X}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation and n is the sample size.

In this example, $|z| = \frac{|60 - 62|}{10 / \sqrt{400}} = \frac{2}{0.5} = 4$.

Step 2

Now set the null hypothesis $H_0$. Assume that the sample has been drawn from the population with mean $\mu = 62$. This implies an alternative hypothesis $\mu \ne 62$.

Step 3

Select the appropriate level of significance. Let $\alpha = 0.05$.

Step 4

Obtain the critical value of the statistic, i.e. $z_\alpha$, corresponding to the confidence level from Table 5.1. Since $\alpha = 0.05$, the confidence level is 0.95. Thus, $z_\alpha = 1.96$.

Step 5

Compare the computed value of the test statistic z with the critical value $z_\alpha$ at the given level of significance. If $|z| < z_\alpha$, we conclude that the difference is not significant. In other words, the difference $\bar{X} - \mu$ is due to fluctuations of sampling and the sample data do not exhibit sufficient evidence against the null hypothesis. Hence the null hypothesis is not rejected.

If $|z| > z_\alpha$, we conclude that the difference is significant and hence the null hypothesis is rejected at the $\alpha$ level of significance. In this example, $|z| = 4 > z_\alpha = 1.96$.

Hence, we reject the null hypothesis that the sample has been drawn from the population with mean 62.
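The five steps of the large-sample test can be sketched as follows. The function name `z_test_mean` is illustrative, and the critical value is computed with `statistics.NormalDist` for a two-sided test rather than read from Table 5.1; the inputs are those of the spelling-test example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, population_mean, sigma, n, alpha=0.05):
    """Two-sided large-sample test of a mean with known population sigma.

    Returns (z, critical value, True if the null hypothesis is rejected).
    """
    z = (sample_mean - population_mean) / (sigma / sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_critical, abs(z) > z_critical

# Spelling-test example: population mean 62, sigma 10, sample of 400 with mean 60.
z, z_critical, reject = z_test_mean(sample_mean=60, population_mean=62,
                                    sigma=10, n=400, alpha=0.05)
print(f"z = {z:.2f}, critical value = {z_critical:.2f}, reject H0: {reject}")
```

The sign of z only indicates the direction of the difference; the decision in a two-sided test depends on its absolute value.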

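The test of significance can also be carried out for the difference of means of two large samples. Let $\bar{X}_1$ be the mean of a sample of size $n_1$ from a population with mean $\mu_1$ and variance $\sigma_1^2$. Similarly, $\bar{X}_2$ and $\sigma_2^2$ are the mean and variance for a sample of size $n_2$ from another population with mean $\mu_2$. The null hypothesis to be tested is $H_0 : \mu_1 = \mu_2$.

Then the z statistic is given by

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$ . . . (5.7)

If $\sigma_1^2$ and $\sigma_2^2$ are unknown and $\sigma_1 \ne \sigma_2$, then

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where $s_1^2$ and $s_2^2$ are the sample variances.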

5.5.2 Test of Significance for Small Samples

When the sample size is small, the test of significance is based on Student's t-test.

Step 1

Calculate the sample variance $s^2$.

Step 2

Calculate the t-statistic $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$.

Step 3

Obtain $t_\alpha$ from a t-distribution table at the $\alpha$ level of significance. Table 5.2 provides $t_\alpha$ values for $\alpha = 0.05$ and $\alpha = 0.1$. Here, dof means degrees of freedom and is equal to n - 1.

Step 4

Compare the value $|t|$ with $t_\alpha$. If $|t| < t_\alpha$, the difference between the sample mean $\bar{X}$ and the population mean $\mu$ is not significant. If $|t| > t_\alpha$, the difference is significant.

For example, consider that the breaking strength of steel rods is specified as 17.5 units. To test this, a sample of 14 rods was tested and gave the following results : 15, 18, 16, 21, 19, 21, 17, 17, 15, 17, 20, 19, 17 and 18. We wish to know if the result of the experiment is significantly different from the specified value at the 95% confidence level for the average breaking strength.

We have $\mu = 17.5$,

$\alpha$ = 1 - confidence level = 1 - 0.95 = 0.05 and dof = n - 1 = 13

From the sample, $\bar{X} = 250 / 14 = 17.86$ and $s = 1.96$, so that $t = \frac{17.86 - 17.5}{1.96 / \sqrt{14}} = 0.68$.

For 13 degrees of freedom and $\alpha = 0.05$, $t_{0.05} = 2.16$.

Since $|t| \le t_{0.05}$, the difference is not significant. Hence the breaking strength of the sample of steel rods does not differ significantly from the specified average. The 95% confidence limits are

$$\bar{X} \pm t_{0.05} \frac{s}{\sqrt{n}} = 17.86 \pm 2.16 \times \frac{1.96}{\sqrt{14}} = 17.86 \pm 1.13$$

i.e. 16.73 and 18.99.

Table 5.2 : Significant Values $t_\alpha$ of t-distribution
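The steel-rod example can be worked through in a few lines of Python. This is a sketch only: the standard library has no t-distribution, so the critical value 2.16 (13 degrees of freedom, $\alpha$ = 0.05) is taken as quoted in the text, and the variable names are illustrative.

```python
from math import sqrt
import statistics

# Breaking strengths of the 14 rods from the example.
strengths = [15, 18, 16, 21, 19, 21, 17, 17, 15, 17, 20, 19, 17, 18]
specified_mean = 17.5
t_critical = 2.16   # quoted in the text for dof = 13 and alpha = 0.05

n = len(strengths)
sample_mean = statistics.mean(strengths)
s = statistics.stdev(strengths)          # sample standard deviation (n - 1 divisor)
standard_error = s / sqrt(n)

t = (sample_mean - specified_mean) / standard_error
significant = abs(t) > t_critical

# 95% confidence limits for the mean breaking strength.
lower = sample_mean - t_critical * standard_error
upper = sample_mean + t_critical * standard_error

print(f"mean = {sample_mean:.2f}, s = {s:.2f}, t = {t:.2f}")
print(f"significant difference: {significant}")
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

The specified value 17.5 lies inside the confidence limits, which is consistent with the conclusion that the difference is not significant.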

The test of significance for the difference of means of two small samples is carried out when it is necessary to ascertain whether the samples have been drawn from populations with the same mean. Suppose we have two independent samples $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ of sizes $n_1$ and $n_2$ which have been drawn from two populations with means $\mu_1$ and $\mu_2$ respectively. The population variances are equal and unknown.

Step 1

State the null hypothesis $H_0$ : $\bar{x}$ and $\bar{y}$ do not differ significantly.

Step 2

Compute the statistic

$$t = \frac{\bar{x} - \bar{y}}{S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } S^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

and $s_1^2$ and $s_2^2$ are the sample variances. Dof will be equal to $n_1 + n_2 - 2$.

Step 3

Compare the value $|t|$ with $t_\alpha$. If $|t| \le t_\alpha$, accept $H_0$ at the $\alpha$ level of significance.
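The pooled two-sample statistic in Step 2 can be sketched as follows. The function name `pooled_t_statistic` and the two small data sets are hypothetical, chosen only so that the arithmetic is easy to follow; the critical value must still be looked up in a t-table such as Table 5.2.

```python
from math import sqrt
import statistics

def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for two independent small samples,
    assuming equal but unknown population variances."""
    n1, n2 = len(x), len(y)
    s1_sq = statistics.variance(x)   # sample variance of the first sample
    s2_sq = statistics.variance(y)   # sample variance of the second sample
    pooled_variance = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / (
        sqrt(pooled_variance) * sqrt(1 / n1 + 1 / n2)
    )
    return t, n1 + n2 - 2

# Hypothetical data for illustration only.
sample_x = [1, 2, 3, 4, 5]
sample_y = [2, 3, 4, 5, 6]
t, dof = pooled_t_statistic(sample_x, sample_y)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

Here both sample variances are 2.5, so the pooled variance is also 2.5, and the means differ by 1.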

SAQ 1

(a) Determine a 95% confidence interval for the mean of a normal population with variance 9, using a sample of size 100 with mean 40.

(b) To compare the price of a product in two cities ten shops were selected at random in each town and the prices were noted below.

Test whether the average prices can be said to be the same in two cities.


5.6 PROBABILITY THEORY

A probability is a numerical statement about the likelihood that an event will occur. The likelihood of a die showing 4 in a single throw is 1 in 6. The probability of obtaining a head (or a tail) while tossing a coin is 50%. In tossing a coin the occurrence of a head excludes the occurrence of a tail. Events that cannot occur together in this way are called mutually exclusive. The two outcomes are not influenced by external causes and have equal chances of occurring. The events are called collectively exhaustive if they include all possible outcomes. Since obtaining a head and obtaining a tail represent every possible outcome, they are collectively exhaustive. Consider the events of drawing a 7 and drawing a heart from a pack of 52 cards. These are not mutually exclusive events because a card that is both a heart and a 7 can be picked. Also they are not collectively exhaustive because there are other cards in the pack besides 7's and hearts.

If a outcomes are favourable to an event E and the total number of outcomes that can occur is b, then the probability P of the happening of the event is defined as

$$P = \frac{\text{Number of favourable cases}}{\text{Total number of exhaustive cases}} = \frac{a}{b}$$ . . . (5.10)

Also, the probability of failure (Q), i.e. of the event not happening, is given by

$$Q = \frac{\text{Number of unfavourable cases}}{\text{Total number of exhaustive cases}} = \frac{b - a}{b} = 1 - \frac{a}{b} = 1 - P$$ . . . (5.11)

Thus, P + Q = 1

$$0 \le P \le 1, \quad 0 \le Q \le 1$$

From the above, the two basic statements about the mathematics of probability can be stated as :

(1) The sum of the simple probabilities of all possible outcomes of an activity must equal 1.

(2) The probability, P, of any event is greater than or equal to zero and less than or equal to 1.

These statements are illustrated with the event of rolling a die. It has six possible outcomes : 1, 2, 3, 4, 5 and 6. The probability of occurrence of number 1 in a throw is 1/6. Similarly, the probability of occurrence of number 2 is 1/6. Since all the outcomes are equally likely, they have the same probability, i.e. 1/6. The outcomes are also mutually exclusive and collectively exhaustive, so the sum total of all these probabilities is $\frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1$. Hence statement (1) is true. Also the individual probabilities lie between 0 and 1. Thus statement (2) is also true.

Additive Law of Probability

The probability of occurrence of any one of n mutually exclusive events $E_1, E_2, \ldots, E_n$ is equal to the sum of the probabilities of the occurrence of the separate events.

Consider the event of drawing a spade or drawing a club out of a pack of cards. These are mutually exclusive since drawing a spade does not involve drawing a club. Thus,

$$P(\text{spade}) = \frac{13}{52} \text{ and } P(\text{club}) = \frac{13}{52}$$

Hence,

$$P(\text{spade} + \text{club}) = P(\text{spade}) + P(\text{club}) = \frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2} = 50\%$$

If two events are not mutually exclusive then the probability of both events occurring together must be subtracted from the sum of the probabilities of the individual events, i.e.

$$P(E_1 \text{ or } E_2) = P(E_1) + P(E_2) - P(E_1 \text{ and } E_2)$$ . . . (5.13)

Consider the event of drawing a 4 or a diamond out of the pack of cards. These events are not mutually exclusive as there is a card that is both a 4 and a diamond. Thus,

$$P(4) = \frac{4}{52}, \quad P(\text{diamond}) = \frac{13}{52} \quad \text{and} \quad P(4 \text{ and diamond}) = \frac{1}{52}$$

Hence,

$$P(4 \text{ or diamond}) = P(4) + P(\text{diamond}) - P(4 \text{ and diamond}) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$
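The additive law can be verified by counting cards directly. The sketch below builds a 52-card pack and compares the result of equation (5.13) with a direct count; the helper name `probability` and the card labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spade", "club", "heart", "diamond"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def probability(event):
    """Favourable cases divided by total exhaustive cases, as in equation (5.10)."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

# Mutually exclusive events: the probabilities simply add.
p_spade_or_club = (probability(lambda c: c[1] == "spade")
                   + probability(lambda c: c[1] == "club"))

# Events that are not mutually exclusive: subtract the overlap, equation (5.13).
p_four = probability(lambda c: c[0] == "4")
p_diamond = probability(lambda c: c[1] == "diamond")
p_both = probability(lambda c: c[0] == "4" and c[1] == "diamond")
p_either_by_rule = p_four + p_diamond - p_both

# Direct count of cards that are a 4 or a diamond, for comparison.
p_either_by_count = probability(lambda c: c[0] == "4" or c[1] == "diamond")

print(p_spade_or_club, p_either_by_rule, p_either_by_count)
```

Using `Fraction` keeps the results exact, so the two ways of computing the probability can be compared without rounding error.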

Multiplicative Law of Probability

When the occurrence of one event affects the probability of occurrence of some other event, the events are said to be statistically dependent. The probability of the simultaneous happening of two events $E_1$ and $E_2$ is equal to the product of the probability of $E_1$ and the conditional probability of $E_2$ given that $E_1$ has already occurred.

$$P(E_1 E_2) = P(E_1) \, P(E_2 \mid E_1)$$ . . . (5.14)

In order to illustrate this concept consider the following problem.


Assume a bag containing 10 balls of following descriptions :

5 are grey (G) and lettered M.

1 is grey (G) and lettered K.

2 are black (B) and lettered M.

2 are black (B) and lettered K.

If a ball randomly drawn from the bag is black, find the probability that the ball is lettered K.

The probability of choosing a grey ball lettered M is $P(GM) = \frac{5}{10} = \frac{1}{2} = 0.5$.

The probability of choosing a grey ball lettered K is $P(GK) = \frac{1}{10} = 0.1$.

The probability of choosing a black ball lettered M is $P(BM) = \frac{2}{10} = \frac{1}{5} = 0.2$.

The probability of choosing a black ball lettered K is $P(BK) = \frac{2}{10} = \frac{1}{5} = 0.2$.

The probability of choosing a grey ball is P (G) = P (GM) + P (GK) = 0.5 + 0.1 = 0.6.

The probability of choosing a black ball is P (B) = P (BM) + P (BK) = 0.2 + 0.2 = 0.4.

The probability of choosing a ball lettered K is P (K) = P (GK) + P (BK) = 0.1 + 0.2 = 0.3.

The joint probability of a ball being lettered K and black in colour is also given as

$$P(BK) = P(B) \, P(K \mid B)$$

Hence,

$$P(K \mid B) = \frac{P(BK)}{P(B)} = \frac{0.2}{0.4} = 0.5$$

The term $P(K \mid B)$ gives the probability of choosing a ball lettered K given that it is black (B).

The term $P(B \mid K)$ gives the probability of choosing a black ball (B) given that it is lettered K.
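The ball example can be reproduced by counting. The following sketch stores the bag as a small dictionary; the helper name `prob` is illustrative. It also evaluates $P(B \mid K)$, which the text defines but does not compute.

```python
from fractions import Fraction

# Number of balls of each kind, keyed by (colour, letter), as described in the text.
bag = {("grey", "M"): 5, ("grey", "K"): 1, ("black", "M"): 2, ("black", "K"): 2}
total = sum(bag.values())

def prob(colour=None, letter=None):
    """Probability of drawing a ball matching the given colour and/or letter."""
    count = sum(
        n for (c, l), n in bag.items()
        if (colour is None or c == colour) and (letter is None or l == letter)
    )
    return Fraction(count, total)

p_black = prob(colour="black")                    # P(B)
p_k = prob(letter="K")                            # P(K)
p_black_and_k = prob(colour="black", letter="K")  # P(BK)

# Conditional probabilities from the multiplicative law, equation (5.14).
p_k_given_black = p_black_and_k / p_black
p_black_given_k = p_black_and_k / p_k

print(f"P(K|B) = {p_k_given_black}, P(B|K) = {p_black_given_k}")
```

The two conditional probabilities differ because they are taken relative to different groups: 4 black balls in one case and 3 balls lettered K in the other.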

Events are said to be statistically independent if the occurrence of one event does not affect the probability of occurrence of some other event. The joint probability of two independent events $E_1$ and $E_2$ occurring together is simply the product of the probabilities of each event occurring independently, i.e.

$$P(E_1 E_2) = P(E_1) \, P(E_2)$$ . . . (5.17)

For example, the probability of tossing a 6 on the first roll of a die and a 5 on the second roll is

$$P(\text{Tossing a 6}) \times P(\text{Tossing a 5}) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$

SAQ 2

(a) Find the probability that in the throwing of 2 fair dice, the sum of the faces exceeds 9, given that one of the faces is a 5.

(b) Two players play a game of tossing a coin. The player who gets a head wins. Calculate the probability that first player wins if

(i) the first player starts the game.

(ii) the second player starts the game.


5.7 PROBABILITY DISTRIBUTION

If an experiment is performed on a variable X and it takes the value a, then the probability of this event is denoted by P (X = a). Such a variable is termed random. Suppose the event is X ≤ a; the corresponding probability is denoted by P (X ≤ a). For the event X > a, the probability is P (X > a). These two events are mutually exclusive because if the event X > a occurs, the other cannot. Since one of the two must occur,

$$P(X \le a) + P(X > a) = 1$$

The random variable X is said to be discrete if it can take at most a countable number of values denoted by $x_i$, i = 1, 2, . . . , N. Here, N is a number, which can tend to infinity.

Let $x_1, x_2, \ldots$ be the values for which X has positive probabilities $p_1, p_2, \ldots$; then a function of X can be defined for each x as

$$f(x) = \begin{cases} p_j & \text{when } x = x_j, \; j = 1, 2, \ldots, N \\ 0 & \text{otherwise} \end{cases}$$

This is called the probability function of the random variable X. If the probability function is known, then the probability of any particular value can be determined. For any a and b (> a), the probability that a < X ≤ b is

$$P(a < X \le b) = \sum_{a < x_j \le b} p_j$$

Since P (X ≤ x) depends on the choice of x, it is a function of x, which is called the distribution function of X and is denoted by F (x).

The measures associated with a probability distribution are the expected value and the variance. The expected value is the weighted average of the values of the random variable, the weights being the probabilities. It is a measure of central tendency. The formula for calculating the expected value is :

$$E(x) = \sum_{i=1}^{n} x_i P(x_i)$$

The variance represents the variability in the distribution. It is calculated as:

$$\text{Variance} = \sum_{i=1}^{n} [x_i - E(x)]^2 P(x_i)$$ . . . (5.22)

where [xi - E(x)] is the difference between each value of the random variable and the expected value.
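The expected value and the variance of equation (5.22) can be computed for any discrete distribution given as a table of values and probabilities. The sketch below uses a fair die as a hypothetical illustration; the function names are not from the text.

```python
from fractions import Fraction

def expected_value(distribution):
    """Weighted average of the values, weighted by their probabilities."""
    return sum(x * p for x, p in distribution.items())

def variance(distribution):
    """Probability-weighted squared deviation from the expected value (5.22)."""
    mean = expected_value(distribution)
    return sum((x - mean) ** 2 * p for x, p in distribution.items())

# Hypothetical illustration: a fair die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

assert sum(fair_die.values()) == 1   # probabilities must add up to 1

print("E(x)     =", expected_value(fair_die))
print("Variance =", variance(fair_die))
```

Note that the expected value of a die throw is not itself a possible outcome; it is the long-run average over many throws.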

Any probability distribution is described in relation to its mean. The normal distribution, for instance, is symmetrical about its mean, although symmetry alone does not make a distribution normal. In the next three sub-sections, three important probability functions shall be introduced. These functions are used to describe a continuous random variable. Examples of continuous random variables are the time taken to finish a project, the highest temperature during a given day etc.

5.7.1 The Normal Distribution

The normal distribution is the most useful and popular probability function. It can be completely specified by two parameters: the mean ($\mu$) and the variance ($\sigma^2$).

The mean of a normal distribution locates the center of the function and can be any real number. The variance of a normal distribution measures the variability and can be any positive real number. The normal distribution is symmetrical about the mean as shown in Figure 5.2. The area under the curve between any two points represents the probability that the random variable will lie between these two points. Obviously, the total area under the curve is 1. The normal curve has the property that 68% of the population lies within $\pm \sigma$ limits, 95.4% within $\pm 2\sigma$ limits and 99.7% within $\pm 3\sigma$ limits.

Figure 5.2 : Area under Normal Curve

It is convenient to look up the probability values from a standard table. Table 5.3 is the standard normal table. To use the table we first convert the normal distribution to a standard normal distribution. A standard normal distribution is one that has a mean of 0 and a standard deviation of 1. Any normal distribution can be transformed to a standard normal distribution using the following transformation:

$$z = \frac{x - \mu}{\sigma}$$ . . . (5.24)

The probability may be found from Table 5.3. First learn to read this table. Suppose you want to find out the area under the normal curve to the left of z = 1.54. First go to the row corresponding to z =1.5 and in that row move to the column corresponding to 0.04. Value in that particular cell is 0.9382. Hence, the required area is 0.9382.

An Example Illustrating the Use of Table 5.3

For a normal distribution, $\mu = 100$ and $\sigma = 15$. It is required to find the probabilities P (X < 125) and P (X < 75). The probabilities are calculated as follows :

(i) For the random variable X less than 125,

$$ Z = \frac{125 - 100}{15} \approx 1.66 $$

(truncated to two decimal places for use with the table). From the tables, P (X < 125) = P (Z < 1.66) = 0.9515. This represents the shaded area under the normal curve as shown in Figure 5.3.

(ii) For the random variable X less than 75, i.e. P (X < 75),

$$ Z = \frac{75 - 100}{15} \approx -1.66 $$

The standard table provides values for positive Z only. Since the areas about the mean are symmetric, the probability P (X < 75) is equal to P (X > 125). From Table 5.3, P (X < 125) = 0.9515. Therefore,

P (X < 75) = P (X > 125) = 1 - P (X < 125) = 1 - 0.9515 = 0.0485. This represents the shaded area under the normal curve as shown in Figure 5.4.
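The table lookup in this example can be cross-checked in code. The sketch below assumes only Python's standard library. Note that the exact value of Z is 1.667, so the computed probability (about 0.952) differs slightly from the table value of 0.9515 read at Z = 1.66.

```python
from statistics import NormalDist

# Distribution from the worked example: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardising transformation of Eq. (5.24)."""
    return (x - mu) / sigma

z = z_score(125, 100, 15)            # exact value, about 1.667
p_below_125 = dist.cdf(125)          # P(X < 125) without truncating Z
p_below_75 = dist.cdf(75)            # P(X < 75), equal to 1 - P(X < 125) by symmetry
table_value = NormalDist().cdf(1.66) # what Table 5.3 gives at Z = 1.66

print(f"Z = {z:.3f}")
print(f"P(X < 125) = {p_below_125:.4f} (table at Z = 1.66: {table_value:.4f})")
print(f"P(X < 75)  = {p_below_75:.4f}")
```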

Figure 5.3 : Area under curve when X < 125 ($\mu$ = 100)

Figure 5.4 : Area under curve when X < 75 ($\mu$ = 100)

Table 5.3 : Area under the Normal Curve

5.7.2 The Exponential Distribution

The exponential distribution is defined by the probability function

$$ f(x) = \mu e^{-\mu x}, \qquad x \ge 0 $$

where $\mu$ is the rate parameter (the reciprocal of the mean), e is the base of the natural logarithm and x is the value of the random variable. The expected value and variance of an exponential distribution are $1/\mu$ and $1/\mu^2$ respectively. The exponential distribution is not symmetrical about the mean; 36.8 percent of the values lie above the average and 63.2 percent below the average. Examples of variables that are approximately exponentially distributed are :

The lifetime of light bulbs.

Service time in a queue at banks and post offices.

The time required to process any task at the counters of banks, post offices and railway ticket booking centers is called service time. The service time at these places fits an exponential distribution. Figure 5.5 illustrates that if service time follows an exponential distribution, the probability of a very long service time is low. For example, when the average service time is 20 minutes, a person standing in a queue will hardly ever need more than 115 minutes to receive that service. Even if the mean service time is longer (about 1 hour), the probability of spending more than 200 minutes is very low.

Figure 5.5 : Exponential Probability Distribution (horizontal axis: service time in minutes)
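The service-time statements above follow from the exponential tail probability P (T > t) = exp(-t / mean). A minimal sketch, in which the function name `prob_service_exceeds` is an illustrative assumption:

```python
import math

def prob_service_exceeds(t_minutes: float, mean_minutes: float) -> float:
    """P(T > t) for an exponentially distributed service time with the given mean."""
    return math.exp(-t_minutes / mean_minutes)

# Fraction of values above the mean: exp(-1), about 36.8 percent.
above_mean = prob_service_exceeds(20, 20)

# The two cases discussed in the text.
long_wait_20 = prob_service_exceeds(115, 20)   # mean 20 min, more than 115 min
long_wait_60 = prob_service_exceeds(200, 60)   # mean 60 min, more than 200 min

print(f"above the mean: {above_mean:.3f}")
print(f"mean 20 min, T > 115 min: {long_wait_20:.4f}")
print(f"mean 60 min, T > 200 min: {long_wait_60:.4f}")
```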

5.7.3 The Weibull Distribution

The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions, based on the value of the shape parameter, $\beta$. The three-parameter Weibull probability density function (pdf) is given by

$$ f(x) = \frac{\beta}{\alpha} \left( \frac{x - \gamma}{\alpha} \right)^{\beta - 1} \exp\left[ -\left( \frac{x - \gamma}{\alpha} \right)^{\beta} \right], \qquad x \ge \gamma $$

where $\alpha$ is the scale parameter, $\beta$ the shape parameter and $\gamma$ the location parameter. The location parameter can be any real number and is the smallest possible value of the variable. The shape and scale parameters are positive. One can obtain different shapes of the pdf curve depending on the values of the three parameters. When the shape parameter is 1, the Weibull function reduces to the exponential. For $\beta$ = 3.5, $\alpha$ = 1 and $\gamma$ = 0, the Weibull distribution closely approximates the normal distribution. Figure 5.6 shows the effect of the shape parameter on the shape of the Weibull pdf.

Figure 5.6 : Weibull Probability Distribution (Weibull pdf with 0 < $\beta$ < 1, $\beta$ = 1 and $\beta$ > 1)
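The effect of the shape parameter can be explored with a short sketch of the three-parameter pdf in its standard textbook form. The function name `weibull_pdf` and the sample points are illustrative assumptions. The last lines check the special case stated above: with shape parameter 1 and location 0, the pdf coincides with an exponential pdf whose mean equals the scale parameter.

```python
import math

def weibull_pdf(x: float, alpha: float, beta: float, gamma: float = 0.0) -> float:
    """Three-parameter Weibull pdf: alpha = scale, beta = shape, gamma = location."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("scale and shape parameters must be positive")
    if x < gamma:
        return 0.0                      # no probability below the location parameter
    z = (x - gamma) / alpha
    if z == 0.0:
        # Boundary x = gamma: the density diverges for beta < 1,
        # equals 1/alpha for beta = 1 and is 0 for beta > 1.
        if beta < 1:
            return math.inf
        return 1.0 / alpha if beta == 1 else 0.0
    return (beta / alpha) * z ** (beta - 1) * math.exp(-(z ** beta))

# Density at a few points for three shape parameters (scale 1, location 0).
for beta in (0.5, 1.0, 3.5):
    values = [round(weibull_pdf(x, 1.0, beta), 4) for x in (0.5, 1.0, 1.5)]
    print(f"beta = {beta}: {values}")

# Shape parameter 1 reduces the Weibull pdf to the exponential pdf.
exponential_check = abs(weibull_pdf(2.0, 1.0, 1.0) - math.exp(-2.0))
```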

5.8 ACCEPTANCE SAMPLING

The task of quality control involves identifying defectives in the product during various stages. If production has already taken place, then it is valuable to know the level of quality in a lot. A lot represents a batch containing a defined number of items. For example, when 1000 items of the same kind are ordered, the supplier may deliver them in 10 lots, each containing 100 items. The factory receiving these 10 lots must then inspect the quality of the delivered items. Inspection is necessary to identify the defective items. However, it might not be possible to inspect so many items individually because of time constraints, and inspection is non-productive work. In such a scenario, acceptance sampling provides a way of estimating the quality level of all items without individual inspection.

Acceptance sampling involves evaluating a portion of the product in a lot for the purpose of accepting or rejecting the entire lot as either conforming or not conforming to quality specifications. For example, while purchasing food grains, the usual procedure is to take a handful of the grains and inspect its quality on the basis of size, colour, odour and texture. If that handful of grains is found good, we decide to purchase the grains. However, if we are not satisfied, we inspect a handful of a different grain and repeat the procedure. The "handful of grains" represents the sample on which inspection is carried out to determine acceptability. The different types of grains represent the lots.

Simple random sampling is the most common method of selecting a sample. In this procedure, a sample is selected in such a way that each item in the lot has an equal chance of being selected. This requires that the sample size, i.e. the number of items selected for inspection, must be large. The method of sampling inspection is of two types.

Attribute sampling is used for items that can be classified simply as good or bad. Variable sampling is used for items whose quality can be determined only on the basis of actual measurement. This procedure makes use of gauges that can compare the dimension of a part with respect to a standard part.

5.8.1 Acceptance Sampling by Attributes

The statistical treatment used to decide the acceptance or rejection of lots is provided by a curve. This curve, called the Operating Characteristic (O.C.) curve, is a graph of the probability of acceptance against the fraction defective in a lot. This curve makes it possible to discriminate between a good and a bad lot. It is useful for the producer as well as the customer. This curve can be constructed by specifying the following variables:

N = lot size,

n = sample size,

c = number of defective items in the sample permitted,

PA = The probability of acceptance, and

p = The fraction or percent defective.

The variable c is called the acceptance number. Thus c = 2 means that if the number of defectives in the sample is 0 (no defective), 1 or 2, the lot is considered acceptable. If more than two defectives are found, the lot is rejected. The construction of an O.C. curve is illustrated with an example of a sampling plan.

Let N = 1000, n = 60, c = 3 and p = 0.01

The above sampling plan means that one should take a random sample of 60 from a lot of 1000 items. If the sample contains more than 3 defectives, reject the lot; otherwise accept it. The probability of acceptance (PA) is calculated using Poisson's formula :

$$ P_A(c) = \frac{(np)^c \, e^{-np}}{c!} \qquad \ldots (5.27) $$

The term n x p represents the expected number of defective items in the sample. The term c! represents the factorial of c, where c is any positive integer or 0.

Note that 0! = 1. The formula in Eq. (5.27) gives the probability of finding exactly c defectives in the sample; the probability of acceptance is the sum of these probabilities for 0, 1, ..., c defectives. For example, if the fraction defective is 0.01, the sample size is 60 and the value of c is 3, the probability of acceptance is calculated as follows :

The probability that the sample contains exactly 3 defective items, so that the lot is accepted, is

$$ P_A(3) = \frac{(60 \times 0.01)^3 \, e^{-(60 \times 0.01)}}{3!} = 0.0198 $$

Similarly, PA (2) = 0.0988, PA (1) = 0.3293 and PA (0) = 0.5488

Thus the total probability is

PA (3) + PA (2) + PA (1) + PA (0) = 0.0198 + 0.0988 + 0.3293 + 0.5488 = 0.9967
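The term-by-term calculation above can be reproduced with a short sketch. The function names `poisson_term` and `prob_accept` are illustrative assumptions. Summing unrounded terms gives a total that agrees with the hand calculation to about three decimal places.

```python
import math

def poisson_term(k: int, mean: float) -> float:
    """Poisson probability of exactly k defectives, as in Eq. (5.27)."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def prob_accept(n: int, p: float, c: int) -> float:
    """Probability of acceptance: at most c defectives in a sample of n."""
    mean = n * p                       # expected number of defectives in the sample
    return sum(poisson_term(k, mean) for k in range(c + 1))

# Plan from the text: n = 60, c = 3, fraction defective p = 0.01.
terms = [poisson_term(k, 60 * 0.01) for k in range(4)]
total = prob_accept(60, 0.01, 3)

for k, term in enumerate(terms):
    print(f"PA({k}) = {term:.4f}")
print(f"probability of acceptance = {total:.4f}")
```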

Table 5.4 shows the probability of acceptance for lots containing different percent defective items.

Table 5.4 : Calculating Probability of Acceptance by Poisson's Formula

If a graph is constructed with the X axis containing percent defective and the Y axis containing the probability of acceptance, the resulting curve is called an O.C. curve. The O.C. curve for a particular combination of n and c provides a plan that helps to differentiate between good and bad lots. For example, the upper curve in Figure 5.7 shows that if a sample of 60 is expected to contain 1.2 defective items (n x p = 1.2), the sampling plan would accept the lot 96.62% of the time and reject it about 3.38% of the time. If n x p exceeds 2.4, the probability of acceptance decreases drastically. The next two curves in Figure 5.7 show the effect of increasing the sample size. These O.C. curves are constructed for sample sizes of 100 and 150 at an acceptance number c = 3. It is observed that when the percent defective is 5.1, the plan with n = 60 accepts lots about 63.37 percent of the time, the plan with n = 100 about 25.12 percent of the time, and the plan with n = 150 about 5.35 percent of the time. Hence plans with larger sample sizes discriminate more sharply between good and bad lots. Reducing the acceptance number also makes the plan tighter. This is shown in Figure 5.8.

Figure 5.7 : Effect of Increasing Sample Size on O.C. Curve (horizontal axis: actual % defective in lot, n x p)

Figure 5.8 : Effect of Acceptance Number on O.C. Curve (horizontal axis: actual % defective in lot, n x p)
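Points on the O.C. curves discussed above can be generated with a small sketch based on Poisson's formula. The function name `prob_accept` and the choice of tabulated values are illustrative assumptions; plotting is left out so that only the standard library is needed.

```python
import math

def prob_accept(n: int, p: float, c: int) -> float:
    """Poisson probability of finding at most c defectives in a sample of n."""
    mean = n * p
    return sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(c + 1))

# Effect of sample size at 5.1 percent defective with acceptance number c = 3.
by_sample_size = {n: prob_accept(n, 0.051, 3) for n in (60, 100, 150)}

# Effect of acceptance number for n = 60 at the same percent defective.
by_acceptance_number = {c: prob_accept(60, 0.051, c) for c in (1, 2, 3)}

for n, pa in by_sample_size.items():
    print(f"n = {n:3d}, c = 3: accepted {100 * pa:.1f}% of the time")
for c, pa in by_acceptance_number.items():
    print(f"n = 60, c = {c}: accepted {100 * pa:.1f}% of the time")
```

Evaluating `prob_accept` over a range of p values for each plan gives the full curves of Figures 5.7 and 5.8.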

Producer's and Consumer's Risk

A producer always tries to produce products of high quality in the hope that such a lot is accepted by the customer. The acceptable quality level (AQL) is regarded as a good level of quality. AQL represents the maximum proportion of defectives which the consumer finds definitely acceptable. However, an ideal sampling plan that always discriminates perfectly between good and bad lots is rare. Hence, there always remains a risk that a good lot will not be accepted. The producer's risk (denoted by $\alpha$) is defined as the probability that lots of the good quality level AQL will not be accepted. For example, $\alpha$ = 0.05 means that in the long run about 1 in 20 lots will be rejected even though the lots are coming from a process controlled at the AQL level.

Lot tolerance percent defective (LTPD) is the dividing line between good and bad lots. Since lots of this quality level are regarded as poor, it is expected that the probability of accepting such a lot is low. However, if the consumer accepts such a lot, the entire loss has to be borne by him. The consumer's risk (denoted by $\beta$) is the probability that lots of quality level LTPD, which should be rejected, will be accepted. If $\beta$ = 0.1 and LTPD = 2.5%, it means that the consumer will, in the long run, accept about 10% of the lots containing 2.5% defectives.
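Both risks can be read off the O.C. curve of a given plan: the producer's risk is one minus the probability of acceptance at AQL, and the consumer's risk is the probability of acceptance at LTPD. The sketch below uses the plan n = 60, c = 3 from the earlier example; the AQL of 1% and the LTPD of 10% are assumed values chosen only for illustration and do not come from the text.

```python
import math

def prob_accept(n: int, p: float, c: int) -> float:
    """Poisson probability of finding at most c defectives in a sample of n."""
    mean = n * p
    return sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(c + 1))

N_SAMPLE, C_ACCEPT = 60, 3   # sampling plan from the earlier example
AQL = 0.01                   # assumed acceptable quality level (illustrative)
LTPD = 0.10                  # assumed lot tolerance percent defective (illustrative)

producers_risk = 1.0 - prob_accept(N_SAMPLE, AQL, C_ACCEPT)   # good lot rejected
consumers_risk = prob_accept(N_SAMPLE, LTPD, C_ACCEPT)        # poor lot accepted

print(f"producer's risk (alpha) = {producers_risk:.4f}")
print(f"consumer's risk (beta)  = {consumers_risk:.4f}")
```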

Average Outgoing Quality (AOQ)

A rejected lot is subjected to 100% inspection and all defectives found are replaced by good parts. This results in a lot free of any defectives. If a lot is accepted on the basis of the sample, there is a risk that some defectives have passed. The AOQ represents the average percent defective in the outgoing products after inspection. It can be calculated as follows :

$$ AOQ = \frac{P_A \, p \, (N - n)}{N} \qquad \ldots (5.29) $$

The number of uninspected items in the lot after a sample of n items has been passed is (N - n), and a fraction p of these is defective. Multiplying the resulting number of defectives, p (N - n), by the probability of acceptance gives the average number of defectives passed per lot, i.e. PA p (N - n). Hence, the AOQ is this number divided by the number of items in the lot, N.


Figure 5.9 : AOQ Curve for N =1000, n = 60, c = 3

The plot of AOQ versus percent defective is called the AOQ curve, as shown in Figure 5.9. The characteristic of the AOQ curve is that it shows the worst average outgoing quality that the plan can pass. This peak in the curve is called the average outgoing quality limit (AOQL). When good quality is presented to the plan, for example 0 to 5 percent defective, the probability of acceptance is very high, so most of the defectives that exist will pass. As the percent defective becomes worse than 5 percent, the probability of acceptance declines rapidly and the probability of 100% inspection increases, so more defectives are screened out. As a result, outgoing quality improves automatically as incoming quality worsens. In this case AOQ does not exceed 3.04%.
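The AOQ curve and its peak for the plan N = 1000, n = 60, c = 3 can be traced numerically. The sketch below assumes that AOQ is the probability of acceptance times the fraction defective times (N - n) / N, with the probability of acceptance taken from Poisson's formula; the grid of 0.1% steps and the function names are illustrative assumptions.

```python
import math

def prob_accept(n: int, p: float, c: int) -> float:
    """Poisson probability of finding at most c defectives in a sample of n."""
    mean = n * p
    return sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(c + 1))

def aoq_percent(N: int, n: int, c: int, p: float) -> float:
    """Average outgoing quality in percent for incoming fraction defective p."""
    return 100.0 * prob_accept(n, p, c) * p * (N - n) / N

N, n, c = 1000, 60, 3

# Incoming quality from 0.1% to 15.0% defective in steps of 0.1%.
curve = [(i / 10.0, aoq_percent(N, n, c, i / 1000.0)) for i in range(1, 151)]

# The AOQL is the highest point of the curve.
worst_incoming_pct, aoql = max(curve, key=lambda point: point[1])

print(f"AOQL is about {aoql:.2f}% at roughly {worst_incoming_pct:.1f}% incoming defective")
```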

Types of Sampling Plans

Single Sampling Plan

When a decision on acceptance or rejection of the lot is made on the basis of only one sample, the acceptance plan is known as a single sampling plan. In such a plan only three numbers are specified, i.e. lot size (N), sample size (n) and acceptance number (c). For example, N = 1000, n = 60 and c = 3 means that the sampling plan consists of a sample of 60 items from a lot of 1000. The lot is rejected if the sample contains more than 3 defectives; otherwise it is accepted.

Double Sampling Plan

In a double sampling plan the decision on acceptance or rejection of the lot is based on two samples. This type of decision becomes important when the sample taken for inspection is neither good nor bad enough. The procedure of a double sampling plan is illustrated through the following example.

Let N = 600,

n1 = 30, number of pieces in the first sample,

c1 = 1, acceptance number for the first sample,

n2 = 45, number of pieces in the second sample, and

c2 = 3, acceptance number for the two samples combined together.

This is interpreted as follows :

(i) Take a first sample of 30 items from a lot of 600 and inspect.

(ii) Accept the lot on the basis of first sample if it contains 0 or 1 defective.

(iii) Reject the lot on the basis of the first sample if it contains more than 3 defectives.

(iv) If the first sample contains 2 or 3 defectives, take a second sample of 45 items.

(v) Accept the lot on the basis of first and second sample combined if the combined sample of 75 items contains 3 or less defectives.

(vi) Reject the lot on the basis of combined sample if the combined sample of 75 items contains more than 3 defectives.

Thus this lot may be accepted in the following ways:

(a) 0 or 1 defective in the first sample without taking second sample

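In order to calculate the probability of acceptance of this lot, assume that the population is 5% defective.

Therefore, the number of defective items in the lot = 600 x 0.05 = 30, and the expected number of defectives in the first sample is 30 x 0.05 = 1.5.

The probability of zero defectives in the first sample is

$$ P(0) = \frac{(1.5)^0 \, e^{-1.5}}{0!} = 0.223 $$

The probability of one defective in the first sample is

$$ P(1) = \frac{(1.5)^1 \, e^{-1.5}}{1!} = 0.335 $$

Hence, the total probability of acceptance with 0 or 1 defective in the first sample, without taking a second sample, is 0.223 + 0.335 = 0.558.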

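(b) 2 defectives in the first sample followed by 0 or 1 defective in the second sample.

For the second sample, the remaining lot size is N = 600 - 30 = 570. Since 2 defectives are already in the first sample, the number of defectives left = 30 - 2 = 28. Hence the percent defective = (28 / 570) x 100 = 4.91%, and the expected number of defectives in the second sample is 45 x 0.0491 = 2.21. Hence, the probability is

$$ \frac{(1.5)^2 \, e^{-1.5}}{2!} \times e^{-2.21} \left( 1 + 2.21 \right) = 0.251 \times 0.352 = 0.0884 $$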

(c) 3 defectives in the first sample followed by 0 defective in the second sample.

Since 3 defectives are already in the first sample, the number of defectives left = 30 - 3 = 27. Hence the percent defective = (27 / 570) x 100 = 4.74%, and the expected number of defectives in the second sample is 45 x 0.0474 = 2.13. Hence, the probability is

$$ \frac{(1.5)^3 \, e^{-1.5}}{3!} \times e^{-2.13} = 0.1255 \times 0.1188 = 0.0149 $$

Thus the total probability of acceptance of this lot

= 0.558 + 0.0884 + 0.0149 = 0.661.
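The three acceptance routes (a), (b) and (c) can be combined in one short routine. The sketch below follows the method of the worked example: Poisson probabilities for each sample, with the fraction defective for the second sample recomputed from the items and defectives left in the lot. The function names are illustrative assumptions.

```python
import math

def poisson_term(k: int, mean: float) -> float:
    """Poisson probability of exactly k defectives."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def double_sampling_accept(N: int, defectives: int, n1: int, c1: int,
                           n2: int, c2: int) -> float:
    """Probability of accepting a lot under a double sampling plan."""
    mean1 = n1 * defectives / N
    # Route (a): accepted outright on the first sample.
    total = sum(poisson_term(d1, mean1) for d1 in range(c1 + 1))
    # Routes (b), (c), ...: a second sample is needed.
    for d1 in range(c1 + 1, c2 + 1):
        remaining_fraction = (defectives - d1) / (N - n1)
        mean2 = n2 * remaining_fraction
        allowed_in_second = c2 - d1          # keeps the combined count at or below c2
        p_second = sum(poisson_term(d2, mean2) for d2 in range(allowed_in_second + 1))
        total += poisson_term(d1, mean1) * p_second
    return total

# Plan from the text: N = 600, 5% defective (30 items), n1 = 30, c1 = 1, n2 = 45, c2 = 3.
p_accept = double_sampling_accept(600, 30, 30, 1, 45, 3)
print(f"probability of acceptance = {p_accept:.3f}")
```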

5.8.2 Acceptance Sampling by Variables

Acceptance sampling by variables involves making actual measurements instead of simply classifying items as good or bad. The measurement is assumed to be distributed normally about the mean. If the standard deviation is known and constant, the sampling plan is determined on the basis of the average value. When the sample average is less than the specified value $x_a$, the lot from which the samples were drawn shall be rejected. Lots for which the sample average is equal to or greater than $x_a$ will be accepted.

Consider the following example of accepting aluminum rods from a vendor. Lots of average tensile strength of 325 MPa or less are regarded as poor quality, and it is required that such lots are accepted with only a 10% chance. Lots of average tensile strength of 350 MPa or more are regarded as good quality, and it is required that such lots are accepted with a 95% chance. The quantities to be determined are the acceptance average ($x_a$) and the sample size (n).

Assume the standard deviation to be 30 MPa. Therefore, the standard deviation of the sampling distribution of means for samples of size n will be $30/\sqrt{n}$. For good lots to be accepted 95% of the time, AQL = 350 MPa must be 1.645 such standard deviations above the acceptance average ($x_a$) (see Table 5.3). Hence,

$$ 350 - x_a = 1.645 \times \frac{30}{\sqrt{n}} \qquad \ldots (5.30) $$

Also, to ensure that lots of average tensile strength = 325 MPa are accepted with only a 10% chance,

$$ x_a - 325 = 1.282 \times \frac{30}{\sqrt{n}} \qquad \ldots (5.31) $$

Solving Eqs. (5.30) and (5.31) simultaneously yields $x_a$ = 336 MPa and n = 12 (rounded off).

This means that if the average tensile strength of 12 pieces is 336 MPa or more, the lot should be accepted.
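The two simultaneous equations can be solved in a few lines. The sketch below assumes the form of Eqs. (5.30) and (5.31) described above, with the normal deviates taken from the standard library rather than from Table 5.3; the function name `variables_plan` is an illustrative assumption.

```python
from statistics import NormalDist

def variables_plan(good_mean: float, bad_mean: float, sigma: float,
                   accept_good: float, accept_bad: float) -> tuple[float, float]:
    """Return (acceptance average, unrounded sample size) for a known-sigma plan."""
    z_good = NormalDist().inv_cdf(accept_good)       # about 1.645 for 95%
    z_bad = NormalDist().inv_cdf(1.0 - accept_bad)   # about 1.282 for 10%
    # Adding the two equations eliminates the acceptance average:
    # good_mean - bad_mean = (z_good + z_bad) * sigma / sqrt(n)
    sqrt_n = (z_good + z_bad) * sigma / (good_mean - bad_mean)
    acceptance_average = bad_mean + z_bad * sigma / sqrt_n
    return acceptance_average, sqrt_n ** 2

x_a, n_exact = variables_plan(350.0, 325.0, 30.0, 0.95, 0.10)
print(f"acceptance average = {x_a:.1f} MPa, sample size = {n_exact:.2f} (rounded to {round(n_exact)})")
```

Because the sample size must be a whole number, rounding it means the two risk targets are met only approximately.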

SAQ 3

(a) Calculate the probability of acceptance of a lot containing 5% defective by a sampling plan with acceptance number 2 and sample size 100.

(b) If the size of the lot is 1000, calculate the average outgoing quality.


5.9 SUMMARY

In this unit, an introduction to statistics has been provided. There is a certain amount of uncertainty present in data. When one makes a decision about a population based on the study of a sample, one cannot be 100% sure about the decision. A confidence interval gives the range in which the estimate is expected to lie at a certain confidence level. Testing of hypothesis helps to assess whether the variation observed in a sample should be treated as mere chance variation. Probability theory and various types of probability distribution are also introduced in this unit. One immediate application of probability theory in quality engineering is acceptance sampling. In acceptance sampling, a lot is accepted or rejected based on the inspection of a sample. Other applications of statistics will be provided in the subsequent unit.

5.10 KEY WORDS

Acceptance Sampling : Acceptance sampling is a method of inspection, in which a lot is accepted or rejected based on the inspection of a sample.

Attribute : A quality or characteristic is called an attribute.

Central Tendency : Measures of central tendency are measures of the location of the middle or the center of a distribution.

Dispersion : It means variability.

Defectives : Defectives are items that fail to meet a required standard due to the presence of defects.

Defect : A lack of something necessary is called a defect.

Hypothesis : A hypothesis is a statement supposed to be true till it is proved false.

Large Sample : A sample size greater than 30 is called a large sample.

Population : Population is a set of all individuals of interest in a study.

Probability : It is the statement about the likelihood of an event occurring. It is expressed as a numerical value between 0 and 1, inclusive.

Probability Distribution Function : It is a mathematical function that describes a continuous probability distribution.

Small Samples : A sample size less than 26 is called a small sample.

Variables : Scalar measurements that are continuously variable, such as temperature, thickness, amount of leakage, weights, sales, etc.

5.11 ANSWERS TO SAQs

SAQ 1

(a) For 95% confidence level, C = 1.96

Hence, $k = \frac{C\sigma}{\sqrt{n}} = \frac{1.96 \times 6}{\sqrt{400}} = 0.588$

Hence, Lower limit = 40 - 0.588 = 39.412

Upper limit = 40 + 0.588 = 40.588

(b) The average price in city 1, $\bar{x}$ = 58.2

The average price in city 2, $\bar{y}$ = 56.0

The standard deviation in city 1, s1 = 5.789

The standard deviation in city 2, s2 = 4.9216

Here, n1 = n2 = 10

At $\alpha$ = 0.05 and for 18 dof, $t_{0.05}$ = 2.1

Since t is less than this, the difference is not significant. Hence, it can be said that the average prices are the same in both cities at 95% confidence.

SAQ 2

(a) The probability that one of the faces is 5 is the sum of the probabilities that (i) the first die face is 5 and the second die face is not 5, (ii) the first die face is not 5 and the second die face is 5, and (iii) both the faces are 5.

Denote this probability as P (A).

Outcomes in which the sum exceeds 9 are

(6,6), (6,5), (6,4), (5,6), (5,5), (4,6). Total outcomes in which the sum exceeds 9 = 6.

Total possible outcomes = 36.

Denote the probability of getting a throw in which the sum exceeds 9 as P (B).

Conditional probability $P(B/A) = \frac{P(B)}{P(A)} = \frac{6}{11}$.

(b) In each toss, the probability of getting a head is 1/2.

(i) When the first player starts the game, he can win in the following ways :

Wins in his first throw; the probability is 1/2.

Wins in his second throw; the corresponding probability is (1/2) x (1/2) x (1/2) = 1/8.

Wins in his third throw; the probability is 1/32.

And so on.

Hence, the probability of his winning

$$ = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots = \frac{1/2}{1 - 1/4} = \frac{2}{3} $$

(ii) Do this part on your own. Answer is 1/3.

SAQ 3

(a) Given that p = 0.05, n = 100 and c = 2, the probability that the sample contains 0 defectives is

$$ P_A(0) = \frac{(100 \times 0.05)^0 \, e^{-(100 \times 0.05)}}{0!} = e^{-5} \qquad \text{(from Eq. (5.27))} $$

Similarly, the probabilities that the sample contains 1 defective and 2 defectives are

$$ P_A(1) = \frac{(100 \times 0.05)^1 \, e^{-(100 \times 0.05)}}{1!} = 5e^{-5} \quad \text{and} \quad P_A(2) = \frac{(100 \times 0.05)^2 \, e^{-(100 \times 0.05)}}{2!} = 12.5e^{-5} $$

Hence the probability of acceptance is PA (0) + PA (1) + PA (2) = $18.5e^{-5}$ = 0.1247.

(b) The AOQ is obtained from Eq. (5.29):

$$ AOQ = \frac{0.1247 \times 0.05 \times (1000 - 100)}{1000} = 0.0056 $$

Thus, the average percent defective in the outgoing products after inspection is about 0.56%.
