Page 1: D2 Basic Stat

1

Basic Statistics

1. Introduction to Statistics

2. Probability distributions

- Binomial distribution

- Poisson Distribution

- Normal distribution

3. Sampling distributions and Estimation.

Page 2: D2 Basic Stat

2

1. The concept of Statistics

Why Statistics?

Through the advancement of electronics and computers, today’s society is inundated with vast amounts of data. In its raw form, this data is of little use. But, with statistical analysis, the data can be transformed into valuable information. This knowledge is vital for drawing conclusions and making decisions.

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”

H.G. Wells

Page 3: D2 Basic Stat

3

The field of statistics can be broken into two major areas: Descriptive statistics and Inferential statistics

• Descriptive statistics: It describes some of the fundamental features of a set of data (Population or Sample) such as mean, median, standard deviation, ...

• Inferential statistics: It deals with drawing conclusions about a population based on information from a sample (drawn from the population).

Page 4: D2 Basic Stat

4

Probability and Statistics

[Diagram relating Population, Sample, Probability, Descriptive Statistics, and Inferential Statistics]

Page 5: D2 Basic Stat

5

Data Collection

A decision can be no better than the data upon which it was based.

Why do we need to collect data?

1. To identify and/or verify a problem

2. To analyze a problem

3. To understand, describe, or monitor a process

4. To test a hypothesis

5. To find a relationship between inputs and outputs of a process

Page 6: D2 Basic Stat

6

Two kinds of Numerical Data

• Continuous data: length, height, volume, ...

• Discrete data: number of defects, number of failures, ...

Page 7: D2 Basic Stat

7

Population and Sample

• Population is a set or collection of all possible objects or individuals of interest.

• Finite population: for example, the employees of Samsung Electro-Mechanics as of January 1, 2001.

• Infinite population: for example, MLCC chips coming from the production line.

Page 8: D2 Basic Stat

8

Population and Sample

• A Sample is any subset or subcollection of a population.

• A Random Sample of size n is a sample chosen in such a way that every possible sample of size n has an equal chance of being chosen (unbiased).

The true population parameters are rarely known, so conclusions have to be drawn from sample statistics.

Page 9: D2 Basic Stat

9

Characteristics of distribution

Statistical analysis detects the characteristics of a data distribution and expresses those characteristics as numbers.

Central tendency (mean, median, mode) - It shows the location where the data is centered.

Variation (range, variance, standard deviation) - Degree of data scattering around the arithmetic mean

Shape - In what direction is the data skewed?

Page 10: D2 Basic Stat

10

Central tendency

Mode

Most frequently occurring value in a data set.

Median

Number reflecting the 50% rank of a set of values.

1) Odd number of data points: the middle value

2) Even number of data points: (sum of the two middle values)/2

Mean (arithmetic mean)

Population mean

$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X_i}{N}$

Sample mean

$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X_i}{n}$
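The three measures of central tendency can be sketched in a few lines of Python. The data values here are hypothetical and only serve to illustrate the definitions; they are not taken from the slides.

```python
import statistics

# Hypothetical data set (an even number of values, so the median
# is the average of the two middle values).
data = [2, 3, 3, 5, 7, 8]

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # (3 + 5) / 2 for this data
mean = statistics.mean(data)      # sum of the values divided by their count

print("mode:", mode)
print("median:", median)
print("mean:", round(mean, 3))
```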

Page 11: D2 Basic Stat

11

Variability

Range

Numerical distance between the highest and the lowest values in a data set.

Variance and Standard deviation

Population variance and population standard deviation

$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$, $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$

Sample variance and sample standard deviation

$S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$, $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$

The arithmetic mean is in the same units as the data, while the variance is in squared units. We get the standard deviation by taking the square root of the variance. In sample statistics, however, the variance loses 1 degree of freedom, so the divisor for a sample is n-1.
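The difference between the divisors N and n-1 can be seen in a short Python sketch. The data values are hypothetical; `statistics.pvariance` uses the divisor N and `statistics.variance` uses n-1.

```python
import math
import statistics

# Hypothetical data set with mean 7; squared deviations sum to 20.
data = [4.0, 6.0, 8.0, 10.0]

data_range = max(data) - min(data)       # highest minus lowest value

pop_var = statistics.pvariance(data)     # divisor N   (population)
sample_var = statistics.variance(data)   # divisor n-1 (sample)

pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print("range:", data_range)
print("population variance / st. dev.:", pop_var, round(pop_sd, 4))
print("sample variance / st. dev.:", round(sample_var, 4), round(sample_sd, 4))
```

The sample variance is always the larger of the two for the same data, because the same sum of squares is divided by a smaller number.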

Page 12: D2 Basic Stat

12

Comparison of symbols between parameter and statistics

Value                      Population parameter     Sample statistic
Number of items            $N$                      $n$
Mean                       $\mu$                    $\bar{X}$
Variance                   $\sigma^2$               $s^2$
St. dev.                   $\sigma$                 $s$
Correlation coefficient    $\rho$                   $r$
Regression coefficient     $\alpha$, $\beta$        $a$, $b$
Error                      $\varepsilon$            $e$

Page 13: D2 Basic Stat

13

2. Probability Distribution

The Probability Distribution of a discrete random variable is an assignment of probabilities to each of the possible values that the random variable can take on. Its mathematical model is the probability mass function (for a continuous random variable, the probability density function).

It is the major pillar of the bridge that allows us to make inferences about a population based on information obtained from a sample.

Page 14: D2 Basic Stat

14

(1) Binomial distribution

It addresses the problem of determining the probability of finding a given number of defective items.

A Binomial Distribution needs to satisfy the following conditions:

1) A sequence of n Bernoulli trials (only two possible outcomes).

2) Trials are identical.

3) Trials are independent.

4) Probability of success on every trial is the same.

Page 15: D2 Basic Stat

15

Example

<Problem>

In a certain diode manufacturing process, the defective rate is known to be 1%. When the inspector takes a random sample of 50 every hour, what is the probability of finding no more than 1 defective?

<Solution>

The solution can be obtained by adding the probability of finding none and the probability of finding one.

First, we find the probability of finding no defectives.

Page 16: D2 Basic Stat

16

From Minitab menu

Calc>Probability Distributions>Binomial

This is the place where all the probability distributions can be found!

Page 17: D2 Basic Stat

17

Probability of finding no defectives

[Minitab dialog: enter the number of random samples (50), the defective rate (0.01), and 0 for no defectives]

Page 18: D2 Basic Stat

18

Result in Session window

[Session window output, annotated with the defective rate of 1% and the number of random samples]

Probability of no defective is 0.6050.

Page 19: D2 Basic Stat

19

Next, the probability of one defective

In this case, we enter 1 instead of 0.

Result is 0.3056

Total Probability: 0.6050 + 0.3056 = 0.9106
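The Minitab result can be cross-checked outside Minitab. The following is a minimal Python sketch that applies the binomial formula P(X=x) = nCx p^x (1-p)^(n-x) to this example; the helper name `binomial_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 50    # sample size taken every hour
p = 0.01  # defective rate of 1%

p0 = binomial_pmf(0, n, p)  # probability of no defective
p1 = binomial_pmf(1, n, p)  # probability of exactly one defective
total = p0 + p1             # probability of no more than one defective

print(f"P(X=0)  = {p0:.4f}")
print(f"P(X=1)  = {p1:.4f}")
print(f"P(X<=1) = {total:.4f}")
```

Rounded to four decimals, the three values agree with the Session window results of 0.6050, 0.3056, and 0.9106.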

Page 20: D2 Basic Stat

20

Another way to calculate, using a worksheet.

Prepare the following worksheet.

Input the number of defects in C1 (named x)

Prepare a column for the probability (named p)

Page 21: D2 Basic Stat

21

From Minitab Menu Calc>Probability Distribution>Binomial

We use this

Page 22: D2 Basic Stat

22

Probability of no defective

Probability of one defective

The final answer is the sum of the two probabilities.

Page 23: D2 Basic Stat

23

To find the cumulative probability in one step

Check "Cumulative Probability" here!

Page 24: D2 Basic Stat

24

Understanding of Binomial Distribution

The binomial probability distribution is defined by

$P(X=x) = {}_nC_x \, p^x (1-p)^{n-x}$

${}_nC_x = \binom{n}{x} = \frac{n!}{x!(n-x)!}$

The Binomial distribution is used frequently in quality control. It is the appropriate probability model for sampling from an infinitely large population, where p represents the defective rate and x the number of defectives in a sample of size n.

The control chart for defectives is based on the Binomial distribution, with the mean and variance given on the next page.

Page 25: D2 Basic Stat

25

The property of binomial distribution

[Figure: two bar charts of P(X) against x: Binomial distribution for n=4, p=1/2 (x = 0 to 4) and Binomial distribution for n=9, p=1/3 (x = 0 to 9)]

Form of binomial distribution

1) The probability distribution is always symmetric when p=0.5, even if n is small.

2) As n increases, the probability distribution approaches symmetry even when p is not 0.5.

Expectation value, standard deviation, variance of binomial distribution

Expectation value : $\mu = E(X) = np$

Variance : $\sigma^2 = Var(X) = np(1-p) = npq$

Standard deviation : $\sigma = \sqrt{np(1-p)} = \sqrt{npq}$
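As a rough check of these formulas, the mean and variance can also be computed directly from the probabilities and compared with np and npq. This Python sketch uses the two parameter pairs named on this page (n=4, p=1/2 and n=9, p=1/3); the function names are illustrative.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def moments_from_pmf(n, p):
    """Mean and variance computed by summing over x = 0..n."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return mean, var

for n, p in [(4, 1 / 2), (9, 1 / 3)]:
    mean, var = moments_from_pmf(n, p)
    # Compare the summed values with the closed forms np and np(1-p).
    print(n, round(p, 3), round(mean, 6), round(var, 6),
          isclose(mean, n * p), isclose(var, n * p * (1 - p)))
```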

Page 26: D2 Basic Stat

26

(2) Poisson distribution

The Poisson distribution describes quantities of the form

“the number of occurrences per unit interval”

Occurrences: defect, electrical or mechanical failure, arrival, call, ...

Unit interval: time, space, area, ...

Page 27: D2 Basic Stat

27

Example

<Problem>

Suppose that the number of wire-bonding defects per unit that occur in a semiconductor device is Poisson distributed with mean=4. What is the probability that a randomly selected semiconductor device will contain two or fewer wire-bonding defects?

Page 28: D2 Basic Stat

28

From Minitab menu File>New>Minitab Worksheet

In the worksheet, make one column for the number of defects (x) and another column for the cumulative probability (p).

Page 29: D2 Basic Stat

29

Calc>Probability Distribution>Poisson

1. Select Cumulative

2. Mean=4

3. Input the defect number column and the output column

Page 30: D2 Basic Stat

30

Probability of no defect

Cumulative Probability of 0,1

Cumulative Probability of 0, 1, 2
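The same cumulative probabilities can be reproduced with a minimal Python sketch of the Poisson formula P(X=x) = e^(-m) m^x / x!, using the mean of 4 from the problem. The helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """P(X = x) for a Poisson distribution with mean m."""
    return exp(-m) * m**x / factorial(x)

def poisson_cdf(x, m):
    """P(X <= x): sum of the probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(k, m) for k in range(x + 1))

mean_defects = 4  # mean number of wire-bonding defects per unit

for x in range(3):
    print(f"P(X<={x}) = {poisson_cdf(x, mean_defects):.4f}")
```

The last line printed, P(X<=2), is about 0.2381 and answers the question of two or fewer defects.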

Page 31: D2 Basic Stat

31

Examples for Poisson Distribution

1. The number of speeding tickets issued in a certain county per week

2. The number of disk drive failures per month for a particular kind of disk drive

3. The number of calls arriving at an emergency dispatch station per hour.

4. The number of flaws per square yard in a certain type of fabric.

Page 32: D2 Basic Stat

32

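Relationship with RTY

$P(X=x) = \frac{e^{-m} m^x}{x!}$

m : average

x : number of occurrences

When x=0 (no defects), with m = dpu (defects per unit), the probability is the rolled throughput yield (RTY):

$RTY = e^{-dpu}$

$dpu = -\ln(RTY)$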
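The two conversions between RTY and dpu are inverses of each other, which a short Python sketch makes concrete. The dpu value of 0.1 is a hypothetical input chosen for illustration.

```python
from math import exp, log

def rty_from_dpu(dpu):
    """Probability of zero defects (x = 0) when defects are Poisson with mean dpu."""
    return exp(-dpu)

def dpu_from_rty(rty):
    """Invert the relationship: dpu = -ln(RTY)."""
    return -log(rty)

dpu = 0.1  # hypothetical defects per unit
rty = rty_from_dpu(dpu)

print(f"RTY for dpu={dpu}: {rty:.4f}")
print(f"dpu recovered from RTY: {dpu_from_rty(rty):.4f}")
```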

Page 33: D2 Basic Stat

33

(3) Normal distribution

The normal distribution is probably the most important distribution in quality control and statistical analysis.

$X \sim N(\mu, \sigma^2)$

X: variable, N: normal distribution, $\mu$: mean, $\sigma$: standard deviation

The normal distribution is defined by its mean and standard deviation.

Page 34: D2 Basic Stat

34

The shape of normal distribution?

[Figure: normal curve over -4 to 4 sigma, with 68.3% of the area within ±1σ, 95.5% within ±2σ, and 99.73% within ±3σ]

Symmetric

Unimodal

Bell-shaped

Page 35: D2 Basic Stat

35

What is Sigma?

[Figure: normal curve over -4 to 4 sigma, with 68.3% of the area within ±1σ, 95.5% within ±2σ, and 99.73% within ±3σ]

The distance from the mean to the inflection point.

68.3% of the population values fall between the limits defined by the mean plus and minus one sigma.
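The 68.3%, 95.5%, and 99.73% figures can be reproduced from the standard normal distribution. The sketch below uses the identity P(|Z| < k) = erf(k/√2), which holds for any normal distribution once distances are measured in sigmas; the function name `coverage` is illustrative.

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a normal population within mean ± k sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {100 * coverage(k):.2f}%")
```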

Page 36: D2 Basic Stat

36

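Probability density function

The probability density function of the normal distribution is defined by

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$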

Page 37: D2 Basic Stat

37

Shapes of Normal curve

[Figure: pairs of normal curves for different $\mu$ and $\sigma$]

- $\mu_1 \neq \mu_2$, $\sigma_1 = \sigma_2$

- $\mu_1 = \mu_2$, $\sigma_1 \neq \sigma_2$

- $\mu_1 \neq \mu_2$, $\sigma_1 \neq \sigma_2$

Page 38: D2 Basic Stat

38

Standard Normal Distribution

The transformation

$Z = \frac{X - \mu}{\sigma}$

is used for the coordinate transformation. Z then follows the normal distribution with mean=0 and standard deviation=1, $N(0, 1^2)$.

[Figure: standard normal curve $N(0, 1^2)$ over z = -4 to 4, with 68.3% of the area within ±1, 95.5% within ±2, and 99.73% within ±3]

Page 39: D2 Basic Stat

39

Minitab application

Calc>Probability distribution>Normal

- Find the area (probability) with a known x

- Find x with a known probability

Minitab treats the left-sided area as the cumulative probability.

Page 40: D2 Basic Stat

40

Normal distribution Example 1

<Problem> The tensile strength of a certain product is an important quality characteristic. It is known that the strength is normally distributed with mean=40 and standard deviation of 2, denoted as $N(40, 2^2)$.

When the customer wants a strength of at least 35, what is the probability of customer satisfaction?

Page 41: D2 Basic Stat

41

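Solution

[Figure: normal curve $N(40, 2^2)$ with mean 40, standard deviation 2, and the known spec. at 35. The question asks for the area above 35; the Minitab solution provides the area below 35.]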

Page 42: D2 Basic Stat

42

Check here

Mean is 40

St. deviation is 2

X is 35

Calc>Probability Distribution>Normal

Page 43: D2 Basic Stat

43

The area we want (probability) is 1 - 0.0062 = 0.9938
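For readers without Minitab, the same calculation can be sketched with Python's standard library. `statistics.NormalDist` (Python 3.8+) returns the left-sided cumulative probability, just as Minitab does, so the wanted area is one minus that value.

```python
from statistics import NormalDist

strength = NormalDist(mu=40, sigma=2)  # N(40, 2^2) from the problem
spec = 35                              # customer requires at least 35

z = (spec - strength.mean) / strength.stdev  # standardized value of the spec
p_below = strength.cdf(spec)                 # left-sided area, P(X < 35)
p_satisfied = 1 - p_below                    # area we want, P(X >= 35)

print(f"z = {z}")
print(f"P(X < 35)  = {p_below:.4f}")
print(f"P(X >= 35) = {p_satisfied:.4f}")
```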

Page 44: D2 Basic Stat

44

Example 2

It is known that the quality characteristic of a certain process follows the normal distribution (mean=0, st.dev.=1). When the defective rate is 1%, what is the sigma level?

<Solution> The problem is to find the value of z when the cumulative probability is known. In Minitab, the inverse cumulative probability is used.

Page 45: D2 Basic Stat

45

Check here

Input 1-0.01=0.99

Page 46: D2 Basic Stat

46

Z is 2.33
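The inverse cumulative probability is also available in Python's standard library as `NormalDist.inv_cdf` (Python 3.8+). A minimal sketch for this example:

```python
from statistics import NormalDist

defective_rate = 0.01
standard_normal = NormalDist(mu=0, sigma=1)

# Find z such that the left-sided area is 1 - 0.01 = 0.99.
z = standard_normal.inv_cdf(1 - defective_rate)

print(f"z = {z:.2f}")
```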

Page 47: D2 Basic Stat

47

3. Sampling Distributions and Estimation

Question:

When we do not know the mean of the population, we use the sample mean instead. But how accurately does it represent the population mean?

Page 48: D2 Basic Stat

48

Standard Error of the Mean

Mean of the sample mean

$\mu_{\bar{X}} = \mu$

Variance of the sample mean

$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$

Standard error of the mean

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Page 49: D2 Basic Stat

49

Central Limit Theorem

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

For almost all populations, the sampling distribution of the mean can be approximated closely by a normal distribution, provided the sample size is sufficiently large.
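A small simulation can make the standard error and the Central Limit Theorem more tangible. The sketch below is only an illustration under assumed settings: the population is exponential with mean 1 and standard deviation 1 (deliberately skewed), the sample size is 30, and 5000 samples are drawn. None of these choices come from the slides.

```python
import math
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable

n = 30              # assumed sample size
num_samples = 5000  # assumed number of repeated samples

# Draw repeated samples from a skewed population (exponential, mean 1, sd 1)
# and record the mean of each sample.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means : {mean_of_means:.3f} (population mean is 1)")
print(f"st. dev. of the means: {sd_of_means:.3f}")
print(f"sigma / sqrt(n)      : {theoretical_se:.3f}")
```

The mean of the sample means should land close to the population mean, and their spread close to sigma/√n, even though the population itself is far from normal.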

Page 50: D2 Basic Stat

50

Estimation

Estimate population parameters from a sample

1) Point Estimation: estimate with a single number

2) Interval Estimation: estimate with a confidence interval

Page 51: D2 Basic Stat

51

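Confidence interval for population mean.

1) Known standard deviation: use Normal distribution

[Figure: standard normal curve centered at 0, with tail areas $\alpha/2 = 0.025$ beyond $-Z_{0.025} = -1.96$ and $Z_{0.025} = 1.96$]

Values of $Z_{\alpha/2}$ and $-Z_{\alpha/2}$ when $\alpha = 0.05$, that is, confidence level: 95%

$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$

$P(L < \mu < U) = 1 - \alpha$

Solving this for $\mu$ gives the $100(1-\alpha)\%$ confidence interval for $\mu$:

$\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$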

Page 52: D2 Basic Stat

52

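2) Unknown standard deviation: use t-distribution

Values of $t_{\alpha/2}$ and $-t_{\alpha/2}$ when $\alpha = 0.05$, that is, confidence level: 95%

$P\left(-t_{\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2}\right) = 1 - \alpha$

$P(L < \mu < U) = 1 - \alpha$

Solving this for $\mu$ gives the $100(1-\alpha)\%$ confidence interval for $\mu$:

$\bar{X} - t_{\alpha/2}\,S/\sqrt{n} < \mu < \bar{X} + t_{\alpha/2}\,S/\sqrt{n}$

Note: every t value above means $t_{\alpha/2,\,n-1}$, the t-distribution with n-1 degrees of freedom.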

Page 53: D2 Basic Stat

53

Example

1. A random sample of 64 customers at a local supermarket showed that their average shopping time was 33 minutes with a sample standard deviation of 16 minutes. Find a 90% confidence interval for the true average shopping time.

2. A test on a random sample of 9 cigarettes yielded an average nicotine content of 15.6 milligrams and a standard deviation of 2.1 milligrams. Construct a 99% confidence interval for the true but unknown average nicotine content of this particular brand of cigarette. Assume that nicotine content is normally distributed.
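One possible way to work the two exercises is sketched below in Python. It rests on two labeled assumptions: for Example 1 the sample is large (n=64), so the normal critical value is used as an approximation to the t value; for Example 2 the critical value t(0.005, 8) = 3.355 is hard-coded from a standard t-table because the Python standard library has no t-distribution. The function name `mean_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(xbar, s, n, critical):
    """Interval xbar ± critical * s / sqrt(n) for the population mean."""
    half_width = critical * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: n = 64, mean 33 min, s = 16 min, 90% confidence.
# Assumption: large sample, so Z(alpha/2) approximates t(alpha/2, 63).
z_90 = NormalDist().inv_cdf(1 - 0.10 / 2)
ci_shopping = mean_interval(33, 16, 64, z_90)

# Example 2: n = 9, mean 15.6 mg, s = 2.1 mg, 99% confidence.
# Assumption: t(0.005, 8) = 3.355 is taken from a standard t-table.
T_005_8 = 3.355
ci_nicotine = mean_interval(15.6, 2.1, 9, T_005_8)

print("Example 1 (90%%): %.2f to %.2f minutes" % ci_shopping)
print("Example 2 (99%%): %.2f to %.2f milligrams" % ci_nicotine)
```

Using the exact t value with 63 degrees of freedom in Example 1 would widen the interval slightly.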