
St 205_lecturer Notes

Oct 16, 2014

Page 1: St 205_lecturer Notes

ST 205: STATISTICAL INFERENCE I (Teaching Notes)

Introduction: Statistics is a science – the science of inference. Data, summarized or otherwise, are used in the inference along with tools of probability theory and inductive or deductive reasoning.

Definition Statistical inference comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.

Also, statistical inference means making a probability judgment concerning a population on the basis of one or more samples.

There are two subdivisions within statistics: Descriptive statistics and inferential statistics.

Descriptive statistics simply summarize the given data, bringing out their important features; no attempt is made to infer anything that pertains to more than the data themselves. E.g. In the financial year 2008/2009, seventy one (71) out of 133 district councils obtained a clean financial audit, i.e. 53.4%.

Inferential statistics uses a number of quantitative techniques that enable us to make appropriate generalizations from limited observations. E.g. Suppose the department of mathematics and statistics wants to establish a masters programme in the academic year 2011/2012. To meet the market demand for statisticians, the department will conduct a survey in various institutions to explore which skills are lacking, so that the masters curriculum can focus on them.

Note: Statistical inference relies on the theory of probability, regardless of whether we are dealing with point or interval estimation, tests of hypotheses or correlation.

By Josephat Peter - UDOM


A numerical measure of a population is called a population parameter, or simply parameter.

A numerical measure of the sample is called a sample statistic, or simply a statistic.

Population parameters are estimated by sample statistics. When a sample statistic is used to estimate a population parameter, the statistic is called an estimator of the parameter.

Statistical inference is the subject of this course.

Two important problems in statistical inference are estimation and tests of hypotheses.

Topic 1: ESTIMATION

Assume that some characteristic of the elements in a population can be represented by a random variable X whose density is f(x; θ), where the form of the density is assumed known except that it contains an unknown parameter θ (if θ were known, the density function would be completely specified, and there would be no need to make inferences about it).

Furthermore, assume that the values x₁, …, xₙ of a random sample X₁, …, Xₙ from f(x; θ) can be observed. On the basis of the observed sample values it is desired to estimate the value of the unknown parameter θ or the value of some function, say τ(θ), of the unknown parameter. This estimation can be made in two ways (types of estimates): point and interval estimation.

1.1 POINT ESTIMATION

Definition
- Point estimation lets the value of some statistic represent or estimate the unknown parameter.
- A point estimate is a single number which is used to estimate an unknown population parameter.

e.g. The sample mean, X̄, is the sample statistic used as an estimator of the population mean, μ. This estimate is a point estimate because it constitutes a single number.


Although it is a common way of expressing an estimate, a point estimate suffers from a limitation: it fails to indicate how close it is to the quantity it is supposed to estimate, i.e. it lacks reliability and precision.

For instance, if someone claims that 40% of all second year B.Sc. statistics students (91) do not appreciate the Statistical Inference lecturer, the claim would not be very helpful if it is based on a small number of students, say 5. However, it becomes more meaningful and reliable as the number of students sampled increases.

This implies that a point estimate should always be accompanied by some relevant information so that it is possible to judge how reliable it is.

Other problems with point estimation are: first, carefully planning some means of obtaining a statistic to use as an estimator; and second, selecting criteria and techniques to define and find a best estimator among the many possible estimators.

Methods of finding estimators

Assume that X₁, …, Xₙ is a random sample from a density f(x; θ), where the form of the density is known but the parameter θ is unknown. Further assume that θ is a vector of real numbers, say that θ₁, …, θₖ are k parameters. We will let Θ, called the parameter space, denote the set of possible values that the parameter θ can assume. The object is to find the statistics to be used as estimators of certain functions of θ.

There are several methods of finding point estimators, but for our case we are going to study only three: method of moments, maximum likelihood and least square methods. Probably the most important is the method of maximum likelihood.

A statistic that is used to obtain a point estimate is called an estimator.

The word estimator stands for the function, and the word estimate stands for a value of that function.

e.g. X̄ is an estimator of the mean μ, and x̄ is an estimate of μ.


Notation in estimation that has widespread usage: θ̂ is used to denote an estimate of θ.

Method of Moments

In mechanics, moment is used to denote the rotating effect of a force. In statistics, it is used to indicate peculiarities of a frequency distribution. With moments we can measure central tendency, dispersion or variability, skewness and the peakedness of the curve.

The moments about the actual arithmetic mean are:

First moment: m₁ = (1/n) Σ (xᵢ − x̄) = 0

Second moment: m₂ = (1/n) Σ (xᵢ − x̄)² = VARIANCE

Third moment: m₃ = (1/n) Σ (xᵢ − x̄)³ (measures SKEWNESS)

Fourth moment: m₄ = (1/n) Σ (xᵢ − x̄)⁴ (measures KURTOSIS)

Let f(·; θ₁, …, θₖ) be a density of a random variable X which has k parameters θ₁, …, θₖ. Let μ′ᵣ denote the rth moment about 0; that is, μ′ᵣ = E[X^r]. In general μ′ᵣ will be a known function of the k parameters θ₁, …, θₖ. This can be denoted by writing μ′ᵣ = μ′ᵣ(θ₁, …, θₖ). Let X₁, …, Xₙ be a random sample from the density, and let M′ⱼ be the jth sample moment;

i.e. M′ⱼ = (1/n) Σᵢ Xᵢ^j

Form the k equations M′ⱼ = μ′ⱼ(θ₁, …, θₖ), j = 1, …, k, in k variables θ₁, …, θₖ,

and let θ̂₁, …, θ̂ₖ be their solution (we assume that there is a unique solution). We say that the estimator (θ̂₁, …, θ̂ₖ), where θ̂ⱼ estimates θⱼ, is the estimator of (θ₁, …, θₖ) obtained by the method of moments. The estimators were obtained by replacing population moments by sample moments.

For simplicity it can be defined that:
- Population moment. Let X follow a specific population distribution. The k-th moment of the population distribution with pdf f(x) is: E[X^k] = ∫ x^k f(x) dx.


- Sample moment. Let X₁, …, Xₙ be a random sample from a pdf f(x). The k-th sample moment is: M′ₖ = (1/n) Σᵢ Xᵢ^k.

First moment: M′₁ = (1/n) Σᵢ Xᵢ = X̄ (these are moments about zero)

Second moment: M′₂ = (1/n) Σᵢ Xᵢ², etc.

Sample moments can be used to estimate population moments
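The sample moments above can be computed directly; the following is a minimal sketch, with made-up data values used purely for illustration.

```python
def sample_moment(xs, k):
    """k-th sample moment about zero: (1/n) * sum(x_i ** k)."""
    return sum(x ** k for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations
m1 = sample_moment(data, 1)    # first sample moment = sample mean
m2 = sample_moment(data, 2)    # second sample moment
print(m1, m2)  # 2.5 7.5
```

Note that the first sample moment is simply the sample mean X̄.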

Example 1

Let X₁, …, Xₙ be a random sample from a normal distribution with mean μ and variance σ². Let (θ₁, θ₂) = (μ, σ²). Estimate the parameters μ and σ² by the method of moments.

Solution

Recall E[X] = μ and E[X²] = σ² + μ².

The method of moments estimator of μ is X̄:

X̄ = μ̂ (treat μ as a first moment, then equate the first moment formula with M′₁)

(1/n) Σ Xᵢ² = σ̂² + μ̂² (treat E[X²] as a second moment, then equate the second moment formula with the formula for calculating variance)

The method of moments estimator of σ² is

σ̂² = (1/n) Σ Xᵢ² − X̄² = (1/n) Σ (Xᵢ − X̄)²
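A short numeric sketch of Example 1, using a hypothetical sample, checking that the second-moment identity gives the divide-by-n variance:

```python
def method_of_moments_normal(xs):
    n = len(xs)
    m1 = sum(xs) / n                    # first sample moment
    m2 = sum(x * x for x in xs) / n     # second sample moment
    mu_hat = m1                         # equate m1 = mu
    sigma2_hat = m2 - m1 ** 2           # equate m2 = sigma^2 + mu^2
    return mu_hat, sigma2_hat

xs = [4.0, 6.0, 5.0, 7.0, 3.0]          # hypothetical sample
mu_hat, sigma2_hat = method_of_moments_normal(xs)
# sigma2_hat equals (1/n) * sum((x - mu_hat)**2), the divide-by-n variance
print(mu_hat, sigma2_hat)  # 5.0 2.0
```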


Note: the method of moments estimator of σ² is (1/n) Σ (Xᵢ − X̄)²; it is not the sample variance S² = (1/(n − 1)) Σ (Xᵢ − X̄)².

Example 2

Solution

and

To get

and

NOTE: Method of moments estimators are not uniquely defined. So far we have been using the first k raw moments, but central moments could also be used to obtain equations whose solution would also produce estimators that would be labeled method of moments estimators. Also, moments other than the first k could be used to obtain estimators.

Exercise

1. Let X₁, …, Xₙ be a random sample from a uniform distribution on (θ₁, θ₂). Use the method of moments to estimate the parameters θ₁ and θ₂.

2. Let X₁, …, Xₙ be a random sample from a Poisson distribution with parameter λ. Estimate λ.

Maximum Likelihood

This technique of finding estimators was first used and developed in 1922 by Sir R. A. Fisher, who called it the maximum likelihood method.

The maximum likelihood method provides estimators with desirable properties such as efficiency, consistency and sufficiency. It does not, however, always give unbiased estimators.


Example 1: Suppose we want to estimate the average grade on an LG university examination. A random sample of size n is taken and the sample mean is found to be 64 marks.

Clarification:
- The assumption is that the sample represents the population.
- From which population did the sample most probably come? A population with μ = 64, or one with another mean such as 75?

Note: The population mean either is 64 or it is not; it has only one value.
- Hence, the term likely is used instead of probably.

Example 2: Suppose that an urn contains a number of black and a number of white balls, and suppose that it is known that the ratio of the numbers is 3:1, but it is not known whether the black or the white balls are more frequent.

Explanation:

- The probability of drawing a black ball is either 3/4 or 1/4.

- If n balls are drawn with replacement from the urn, the distribution of X, the number of black balls, is given by the binomial distribution

f(x; p) = C(n, x) p^x q^(n−x), for x = 0, 1, …, n

where q = 1 − p and p is the probability of drawing a black ball. Here p = 3/4 or p = 1/4.

- We draw a sample of three balls, x₁, x₂, x₃, with replacement and attempt to estimate the unknown parameter p of the distribution.

- The choice is to be made between only two numbers, 3/4 and 1/4. The possible outcomes and their probabilities are as follows:

Outcome: x        0      1      2      3
f(x; 3/4)       1/64   9/64  27/64  27/64
f(x; 1/4)      27/64  27/64   9/64   1/64


For example, if x = 2 black balls were found in a sample of 3, the estimate p̂ = 3/4 would be preferred over 1/4 because the probability 27/64 is greater than 9/64.

Generally we should estimate p by 1/4 when x = 0 or 1, and by 3/4 when x = 2 or 3.

The estimator may be defined as

p̂ = p̂(x) = 1/4 for x = 0, 1 and p̂(x) = 3/4 for x = 2, 3.

The estimator thus selects for every possible x the value of p, say p̂, such that f(x; p̂) > f(x; p′), where p′ is an alternative value of p.
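The urn example can be checked numerically; a minimal sketch that, for each possible x in n = 3 draws, picks whichever of p = 3/4 and p = 1/4 gives the larger binomial probability:

```python
from math import comb

def binom_pmf(x, n, p):
    # binomial probability C(n, x) * p**x * (1 - p)**(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def pick_p(x, n=3):
    # choose between p = 3/4 and p = 1/4 the value with the larger likelihood
    return max((0.75, 0.25), key=lambda p: binom_pmf(x, n, p))

for x in range(4):
    print(x, pick_p(x))  # 0 -> 0.25, 1 -> 0.25, 2 -> 0.75, 3 -> 0.75
```

This reproduces the rule above: p̂ = 1/4 for x = 0, 1 and p̂ = 3/4 for x = 2, 3.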

Furthermore: if several alternative values of p were possible, we might reasonably proceed in the same manner. Thus if x black balls were found in a sample of 25 from a binomial population, we should substitute all possible values of p in the expression

f(x; p) = C(25, x) p^x (1 − p)^(25−x), for 0 < p < 1,

and choose as our estimate that value of p which maximizes f(x; p).

The maximum value can be found by equating the first derivative (with respect to p) to zero,

i.e. C(25, x) p^(x−1) (1 − p)^(24−x) [x(1 − p) − (25 − x)p] = 0,

and we find that p = 0, p = 1 and p = x/25 are the roots.

The root which gives the maximum value is p̂ = x/25.

The estimate has the property f(x; p̂) ≥ f(x; p′), where p′ is an alternative value of p in the interval 0 ≤ p ≤ 1.
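A grid-search sketch of the n = 25 case; the observed count x = 10 is a hypothetical value chosen for illustration, and the grid maximum should land on x/25:

```python
from math import comb

def likelihood(p, x, n=25):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

x = 10                                      # hypothetical observed count
grid = [i / 1000 for i in range(1, 1000)]   # candidate p values in (0, 1)
p_hat = max(grid, key=lambda p: likelihood(p, x))
print(p_hat)  # 0.4, i.e. x/25
```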

Definition of Likelihood Function (likelihood = chance = probability)

The likelihood function of n random variables X₁, …, Xₙ is defined to be the joint density of the n random variables, say f(x₁, …, xₙ; θ), which


is considered to be a function of θ. In particular, if X₁, …, Xₙ is a random sample from the density f(x; θ), then the likelihood function is

f(x₁; θ) f(x₂; θ) ··· f(xₙ; θ).

Notation for the likelihood function is L(θ; x₁, …, xₙ).

- The likelihood function gives the likelihood that the random variables assume the particular values x₁, …, xₙ.

- The likelihood is the value of a density function; so for discrete random variables it is a probability.

- We want to find the value of θ which maximizes the likelihood function.

Definition of Maximum-likelihood estimator:

Let L(θ) = L(θ; x₁, …, xₙ) be the likelihood function for the random variables X₁, …, Xₙ. If θ̂ [where θ̂ = ϑ(x₁, …, xₙ) is a function of the observations] is the value of θ in Θ which maximizes L(θ), then Θ̂ = ϑ(X₁, …, Xₙ) is the maximum-likelihood estimator of θ, and θ̂ = ϑ(x₁, …, xₙ) is the maximum-likelihood estimate of θ for the sample x₁, …, xₙ.

- X₁, …, Xₙ is a random sample from some density f(x; θ), so that the likelihood function is L(θ) = f(x₁; θ) f(x₂; θ) ··· f(xₙ; θ).

Many likelihood functions satisfy regularity conditions, so the maximum likelihood estimator is the solution of the equation dL(θ)/dθ = 0.

Also, L(θ) and ln L(θ) have their maxima at the same value of θ, and it is sometimes easier to find the maximum of the natural logarithm of the likelihood.

If the likelihood function contains k parameters, i.e.

L(θ₁, …, θₖ) = ∏ᵢ f(xᵢ; θ₁, …, θₖ),


then the maximum-likelihood estimators of the parameters θ₁, …, θₖ are the random variables Θ̂₁ = ϑ₁(X₁, …, Xₙ), …, Θ̂ₖ = ϑₖ(X₁, …, Xₙ), where θ̂₁, …, θ̂ₖ are the values in Θ which maximize L.

If certain regularity conditions are satisfied, the point where the likelihood is a maximum is a solution of the k equations

∂L(θ₁, …, θₖ)/∂θ₁ = 0
∂L(θ₁, …, θₖ)/∂θ₂ = 0
⋮
∂L(θ₁, …, θₖ)/∂θₖ = 0

In this case it may also be easier to work with the natural logarithm of the likelihood.

Example

Suppose that a random sample of size n is drawn from the Bernoulli distribution f(x; p) = p^x (1 − p)^(1−x), x = 0, 1, and 0 ≤ p ≤ 1. The sample values x₁, …, xₙ will be a sequence of 0s and 1s.

The likelihood function is

L(p) = ∏ᵢ p^(xᵢ) (1 − p)^(1−xᵢ) = p^(Σxᵢ) (1 − p)^(n−Σxᵢ)

By applying ln we get:

ln L(p) = (Σxᵢ) ln p + (n − Σxᵢ) ln(1 − p)

First derivative:

d ln L(p)/dp = (Σxᵢ)/p − (n − Σxᵢ)/(1 − p)

By setting the derivative equal to zero and substituting Σxᵢ = nx̄,


We find the estimate as: p̂ = (1/n) Σᵢ xᵢ = x̄

Consider that n = 3; the likelihood function can be represented by the following four curves, one for each value of Σxᵢ = 0, 1, 2, 3.

[Figure: four likelihood curves L(p) plotted against p over 0 ≤ p ≤ 1]
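A numeric sketch of the Bernoulli example: the log-likelihood ln L(p) = (Σx) ln p + (n − Σx) ln(1 − p) is maximized at p̂ = Σx/n, verified here by a grid search over a hypothetical 0/1 sample:

```python
from math import log

xs = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical 0/1 sample

def log_lik(p):
    # ln L(p) = (sum x) ln p + (n - sum x) ln(1 - p)
    s, n = sum(xs), len(xs)
    return s * log(p) + (n - s) * log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]
p_hat_grid = max(grid, key=log_lik)
p_hat_closed = sum(xs) / len(xs)    # closed form: the sample mean
print(p_hat_grid, p_hat_closed)  # 0.625 0.625
```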

Example

A random sample of size n from the normal distribution has the likelihood

L(μ, σ²) = ∏ᵢ (2πσ²)^(−1/2) exp(−(xᵢ − μ)²/(2σ²)) = (2πσ²)^(−n/2) exp(−Σᵢ (xᵢ − μ)²/(2σ²))

Taking ln:

ln L = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σᵢ (xᵢ − μ)², where the parameters are θ₁ = μ and θ₂ = σ².

To locate the maximum, compute the first derivatives:

∂ ln L/∂μ = (1/σ²) Σᵢ (xᵢ − μ) and ∂ ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σᵢ (xᵢ − μ)²

Equating the equations to 0, we get μ̂ = x̄ and σ̂² = (1/n) Σᵢ (xᵢ − x̄)²
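The closed-form normal maximum likelihood estimates can be sketched directly; the sample values below are hypothetical:

```python
def normal_mle(xs):
    n = len(xs)
    mu_hat = sum(xs) / n
    # MLE of the variance uses divisor n, not n - 1
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

mu_hat, s2 = normal_mle([2.0, 4.0, 4.0, 6.0])
print(mu_hat, s2)  # 4.0 2.0
```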


NOTE:
- One must not always rely on the differentiation process to locate the maximum.
- Differentiation locates both minima and maxima, and hence one must avoid using a root of the equation which actually locates a minimum. For example, consider a likelihood whose actual maximum is at a boundary point of the parameter space; the derivative set equal to 0 would locate an interior stationary point as the maximum.

- The maximum likelihood estimator has some desirable optimum properties beyond its intuitive appeal.

- Maximum likelihood estimators possess a property which is sometimes called the invariance property of maximum likelihood estimators.

Theorem: Invariance property of maximum likelihood estimators

Let θ̂ be the maximum likelihood estimator of θ in the density f(x; θ), where θ is assumed unidimensional. If τ(θ) is a function with a single-valued inverse, then the maximum likelihood estimator of τ(θ) is τ(θ̂).


For example, in the normal density with σ² known, the maximum likelihood estimator of μ is X̄.

By the invariance property of maximum likelihood estimators, the maximum likelihood estimator of a one-to-one function τ(μ) is τ(X̄); similarly for any other such function of μ.

Extension of invariance property of maximum likelihood estimators

Extension is done in two ways:
1. First, θ is taken as k-dimensional rather than unidimensional.
2. The assumption that τ has a single-valued inverse is removed.

Consider the following simple examples:
- Suppose an estimate of the variance, namely p(1 − p), of a Bernoulli distribution is desired. We know an estimate of p is p̂ = x̄, but since p(1 − p) is not a one-to-one function of p, the theorem of invariance does not give the maximum likelihood estimator of p(1 − p). The theorem below will give such an estimate as p̂(1 − p̂).
- Also suppose an estimate of some function τ(μ, σ²) is desired, where τ is not a one-to-one function of μ and σ², and we know the estimates μ̂ and σ̂². Then the estimate will be τ(μ̂, σ̂²).

Theorem

Let θ̂ = (θ̂₁, …, θ̂ₖ), where θ̂ⱼ = ϑⱼ(x₁, …, xₙ), be a maximum likelihood estimator of θ = (θ₁, …, θₖ) in the density f(x; θ₁, …, θₖ). If τ(θ) = (τ₁(θ), …, τᵣ(θ)), for 1 ≤ r ≤ k, is a transformation of the parameter space Θ, then a maximum likelihood estimator of τ(θ) is τ(θ̂) = (τ₁(θ̂), …, τᵣ(θ̂)). Note that the Bernoulli variance p(1 − p) is such a transformation, so the maximum likelihood estimator of p(1 − p) is p̂(1 − p̂) = x̄(1 − x̄).

Exercise 2


1. Uniform distribution

Least Squares

Regression refers to the statistical technique of modeling the relationship between variables.

Consider the following simple linear regression:

[Figure: scatter of data points (x, y) with a fitted regression line running through them]

- The points on the graph are randomly chosen observations of the two variables, X and Y.

- The straight line describes the general movement in the data


We would like our model to explain as much as possible about the process underlying our data. However, due to the uncertainty inherent in all real-world situations, our model will probably not explain everything, and we will always have some remaining errors. The errors are due to unknown outside factors that affect the process generating our data.

A good statistical model uses as few mathematical terms as possible to describe the real situation. The model captures the systematic behavior of the data, leaving out the factors that are nonsystematic and cannot be foreseen or predicted – the errors.

Data = Systematic component + Random errors

The model extracts everything systematic in the data, leaving purely random errors.

The errors, denoted by ε, constitute the random component in the model.

How do we deal with errors?
- This is where probability theory comes in.
- The random errors are probably due to a large number of minor factors that we cannot trace.
- We assume the random errors, ε, are normally distributed.
- If we have a properly constructed model, the resulting observed errors will have an average of zero (although few, if any, will actually equal zero).
- The errors should be independent of each other.

Note: The assumption of a normal distribution of the errors is not absolutely necessary in the regression model; rather, the assumption is made so that we can carry out statistical hypothesis tests using the F and t distributions.

The necessary assumptions are that the errors have mean zero and a constant variance, and are uncorrelated.

Consider the simple linear regression model: Y = β₀ + β₁X + ε


- We estimate the model parameters from the random sample of data we have.

- Next we consider the observed errors resulting from the fit of the model to the data. These observed errors are called residuals; they represent the information in the data not explained by the model.

- If the residuals are found to contain some nonrandom, systematic component, we reevaluate our proposed model and, if possible, adjust it to incorporate the systematic component found in the residuals, or discard the model and try another. If the residuals are found to contain only randomness, then the model is used.

The population simple linear regression model:

Y = β₀ + β₁X + ε ………………………………………… (1)

where Y is the dependent variable, X is the independent variable (predictor), and ε is the error term.

The model contains two parameters: β₀ = population intercept and β₁ = population slope.

Equation 1 above is composed of two components: a nonrandom component, which is the line itself, and a purely random component – the error term.

The nonrandom part of the model, the straight line, is the equation for the mean of Y given X, i.e. E[Y | X]. If the model is correct, the average value of Y for a given value of X falls right on the regression line.

The conditional mean of Y:

E[Y | X] = β₀ + β₁X ………………………………(2)

Sometimes μ(Y|X) or μ(Y.X) is used instead of E[Y | X] to denote the conditional mean of Y for a given value of X.


As X increases, the average population value of Y also increases, assuming a positive slope of the line, and vice versa.

The actual population value of Y is equal to the average Y conditional on X, plus a random error, ε. Thus for a given value of X:

Y = Average Y for given X + Error, i.e. Y = β₀ + β₁X + ε

[Figure: population values of X and Y scattered about the regression line with slope β₁; the vertical distance from each point to the line is the error ε]

Model assumptions:
1. The relationship between X and Y is a straight-line relationship.
2. The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term, ε.
3. The errors, ε, are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated with each other in successive observations, i.e. ε ~ N(0, σ²).

EstimationSo far, we have described the population model, that is, the assumed true relationship between the two variables X and Y. Our interest is focused on this unknown population relationship, and we want to estimate it using sample information.

We want to find good estimates of the regression parameters, β₀ and β₁. A method that gives us good estimates of the regression coefficients, compared to other methods such as minimizing the sum of the absolute errors, is the method of least squares.


The estimated regression equation:

Ŷ = b₀ + b₁X

In terms of data, it can be written as follows, with the subscript i to signify each particular data point:

yᵢ = b₀ + b₁xᵢ + eᵢ

where b₀ estimates β₀, b₁ estimates β₁, and eᵢ is the ith residual.

Generally: ŷᵢ = b₀ + b₁xᵢ

Sum of squares for error: SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)²

Calculus (first derivatives with respect to b₀ and b₁) is used in finding the expressions for b₀ and b₁ that minimize SSE. The resulting equations are called the normal equations, with solution b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄.
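A minimal sketch of the least-squares computation, using the closed-form solution of the normal equations (b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)², b₀ = ȳ − b₁x̄); the data points are made up and lie exactly on a line, so SSE comes out zero:

```python
def least_squares(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx                 # estimated slope
    b0 = y_bar - b1 * x_bar        # estimated intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse

b0, b1, sse = least_squares([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1, sse)  # 1.0 2.0 0.0
```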

To discuss inference procedures, two assumptions will be considered:

a. We assume that the n random variables Y₁, …, Yₙ are jointly independent and each is a normal random variable. Point estimation of β₀, β₁, σ² and E[Y | x] for any x will be discussed. [Not to be discussed here: confidence intervals for β₀, β₁, σ² and E[Y | x] for any x, and tests of hypotheses on β₀, β₁, σ².]

For point estimation: Y₁, …, Yₙ are independent normal random variables with means β₀ + β₁xᵢ and variances σ². To find point estimators, we shall use the method of maximum likelihood. The likelihood function is

L(β₀, β₁, σ²) = ∏ᵢ (2πσ²)^(−1/2) exp(−(yᵢ − β₀ − β₁xᵢ)²/(2σ²))


and

ln L = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σᵢ (yᵢ − β₀ − β₁xᵢ)²

The partial derivatives of ln L with respect to β₀, β₁ and σ² are obtained and set equal to zero. We have three equations:

Σᵢ yᵢ = nβ̂₀ + β̂₁ Σᵢ xᵢ
Σᵢ xᵢyᵢ = β̂₀ Σᵢ xᵢ + β̂₁ Σᵢ xᵢ²
σ̂² = (1/n) Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ)²

The first two equations are called the normal equations. Solving the above equations we get

β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄

These are the maximum likelihood estimates of β₀, β₁ and σ² respectively. We notice that the xᵢ must be such that Σ(xᵢ − x̄)² > 0; that is, there must be at least two distinct values among the xᵢ.

(Properties of point estimation such as minimum variance should be shown later).

b. The assumption is that only the Yᵢ are pairwise uncorrelated; that is, cov(Yᵢ, Yⱼ) = 0 for all i ≠ j. Point estimation of β₀, β₁ and E[Y | x] for any x will be discussed.

For this case, Y₁, …, Yₙ are pairwise uncorrelated random variables with means β₀ + β₁xᵢ and variances σ².


Since the joint density of the Yᵢ is not specified, maximum-likelihood estimators of β₀, β₁ and σ² cannot be obtained. In models where the joint density of the observable random variables is not given, a method of estimation called least squares can be utilized.

i.e. The values of b₀ and b₁ that minimize the sum of squares

Σᵢ (yᵢ − b₀ − b₁xᵢ)²

are defined to be the least-squares estimators of β₀ and β₁.

From the normal equations shown above, we get

b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄

The least-squares method gives no estimator for σ², but an estimator of σ² based on the least-squares estimators of β₀ and β₁ is

σ̂² = (1/(n − 2)) Σᵢ (yᵢ − b₀ − b₁xᵢ)²


1.2 INTERVAL ESTIMATION

Point estimates are useful, yet they leave something to be desired. When the point estimator under consideration has a probability density function, the probability that the estimator actually equals the value of the parameter being estimated is zero (the probability that a continuous random variable equals any one value is 0).

Hence it seems desirable that a point estimate should be accompanied by some measure of the possible error of the estimate. i.e. Instead of making the inference of estimating the true value of the parameter to be a point, we might make the inference of estimating that the true value of the parameter is contained in some interval (this is interval estimation).

Interval estimate is an estimate constituting an interval of numbers rather than a single number. An interval estimate is an interval believed likely to contain the unknown population parameter. It conveys more information than just the point estimate on which it is based.

Like point estimation, the problem of interval estimation is twofold:
- There is the problem of finding interval estimators (we need methods of finding a confidence interval).
- There is the problem of determining good or optimal interval estimators (we need criteria for comparing competing confidence intervals, or for assessing the goodness of a confidence interval).


An interval estimate of a population parameter θ is an interval of the form θ̂_L < θ < θ̂_U, where θ̂_L and θ̂_U depend on the value of the statistic Θ̂ for a particular sample and also on the sampling distribution of Θ̂.

e.g. A random sample of Matriculation examination scores for students entering B.A. Statistics at the University of Dar es Salaam in the year 2002 produces an interval 50 – 70 within which we expect to find the true average of all scores. The values of the end points 50 and 70 will depend on the computed sample mean x̄ and the sampling distribution of X̄.

As the sample size increases, we know that σ/√n decreases, and consequently our estimate is likely to be closer to the parameter μ, resulting in a shorter interval. Thus the interval estimate indicates, by its length, the accuracy of the point estimate.

Since different samples will generally yield different values of Θ̂ and, therefore, different values of θ̂_L and θ̂_U, we shall be able to determine θ̂_L and θ̂_U such that P(Θ̂_L < θ < Θ̂_U) is equal to any positive fractional value we care to specify:

P(Θ̂_L < θ < Θ̂_U) = 1 − α, for 0 < α < 1.

Then we have a probability of 1 − α of selecting a random sample that will produce an interval containing θ.

The interval θ̂_L < θ < θ̂_U, computed from the selected sample, is then called a 100(1 − α)% confidence interval; the fraction 1 − α is called the confidence coefficient or the degree of confidence, and the end points θ̂_L and θ̂_U are called the lower and upper confidence limits.

Note: The 95% confidence interval is the most commonly used. e.g. It is better to be 95% confident that the average life of an LG refrigerator is between 7 and 8 years than to be 99% confident that it is between 4 and 11. We prefer a short interval with a high degree of confidence.

Sometimes restrictions on the size of our sample prevent us from achieving short intervals.


In practice, estimates are often given in the form of the estimate plus or minus a certain amount. e.g. The National Bureau of Statistics, department of labor statistics, may estimate the number of unemployed in a certain area to be 5.7 ± 0.2 million at a given time, feeling rather sure that the actual number is between 5.5 and 5.9 million.

Suppose that a random sample (1.2, 3.4, 0.6, 5.6) of four observations is drawn from a normal population with an unknown mean μ and a known standard deviation 3. The maximum likelihood estimate of μ is the mean of the sample observations: x̄ = (1.2 + 3.4 + 0.6 + 5.6)/4 = 2.7.

We wish to determine upper and lower limits which are rather certain to contain the true unknown parameter value between them.

For a sample of size 4 from a normal distribution, Z = (X̄ − μ)/(σ/√n) = (X̄ − μ)/(3/2) will be normally distributed with mean 0 and unit variance.

Hence Z = (X̄ − μ)/1.5. We can compute the probability that Z will be between any two arbitrarily chosen numbers. Consider 95%:

P(−1.96 < Z < 1.96) = 0.95

Substituting Z, we get

P(X̄ − 1.96(1.5) < μ < X̄ + 1.96(1.5)) = 0.95, i.e. with x̄ = 2.7 the interval is 2.7 ± 2.94, or (−0.24, 5.64).
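A numeric sketch of this worked example, with known σ = 3 and the four observations from the text:

```python
from math import sqrt

xs = [1.2, 3.4, 0.6, 5.6]
sigma, z = 3.0, 1.96                       # known sigma; 95% critical value
x_bar = sum(xs) / len(xs)                  # 2.7
half_width = z * sigma / sqrt(len(xs))     # 1.96 * 3 / 2 = 2.94
lo, hi = x_bar - half_width, x_bar + half_width
print(round(lo, 2), round(hi, 2))  # -0.24 5.64
```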

The method for finding a confidence interval that has been used in the example above is a general method. This technique is applicable in many important


problems, but in others it is not, because in those others it is either impossible to find functions of the desired form or it is impossible to rewrite the derived probability statements.

1.2.1 Confidence Interval of the MEAN

There are really two cases to consider, depending on whether or not σ is known.

Confidence Interval of the Mean when the population Standard Deviation is known

The central limit theorem tells us that when we select a large random sample from any population with mean μ and standard deviation σ, the sample mean X̄ is (at least approximately) normally distributed with mean μ and standard deviation σ/√n. If the population itself is normal, X̄ is normally distributed for any sample size.

Transforming Z back to the random variable X̄ with mean μ and standard deviation σ/√n, we find that, before the sampling, there is a 0.95 probability that X̄ will fall within the interval:

μ ± 1.96 σ/√n

Once we have obtained our random sample, we have a particular value x̄. This particular x̄ either lies within the range of values specified by the formula above or does not lie within this range.

Since the random sampling has already taken place and a particular x̄ has been computed, we no longer have a random variable and may no longer talk about probabilities. We may say that we are 95% confident that μ lies within the interval x̄ ± 1.96 σ/√n (about 95% of the values of x̄ obtained in a large number of repeated samplings will fall within the interval).

Note: We cannot say that there is a 0.95 probability that μ is inside the interval, because the interval is not random, and neither is μ. The population mean μ is unknown to us but is a fixed quantity – not a random variable.


Z value for 90% CI: 1.645
Z value for 95% CI: 1.96
Z value for 99% CI: 2.58 (using interpolation, 2.576)


We define z(α/2) as the Z value that cuts off an area of α/2 to its right.

A confidence interval for μ when σ is known and sampling is done from a normal population, or with a large sample:

x̄ ± z(α/2)·σ/√n

Note: When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the interval.

e.g. The 80% CI for μ with n = 25, x̄ = 122 and σ = 20 is (116.88, 127.12), but the 95% CI is (114.16, 129.84).

The 80% interval is narrow compared to the 95% interval.
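The comparison can be sketched numerically; the population standard deviation σ = 20 is inferred here from the stated intervals, and the z values 1.28 and 1.96 are the usual 80% and 95% critical values:

```python
from math import sqrt

n, x_bar, sigma = 25, 122.0, 20.0   # sigma inferred from the stated intervals

def ci(z):
    hw = z * sigma / sqrt(n)
    return x_bar - hw, x_bar + hw

print(ci(1.28))  # 80% CI: about (116.88, 127.12)
print(ci(1.96))  # 95% CI: about (114.16, 129.84)
```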

That means a wider interval has more of a presampling chance of capturing the unknown population parameter. If we want a 100% CI for a parameter, the interval must be (−∞, ∞), since the probability of capturing the parameter must be 1. Such a probability is obtained by allowing Z to be anywhere from −∞ to +∞.

If we want both a narrow interval and a high degree of confidence, we need a large amount of information, because the larger the sample size, the narrower the interval.

When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.

Confidence interval for the mean when the standard deviation is unknown

In constructing confidence intervals for μ, we assume a normal distribution or a large sample size. The assumption of a known standard deviation was necessary for theoretical reasons so that we could use standard normal probabilities in constructing intervals.

In reality, σ is rarely known, because both μ and σ are population parameters and have to be estimated. When σ is unknown we may use the sample standard deviation, S, in its place. If the population is normally distributed, the standardized statistic

t = (X̄ − μ)/(S/√n)

has a t distribution (Student's t distribution) with n − 1 degrees of freedom. The degrees of freedom of the distribution are the degrees of freedom associated with the sample standard deviation.


The t distribution was discovered by Gosset, a scientist at the Guinness brewery in Dublin, Ireland, in 1908. Gosset published under the name 'Student' since the brewery restricted its workers from publishing under their own names. The t distribution resembles the standard normal distribution, Z: it is symmetric and bell shaped. However, it is flatter than Z in the middle part and has wider tails.

The mean of a t distribution is zero. For df > 2, the variance of the t distribution is equal to df/(df − 2).

The mean of t is the same as the mean of Z, but the variance of t is larger than the variance of Z. As df increases, the variance of t approaches 1 (that of Z). The larger variance of t implies greater uncertainty compared to Z, since μ is estimated using two random variables, X̄ and S. Since there are many t distributions, we need a standardized table of probabilities.

A (1 − α)100% confidence interval for μ when σ is not known (assuming a normally distributed population) is

X̄ ± t_{α/2} S/√n,

where t_{α/2} is the value of the t distribution with n − 1 degrees of freedom that cuts off a tail area of α/2 to its right.

Although the t distribution is the correct distribution to use whenever σ is unknown, when df is large we may use the standard normal distribution as an approximation, e.g. for a sample size of 200 (df = 199).

Estimation problems can be divided into two kinds:
- Small sample problems (sample size less than 30)
- Large sample problems (sample size 30 or more)

Example
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of x̄ = 10.37% and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

Solution
With n − 1 = 14 degrees of freedom, t_{0.025} = 2.145, so the interval is 10.37 ± 2.145(3.5/√15) = 10.37 ± 1.94.


Thus the analyst may be 95% sure that the average annualized return on the stock is anywhere from 8.43% to 12.31%.
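The interval can be verified with a short computation. This is a minimal sketch: the critical value t_{0.025, 14} = 2.145 is taken from the t table, and the sample mean 10.37% is the value consistent with the quoted interval.

```python
import math

# 95% t-based confidence interval for the mean return (sigma unknown).
n, xbar, s = 15, 10.37, 3.5        # sample size, sample mean (%), sample sd (%)
t_crit = 2.145                     # t_{0.025} with 14 df, from the t table

margin = t_crit * s / math.sqrt(n)
lo, hi = xbar - margin, xbar + margin
print(round(lo, 2), round(hi, 2))  # -> 8.43 12.31
```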

Theorem: Error in estimating μ
If x̄ is used as an estimate of μ, we can then be (1 − α)100% confident that the error will not exceed z_{α/2} σ/√n.

Frequently, we wish to know how large a sample is necessary to ensure that the error in estimating μ will not exceed a specified amount e. This means we must choose n such that z_{α/2} σ/√n = e.

Theorem: Sample size for estimating μ
If x̄ is used as an estimate of μ, we can be (1 − α)100% confident that the error will not exceed a specified amount e when the sample size is

n = (z_{α/2} σ / e)².

1.2.2 Confidence Interval of the difference between two Means in Paired and Independent Samples

If we have two populations with means μ1 and μ2 and variances σ1² and σ2², respectively, a point estimator of the difference between μ1 and μ2 is given by the statistic X̄1 − X̄2. To estimate μ1 − μ2, we select two independent random samples, one from each population, of sizes n1 and n2, and compute the difference, x̄1 − x̄2, of the sample means.

If the independent samples are large (each greater than 30) or are selected from normal populations, we can establish a confidence interval for μ1 − μ2 by considering the sampling distribution of X̄1 − X̄2.

We know the sampling distribution is X̄1 − X̄2 ~ N(μ1 − μ2, σ1²/n1 + σ2²/n2).

Then

Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2) ~ N(0, 1),


with probability P(−z_{α/2} < Z < z_{α/2}) = 1 − α.

Confidence Interval for μ1 − μ2; σ1 and σ2 Known
If x̄1 and x̄2 are the means of independent random samples of sizes n1 and n2 from populations with known variances σ1² and σ2², respectively, a (1 − α)100% confidence interval for μ1 − μ2 is given by

(x̄1 − x̄2) ± z_{α/2} √(σ1²/n1 + σ2²/n2),

where z_{α/2} is the z value leaving an area of α/2 to the right.

For small samples we use the t distribution, provided the populations are approximately normally distributed.

Example
A standardized chemistry test was given to 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference of the means, where the first mean is the mean score of all boys and the second is the mean score of all girls who might take this test. (Answer: 3.43 < μ1 − μ2 < 8.57)
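A sketch of the large-sample computation for this example; the critical value z_{0.02} = 2.05 is taken from the normal table.

```python
import math

# 96% CI for mu_boys - mu_girls with large independent samples.
n1, x1, s1 = 75, 82, 8     # boys
n2, x2, s2 = 50, 76, 6     # girls
z = 2.05                   # z_{0.02}, from the normal table

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
diff = x1 - x2
lo, hi = diff - z * se, diff + z * se
print(round(lo, 2), round(hi, 2))  # -> 3.43 8.57
```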

Small Sample Confidence Interval for μ1 − μ2; σ1 = σ2 Unknown
If x̄1 and x̄2 are the means of small independent random samples of sizes n1 and n2, respectively, from approximately normal populations with unknown but equal variances, a (1 − α)100% confidence interval for μ1 − μ2 is given by

(x̄1 − x̄2) ± t_{α/2} S_p √(1/n1 + 1/n2),

where S_p = √{[(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)} is the pooled estimate of the population standard deviation and t_{α/2} is the t value with n1 + n2 − 2 degrees of freedom, leaving an area of α/2 to the right.

Example
A course in statistics is taught to 12 students by the conventional classroom procedure. A second group of 10 students was given the same course by means of programmed materials. At the end of the semester the same examination was given to each group. The 12 students meeting in the classroom made an average grade of 85 with a standard deviation of 4, while the 10 students using programmed materials made an average of 81 with a standard deviation of 5. Find a 90% confidence interval for the difference between the population means, assuming the populations are approximately normally distributed with equal variances. (Answer: 0.69 < μ1 − μ2 < 7.31)
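A sketch of the pooled-variance computation for this example, with t_{0.05, 20} = 1.725 taken from the t table.

```python
import math

# 90% pooled-variance t interval for mu1 - mu2 (equal variances assumed).
n1, x1, s1 = 12, 85, 4     # classroom group
n2, x2, s2 = 10, 81, 5     # programmed-materials group
t_crit = 1.725             # t_{0.05} with n1 + n2 - 2 = 20 df, from the t table

sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
diff = x1 - x2
lo, hi = diff - margin, diff + margin
print(round(lo, 2), round(hi, 2))  # -> 0.69 7.31
```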

Small Sample Confidence Interval for μ1 − μ2; σ1 ≠ σ2 Unknown
If x̄1 and s1², and x̄2 and s2², are the means and variances of small independent samples of sizes n1 and n2, respectively, from approximately normal distributions with unknown and unequal variances, an approximate (1 − α)100% confidence interval for μ1 − μ2 is given by

(x̄1 − x̄2) ± t_{α/2} √(s1²/n1 + s2²/n2),

where t_{α/2} is the t value with

v = (s1²/n1 + s2²/n2)² / { [(s1²/n1)²/(n1 − 1)] + [(s2²/n2)²/(n2 − 1)] }

degrees of freedom, leaving an area of α/2 to the right.

Example
Records for the past 15 years show the average rainfall in a certain region of the country for the month of May to be 4.93 centimetres, with a standard deviation of 1.14 centimetres. A second region of the country has had an average rainfall in May of 2.64 centimetres, with a standard deviation of 0.66 centimetres, during the past 10 years. Find a 95% confidence interval for the difference of the true average rainfalls in these two regions, assuming that the observations come from normal populations with different variances. (Answer: )

Difference of two means when the samples are not independent and the variances of the two populations are not necessarily equal
This is the case when the observations in the two samples occur in pairs, so that the two observations are related. E.g., if we run a test for second year B.Sc. Statistics on a new ST 200 lecturer using 22 students, the scores before and after form our two samples.

Observations made on the same students are related and hence form a pair. To determine the effectiveness of the new lecturer we have to consider the differences of the scores.


e.g. 2. Investigating maize output using different fertilizers but the same area/soil/land.

Confidence Interval for μ_D = μ1 − μ2 for paired observations
If d̄ and s_d are the mean and standard deviation of the differences of n random pairs of measurements, a (1 − α)100% confidence interval for μ_D = μ1 − μ2 is

d̄ ± t_{α/2} s_d/√n,

where t_{α/2} is the t value with n − 1 degrees of freedom, leaving an area of α/2 to its right.

Example
Twenty college freshmen were divided into 10 pairs, each member of the pair having approximately the same IQ. One of each pair was selected at random and assigned to a statistics section using programmed materials only. The other member of each pair was assigned to a section in which the professor lectured. At the end of the semester each group was given the same examination and the following results were recorded.

Pair                 1   2   3   4   5   6   7   8   9   10
Programmed Material  76  60  85  58  91  75  82  64  79  88
Lecturer             81  52  87  70  86  77  90  63  85  83

Find a 98% confidence interval for the true difference in the two learning procedures. (Answer: −7.29 < μ_D < 4.09)
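A sketch of the paired computation using the data in the table above; t_{0.01, 9} = 2.821 is taken from the t table.

```python
from statistics import mean, stdev

# 98% paired-t interval for mu_D (programmed score minus lecture score).
programmed = [76, 60, 85, 58, 91, 75, 82, 64, 79, 88]
lecture = [81, 52, 87, 70, 86, 77, 90, 63, 85, 83]
t_crit = 2.821                       # t_{0.01} with n - 1 = 9 df, from the t table

d = [p - l for p, l in zip(programmed, lecture)]
n = len(d)
margin = t_crit * stdev(d) / n**0.5  # stdev uses the n - 1 divisor
lo, hi = mean(d) - margin, mean(d) + margin
print(round(lo, 2), round(hi, 2))    # -> -7.29 4.09
```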

1.3.1 Confidence Interval of PROPORTION

Sometimes our interest is in a qualitative rather than a quantitative variable. The interest may be the relative frequency of occurrence of some characteristic in a population, e.g. the proportion of the population who are users of Colgate.


A point estimator of the proportion p in a binomial experiment is given by the statistic p̂ = X/n, where X represents the number of successes in n trials. Therefore, the sample proportion p̂ = x/n will be used as the point estimate of the parameter p.

If p is not expected to be too close to 0 or 1, we can establish a confidence interval for p by considering the sampling distribution of p̂.

For large n the distribution of p̂ is approximately normal with mean

E(p̂) = p

and variance

Var(p̂) = pq/n, where q = 1 − p.

Theorem: Confidence Interval of p
A large-sample (1 − α)100% confidence interval for the population proportion p is

p̂ ± z_{α/2} √(p̂q̂/n),

where the sample proportion, p̂, is equal to the number of successes in the sample, x, divided by the number of trials (the sample size), n, and q̂ = 1 − p̂.

Example
A market research firm wants to estimate the share that foreign companies have in the Tanzanian market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.

Solution
We have x = 34 and n = 100, so p̂ = 0.34 and q̂ = 0.66:

0.34 ± 1.96 √(0.34 × 0.66/100) = 0.34 ± 0.0928 = [0.2472, 0.4328]

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
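The proportion interval above can be sketched as follows (z_{0.025} = 1.96 from the normal table):

```python
import math

# 95% CI for the proportion of foreign-product users.
x, n, z = 34, 100, 1.96
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error of p-hat
lo, hi = p_hat - z * se, p_hat + z * se
print(round(lo, 4), round(hi, 4))            # -> 0.2472 0.4328
```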


Suppose the firm is not happy with such a wide confidence interval. What can be done about it? Answer: either increase the sample size or, if the sample cannot be increased, reduce the confidence level, say to 90%.

Note: When estimating proportions using small samples, the binomial distribution may be used in forming confidence intervals. Since the distribution is discrete, it may not be possible to construct an interval with an exact, prespecified confidence level such as 95% or 99%.

If p̂ is used as an estimate of p, then we can be (1 − α)100% confident that the error will not exceed a specified amount e when the sample size is

n = z_{α/2}² p̂q̂ / e².

The assumption is that the error cannot exceed e.

e.g. How large a sample is required if we want to be 95% confident that our estimate of p is within 0.02? Let p = 0.32.
Solution
n = (1.96)²(0.32)(0.68)/(0.02)² = 2089.8 ≈ 2090.

Since the sample size is obtained after estimating p, sometimes it is not possible to estimate p (p is not given and cannot be computed). In that case we use the most conservative value p = 0.5, which maximizes pq and gives

n = z_{α/2}² / (4e²).

e.g. How large a sample is required if we want to be 95% confident that our estimate of p is within 0.02? Answer: n = (1.96)²/(4 × 0.02²) = 2401.
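Both sample-size calculations can be sketched together; sizes are rounded up, since the formula gives a minimum.

```python
import math

# Required sample size so the error in estimating p is at most e = 0.02.
z, e = 1.96, 0.02                              # z_{0.025} from the normal table

# With a prior estimate p = 0.32:
n1 = math.ceil(z**2 * 0.32 * 0.68 / e**2)
# Conservative case, p unknown (p = 0.5 maximizes pq):
n2 = math.ceil(z**2 * 0.25 / e**2)
print(n1, n2)                                  # -> 2090 2401
```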

1.3.2 DIFFERENCE BETWEEN TWO PROPORTIONS

Sometimes the interest is in the difference of two proportions, e.g. estimating the difference between the proportion of people with skin problems who use medicated soap and the proportion of people with no skin problems who use medicated soap.


We have two populations, and the problem is to estimate the proportion p1 from the first population and p2 from the second. The sample from the first population is of size n1 and that from the second of size n2.

The point estimators are p̂1 = x1/n1 and p̂2 = x2/n2. Their means are p1 and p2, and their variances are p1q1/n1 and p2q2/n2.

The interest is in p1 − p2, estimated by p̂1 − p̂2, which has mean p1 − p2 and variance p1q1/n1 + p2q2/n2, where q_i = 1 − p_i.

Hence a large-sample (1 − α)100% confidence interval is given by

(p̂1 − p̂2) ± z_{α/2} √(p̂1q̂1/n1 + p̂2q̂2/n2).

Example
A poll is taken among the residents of a city and the surrounding county to determine the feasibility of a proposal to construct a civic center. If 2400 of 5000 city residents favor the proposal and 1200 of 2000 county residents favor it, find a 90% confidence interval for the true difference in the fractions favoring the proposal to construct the civic centre.
Answer: −0.1414 < p1 − p2 < −0.0986. Since both ends of the interval are negative, we can also conclude that the proportion of county residents favoring the proposal is greater than the proportion of city residents favoring the proposal.
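A sketch of the computation, reading the city count as 2400 (the "24000 of 5000" in some copies of these notes is presumably a typo) and z_{0.05} = 1.645 from the normal table:

```python
import math

# 90% CI for p1 - p2 (city vs county residents favoring the proposal).
x1, n1 = 2400, 5000      # city
x2, n2 = 1200, 2000      # county
z = 1.645                # z_{0.05}, from the normal table

p1, p2 = x1 / n1, x2 / n2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
d = p1 - p2
lo, hi = d - z * se, d + z * se
print(round(lo, 4), round(hi, 4))  # -> -0.1414 -0.0986
```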

THE FINITE POPULATION CORRECTION FACTOR
So far we have been assuming that the population is much larger than the sample; that is, we sample from an infinite population. In some cases the sample is obtained from a finite population. In such cases, the standard error of our estimate needs to be corrected to reflect the fact that the sample constitutes a non-negligible fraction of the entire population. When the size of the sample, n, constitutes at least 5% of the size of the population, N, we have to use a finite-population correction factor and modify the standard error of our estimator.

We need the correction factor because the uncorrected standard error does not account for the relative size of the sample with respect to the size of the sampled population.

Consider the standard error of the mean, σ/√n. As n approaches N, the standard error should decrease toward zero, since uncertainty decreases; and when n = N the standard error must be exactly zero, since we are then dealing with the entire population. The formula above, however, gives a standard error of zero only when the sample size is infinite.

For that reason we need some reduction by multiplying the standard error by a finite-population correction factor.

Finite population correction factor: √[(N − n)/(N − 1)]

Note: The correction factor is close to 1 when the sample size is small relative to the population size. The expression approaches zero as the sample size approaches the population size, as required.

A large-sample (1 − α)100% confidence interval for μ using a finite population correction:

x̄ ± z_{α/2} (s/√n) √[(N − n)/(N − 1)]

A large-sample (1 − α)100% confidence interval for p using a finite population correction:

p̂ ± z_{α/2} √(p̂q̂/n) √[(N − n)/(N − 1)]

Example
A company has 1000 accounts receivable. To estimate the average amount of these accounts, a random sample of 100 accounts is chosen. In the sample, the average amount is x̄ units and the standard deviation is s units. Give a 95% confidence interval for the average of all 1000 accounts.

Solution
The sampling fraction is n/N = 100/1000 = 0.10. Since this fraction is greater than 0.05, we need to use a confidence interval with a finite population correction factor.

[520.96, 543.74]
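The correction factor for this example can be sketched as follows. The sample mean and standard deviation were not reproduced above, so the values of xbar and s in the second part are placeholders (hypothetical), used only to show where the factor enters.

```python
import math

# Finite-population correction for N = 1000 accounts, n = 100 sampled.
N, n = 1000, 100
fpc = math.sqrt((N - n) / (N - 1))
print(round(fpc, 4))                      # -> 0.9492

# Hypothetical sample results (placeholders, not from the text):
xbar, s, z = 532.35, 61.22, 1.96
margin = z * (s / math.sqrt(n)) * fpc     # corrected standard error times z
print(round(xbar - margin, 2), round(xbar + margin, 2))
```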

1.4.1 CONFIDENCE INTERVAL OF VARIANCE


In some situations our interest centers on the population variance (or population standard deviation); this happens in production processes, queuing processes and other situations.

To compute confidence intervals for the population variance, we must have knowledge of the chi-square distribution, denoted by χ². The chi-square distribution, like the t distribution, has associated with it a degrees of freedom parameter, df.

Note: The χ² distribution is used to estimate the population variance, while the t distribution is used to estimate the population mean.

Unlike the t distribution and the normal distribution, however, the chi-square distribution is not symmetric.

Definition
- The chi-square distribution is the distribution of the sum of several independent, squared standard normal random variables.
- It is used in a parametric test for comparing a sample variance to a theoretical population variance.

Since it is a sum of squares it cannot take a negative value, and therefore the distribution is bounded on the left by zero. The distribution is skewed to the right.

The mean of a chi-square distribution is equal to the degrees of freedom parameter, df. The variance of a chi-square distribution is equal to 2(df).

The chi-square distribution looks more and more like a normal distribution as df increases.


In sampling from a normal population, the random variable

χ² = (n − 1)S²/σ²

has a chi-square distribution with n − 1 degrees of freedom.

The probability that a random sample produces a χ² value greater than some specified value is equal to the area under the curve to the right of this value.

We are asserting that P(χ²_{1−α/2} ≤ (n − 1)S²/σ² ≤ χ²_{α/2}) = 1 − α; solving the inequality for σ² we get

P[(n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}] = 1 − α.

A (1 − α)100% confidence interval for the population variance σ² (where the population is assumed normal) is

[(n − 1)S²/χ²_{α/2}, (n − 1)S²/χ²_{1−α/2}],

where χ²_{α/2} is the value of the chi-square distribution with n − 1 degrees of freedom that cuts off an area of α/2 to its right and χ²_{1−α/2} is the value of the distribution that cuts off an area of α/2 to its left (equivalently, an area of 1 − α/2 to its right).

Since the χ² distribution is not symmetric, we cannot use equal values with opposite signs (e.g. ±z_{α/2}) and must construct the confidence interval using the two distinct tails of the distribution.

Example


In an automated process, a machine fills cans of coffee. If the average amount filled is different from what it should be, the machine may be adjusted to correct the mean. If the variance of the filling process is too high, however, the machine is out of control and needs to be repaired. Therefore, from time to time regular checks of the variance of the filling process are made. This is done by randomly sampling filled cans, measuring their amounts, and computing the sample variance. A random sample of 30 cans gives an estimate s² = 18,540. Give a 95% confidence interval for the population variance σ².

Solution
Degrees of freedom = n − 1 = 30 − 1 = 29. From the table, χ²_{0.025} = 45.7 and χ²_{0.975} = 16.0.

Using these values, the confidence interval is

[29(18,540)/45.7, 29(18,540)/16.0] = [11765, 33604].

We can be 95% sure that the population variance is between [11765, 33604].
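A sketch of the interval, using the rounded chi-square table values χ²_{0.025, 29} ≈ 45.7 and χ²_{0.975, 29} ≈ 16.0 (an assumption here; more decimal places would shift the bounds slightly):

```python
# 95% CI for the variance of the filling process.
n, s2 = 30, 18540
chi_hi, chi_lo = 45.7, 16.0   # chi^2_{0.025, 29} and chi^2_{0.975, 29}, table values

lo = (n - 1) * s2 / chi_hi    # lower bound uses the larger chi-square value
hi = (n - 1) * s2 / chi_lo    # upper bound uses the smaller chi-square value
print(round(lo), round(hi))   # -> 11765 33604
```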


1.4.2 CONFIDENCE INTERVAL OF RATIO OF TWO VARIANCES

A point estimate of the ratio of two population variances σ1²/σ2² is given by the ratio s1²/s2² of the sample variances. If σ1² and σ2² are the variances of normal populations, we can establish an interval estimate of σ1²/σ2² by using the statistic

F = (σ2² S1²)/(σ1² S2²),

whose sampling distribution is called the F distribution. Theoretically, it is the ratio of two independent chi-square random variables, each divided by its degrees of freedom:

F = (χ1²/v1)/(χ2²/v2),

with v1 = n1 − 1 and v2 = n2 − 1 degrees of freedom.

The number of degrees of freedom associated with the numerator is stated first, followed by the degrees of freedom of the denominator. The curve of the F distribution depends not only on the two parameters v1 and v2 but also on the order in which we state them.

The distribution is similar to the chi-square in that it is not symmetric, and it is represented similarly.

Writing f_α(v1, v2) for the F value with v1 and v2 degrees of freedom that cuts off an area of α to its right, we have

f_{1−α}(v1, v2) = 1/f_α(v2, v1).

We can establish a (1 − α)100% confidence interval for σ1²/σ2² by asserting that

P[f_{1−α/2}(v1, v2) < (σ2² S1²)/(σ1² S2²) < f_{α/2}(v1, v2)] = 1 − α,

where f_{1−α/2}(v1, v2) and f_{α/2}(v1, v2) are values of the F distribution with v1 and v2 degrees of freedom. Solving the inequality for σ1²/σ2², the confidence interval is

(s1²/s2²) · 1/f_{α/2}(v1, v2) < σ1²/σ2² < (s1²/s2²) · f_{α/2}(v2, v1).

Example
A standardized placement test in ST 205 was given to 11 females and 80 males. Females made an average grade of 82 with a standard deviation of 8, while males made an average grade of 78 with a standard deviation of 7. Find a 98% confidence interval for σ1²/σ2² and σ1/σ2, where σ1² and σ2² are the variances of the population grades for all females and males, respectively. Assume the populations to be normal.

Solution
n1 = 11, n2 = 80, s1 = 8, s2 = 7. For 98% confidence, α = 0.02 and α/2 = 0.01. Reading from the table, we need f_{0.01}(10, 79) and f_{0.01}(79, 10); since v = 79 is not shown in the table, the nearest tabulated degrees of freedom are used as an approximation.

ONE SIDED CONFIDENCE INTERVALS
It is possible to construct confidence intervals with only one side. This is useful when we are interested in an upper bound only or a lower bound only.

A right-hand (1 − α)100% confidence interval for μ: (−∞, x̄ + z_α σ/√n]

A left-hand (1 − α)100% confidence interval for μ: [x̄ − z_α σ/√n, +∞)

Note: z_α replaces z_{α/2} because we have only one side where an error of probability α may take place in the estimation.


Topic 2: PROPERTIES OF ESTIMATORS

The sample statistics we have discussed, as well as other sample statistics, are used as estimators of population parameters. We may ask ourselves: are some of the many possible estimators better, in some sense, than others?

There are several criteria by which we can evaluate the quality of a statistic as an estimator. We are going to discuss: unbiasedness, efficiency, sufficiency, minimum variance, the Cramer–Rao inequality and consistency.

Unbiasedness
This is a very important property that an estimator should possess. If we take all possible samples of the same size from a population and calculate their means, the mean of all these means will be equal to the mean of the population.

Repeated samples are drawn by resampling while keeping the values of the independent variables unchanged. Bias is often assessed by characterizing the sampling distribution of an estimator.

Definition:
An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.

- An estimator θ̂ of θ is said to be unbiased if E(θ̂) = θ.
- E(X̄) = μ.

This is to say that the sample mean is an unbiased estimator of the population mean.

This is an important property of the estimator because it means that there is no systematic bias away from the parameter of interest.

Suppose we take the smallest sample observation as an estimator of the population mean μ. It can easily be shown that this estimator is biased, since the smallest observation is less than the mean on average: its expected value must be less than μ, E(X_min) < μ. Thus the estimator is biased downwards.

The extent of bias (systematic deviation) is the difference between the expected value of the estimator and the value of the parameter:

Bias(θ̂) = E(θ̂) − θ.


Any systematic deviation of the estimator away from the parameter of interest is called bias.

θ̂ is said to be unbiased if Bias(θ̂) = E(θ̂) − θ = 0.

Note: In reality we usually sample once and obtain our estimate.
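The bias idea can be checked by brute force on a tiny population. This sketch (illustrative numbers, not from the text) enumerates every equally likely sample of size 2 and averages two estimators of the population mean: the sample mean turns out unbiased, the smallest observation biased downward.

```python
from itertools import product

# All ordered samples of size 2 (with replacement) from a tiny population
# whose mean is 2.5; each of the 16 samples is equally likely.
population = [1, 2, 3, 4]
samples = list(product(population, repeat=2))

avg_of_sample_means = sum((a + b) / 2 for a, b in samples) / len(samples)
avg_of_sample_mins = sum(min(a, b) for a, b in samples) / len(samples)

print(avg_of_sample_means)  # -> 2.5   (sample mean: unbiased)
print(avg_of_sample_mins)   # -> 1.875 (smallest observation: biased downward)
```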

Consistency

An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.

The sample mean, X̄, is a consistent estimator of μ because the standard error of X̄ is σ/√n. As the sample size increases, the standard error decreases, and hence the probability that X̄ will be close to its expected value increases.

- A consistent estimator is one that concentrates in a narrower and narrower band around its target as sample size increases indefinitely.

Mean Squared Error (MSE)
The mean squared error of an estimator is defined as

MSE(θ̂) = E[(θ̂ − θ)²].

We know that

MSE(θ̂) = Var(θ̂) + B², where B = E(θ̂) − θ is the bias.


MSE = variance of estimator + (bias)².

If the estimator is unbiased, then MSE(θ̂) = Var(θ̂).

Example
A company has 4,000 employees whose average monthly wage comes to Tshs 480,000 with a standard deviation of Tshs 120,000. Let X̄ be the mean monthly wage for a random sample of employees selected from this company. Find the mean and standard deviation of X̄ for sample sizes of 40 and 100.
Solution

μ = 480,000 and σ = 120,000.

For a sample size of 40:
The sample mean has E(X̄) = 480,000 and standard deviation σ/√n = 120,000/√40 ≈ 18,973.67.
Here n/N = 40/4000 = 0.01. As this value is less than 0.05, the correction factor is not considered.

For a sample size of 100:
The sample mean has E(X̄) = 480,000 and standard deviation σ/√n = 120,000/√100 = 12,000.
Here n/N = 100/4000 = 0.025. The value is less than 0.05, so there is no need for the correction factor.

From the example we learn that the mean of the sample mean equals the population mean regardless of the sample size. The standard deviation, however, is affected by the sample size: as the sample size increases, it decreases.

Efficiency (not quite the same as minimum variance)
(Remember that efficiency differs from consistency because efficiency is relative, i.e. it involves comparisons between two estimators.)

Efficiency is a relative property: we say that one estimator is efficient relative to another. This means that the estimator has a smaller variance (and standard deviation) than the other. Efficiency is measured in terms of the


size of the standard error of the statistic. Since an estimator is a random variable, it is necessarily characterized by a certain amount of variability; some estimators may be more variable than others.

Definition An estimator is efficient if it has relatively small variance (and standard deviation).

If θ̂1 and θ̂2 are two unbiased estimators of θ, then θ̂1 is more efficient than θ̂2 if Var(θ̂1) < Var(θ̂2). More generally, the estimator is selected based on MSE.

For example:
In large samples, the variance of the sample mean is σ²/n. As the sample size increases, the variance becomes smaller, so the estimate becomes more efficient.

Consider the probability distributions of two estimators, A and B (figure omitted).

Curve A shows the distribution with the smaller spread; it is a more precise estimator compared with curve B.

Estimator A is biased, though it may yield an estimate that will be close to the true value (though it is likely to be slightly wrong). Estimator B, though unbiased, can give estimates that are far away from the true value.

As such we would prefer estimate A.


e.g. The sampling distributions of the mean and the median have the same mean, namely the population mean. However, the variance of the sampling distribution of the means is smaller than the variance of the sampling distribution of the medians. As such, the sample mean is an efficient estimator of the population mean, while the sample median is an inefficient estimator.

More examples
- The sample mean is an unbiased estimator of the population mean.
- Given a random sample, the first observation is also an unbiased estimator of the population mean.
- Given two unbiased estimators, which is more efficient than the other? The one with the smaller variance.

Sufficiency
An estimator is said to be sufficient if it contains all the information in the data about the parameter it estimates.

X̄ is a sufficient statistic because it utilizes all the information a sample contains about the parameter to be estimated. We say X̄ is a sufficient estimator of the population mean μ: no other estimator can provide additional information about μ. Another sufficient statistic is the sample total ΣXᵢ.

Cramer–Rao Inequality
Since an estimator with uniformly minimum mean-squared error rarely exists, a reasonable procedure is to restrict the class of estimating functions and look for estimators with uniformly minimum mean-squared error within the restricted class. One way of restricting the class of estimating functions is to consider only unbiased estimators and then, among the class of unbiased estimators, search for an estimator with minimum mean-squared error.

Definition: Uniformly minimum-variance unbiased estimator (UMVUE)

Let X1, …, Xn be a random sample from f(x; θ). An estimator T* is defined to be a uniformly minimum-variance unbiased estimator of θ if and only if:

i. E(T*) = θ (that is, T* is unbiased), and

ii. Var(T*) ≤ Var(T) for every other unbiased estimator T of θ, for all θ.

Derivation of a lower bound for the variance of an unbiased estimator
Let X1, …, Xn be a random sample from f(x; θ), and let θ̂ be an unbiased estimator of θ. We consider f(x; θ) a probability density function that satisfies the following assumptions, called regularity conditions:

i. ∂ ln f(x; θ)/∂θ exists for all x and all θ;

ii. the support of f(x; θ) does not depend on θ;

iii. differentiation with respect to θ and integration over x may be interchanged, ∂/∂θ ∫ f(x; θ) dx = ∫ ∂f(x; θ)/∂θ dx;

iv. 0 < E{[∂ ln f(X; θ)/∂θ]²} < ∞ for all θ.

The above assumptions are stated for a continuous density function; the same applies to a discrete density function, with sums in place of integrals.

Under the assumptions above,

Var(θ̂) ≥ 1 / (n E{[∂ ln f(X; θ)/∂θ]²}).

The above expression is what is called the Cramer–Rao inequality. The right-hand side is called the Cramer–Rao lower bound for the variance of unbiased estimators.

The Cramér–Rao lower bound is thus a limit on how small the variance of an unbiased estimator of a parameter θ of a distribution can be.

Given a certain estimator we expect it to have low mean squared error (MSE). But the question is: what is the smallest variance that can be attained by an unbiased estimator of θ? The answer is given by the Cramer–Rao inequality.

Example
Let X1, …, Xn be a random sample from the Poisson density f(x; λ) = e^{−λ} λ^x / x!, for x = 0, 1, 2, … Find the Cramer–Rao lower bound for unbiased estimators of λ.
Solution


ln f(x; λ) = −λ + x ln λ − ln x!, so ∂ ln f/∂λ = x/λ − 1.

Therefore E{[∂ ln f(X; λ)/∂λ]²} = E[(X − λ)²]/λ² = Var(X)/λ² = λ/λ² = 1/λ.

Hence Var(λ̂) ≥ 1/(n · 1/λ) = λ/n; this is the Cramer–Rao lower bound. (Note that X̄ attains it, since Var(X̄) = λ/n.)

2.6 Minimum Variance


APPLICATION OF THE PROPERTIES OF ESTIMATORS

Normally Distributed Population
A normal population implies a symmetric distribution.

Unbiasedness
Both the sample mean and the sample median are unbiased estimators of the population mean μ.

Efficiency
The sample mean is more efficient than the sample median, because the variance of the sample median happens to be about 1.57 times as large as the variance of the sample mean:

Var(median) ≈ (π/2)(σ²/n) ≈ 1.57 σ²/n, while Var(X̄) = σ²/n.

Sufficiency
The sample mean is a sufficient estimator because its computation uses the entire data set. The median is not sufficient, because it is found as the point in the middle of the data set regardless of the exact magnitudes of all other data elements.


Consistency
The mean is also consistent.

Proportion
Unbiasedness
The sample proportion p̂ is the best estimator of the population proportion p: E(p̂) = p. It also has the smallest variance of all unbiased estimators of p.

Sample Variance

S² = Σ(Xᵢ − X̄)²/(n − 1)

It may seem logical to divide the sum of squared deviations by n rather than n − 1, because we are seeking the average squared deviation from the sample mean. The reason for dividing by n − 1 is explained by the concept of degrees of freedom: if we divide by n − 1, S² is unbiased, and if we divide by n, it becomes biased.

Note: Although S² is an unbiased estimator of the population variance σ², the sample standard deviation S is not an unbiased estimator of the population standard deviation σ. The bias is small, and S is used as an estimator relying on the fact that S² is the unbiased estimator of σ².
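The n versus n − 1 claim can be checked exactly on a tiny population by enumerating every possible sample (illustrative numbers, not from the text): the n − 1 divisor reproduces the population variance on average; the n divisor comes out too small.

```python
from itertools import product
from statistics import mean

# Tiny population with variance 1.25; enumerate all samples of size 2.
population = [1, 2, 3, 4]
mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # 1.25

def sample_var(xs, divisor):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

samples = list(product(population, repeat=2))
e_unbiased = mean(sample_var(s, len(s) - 1) for s in samples)  # divide by n - 1
e_biased = mean(sample_var(s, len(s)) for s in samples)        # divide by n

print(e_unbiased, sigma2)  # -> 1.25 1.25 (matches: unbiased)
print(e_biased)            # -> 0.625     (too small: biased)
```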

Degrees of freedom
The number of degrees of freedom is equal to the total number of measurements (these are not always raw data points), less the total number of restrictions on the measurements. A restriction is a quantity computed from the measurements.

e.g. Given the values 10, 12, 16 and 18, the mean is 14. Once the mean is known, only three of the four values are free to vary; the last one is determined: (10 + 12 + 16 + x)/4 = 14 gives x = 18. So there are n − 1 = 3 degrees of freedom.

If we have two samples and we know their means, the degrees of freedom become n1 + n2 − 2.


Topic 3: TESTING OF HYPOTHESES

Concept of Hypothesis
A hypothesis is a proposition that we want to verify. We collect relevant information, process it using statistical techniques, and then test the proposition. Hypotheses help us make proper decisions and are very helpful in examining the validity of theories. A hypothesis is not always necessary, except for a problem-oriented study.

There are two types of hypotheses: null and alternative. Amir (1989) defines a null hypothesis as an assertion about one or more population parameters. This is the assertion we hold as true until we have sufficient statistical evidence to conclude otherwise.


Normally the null hypothesis is denoted by H0. This is the hypothesis of no effect or no difference. Consider the current saga in Bagamoyo District Council, where some officials are accused of misusing funds: if they are taken before a magistrate, then (before the verdict) the persons are presumed not to have misused funds. So the statement "Bagamoyo District Council officials did not misuse funds" is called the null hypothesis.

The alternative hypothesis, denoted by H1, is the assertion covering all situations not covered by the null hypothesis (Amir, 1989). Beri (2003) defines the alternative hypothesis as the opposite of the null hypothesis. From the example of the Bagamoyo saga, the alternative hypothesis is "Bagamoyo District Council officials misused funds".

Generally, whenever a null hypothesis is specified, the alternative hypothesis must also be specified. It should be noted that the null and alternative hypotheses cannot both be true at once. There are only two ways we can conclude about a proposition: either we do not reject the null hypothesis, which means the alternative hypothesis is unsupported, or we reject the null hypothesis and accept the alternative hypothesis.

It is possible to have two or more alternative hypotheses, but they should be tested one at a time against the null hypothesis.

In both the null and the alternative hypothesis, sample statistics such as X̄ are not used; instead, population parameters such as μ appear.

Example (statistical example)
Consider a drug manufacturing company that has installed a machine that automatically fills 5 grams into each small bottle.
Solution
At the beginning we assume that what the company claims is true. Thus:

H0: μ = 5 grams
H1: μ ≠ 5 grams

Procedure in hypothesis testing
There are five steps involved in testing a hypothesis.

1. Formulate the hypotheses. This is the first step, where the two hypotheses, H0 and H1, are set up.

2. Set up a suitable significance level. In testing the validity of a hypothesis we need a certain level of significance. The confidence with which a null hypothesis is rejected or accepted depends upon the significance level used for the purpose. E.g., a significance level of 5% means that we run about a 5% risk of making a wrong decision: accepting a false hypothesis or rejecting a true hypothesis.

3. Select the test criterion. Selection of the appropriate statistical technique as a test criterion is the third step. There are many statistical tests, e.g. the z-test for n > 30 and the t-test for n < 30. The statistical tests normally used in hypothesis testing are Z, t, F and χ².

4. Compute. Computation of the test statistic and other necessary quantities.

5. Make a decision. This is the final step, where a statistical decision is made to accept or reject the null hypothesis. This depends on whether the computed value of the test criterion falls in the region of acceptance or in the region of rejection at the given level of significance. The statement "rejecting the hypothesis" is stronger than the statement "accepting it": it is much easier to prove something false than to prove it true.

Often, we wish to test the null hypothesis and see whether we can reject it in favour of the alternative hypothesis. In a test of the value of a population parameter we normally employ a test statistic.

A test statistic is a sample statistic computed from the data. The value of the test statistic is used in determining whether or not we may reject the null hypothesis.

We decide whether or not to reject the null hypothesis by following a rule called the decision rule.

The decision rule of a statistical hypothesis test is a rule that specifies the conditions under which the null hypothesis may be rejected.

Two types of errors in hypothesis testing
In testing a hypothesis there are four possibilities:

1. The hypothesis is true but our test leads to its rejection
2. The hypothesis is false but our test leads to its acceptance
3. The hypothesis is true and our test leads to its acceptance
4. The hypothesis is false and our test leads to its rejection


The first two lead to an erroneous decision. The first possibility leads to a Type I error and the second possibility leads to a Type II error.

State of Nature

Decision       H0 is true          H0 is false
Accept H0      Correct decision    Type II error
Reject H0      Type I error        Correct decision

i.e.
α = P(Reject H0 ; H0 is true)
β = P(Accept H0 ; H0 is false)

Note
The word “accept” above needs care. Usually, before carrying out the actual test to try to reject the null hypothesis, the probability α that we will make a Type I error is known. This probability is preset small, say 0.05. Knowing the probability of making a Type I error, i.e. of rejecting a null hypothesis which should not be rejected, makes our rejection of a null hypothesis a strong conclusion.

We cannot say that we are accepting the null hypothesis because we do not know the probability β of making a Type II error, i.e. of failing to reject a false null hypothesis; this is a weak conclusion.

When we reject the null hypothesis, we feel fairly confident that the hypothesis should indeed be rejected. When we fail to reject the null hypothesis, we feel that we did not have enough evidence to reject the hypothesis. Either the null hypothesis is indeed true, or more evidence is needed for it to be rejected.

We emphasize, however, that “accept” will mean that there is not enough evidence to reject the null hypothesis.

Note
The level of significance has a big role in committing either of these two errors. If we choose a level of significance which is very small (we are avoiding a Type I error), we increase the probability of committing a Type II error. Similarly, if the level of significance is high (avoiding a Type II error), there is an increased chance of making a Type I error. The solution is to choose a level of significance which is neither too small nor too big. The only way to reduce both errors at once is to increase the sample size.

Definition


The level of significance of a statistical hypothesis test is α, the probability of committing a Type I error.

Definition
The rejection region of a statistical hypothesis test is the range of numbers that will lead us to reject the null hypothesis in case the test statistic falls within this range. The rejection region, also called the critical region, is defined by the critical points. The rejection region is designed so that, before the sampling takes place, our test statistic will have a probability α of falling within the rejection region if the null hypothesis is true.

[Figure: a two-tail test, showing the acceptance region in the centre flanked by two rejection regions, separated by the tabulated (critical) values.]

Definition
The acceptance region is the range of values that will lead us not to reject the null hypothesis if the test statistic should fall within this region. The acceptance region is designed so that, before the sampling takes place, our test statistic will have a probability 1 − α of falling in the acceptance region if the null hypothesis is true.

Tails of a test
The rejection region in a hypothesis test can be on both sides of the curve, with the non-rejection region in between the two rejection regions.

A hypothesis test with two rejection regions is called a two-tail test, and a test with one rejection region is called a one-tail test. The single rejection region can be on either side: right (right-tail test) or left (left-tail test).

How do we find out whether a particular test is a two-tail, right-tail or left-tail test?


Signs in the tails of a test

                    Two-tail test     Left-tail test      Right-tail test
Sign in H0          =                 = or ≥              = or ≤
Sign in H1          ≠                 <                   >
Rejection region    in both tails     in the left tail    in the right tail

e.g.

Note: We say that a statistical result is significant at level of significance α if the result causes us to reject our null hypothesis when we carry out the test using level of significance α.

Testing Hypotheses about the Mean (large sample)
Consider the problem of testing the hypothesis that the mean μ of a population with known variance σ² equals a specified value μ0, against the two-sided alternative that the mean is not equal to μ0.

i.e. H0: μ = μ0 against H1: μ ≠ μ0.

An appropriate statistic on which we base our decision criterion is the random variable X̄. By using the significance level α, it is possible to find two critical values, x̄1 and x̄2, such that the interval x̄1 < x̄ < x̄2 defines the acceptance region, while the two tails of the distribution, x̄ < x̄1 and x̄ > x̄2, constitute the critical region.

The critical region is given in terms of Z values by means of the transformation Z = (X̄ − μ0)/(σ/√n).

Hence, with a given level of significance α, the acceptance region is −z(α/2) < Z < z(α/2).


From the population we select a random sample of size n and compute the sample mean x̄.

Example
A company manufacturing automobile tyres finds that tyre life is normally distributed with a mean of 40,000 km and a standard deviation of 3,000 km. It is believed that a change in the production process will result in a better product, and the company has developed a new tyre. A sample of 100 new tyres has been selected. The company has found that the mean life of these new tyres is 40,900 km. Can it be concluded that the new tyre is significantly better than the old ones, using a significance level of 0.01?

Solution
We are interested in testing whether or not there has been an increase in the mean life of the tyres.

Steps

1. H0: μ = 40,000 km; H1: μ > 40,000 km (right-tail test)

2. The significance level is 0.01.
3. The test criterion is the Z-test.
4. Computation: Z = (40,900 − 40,000)/(3,000/√100) = 900/300 = 3.0

5. z tabulated = 2.326 (right tail, α = 0.01). Comparing with z computed, we see that z computed is greater than z tabulated, so we reject the null hypothesis.


i.e. since 3.0 > 2.326 we reject the null hypothesis that μ = 40,000 km. That means that the new tyre is significantly better than the old ones.
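The five steps for this example can be sketched in Python (standard library only). The data are from the example above; the critical value 2.326 for a right-tail test at α = 0.01 is taken from the normal table:

```python
import math

# Step 1: H0: mu = 40000 km   H1: mu > 40000 km (right-tail test)
mu0, sigma, n, xbar = 40_000, 3_000, 100, 40_900

# Step 2: significance level
alpha = 0.01
z_critical = 2.326          # right-tail critical value from the normal table

# Steps 3-4: compute the Z test statistic
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(f"z computed = {z:.2f}")          # 3.00

# Step 5: decision
if z > z_critical:
    print("Reject H0: the new tyre is significantly better")
else:
    print("Do not reject H0")
```

The same skeleton works for any large-sample z-test: only the data and the critical value change.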

The power of a statistical test
The power of a statistical test, given as 1 − β = P(reject H0 when H0 is false), measures the ability of the test to perform as required.

The quantity 1 − β is called the power of the test.

When 1 − β is low (a value very close to zero) it is an indication that our hypothesis test is working poorly. In contrast, if 1 − β is large (very close to 1), we can be sure that our hypothesis test is working quite well.

The power of a statistical hypothesis test depends on the following factors;

The power depends on the distance between the value of the parameter under the null hypothesis and the true value of the parameter in question. The greater this distance, the greater the power.

The power depends on the population standard deviation. The smaller the population standard deviation, the greater the power.

The power depends on the sample size used. The larger the sample, the greater the power.

The power depends on the level of significance of the test. The smaller the level of significance α, the smaller the power.
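These factors can be seen numerically. The sketch below computes the power of a right-tail z-test from the standard normal CDF; all the numbers (μ0 = 100, σ = 10, and so on) are hypothetical values chosen only to illustrate the behaviour:

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_right_tail(mu0, mu1, sigma, n, z_alpha=1.645):
    """Power of the right-tail z test of H0: mu = mu0 when the true mean is mu1."""
    delta = (mu1 - mu0) / (sigma / math.sqrt(n))   # standardized distance
    return 1.0 - phi(z_alpha - delta)

# Hypothetical illustration of the factors listed above
base     = power_right_tail(mu0=100, mu1=102, sigma=10, n=50)
farther  = power_right_tail(mu0=100, mu1=104, sigma=10, n=50)    # greater distance
bigger_n = power_right_tail(mu0=100, mu1=102, sigma=10, n=200)   # larger sample
print(round(base, 3), round(farther, 3), round(bigger_n, 3))
```

Increasing either the distance μ1 − μ0 or the sample size raises the power, exactly as the list above states.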

Testing Hypotheses about the Mean (small sample)
The small-sample test statistic for the population mean μ is t = (x̄ − μ0)/(s/√n).

When the population is normally distributed and the null hypothesis is true, the test statistic has a t distribution with n − 1 degrees of freedom.

Example


A manufacturer of electric batteries claims that the average capacity of a certain type of battery that the company produces is at least 140 ampere-hours, with a standard deviation of 2.66 ampere-hours. An independent sample of 20 batteries gave a mean of 138.47 ampere-hours. Test at the 5 percent significance level the null hypothesis that the mean life is 140 ampere-hours against the alternative that it is lower. Can the manufacturer’s claim be sustained on the basis of this sample?

Solution

H0: The mean life of batteries is 140 ampere-hours
H1: The mean life of batteries is < 140 ampere-hours

Level of significance: α = 0.05

Test statistic: t

Computation: t = (138.47 − 140)/(2.66/√20) = −1.53/0.595 ≈ −2.57

Critical value: t(0.05, 19 df) = −1.729 (left tail). We reject the null hypothesis since t = −2.57 falls within the rejection region. Hence we conclude that the mean life is less than 140 ampere-hours, and the manufacturer’s claim cannot be sustained.
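A minimal Python check of this computation (the critical value 1.729 for 19 df is from the t table; the test is left-tailed, so the rejection region is t < −1.729):

```python
import math

# H0: mu = 140 ampere-hours   H1: mu < 140 (left-tail test)
mu0, s, n, xbar = 140.0, 2.66, 20, 138.47

t = (xbar - mu0) / (s / math.sqrt(n))
t_critical = -1.729        # t(0.05, 19 df) from the t table, left tail
print(f"t computed = {t:.2f}")

if t < t_critical:
    print("Reject H0: mean life is below 140 ampere-hours")
```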

Testing Hypotheses about the DIFFERENCE BETWEEN TWO POPULATION Means
A test about an individual mean is referred to as a one-sample test. In some cases we are required to test whether there is any difference between two means; in such a case we need a sample from each group. This is known as a two-sample test.

The procedure for testing the hypothesis is similar to that used in one-sample tests. Here, we have two populations and our concern is to test the claim as to a difference in their means.


e.g. the government may claim that there is no difference between the average monthly pension of its central and local government retired employees.

From the example we have the average monthly pension μ1 for central government employees and μ2 for local government employees. We take random samples of sizes n1 and n2 and determine their means x̄1 and x̄2 along with the sample standard deviations s1 and s2.

When n1 > 30 and n2 > 30, the Z statistic takes the following form:

Z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

when σ1 and σ2 are unknown.

Example
A potential buyer wants to decide which of two brands of electric bulbs he should buy, as he has to buy them in bulk. As a specimen, he buys 100 bulbs of each of the two brands – A and B. On using these bulbs, he finds that brand A has a mean life of 1,200 hours with a standard deviation of 50 hours and brand B has a mean life of 1,150 hours with a standard deviation of 40 hours. Do the two brands differ significantly in quality? Use α = 0.05.

Solution
Step 1

H0: μ1 = μ2; H1: μ1 ≠ μ2, where μ1 = mean life of brand A bulbs and μ2 = mean life of brand B bulbs.

Step 2
Level of significance: α = 0.05

Step 3
Test statistic: Z

Step 4: Computations
Z = (1,200 − 1,150) / √(50²/100 + 40²/100) = 50/√41 ≈ 7.81


Step 5: Decision
This is a two-tail test, so the tabulated Z value is ±1.96. The calculated Z value falls in the rejection region; we therefore reject the null hypothesis and conclude that the bulbs of the two brands differ significantly in quality.
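The computation can be verified with a short script (standard library only; data from the example above):

```python
import math

# H0: mu_A = mu_B   H1: mu_A != mu_B (two-tail test)
x1, s1, n1 = 1200, 50, 100   # brand A
x2, s2, n2 = 1150, 40, 100   # brand B

z = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
print(f"z computed = {z:.2f}")

z_critical = 1.96            # two-tail critical value, alpha = 0.05
if abs(z) > z_critical:
    print("Reject H0: the two brands differ significantly in quality")
```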

When n1 < 30 or n2 < 30, the t-test is used:

t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2)), where the pooled variance is sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Testing Hypotheses for the population proportion (large sample)

We know that, when the sample size is large, the distribution of the sample proportion p̂ may be approximated by a normal distribution with mean p and standard deviation √(pq/n); recall that this holds when np ≥ 5 and nq ≥ 5. The test statistic we use is Z:

Z = (p̂ − p0) / √(p0 q0 / n), or Z = (x − n p0) / √(n p0 q0) (binomial approximation)

We use p0, the hypothesized value of p under the null hypothesis.

Example
A commonly prescribed drug on the market for relieving nervous tension is believed to be only 60% effective. Experimental results with a new drug, administered to a random sample of 100 adults who were suffering from nervous tension, showed that 70 received relief. Is this sufficient evidence to conclude that the new drug is superior to the one commonly prescribed? Use α = 0.05.

Solution
H0: p = 0.6; H1: p > 0.6 (right-tail test)


Critical region: Z > 1.645 (right tail, α = 0.05)

Computed: Z = (0.70 − 0.60) / √(0.60 × 0.40 / 100) ≈ 2.04

Decision: Reject the null hypothesis and conclude that the new drug is superior (z computed > z tabulated).
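A quick check of this test in Python (the critical value 1.645 is the right-tail normal value for α = 0.05):

```python
import math

# H0: p = 0.60   H1: p > 0.60 (right-tail test)
p0, n, x = 0.60, 100, 70
p_hat = x / n

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(f"z computed = {z:.2f}")

z_critical = 1.645           # right tail, alpha = 0.05
if z > z_critical:
    print("Reject H0: the new drug is superior")
```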

Testing Hypotheses for the DIFFERENCE BETWEEN TWO proportions
The test statistic z for tests concerning the difference between two population proportions:

When H0: p1 = p2, the test statistic z is

z = (p̂1 − p̂2) / √(p̂ q̂ (1/n1 + 1/n2)), where the pooled estimate is p̂ = (x1 + x2)/(n1 + n2) and q̂ = 1 − p̂.

e.g.
You obtain a large number of components to an identical specification from two sources. You notice that some of the components are from the supplier’s own plant in Msalato and some are from the plant located at Makuru. You would like to know whether the proportions of defective components are the same or whether there is a difference between the two. You take a random sample of 600 components from each plant and find that the rejection rate is p̂1 = 0.015 for Msalato components as compared to p̂2 = 0.017 for Makuru components. Set up the null hypothesis and test it at the 5 percent level of significance.

Solution

H0: p1 = p2; H1: p1 ≠ p2, where p1 and p2 are the proportions of defective components from Msalato and Makuru respectively.
This is a two-tail test.
Level of significance is 0.05; both samples are large, so Z tabulated is ±1.96.


Z computed = (0.015 − 0.017) / √(0.016 × 0.984 × (1/600 + 1/600)) ≈ −0.28, with pooled p̂ = (600 × 0.015 + 600 × 0.017)/1200 = 0.016.

We do not reject the null hypothesis since z computed does not fall in the rejection region. Thus, there is no difference in the rejection rates of components from Msalato and Makuru.
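The pooled-proportion computation can be verified as follows (data from the example above):

```python
import math

# H0: p1 = p2   H1: p1 != p2 (two-tail test)
p1_hat, n1 = 0.015, 600   # Msalato
p2_hat, n2 = 0.017, 600   # Makuru

# pooled proportion estimate
p_bar = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se
print(f"z computed = {z:.2f}")

if abs(z) <= 1.96:
    print("Do not reject H0: no difference in rejection rates")
```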

Testing Hypotheses for POPULATION VARIANCE

Sometimes we may be interested in drawing a conclusion on whether the population variance exceeds some level. The test statistic for the population variance is

χ² = (n − 1)s² / σ0²

where σ0² is the value of the variance stated in the null hypothesis. Testing with the chi-square distribution requires the assumption of a normally distributed population. We reject the null hypothesis if chi-square computed > chi-square tabulated.

Example
A machine makes small metal plates that are used in batteries for electronic games. The diameter of a plate is a random variable with mean 5 mm. As long as the variance of the diameters of the plates is at most 1.00 (mm²), the production process is under control and the plates are acceptable. If, however, the variance exceeds 1.00, the machine must be repaired. The engineer collects a random sample of 31 plates and computes the sample variance. Is there evidence that the variance of the production process is above 1.00?

Solution
The quality control engineer wants, therefore, to test the following hypotheses:
H0: σ² ≤ 1.00; H1: σ² > 1.00


Reading the chi-square table with α = 0.05 and 30 df, the value is 43.77: reject the null hypothesis.
For α = 0.025, chi-square tabulated is 46.98: reject the null hypothesis.
For α = 0.01, chi-square tabulated is 50.89: do not reject the null hypothesis.

The variance of production is judged to exceed 1 at significance levels down to α = 0.025, but not at α = 0.01. So at these levels of significance it is better to stop the machine and do a service.
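The sample variance is missing from the notes, so the sketch below uses a hypothetical value s² = 1.62, chosen only to illustrate the computation and to reproduce the pattern of decisions above; the table values for 30 df are those quoted in the solution:

```python
# H0: sigma^2 <= 1.00   H1: sigma^2 > 1.00 (right-tail chi-square test)
# NOTE: s2 = 1.62 is a hypothetical sample variance (the value is elided
# in the source notes) used only for illustration.
n = 31
s2 = 1.62
sigma0_sq = 1.00

chi2 = (n - 1) * s2 / sigma0_sq
print(f"chi-square computed = {chi2:.2f}")   # 48.60

# Chi-square table values for 30 df
for alpha, crit in [(0.05, 43.77), (0.025, 46.98), (0.01, 50.89)]:
    verdict = "reject H0" if chi2 > crit else "do not reject H0"
    print(f"alpha = {alpha}: {verdict}")
```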

Note:
As the degrees of freedom increase, χ² approaches a normal distribution with mean df and variance 2df (by the central limit theorem).

e.g. if we have a χ² random variable with 150 df, the normal approximation has mean 150 and standard deviation √300 ≈ 17.32 (because the variance is twice the df).

Z computed is (χ² − 150)/17.32; Z tabulated is ±1.96 (if it is a two-tail test).

Testing Hypotheses for the difference between two VARIANCES
In measuring whether two independent populations have the same variability, we use the F-test, which is the ratio of the two sample variances. The populations are assumed to be normally distributed.

The F-test statistic for testing the equality of two variances is given below:

F = s1² / s2²

where s1² is the variance of sample 1 and s2² is the variance of sample 2. The test statistic F follows an F distribution with n1 − 1 and n2 − 1 degrees of freedom.

e.g.
Suppose a company manufacturing light bulbs is using two different processes, A and B. The life of the light bulbs of process A has a normal distribution with mean μ1 and standard deviation σ1. Similarly, for process B, it is μ2 and σ2. The data pertaining to the two processes are given below.

Test whether the variability of the two processes is the same.

Solution

Test statistic is F

Computations: F = 1.46

This is a two-tail test, so 1.46 is compared with the tabulated F value of 2.20. As 2.20 is greater than 1.46, we do not reject the null hypothesis, indicating that there is no significant difference in the variability of the two samples.

THE p-VALUE
So far we have been arbitrarily specifying the level of significance. As such, mere acceptance or rejection of a hypothesis fails to show the full strength of the sample evidence.

The alternative is to use the p-value approach.

Example
Let n = 600.


If we let α = 0.01, the critical point is 2.326; we do not reject the null hypothesis.

Again, if α = 0.05, the critical point is 1.645; we do not reject the null hypothesis.

Question: is it not possible to reject the null hypothesis at a value of α larger than 0.1? Answer: we can. Simply compute the smallest possible α at which we may reject the null hypothesis.

If that is the case, at which level of α can we reject the null hypothesis, given that the value of our test statistic is z = 0.519, if we insist on rejecting the null hypothesis?

[Figure: standard normal curve showing the test statistic value z = 0.519 and a rejection region of area 0.10 to the right of the critical point 1.28.]

Note: the area to the right of 1.28 is 0.10, which is why we could reject the null hypothesis at the 0.10 level even if our computed test statistic were as small as 1.28.

From that concept, let us find the area to the right of the computed value, that is, the area to the right of z = 0.519. This area represents the smallest probability of a Type I error, the smallest possible level α at which we may reject the null hypothesis.

Reading z = 0.519 in the standard normal probability tables gives 0.3018, i.e. 0.5 − 0.1982 (through interpolation), or 0.3015 if we use the nearest tabulated value. This number is called the p-value. The number 0.3018 means that, assuming the null hypothesis is true, there is a 0.3018 probability of obtaining a test statistic value as extreme as the one we have (0.519) or more extreme, i.e. further to the right (one tail) of z = 0.519. Since 0.3018 is greater than the usual significance levels 0.1, 0.05 and 0.01, we accept the null hypothesis.

[Figure: standard normal curve for a right-hand tail test, showing the test statistic value z = 0.519 and the p-value (0.3018) as the area to the right of the test statistic.]

Definitions

The p-value is the observed level of significance, which is the smallest value of α at which H0 can be rejected.

Or

The p-value is the probability of obtaining a value of the test statistic as extreme as, or more extreme than, the actual value obtained, when the null hypothesis is true.

p-value decision rule
1. If the p-value ≥ α, H0 is not rejected [do not reject H0 if p-value ≥ α]
2. If the p-value < α, H0 is rejected [reject H0 if p-value < α]
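For a z statistic the p-value can be computed directly from the standard normal CDF, which the Python standard library exposes via the error function. The sketch below reproduces the right-tail example above (z = 0.519):

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Right-tail test from the example above
z = 0.519
p_value = 1.0 - phi(z)
print(f"p-value = {p_value:.4f}")   # about 0.302, matching the table value

alpha = 0.05
if p_value >= alpha:
    print("Do not reject H0")
else:
    print("Reject H0")
```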

Some rules of thumb developed by statisticians as aids in interpreting p-values:

When the p-value is smaller than 0.01, the result is called very significant.
When the p-value is between 0.01 and 0.05, the result is called significant.
When the p-value is between 0.05 and 0.10, the result is considered by some as marginally significant (and by others as not significant).
When the p-value is greater than 0.10, the result is considered by most as not significant.

The p-value gets smaller as the test statistic falls further away in the tail of the distribution. Thus, even if we are unable to compute the p-value exactly, we may have an idea about its size. Suppose z = 120.97; this implies that the p-value is an extremely small number, hence we reject the null hypothesis with much conviction.

Conversely, the closer our test statistic is to the centre of the sampling distribution, the larger the p-value; hence we may be more convinced that we do not have enough evidence to reject the null hypothesis and should therefore accept it.


p-values for t, chi-square and other distributions
In the case of statistical tests where the sampling distribution of the test statistic is the t-distribution, exact p-values are not obtained because the table contains values for only a few selected standard values of α, such as 0.01 and 0.05.

In such situations we make relative statements about the p-value. For example, if we have a left-hand-tailed test with test statistic t = −2.4 and df = 15, we find that the value 2.4 for a t random variable with 15 degrees of freedom falls between the two values 2.131 and 2.602, corresponding to one-tail areas of 0.025 and 0.01 respectively. We may therefore conclude that the p-value is between 0.01 and 0.025.

The same should apply for other distributions.
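Bracketing a p-value from a printed table can be automated. The sketch below uses one-tail critical values for 15 df copied from a standard t table and reproduces the conclusion above (0.01 < p < 0.025):

```python
# One-tail critical values of t with 15 degrees of freedom, taken from a
# standard t table (alpha: critical value).
t_table_15df = {0.10: 1.341, 0.05: 1.753, 0.025: 2.131, 0.01: 2.602, 0.005: 2.947}

def bracket_p_value(t_stat, table):
    """Return (lower, upper) bounds on the one-tail p-value for |t_stat|."""
    t_abs = abs(t_stat)
    lower, upper = 0.0, 1.0
    for alpha, crit in sorted(table.items()):
        if t_abs >= crit:
            upper = min(upper, alpha)   # more extreme than this critical value
        else:
            lower = max(lower, alpha)   # less extreme than this critical value
    return lower, upper

lo, hi = bracket_p_value(-2.4, t_table_15df)
print(f"{lo} < p-value < {hi}")   # 0.01 < p-value < 0.025
```

The same lookup idea applies to chi-square and F tables.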

Two-Tailed tests
In a two-tailed test, we find the p-value by doubling the area in the tail of the distribution beyond the value of the test statistic. E.g. if the p-value is 0.1131 for one tail, then for two tails it is 2(0.1131) = 0.2262.

Refer to the concept of critical values of Z:

Level of significance α     0.10       0.05       0.01
One-tailed test             ±1.28      ±1.645     ±2.326
Two-tailed test             ±1.645     ±1.96      ±2.576

TESTS INVOLVING FINITE POPULATIONS

For a finite population, use the correction factor √((N − n)/(N − 1)) by multiplying it with the standard error, so long as the sample size n represents 5% or more of the population.

Sample size determination for hypothesis tests:

The minimum required sample size in hypothesis tests of μ, to satisfy a given significance level and a given power, is

n = (zα + zβ)² σ² / (μ0 − μ1)²


where zα and zβ are the required z values determined by the probabilities α and β, respectively, and are used in their absolute-value form. The values μ0 and μ1 are the population mean under the null hypothesis and the value of the population mean under the alternative hypothesis at which the specified power is needed, respectively.

The minimum required sample size in hypothesis tests of p, to satisfy a given significance level and a given power, is

n = [zα √(p0 q0) + zβ √(p1 q1)]² / (p0 − p1)²

where zα and zβ are the required z values determined by the probabilities α and β, respectively, and are used in their absolute-value form. The values p0 and p1 are the null-hypothesized population proportion and the value of p under the alternative hypothesis at which the stated power is needed, respectively.
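Both formulas are easy to evaluate. The numbers below (α = 0.05 giving z = 1.645, power 0.90 giving zβ = 1.28, and the μ and p values) are hypothetical, chosen only to illustrate the computation:

```python
import math

def sample_size_mean(z_alpha, z_beta, sigma, mu0, mu1):
    """Minimum n for a test of the mean at given alpha and power (illustrative sketch)."""
    n = ((abs(z_alpha) + abs(z_beta)) * sigma / (mu0 - mu1)) ** 2
    return math.ceil(n)

def sample_size_proportion(z_alpha, z_beta, p0, p1):
    """Minimum n for a test of a proportion at given alpha and power."""
    num = abs(z_alpha) * math.sqrt(p0 * (1 - p0)) + abs(z_beta) * math.sqrt(p1 * (1 - p1))
    return math.ceil((num / (p0 - p1)) ** 2)

# Hypothetical illustration: alpha = 0.05 (z = 1.645), power = 0.90 (z_beta = 1.28)
n_mean = sample_size_mean(1.645, 1.28, sigma=10, mu0=100, mu1=105)
n_prop = sample_size_proportion(1.645, 1.28, p0=0.50, p1=0.60)
print(n_mean, n_prop)   # 35 211
```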

GOODNESS-OF-FIT TEST
So far we have been testing statistical hypotheses about single population parameters such as μ and p. Now we consider a test of whether a population has a specified theoretical distribution. The test is based upon how good a fit we have between the frequency of occurrence of observations in an observed sample and the expected frequencies obtained from the hypothesized distribution. We need to estimate how accurately the fitted function approximates the observed distribution.

For binned data, one typically applies a χ² statistic to estimate the fit quality. It should be noted that the applicability of the χ² statistic is limited: the test is neither capable of nor expected to detect fit deficiencies for all possible problems. It is a powerful and versatile tool, but it should not be considered the ultimate solution to every goodness-of-fit problem.

e.g. Consider the tossing of a die. If we hypothesized that the toss is fair (uniform distribution of outcomes) and the die is tossed 120 times, then we expect that each face will occur 20 times.


Face       1    2    3    4    5    6
Observed   20   22   17   18   19   24
Expected   20   20   20   20   20   20

By comparing the observed frequencies with the corresponding expected frequencies we must decide whether these discrepancies are likely to occur due to sampling fluctuations and the die is balanced, or the die is not honest and the distribution of outcomes is not uniform.

The appropriate statistic on which we base our decision criterion for an experiment involving k cells is defined as follows. A goodness-of-fit test between observed and expected frequencies is based on the quantity

χ² = Σ (Oi − Ei)² / Ei

where χ² is a value of the random variable whose sampling distribution is approximated very closely by the chi-square distribution. The symbols Oi and Ei represent the observed and expected frequencies, respectively, for the ith cell.

If the observed frequencies are close to the corresponding expected frequencies, the χ² value will be small, indicating a good fit (accept H0). If the observed frequencies differ considerably from the expected frequencies, the χ² value will be large and the fit is poor (reject H0); χ² > χ²α constitutes the critical region.

The decision criterion should be used only when each expected frequency is at least equal to 5.

The degrees of freedom depend on two factors: the number of cells in the experiment and the number of quantities obtained from the observed data that are necessary in the calculation of the expected frequencies.

The number of degrees of freedom in a chi-square goodness-of-fit test is equal to the number of cells minus the number of quantities obtained from the observed data which are used in the calculations of the expected frequencies.
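The die-tossing example above can be checked with a short computation (the critical value 11.07 for 5 df at α = 0.05 is from the chi-square table):

```python
# Goodness-of-fit for the die-tossing example: 120 tosses, uniform H0.
observed = [20, 22, 17, 18, 19, 24]
expected = [20] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square computed = {chi2:.2f}")   # 1.70

# 6 cells, no parameters estimated from the data -> 5 degrees of freedom
chi2_critical = 11.07   # chi-square(0.05, 5 df) from the table
if chi2 <= chi2_critical:
    print("Do not reject H0: the die appears to be fair")
```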

E.g. Uniform Distribution example
In a uniform distribution, the probabilities for each expected value are the same. When the data are discrete, the expected value for each category is obtained by dividing the total number of observations by the number of categories. When the data are continuous, a suitable number of classes must first be determined. The total number of observations can then be divided equally among the classes.

Fifty students were randomly selected and asked to state their preference for one of five candy bars. The results are shown below.

Candy Bar   A    B    C    D    E
Number      8    12   9    11   10

Can we conclude that the students do not prefer one candy bar over another? In other words, can we conclude that the preference for candy bars is uniformly distributed?

Solution
If all candy bars were equal in terms of preference, the results from the survey would have been as follows:

Candy Bar   A    B    C    D    E
Number      10   10   10   10   10

Hypotheses
H0: the five candy bars are equally preferred
H1: the five candy bars are not equally preferred

Candy Bar   A    B    C    D    E
Number      8    12   9    11   10
Expected    10   10   10   10   10


χ² = (8−10)²/10 + (12−10)²/10 + (9−10)²/10 + (11−10)²/10 + (10−10)²/10 = 1.0. With 5 − 1 = 4 degrees of freedom and α = 0.05, chi-square tabulated is 9.488. Since chi-square computed < chi-square tabulated, we accept the null hypothesis: the preference for candy bars is uniformly distributed.

Test for Independence
- The chi-square test is also used to test for the independence of two variables.
- The observed frequencies are presented in a row × column contingency table.
- The expected frequency for each cell is E = (row total × column total) / grand total.

χ² = Σ (O − E)² / E, where the summation extends over all row × column cells in the row × column contingency table. If χ² computed > χ²α with (rows − 1)(columns − 1) degrees of freedom, reject the null hypothesis of independence at the α level of significance; otherwise, accept the null hypothesis.

Test for Homogeneity
The chi-square statistic for testing independence is also applicable when testing the hypothesis that k binomial populations have the same parameter p. Hence we are interested in testing the hypothesis H0: p1 = p2 = … = pk against the alternative hypothesis that the population proportions are not all equal.

To perform this test we first select independent random samples of sizes n1, n2, …, nk from the k populations and arrange the data in a 2 × k contingency table.

The expected cell frequencies are calculated as above and substituted, together with the observed frequencies, into the chi-square formula for independence, with (2 − 1)(k − 1) = k − 1 degrees of freedom.

The conclusion is reached by comparing the computed chi-square value with the tabulated value, as before.
