ESTIMATION OF STATISTICAL PARAMETERS

Estimation theory is a branch of statistics based on measured/empirical data that has a random component.

An estimator attempts to approximate the unknown parameters using the measurements.

In statistics, estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample.

OUTLINE

Objectives:

• Describe the characteristics of the normal distribution in statistical terms

• Explain the concept of a confidence interval and how it relates to an estimated parameter

Point Estimate vs. Interval Estimate

Statisticians use sample statistics to estimate population parameters.

For example:

• sample means are used to estimate population means;

• sample proportions are used to estimate population proportions.

An estimate of a population parameter may be expressed in two ways: as a point estimate or as an interval estimate.

Point estimate. A point estimate of a population parameter is a single value of a statistic.

For example, the sample mean x̄ is a point estimate of the population mean μ.

Similarly, the sample proportion p is a point estimate of the population proportion P.
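As a minimal sketch of the two point estimates named above, the snippet below computes a sample mean and a sample proportion. The data values and variable names are hypothetical and serve only as illustration.

```python
from statistics import mean

# Hypothetical sample (illustrative values, not from the lecture)
values = [118, 125, 130, 121, 116]           # a quantitative variable
has_trait = [True, False, True, True, False]  # a yes/no (qualitative) variable

x_bar = mean(values)                      # point estimate of the population mean
p_hat = sum(has_trait) / len(has_trait)   # point estimate of the population proportion

print(x_bar, p_hat)
```

Each result is a single number, which is what distinguishes a point estimate from an interval estimate.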

Point Estimate vs. Interval Estimate

Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie.

For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.

Confidence Intervals

Statisticians use a confidence interval to express the precision and uncertainty associated with a particular sampling method.

A confidence interval consists of three parts.

1. A confidence level.

2. A statistic.

3. A margin of error.

The confidence level describes the uncertainty of a sampling method.

The statistic and the margin of error define an interval estimate that describes the precision of the method.

The interval estimate of a confidence interval is defined by:

the sample statistic ± margin of error.

The probability part of a confidence interval is called a confidence level.

The confidence level describes how strongly we believe that a particular sampling method will produce a confidence interval that includes the true population parameter.

Standard Error

• To compute a confidence interval for a statistic, you need to know the standard deviation or the standard error of the statistic.

• This lesson describes how to find the standard deviation and standard error, and shows how the two measures are related.

Notation

The following notation is helpful when we talk about the standard deviation and the standard error.

Population parameter | Sample statistic
N: number of observations in the population | n: number of observations in the sample
μ: population mean | x̄: sample estimate of the population mean
σ: population standard deviation | s: sample estimate of σ

Standard Deviation of Sample Estimates

• Statisticians use sample statistics to estimate population parameters. Naturally, the value of a statistic may vary from one sample to the next.

• The variability of a statistic is measured by its standard deviation.

Statistic | Standard deviation
Sample mean, x̄ | $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

Statistic | Standard error
Sample mean, x̄ | $SE_{\bar{x}} = s / \sqrt{n}$

The equations for the standard error are identical to the equations for the standard deviation, except for one thing - the standard error equations use statistics where the standard deviation equations use parameters. Specifically, the standard error equations use p in place of P, and s in place of σ.
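A short sketch of the standard error of the mean, computed as s / sqrt(n) from a sample. The measurements are hypothetical; `stdev` from Python's standard library uses the n − 1 denominator, which is assumed here to match the slide's s.

```python
from math import sqrt
from statistics import stdev

# Hypothetical measurements (illustrative only)
sample = [4.1, 5.0, 4.6, 5.3, 4.8, 5.2]

n = len(sample)
s = stdev(sample)        # sample standard deviation, estimate of sigma
se_mean = s / sqrt(n)    # standard error of the sample mean

print(round(s, 3), round(se_mean, 3))
```

Because the population σ is unknown here, s stands in for it, which is exactly the substitution described above.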

Central Limit Theorem

• The distribution of sample means (sampling distribution) from a population is approximately normal if the sample size is large, i.e., $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ approximately.

1. The population distribution can be non-normal.

2. Given the population has mean μ, then the mean of the sampling distribution is $\mu_{\bar{x}} = \mu$.

3. If the population has variance σ², the standard deviation of the sampling distribution, or the standard error (a measure of the amount of sampling error), is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
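The Central Limit Theorem can be checked with a small simulation. The sketch below assumes an exponential population (clearly non-normal, with mean 1 and standard deviation 1); the population, sample size, and number of repetitions are illustrative choices, not part of the lecture.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)   # fixed seed so the run is repeatable
n = 50           # size of each sample
repeats = 5000   # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(repeats)
]

mean_of_means = mean(sample_means)   # should be close to the population mean
sd_of_means = pstdev(sample_means)   # should be close to sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(1 / sqrt(n), 3))
```

The spread of the sample means should land near 1 / sqrt(50), even though the individual observations are strongly skewed.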

Estimation & Confidence Intervals

• Normal distribution:

• Gaussian distribution
• Symmetric
• Not skewed
• Unimodal
• Described by two parameters: μ and σ

• Probability density function:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

• μ & σ are parameters
• μ = mean
• σ = standard deviation
• π, e = constants

http://www.shodor.org/interactivate/activities/NormalDistribution/?version=1.6.0_07&browser=Mozilla&vendor=Sun_Microsystems_Inc.&flash=10.0.22
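The normal density, f(x) = exp(−½((x − μ)/σ)²) / (σ·sqrt(2π)), can be evaluated directly. The function name `normal_pdf` below is hypothetical; the comparison against `statistics.NormalDist` from Python's standard library is only a sanity check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Compare the hand-written formula with the standard library
by_hand = normal_pdf(1.0, 0.0, 2.0)
by_library = NormalDist(mu=0.0, sigma=2.0).pdf(1.0)
print(by_hand, by_library)
```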

Estimation of Confidence Intervals

• Normal distribution: why do we use it?

• Many biological variables follow a normal distribution.
• The normal distribution is well understood mathematically.

Point Estimations

• A point estimate is a single value for the estimated theoretical parameter.

• m (sample mean) is a point estimate of μ (population mean).

• It is influenced by the fluctuations from sampling.
• It could be very far away from the real value of the estimated parameter.

Why Confidence Intervals?

We are not only interested in finding the point estimate for the mean, but also in determining how accurate the point estimate is.

The Central Limit Theorem plays a key role here. We assume that the sample standard deviation is close to the population standard deviation (which will almost always be true for large samples).

Then the Central Limit Theorem tells us that the standard deviation of the sampling distribution is

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$

We will be interested in finding an interval around x̄ such that there is a large probability that the actual mean falls inside of this interval.

This interval is called a confidence interval and the large probability is called the confidence level.

Definitions

A confidence interval is a range around the sample estimate in which the population parameter is expected to fall with a specified degree of confidence, usually 95% (that is, at a significance level of 5%).

P[lower critical value < population parameter < upper critical value] = 1 − α

α = significance level

The range defined by the critical values contains the population parameter with a probability of 1 − α.

It is applied when the variables are normally distributed.

Confidence Intervals

95% Confidence Interval for μ:

Definition 1:

You can be 95% sure that the true mean (μ) will fall within the upper and lower bounds.

Definition 2:

95% of the intervals constructed using sample means (x̄) will contain the true mean (μ).
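Definition 2 can be illustrated by simulation: draw many samples, build a 95% interval from each, and count how often the interval contains the true mean. The population values (μ = 100, σ = 15), the sample size, and the number of trials below are assumptions chosen only for illustration, and σ is treated as known.

```python
import random
from math import sqrt
from statistics import mean

random.seed(1)   # fixed seed so the run is repeatable
mu, sigma = 100.0, 15.0   # hypothetical "true" population values
n, z = 30, 1.96           # sample size and 95% critical value
trials = 2000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = mean(sample)
    margin = z * sigma / sqrt(n)          # margin of error, sigma known
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / trials   # fraction of intervals that contain mu
print(coverage)
```

The printed fraction should be close to 0.95, which is the long-run meaning of the confidence level.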

Confidence Intervals

• It is calculated taking into consideration:

• The sample or population size
• The type of investigated variable (qualitative or quantitative)

The formula comprises two parts:

I. An estimator of the quality of the sample on which the population estimate was computed (standard error)

• Standard error: a measure of how good our best guess is.
• Standard error: the bigger the sample, the smaller the standard error.
• Standard error: always smaller than the standard deviation (for n > 1).

II. Degree of confidence (Zα score)

A confidence interval can be calculated for any estimator, but it is most frequently used for the mean.

Confidence Intervals for Means

• The standard error of the mean is equal to the standard deviation divided by the square root of the number of observations: $SE = s / \sqrt{n}$

• If the standard deviation is high, the chance of error in the estimator is high.

• If the sample size is large, the chance of error in the estimator is small.

$\left[\bar{X} - Z_{\alpha}\frac{s}{\sqrt{n}},\ \bar{X} + Z_{\alpha}\frac{s}{\sqrt{n}}\right]$

Confidence Intervals for Means

• The lower confidence limit is smaller than the mean.

• The upper confidence limit is higher than the mean.

• For 95% confidence intervals: Z5% = 1.96

• For 99% confidence intervals: Z1% = 2.58

$\left[\bar{X} - Z_{\alpha}\frac{s}{\sqrt{n}},\ \bar{X} + Z_{\alpha}\frac{s}{\sqrt{n}}\right]$
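A minimal sketch of the interval X̄ ± Zα · s / sqrt(n) as a function. The name `z_interval` and the example inputs are hypothetical; the critical value is taken from `statistics.NormalDist().inv_cdf`, which gives about 1.96 for 95% and about 2.58 for 99%.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence=0.95):
    """Normal-based confidence interval for a mean (intended for large samples)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    margin = z * s / sqrt(n)                  # Z times the standard error
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: mean 50, standard deviation 10, 100 observations
lower, upper = z_interval(50.0, 10.0, 100)
print(round(lower, 2), round(upper, 2))
```

Raising `confidence` to 0.99 widens the interval, because the critical value grows while the standard error stays the same.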

Confidence Interval for a Mean When the Population Standard Deviation is Unknown

When the population is normal or the sample size is large, the sampling distribution will also be normal, but using s in place of σ is not that accurate.

The smaller the sample size the worse the approximation will be. Hence we can expect that some adjustment will be made based on the sample size. The adjustment we make is that we do not use the normal curve for this approximation.

Instead, we use the Student t distribution that is based on the sample size. We proceed as before, but we change the table that we use. This distribution looks like the normal distribution, but as the sample size decreases it spreads out. For large n it nearly matches the normal curve. We say that the distribution has n - 1 degrees of freedom.

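Confidence Intervals

CI for μ if n > 120:

90% CI: $\bar{x} \pm 1.65\,(s/\sqrt{n})$

95% CI: $\bar{x} \pm 1.96\,(s/\sqrt{n})$

99% CI: $\bar{x} \pm 2.58\,(s/\sqrt{n})$

CI for μ if n ≤ 120:

90% CI: $\bar{x} \pm t_{\alpha,\,n-1}\,(s/\sqrt{n})$, with α = 0.10

95% CI: $\bar{x} \pm t_{\alpha,\,n-1}\,(s/\sqrt{n})$, with α = 0.05

99% CI: $\bar{x} \pm t_{\alpha,\,n-1}\,(s/\sqrt{n})$, with α = 0.01

where $t_{\alpha,\,n-1}$ is read from table "t" at significance level α and n − 1 degrees of freedom, or obtained with the Excel function T.INV.2T(probability, deg_freedom).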

Table t: two-tailed critical values of the Student t distribution, by degrees of freedom (df) and significance level α

df | 0.05 | 0.01 | 0.001 | df | 0.05 | 0.01 | 0.001 | df | 0.05 | 0.01 | 0.001
2 | 4.3027 | 9.925 | 31.599 | 46 | 2.0129 | 2.687 | 3.515 | 89 | 1.987 | 2.632 | 3.403
3 | 3.1824 | 5.841 | 12.924 | 47 | 2.0117 | 2.6846 | 3.5099 | 90 | 1.987 | 2.632 | 3.402
4 | 2.7764 | 4.604 | 8.6103 | 48 | 2.0106 | 2.6822 | 3.5051 | 91 | 1.986 | 2.631 | 3.401
5 | 2.5706 | 4.032 | 6.8688 | 49 | 2.0096 | 2.68 | 3.5004 | 92 | 1.986 | 2.63 | 3.399
6 | 2.4469 | 3.707 | 5.9588 | 50 | 2.0086 | 2.6778 | 3.496 | 93 | 1.986 | 2.63 | 3.398
7 | 2.3646 | 3.5 | 5.4079 | 51 | 2.0076 | 2.6757 | 3.4918 | 94 | 1.986 | 2.629 | 3.397
8 | 2.306 | 3.355 | 5.0413 | 52 | 2.0066 | 2.6737 | 3.4877 | 95 | 1.985 | 2.629 | 3.396
9 | 2.2622 | 3.25 | 4.7809 | 53 | 2.0057 | 2.6718 | 3.4838 | 96 | 1.985 | 2.628 | 3.395
10 | 2.2281 | 3.169 | 4.5869 | 54 | 2.0049 | 2.67 | 3.48 | 97 | 1.985 | 2.628 | 3.394
11 | 2.201 | 3.106 | 4.437 | 55 | 2.004 | 2.6682 | 3.4764 | 98 | 1.985 | 2.627 | 3.393
12 | 2.1788 | 3.055 | 4.3178 | 56 | 2.0032 | 2.6665 | 3.4729 | 99 | 1.984 | 2.626 | 3.392
13 | 2.1604 | 3.012 | 4.2208 | 57 | 2.0025 | 2.6649 | 3.4696 | 100 | 1.984 | 2.626 | 3.391
14 | 2.1448 | 2.977 | 4.1405 | 58 | 2.0017 | 2.6633 | 3.4663 | 101 | 1.984 | 2.625 | 3.39
15 | 2.1314 | 2.947 | 4.0728 | 59 | 2.001 | 2.6618 | 3.4632 | 102 | 1.984 | 2.625 | 3.389
16 | 2.1199 | 2.921 | 4.015 | 60 | 2.0003 | 2.6603 | 3.4602 | 103 | 1.983 | 2.624 | 3.388
17 | 2.1098 | 2.898 | 3.9651 | 61 | 1.9996 | 2.6589 | 3.4573 | 104 | 1.983 | 2.624 | 3.387
18 | 2.1009 | 2.878 | 3.9216 | 62 | 1.999 | 2.6575 | 3.4545 | 105 | 1.983 | 2.624 | 3.386
19 | 2.093 | 2.861 | 3.8834 | 63 | 1.9983 | 2.6561 | 3.4518 | 106 | 1.983 | 2.623 | 3.385
20 | 2.086 | 2.845 | 3.8495 | 64 | 1.9977 | 2.6549 | 3.4491 | 107 | 1.982 | 2.623 | 3.384
21 | 2.0796 | 2.831 | 3.8193 | 65 | 1.9971 | 2.6536 | 3.4466 | 108 | 1.982 | 2.622 | 3.383
22 | 2.0739 | 2.819 | 3.7921 | 66 | 1.9966 | 2.6524 | 3.4441 | 109 | 1.982 | 2.622 | 3.382
23 | 2.0687 | 2.807 | 3.7676 | 67 | 1.996 | 2.6512 | 3.4417 | 110 | 1.982 | 2.621 | 3.381
24 | 2.0639 | 2.797 | 3.7454 | 68 | 1.9955 | 2.6501 | 3.4394 | 111 | 1.982 | 2.621 | 3.38
25 | 2.0595 | 2.787 | 3.7251 | 69 | 1.9949 | 2.649 | 3.4372 | 112 | 1.981 | 2.62 | 3.38
26 | 2.0555 | 2.779 | 3.7066 | 70 | 1.9944 | 2.6479 | 3.435 | 113 | 1.981 | 2.62 | 3.379
27 | 2.0518 | 2.771 | 3.6896 | 71 | 1.9939 | 2.6469 | 3.4329 | 114 | 1.981 | 2.62 | 3.378
28 | 2.0484 | 2.763 | 3.6739 | 72 | 1.9935 | 2.6459 | 3.4308 | 115 | 1.981 | 2.619 | 3.377
29 | 2.0452 | 2.756 | 3.6594 | 73 | 1.993 | 2.6449 | 3.4289 | 116 | 1.981 | 2.619 | 3.376
30 | 2.0423 | 2.75 | 3.646 | 74 | 1.9925 | 2.6439 | 3.4269 | 117 | 1.98 | 2.619 | 3.376
31 | 2.0395 | 2.744 | 3.6335 | 75 | 1.9921 | 2.643 | 3.425 | 118 | 1.98 | 2.618 | 3.375
32 | 2.0369 | 2.739 | 3.6218 | 76 | 1.9917 | 2.6421 | 3.4232 | 119 | 1.98 | 2.618 | 3.374
33 | 2.0345 | 2.733 | 3.6109 | 77 | 1.9913 | 2.6412 | 3.4214 | 120 | 1.98 | 2.617 | 3.374
34 | 2.0322 | 2.728 | 3.6007 | 78 | 1.9908 | 2.6403 | 3.4197 | >120 | 1.96 | 2.576 | 3.291
35 | 2.0301 | 2.724 | 3.5911 | 79 | 1.9905 | 2.6395 | 3.418 | | | |
36 | 2.0281 | 2.72 | 3.5821 | 80 | 1.9901 | 2.6387 | 3.4163 | | | |
37 | 2.0262 | 2.715 | 3.5737 | 81 | 1.9897 | 2.6379 | 3.4147 | | | |
38 | 2.0244 | 2.712 | 3.5657 | 82 | 1.989 | 2.637 | 3.413 | | | |
39 | 2.0227 | 2.708 | 3.5581 | 83 | 1.989 | 2.636 | 3.412 | | | |
40 | 2.0211 | 2.705 | 3.551 | 84 | 1.989 | 2.636 | 3.41 | | | |
41 | 2.0195 | 2.701 | 3.5442 | 85 | 1.988 | 2.635 | 3.409 | | | |
43 | 2.0167 | 2.6951 | 3.5316 | 86 | 1.988 | 2.634 | 3.407 | | | |
44 | 2.0154 | 2.6923 | 3.5258 | 87 | 1.988 | 2.634 | 3.406 | | | |
45 | 2.0141 | 2.6896 | 3.5203 | 88 | 1.987 | 2.633 | 3.405 | | | |

Confidence Intervals for Means

• The mean blood sugar concentration of a sample of 121 patients is equal to 105 and the variance is equal to 36.

• What is the confidence interval for the mean blood sugar concentration of the population from which the sample was extracted?

• Use a significance level of 5% (Z = 1.96). The blood sugar concentration is considered to be normally distributed.

• n = 121
• s² = 36
• s = 6
• m = 105

$\bar{X} = 105$

$\left[105 - 1.96\,\frac{6}{\sqrt{121}};\ 105 + 1.96\,\frac{6}{\sqrt{121}}\right]$

• [105 − 1.07; 105 + 1.07]
• [103.93; 106.07]
• [104; 106]
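The arithmetic of the blood sugar example can be reproduced in a few lines. This is only a check of the numbers given on the slide (n = 121, variance 36, mean 105, Z = 1.96); the variable names are illustrative.

```python
from math import sqrt

n, variance, x_bar, z = 121, 36, 105, 1.96   # values from the example

s = sqrt(variance)           # standard deviation: 6
margin = z * s / sqrt(n)     # 1.96 * 6 / 11
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))  # 1.07 103.93 106.07
```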

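Comparing Means by using Confidence Levels

[Figure: mean (x̄) and confidence interval (CI) of systolic blood pressure, TAS (mmHg), for Treatment A, Treatment B, and Treatment C; vertical axis from 100 to 200.]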

Problem:

A fellow wanted to determine the average serum creatinine level among healthy elderly adult male subjects from Timisoara city. From the literature she could not find any information on μ or σ of serum creatinine among local healthy elderly males.

She measured 15 healthy elderly male volunteers from Timisoara city; the sample mean sCr was 0.94 mg/dL, with a sample standard deviation of 0.15 mg/dL.

What is the 95% CI for μ?

Confidence Intervals

• Solution:
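One way to work this problem, sketched under the assumption that the t-based interval x̄ ± t · s / sqrt(n) applies (small sample, σ unknown). The critical value 2.1448 is the table "t" entry for α = 0.05 and n − 1 = 14 degrees of freedom; the variable names are illustrative.

```python
from math import sqrt

n = 15            # number of volunteers
x_bar = 0.94      # sample mean sCr, mg/dL
s = 0.15          # sample standard deviation, mg/dL
t_crit = 2.1448   # table "t": alpha = 0.05, 14 degrees of freedom

margin = t_crit * s / sqrt(n)            # t times the standard error
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 3), round(upper, 3))  # roughly 0.857 and 1.023 mg/dL
```

Using Z = 1.96 instead of the t value would give a slightly narrower interval, which understates the uncertainty for a sample this small.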

Example

• Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different samples of the liquid.

• He calculates the sample mean to be 101.82.

• If he knows that the standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the population mean at a 95% confidence level?

• In other words, the student wishes to estimate the true mean boiling temperature of the liquid using the results of his measurements. If the measurements follow a normal distribution, then the sample mean will have the distribution N(μ, σ/√n). Since the sample size is 6, the standard deviation of the sample mean is equal to 1.2/sqrt(6) = 0.49.
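The example can be carried through to the interval itself. The sketch below assumes the known-σ interval x̄ ± 1.96 · σ / sqrt(n), since the standard deviation of the procedure (1.2 degrees) is given; variable names are illustrative.

```python
from math import sqrt
from statistics import mean

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]   # degrees Celsius
sigma = 1.2    # known standard deviation of the procedure
z = 1.96       # critical value for a 95% confidence level

n = len(readings)
x_bar = mean(readings)        # about 101.82
se = sigma / sqrt(n)          # about 0.49
lower, upper = x_bar - z * se, x_bar + z * se

print(round(x_bar, 2), round(se, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the 95% interval for the true mean boiling temperature is roughly 100.86 to 102.78 degrees Celsius.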

Remember!

1. Correct estimation of a statistical parameter is done with confidence intervals (CI).

2. Confidence intervals depend on the sample size and the standard error.

3. The confidence interval is wider for:

• High values of the standard error
• Small sample sizes
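A small sketch of the last point: with the standard deviation held fixed, the width of a 95% interval shrinks as the sample size grows. The value s = 6 and the sample sizes are illustrative assumptions.

```python
from math import sqrt

s, z = 6.0, 1.96   # hypothetical standard deviation and 95% critical value

# Full width of the interval (upper limit minus lower limit) for several sample sizes
widths = {n: 2 * z * s / sqrt(n) for n in (25, 100, 400)}

for n, width in widths.items():
    print(n, round(width, 3))
```

Quadrupling the sample size halves the width, because the standard error scales with 1 / sqrt(n).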
