
Sampling Error

When we take a sample, our results will not exactly equal the correct results for the whole population. That is, our results will be subject to errors.

Sampling error: A sample is a subset of a population. Because of this, results obtained from a sample cannot reflect the full range of variation found in the larger group (population).

This type of error, arising from the sampling process itself, is called sampling error, which is a form of random error.

Sampling error can be minimized by increasing the size of the sample. When n = N (the whole population is studied), the sampling error is 0.
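The effect of sample size can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the population of N = 200 values is made up, and `mean_abs_sampling_error` is a hypothetical helper that averages the gap between the sample mean and the population mean over many samples drawn without replacement.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=2000, seed=0):
    """Average |sample mean - population mean| over repeated samples of size n.

    Samples are drawn without replacement, so n = N reproduces the population.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

# Hypothetical population of N = 200 values (assumption, for illustration only)
pop_rng = random.Random(42)
population = [pop_rng.gauss(100, 15) for _ in range(200)]

for n in (10, 50, 100, 200):
    # The error shrinks as n grows and vanishes (up to rounding) when n = N
    print(n, round(mean_abs_sampling_error(population, n), 3))
```

The last line of output corresponds to n = N, where every sample is the whole population and the sampling error disappears.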

Non-sampling error (bias)

Non-sampling error is a systematic error in the design or conduct of a sampling procedure that distorts the sample, so that it is no longer representative of the reference population.

We can eliminate or reduce non-sampling error (bias) by careful design of the sampling procedure, not by increasing the sample size.

Sources of non-sampling errors:

Accessibility bias, volunteer bias, etc.

The best-known source of bias is non-response: the failure to obtain information on some of the subjects included in the sample to be studied.

Non-response results in significant bias when both of the following conditions are fulfilled:

1. Non-respondents constitute a significant proportion of the sample (about 15% or more).

2. Non-respondents differ significantly from respondents.
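Why both conditions are needed can be seen from a simple identity: if a fraction w of the sample does not respond, the full-sample mean is (1 − w) × (respondent mean) + w × (non-respondent mean), so the respondent-only mean is off by w × (respondent mean − non-respondent mean). The sketch below uses a hypothetical helper and made-up numbers purely to illustrate this; none of the values come from the slides.

```python
def nonresponse_bias(resp_mean, nonresp_mean, nonresp_fraction):
    """Bias of the respondent-only mean relative to the full-sample mean.

    full_mean = (1 - w) * resp_mean + w * nonresp_mean, hence
    resp_mean - full_mean = w * (resp_mean - nonresp_mean).
    """
    return nonresp_fraction * (resp_mean - nonresp_mean)

# Hypothetical values (assumptions, for illustration only)
many_but_similar = nonresponse_bias(120.0, 120.0, 0.30)   # large w, no difference
few_but_different = nonresponse_bias(120.0, 110.0, 0.02)  # small w, big difference
many_and_different = nonresponse_bias(120.0, 110.0, 0.30) # both conditions met

print(many_but_similar, few_but_different, many_and_different)
```

The bias is a product of the two quantities, so it stays small whenever either the non-response fraction or the respondent/non-respondent difference is small.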

There are several ways to deal with this problem and reduce the possibility of bias:

1. Pre-test the data collection tools (questionnaire).

2. If non-response is due to absence of the subjects, make repeated attempts to contact study subjects who were absent at the time of the initial visit.

3. Include additional people in the sample, so that non-respondents who were absent during data collection can be replaced (make sure that their absence is not related to the topic being studied).

ESTIMATION

The sample from a population is used to provide the estimates

of the population parameters.

A parameter is a numerical descriptive measure of a population (μ is an example of a parameter).

A statistic is a numerical descriptive measure of a sample (X̄ is an example of a statistic).

To each sample statistic there corresponds a population parameter.

We use X̄, S², S, p, etc. to estimate μ, σ², σ, P (or π), etc.

Sample statistic                  Corresponding population parameter

X̄ (sample mean)                   μ (population mean)

S² (sample variance)              σ² (population variance)

S (sample standard deviation)     σ (population standard deviation)

p (sample proportion)             P or π (population proportion)

Sampling Distribution of Means

A sampling distribution is a frequency distribution of a sample statistic, and it has its own mean and standard deviation.

Steps:

1. Obtain a sample of n observations selected completely at random from a large population. Determine their mean and then replace the observations in the population.

2. Obtain another random sample of n observations from the population, determine their mean and again replace the observations.

3. Repeat the sampling procedure indefinitely, calculating the mean of the random sample of n each time and subsequently replacing the observations in the population.

4. The result is a series of means of samples of size n. If each mean in the series is now treated as an individual observation and arrayed in a frequency distribution, one determines the sampling distribution of means of samples of size n.
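The steps above can be sketched as a simulation. This is an illustration under stated assumptions, not the procedure from any particular textbook: the skewed population of 10,000 values is made up, the helper name `sample_means` is hypothetical, and "indefinitely" is replaced by a finite 2,000 repetitions. Each sample is drawn from the full population, which corresponds to replacing the observations after every draw.

```python
import random
import statistics
from collections import Counter

def sample_means(population, n, repeats, seed=0):
    """Draw `repeats` random samples of size n and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        # The population list is never modified, so every sample is drawn
        # from the complete population (observations are "replaced").
        sample = rng.sample(population, n)
        means.append(statistics.fmean(sample))
    return means

# Hypothetical large, right-skewed population (assumption, illustration only)
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 50) for _ in range(10_000)]

means = sample_means(population, n=30, repeats=2000)

# Array the means in a frequency distribution with bins of width 5
bin_width = 5
freq = Counter(int(m // bin_width) * bin_width for m in means)
for start in sorted(freq):
    print(f"{start:>3} to {start + bin_width:<3} {freq[start]}")
```

The printed table is an approximation to the sampling distribution of means for samples of size 30 from this population.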

Because the scores (X̄s) in the sampling distribution of means are themselves means (of individual samples), we shall use the notation σ_X̄ for the standard deviation of the distribution.

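Standard error of the mean (SEM): The standard deviation of the sampling distribution of means is called the standard error of the mean.

Formula: σ_X̄ = √( Σ (X̄ᵢ − μ)² / N ), where X̄ᵢ is the mean of the i-th sample and N is the number of sample means in the distribution.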

Properties of sampling distribution

1. The mean of the sampling distribution of means is the same as the population mean, μ.

2. The SD of the sampling distribution of means is σ/√n.

3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (central limit theorem).
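Properties 1 and 2 can be checked exactly on a tiny example by listing every possible sample. The sketch below assumes a made-up population of five values and samples of size 2 drawn with replacement, so that all 25 ordered samples are equally likely; it does not demonstrate property 3, which needs a larger n.

```python
import itertools
import math
import statistics

# Hypothetical tiny population (assumption, for illustration only)
population = [2, 4, 6, 8, 10]
n = 2

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population SD (divides by N)

# Every equally likely ordered sample of size n drawn with replacement
all_samples = itertools.product(population, repeat=n)
all_means = [statistics.fmean(sample) for sample in all_samples]

mean_of_means = statistics.fmean(all_means)
sd_of_means = statistics.pstdev(all_means)

print("population mean:", mu, "| mean of sample means:", mean_of_means)
print("sigma / sqrt(n):", sigma / math.sqrt(n), "| SD of sample means:", sd_of_means)
```

Because the enumeration covers the whole sampling distribution, the two pairs of numbers agree up to floating-point rounding rather than only approximately.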

Confidence interval

Interval Estimation (large samples)

A point estimate gives no indication of how far it is likely to be from the parameter. A more useful method of estimation is to compute an interval which has a high probability of containing the parameter.

An interval estimate is a statement that a population parameter has a value lying between two specified limits.

Confidence interval

A confidence interval provides an indication of how close the sample estimate is likely to be to the true population value.

It gives an estimated range of values, calculated from a given set of sample data, which is likely to include the true value of the unknown population parameter with a stated confidence (probability).

Consider the standard normal distribution and the statement Pr(−1.96 ≤ Z ≤ 1.96) = 0.95. It means that 95% of the area under the standard normal curve lies between −1.96 and +1.96.

Formula:

Pr( X̄ − 1.96(σ/√n) ≤ μ ≤ X̄ + 1.96(σ/√n) ) = 0.95

The range X̄ − 1.96(σ/√n) to X̄ + 1.96(σ/√n) is called the 95% confidence interval; X̄ − 1.96(σ/√n) is the lower confidence limit, while X̄ + 1.96(σ/√n) is the upper confidence limit.

A few things to remember

At 90% confidence, the Z score to be used in the formula is 1.64 (more precisely, 1.645).

At 95% confidence, the Z score to be used in the formula is 1.96.

At 99% confidence, the Z score to be used in the formula is 2.58.
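These Z scores are rounded two-sided critical values of the standard normal distribution. As a minimal sketch, they can be recomputed with Python's standard-library `statistics.NormalDist`; the helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value z with Pr(-z <= Z <= z) = confidence."""
    # The upper tail beyond z holds half of the excluded probability
    upper_quantile = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_quantile)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_value(level):.3f}")
```

Other confidence levels can be handled the same way without a printed table.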

Confidence interval for                  Formula

Population mean                          X̄ ± z_value × s/√n

Population proportion                    p ± z_value × √(pq/n)

Difference in population means           (X̄₁ − X̄₂) ± z_value × √(s₁²/n₁ + s₂²/n₂)

Difference in population proportions     (p₁ − p₂) ± z_value × √(p₁q₁/n₁ + p₂q₂/n₂)

Here q = 1 − p.
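All four intervals share the form estimate ± z × standard error, so they can be sketched with one interval function and four standard-error helpers. The function names below are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import math

def confidence_interval(estimate, se, z=1.96):
    """Return (lower, upper) limits for estimate ± z × SE."""
    return (estimate - z * se, estimate + z * se)

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p*q/n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical usage: mean 80, SD 12, n = 144, 95% confidence
low, high = confidence_interval(80.0, se_mean(12.0, 144))
print(round(low, 2), round(high, 2))

# Hypothetical usage: 40 of 100 with the attribute, 95% confidence
p_low, p_high = confidence_interval(0.40, se_proportion(0.40, 100))
print(round(p_low, 3), round(p_high, 3))
```

Keeping the standard error separate from the interval makes it easy to switch the confidence level by changing only `z`.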

Problem 1: Suppose X̄ = 50, SD = 10 and n = 100. What is the 99% confidence interval?

SE = 10/√100 = 1

CI lower, X̄ − 2.58 (σ/√n) = 47.42

CI upper, X̄ + 2.58 (σ/√n) = 52.58

Example 1

The mean fasting blood sugar of a group of 70 individuals was found to be 115, with an SD of 12.56. Find the 95% and 99% CIs for the population mean.

Solution:

SE (mean) = 12.56/√70

= 1.50

95% CI = 115 ± 1.96 (1.50)

= (112.06, 117.94)

99% CI = 115 ± 2.58 (1.50)

= (111.13, 118.87)

Interpretation of 95% CI:

Probabilistic interpretation:

In repeated sampling, approximately 95 percent of the intervals constructed will include the population mean.

Practical interpretation:

One can say with 95 percent confidence that the population mean fasting blood sugar is between 112.06 and 117.94.
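The probabilistic interpretation can be illustrated by simulation. The sketch below assumes, purely for illustration, a normal population whose true mean and SD equal the sample values of this example (115 and 12.56); it draws many samples of 70, builds a 95% interval from each, and counts how often the interval contains the assumed true mean. The helper name `coverage` is hypothetical.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, z=1.96, repeats=2000, seed=1):
    """Fraction of simulated intervals X̄ ± z·s/√n that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # sample SD, as in practice
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / repeats

# Assumed "true" population values, borrowed from the example for illustration
rate = coverage(mu=115, sigma=12.56, n=70)
print(f"Intervals containing the population mean: {rate:.1%}")
```

The proportion should come out close to 95%; any single interval either contains the population mean or does not.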

Example 2

In a study, it was found that 129 out of 150 patients with carcinoma of the lung were smokers. Find the 95% and 99% CIs for the proportion of smokers among lung cancer patients.

Solution:

p = 129/150 = 0.86

SE (proportion) = √((0.86)(0.14)/150)

= 0.0283

95% CI = 0.86 ± 1.96 (0.0283)

= (0.804, 0.916)

Similarly, the 99% CI = 0.86 ± 2.58 (0.0283) = (0.787, 0.933)

Example 3

In a study to assess the effect of anabolic steroids on weight gain, the following data were observed:

Group      n     Mean weight gain     SD of weight gain

Study      50    3.7                  21.2

Control    50    3.1                  9

Find the 95% CI for the difference in the mean weight gain.

Solution:

SE (diff. in means) = √((21.2²/50) + (9²/50))

= 3.257

95% CI = (3.7 − 3.1) ± (1.96 × 3.257)

= (−5.78, 6.98)

Example 4

In a study to assess the effect of BCG, the following data were observed:

BCG             n       TB developed    Disease rate

Vaccinated      2500    22              0.88%

Unvaccinated    3000    90              3.00%

Find the 99% CI for the difference in proportions.

Solution:

Actual difference between proportions = 3.00% − 0.88% = 2.12%, i.e. 0.0212

SE (diff. in proportions) = √((0.0088)(0.9912)/2500 + (0.03)(0.97)/3000) = 0.00363 (0.363%)

99% CI = 0.0212 ± 2.58 (0.00363)

= (0.0118, 0.0306), i.e. about 1.18% to 3.06%

Factors affecting the width of a confidence interval

Variation in the data (standard deviation): the larger the SD, the wider the confidence interval.

Sample size: as n increases, the confidence interval becomes narrower.

Level of confidence: the higher the confidence level (90%, 95%, 99%), the wider the confidence interval.
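All three factors enter the width of the interval for a mean, which is 2 × z × SD/√n. The sketch below uses a hypothetical helper `ci_width` and made-up baseline values (SD 10, n 100, 95% confidence) and changes one factor at a time.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Width of the interval mean ± z·sd/√n at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / math.sqrt(n)

# Hypothetical baseline (assumption, for illustration only)
baseline = ci_width(sd=10, n=100, confidence=0.95)
larger_sd = ci_width(sd=20, n=100, confidence=0.95)
larger_n = ci_width(sd=10, n=400, confidence=0.95)
higher_conf = ci_width(sd=10, n=100, confidence=0.99)

print("baseline:         ", round(baseline, 2))
print("SD doubled:       ", round(larger_sd, 2))
print("n quadrupled:     ", round(larger_n, 2))
print("99% instead of 95%:", round(higher_conf, 2))
```

Doubling the SD doubles the width, while the sample size has to be quadrupled to halve it, because n enters through its square root.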

THANK YOU
