Last lecture summary

Last lecture summary. 2345678 0 344434243533 Population 2015.

Jan 19, 2016


Annabel Mosley

Last lecture summary



Population 2015


Population 2014


mean = 3.3

mean = 3.0


Data 2015

Population:

4,3,3,5,0,4,4,4,3,4,2,6,8,2,4,3,5,7,3,3

25 samples (n=3) and their averages

3.3,5.3,3.6,4.3,2.3,3.0,3.6,3.0,5.3,5.6,3.3,4.3,3.3,4.0,5.6,4.3,4.3,4.6,6.3,3.3,4.0,3.3,4.6,3.0,4.3
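The data slide can be reproduced with a short Python sketch: it computes the mean and standard deviation of the 20 population values listed above and then draws 25 random samples of size n = 3. The seed, the helper name `sample_means`, and drawing with replacement are assumptions; the slides don't say how the samples were taken, so the averages will differ from the ones listed.

```python
import random
import statistics

# Population 2015 as listed on the slide (20 values)
population = [4, 3, 3, 5, 0, 4, 4, 4, 3, 4, 2, 6, 8, 2, 4, 3, 5, 7, 3, 3]

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population SD (divides by N)

def sample_means(pop, n, k, rng):
    """Return the means of k random samples of size n (hypothetical helper).

    Assumption: each sample is drawn with replacement (Random.choices);
    the slides do not state the sampling scheme.
    """
    return [statistics.mean(rng.choices(pop, k=n)) for _ in range(k)]

rng = random.Random(2015)  # fixed seed so the run is reproducible
means = sample_means(population, n=3, k=25, rng=rng)

print(f"mu = {mu:.2f}, sigma = {sigma:.3f}")
print("25 sample means:", [round(m, 1) for m in means])
```

For this population the mean is 3.85 and the population standard deviation (dividing by N) is about 1.74. Raising `k` to 50 or 300 mimics the histograms that follow.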

http://blue-lover.blog.cz/1106/lentilky


2015, n = 3, number of samples = 25


2015, n = 3, number of samples = 50


2015, n = 3, number of samples = 300


2015, n = 3, all possible samples (1540)


2015, n = 5, all possible samples (42 504)


2015, n = 10, all possible samples (20 030 010)
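The sample counts on these slides (1540, 42 504, 20 030 010) match the number of unordered samples drawn with replacement from N = 20 values, $\binom{N + n - 1}{n}$. This is our reading of the numbers, not something the slides state; drawing without replacement would give $\binom{20}{3} = 1140$ for n = 3 instead. The helper name `count_multisets` is hypothetical.

```python
import itertools
import math

N = 20  # population size (the 20 values listed for 2015)

def count_multisets(N, n):
    """Unordered samples of size n drawn WITH replacement from N items: C(N + n - 1, n)."""
    return math.comb(N + n - 1, n)

for n in (3, 5, 10):
    print(n, count_multisets(N, n))

# Brute-force cross-check for n = 3: enumerate the multisets of indices directly
brute = sum(1 for _ in itertools.combinations_with_replacement(range(N), 3))
assert brute == count_multisets(N, 3)
```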


Central limit theorem

• The distribution of sample means is approximately normal.

• The distribution of means will increasingly approximate a normal distribution as the sample size increases.

• Its mean is equal to the population mean.

• Its standard deviation is equal to the population standard deviation divided by the square root of $n$.

• $\sigma_{\bar{x}}$ is called the standard error.

$$SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

$$M = \mu_{\bar{x}} = \mu$$
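The two formulas can be checked exactly on the 2015 population with a brute-force sketch. Assuming ordered samples drawn with replacement (an assumption; the slides don't fix the scheme), enumerating all $20^3 = 8000$ samples of size 3 makes $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ hold exactly; only the normal shape is an approximation.

```python
import itertools
import math
import statistics

population = [4, 3, 3, 5, 0, 4, 4, 4, 3, 4, 2, 6, 8, 2, 4, 3, 5, 7, 3, 3]
n = 3

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population SD (divides by N)

# Every ORDERED sample of size n drawn with replacement (20**3 = 8000 samples).
# Under this assumption the mean and SE formulas hold exactly.
means = [sum(s) / n for s in itertools.product(population, repeat=n)]

mean_of_means = statistics.fmean(means)  # centre of the sampling distribution
se = statistics.pstdev(means)            # SD of the sampling distribution

print(f"mu = {mu:.4f}, mean of sample means = {mean_of_means:.4f}")
print(f"sigma/sqrt(n) = {sigma / math.sqrt(n):.4f}, SD of sample means = {se:.4f}")
```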


ESTIMATION, CONFIDENCE INTERVALS


Statistical inference

If we can't conduct a census, we collect data from a sample of the population.

Goal: draw conclusions about that population.


Demonstration

• You sample 36 apples from your farm's harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).

• What is the probability that the mean weight of all 200 000 apples is between 100 and 124 grams?


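What is the question?

• We would like to know the probability that the population mean is within 12 g of the sample mean.

• But this is the same thing as $P(\bar{x} - 12 \le \mu \le \bar{x} + 12)$,

• which in turn is the same thing as the probability that the sample mean is within 12 g of the population mean, $P(\mu - 12 \le \bar{x} \le \mu + 12)$.

• So, if I can say how many standard deviations of the sampling distribution 12 g corresponds to, I can use the Z-table to figure out the probability.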


Slight complication

• There is one caveat, can you see it?

• We don't know the standard deviation of the sampling distribution (the standard error). We only know that it equals $\sigma / \sqrt{n}$, but $\sigma$ is unknown.

• What we're going to do is estimate $\sigma$. The best thing we can do is use the sample standard deviation $s$.

• $SE \approx \frac{s}{\sqrt{n}} = \frac{40}{\sqrt{36}} \approx 6.67$. This is our best estimate of the standard error.

• Now you finish the example. What is the probability that the population mean lies within 12 g of the sample mean if the SE equals 6.67?

• 92.82%
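The apple calculation can be sketched in a few lines of Python, using `statistics.NormalDist` in place of a printed Z-table. It turns the 12 g margin into a number of standard errors and reads off the two-sided normal probability.

```python
import math
from statistics import NormalDist

n = 36          # apples in the sample
s = 40.0        # sample standard deviation (grams)
margin = 12.0   # we ask about |mu - xbar| <= 12 g

se = s / math.sqrt(n)          # estimated standard error: 40 / 6 = 6.67 g
z = margin / se                # 12 g corresponds to 1.8 standard errors
p = NormalDist().cdf(z) - NormalDist().cdf(-z)   # P(-z <= Z <= z)

print(f"SE = {se:.2f}, z = {z:.2f}, probability = {p:.4f}")
```

The exact normal value is about 0.9281; the 92.82% above comes from the four-digit table entry $\Phi(1.80) \approx 0.9641$, since $2 \times 0.9641 - 1 = 0.9282$.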


This is neat!

• You sample 36 apples from your farm's harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation). What is the probability that the population mean weight of all 200 000 apples is between 100 and 124 grams?

• We started with very little information (we know just the sample statistics), but we can infer that, with a probability of 92.82%, the population mean lies within 12 g of our sample mean!


Point vs. interval estimate

• You sample 36 apples from your farm's harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).

• Goal: estimate the population mean

1. The population mean is estimated by the sample mean, i.e. we say the population mean equals 112 g. This is called a point estimate (bodový odhad).

2. However, we can do better. We can say that, with 95% confidence, the true population mean lies within the interval (interval estimate)

$$\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$$
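The interval estimate for the apples can be computed with a small helper. The name `z_interval` is hypothetical, and the critical value comes from `NormalDist().inv_cdf` rather than the rounded 1.96, which changes nothing at two decimals.

```python
import math
from statistics import NormalDist

def z_interval(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval xbar +/- Z * s / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95 %
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=112, s=40, n=36)
print(f"95% CI for the mean apple weight: ({low:.2f} g, {high:.2f} g)")
```

This gives roughly 98.93 g to 125.07 g, i.e. 112 g ± 13.07 g.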


Confidence interval

• This type of result is called a confidence interval (interval spolehlivosti, konfidenční interval).

• The number of standard errors you add/subtract depends on the confidence level (hladina spolehlivosti), e.g. 95%.

$$\bar{x} \pm Z \times \frac{s}{\sqrt{n}}$$

Here $Z$ is the critical value (kritická hodnota), and $Z \times \frac{s}{\sqrt{n}}$ is the margin of error (možná odchylka).


Confidence level

• The desired level of confidence is set by the researcher, not determined by the data.

• If you want to be 95% confident in your results, you add/subtract 1.96 standard errors (the empirical rule says about 2 standard errors). This gives the 95% confidence interval (95% interval spolehlivosti).

Confidence level (%)    Z-value
80                      1.28
90                      1.64
95                      1.96
98                      2.33
99                      2.58
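The Z-values in the table are two-sided critical values: each leaves $(1 - \text{confidence})/2$ of the standard normal in each tail. A minimal sketch with the standard library's `NormalDist` reproduces them (the helper name `critical_z` is hypothetical).

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value: the Z that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  {critical_z(level):.3f}")
```

Rounded to two decimals these match the table; the 90% value is 1.645 before rounding.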


[Figure: standard normal curve with the central 80%, 90%, 95% and 99% areas, bounded by ±1.28, ±1.64, ±1.96 and ±2.58]


Small sample size confidence intervals

• The blood pressure of 7 patients was measured after they had been given a new drug for 3 months. Their blood pressure increased by 1.5, 2.9, 0.9, 3.9, 3.2, 2.1 and 1.9. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in the population.


• We will assume that our population distribution is normal, with mean $\mu$ and standard deviation $\sigma$.

• We don't know anything else about this distribution, but we have a sample. Let's figure out everything we can about this sample: $\bar{x} \approx 2.34$, $s \approx 1.04$.

• We estimate the true population standard deviation $\sigma$ with the sample standard deviation $s$.

• However, we are estimating our standard deviation from an $n$ of only seven! This is probably not going to be a very good estimate.

• In general, if $n < 30$, this is considered a bad estimate.
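The sample statistics for the seven patients can be computed as follows. Note that `statistics.stdev` divides by $n - 1$, which is the sample standard deviation $s$ used above.

```python
import math
import statistics

# Blood pressure increases of the seven patients (from the slide)
increases = [1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9]
n = len(increases)

xbar = statistics.mean(increases)   # sample mean
s = statistics.stdev(increases)     # sample SD, divides by n - 1
se = s / math.sqrt(n)               # estimated standard error

print(f"n = {n}, mean = {xbar:.3f}, s = {s:.3f}, SE = {se:.3f}")
```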


William Sealy Gosset aka Student

• 1876–1937

• an employee of the Guinness brewery

• his 1908 papers addressed the brewer's concern with small samples:

• "The probable error of a mean". Biometrika 6 (1): 1–25. March 1908.

• "Probable error of a correlation coefficient". Biometrika 6 (2/3): 302–310. September 1908.


Student t-distribution

• Instead of assuming that the sampling distribution is normal, we will use a Student t-distribution.

• It gives a better estimate of your confidence interval if you have a small sample size.

• It looks very similar to a normal distribution, but it has fatter tails, reflecting the higher frequency of outliers that comes with a small data set.
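The "fatter tails" can be seen numerically by comparing the Student t density with the standard normal density. This sketch implements the textbook t density formula itself (via `math.lgamma`, to avoid overflow for large df), so `t_pdf` is a hypothetical helper rather than a library call.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()  # standard normal: mean 0, SD 1
for x in (0, 1, 2, 3):
    print(f"x = {x}: normal {normal.pdf(x):.4f}, "
          f"t(df=6) {t_pdf(x, 6):.4f}, t(df=30) {t_pdf(x, 30):.4f}")
```

With df = 6 the t curve is lower at the centre and several times higher than the normal curve at x = 3; as df grows it approaches the normal curve.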


Student t-distribution


Student t-distribution

df: degrees of freedom (stupeň volnosti)


Back to our case

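• Because the sample size is small, the standardized sample mean $\frac{\bar{x} - \mu}{s / \sqrt{n}}$ won't follow a normal distribution. Instead, it follows a Student t-distribution with $df = n - 1 = 6$.

• Construct a 95% confidence interval, please.

For $n < 30$:

$$\bar{x} \pm t_{n-1} \times \frac{s}{\sqrt{n}}$$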
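A sketch of the small-sample interval for the blood pressure data. Python's standard library has no t quantile function, so the critical value $t_{6} = 2.447$ (two-sided 95%) is hard-coded from a t-table; with SciPy installed, `scipy.stats.t.ppf(0.975, 6)` would compute it instead.

```python
import math
import statistics

increases = [1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9]
n = len(increases)
xbar = statistics.mean(increases)   # sample mean
s = statistics.stdev(increases)     # sample SD (n - 1 in the denominator)

# Two-sided 95 % critical value for df = n - 1 = 6, read from a t-table
t_crit = 2.447

margin = t_crit * s / math.sqrt(n)
print(f"95% CI: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

This gives roughly (1.38, 3.31). Using 1.96 instead of 2.447 would give the narrower, over-confident interval of about (1.57, 3.11).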


• Just to summarize, the margin of error depends on:

1. the confidence level (95% is common)

2. the sample size
• as the sample size increases, the margin of error decreases
• for a bigger sample we get a narrower interval in which we're pretty sure the true population mean lies.

3. the variability of the data (i.e. on σ)
• more variability increases the margin of error

• The margin of error measures nothing but chance variation.

• It doesn't measure any bias or errors that happen during the process.

• It does not tell you anything about the correctness of your data!!!

$$\text{margin of error} = (\text{critical value } Z \text{ or } t_{n-1}) \times \frac{s}{\sqrt{n}}$$
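The three dependencies in the summary can be checked by varying one factor at a time. This sketch uses the Z-based margin of error and starts from the apple example (s = 40 g, n = 36); the helper name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(s, n, confidence=0.95):
    """Z-based margin of error: critical value times s / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * s / math.sqrt(n)

# 1. confidence level
for conf in (0.90, 0.95, 0.99):
    print(f"confidence {conf:.0%}: {margin_of_error(40, 36, conf):.2f} g")
# 2. sample size
for n in (36, 144, 576):
    print(f"n = {n}: {margin_of_error(40, n):.2f} g")
# 3. variability
for s in (20, 40, 80):
    print(f"s = {s} g: {margin_of_error(s, 36):.2f} g")
```

Because $n$ sits under a square root, quadrupling the sample size only halves the margin of error.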