SUMMARY. Central limit theorem Statistical inference If we can’t conduct a census, we collect data from the sample of a population. Goal: make conclusions.

SUMMARY

• Central limit theorem

Statistical inference

If we can’t conduct a census, we collect data from a sample of the population.

Goal: make conclusions about that population

Confidence interval

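for $n \ge 30$: $\bar{x} \pm Z \times \frac{s}{\sqrt{n}}$

where $Z$ is the critical value (kritická hodnota) and $Z \times \frac{s}{\sqrt{n}}$ is the margin of error (možná odchylka)

for $n < 30$: $\bar{x} \pm t_{n-1} \times \frac{s}{\sqrt{n}}$

where $t_{n-1}$ is the critical value of the t-distribution with $n - 1$ degrees of freedom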
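A minimal sketch of the large-sample interval in Python, using only the standard library (statistics.NormalDist, Python 3.8+). The function name is illustrative, and the numbers in the usage line are simply reused from the engagement example later in these slides. For n < 30 you would replace Z with t_{n-1}; the standard library has no t-distribution, so a package such as SciPy (scipy.stats.t.ppf(1 - alpha / 2, n - 1)) would be one option, not shown here.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, s, n, confidence=0.95):
    """Large-sample (n >= 30) confidence interval: mean +/- Z * s / sqrt(n)."""
    # A two-sided interval leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)  # the margin of error
    return mean - margin, mean + margin

# Illustrative numbers only (borrowed from the engagement example).
low, high = z_confidence_interval(mean=8.2, s=0.76, n=30)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # prints: 95% CI: (7.928, 8.472)
```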

• The margin of error depends on:

1. the critical value (Z or t), which is set by the confidence level (95% is common)

2. s or σ, the variability of the data

3. the sample size n

• The margin of error measures nothing but chance variation (random sampling error).

• It does not measure any bias or errors that happen during the data-collection process.

• It says nothing about the correctness of your data!

$\text{margin of error} = \text{critical value} \times \frac{s}{\sqrt{n}}$
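The three factors can be explored numerically. This sketch uses the normal critical value for every sample size (i.e., the large-sample formula) and a hypothetical s = 0.76; the function name is illustrative. Quadrupling n halves the margin of error, because n enters through √n, while a higher confidence level or a larger s widens it.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(s, n, confidence=0.95):
    """Margin of error = critical value * s / sqrt(n), with the normal (Z) critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)

s = 0.76  # hypothetical standard deviation, for illustration only
for confidence in (0.90, 0.95, 0.99):
    for n in (30, 120, 480):  # each step quadruples the sample size
        moe = margin_of_error(s, n, confidence)
        print(f"confidence={confidence:.0%}  n={n:4d}  MOE={moe:.3f}")
```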

HYPOTHESIS TESTING

Aim of hypothesis testing

• decision making

Engagement self-assessment

• Hopefully, you like this course so far.

• How can we measure this?

• On a scale from 1 to 10, report how engaged you think you are during the lecture (1 is the lowest value, 10 is the highest).

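Engagement distribution

[Figure: distribution of engagement scores on the 1 to 10 scale, with the population mean of 7.8 marked]

Hypothesis testing song

[Figure: the same scale, with the mean engagement of 8.2 after the song marked next to 7.8]

Did the song help?

[Figure: the blue curve of possible sample means, with 7.8 marked]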

The mean engagement of 30 randomly chosen students (from a population of 100 students) can lie anywhere along the blue curve. What is the probability of getting a mean of at least 8.2? Use Z-tables.

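$\mu = 7.8$, $\sigma = 0.76$, $n = 30$: $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 7.8}{0.76 / \sqrt{30}}$

The corresponding probability is 0.0022.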
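The Z-table lookup above can be reproduced in a few lines of Python (a sketch using only the standard library; variable names are illustrative). Computed without intermediate rounding, it gives Z ≈ 2.88 and a probability of about 0.0020; the 0.0022 on the slide presumably reflects rounding in the hand calculation and the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 7.8, 0.76      # no-song population parameters
n, sample_mean = 30, 8.2   # song sample

standard_error = sigma / sqrt(n)          # standard deviation of the sampling distribution
z = (sample_mean - mu) / standard_error   # how many standard errors above mu
p_upper = 1 - NormalDist().cdf(z)         # P(sample mean >= 8.2) if nothing changed

print(f"SE = {standard_error:.4f}, Z = {z:.2f}, P(mean >= 8.2) = {p_upper:.4f}")
```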

Situation on the battlefield

• If we sample from the no-song population, there is a 0.22% chance that a random sample (n = 30) will have a mean engagement of 8.2 or more.

• The 30 students who were subjected to the song showed a mean engagement of 8.2.

• If there were no difference between the no-song and song populations, it would be rather unlikely (0.22% is really not much) that we happened to choose 30 students with a mean engagement of 8.2 or more.

• Conclusion: because this probability is so low, we interpret 8.2 as a significant increase over 7.8, caused by the undeniable pedagogical qualities of the 'Hypothesis testing song'.

Hypothesis testing

• So, where are the hypotheses?

• Null hypothesis H0

• The song did not cause any observable effect.

• H0 in general: there is no difference between the current population parameters and the new population parameters after some sort of intervention.

• Alternative (research) hypothesis H1, Ha

• The song improved students' engagement.

• H1 in general: there is a difference between the current population parameters and the new population parameters after an intervention.

Null hypothesis

• H0 states that nothing happened: there is no change, no difference.

• Does the song improve students' engagement? H0: students' engagement is the same regardless of the song.

• Does diet Coke taste different from regular Coke? H0: there is no difference in taste between diet and regular Coke.

• Does this drug increase blood pressure? H0: this drug causes no increase in blood pressure.

• Is this a fair coin? H0: the coin is fair.

Alternative hypothesis

• The alternative hypothesis states the opposite of the null and is usually the hypothesis you are trying to find evidence for.

• Does the song improve students' engagement? Ha: the song improves students' engagement.

• Does diet Coke taste different from regular Coke? Ha: diet Coke tastes different (worse or better) from regular Coke.

• Does this drug increase blood pressure? Ha: this drug increases blood pressure.

• Is this a fair coin? Ha: the coin is unfair.

Hypothesis testing

• Formulate the null hypothesis. There is no statistical testing without the null.

• You assume that H0 is true.

• I.e., you assume that the 'Hypothesis testing song' did not influence students' engagement.

• Then you look for evidence on whether or not to reject the null.

• Collect the data.

• Population data, no song: μ = 7.8, σ = 0.76

• Sample data, song played: n = 30, x̄ = 8.2

Hypothesis testing

• Then you calculate the probability of observing your sample results (or more extreme ones), given that the null hypothesis is true.

• The null hypothesis, which we consider true: teaching with and without the song produces the same effect.

• The higher mean of the song sample (x̄ = 8.2) compared to the no-song mean (μ = 7.8) is caused only by the variability of random sampling.

Hypothesis testing

• If there is really no difference between the two teaching methods (with and without the song) in the population (i.e., given that the null hypothesis is true), how likely would it be to see a difference in mean engagement between the two methods as large as (or larger than) the one observed in our sample?

• Given that the null is true (both populations are the same), there is only a 0.22% chance that random sampling would lead to a mean engagement of 8.2 or more.

• The value 0.0022 is called the p-value.

Levels of likelihood

• Conclusion: 8.2 is a statistically significant increase over 7.8.

• We want a crisp decision, i.e., which probability is still considered 'unlikely' and which is considered 'likely'?

Levels of likelihood: α levels

• If the probability of getting a sample mean at least this extreme is less than 0.05, 0.01, or 0.001, it is usually considered unlikely.

• These thresholds are called α levels, or significance levels (hladiny významnosti).

• The α level is our criterion for deciding whether something is likely or unlikely.

0.05 (5%), 0.01 (1%), 0.001 (0.1%)

Lingo

• The probability of obtaining the observed result (or a more extreme one), assuming the null hypothesis is true, is called the p-value.

• We compare the p-value to the α level: if the p-value is less than α, the result is considered unlikely.

• Alternatively, we can calculate the Z-score of the result and compare it with the Z-score corresponding to the α level (the so-called Z-critical value).
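The two routes always agree, because the Z-critical value is defined as the point whose upper-tail probability equals α. A small sketch for a right-tailed test (the direction used in the song example); the function names are hypothetical:

```python
from statistics import NormalDist

def reject_by_p_value(z, alpha):
    """Right-tailed test: reject H0 when P(Z >= z) < alpha."""
    p = 1 - NormalDist().cdf(z)
    return p < alpha

def reject_by_critical_value(z, alpha):
    """Right-tailed test: reject H0 when z exceeds the Z-critical value."""
    z_critical = NormalDist().inv_cdf(1 - alpha)
    return z > z_critical

# Both rules give the same decision (values chosen away from the exact boundaries).
for z in (1.0, 1.82, 2.88):
    for alpha in (0.05, 0.01, 0.001):
        assert reject_by_p_value(z, alpha) == reject_by_critical_value(z, alpha)
        print(f"z={z:<5} alpha={alpha:<6} reject H0: {reject_by_critical_value(z, alpha)}")
```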

Z-critical value

If the Z-score of the sample mean is greater than the Z-critical value, we have evidence that this mean is different from that of the regular population (the students who did not hear the musical lesson).

If the probability of obtaining a particular sample mean is less than the α level, the sample mean falls in the tail beyond the Z-critical value, which is called the critical region.

[Figure: sampling distribution with the critical region beyond Z*, the Z-critical value]

Critical regions

• What is the Z-critical value for α = 0.05?

• Using a Z-table, find the Z-value with a cumulative probability of 0.95, which is 1.65.

• What is the Z-critical value for α = 0.01? 2.33

• What is the Z-critical value for α = 0.001? 3.08

• We take the mean of a sample of size n and calculate its Z-score.

• We get a Z-score of 1.82, and we say that this is significant at p < 0.05.

• 1.82 lies in the red region of the previous picture: its p-value is less than 0.05, but not less than 0.01.

• This means that the probability of obtaining this sample mean (or a more extreme one) is less than 5%, but not less than 1%.

• And remember, 0.05 is the α level.

Test statistic

Significance quiz

α level    Z-critical value
0.05       1.65
0.01       2.32
0.001      3.08

Z-score    Significant at
3.14       p <
2.07       p <
2.57       p <
14.31      p <

Significance quiz: answers (same Z-critical values as above)

Z-score    Significant at
3.14       p < 0.001
2.07       p < 0.05
2.57       p < 0.01
14.31      p < 0.001
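A sketch that answers the significance quiz mechanically for a one-tailed (right-tailed) test. It uses exact normal quantiles instead of the rounded table values, and the helper name significance_level is made up for this illustration; for the quiz Z-scores, and for the earlier 1.82 example, it agrees with the answers above.

```python
from statistics import NormalDist

ALPHA_LEVELS = (0.001, 0.01, 0.05)  # strictest level first

def significance_level(z):
    """Return the smallest alpha (one-tailed) at which z is significant, or None."""
    for alpha in ALPHA_LEVELS:
        z_critical = NormalDist().inv_cdf(1 - alpha)  # exact, not table-rounded
        if z > z_critical:
            return alpha
    return None

for z in (3.14, 2.07, 2.57, 14.31, 1.82):
    alpha = significance_level(z)
    verdict = f"p < {alpha}" if alpha is not None else "not significant"
    print(f"Z = {z:>5}: {verdict}")
```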

Quiz

• Which of the following are true, if the null hypothesis were true?

1. If the probability of getting a particular sample mean is less than α, it is unlikely to occur.

2. If a sample mean has a Z-score greater than Z*, it is “unlikely” to occur.

3. If the probability of getting a particular sample mean is “unlikely”, the sample mean is in the orange region.

4. The alpha level corresponds to the orange region.

[Figure: sampling distribution, with the orange region beyond Z*]

Another engagement score

• After a sample of 30 students heard the 'Hypothesis testing song', their mean engagement score was 7.13.

• Just to remind you, the population parameters are μ = 7.8 and σ = 0.76.

• Z-score of this sample mean: $Z = \frac{7.13 - 7.8}{0.76 / \sqrt{30}}$

• The Z-score is -4.83. What does it mean?

Two-tailed test (oboustranný test)

[Figure: two-tailed test at α = 0.05, with critical regions in both tails]

With α = 0.05, the mean engagement score of 7.13 is significant at p < 0.05.

One-tailed (use +Z* for a right-tailed test, -Z* for a left-tailed test)

α level    Z-critical value
0.05       1.65
0.01       2.32
0.001      3.08

Two-tailed

α level    Z-critical value
0.05       ±1.96
0.01       ±2.57
0.001      ±3.27
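To see where these table values come from, the sketch below computes the Z-critical values directly from the standard normal distribution with Python's statistics.NormalDist. The exact quantiles are 1.645, 2.326 and 3.090 (one-tailed) and 1.960, 2.576 and 3.291 (two-tailed); the tables above are rounded as read from a printed Z-table, so a Z-score lying right at a boundary (for example between 3.27 and 3.29) is worth checking against the exact value.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

print("alpha   one-tailed  two-tailed")
for alpha in (0.05, 0.01, 0.001):
    one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    print(f"{alpha:<7} {one_tailed:>10.3f}  ±{two_tailed:.3f}")
```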

[Figure: two-tailed test; each of the two critical regions has probability α/2]

If the sample mean falls in the critical region, we (most likely) did not get it by chance.

The critical region can also be on the left.
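The two-tailed version for the 7.13 sample, as a sketch with the standard library (variable names are illustrative): the p-value counts both tails, so it is twice the tail area beyond |Z|. The result, Z ≈ -4.83 with a two-tailed p-value on the order of 10^-6, lies far inside the left critical region, so the decrease is significant not only at α = 0.05 but also at α = 0.001.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.8, 0.76, 30   # no-song population parameters; sample size
sample_mean = 7.13             # mean engagement after the song
alpha = 0.05

z = (sample_mean - mu) / (sigma / sqrt(n))
p_two_tailed = 2 * NormalDist().cdf(-abs(z))      # area in both tails beyond |z|
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

print(f"Z = {z:.2f}, two-tailed p = {p_two_tailed:.2g}")
print("reject H0" if abs(z) > z_critical else "fail to reject H0")
```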

One-tailed and two-tailed

[Figure: one-tailed (directional) test vs. two-tailed (non-directional) test]

One-tailed or two-tailed

• In general, we use two-tailed tests.

• One exception to this general rule is when we’re comparing a new treatment with an established treatment. In such cases we often only care whether the new treatment is better than the old one, and we would use a one-tailed (directional) test.

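Alternative hypothesis

two-tailed test: Ha: mean engagement with the song ≠ 7.8 (a change in either direction)

one-tailed test: Ha: mean engagement with the song > 7.8 (an increase only)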

Quiz – reject the null

• What does it mean to reject the null using a two-tailed test?

• Our sample mean falls within/outside the critical region.

• The Z-score of our sample mean is, in absolute value, less than/greater than the Z-critical value.

• The probability of obtaining the sample mean is less than/greater than the alpha level.

Four steps of hypothesis testing

1. Formulate the null and the alternative hypothesis (this includes choosing a one- or two-tailed test).

2. Select the significance level α, the criterion for deciding whether the observed result is unlikely enough under the null to reject it.

--- COLLECT DATA ---

3. Compute the p-value. The p-value is the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.

4. Compare the p-value to the α level. If p ≤ α, the observed effect is statistically significant: the null is rejected in favor of the alternative hypothesis.
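The four steps can be put together in one function. This is a sketch of a one-sample Z-test under the slides' assumptions (known population σ, approximately normal sampling distribution); the function name z_test and the tail labels "two-sided", "greater" and "less" are hypothetical choices for this illustration, not a particular library's API. Steps 1 and 2 correspond to fixing tail and alpha before looking at the data; steps 3 and 4 are computed.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n, alpha=0.05, tail="two-sided"):
    """One-sample Z-test; sigma is the known population standard deviation.

    tail: "two-sided" (Ha: mean != mu), "greater" (Ha: mean > mu), "less" (Ha: mean < mu).
    Returns (z, p_value, reject_null).
    """
    # Steps 1 and 2: hypotheses (tail) and alpha are fixed by the arguments.
    z = (sample_mean - mu) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 3: p-value = probability of a result at least this extreme if H0 is true.
    if tail == "greater":
        p = 1 - std_normal.cdf(z)
    elif tail == "less":
        p = std_normal.cdf(z)
    elif tail == "two-sided":
        p = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    # Step 4: compare the p-value with alpha.
    return z, p, p <= alpha

# Song example, tested one-tailed (Ha: the song increases engagement).
z, p, reject = z_test(8.2, mu=7.8, sigma=0.76, n=30, alpha=0.05, tail="greater")
print(f"Z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Running the song example one-tailed gives the same Z ≈ 2.88 as the hand calculation; switching to tail="two-sided" doubles the p-value, which is why the choice between a one- and a two-tailed test belongs in step 1, before the data are seen.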

