SUMMARY
Dec 18, 2015
• Central limit theorem
Statistical inference
If we can’t conduct a census, we collect data from a sample of the population.
Goal: make conclusions about that population
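Confidence interval
for $n \geq 30$: $\bar{x} \pm Z \times \frac{s}{\sqrt{n}}$
for $n < 30$: $\bar{x} \pm t_{n-1} \times \frac{s}{\sqrt{n}}$
$Z$ and $t_{n-1}$ are the critical values (Czech: kritická hodnota); the term after ± is the margin of error (Czech: možná odchylka).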
• The margin of error depends on
1. the critical value (Z or t) – set by the confidence level (95% is common)
2. s or σ – the variability of the data
3. n – the sample size
• The margin of error does not measure anything other than chance variation.
• It doesn’t measure any bias or errors that happen during the process.
• It does not tell you anything about the correctness of your data!!!
margin of error $= \text{critical value} \times \frac{s}{\sqrt{n}}$
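A minimal sketch of the large-sample case (n ≥ 30), assuming the critical value is taken from the standard normal distribution. The function names and the sample numbers are illustrative and not part of the lecture; the small-sample case would need a t critical value, which Python's standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(s, n, confidence=0.95):
    """Margin of error Z * s / sqrt(n) for a large sample (n >= 30)."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)


def confidence_interval(x_bar, s, n, confidence=0.95):
    """Return the (lower, upper) bounds: x_bar minus/plus the margin of error."""
    moe = margin_of_error(s, n, confidence)
    return x_bar - moe, x_bar + moe


# Hypothetical sample: mean 8.2, standard deviation 0.76, 30 observations.
low, high = confidence_interval(8.2, 0.76, 30)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Raising the confidence level or the variability widens the interval, while a larger sample narrows it, which matches the three dependencies listed above.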
HYPOTHESIS TESTING
Aim of hypothesis testing
• decision making
Engagement self-assessment
• Hopefully, you like this course so far.
• How to measure this?
• On a scale from 1 to 10, self-report how engaged you think you are during the lecture (1 is the lowest value, 10 is the highest value).
Engagement distribution
[Figure: distribution of engagement scores on the 1 to 10 scale, with the population mean at 7.8]
Hypothesis testing song
[Figure: the same distribution with the song sample mean of 8.2 marked next to the population mean of 7.8]
Did the song help?
The mean engagement of 30 randomly chosen students from the population of 100 students can lie anywhere on the blue curve. What is the probability of getting a value of at least 8.2? Use Z-tables.
Given: μ = 7.8, σ = 0.76, n = 30; observed sample mean 8.2.
The corresponding probability is 0.0022.
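As a rough check of the Z-table lookup, the same probability can be computed directly. This is a sketch that assumes the no-song parameters μ = 7.8 and σ = 0.76 and the sample size n = 30 given in these slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 7.8, 0.76   # no-song population parameters from the slides
n, x_bar = 30, 8.2      # song sample size and its mean engagement

standard_error = sigma / sqrt(n)          # spread of the sampling distribution
z = (x_bar - mu) / standard_error         # distance from mu in standard errors
p_value = 1 - NormalDist().cdf(z)         # P(sample mean >= 8.2) if the null is true

print(f"Z = {z:.2f}, p = {p_value:.4f}")
```

The computed probability lands close to the 0.0022 read from the Z-table; any small difference presumably comes from rounding during the table lookup.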
Situation on the battlefield
• From the no-song population, there is a 0.22% chance that randomly sampled data (n = 30) will have a mean engagement of 8.2 or more.
• The 30 students who were subjected to the song testing showed a mean engagement of 8.2.
• If there were no difference between the no-song and song populations, it would be rather unlikely (0.22% is really not that much) that we had chosen 30 students with a mean engagement of 8.2 or more.
• Conclusion: Because of such a low probability, we interpret 8.2 as a significant increase over 7.8 caused by the undeniable pedagogical qualities of the 'Hypothesis testing song'.
Hypothesis testing
• So, where are the hypotheses?
• Null hypothesis H0
• The song did not cause any observable effect.
• H0 generally – there is no significant difference between current population parameters and new population parameters after some sort of intervention.
• Alternative (research) hypothesis H1, Ha
• The song improved students' engagement.
• H1 generally – there is a significant difference between current population parameters and new population parameters after an intervention.
Null hypothesis
• H0 states that nothing happened, there is no change, no difference.
• Does the song improve students' engagement? H0: students' engagement is the same regardless of the song.
• Does diet coke taste different from regular coke? H0: there is no difference in taste between diet and regular coke.
• Does this drug increase blood pressure? H0: this drug causes no increase in blood pressure.
• Is this a fair coin? H0: the coin is fair.
Alternative hypothesis
• The alternative hypothesis states the opposite of the null and is usually the hypothesis you are trying to prove.
• Does the song improve students' engagement? Ha: the song improved students' engagement.
• Does diet coke taste different from regular coke? Ha: diet coke tastes much worse/better than regular coke.
• Does this drug increase blood pressure? Ha: this drug leads to an increase in blood pressure.
• Is this a fair coin? Ha: the coin is unfair.
Hypothesis testing
• Formulate the null hypothesis. There is no statistical testing without the null.
• You assume that H0 is true.
• I.e., you assume that the 'Hypothesis testing song' did not influence students' engagement.
• Then, you must find evidence for rejecting or not rejecting the null.
• Collect the data.
• Population data, no song: μ = 7.8, σ = 0.76
• Sample data, song played: n = 30, x̄ = 8.2
Hypothesis testing
• Then, you calculate the probability of observing your sample results (or more extreme ones) given that the null hypothesis is true.
• Null hypothesis, which we consider true: teaching with and without the song produces the same effect.
• The higher mean of the song sample (x̄ = 8.2) compared to the no-song mean (μ = 7.8) is caused only by the variability in random sampling.
Hypothesis testing
• If there is really no difference between the two teaching methods (with and without the song) in the population (i.e., given that the null hypothesis is true), how likely would it be to see a difference in the mean engagement between the two teaching methods as large as (or larger than) the one observed in our sample?
• Given that the null is true (both populations are the same), there is only a 0.22% chance that random sampling would lead to a mean engagement of 8.2 or more.
• The value of 0.0022 is referred to as the p-value.
Levels of likelihood
• Conclusion: 8.2 is a statistically significant increase over 7.8.
• We want a crisp decision, i.e., which probability is still considered 'unlikely' and which is considered 'likely'?
Levels of likelihood - α levels
• If the probability of getting the sample mean is less than 0.05, 0.01, or 0.001 (depending on the chosen threshold), then it is usually considered unlikely.
• These thresholds are called the α levels, or significance levels (Czech: hladiny významnosti).
• The α level is our criterion for deciding if something is likely or unlikely.
0.05 (5%), 0.01 (1%), 0.001 (0.1%)
Lingo
• The probability by which we decide if the result is likely or unlikely is called the p-value.
• We compare the p-value to the α level: if the p-value is less than the α level, then such a result is considered unlikely.
• Or, alternatively, we can calculate the Z-score of the result and compare it with the Z-score corresponding to the α level (the so-called Z-critical value).
Z-critical value
If the Z-score of the sample mean is greater than the Z-critical value, we have evidence that this mean is different from the regular population (the population that had not watched the musical lesson).
If the probability of obtaining a particular sample mean is less than the α level, then it falls in the tail that is called the critical region.
[Figure: sampling distribution with the Z-critical value Z* marking the start of the critical region]
Critical regions
• What is the Z-critical value for α = 0.05?
• Using the Z-table, you find the Z-value for a probability of 0.95, which is 1.65.
• What is the Z-critical value for α = 0.01?
• 2.33
• What is the Z-critical value for α = 0.001?
• 3.08
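The table lookup can be sketched in code with the inverse of the standard normal cumulative distribution. The helper name `z_critical` is an assumption made for illustration. The exact values differ slightly from the table roundings quoted in the slides.

```python
from statistics import NormalDist


def z_critical(alpha):
    """One-tailed Z-critical value: the point with probability alpha above it."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Z* = {z_critical(alpha):.3f}")
```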
• We take the mean from a sample of size n.
• Then we calculate its Z-score.
• And we get a Z-score of 1.82.
• We say that this is significant at p < 0.05.
• 1.82 is somewhere in the red region in the previous picture. The probability beyond it is less than 0.05, but not less than 0.01.
• It means that the probability of obtaining this sample mean is less than 5%, but not less than 1%.
• And remember, 0.05 is the α level.
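The claim about the Z-score of 1.82 can be checked with a short sketch that computes the one-tailed probability beyond it (variable names are illustrative):

```python
from statistics import NormalDist

z = 1.82
p_value = 1 - NormalDist().cdf(z)   # probability of a Z-score of 1.82 or more

print(f"p = {p_value:.4f}")
print("significant at 0.05:", p_value < 0.05)
print("significant at 0.01:", p_value < 0.01)
```

The probability falls between 0.01 and 0.05, so the result is significant at the 0.05 level only.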
Test statistic
Significance quiz
α level   Z-critical value
0.05      1.65
0.01      2.32
0.001     3.08

Z-score   Significant at
3.14      p <
2.07      p <
2.57      p <
14.31     p <
Significance quiz
α level   Z-critical value
0.05      1.65
0.01      2.32
0.001     3.08

Z-score   Significant at
3.14      p < 0.001
2.07      p < 0.05
2.57      p < 0.01
14.31     p < 0.001
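The quiz logic can be written as a small lookup. This is a sketch that uses the one-tailed Z-critical values from the quiz table; the function name `strictest_level` is hypothetical.

```python
# One-tailed Z-critical values as listed in the quiz table,
# ordered from the strictest alpha level to the loosest.
CRITICAL_VALUES = [(0.001, 3.08), (0.01, 2.32), (0.05, 1.65)]


def strictest_level(z):
    """Return the smallest alpha level at which z is significant, or None."""
    for alpha, z_star in CRITICAL_VALUES:
        if z > z_star:
            return alpha
    return None


for z in (3.14, 2.07, 2.57, 14.31):
    level = strictest_level(z)
    print(z, f"p < {level}" if level is not None else "not significant")
```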
Quiz
• Focus on the α level.
• Which of the following are true if the null hypothesis is true?
1. If the probability of getting a particular sample mean is less than α, it is unlikely to occur.
2. If a sample mean has a Z-score greater than Z*, it is “unlikely” to occur.
3. If the probability of getting a particular sample mean is “unlikely”, the sample mean is in the orange region.
4. The alpha level corresponds to the orange region.
[Figure: sampling distribution with the Z-critical value Z* marked]
Another engagement score
• After the sample of 30 students heard the 'Hypothesis testing song', their mean engagement score is 7.13.
• Just to remind you, the population parameters are μ = 7.8, σ = 0.76.
• Z-score of this sample mean: $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.13 - 7.8}{0.76 / \sqrt{30}} \approx -4.83$
• The Z-score is −4.83. What does it mean?
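A short sketch of this Z-score calculation, assuming the no-song population parameters μ = 7.8 and σ = 0.76 stated earlier in the lecture; the function name is illustrative.

```python
from math import sqrt


def z_score(x_bar, mu, sigma, n):
    """Z-score of a sample mean: its distance from mu in standard errors."""
    return (x_bar - mu) / (sigma / sqrt(n))


z = z_score(7.13, mu=7.8, sigma=0.76, n=30)
print(round(z, 2))
```

The negative sign shows that the sample mean lies below the population mean, which is why the left tail has to be considered as well.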
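Two-tailed test (Czech: oboustranný test)
[Figure: sampling distribution with critical regions in both tails; the Z-critical values are left as a question, Z = ??]
α = 0.05: the mean engagement score of 7.13 is significant at p < 0.05.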
One-tailed
α level   Z-critical value
0.05      ±1.65
0.01      ±2.32
0.001     ±3.08

Two-tailed
α level   Z-critical value
0.05      ±1.96
0.01      ±2.57
0.001     ±3.27
[Figure: two-tailed test; each of the two critical regions has probability $\frac{\alpha}{2}$]
In the critical region, we (most likely) did not get sample mean by chance.
The critical region can also be on the left.
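The one-tailed and two-tailed critical values can be sketched with one helper that splits α across both tails when needed. The name `z_critical` and its `two_tailed` parameter are assumptions for illustration; the exact values differ slightly from the table roundings above.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=False):
    """Positive Z-critical value; a two-tailed test puts alpha / 2 in each tail."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


for alpha in (0.05, 0.01, 0.001):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha = {alpha}: one-tailed ±{one:.2f}, two-tailed ±{two:.2f}")

# A Z-score falls in a two-tailed critical region when its absolute value
# exceeds the critical value, so the region can be on the left or the right.
z = -4.83
print(abs(z) > z_critical(0.05, two_tailed=True))
```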
One-tailed and two-tailed
one-tailed (directional) test
two-tailed (non-directional) test
One-tailed or two-tailed
• In general, we use two-tailed tests.
• One exception to this general rule is when we’re comparing a new treatment with an established treatment. In such cases we often only care whether the new treatment is better than the old one, and we would use a one-tailed (directional) test.
Alternative hypothesis
two-tailed test
one-tailed test
Quiz – reject the null
• What does it mean to reject the null using a two-tailed test?
• Our sample mean falls within/outside the critical region.
• The Z-score of our sample mean is less than/greater than the Z-critical value.
• The probability of obtaining the sample mean is less than/greater than the alpha level.
Four steps of hypothesis testing
1. Formulate the null and the alternative hypotheses (this includes choosing a one-tailed or two-tailed test).
2. Select the significance level α – the criterion by which we decide whether to reject the null.
--- COLLECT DATA ---
3. Compute the p-value. The p-value is the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.
4. Compare the p-value to the α level. If p ≤ α, the observed effect is statistically significant and the null is rejected in favor of the alternative hypothesis.
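The four steps can be sketched as a single one-sample Z-test function. This is an illustration under stated assumptions, not the lecture's own code: the function name `z_test` and its `tail` parameter are hypothetical, and the population standard deviation is assumed to be known. Steps 1 and 2 are the choices of `tail` and `alpha`; steps 3 and 4 are computed.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject_null) for a one-sample Z-test.

    tail: "two" (non-directional), "greater" or "less" (directional).
    """
    z = (x_bar - mu) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 3: probability of data at least as extreme, assuming the null is true.
    if tail == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "less":
        p_value = std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'two', 'greater' or 'less'")
    # Step 4: compare the p-value to the alpha level.
    return z, p_value, p_value <= alpha


# Song example from the lecture: mu = 7.8, sigma = 0.76, n = 30, sample mean 8.2.
z, p_value, reject = z_test(8.2, 7.8, 0.76, 30, alpha=0.05, tail="greater")
print(f"Z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```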