Lecture 10 - Basic Statistics and the Z-test C2 Foundation Mathematics (Standard Track) Dr Linda Stringer Dr Simon Craik [email protected] [email protected] INTO City/UEA London

Jan 27, 2015


fatima d

 

Page 2: C2 st lecture 10   basic statistics and the z test handout

Lecture 9 skills

• Calculate the following measures of location (AVERAGES)
    • Mode
    • Median
    • Mean

• Calculate the following measures of dispersion (MEASURES OF SPREAD)
    • Interquartile range
    • Standard deviation
    • Absolute deviation

• Perform a Z-test
    • Write the null and alternative hypotheses
    • Look up the critical value
    • Calculate the test statistic
    • Make the decision
    • Write a conclusion

Page 3: C2 st lecture 10   basic statistics and the z test handout

A data set

• A data set is usually a list of values (numbers) that has been gathered in a survey.

• We will use the following data set to demonstrate the ideas in the first part of this lecture.

• A statistician wants to find how many pets the average person has. He interviews 10 people and gets the following values:

0 2 0 1 0 8 2 1 0 0

Page 4: C2 st lecture 10   basic statistics and the z test handout

Bar charts

A bar chart showing how many pets 10 people have:

0 2 0 1 0 8 2 1 0 0

[Bar chart: one bar per person (1 to 10) on the horizontal axis; number of pets (0 to 8) on the vertical axis.]

Page 5: C2 st lecture 10   basic statistics and the z test handout

Pie charts

A pie chart of the data

0 2 0 1 0 8 2 1 0 0

[Pie chart: 0 pets 50%, 1 pet 20%, 2 pets 20%, 8 pets 10%.]

Page 6: C2 st lecture 10   basic statistics and the z test handout

Histogram

A histogram of the data showing how many people have each number of pets.

0 2 0 1 0 8 2 1 0 0

[Histogram: number of pets (0, 1, 2, 8) on the horizontal axis; number of people (1 to 5) on the vertical axis.]

Page 7: C2 st lecture 10   basic statistics and the z test handout

Mode

• In a data set the mode is the most frequent value (the value which occurs most often). The mode is a type of average.

• Example: Find the mode of the following data set

0 2 0 1 0 8 2 1 0 0

• In this data set the mode is 0.

Page 8: C2 st lecture 10   basic statistics and the z test handout

Mode

• There can be more than one mode in a data set.

• Example:

0 5 5 0 1 5 0 1 6

• There are two modes: 0 and 5.
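The definition of the mode, including the case of more than one mode, can be sketched in Python. This is only an illustration: the function name `modes` and the variable names are chosen for this sketch and are not part of the lecture.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs with the highest frequency, in increasing order."""
    counts = Counter(data)           # value -> number of occurrences
    highest = max(counts.values())   # the largest frequency in the data set
    return sorted(value for value, count in counts.items() if count == highest)


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
second_example = [0, 5, 5, 0, 1, 5, 0, 1, 6]

print(modes(pets))            # [0]
print(modes(second_example))  # [0, 5]
```

Returning a list rather than a single number means a data set with two modes is handled in the same way as a data set with one.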

Page 9: C2 st lecture 10   basic statistics and the z test handout

Median

• The median is the middle value in an ordered data set. It is another type of average.

• First order the data, with values increasing from left to right.

• Let n be the size of the data set (the number of values).

• If n/2 is an integer (whole number) then the median is the midpoint of the (n/2)th value and the (n/2 + 1)th value (to find the midpoint, add the values together and divide by 2).

• If n/2 is not an integer (whole number) then round it up to the nearest integer, (n + 1)/2. The median is the ((n + 1)/2)th value.

• OR find the median by crossing off pairs of values, starting from the ends of the data set.

Page 10: C2 st lecture 10   basic statistics and the z test handout

Example

• Order the data:

0 0 0 0 0 1 1 2 2 8

• n = 10 (the number of values) and n/2 = 10/2 = 5, which is an integer.

• The median is the midpoint of the 5th and 6th values = (0 + 1)/2 = 0.5.

Page 11: C2 st lecture 10   basic statistics and the z test handout

Example 2

• Order the data:

0 0 0 1 1 5 5 5 6

• n = 9 (the number of values).

• n/2 = 9/2 = 4.5, which is not an integer.

• Round up to 5. The median is the 5th value = 1.
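The rule for the median can be written as a short Python function. This is a minimal sketch of the n/2 rule from the lecture; the name `median_by_rule` is made up for this illustration.

```python
import math


def median_by_rule(data):
    """Median using the lecture's rule: order the data, then look at n/2."""
    values = sorted(data)            # order the data, increasing left to right
    n = len(values)
    if n % 2 == 0:
        k = n // 2                   # n/2 is an integer
        # midpoint of the kth and (k+1)th values (Python lists start at index 0)
        return (values[k - 1] + values[k]) / 2
    k = math.ceil(n / 2)             # n/2 is not an integer, so round up
    return values[k - 1]             # the kth value


print(median_by_rule([0, 2, 0, 1, 0, 8, 2, 1, 0, 0]))  # 0.5
print(median_by_rule([0, 5, 5, 0, 1, 5, 0, 1, 6]))     # 1
```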

Page 12: C2 st lecture 10   basic statistics and the z test handout

Interquartile range

• First order the data, with values increasing from left to right.

• We want to find two values: the first quartile Q1 and the third quartile Q3.

• Let n be the size of the data set (the number of values).

• To find Q1 we multiply n by 1/4. If n/4 is an integer (whole number) then Q1 is the midpoint of the (n/4)th value and the (n/4 + 1)th value.

• If n/4 is not an integer then round it up to the nearest integer. Q1 is the corresponding value.

• To find Q3 we multiply n by 3/4. If 3n/4 is an integer then Q3 is the midpoint of the (3n/4)th value and the (3n/4 + 1)th value.

• If 3n/4 is not an integer then round it up to the nearest integer. Q3 is the corresponding value.

• The interquartile range is Q3 − Q1.

Page 13: C2 st lecture 10   basic statistics and the z test handout

Example

• Order the data:

0 0 0 0 0 1 1 2 2 8

• n/4 = 10/4 = 2.5, which is not an integer.

• Round up to 3.

• Q1 is the third value, so Q1 = 0.

• 3n/4 = (3 × 10)/4 = 7.5, which is not an integer.

• Round up to 8.

• Q3 is the eighth value, so Q3 = 2.

• The interquartile range is Q3 − Q1 = 2 − 0 = 2.
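The quartile rule can be sketched in the same way. The helper name `quartile` and its argument `q` (1 for Q1, 3 for Q3) are assumptions made for this sketch. Library routines for quartiles often use other conventions and may give different answers, so the code below follows the lecture's rule directly.

```python
import math


def quartile(data, q):
    """Q1 (q=1) or Q3 (q=3) using the lecture's rule based on q*n/4."""
    values = sorted(data)
    n = len(values)
    if (q * n) % 4 == 0:
        k = q * n // 4                           # q*n/4 is an integer
        return (values[k - 1] + values[k]) / 2   # midpoint of kth and (k+1)th values
    k = math.ceil(q * n / 4)                     # otherwise round up
    return values[k - 1]                         # the kth value


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
q1 = quartile(pets, 1)
q3 = quartile(pets, 3)
print(q1, q3, q3 - q1)  # 0 2 2
```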

Page 14: C2 st lecture 10   basic statistics and the z test handout

Sigma notation Σ

• Given a data set X, we denote the sum of all the values x in X by $\sum x$.

• Example: If

X = 0 2 0 1 0 8 2 1 0 0

then

$$\sum x = 0 + 2 + 0 + 1 + 0 + 8 + 2 + 1 + 0 + 0 = 14$$

Page 15: C2 st lecture 10   basic statistics and the z test handout

Mean

• The mean is our third average.

• In a data set of size n the mean, denoted x̄, is the sum of all the values divided by n.

$$\bar{x} = \frac{\sum x}{n}$$

• Example: What is the mean number of pets?

• Calculate the sum of all the values and divide by n:

$$\bar{x} = \frac{\sum x}{n} = \frac{0 + 2 + 0 + 1 + 0 + 8 + 2 + 1 + 0 + 0}{10} = \frac{14}{10} = 1.4$$

Page 16: C2 st lecture 10   basic statistics and the z test handout

Standard deviation, σ

• The standard deviation, σ, is a measure of dispersion.

• First calculate the variance, σ². The standard deviation, σ, is the square root of the variance.

• There are two formulae for variance. They give the same answer. Usually the second formula is easier to use.

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2$$

• When you have found the variance, do not forget to take the square root!

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

Page 17: C2 st lecture 10   basic statistics and the z test handout

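Proof that the two formulae for standard deviation are equivalent

$$
\begin{aligned}
\sigma^2 &= \frac{\sum (x - \bar{x})^2}{n} \\
&= \frac{\sum (x^2 - 2x\bar{x} + \bar{x}^2)}{n} \\
&= \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \frac{\sum \bar{x}^2}{n} \\
&= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2\,\frac{\sum 1}{n} \\
&= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}
$$

The fourth line uses $\frac{\sum x}{n} = \bar{x}$, and the last line uses $\sum 1 = n$.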

Page 18: C2 st lecture 10   basic statistics and the z test handout

Example

• What is the standard deviation of the following data?

0 2 0 1 0 8 2 1 0 0

• Use the second formula to calculate the variance.

$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$$

• We previously worked out the mean x̄ = 1.4.

$$\sum x^2 = 0^2 + 2^2 + 0^2 + 1^2 + 0^2 + 8^2 + 2^2 + 1^2 + 0^2 + 0^2 = 74$$

• The variance is

$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{74}{10} - 1.4^2 = 5.44$$

• The standard deviation is $\sigma = \sqrt{5.44} = 2.33$ to 2 d.p.
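As a check on the arithmetic, both variance formulae can be evaluated in Python on the pets data. This is a sketch only; the function names `variance_definition` and `variance_shortcut` are invented for this illustration.

```python
import math


def variance_definition(data):
    """First formula: the mean of the squared distances from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def variance_shortcut(data):
    """Second formula: the mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) / n - mean ** 2


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
var_a = variance_definition(pets)
var_b = variance_shortcut(pets)
sigma = math.sqrt(var_b)   # do not forget the square root

print(round(var_a, 2), round(var_b, 2), round(sigma, 2))  # 5.44 5.44 2.33
```

Both functions divide by n, as in the lecture, so they treat the data as a whole population rather than as a sample.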

Page 19: C2 st lecture 10   basic statistics and the z test handout

Absolute value

• The absolute value function gives the positive value of any number

$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases}$$

• |5| = 5

• |−8| = 8

• |−1.213| = 1.213

• |1,000,000| = 1,000,000

Page 20: C2 st lecture 10   basic statistics and the z test handout

Absolute deviation

• The absolute deviation measures the average distance from each value to the mean. It is another measure of dispersion.

• As a formula:

$$AD = \frac{\sum |x - \bar{x}|}{n}$$

Page 21: C2 st lecture 10   basic statistics and the z test handout

Example

• What is the absolute deviation of the data

0 2 0 1 0 8 2 1 0 0

• The mean is x̄ = 1.4. We first work out |x − x̄| for each value:

1.4 0.6 1.4 0.4 1.4 6.6 0.6 0.4 1.4 1.4

• The absolute deviation is

$$AD = \frac{\sum |x - \bar{x}|}{n} = \frac{15.6}{10} = 1.56$$
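The same calculation can be sketched in Python. The function name `absolute_deviation` is chosen for this illustration only.

```python
def absolute_deviation(data):
    """Average distance from each value to the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n   # abs() is the absolute value |x|


pets = [0, 2, 0, 1, 0, 8, 2, 1, 0, 0]
ad = absolute_deviation(pets)
print(round(ad, 2))  # 1.56
```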

Page 22: C2 st lecture 10   basic statistics and the z test handout

Hypothesis testing

We use hypothesis testing to compare the mean of a very large data set, a population mean, with the mean of a sample data set, a sample mean.

Example: A lightbulb company says their lightbulbs last a mean time of 1000 hours with a standard deviation of 50 hours. We think their lightbulbs last longer than this and propose a test at a 5% level of significance. We buy 75 lightbulbs and they last a mean time of 1022 hours.

The population mean is 1000 hours.
The sample is the 75 lightbulbs that we test.
The sample mean is 1022 hours.

Page 23: C2 st lecture 10   basic statistics and the z test handout

Hypothesis testing

• The null hypothesis, H0, is a statement which is assumed to be true.

• Sample data is collected and tested to see if it is consistent with the null hypothesis.

• If the sample mean is significantly different from the population mean, then we say that we have sufficient evidence to reject the null hypothesis, H0, in favour of the alternative hypothesis, H1.

Page 24: C2 st lecture 10   basic statistics and the z test handout

The null hypothesis and the alternative hypothesis

• The null hypothesis concerns the population mean.

• It is of the form

H0 : µ = A

where µ is the population mean and A is the hypothetical value.

• The alternative hypothesis is that the null hypothesis is incorrect and will be one of

H1 : µ ≠ A
H1 : µ < A
H1 : µ > A

• The question will direct you which of the above to use.

Page 25: C2 st lecture 10   basic statistics and the z test handout

Significance level

• The null hypothesis will always be tested to a given level of significance.

• A 5% level of significance means we are testing to see if the probability of getting sample data at least as extreme as ours, assuming the null hypothesis is true, is less than 0.05. If the probability is less, we reject the null hypothesis in favour of the alternative hypothesis.

• A 1% level of significance translates to a probability of 0.01.

Page 26: C2 st lecture 10   basic statistics and the z test handout

Critical value

• A critical value is the value beyond which we reject the null hypothesis. It tells us the boundary of the critical region(s).

• In a Z-test this depends on the alternative hypothesis and the significance level.

• We look up the critical value(s) in tables.

                  Sig. Lev. 5%           Sig. Lev. 1%
                  One-tail   Two-tail    One-tail   Two-tail
Critical value    1.65       1.96        2.33       2.58
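The table values can be reproduced from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `critical_value` is an assumption for this illustration. For a two-tailed test the significance level is split equally between the two tails.

```python
from statistics import NormalDist


def critical_value(significance, two_tailed=False):
    """Positive critical value of the standard normal distribution."""
    # A two-tailed test puts half of the significance level in each tail.
    tail_area = significance / 2 if two_tailed else significance
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.05, 0.01):
    one = critical_value(level)
    two = critical_value(level, two_tailed=True)
    print(level, round(one, 4), round(two, 4))
```

The one-tailed value at 5% comes out at about 1.645, which the table prints as 1.65. The other three values agree with the table to 2 d.p.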

Page 27: C2 st lecture 10   basic statistics and the z test handout

H1 : µ ≠ A

If our alternative hypothesis is H1 : µ ≠ A we are doing a two-tailed test and we have 2 critical values, one negative and one positive.
The critical value is the boundary of the rejection region.
For a 5% level of significance we have the following picture:

[Graph: bell-shaped curve with both tails shaded, to the left of −1.96 and to the right of 1.96.]

The rejection (shaded) regions have a combined area of 0.05.

Page 28: C2 st lecture 10   basic statistics and the z test handout

H1 : µ > A

If our alternative hypothesis is H1 : µ > A we are doing a one-tailed test and we have 1 critical value which is positive.
The critical value is the boundary of the rejection region.
For a 5% level of significance we have the following picture:

[Graph: bell-shaped curve with the right tail shaded, to the right of 1.65.]

The rejection region has an area of 0.05.

Page 29: C2 st lecture 10   basic statistics and the z test handout

H1 : µ < A

If our alternative hypothesis is H1 : µ < A we are doing a one-tailed test and we have 1 critical value which is negative.
The critical value is the boundary of the rejection region.
For a 5% level of significance we have the following picture:

[Graph: bell-shaped curve with the left tail shaded, to the left of −1.65.]

The rejection region has an area of 0.05.

Page 30: C2 st lecture 10   basic statistics and the z test handout

Test statistic

• The test statistic is the difference between the sample mean, x̄, and the (hypothetical) population mean, A, divided by the standard error.

• The standard error is σ/√n for the Z-test and s/√n for the T-test, where n is the sample size, σ is the population standard deviation and s is the sample standard deviation.

• The Z-test statistic is

$$Z = \frac{\bar{x} - A}{\sigma / \sqrt{n}}$$

• If the test statistic lies beyond the critical value(s) (in the rejection region) we reject H0. If it does not, we accept H0.
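The test statistic is a one-line calculation. The sketch below applies it to the lightbulb figures given earlier in the lecture (A = 1000, σ = 50, n = 75, x̄ = 1022); the function and parameter names are chosen for this illustration.

```python
import math


def z_statistic(sample_mean, hypothetical_mean, sigma, n):
    """(sample mean - hypothetical population mean) divided by the standard error."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothetical_mean) / standard_error


z = z_statistic(sample_mean=1022, hypothetical_mean=1000, sigma=50, n=75)
print(round(z, 2))  # 3.81
```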

Page 31: C2 st lecture 10   basic statistics and the z test handout

Z-test - Example 1

Research says that the mean height for a man is 182 cm with a standard deviation of 9 cm. We suspect men might be shorter than this. We get the heights of 100 men and their mean height is 176 cm. We test at a 1% level of significance.

Page 32: C2 st lecture 10   basic statistics and the z test handout

Z-test - Example 1

• The null hypothesis and alternative hypothesis are:

H0 : µ = 182
H1 : µ < 182

• We are doing a 1-tailed test at a 1% level of significance so the critical value is C = −2.33.

• The test statistic is

$$Z = \frac{176 - 182}{9 / \sqrt{100}} = -6.67$$

• −6.67 < −2.33 so we reject the null hypothesis.

Page 33: C2 st lecture 10   basic statistics and the z test handout

Z-test - Example 2

A company says employees are supposed to work an average of 40 hours a week with a standard deviation of 5 hours. Alfred wants to know if he fits this to a 5% level of significance. He notes down how many hours he works over 48 weeks and has a mean of 39 hours.

Page 34: C2 st lecture 10   basic statistics and the z test handout

Z-test - Example 2

• The null hypothesis and alternative hypothesis are:

H0 : µ = 40
H1 : µ ≠ 40

• We are doing a 2-tailed test at a 5% level of significance so the critical values are C = −1.96 and C = 1.96.

• The test statistic is

$$Z = \frac{39 - 40}{5 / \sqrt{48}} = -1.39$$

• −1.96 < −1.39 < 1.96 so we accept the null hypothesis.

Page 35: C2 st lecture 10   basic statistics and the z test handout

Z-test - Example 3

A lightbulb company says their lightbulbs last a mean time of 1000 hours with a standard deviation of 50 hours. We think their lightbulbs last longer than this and propose a test at a 5% level of significance. We buy 75 lightbulbs and they last a mean time of 1022 hours.

Page 36: C2 st lecture 10   basic statistics and the z test handout

Z-test - Example 3

• The null hypothesis and alternative hypothesis are:

H0 : µ = 1000
H1 : µ > 1000

• We are doing a 1-tailed test at a 5% level of significance so the critical value is C = 1.65.

• The test statistic is

$$Z = \frac{1022 - 1000}{50 / \sqrt{75}} = 3.81$$

• 1.65 < 3.81 so we reject the null hypothesis.

Page 37: C2 st lecture 10   basic statistics and the z test handout

Z-test summary

• You will be given
    1. Population mean, A
    2. Population standard deviation, σ
    3. Significance level
    4. Sample mean, x̄
    5. Sample size, n
    6. Quantifying word.

• You have to work out
    1. Null hypothesis, alternative hypothesis
    2. Critical value(s)
    3. Test statistic
    4. Decision - accept/reject H0 (sketch a picture if possible)
    5. Conclusion
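The whole procedure can be sketched as one Python function and run on the three worked examples. Everything named here is an assumption made for the sketch: the function `z_test`, the labels "less", "greater" and "not equal" standing in for the quantifying word, and the dictionary `CRITICAL`, which simply copies the magnitudes from the lecture's table of critical values.

```python
import math

# Magnitudes of the critical values, copied from the lecture's table.
CRITICAL = {
    (0.05, "one-tail"): 1.65,
    (0.05, "two-tail"): 1.96,
    (0.01, "one-tail"): 2.33,
    (0.01, "two-tail"): 2.58,
}


def z_test(A, sigma, n, sample_mean, alternative, level):
    """Return (test statistic, True if H0 is rejected).

    alternative is "less" (H1: mu < A), "greater" (H1: mu > A)
    or "not equal" (H1: mu != A).
    """
    z = (sample_mean - A) / (sigma / math.sqrt(n))
    if alternative == "not equal":
        c = CRITICAL[(level, "two-tail")]
        reject = abs(z) > c          # beyond either critical value
    elif alternative == "greater":
        c = CRITICAL[(level, "one-tail")]
        reject = z > c               # beyond the positive critical value
    elif alternative == "less":
        c = CRITICAL[(level, "one-tail")]
        reject = z < -c              # beyond the negative critical value
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'not equal'")
    return z, reject


heights = z_test(182, 9, 100, 176, "less", 0.01)
hours = z_test(40, 5, 48, 39, "not equal", 0.05)
bulbs = z_test(1000, 50, 75, 1022, "greater", 0.05)

for name, (z, reject) in [("heights", heights), ("hours", hours), ("bulbs", bulbs)]:
    print(name, round(z, 2), "reject H0" if reject else "accept H0")
```

The function stops at the decision. Writing the conclusion in words, in the context of the question, is still a separate step.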

Page 38: C2 st lecture 10   basic statistics and the z test handout

The theory behind the Z-test and the T-test

If samples of size n are taken from a population with mean A and standard deviation σ, then the sample means are (approximately, when n is large) distributed normally, with mean A and standard deviation σ/√n.

When we calculate the test statistic, we are calculating the Z-score of the sample mean.

The critical value is the Z-score of a sample mean which we have a 5% (or 1%) probability of obtaining.

For further information, try a statistics book from the library, or the Khan Academy videos on YouTube.
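The statement about sample means can be explored with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with the lightbulb figures (A = 1000, σ = 50), samples have size n = 75, and the number of samples (2000) and the random seed are arbitrary choices made for this illustration.

```python
import math
import random
import statistics

random.seed(1)                 # fixed seed so the run is repeatable

A, sigma, n = 1000, 50, 75     # population mean, population sd, sample size
num_samples = 2000             # how many samples to draw (arbitrary)

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(A, sigma) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_sd = sigma / math.sqrt(n)

print("mean of the sample means:", round(mean_of_means, 1))
print("sd of the sample means:  ", round(sd_of_means, 2))
print("sigma / sqrt(n):         ", round(theoretical_sd, 2))
```

The first number should come out close to A and the second close to σ/√n, which is the standard error used in the test statistic.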

Page 39: C2 st lecture 10   basic statistics and the z test handout

Normal distribution X ∼ N(µ, σ²)

• The normal distribution is defined as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where σ is the population standard deviation and µ is the population mean.

• The graph below is when µ = 0 and σ = 1.

[Graph: bell-shaped curve of y = f(x) for x from −4 to 4, with the vertical axis marked from 0.1 to 0.5.]

• Probabilities correspond to areas under this curve.
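The formula and the link between areas and probabilities can be checked numerically. The sketch below writes the formula out as a function (the name `normal_pdf` is chosen for this illustration) and uses `NormalDist` from Python's standard `statistics` module to find areas under the curve with µ = 0 and σ = 1.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """The curve f(x) for a normal distribution with mean mu and sd sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


peak = normal_pdf(0)                               # height of the curve at x = 0

std = NormalDist()                                 # mu = 0, sigma = 1
central_area = std.cdf(1.96) - std.cdf(-1.96)      # area between -1.96 and 1.96
right_tail = 1 - std.cdf(1.65)                     # area to the right of 1.65

print(round(peak, 4))          # 0.3989
print(round(central_area, 3))  # 0.95
print(round(right_tail, 3))    # 0.049
```

The area between −1.96 and 1.96 is about 0.95, leaving a combined 0.05 in the two tails, and the area to the right of 1.65 is roughly 0.05. These are the rejection regions used at the 5% level of significance.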