S1: Chapter 4 Representation of Data Dr J Frost ([email protected]) Last modified: 25th September 2014
Dec 14, 2015
Stem and Leaf recap
Put the following measurements into a stem and leaf diagram:
4.7 3.6 3.8 4.7 4.1 2.2 3.6 4.0 4.4 5.0 3.7 4.6 4.8 3.7 3.2 2.5 3.6 4.5 4.7 5.2 4.7 4.2 3.8 5.1 1.4 2.1 3.5 4.2 2.4 5.1
1 | 4 (1)
2 | 1 2 4 5 (4)
3 | 2 5 6 6 6 7 7 8 8 (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8 (12)
5 | 0 1 1 2 (4)
Key: 2 | 1 means 2.1
Now find:
Mode = 4.7
Lower quartile = 3.6
Upper quartile = 4.7
Median = 4.05
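As a rough check on the recap above, the sketch below rebuilds the stem and leaf rows and the four summary values from the 30 measurements. The function names are illustrative only, and the quartile rule is an assumption: it uses the usual S1 convention (find n/4 or 3n/4; if that is a whole number, average that item and the next, otherwise round up).

```python
import math
from collections import Counter, defaultdict

measurements = [
    4.7, 3.6, 3.8, 4.7, 4.1, 2.2, 3.6, 4.0, 4.4, 5.0,
    3.7, 4.6, 4.8, 3.7, 3.2, 2.5, 3.6, 4.5, 4.7, 5.2,
    4.7, 4.2, 3.8, 5.1, 1.4, 2.1, 3.5, 4.2, 2.4, 5.1,
]


def stem_and_leaf(values):
    """Map each stem (units digit) to its sorted leaves (tenths digit)."""
    rows = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)  # e.g. 4.7 -> 47, sidestepping float error
        rows[tenths // 10].append(tenths % 10)
    return dict(rows)


def quartile(values, fraction):
    """Assumed S1 rule: whole position -> average with next item, else round up."""
    ordered = sorted(values)
    position = fraction * len(ordered)
    if position == int(position):
        k = int(position)
        return round((ordered[k - 1] + ordered[k]) / 2, 2)
    return ordered[math.ceil(position) - 1]


def mode(values):
    """Most frequent value (the first one found if there is a tie)."""
    return Counter(values).most_common(1)[0][0]


for stem, leaves in stem_and_leaf(measurements).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves), f"({len(leaves)})")

print("Mode =", mode(measurements))
print("Lower quartile =", quartile(measurements, 0.25))
print("Median =", quartile(measurements, 0.5))
print("Upper quartile =", quartile(measurements, 0.75))
```

With n = 30, n/4 = 7.5 rounds up to the 8th value, n/2 = 15 averages the 15th and 16th values, and 3n/4 = 22.5 rounds up to the 23rd value.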
Back-to-Back Stem and Leaf recap
Girls: 55 80 84 91 80 92 98 40 60 64 93 72 96 85 88 90 76 54 58 92 91 80 79
Boys: 80 60 91 65 67 81 75 46 72 71 74 57 64 60 50 68
The data above show the pulse rates of boys and girls in a school.
Comment on the results.
The back-to-back stem and leaf diagram shows that boys' pulse rates tend to be lower than girls'.
Girls | | Boys
0 | 4 | 6
8 5 4 | 5 | 0 7 9
6 4 0 | 6 | 0 0 4 5 7 8
9 8 6 2 | 7 | 1 2 4 5
8 5 4 0 0 0 | 8 | 0
8 6 2 2 1 0 | 9 | 1
Key: 0 | 4 | 6 means 40 for girls and 46 for boys.
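One way to back up a comment like "boys' pulse rates tend to be lower" is to compare a measure of location for each group. The sketch below does this with the medians, taking the pulse rates listed on this slide at face value; the variable names are illustrative.

```python
from statistics import median

# Pulse rates as listed on the slide.
girls = [55, 80, 84, 91, 80, 92, 98, 40, 60, 64, 93, 72,
         96, 85, 88, 90, 76, 54, 58, 92, 91, 80, 79]
boys = [80, 60, 91, 65, 67, 81, 75, 46, 72, 71, 74, 57,
        64, 60, 50, 68]

girls_median = median(girls)  # 12th of 23 ordered values
boys_median = median(boys)    # mean of the 8th and 9th of 16 ordered values

print("Girls' median:", girls_median)
print("Boys' median:", boys_median)
print("Boys lower on average:", boys_median < girls_median)
```

In an exam answer the same comparison would be read straight off the diagram by counting in to the middle value on each side.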
Box Plot recap
Box plots allow us to visually represent the distribution of the data.
Minimum | Lower Quartile | Median | Upper Quartile | Maximum
3 | 15 | 17 | 22 | 27
[Box plot of these values on an axis from 0 to 30]
How is the IQR represented in this diagram? By the width of the box (lower quartile to upper quartile).
How is the range represented in this diagram? By the distance between the ends of the whiskers (minimum to maximum).
Box Plots recap
Sketch a box plot to represent the given weights of cats:
5lb, 6lb, 7.5lb, 8lb, 8lb, 9lb, 12lb, 14lb, 20lb
Minimum | Maximum | Median | Lower Quartile | Upper Quartile
5 | 20 | 8 | 7.5 | 12
[Box plot to be drawn on an axis from 0 to 24]
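The five values needed for the cat-weight box plot can be checked with a short sketch. The helper names are illustrative, and the quartile rule is an assumption (the usual S1 convention: if n/4 is not whole, round up; if it is whole, average that item and the next).

```python
import math

cat_weights_lb = [5, 6, 7.5, 8, 8, 9, 12, 14, 20]


def quartile(values, fraction):
    """Assumed S1 rule: whole position -> average with next item, else round up."""
    ordered = sorted(values)
    position = fraction * len(ordered)
    if position == int(position):
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(position) - 1]


def five_number_summary(values):
    """Return the five values a box plot is drawn from."""
    ordered = sorted(values)
    return {
        "minimum": ordered[0],
        "lower_quartile": quartile(ordered, 0.25),
        "median": quartile(ordered, 0.5),
        "upper_quartile": quartile(ordered, 0.75),
        "maximum": ordered[-1],
    }


summary = five_number_summary(cat_weights_lb)
iqr = summary["upper_quartile"] - summary["lower_quartile"]
data_range = summary["maximum"] - summary["minimum"]

print(summary)
print("IQR =", iqr, "Range =", data_range)
```

Here n = 9, so the quartile positions 2.25, 4.5 and 6.75 round up to the 3rd, 5th and 7th values.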
Outliers
An outlier is an extreme value.
More specifically, it's generally a value more than 1.5 IQRs beyond the lower or upper quartile. (But you will be told in the exam if the rule differs from this.)
[Figure: box plot on an axis from 0 to 30, with the regions past these boundaries labelled "Outliers beyond this point"]
Outliers
We can display outliers as crosses on a box plot. But if we have one, how do we display the marks for the minimum/maximum?
[Figure: box plot with outliers marked as crosses, on an axis from 0 to 30]
Maximum point is not an outlier, so remains unchanged.
But we have points that are outliers here. This mark becomes the "outlier boundary", rather than the minimum.
Examples
Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
0, 3 | 21, 27 | 8 | 10 | 14
[Box plot to be drawn on an axis from 0 to 30]
Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
3, 7 | 20, 25, 26 | 12 | 13 | 16
[Box plot to be drawn on an axis from 0 to 30]
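For the two examples above, the outlier boundaries can be worked out before sketching. This is a minimal sketch assuming the 1.5 × IQR rule stated earlier; the function names are illustrative.

```python
def outlier_boundaries(lower_quartile, upper_quartile, multiplier=1.5):
    """Return (lower boundary, upper boundary) using the 1.5 x IQR rule."""
    iqr = upper_quartile - lower_quartile
    return (lower_quartile - multiplier * iqr,
            upper_quartile + multiplier * iqr)


def find_outliers(values, lower_quartile, upper_quartile):
    """Return the values lying strictly beyond either boundary."""
    low, high = outlier_boundaries(lower_quartile, upper_quartile)
    return [v for v in values if v < low or v > high]


# Example 1: quartiles 8 and 14, extreme values 0, 3, 21, 27
print(outlier_boundaries(8, 14))                # (-1.0, 23.0)
print(find_outliers([0, 3, 21, 27], 8, 14))     # [27]

# Example 2: quartiles 12 and 16, extreme values 3, 7, 20, 25, 26
print(outlier_boundaries(12, 16))               # (6.0, 22.0)
print(find_outliers([3, 7, 20, 25, 26], 12, 16))  # [3, 25, 26]
```

In the first example only 27 is an outlier, so the lower whisker stays at the minimum 0. In the second, both ends have outliers, so both whisker marks move to the outlier boundaries (6 and 22), with crosses at 3, 25 and 26.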
Exercises
Page 58 Exercise 4B Q2
Page 59 Exercise 4C Q1, 2
Comparing Box Plots
[Figure: box plot comparing house prices of Croydon and Kingston-upon-Thames, on an axis from £100k to £450k]
"Compare the prices of houses in Croydon with those in Kingston." (2 marks)
For 1 mark, one of:
• The interquartile range of house prices in Kingston is greater than in Croydon.
• The range of house prices in Kingston is greater than in Croydon.
i.e. something spread related.
For 1 mark:
• The median house price in Kingston was greater than that in Croydon.
i.e. compare some measure of location (could be minimum, lower quartile, etc.)
Bar Charts vs Histograms
Bar Charts
[Figure: bar chart of Frequency against Shoe Size (6 to 9)]
• For discrete data.
• Frequency given by height of bars.
Histograms
[Figure: histogram of Frequency Density against Height (1.0m to 1.8m)]
• For continuous data. (Use this as a reason whenever you're asked to justify use of a histogram.)
• Data divided into (potentially uneven) intervals.
• [GCSE definition] Frequency given by area of bars.*
• No gaps between bars.
* Not actually true. We'll correct this in a sec.
Bar Charts vs Histograms
Still using the "incorrect" GCSE formula:
Frequency Density = Frequency ÷ Class Width
Weight (w kg) | Frequency | Frequency Density
0 < w ≤ 10 | 40 | 4
10 < w ≤ 15 | 6 | 1.2
15 < w ≤ 35 | 52 | 2.6
35 < w ≤ 45 | 10 | 1
[Figure: histogram of Frequency Density (1 to 5) against Height (m) (10 to 50)]
Frequency = 15
Frequency = 30
Frequency = 40
Frequency = 25
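The frequency densities in the weight table can be reproduced with the GCSE formula (frequency ÷ class width). A minimal sketch, with illustrative names:

```python
# (lower bound, upper bound, frequency) for each weight class in kg
weight_classes = [
    (0, 10, 40),
    (10, 15, 6),
    (15, 35, 52),
    (35, 45, 10),
]


def frequency_density(lower, upper, frequency):
    """GCSE formula: frequency divided by class width."""
    return frequency / (upper - lower)


densities = [frequency_density(*row) for row in weight_classes]

for (lower, upper, frequency), density in zip(weight_classes, densities):
    print(f"{lower} < w <= {upper}: frequency {frequency}, density {density}")
```

Going the other way (as in the histogram on the same slide), each frequency is recovered as frequency density × class width, i.e. the area of the bar.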
Area = frequency?
The area of each bar in fact isn't necessarily equal to the frequency. Actually:
Frequency ∝ Area
i.e.
Frequency = k × Area
Similarly:
Frequency Density ∝ Frequency ÷ Class Width
However, we often let k = 1, so that the ∝ becomes an =, as we were allowed to assume at GCSE.
The key to almost every histogram question… This diagram!
Area → (× k) → Frequency
For a given histogram, there's some scaling to get from an area (whether the total area or the area of a particular bar) to the corresponding frequency. Once you've worked out this scaling, any subsequent areas you calculate can be converted to frequencies.
Area = frequency?
There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.
[Figure: histogram of Frequency Density (0 to 5) against Time (s), with bars over 9 to 12 and 12 to 18]
We first find what area represents the total frequency.
Total area = 15 + 9 = 24
Area 24 → Frequency 60
Then use this scaling along with the desired area.
Area = 4 × 1.5 = 6
Area 6 → Frequency 15
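The runners question follows the "area → frequency" scaling idea directly, and the sketch below walks through it. The bar heights (5 and 1.5) are an assumption inferred from the stated areas 15 and 9 over the intervals 9 to 12 and 12 to 18; the function and variable names are illustrative.

```python
# (start, end, frequency density) for each bar; heights inferred from the
# stated areas 15 = 3 x 5 and 9 = 6 x 1.5
bars = [(9, 12, 5.0), (12, 18, 1.5)]
total_frequency = 60


def area_between(bars, lower, upper):
    """Total histogram area lying between two values on the horizontal axis."""
    total = 0.0
    for start, end, height in bars:
        overlap = max(0.0, min(end, upper) - max(start, lower))
        total += overlap * height
    return total


total_area = area_between(bars, 9, 18)      # area representing all 60 runners
k = total_frequency / total_area            # runners per unit of area
area_above_14 = area_between(bars, 14, 18)  # 4 x 1.5
runners_above_14 = k * area_above_14

print("Total area =", total_area)
print("Scaling k =", k)
print("Runners with times above 14s =", runners_above_14)
```

The same `area_between` helper works for any range that cuts across several bars, which is where most of the marks are usually lost.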
Frequency Density = Frequency ÷ Class width?
Weight (to nearest kg) | Frequency
1-2 |
3-6 |
7-9 |
[Figure: histogram of Frequency Density (0 to 5) on a horizontal axis from 1 to 10, labelled Time (s)]
Note the gaps! We can use the complete set of information in the first row combined with the bar to again work out the correct "scaling".
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)
M1 A1: Determine what one small square or one large square is worth (i.e. work out scaling).
M1 A1: Use this to find number of cars travelling >35mph.
May 2012
[Figure 2: histogram of speeds, with the frequency density axis marked 1 to 7]
We can make the frequency density scale what we like.
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)
M1 M1: Use histogram to construct sum of speeds.
(30 × 12.5 + 240 × 25 + …) ÷ 450
A1: Correct value
= 28.8
Bro Tip: Whenever you are asked to calculate mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.
Speed | Midpoint | Frequency
10-15 | 12.5 | 30
20-30 | 25 | 240
30-35 | 32.5 | 90
35-40 | 37.5 | 30
40-45 | 42.5 | 60
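The estimate of the mean from the grouped frequency table can be checked as below. This is a sketch under one assumption: the midpoint of the class with frequency 240 is taken as 25, to agree with the worked sum 30 × 12.5 + 240 × 25 + …; the names are illustrative.

```python
# (midpoint in mph, frequency) for each bar of the histogram
grouped = [
    (12.5, 30),
    (25.0, 240),  # midpoint taken from the worked sum on the slide
    (32.5, 90),
    (37.5, 30),
    (42.5, 60),
]

total_frequency = sum(frequency for _, frequency in grouped)
sum_of_speeds = sum(midpoint * frequency for midpoint, frequency in grouped)
estimated_mean = sum_of_speeds / total_frequency

print("Total frequency =", total_frequency)
print("Sum of midpoint x frequency =", sum_of_speeds)
print("Estimated mean speed =", round(estimated_mean, 1))
```

The total frequency coming out as 450 is a useful check that the scaling factor was applied correctly to every bar.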
Jan 2012
14
5
Bro Tip: Be careful that you use the correct class widths!
21 + 45 + 3 = 69
M1 A1 B1 M1 A1
= 12 runners
Jan 2008
Answer: Distance is continuous
Note the gaps in the class intervals!
4 / 5 = 0.8
19 / 5 = 3.8
53 / 10 = 5.3
...
35 15
(5 x 5) + 15 = 40
Jun 2007
Skew
Skew gives a measure of whether the values are more spread out above the median or below the median.
[Figures: two frequency distributions, one against Height and one against Weight, each with the mode, median and mean marked]
We say this distribution has positive skew. (To remember, think that the "tail" points in the positive direction.)
We say this distribution has negative skew.
Skew
Distribution | Skew
Salaries in the UK | High salaries drag mean up. So positive skew. Mean > Median
IQ | A symmetrical distribution, i.e. no skew. Mean = Median
Heights of people in the UK | Will probably be a nice "bell curve", i.e. no skew. Mean = Median
Age of retirement | Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median
Remember, think what direction the "tail" is likely to point.
Exam Question
In the previous parts of a question you've calculated the mean mark and the median mark of students in a test.
(d) Describe the skewness of the marks of the students, giving a reason for your answer. (2)
1st mark: Negative skew
2nd mark: because mean < median
Skew
Given the quartiles and median, how would you work out whether the distribution had positive or negative skew?
Positive skew: Q3 − Q2 > Q2 − Q1
Negative skew: Q3 − Q2 < Q2 − Q1
No skew: Q3 − Q2 = Q2 − Q1
Exam Question
1st mark: Q3 − Q2 > Q2 − Q1
2nd mark: Therefore positive skew.
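The quartile comparison used in this exam answer can be written as a small decision rule. A minimal sketch; the function name is illustrative, and the second and third sets of quartiles are hypothetical values chosen only to show the other two outcomes.

```python
def quartile_skew(q1, q2, q3):
    """Classify skew by comparing the two halves of the box in a box plot."""
    upper_gap = q3 - q2  # width of the right-hand part of the box
    lower_gap = q2 - q1  # width of the left-hand part of the box
    if upper_gap > lower_gap:
        return "positive skew"
    if upper_gap < lower_gap:
        return "negative skew"
    return "no skew"


# Cat weights from the box plot recap: Q1 = 7.5, Q2 = 8, Q3 = 12
print(quartile_skew(7.5, 8, 12))

# Hypothetical quartiles, for illustration only
print(quartile_skew(10, 16, 18))
print(quartile_skew(1, 2, 3))
```

For the cat weights, Q3 − Q2 = 4 and Q2 − Q1 = 0.5, so the statement to write down for the first mark is "Q3 − Q2 > Q2 − Q1".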
Calculating Skew
One measure of skew can be calculated using the following formula: (Important Note: this will be given to you in the exam if required)
skew = 3(mean − median) ÷ standard deviation
When mean > median, mean < median, and mean = median, we can see this gives us a positive value, negative value, and 0 respectively, as expected.
Find the skew of the following teachers' annual salaries:
£3 £3.50 £4 £7 £100
Mean = £23.50 Median = £4 Standard Deviation = £38.28
Skew = 1.53
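The salary example can be verified with Python's `statistics` module. A minimal sketch, assuming the standard deviation on the slide is the population one (dividing by n), which is what `pstdev` computes; the function name is illustrative.

```python
from statistics import mean, median, pstdev

salaries = [3, 3.50, 4, 7, 100]


def skew_coefficient(values):
    """3(mean - median) / standard deviation, using the population SD."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


print("Mean =", mean(salaries))
print("Median =", median(salaries))
print("Standard deviation =", round(pstdev(salaries), 2))
print("Skew =", round(skew_coefficient(salaries), 2))
```

Using the sample standard deviation (`stdev`, dividing by n − 1) would give a different value, so it is worth checking which one a question intends.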
S1: Chapter 4 Revision!
Revision
Stem and leaf diagrams:
• Can you construct one, and write the appropriate key?
• Can you calculate mode, mean, median and quartiles?
• Can you assess skewness by using these above values?
Back-to-back stem and leaf diagrams:
• Can you construct one with appropriate key?
• Can you compare the data on each side?
1 | 4 (1)
2 | 1 2 4 5 (4)
3 | 2 5 6 6 6 7 7 8 8 (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8 (12)
5 | 0 1 1 2 (4)
Key: 2 | 1 means 2.1
Mode = 4.7
Lower quartile = 3.6
Upper quartile = 4.7
Median = 4.05
Type of skew:
Reason:
Revision
Girls | | Boys
0 | 4 | 6
8 5 4 | 5 | 0 7 9
6 4 0 | 6 | 0 0 4 5 7 8
9 8 6 2 | 7 | 1 2 4 5
8 5 4 0 0 0 | 8 | 0
8 6 2 2 1 0 | 9 | 1
Key: 0 | 4 | 6 means 40 for girls and 46 for boys.
Notice the values go outwards from the centre.
The data above show the pulse rates of boys and girls in a school.
Comment on the results.
Boys' pulse rates tend to be lower than girls'.
Revision
Histograms
Can you:
• Appreciate that the frequency density scale doesn't matter. This is why frequency is only proportional to area, and not equal to it.
• You often need to identify the scaling k. You might only be given the total frequency (in which case you need to find the total area of the histogram to find k). But if you know the frequency associated with a particular bar, just find the area of that single bar.
• If you don't care about the scaling, then let k = 1, so that area = frequency.
• Be incredibly careful about class widths (i.e. widths of boxes). If the class intervals in the frequency table have gaps, draw each bar between the class boundaries on the histogram and use that full width as the width of the box (e.g. 6 rather than 5).
• If you want to find the quartiles/median/mean, you need to first construct a grouped frequency table using the histogram.
• When asked to find the number of people with values in a certain range (e.g. with times between 10 and 15s) and it crosses multiple ranges/bars, it's easier to use the frequency table you've constructed from the histogram. Use linear interpolation where necessary.
Revision
Given that an outlier is a value more than 1.5 × IQR outside the lower and upper quartiles…
Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
0, 3 | 21, 27 | 8 | 10 | 14
[Box plot to be drawn on an axis from 0 to 30]
Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
3, 7 | 20, 25, 26 | 12 | 13 | 16
[Box plot to be drawn on an axis from 0 to 30]
Revision
Skewness
You can determine skewness in three ways:
• Comparing quartiles: When Q3 − Q2 > Q2 − Q1, the right box in the box plot is wider, so it's positive skew. If a box plot is drawn, it should be immediately obvious!
• Comparing mean/median: When mean > median, large values have dragged up the mean, so there's a tail in the positive direction, and thus the skew is positive.
• Looking at the shape of the distribution. If there's a "positive tail", the skew is positive.
When asked to justify your answer for skewness, you're expected to put something like either "Q3 − Q2 > Q2 − Q1" or "mean > median".
You will always be given a formula if you have to calculate a value for skew. But for all formulae, 0 means no skew (i.e. a "symmetric distribution"), >0 means positive skew and <0 means negative skew.
skew = 3(mean − median) ÷ standard deviation
Find the skew of the following teachers' annual salaries:
£3 £3.50 £4 £7 £100
Mean = £23.50 Median = £4 Standard Deviation = £38.28
Skew = 1.53