Transcript
Statistical Methods.
Why Statistics.
• Statistics is used to take the analysis of data one stage beyond what can be achieved with maps and diagrams.
• You can gain a primitive insight into patterns at a glance but mathematical manipulation usually gives greater precision.
• This allows us to discover things which might otherwise go unnoticed.
The need for justification.
• Justifying mathematical manipulation is vital.
• It is vital to be aware that statistics is an aid to analysis and no more.
• Too often students make statistical calculations in geographical projects without adequate justification.
• Before statistics is used it is essential to ask yourself two questions.
Question 1.
• Why am I using this technique?
• In the exam, be absolutely clear about what a statistical test can show and how it does this.
Question 2.
• Is the data appropriate to this particular technique?
• Each technique requires data to be arranged in a particular form.
• If it is not in that form, the technique cannot be used.
• If your data is not good in the first place, the use of a complex statistical technique will not help you.
“Rubbish in, rubbish out”
Mean, Mode, Median.
• To be used when faced with a large amount of data
• For example- average temperature of a place every day for two years.
• It makes things far easier when we can summarise it.
• This is relatively easy to do and there are three common methods to achieve this.
1- Mean
• What most people call the average is the mean.
• You find it by adding all the numbers together and then dividing by the total number of data values.
• The mean is shown by the symbol x̄ (x-bar).
• The mean is distorted if you have just one extreme value, which can be a problem.
• However, it is the most commonly used as it can be used for further mathematical processing.
Find the mean of these data values-
• 3, 4, 4, 4, 6, 6, 9.
36 ÷ 7 = 5.1
x̄ = 5.1
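The worked example above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the extreme value of 100 is a made-up figure added to show how a single outlier distorts the mean.

```python
# Mean of the slide's example values: add them up, then divide by the count.
values = [3, 4, 4, 4, 6, 6, 9]

total = sum(values)      # 36
count = len(values)      # 7
mean = total / count     # 5.142857...
print(round(mean, 1))    # 5.1

# A single extreme value (100 is a made-up figure) drags the mean upwards.
with_outlier = values + [100]
outlier_mean = sum(with_outlier) / len(with_outlier)
print(outlier_mean)      # 17.0
```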
2- The Mode.
• The mode is simply the most frequently occurring event.
• If we are using simple numbers then the mode is the most frequently occurring number.
• If we are looking at data on the nominal scale (grouped into categories) the mode is the most common category.
• The mode is very quick to calculate, but it cannot be used for further mathematical processing.
• It is not affected by extreme values.
Find the mode of this data set.
• 3, 4, 4, 4, 6, 9.
Mode (most frequently occurring number)= 4
Find the mode of this nominal data.
Land Use Hectares
Clover 10
Rye 12
Vegetables 15
Fruit 3
Wheat 29
Barley 18
Pasture 17
Mode (Most frequently occurring category)= wheat.
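Both mode examples can be reproduced in Python. This is a sketch using the standard library's `collections.Counter` for the simple numbers; for the land use table it assumes, as the slide does, that the hectares act as the frequency of each category. The variable names are illustrative.

```python
from collections import Counter

# Mode of simple numbers: the most frequently occurring value.
numbers = [3, 4, 4, 4, 6, 9]
mode_number, times_seen = Counter(numbers).most_common(1)[0]
print(mode_number, times_seen)   # 4 3

# Mode of nominal data: the category with the largest total.
land_use = {
    "Clover": 10, "Rye": 12, "Vegetables": 15, "Fruit": 3,
    "Wheat": 29, "Barley": 18, "Pasture": 17,
}
modal_category = max(land_use, key=land_use.get)
print(modal_category)            # Wheat
```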
3- The Median.
• The Median is the central value in a series of ranked values.
• If there is an even number of values, the median is the mid point between the two centrally placed values.
• The median is not affected by extreme values, but it cannot be used for further mathematical processing.
Find the median of this data set.
3, 4, 4, 4, 6, 9.
Median (central value)= 4.
Now find the median of this data set.
3, 4, 4, 6, 6, 9.
Median (central value)= 5
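The two median examples follow the same rule, which can be sketched in Python. The function name `median` is chosen here for illustration only; it ranks the values, then takes the middle one or the mid-point of the two middle ones.

```python
def median(values):
    """Central value of the ranked data (mid-point of the two central values if even)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

print(median([3, 4, 4, 4, 6, 9]))   # 4.0
print(median([3, 4, 4, 6, 6, 9]))   # 5.0
```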
Spread around the median and mean.
• The median, mean and mode all give us a summary value for a set of data.
• On their own, however, they give us no idea of the spread of data around the summary value, which can be misleading.
• For example…
• I collected the following rainfall data.
• The mean for this data is 20mm.
• But that gives an untrue picture of what really happened.
• There is a great “deviation about the mean”.
• Deviation can be measured statistically as follows.
Year Rainfall (mm)
1990 0
1991 0
1992 3
1993 0
1994 97
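A short Python sketch makes the problem with the rainfall mean visible. It lists how far each year sits from the mean of 20mm; the variable names are illustrative.

```python
# Rainfall (mm) by year, taken from the table above.
rainfall = {1990: 0, 1991: 0, 1992: 3, 1993: 0, 1994: 97}

values = list(rainfall.values())
mean = sum(values) / len(values)   # 20.0

# Deviation of each year from the mean: four dry years and one very wet one.
deviations = [mm - mean for mm in values]
for year, deviation in zip(rainfall, deviations):
    print(year, deviation)
```

No single year is close to 20mm, which is why a measure of spread is needed alongside the mean.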
Spread around the median: the interquartile range.
• The Interquartile range is a measure of the spread of the values around their median.
• The greater the spread the higher the interquartile range.
Method.
• Stage 1- Place the values in rank order, smallest to largest.
• Stage 2- Find the upper quartile. This is found by taking the 25% highest values and finding the mid-point between the lowest of these and the next lowest number.
• Stage 3- Find the lower quartile. This is obtained by taking the 25% lowest values and finding the mid-point between the highest of these and the next highest value.
• Stage 4- Find the difference between the upper and lower quartiles. This is the interquartile range, a crude index of the spread of the values around the median.
• The higher the range the greater the spread.
Over to you.
• Copy out the data on the next slide.
• Then find the interquartile range, remembering to follow all four stages.
Month Average temperature
January 4
February 5
March 7
April 9
May 12
June 15
July 17
August 17
September 15
October 11
November 7
December 5
Answer
• Ranked, the data looks like this:
4 5 5 7 7 9 11 12 15 15 17 17
Lower quartile = 6, Median = 10, Upper quartile = 15
Interquartile range: (15 - 6) = 9.
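The four stages can be sketched in Python to check the answer. The function name `quartiles_slide_method` is hypothetical, and the sketch assumes the number of values divides exactly by four, as it does for the twelve months here. Other quartile conventions exist, so a statistics library may return slightly different quartiles for the same data.

```python
def quartiles_slide_method(values):
    """Lower and upper quartile using the four-stage method from the slides.

    Assumes len(values) is a multiple of 4 so each 25% group is a whole
    number of values.
    """
    ranked = sorted(values)                    # Stage 1: rank the values
    k = len(ranked) // 4                       # size of each 25% group
    upper = (ranked[-k] + ranked[-k - 1]) / 2  # Stage 2: upper quartile
    lower = (ranked[k - 1] + ranked[k]) / 2    # Stage 3: lower quartile
    return lower, upper

temperatures = [4, 5, 7, 9, 12, 15, 17, 17, 15, 11, 7, 5]
lower, upper = quartiles_slide_method(temperatures)
iqr = upper - lower                            # Stage 4: the difference
print(lower, upper, iqr)                       # 6.0 15.0 9.0
```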
Spread about the mean: Standard deviation.
• If we want to obtain some measure of the spread of our data about its mean we calculate its standard deviation.
• Two sets of figures can have the same mean but very different standard deviations.
Method.
• Stage 1- Tabulate the values (x) and their squares (x²). Add these values (∑x and ∑x²).
• Stage 2- Find the mean of all the values of x (x̄) and square it (x̄²).
• Stage 3- Calculate the formula
σ = √( ∑x² / n − x̄² )
σ = the standard deviation.
√ = the square root of.
∑ = the sum of.
n = the number of values.
x̄ = the mean of the values.
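The three stages translate directly into a small Python function. This is a sketch: the name `standard_deviation` and the two sample data sets are made up for illustration. They are chosen to have the same mean (10) but very different spreads.

```python
from math import sqrt

def standard_deviation(values):
    """Square root of (sum of x squared / n, minus the mean squared)."""
    n = len(values)
    sum_x = sum(values)                   # Stage 1: sum of x
    sum_x2 = sum(x * x for x in values)   # Stage 1: sum of x squared
    mean = sum_x / n                      # Stage 2: the mean
    return sqrt(sum_x2 / n - mean ** 2)   # Stage 3: the formula

# Two made-up data sets with the same mean of 10.
tight = [9, 10, 10, 11]
wide = [0, 5, 15, 20]

print(round(standard_deviation(tight), 2))   # 0.71
print(round(standard_deviation(wide), 2))    # 7.91
```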
Over to you.
• Number of vehicles passing a traffic count point.
• Calculate the standard deviation of the following data.
Day Number of vehicles.
1 50
2 75
3 80
4 92
5 60
6 70
7 63
8 42
9 75
10 82
Answer.
x x²
50 2 500
75 5 625
80 6 400
92 8 464
60 3 600
70 4 900
63 3 969
42 1 764
75 5 625
82 6 724
Answer
• ∑x = 689
• ∑x² = 49 571
• x̄ = 689 divided by 10 = 68.9
• x̄² = (68.9)² = 4747.2
• σ = √( ∑x² / n − x̄² ) = √( 49 571 / 10 − 4747.2 ) = √209.9 = 14.5
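The traffic count answer can be checked in Python, stage by stage. As a cross-check, the sketch also calls `statistics.pstdev` from the standard library, which computes the population standard deviation and so should agree with the formula used here. The variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

# Number of vehicles on each of the ten days.
vehicles = [50, 75, 80, 92, 60, 70, 63, 42, 75, 82]

n = len(vehicles)
sum_x = sum(vehicles)                    # 689
sum_x2 = sum(x * x for x in vehicles)    # 49571
mean = sum_x / n                         # 68.9
sigma = sqrt(sum_x2 / n - mean ** 2)

print(round(sigma, 1))                   # 14.5
print(round(pstdev(vehicles), 1))        # 14.5
```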
Phew!!!!!!
• The higher the standard deviation, the greater the spread of data around the mean.
• The standard deviation is the best of the measures of spread as it takes into account all of the values under consideration.
Homework.
• Research the following tests of significance to find out their meaning.
1. The Mann-Whitney U test.
2. The Chi-Squared (χ²) test.