1A Types of Data
Data is information of some kind.
Working with categorical data
Frequency distribution tables
A frequency distribution table shows how many times a particular observation has occurred.
The frequency of any observation is the number of times that observation occurs and is given by the
height of its column in a bar chart.
The relative frequency of any observation is its frequency as a fraction of the total number of data
entries.
The percentage frequency is the relative frequency expressed as a percentage.
Data
Categorical: non-numerical data
  Nominal, e.g. favourite fruit
  ‐ Mangoes
  ‐ Apples
  ‐ Bananas
  Ordinal, e.g. opinion of the death sentence
  ‐ Strongly agree
  ‐ Agree
  ‐ Not sure
  ‐ Disagree
  ‐ Strongly disagree
Numerical: numerical data
  Discrete: whole-number responses
  e.g. number of children in a school ‐ 382
  Continuous: can have decimals or fractions within the answer
  e.g. height of class members ‐ 175.5 cm, 165.0 cm, 180.5 cm
Example 1
As part of a survey, a group of 30 teachers was asked to respond to the statement: ‘There is
essentially no difference between the reasoning patterns used by boys and girls’. The teachers
were asked to respond by writing T if they thought that the statement was true, F if they thought
that the statement was false and U if they were unsure. The results were collated as follows.
T F F F T F
T U T F T U
U F T F T T
T U U F T F
F F U T U T
(a) Summarise the results using a frequency distribution table.
(b) Represent the data by using a bar chart.
(c) Find the frequency of teachers who thought that the statement was true.
(d) Find the relative frequency of teachers who thought that the statement was true.
(e) Find the percentage frequency of teachers who thought the statement was true.
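The tallies for Example 1 can be checked with a short Python sketch. It is only an illustration of the definitions of frequency, relative frequency and percentage frequency given earlier; the variable names (`responses`, `frequency`, `total`) are assumptions made for this sketch and do not come from the notes.

```python
from collections import Counter

# Responses from Example 1, read row by row (T = true, F = false, U = unsure).
responses = (
    "T F F F T F "
    "T U T F T U "
    "U F T F T T "
    "T U U F T F "
    "F F U T U T"
).split()

frequency = Counter(responses)      # how many times each response occurs
total = sum(frequency.values())     # total number of data entries

print("Response  Frequency  Relative  Percentage")
for response in ("T", "F", "U"):
    f = frequency[response]
    relative = f / total            # frequency as a fraction of the total
    percentage = relative * 100     # relative frequency as a percentage
    print(f"{response:<9} {f:<10} {relative:<9.3f} {percentage:.1f}%")
```

With this data the sketch counts 12 T, 11 F and 7 U responses, so the relative frequency of T is 12/30 = 0.4 and the percentage frequency is 40%.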
Dot plot (line plot)
A dot plot can be used as an alternative to a frequency distribution table as a method of
summarising data.
The categories are written below a horizontal line, and dots are stacked in vertical columns
above the line, one dot for each observation in that category.
Example 2
A group of 20 students were asked their reading preference.
comic novel newspaper novel newspaper
magazine magazine newspaper novel other
magazine magazine magazine newspaper comic
novel other magazine newspaper newspaper
(a) Represent the data in a dot plot.
(b) What type of data is represented by the graph?
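As a rough illustration of part (a), the sketch below builds a text-only dot plot for the Example 2 data. The category order, the use of the letter `o` as the dot and the names `preferences`, `counts` and `rows` are all choices made for this sketch, not part of the notes.

```python
from collections import Counter

# Reading preferences from Example 2, read row by row.
preferences = (
    "comic novel newspaper novel newspaper "
    "magazine magazine newspaper novel other "
    "magazine magazine magazine newspaper comic "
    "novel other magazine newspaper newspaper"
).split()

counts = Counter(preferences)
categories = ["comic", "magazine", "newspaper", "novel", "other"]
width = max(len(c) for c in categories) + 2   # column width per category

# Build the plot from the top row down: a dot is drawn in a column
# whenever that category's frequency reaches the current row height.
rows = []
for height in range(max(counts.values()), 0, -1):
    row = "".join(
        ("o" if counts[c] >= height else "").center(width) for c in categories
    )
    rows.append(row.rstrip())

axis = "-" * (width * len(categories))                       # horizontal line
labels = "".join(c.center(width) for c in categories).rstrip()
print("\n".join(rows + [axis, labels]))
```

The columns hold 2, 6, 6, 4 and 2 dots for comic, magazine, newspaper, novel and other. Because the categories are names with no natural order, the data shown is categorical (nominal).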
1B Numerical data
Each observation or data point is known as a score.
Grouping data
Numerical data may be presented as either grouped or ungrouped.
Example: Ungrouped data: the number of cinema visits during the month by 20 students.
Number of visits 0 1 2 3 4
Frequency 6 7 4 2 1
When there is a large amount of data or if the data are spread over a wide range it is useful
to group the scores into groups or classes.
Example: Grouped data: number of passengers on each of 20 bus trips.
Number of passengers 5‐9 10‐14 15‐19 20‐24 25‐29
Frequency 1 6 8 4 1
When summarising raw data by grouping it in a frequency distribution table, the choice of
class size is important. As a general rule, choose a class size so that 5 to 10 groups are
formed.
Example 1: The number of nails in a sample of 40 nail boxes.
130 122 118 139 126 128 119 124 122 123
132 138 129 139 116 123 126 128 131 142
137 134 126 129 127 118 130 132 134 132
137 124 134 134 120 137 141 118 125 129
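One possible grouping of the nail data is sketched below. The class size of 5 and the starting value of 115 are assumptions chosen for this sketch (the notes do not state them); they give 6 classes, which fits the 5 to 10 guideline.

```python
# Number of nails in each of the 40 boxes (Example 1).
nails = [
    130, 122, 118, 139, 126, 128, 119, 124, 122, 123,
    132, 138, 129, 139, 116, 123, 126, 128, 131, 142,
    137, 134, 126, 129, 127, 118, 130, 132, 134, 132,
    137, 124, 134, 134, 120, 137, 141, 118, 125, 129,
]

class_size = 5      # assumed class size (not stated in the notes)
first_lower = 115   # assumed lower limit of the first class

# Count how many scores fall in each class: 115-119, 120-124, ...
grouped = {}
for score in nails:
    lower = first_lower + ((score - first_lower) // class_size) * class_size
    label = f"{lower}-{lower + class_size - 1}"
    grouped[label] = grouped.get(label, 0) + 1

print("Class      Frequency")
for label in sorted(grouped):
    print(f"{label:<10} {grouped[label]}")
```

Under these assumptions the class frequencies are 5, 7, 10, 10, 6 and 2, which sum to the 40 boxes sampled.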
Histograms and polygons
A histogram is similar to a bar chart but has the following essential features:
‐ Gaps are never left between the columns.
‐ If the chart is coloured or shaded, a single colour is used.
‐ Frequency is always plotted on the vertical axis.
‐ For ungrouped data the horizontal scale is marked so that the data labels appear
under the centre of each column. For grouped data the horizontal scale is marked so
that the end points of each class appear under the edges of each column.
‐ Usually the first column starts one column width from the vertical axis.
A polygon is a line graph which is drawn by joining the centres of the tops of each column of
the histogram. The polygon starts and finishes on the horizontal axis a half column space
from the group boundary of the first and last columns.
Describing the distribution of data
Normal distribution
The most common score is located at the centre.
Negatively skewed
The most common score is located to the right hand side of the data.
Positively skewed
The most common score is located to the left hand side of the data.
Bimodal data
There is more than one score that is most frequent.
Spread data
The data are spread over a wide range.
Clustered data
Most of the data are confined to a small range.
Example 3: The following data shows the number of siblings of each of the 30 students in a
particular class.
Number of siblings 0 1 2 3 4
Frequency 7 14 6 2 1
(a) Draw a histogram of the data.
(b) What is the frequency of the students with 2 siblings?
(c) What was the relative frequency of the students with 2 siblings?
(d) What was the percentage frequency of the students with 2 siblings?
Another method of drawing the histogram, using the CAS calculator:
Menu
Data
Summary Plot
X List – select “numsib” as the scale on the x‐axis
Summary List – select “freq” as the scale on the y‐axis
Display On – select New Page, then press OK
Example 4: The following data give the weights (in kg) of a sample of 25 Atlantic salmon.
(a) Represent the data on a frequency distribution table.
(b) Draw a histogram of the data.
(c) Add a polygon to the histogram.
(d) What word could you use to describe the pattern of the distribution of the data?
Choose one of the following: normal, positively skewed, negatively skewed, bimodal,
clustered or spread.
1C Cumulative data
Cumulative frequency
The cumulative frequency is the number of records less than or equal to a particular score. The cumulative frequency of a particular score is obtained by adding the frequency of that score to the sum of the frequencies of all preceding scores, i.e. the running total.
Height (cm)   Frequency   Cumulative frequency
170 ‐         3
175 ‐         6
180 ‐         12
185 ‐         10
190 ‐         8
195 ‐         1
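The running total described above can be sketched in Python for the height table. This is a minimal illustration; the names `class_lower_bounds`, `frequencies` and `cumulative` are assumptions made for the sketch.

```python
from itertools import accumulate

# Height classes: "170 -" is read as 170 cm up to (but not including) 175 cm.
class_lower_bounds = [170, 175, 180, 185, 190, 195]
frequencies = [3, 6, 12, 10, 8, 1]

# Each cumulative frequency is the frequency of that class
# plus the frequencies of all preceding classes.
cumulative = list(accumulate(frequencies))

print("Height (cm)  Frequency  Cumulative frequency")
for lower, f, cf in zip(class_lower_bounds, frequencies, cumulative):
    print(f"{str(lower) + ' -':<12} {f:<10} {cf}")
```

The cumulative frequencies come out as 3, 9, 21, 31, 39 and 40; the last value always equals the total number of records.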
Ogives
An ogive (cumulative frequency polygon) is a line graph of the cumulative frequency results. An ogive is appropriate only for displaying grouped data.
Percentiles
A percentile is the score below which a particular percentage of the distribution of data lies.
Example 5: Forty sample pieces of rope are tested in an effort to determine their breaking strain. The maximum load that could be attached to each was recorded.
(a) Add a cumulative frequency column to the table.
Breaking strain (kg)   Frequency   Cumulative frequency
40 ‐                   2
45 ‐                   6
50 ‐                   8
55 ‐                   10
60 ‐                   9
65 ‐                   4
70 ‐                   1
(b) Represent the data using an ogive.
(c) What number of sample pieces broke under a strain of less than 52 kg?
(d) Find the 75th percentile and write a sentence to explain what it means.
(e) The manufacturer of the rope wishes to label the rope with an appropriate breaking
strain. What should the rope be rated at if the manufacturer wants 90% of all ropes to
be at least as strong as the labelled rate?
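Reading values from an ogive amounts to linear interpolation between its plotted points, and the sketch below imitates that for Example 5. It assumes the last class ends at 75 kg and that scores are spread evenly within each class. The function names `count_below` and `percentile` are hypothetical helpers written for this sketch, and part (e) is read here as asking for the 10th percentile.

```python
from itertools import accumulate

lower_bounds = [40, 45, 50, 55, 60, 65, 70]   # "40 -" means 40 to under 45 kg
frequencies = [2, 6, 8, 10, 9, 4, 1]
class_width = 5                               # assumed from the class spacing

cumulative = list(accumulate(frequencies))    # part (a)

# Ogive points: cumulative frequency plotted at the upper end of each class,
# starting from zero at the lower end of the first class.
boundaries = lower_bounds + [lower_bounds[-1] + class_width]
ogive = list(zip(boundaries, [0] + cumulative))

def count_below(value):
    """Estimate how many scores lie below `value` (reading up from the x-axis)."""
    for (x0, c0), (x1, c1) in zip(ogive, ogive[1:]):
        if x0 <= value <= x1:
            return c0 + (value - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("value lies outside the range of the ogive")

def percentile(p):
    """Estimate the score below which p% of the data lie (reading across)."""
    target = p / 100 * cumulative[-1]
    for (x0, c0), (x1, c1) in zip(ogive, ogive[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must be between 0 and 100")

print(cumulative)                  # [2, 8, 16, 26, 35, 39, 40]
print(round(count_below(52), 1))   # part (c): about 11 pieces
print(round(percentile(75), 1))    # part (d): about 62 kg
print(round(percentile(10), 1))    # part (e): about 47 kg
```

Values read by eye from a hand-drawn ogive will differ slightly from these interpolated estimates. On this reading, 75% of the samples broke under a strain of less than about 62 kg, and a rating of roughly 46 to 47 kg would leave about 90% of ropes at least as strong as the label.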
1D Measures of central tendency
The mean, median and mode are three methods that allow us to obtain a score that is
typical or central to a set of data.
The mean
The mean is the average score in the set of data: the sum of all the scores divided by the number of scores.
The median
The median of a set of scores is the middle score when the data are arranged in ascending
order.
The median is the (n + 1)/2 th score, where n is the number of scores.
Example: 0, 1, 2, 3, 3, 4, 4, 4, 5, 5
The mode
The mode of a group of scores is the score that occurs most often.
Example 6: The following data give the number of hours spent on homework by 8 students.
2, 2, 3, 0, 1, 1, 5, 1
(a) Determine the mean of the data.
(b) Determine the median of the data.
(c) Determine the mode of the data.
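The three measures for Example 6 can be checked with Python's standard `statistics` module. The sketch also locates the median by hand using the (n + 1)/2 position rule; the variable names are assumptions made for illustration.

```python
import statistics

hours = [2, 2, 3, 0, 1, 1, 5, 1]    # Example 6 data

mean = statistics.mean(hours)       # sum of the scores / number of scores
median = statistics.median(hours)   # middle score of the ordered data
mode = statistics.mode(hours)       # score that occurs most often

# The median by hand: order the scores, then find the (n + 1)/2 th score.
ordered = sorted(hours)             # [0, 1, 1, 1, 2, 2, 3, 5]
n = len(ordered)
position = (n + 1) / 2              # 4.5, so average the 4th and 5th scores
if n % 2 == 1:
    median_by_hand = ordered[n // 2]
else:
    median_by_hand = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(mean, median, mode)           # 1.875 1.5 1
```

When n is even, as here, the (n + 1)/2 position falls halfway between two scores, so the median is the average of those two scores.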
Example 7: Ungrouped data
No. of visits 0 1 2 3 4
Frequency 6 7 4 2 1
(a) Determine the mean of the data.
(b) Determine the median of the data.
(c) Determine the mode of the data.
1st step: Redraw the table with two extra columns.
No. of visits (x)   Frequency (f)   f × x   Cumulative frequency (C.F.)
0
1
2
3
4
Total
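The two extra columns, and the measures that follow from them, can be sketched as below. This is one way to fill in the table, not necessarily the working intended in the notes; `score_at` is a hypothetical helper that uses the cumulative frequency column to find the score at a given position.

```python
from itertools import accumulate

visits = [0, 1, 2, 3, 4]        # x
frequency = [6, 7, 4, 2, 1]     # f

fx = [x * f for x, f in zip(visits, frequency)]   # f × x column
cumulative = list(accumulate(frequency))          # C.F. column
n = sum(frequency)                                # total frequency

mean = sum(fx) / n              # total of f × x divided by total of f

def score_at(position):
    """Return the score at a 1-based position in the ordered data."""
    for x, cf in zip(visits, cumulative):
        if position <= cf:
            return x
    raise ValueError("position is larger than the number of scores")

# Median: the (n + 1)/2 th score; for even n, average the two middle scores.
if n % 2 == 1:
    median = score_at((n + 1) // 2)
else:
    median = (score_at(n // 2) + score_at(n // 2 + 1)) / 2

# Mode: the score with the highest frequency.
mode = visits[frequency.index(max(frequency))]

print(fx, cumulative)           # [0, 7, 8, 6, 4] [6, 13, 17, 19, 20]
print(mean, median, mode)       # 1.25 1.0 1
```

Here the 10th and 11th scores both fall in the row with a cumulative frequency of 13, so the median is 1 visit.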
Example 8: Grouped data
The frequency table below shows the area (in m²) of 23 blocks in a suburban subdivision.