The Institute of Chartered Accountants of Sri Lanka
Postgraduate Diploma in Business Finance and Strategy
Quantitative Methods for Business Studies
Handout 02: Presentation and Analysis of data
Tables and Charts for Categorical Data
One way tabulation
A one-way table has two columns. One column lists the categories, and the other gives the frequencies or percentages with which items in each category occur.
Example:
Two way tabulation
When the data are tabulated according to two characteristics at a time, it is said to be double tabulation or two-way tabulation.
Example:
Complex Tabulation
When the data are tabulated according to more than two characteristics at a time, it is said to be complex tabulation.
Example
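Tabulating categorical data is a matter of counting. The following is a minimal Python sketch (the function names and the survey responses are hypothetical, made up purely for illustration): a one-way table counts a single characteristic, while a two-way table cross-counts pairs of characteristics.

```python
from collections import Counter


def one_way(values):
    """One-way table: {category: (frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, 100 * f / total) for cat, f in counts.items()}


def two_way(pairs):
    """Two-way table: {row category: Counter of column categories}."""
    table = {}
    for row, col in pairs:
        table.setdefault(row, Counter())[col] += 1
    return table


# Hypothetical (gender, response) pairs, for illustration only
responses = [("Male", "Yes"), ("Female", "No"), ("Female", "Yes"), ("Male", "Yes")]
print(one_way([gender for gender, _ in responses]))
print(two_way(responses))
```

A complex tabulation follows the same idea, with each key built from three or more characteristics (for example, counting tuples with a `Counter`).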
Pie Charts
A pie chart is a circle divided into wedges of varying sizes, like slices of a pie or pizza. The relative sizes of the wedges correspond to the relative frequencies of the categories.
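Because wedge size is proportional to relative frequency, each wedge's angle is its relative frequency multiplied by 360 degrees. Below is a small Python sketch (the function name is hypothetical), applied to the five slice percentages in the holiday-spending chart in this section.

```python
def pie_angles(frequencies):
    """Convert frequencies (or percentages) to wedge angles in degrees."""
    total = sum(frequencies)
    return [360 * f / total for f in frequencies]


angles = pie_angles([45, 38, 5, 5, 7])
print([round(a, 1) for a in angles])   # [162.0, 136.8, 18.0, 18.0, 25.2]
```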
Simple bar chart
In a bar chart, each category of data is depicted by a bar, the height of which represents the frequency or percentage of observations falling into that category.
Multiple bar chart
In a multiple bar chart, two or more sets of interrelated data are represented side by side, facilitating comparison between more than one phenomenon.
Stacked bar chart
In a stacked bar chart, data series are stacked one on top of the other in vertical columns.
[Figure: pie chart titled "How Do You Spend the Holidays?", with slices for At home with family, Travel to visit family, Vacation, Catching up on work and Other, labelled 45%, 38%, 5%, 5% and 7%.]
Pictogram
A pictogram is a statistical graphic in which the size of the picture is intended to represent the frequency or size of the values being shown.
Organizing Numerical Data
As the number of observations gets large, it becomes more and more difficult to focus on the major features in a set of data. We need ways to organize the observations so that we can better understand what information the data convey. Large sets of data are therefore presented in groups to make them easier to present and interpret.
Presenting Numerical data in tables and graphs
Frequency Distribution
A frequency distribution is a table in which the data is arranged into conveniently established,
numerically ordered class groupings or categories.
Frequency Distribution for Ungrouped Data
When a variable takes only a small number of distinct values, an ungrouped frequency distribution is appropriate.
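An ungrouped frequency distribution simply counts how often each distinct value occurs. Here is a minimal Python sketch (function name hypothetical), applied to the children-per-family data from the exercise that follows.

```python
from collections import Counter


def ungrouped_frequency(data):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(data).items())


children = [0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3]
for value, freq in ungrouped_frequency(children):
    print(value, freq)
```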
Exercise
Twenty families were surveyed to find how many children they had. The data obtained were as follows.
0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3
Construct a frequency distribution.
Frequency Distribution for Grouped Data
Sometimes it is impractical to prepare a frequency distribution table from ungrouped data, particularly when there is a large number of observations. In such cases it is better to collect the observations into groups or classes with clearly defined upper and lower limits. The following steps are used to construct a frequency distribution for a grouped set of data.
1. Find the range of the distribution (i.e. the difference between the highest and lowest values).
2. Select class intervals of a convenient size. For most distributions, about 6 to 12 classes will be sufficient, and interval widths of 5, 10, 20 and so on are usually suitable.
3. Mark the number of values falling within each class interval using tally marks and construct the frequency distribution table.
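The three steps above can be sketched in Python as follows. This is a minimal sketch that assumes whole-number data; the function names are hypothetical. For the Moore Travel ages (given in an exercise later in this handout), a class width of 10 is assumed as one convenient choice: the range is 84 - 18 = 66, and 7 classes of width 10 starting at 15 cover 15 to 84.

```python
def data_range(data):
    """Step 1: difference between the highest and lowest values."""
    return max(data) - min(data)


def grouped_frequency(data, first_lower, width, num_classes):
    """Steps 2 and 3: tally whole-number data into equal-width classes.

    Returns (lower limit, upper limit, frequency) tuples. With whole
    numbers, a class starting at L with width w ends at L + w - 1.
    """
    table = []
    for j in range(num_classes):
        lower = first_lower + j * width
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
    # Every observation should land in exactly one class
    if sum(f for _, _, f in table) != len(data):
        raise ValueError("some values fall outside the chosen classes")
    return table


# Moore Travel cruise ages from the exercise later in this handout
ages = [77, 18, 63, 84, 38, 54, 50, 59, 54, 56,
        36, 26, 50, 34, 44, 41, 58, 58, 53, 51,
        62, 43, 52, 53, 63, 62, 62, 65, 61, 52,
        60, 60, 45, 66, 83, 71, 63, 58, 61, 71]
print(data_range(ages))
for lower, upper, freq in grouped_frequency(ages, first_lower=15, width=10, num_classes=7):
    print(f"{lower}-{upper}: {freq}")
```

The final check raises an error if the chosen classes miss any observation, which is a quick way to catch a class width that is too small.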
Important definitions relating to frequency distribution.
Lower class limit
These are the smallest numbers that can actually belong to the different classes.
Upper class limit
These are the largest numbers that can actually belong to the different classes.
Class width
This is the difference between two consecutive lower class limits (or two consecutive upper class limits).
Cumulative Frequency Distribution
The cumulative frequency of a class is the sum of frequencies for that class and all previous classes.
Relative Frequency Distribution
The relative frequency of a class can be found by dividing the class frequency by the total of all
frequencies. This can also be given as a percentage.
Relative frequency = class frequency / total frequency
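The relative, cumulative and cumulative percentage columns can each be computed in a few lines. Below is a minimal Python sketch (function name hypothetical). The frequencies come from grouping the Jerry's salon haircut times (given in the exercise below) into classes 15-19, 20-24, ..., 50-54. That grouping is one possible choice of classes, not the only correct one.

```python
from itertools import accumulate


def relative_and_cumulative(freqs):
    """Return (relative %, cumulative frequency, cumulative %) lists."""
    total = sum(freqs)
    relative = [100 * f / total for f in freqs]
    cumulative = list(accumulate(freqs))        # running totals
    cumulative_pct = [100 * c / total for c in cumulative]
    return relative, cumulative, cumulative_pct


freqs = [2, 3, 8, 8, 14, 4, 6, 5]
rel, cum, cum_pct = relative_and_cumulative(freqs)
print(rel)      # [4.0, 6.0, 16.0, 16.0, 28.0, 8.0, 12.0, 10.0]
print(cum)      # [2, 5, 13, 21, 35, 39, 45, 50]
print(cum_pct)  # [4.0, 10.0, 26.0, 42.0, 70.0, 78.0, 90.0, 100.0]
```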
Frequency Histogram
A histogram is a column or bar graph of a frequency distribution, with the class intervals on the horizontal axis and the frequencies on the vertical axis. Unlike a bar chart, it has no gaps between adjacent bars.
Frequency Polygon
A frequency polygon is a line graph drawn by joining the midpoints of the tops of the rectangles in a histogram with straight lines. To construct a frequency polygon, plot the frequencies against the midpoints of the class intervals and join the points with straight lines.
Ogive or Cumulative Frequency Graph
An ogive is the graphical presentation of a cumulative frequency distribution. Ogives are used when it is more useful to determine the number (or proportion) of data items that fall above or below a particular value rather than within a given interval. There are two types of ogives: “less than” and “greater than” ogives. The following steps are used to construct a less than ogive.
1. Compute the cumulative frequencies of the distribution in ascending order.
2. Prepare a graph with the cumulative frequency on the vertical axis and the class intervals on the horizontal axis.
3. After plotting the first point (a cumulative frequency of zero at the lower limit of the first class), plot each cumulative frequency against the upper limit of its class.
4. Join all the points with straight lines.
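The coordinates for a frequency polygon and a less than ogive can be computed as follows. This Python sketch only produces the points; plotting them is left to any charting tool. Several assumptions apply. The classes are given by their real class boundaries, the ogive starts at a cumulative frequency of zero at the lower boundary of the first class, and the function names are hypothetical. The frequencies are the Jerry's salon times grouped into classes of width 5, which is one possible grouping.

```python
from itertools import accumulate


def polygon_points(boundaries, freqs):
    """(class midpoint, frequency) pairs for a frequency polygon."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    return list(zip(mids, freqs))


def less_than_ogive_points(boundaries, freqs):
    """(boundary, cumulative frequency) pairs, starting from zero."""
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))


boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [2, 3, 8, 8, 14, 4, 6, 5]
print(polygon_points(boundaries, freqs))
print(less_than_ogive_points(boundaries, freqs))
```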
Exercise
1. The manager of Jerry’s salon recently asked his last 50 customers to punch a time card when they first arrived at the shop and to punch out right after they paid for their haircut. He then used the data on the cards to measure how long it took Jerry and his hairdressers to cut hair, in order to schedule their appointment intervals. The following data were tabulated.
50 21 36 35 35 27 38 51 28 35
32 32 27 25 24 38 43 46 29 45
39 27 36 38 35 31 28 38 33 46
35 31 38 48 23 25 43 31 32 38
43 32 18 43 52 52 49 53 46 19
i. Form the frequency distribution and percentage distribution.
ii. Plot the histogram
iii. Plot the frequency polygon.
iv. Form the cumulative percentage distribution.
v. Plot the cumulative percentage polygon.
vi. On the basis of the results of (i) to (v), comment on the time gap to be kept between two consecutive appointments for haircuts.
2. Moore Travel, a nationwide travel agency, offers special rates on certain Caribbean cruises to
senior citizens. The president of Moore Travel wants additional information on the ages of
those people taking cruises. A random sample of 40 customers taking a cruise last year revealed
these ages:
77 18 63 84 38 54 50 59 54 56
36 26 50 34 44 41 58 58 53 51
62 43 52 53 63 62 62 65 61 52
60 60 45 66 83 71 63 58 61 71
i. Organize the data into a frequency distribution, using 7 classes and 15 as the
lower limit of the first class.
ii. Where do the data tend to cluster?
iii. Determine the relative frequency distribution.
iv. Describe the distribution.
The Stem and Leaf Display
A stem-and-leaf diagram, also called a stem-and-leaf plot, is a diagram that quickly summarizes data while maintaining the individual data points. In such a diagram, the "stem" is a column of the distinct values that remain after the last digit of each data value is removed. The final digits ("leaves") are then placed in a row next to the appropriate stem and sorted in numerical order.
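A stem-and-leaf display can be built by splitting each value into its last digit (the leaf) and the remaining digits (the stem). Here is a minimal Python sketch, assuming non-negative whole numbers and using the ATM data from the exercise below. The function names are hypothetical.

```python
def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves."""
    display = {}
    for x in sorted(data):               # sorting first keeps leaves in order
        stem, leaf = divmod(x, 10)
        display.setdefault(stem, []).append(leaf)
    return display


def print_display(display):
    """Print one row per stem, including any stems with no leaves."""
    for stem in range(min(display), max(display) + 1):
        leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
        print(f"{stem:>3} | {leaves}")


atm = [83, 64, 84, 76, 84, 54, 75, 59, 70, 61,
       63, 80, 84, 73, 68, 52, 65, 90, 52, 77,
       95, 36, 78, 61, 59, 84, 47, 87, 60, 66]
print_display(stem_and_leaf(atm))
```

The same split also works for three-digit values such as 118, which gives stem 11 and leaf 8, as in the cholesterol exercise.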
Exercise
1. A bank wants to find the number of times a particular automated teller machine (ATM) is used each day. The following is the number of times it was used during each of the last 30 days. Develop a stem-and-leaf display and summarize the data on the number of times the machine was used.
83 64 84 76 84 54 75 59 70 61
63 80 84 73 68 52 65 90 52 77
95 36 78 61 59 84 47 87 60 66
How many times was the ATM used on a typical day?
What were the largest and the smallest number of times the ATM was used?
Around what values did the number of times the ATM was used tend to cluster?
2. The back-to-back stem-and-leaf plot below shows the LDL cholesterol levels (in milligrams per deciliter, mg/dL) of two groups of people, smokers and non-smokers. The digits in the stem represent the hundreds and tens, and the digit in the leaf represents the ones. For example, 11|8 = 118.
i. People with a cholesterol level of 129 or less are said to have a near ideal level of
cholesterol. How many people, in each group, have a near ideal level of cholesterol?
ii. People with a cholesterol level between 130 and 159 inclusive are said to be borderline high. How many people, in each group, are borderline high?
iii. People with a cholesterol level between 160 and 189 inclusive are said to have a high
level of cholesterol. How many people, in each group, have a high level of cholesterol?
iv. People with a cholesterol level of 190 or above are said to have a very high level of
cholesterol. How many people, in each group, have a very high level of cholesterol?
v. Comparing the two groups, which group has more people with a higher level of
cholesterol?
Measures of Central Tendency and Dispersion
Measures of Central Tendency
A measure of central tendency is a value at the center or middle of a data set. It condenses the mass of data into one single value that gives an idea of the entire set of data. It also enables comparison of two or more sets of data.
Mean
The mean is computed by dividing the sum of the values of each and every observation by the
total number of observations.
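Mean for population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Mean for sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Where,
Σ = symbol representing summation,
X_i = the i-th value in the data set,
n = number of values in the sample,
N = number of values in the population.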
When the mean is calculated for grouped data, the midpoint of each class interval is taken to represent the values within that class. The mean is given as
$\bar{x} = \frac{\sum f x}{\sum f}$
Where, f = the frequency of each class,
x = the midpoint of each class.
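The grouped-mean formula translates directly into code. The following Python sketch is minimal and assumes the classes are given by their real class boundaries; the function name is hypothetical. The frequencies are the Jerry's salon haircut times grouped into classes of width 5 starting at 15, with real limits 14.5 to 54.5. This is one possible grouping.

```python
def grouped_mean(boundaries, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [2, 3, 8, 8, 14, 4, 6, 5]
print(grouped_mean(boundaries, freqs))   # 36.0
```

For comparison, the mean of the 50 raw times is 1799/50 = 35.98. Using class midpoints therefore introduces only a small error here.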
Median
The median is the middle value of a set of values arranged in order of size.
When the data are arranged in ascending or descending order,
Median = the [(n+1)/2]th value.
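The positional rule can be sketched in Python as shown below. When n is even, the position (n+1)/2 falls halfway between two ranks. This sketch assumes the usual convention of taking the value halfway between the two middle values. The helper names are hypothetical.

```python
def value_at_position(sorted_data, pos):
    """Value at a 1-based, possibly fractional, position in sorted data.

    A fractional position is resolved by linear interpolation between
    the two neighboring values (position 2.5 -> halfway between the
    2nd and 3rd values).
    """
    k = int(pos)
    frac = pos - k
    if k < 1:
        return sorted_data[0]
    if k >= len(sorted_data):
        return sorted_data[-1]
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + frac * (upper - lower)


def median(data):
    s = sorted(data)
    return value_at_position(s, (len(s) + 1) / 2)


print(median([3, 1, 2]))      # 2.0
print(median([4, 1, 3, 2]))   # 2.5
```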
The median for a grouped set of data is the (n/2)th term. To calculate the median for grouped data, the cumulative frequency distribution has to be constructed first, and then the following formula can be applied.
$\text{Median} = L + \frac{n/2 - C}{f} \times i$
Where,
L = real lower limit of the median class,
n = number of items of data,
C = cumulative frequency of the class prior to the median class,
f = frequency of the median class,
i = class interval width.
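Here is a minimal Python sketch of the grouped median formula. It assumes the classes are given by their real class boundaries, and the function name is hypothetical. The frequencies are the Jerry's salon times grouped into classes of width 5 starting at 15, which is one possible grouping.

```python
from itertools import accumulate


def grouped_median(boundaries, freqs):
    """Median = L + (n/2 - C) / f * i for grouped data."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency reaches n/2
    j = next(k for k, cf in enumerate(cumulative) if cf >= n / 2)
    L = boundaries[j]                        # real lower limit
    C = cumulative[j - 1] if j > 0 else 0    # cumulative frequency before it
    f = freqs[j]                             # frequency of the median class
    i = boundaries[j + 1] - boundaries[j]    # class width
    return L + (n / 2 - C) / f * i


boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [2, 3, 8, 8, 14, 4, 6, 5]
print(round(grouped_median(boundaries, freqs), 2))   # 35.93
```

In this case n/2 = 25, so the median class is 34.5 to 39.5, with C = 21 and f = 14.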
Mode
The mode is the most frequently occurring value in an ungrouped set of data.
For grouped data, the mode can be estimated using the following formula.
$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times i$
Where,
L = real lower limit of the modal class,
d1 = difference in frequencies between the modal class and the preceding class,
d2 = difference in frequencies between the modal class and the following class,
i = class interval width.
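A Python sketch of the grouped mode formula follows, and it rests on two assumptions not stated in this handout. First, it takes the first class with the highest frequency as the modal class. Second, it treats a missing neighboring class (when the modal class is the first or last) as having frequency zero. The function name is hypothetical, and the frequencies are again the Jerry's salon times grouped into classes of width 5.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + d1 / (d1 + d2) * i for grouped data."""
    j = max(range(len(freqs)), key=lambda k: freqs[k])  # first modal class if tied
    f_prev = freqs[j - 1] if j > 0 else 0               # assumed 0 if no class
    f_next = freqs[j + 1] if j + 1 < len(freqs) else 0  # assumed 0 if no class
    d1 = freqs[j] - f_prev
    d2 = freqs[j] - f_next
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class not higher than neighbors")
    L = boundaries[j]                        # real lower limit of modal class
    i = boundaries[j + 1] - boundaries[j]    # class width
    return L + d1 / (d1 + d2) * i


boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [2, 3, 8, 8, 14, 4, 6, 5]
print(grouped_mode(boundaries, freqs))   # 36.375
```

Here the modal class is 34.5 to 39.5 (frequency 14), so d1 = 14 - 8 = 6 and d2 = 14 - 4 = 10.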
Exercises
1. Refer to the haircutting data for Jerry’s salon given earlier. Compute the mean, median and mode.
The weighted mean
The weighted mean is used in situations where scores vary in their degree of importance. In such situations, different weights are attached to different scores. A weight is a value indicating how much a score counts. Given a list of scores x1, x2, …, xn and a corresponding list of weights w1, w2, …, wn, the weighted mean is obtained by the formula
$\text{Weighted mean} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
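The formula can be sketched in Python as follows. The function name is hypothetical. Because the result is divided by the sum of the weights, the weights can be given as percentages and need not add up to 1. The usage line applies it to the course-mark figures from the exercise below.

```python
def weighted_mean(scores, weights):
    """Sum of w * x divided by the sum of the weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * x for w, x in zip(weights, scores)) / sum(weights)


print(weighted_mean([80, 70, 60], [10, 30, 60]))   # 65.0
```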
Exercise
The final score of a course is computed as a weighted mean with weights of 10% for the mid-term test, 30% for the assignment and 60% for the written exam. A student scored 80 marks for the mid-term test, 70 for the assignment and 60 for the written exam. Find the final mark obtained for the course.
Measures of Non-Central Tendency
Quartiles
Quartiles are employed particularly when summarizing or describing the properties of large sets of numerical data. The quartiles are descriptive measures that split the ordered data into four quarters.
In an ungrouped set of scores, Q1 = the [(n+1)/4]th value and Q3 = the [3(n+1)/4]th value.
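The quartile positions can be computed with the same positional idea as the median. This Python sketch assumes that a fractional position such as 2.25 is resolved by linear interpolation between the neighboring ranked values. Textbooks and software differ on this convention, so treat it as one reasonable choice rather than the definitive method. The helper names are hypothetical.

```python
def value_at_position(sorted_data, pos):
    """Value at a 1-based, possibly fractional, position (interpolated)."""
    k = int(pos)
    frac = pos - k
    if k < 1:
        return sorted_data[0]
    if k >= len(sorted_data):
        return sorted_data[-1]
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + frac * (upper - lower)


def quartiles(data):
    """(Q1, Q3) at positions (n+1)/4 and 3(n+1)/4 of the ordered data."""
    s = sorted(data)
    n = len(s)
    return value_at_position(s, (n + 1) / 4), value_at_position(s, 3 * (n + 1) / 4)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.25, 6.75)
```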
Quartiles for grouped data (from which the interquartile range, IQR = Q3 - Q1, can be found):
Q1 (lower quartile) = $L + \frac{n/4 - C}{f_{Q1}} \times i$
Q2 (median) = $L + \frac{n/2 - C}{f_{Q2}} \times i$
Q3 (upper quartile) = $L + \frac{3n/4 - C}{f_{Q3}} \times i$
Where,
L = real lower limit of the quartile class,
n = total number of observations in the entire data set,
C = cumulative frequency of the class immediately before the quartile class,
fQ1, fQ2, fQ3 = frequency of the relevant quartile class,
i = the length of the real class interval of the relevant quartile class.
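All three grouped-data formulas share one pattern, with target position qn/4 for q = 1, 2, 3. The following Python sketch is minimal, and its function name is hypothetical. It uses the Jerry's salon frequencies grouped into classes of width 5 with real limits 14.5 to 54.5, which is one possible grouping.

```python
from itertools import accumulate


def grouped_quartile(boundaries, freqs, q):
    """q-th quartile (q = 1, 2 or 3): L + (q*n/4 - C) / f * i."""
    if q not in (1, 2, 3):
        raise ValueError("q must be 1, 2 or 3")
    n = sum(freqs)
    target = q * n / 4
    cumulative = list(accumulate(freqs))
    # Quartile class: first class whose cumulative frequency reaches the target
    j = next(k for k, cf in enumerate(cumulative) if cf >= target)
    L = boundaries[j]
    C = cumulative[j - 1] if j > 0 else 0
    f = freqs[j]
    i = boundaries[j + 1] - boundaries[j]
    return L + (target - C) / f * i


boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [2, 3, 8, 8, 14, 4, 6, 5]
q1 = grouped_quartile(boundaries, freqs, 1)
q3 = grouped_quartile(boundaries, freqs, 3)
print(q1, q3, q3 - q1)   # 29.1875 42.625 13.4375
```

With q = 2 the function reproduces the grouped median formula given earlier.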
Exercises
1. A manufacturer of flashlight batteries took a sample of 13 batteries from a day’s production
and used them continuously until they were drained. The number of hours they were used