Lecture 2: Frequency Distribution and Graphical Representation

Donglei Du ([email protected])

Faculty of Business Administration, University of New Brunswick, Fredericton, NB, Canada E3B 9Y2

Donglei Du (UNB) ADM 2623: Business Statistics


Mar 06, 2018


Table of contents

1 Quantitative data: table/graphic representation
    Table representation for quantitative data: Frequency Distribution Table
    Graphical representation for quantitative data: histogram and polygon
    Stem-and-leaf display for small/medium sized quantitative data

2 Qualitative data: table/graphic representation
    Graphical representation for qualitative data: bar chart and pie chart


Frequency Distribution

A Frequency Distribution is a grouping of data into mutually exclusive and exhaustive classes showing the number of observations in each class.


An example

Example: Consider the guessed weights (lbm) collected in our first class on Sept. 5, 2013 from 62 students (the e-version of this data will be available online on my website).

140 135 140 160 175 150 152 155 155 165 145 150 154 160 143
160 170 155 140 160 160 175 140 145 150 150 152 159 160 165
145 155 150 150 165 148 152 155 155 160 172 180 141 147 155
165 170 160 140 150 150 152 155 130 155 163 170 139 165 180
180 190

Problem: Let us organize it into a frequency distribution table.


Five-step procedure to construct a frequency distribution

Step 1. Decide how many classes you wish to use.
Step 2. Determine the class width.
Step 3. Set up the individual class limits.
Step 4. Tally the items into the classes.
Step 5. Count the number of items in each class.


Step 1. Decide how many classes you wish to use

Rule of Thumb: Use the 2 to the kth rule.

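Suppose there are n points in the data: Choose k to be the smallest integer such that 2 raised to the power of k is greater than or equal to n; namely

$$k \ge \log_2 n.$$

For this example, n = 62, so k = 6 because

$$2^6 = 64 \ge 62;$$

or, equivalently, $\log_2 62 \approx 5.954196$, which rounds up to 6.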
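As a quick check of the 2 to the kth rule, the following minimal R sketch computes k for the weight example. The variable names n and k follow the slide; using ceiling() to pick the smallest such integer is an assumption about how the rule is applied.

```r
# Number of observations in the weight example
n <- 62

# Smallest integer k such that 2^k >= n
k <- ceiling(log2(n))

print(log2(n))   # about 5.954196
print(k)         # 6
print(2^k >= n)  # TRUE
```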


Step 2. Determine the class width

Generally, the class width should be the same size for all classes.

$$C = \frac{\max - \min}{k}$$

For this example,

$$C = \frac{190 - 130}{6} = 10.$$


Step 3. Set up the individual class limits

We only need to know the lower limit of the first class L.

$$L = \min - \frac{C \cdot k - (\max - \min)}{2}.$$

For this example,

$$L = 130 - \frac{10 \cdot 6 - (190 - 130)}{2} = 130.$$
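Steps 2 and 3 can be sketched together in R. The names x_min, x_max and breaks are illustrative, and rounding the class width up with ceiling() is an assumption for data where the range does not divide evenly by k (in this example it makes no difference).

```r
# Minimum, maximum and number of classes from the weight example
x_min <- 130
x_max <- 190
k <- 6

# Step 2: class width (rounded up; an assumption for non-integer widths)
C <- ceiling((x_max - x_min) / k)

# Step 3: lower limit of the first class; any slack C*k - range is split evenly
L <- x_min - (C * k - (x_max - x_min)) / 2

# The k + 1 class boundaries that follow from L and C
breaks <- seq(L, by = C, length.out = k + 1)

print(C)       # 10
print(L)       # 130
print(breaks)  # 130 140 150 160 170 180 190
```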


Frequency Distribution Table for the weight example

Solution: Tallying and counting in Steps 4 and 5 result in the following frequency distribution table.

class        frequency
[130,140)    3
[140,150)    12
[150,160)    23
[160,170)    14
[170,180)    6
[180,190]    4
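The tally and count in Steps 4 and 5 can be reproduced with a short R sketch. The data are typed in directly from the example (the vector name weights is illustrative), and include.lowest = TRUE is used so that the last class is closed on the right and the maximum value 190 is counted.

```r
# The 62 guessed weights from the example
weights <- c(
  140, 135, 140, 160, 175, 150, 152, 155, 155, 165, 145, 150, 154, 160, 143,
  160, 170, 155, 140, 160, 160, 175, 140, 145, 150, 150, 152, 159, 160, 165,
  145, 155, 150, 150, 165, 148, 152, 155, 155, 160, 172, 180, 141, 147, 155,
  165, 170, 160, 140, 150, 150, 152, 155, 130, 155, 163, 170, 139, 165, 180,
  180, 190
)

# Class boundaries: lower limit 130, width 10, six classes
breaks <- seq(130, 190, by = 10)

# Left-closed classes [a, b); include.lowest = TRUE closes the last one: [180,190]
classes <- cut(weights, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Steps 4 and 5: tally and count
freq <- table(classes)
print(freq)
```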


Some terminologies associated with the table

Data organized into a frequency distribution table is also called grouped data.

Class frequency: The number of observations in each class.

Class relative frequency: The proportion (or percent) of observations in each class.

Class cumulative frequency: The total number of observations up to and including a given class.

Class midpoint: A point that divides a class into two equal parts, i.e. the average of the upper and lower class limits.

Class interval (a.k.a. class width or class size): The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class.


Terminologies associated with the table

class         freq    relative freq.    cumulative freq.    mid point
[130, 140)    3       0.05              3                   135
[140, 150)    12      0.19              15                  145
[150, 160)    23      0.37              38                  155
[160, 170)    14      0.23              52                  165
[170, 180)    6       0.10              58                  175
[180, 190]    4       0.06              62                  185
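The derived columns of this table can be computed from the class frequencies alone. The following R sketch is one way to do it; the data frame and column names (tab, rel_freq, cum_freq, mid_point) are illustrative, not taken from the lecture.

```r
# Class frequencies and class limits from the weight example
freq <- c(3, 12, 23, 14, 6, 4)
lower <- seq(130, 180, by = 10)
upper <- lower + 10

tab <- data.frame(
  class     = paste0("[", lower, ", ", upper, c(rep(")", 5), "]")),
  freq      = freq,
  rel_freq  = round(freq / sum(freq), 2),  # proportion of observations per class
  cum_freq  = cumsum(freq),                # running total up to each class
  mid_point = (lower + upper) / 2          # average of the two class limits
)
print(tab)
```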


Histogram

A Histogram is a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.


Histogram for the weight example

[Figure: Histogram of sec_01A. Horizontal axis: sec_01A (130 to 190); vertical axis: Frequency (ticks from 0 to 20).]


Example

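Example: the weight example above

The R code:

> weight <- read.csv("weight.csv")
> sec_01A <- weight$Weight.01A.2013Fall
> m <- min(sec_01A)
> M <- max(sec_01A) + 1
> # left-closed classes of width 10; hist() closes the last class on the right by default
> hist(sec_01A, breaks = seq(m, M, 10), right = FALSE)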


Polygon

A frequency polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.


Frequency polygon for the weight example

[Figure: frequency polygon. Horizontal axis: mid_point_01A (130 to 190); vertical axis: freq0 (ticks from 0 to 20).]


Example

Example: the weight example above

The R code:

> weight <- read.csv("weight.csv")
> sec_01A <- weight$Weight.01A.2013Fall
> m <- min(sec_01A)
> M <- max(sec_01A) + 1
> breaks_sec_01A <- seq(m, M, by = 10)
> # include.lowest = TRUE closes the last class, so the maximum value is counted
> weight.cut <- cut(sec_01A, breaks_sec_01A, right = FALSE, include.lowest = TRUE)
> weight.freq <- table(weight.cut)
> # pad with zero frequencies so the polygon starts and ends on the horizontal axis
> freq0 <- c(0, weight.freq, 0)
> mid_point_01A <- seq(m + 5, M - 5, by = 10)
> mid_point_01A <- c(m, mid_point_01A, M)
> plot(mid_point_01A, freq0)
> lines(mid_point_01A, freq0)


Ogive: cumulative frequency polygon

An ogive consists of line segments connecting the points formed by the class upper limits and the class cumulative frequencies.

A cumulative frequency polygon is used to determine how many or what proportion of the data values are below or above a certain value.
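Reading a value off the ogive amounts to linear interpolation between class boundaries. The R sketch below estimates how many weights fall below 155 using the cumulative frequencies of the weight example; the names boundaries, cum_freq and below_155 are illustrative, and the result is only an estimate because it assumes the observations are spread evenly within each class.

```r
# Class boundaries and cumulative frequencies from the weight example
# (zero observations lie below the first lower limit, 130)
boundaries <- seq(130, 190, by = 10)
cum_freq   <- c(0, 3, 15, 38, 52, 58, 62)

# Linear interpolation along the ogive at the value 155
below_155 <- approx(boundaries, cum_freq, xout = 155)$y

print(below_155)                   # 26.5 (estimated count below 155)
print(below_155 / max(cum_freq))   # estimated proportion below 155
print(max(cum_freq) - below_155)   # 35.5 (estimated count at or above 155)
```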


Ogive for the weight example

[Figure: ogive. Horizontal axis: breaks_sec_01A (130 to 190); vertical axis: cumfreq0 (ticks from 0 to 60).]


Example

Example: the weight example above

The R code:

> weight <- read.csv("weight.csv")
> sec_01A <- weight$Weight.01A.2013Fall
> m <- min(sec_01A)
> M <- max(sec_01A) + 1
> breaks_sec_01A <- seq(m, M, by = 10)
> # include.lowest = TRUE closes the last class, so the maximum value is counted
> weight.cut <- cut(sec_01A, breaks_sec_01A, right = FALSE, include.lowest = TRUE)
> weight.freq <- table(weight.cut)
> weight.cumfreq <- cumsum(weight.freq)
> # start the ogive at zero at the first lower limit
> cumfreq0 <- c(0, weight.cumfreq)
> plot(breaks_sec_01A, cumfreq0)
> lines(breaks_sec_01A, cumfreq0)


A note

Individual values cannot be recovered from the frequency distribution; that is, information is lost in this process.

The distribution of the data within each group is unclear.

Question: Are there methods that preserve all information?

Yes! Medium-sized data
No! Large-sized data


Stem-and-leaf display

A statistical technique for displaying a set of data in which each numerical value is divided into two parts:

the leading digits become the stem
the trailing digits become the leaf.

One advantage of the stem-and-leaf display over a frequency distribution is that we retain the value of each observation!

Another is that the distribution of the data within each group is clear.


How to develop a stem-and-leaf display

Step 1: (Identify the stem) This can be done as follows:

Find the lowest value and record its leading digit(s).
Find the next value with the next higher leading digit(s) and record them.
Repeat the above until all data are examined.

Step 2: (Identify the leaf) List the remaining leaf values based on the stems.
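The two steps can be sketched by hand in R, without the built-in stem() function. For brevity the sketch uses only the first 15 weights of the example; splitting each value with integer division and remainder by 10 is an assumption that suits three-digit integer data like these.

```r
# First 15 guessed weights from the example, sorted so leaves appear in order
x <- sort(c(140, 135, 140, 160, 175, 150, 152, 155, 155, 165,
            145, 150, 154, 160, 143))

# Step 1: the leading digits are the stems
stems <- x %/% 10

# Step 2: the trailing digit is the leaf
leaves <- x %% 10

# Collect the leaves of each stem into one string
display <- sapply(split(leaves, stems), paste, collapse = "")

# Print one "stem | leaves" row per stem
for (s in names(display)) {
  cat(s, "|", display[[s]], "\n")
}
```

Each printed row keeps every original value recoverable: stem 14 with leaves 0035 stands for 140, 140, 143 and 145.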


Stem-and-leaf display for the weight example

The decimal point is 1 digit(s) to the right of the |:

  13 | 059
  14 | 000001355578
  15 | 00000000222245555555559
  16 | 00000000355555
  17 | 000255
  18 | 000
  19 | 0


Example

Example: the weight example above

The R code:

> weight <- read.csv("weight.csv")
> sec_01A <- weight$Weight.01A.2013Fall
> stem(sec_01A)


Layout

2 Qualitative data: table/graphic representation
    Graphical representation for qualitative data: bar chart and pie chart


Bar chart

A two-dimensional graph which consists of a series of usually non-adjacent rectangles, where the horizontal axis is marked by the classes, and the vertical axis is marked by the class frequencies.

We can put several bar charts together to form stacked (top-below) or clustered (side-by-side) bar charts.


An example

Example: Here are the five worst passwords. A study shows the following result among 1,000 persons surveyed:

password     number of users
123456       500
12345        300
123456789    100
Password     60
iloveyou     40

Problem: Draw the bar chart for the above example.


Bar chart for the password example

[Figure: bar chart with one bar per password (123456, 12345, 123456789, Password, iloveyou); vertical axis ticks from 0 to 500.]


Example

Example: the password example above

The R code:

> pw <- data.frame(password = c('123456', '12345',
+     '123456789', 'Password', 'iloveyou'),
+     numberofusers = c(500, 300, 100, 60, 40))
> barplot(pw$numberofusers, names.arg = pw$password)


Pie chart

A partitioned circle in which the area of each sector is proportional to the relative frequency of each category. Usually the number of sectors is no more than 6 or 7 for clear illustration.

A pie chart is useful for displaying a relative frequency distribution.


How to develop a pie chart

For the area of each sector to be proportional to the relative frequency, it is equivalent for the angle of each sector to be proportional to the relative frequency.

To develop a pie chart, note that 1 percent corresponds to 3.6 degrees.
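As a worked illustration, the sector angles for the password example can be computed in R by converting each count to a percent and multiplying by 3.6. The vector names users, percent and angles are illustrative.

```r
# Number of users per password, from the password example
users <- c("123456" = 500, "12345" = 300, "123456789" = 100,
           "Password" = 60, "iloveyou" = 40)

# Relative frequency of each category, as a percent
percent <- 100 * users / sum(users)

# 1 percent corresponds to 3.6 degrees
angles <- 3.6 * percent

print(percent)  # 50 30 10 6 4
print(angles)   # 180 108 36 21.6 14.4
```

The angles add up to 360 degrees, so the five sectors fill the whole circle.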


Pie chart for the password example

[Figure: pie chart with sectors labelled 123456, 12345, 123456789, Password, iloveyou.]


Example

Example: the password example above

The R code:

> pw <- data.frame(password = c('123456', '12345',
+     '123456789', 'Password', 'iloveyou'),
+     numberofusers = c(500, 300, 100, 60, 40))
> pie(pw$numberofusers, labels = pw$password)
