8/17/2019 BIOSTAT Chapter2
Chapter 2: Frequency Distributions
Frequency Distributions
After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results. This is the goal of descriptive statistical techniques.
One method for simplifying and organizing data
is to construct a frequency distribution.
Frequency Distributions (cont.)
A frequency distribution is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement.
A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.
Frequency Distributions (cont.)
A table that organizes data values into classes or intervals along with the number of values that fall in each class (frequency, f).
1. Ungrouped Frequency Distribution - for data sets with few different values. Each value is in its own class.
2. Grouped Frequency Distribution - for data sets with many different values, which are grouped together in the classes.
Grouped and Ungrouped Frequency Distributions

Ungrouped

Courses Taken    Frequency, f
1                25
2                38
3                217
4                1462
5                932
6                15

Grouped

Age of Voters    Frequency, f
18-30            22
31-42            58
43-54            62
55-66            413
67-78            158
78-90            32
Ungrouped Frequency Distributions
Number of Peas in a Pea Pod
Sample Size: 50

5 5 4 6 4
3 7 6 3 5
6 5 4 5 5
6 2 3 5 5
5 5 7 4 3
4 5 4 5 6
5 1 6 2 6
6 6 6 6 4
4 5 4 5 3
5 5 7 6 5

Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
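
The tally above can be reproduced programmatically. The following is a minimal sketch, assuming the 50 pod counts are typed in exactly as listed; the function name ungrouped_frequency is only an illustrative choice.

```python
from collections import Counter

# Number of peas per pod for the 50 pods in the sample above.
peas = [
    5, 5, 4, 6, 4,
    3, 7, 6, 3, 5,
    6, 5, 4, 5, 5,
    6, 2, 3, 5, 5,
    5, 5, 7, 4, 3,
    4, 5, 4, 5, 6,
    5, 1, 6, 2, 6,
    6, 6, 6, 6, 4,
    4, 5, 4, 5, 3,
    5, 5, 7, 6, 5,
]

def ungrouped_frequency(values):
    """Return (value, frequency) pairs sorted by value, one class per distinct value."""
    return sorted(Counter(values).items())

table = ungrouped_frequency(peas)
print("Peas per pod  Freq, f")
for value, f in table:
    print(f"{value:<13} {f}")
# The frequencies should add up to the sample size.
print("Sum of f =", sum(f for _, f in table))
```

Because every distinct value gets its own class, this approach only stays readable when the data take few different values.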
Frequency Distribution Tables
A frequency distribution table consists of at least two columns - one listing categories on the scale of measurement (X) and another for frequency (f).
In the X column, values are listed from the highest to lowest, without skipping any.
For the frequency column, tallies are determined for each value (how often each X value occurs in the data set). These tallies are the frequencies for each X value.
The sum of the frequencies should equal n.
Grouped Frequency Distribution
Sometimes, however, a set of scores covers a wide range of values. In these situations, a list of all the X values would be quite long - too long to be a simple presentation of the data.
To remedy this situation, a grouped frequency distribution table is used.
Grouped Frequency Distribution (cont.)
In a grouped table, the X column lists groups of scores, called class intervals, rather than individual values.
These intervals all have the same width, usually a simple number such as 2, 5, 10, and so on.
Each interval begins with a value that is a multiple of the interval width. The interval width is selected so that the table will have approximately ten intervals.
Key Concepts:
Data in its original form and structure are called raw data / ungrouped data.
Example: Consider the following data on 40 women's testosterone serum level (scores) measured in mg/dl.

79 83 84 62 62 43 72 48 46 59
93 64 59 32 54 45 55 45 76 72
40 51 51 72 83 49 62 85 74 40
49 65 38 55 77 63 38 43 63 69
• To construct a frequency distribution of the given raw data, we first find the highest serum level and prepare a column of these levels beginning from the highest value and ending at the lowest one. Since the highest serum level is 93 and the lowest is 32, we have:
Serum Level    f (tally)
93             I
85             I
84             I
83             II
79             I
77             I
76             I
74             I
72             III
69             I
65             I
64             I
63             II
62             III
59             II
55             II
54             I
51             II
49             II
48             I
46             I
45             II
43             II
40             II
38             II
32             I
When these data are placed into a system wherein they are organized, they partake of the nature of grouped data. This procedure of organizing data into groups produces a frequency distribution table (FDT).
Example:
The following presents a frequency distribution table of the grouped data of the urine amylase (scores) of 15 patients in amylase units/hour.

SCORES    FREQUENCY
10-19     5
20-29     4
30-39     3
40-49     2
50-59     1
          15
Components of a Frequency Table
Class Interval: these are the numbers defining the class; it consists of the end numbers called the class limits, namely the upper limit and the lower limit.
Class frequency: shows the number of observations falling in the class.
Class Boundaries: these are the so-called true class limits, classified as:
Lower Class Boundary (LCB): defined as the middle value between the lower class limit of the class and the upper class limit of the preceding class.
Upper Class Boundary: defined as the middle value between the upper class limit of the class and the lower limit of the next class.
Class size: the difference between two consecutive upper limits or two consecutive lower limits.
Class mar
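
The definitions of class boundaries and class size can be illustrated with the urine amylase classes (10-19, 20-29, and so on). This is only a sketch: the function names are hypothetical, and the outermost boundaries assume the same gap between classes continues beyond the first and last class, since those classes have no neighbor on one side.

```python
def class_boundaries(limits):
    """Return (lower boundary, upper boundary) for each class.

    limits: list of (lower limit, upper limit) pairs in ascending order.
    Each boundary is the midpoint between one class's upper limit and the
    next class's lower limit. Assumption: the gap between the first two
    classes also applies at the two open ends of the table.
    """
    gap = limits[1][0] - limits[0][1]  # 1 when limits are whole numbers
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]

def class_size(limits):
    """Difference between two consecutive lower limits."""
    return limits[1][0] - limits[0][0]

amylase_classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
boundaries = class_boundaries(amylase_classes)
size = class_size(amylase_classes)

for (lower, upper), (lcb, ucb) in zip(amylase_classes, boundaries):
    print(f"{lower}-{upper}: boundaries {lcb} to {ucb}")
print("Class size:", size)
```

Note that the class size (10) equals the distance between the two boundaries of a class, not the difference between its own upper and lower limits (9).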
Steps in Constructing FDT:
Step 1) Determine the number of classes. For a first approximation, it is suggested to use the STURGES APPROXIMATION FORMULA.
K = 1 + 3.322 log n
where K = approximate number of classes
n = number of cases (observations)
and log is the base-10 logarithm.
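
As a quick check of Step 1, the formula can be evaluated for the n = 40 testosterone scores. A minimal sketch, assuming the base-10 logarithm and ordinary rounding to the nearest whole number; the function name is illustrative only.

```python
import math

def sturges_classes(n):
    """Approximate number of classes: K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

k_exact = sturges_classes(40)  # about 6.32
k = round(k_exact)             # rounded to a whole number of classes
print(f"K = {k_exact:.3f}, rounded to {k}")
```

The result is only a starting point; the final number of classes can differ once the class size is rounded and the classes are laid out.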
Example: We now construct the FDT of the testosterone serum level of 40 women as shown in the raw data:
Step 2) Determine the range R:
where R = maximum value - minimum value
R = max - min
R = 93 - 32
R = 61
Step 3) Determine the approximate class size C using the formula
C = R/K
where R = range and K = the number of classes from the Sturges approximation formula.
Note: It is usually convenient to round off C to the nearest whole number.
C = R/K
C = 61/6
C = 10.167
C = 10
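
Steps 2 and 3 amount to two lines of arithmetic. The sketch below simply restates them, taking the highest and lowest serum levels and K = 6 from the example; the variable names are illustrative.

```python
highest, lowest = 93, 32  # extremes of the testosterone data
k = 6                     # number of classes from Step 1

r = highest - lowest      # Step 2: range
c_exact = r / k           # Step 3: approximate class size
c = round(c_exact)        # rounded to the nearest whole number

print(f"R = {r}, C = {c_exact:.3f}, rounded to {c}")
```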
Step 4) Determine the lowest class interval (or the first class). This class should include the minimum value in the data set. For uniformity, let us agree that for our purposes, the lower limit of the lowest class interval should start at the minimum value.
Let us decide to start at the minimum value. Thus the lowest class is the class 32-41.
Step 5) Determine all class limits by adding the class size C to the limits of the previous class.
The classes are constructed by adding 10 to each class limit. Thus we have:
32 - 41
42 - 51
52 - 61
62 - 71
72 - 81
82 - 91
92 - 101
Step 6) Tally the scores or observations falling in each class.

Classes     Tally          Frequency
32 - 41     IIIII          5
42 - 51     IIIII IIIII    10
52 - 61     IIIII          5
62 - 71     IIIII III      8
72 - 81     IIIII II       7
82 - 91     IIII           4
92 - 101    I              1
                           N = 40
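
Steps 4 to 6 can be sketched in code as follows. This is an illustration under stated assumptions rather than a prescribed procedure: the 40 scores are entered as in the raw data table, the first class starts at the minimum value, the class size is 10, and classes are added until the maximum value is covered. The function names are hypothetical.

```python
# Testosterone serum levels of the 40 women, as in the raw data table.
serum = [
    79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
    93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
    40, 51, 51, 72, 83, 49, 62, 85, 74, 40,
    49, 65, 38, 55, 77, 63, 38, 43, 63, 69,
]

def class_limits(minimum, maximum, size):
    """Build (lower, upper) class limits starting at the minimum value."""
    limits = []
    lower = minimum
    while lower <= maximum:
        limits.append((lower, lower + size - 1))
        lower += size
    return limits

def grouped_frequency(values, limits):
    """Count how many values fall inside each class (limits inclusive)."""
    return [sum(lower <= v <= upper for v in values) for lower, upper in limits]

limits = class_limits(min(serum), max(serum), 10)
freqs = grouped_frequency(serum, limits)

for (lower, upper), f in zip(limits, freqs):
    print(f"{lower:>3} - {upper:<4} {f}")
# The class frequencies should add up to the number of cases.
print("N =", sum(freqs))
```

Starting at 32 with a class size of 10 takes seven classes to reach the maximum of 93, one more than the Sturges approximation suggested, which is why the table ends at 92 - 101.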
The following table presents the complete frequency distribution table indicating the class boundaries, the class mar
Frequency Distribution Graphs
In a frequency distribution graph, the score categories (X values) are listed on the X axis and the frequencies are listed on the Y axis.
When the score categories consist of numerical scores from an interval or ratio scale, the graph should be either a histogram or a polygon.
Histograms
In a histogram, a bar is centered above each score (or class interval) so that the height of the bar corresponds to the frequency and the width extends to the real limits, so that adjacent bars touch.
Polygons
In a polygon, a dot is centered above each score so that the height of the dot corresponds to the frequency. The dots are then connected by straight lines. An additional line is drawn at each end to bring the graph back to a zero frequency.
Bar graphs
When the score categories (X values) are measurements from a nominal or an ordinal scale, the graph should be a bar graph.
A bar graph is just like a histogram except that gaps or spaces are left between adjacent bars.
Relative frequency
Many populations are so large that it is impossible to know the exact number of individuals (frequency) for any specific category.
In these situations, population distributions can be shown using relative frequency instead of the absolute number of individuals for each category.
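
A relative frequency is a category's frequency divided by the total. As a small illustration, the sketch below reuses the pea-pod frequency table from earlier in the chapter (a sample, not a population); the function name is an illustrative choice.

```python
def relative_frequencies(freqs):
    """Map each category to its share of the total frequency."""
    total = sum(freqs.values())
    return {category: f / total for category, f in freqs.items()}

# Peas per pod -> frequency, from the ungrouped table (50 pods in all).
pea_counts = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}
rel = relative_frequencies(pea_counts)

for peas, share in rel.items():
    print(f"{peas}: {share:.2f}")
```

The shares always add up to 1, so the shape of the distribution is preserved even when the absolute counts are unknown.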
Smooth curve
If the scores in the population are measured on an interval or ratio scale, it is customary to present the distribution as a smooth curve rather than a jagged histogram or polygon.
The smooth curve emphasizes the fact that the distribution is not showing the exact frequency for each category.
Frequency distribution graphs
Frequency distribution graphs are useful because they show the entire set of scores.
At a glance, you can determine the highest score, the lowest score, and where the scores are centered.
The graph also shows whether the scores are clustered together or scattered over a wide range.
Shape
A graph shows the shape of the distribution.
A distribution is symmetrical if the left side of the graph is (roughly) a mirror image of the right side.
One example of a symmetrical distribution is the bell-shaped normal distribution.
On the other hand, distributions are
Positively and Negatively Skewed Distributions
In a positively
Percentiles, Percentile Ranks, and Interpolation
The relative location of individual scores within a distribution can be described by percentiles and percentile ranks.
The percentile rank
Percentiles, Percentile Ranks, and Interpolation (cont.)
To find percentiles and percentile ranks, two new columns are placed in the frequency distribution table: one is for cumulative frequency (cf) and the other is for cumulative percentage (c%).
Each cumulative percentage identifies the percentile rank for the upper real limit of the corresponding score or class interval.
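
The two extra columns can be computed by accumulating the frequencies from the lowest class upward. The sketch below uses the urine amylase table (15 patients) as an example; listing the classes in ascending order and the function name cumulative_columns are choices made for this illustration.

```python
from itertools import accumulate

def cumulative_columns(freqs):
    """Return (cf, c%) lists for frequencies ordered from lowest class to highest."""
    n = sum(freqs)
    cf = list(accumulate(freqs))
    cpct = [100 * c / n for c in cf]
    return cf, cpct

classes = ["10-19", "20-29", "30-39", "40-49", "50-59"]
freqs = [5, 4, 3, 2, 1]
cf, cpct = cumulative_columns(freqs)

print("Scores  f   cf  c%")
for label, f, c, p in zip(classes, freqs, cf, cpct):
    print(f"{label}   {f}   {c:<3} {p:.1f}")
```

Read this way, the c% value for the class 20-29 is the percentile rank of its upper real limit, 29.5.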
Interpolation
When scores or percentages do not correspond to upper real limits or cumulative percentages, you must use interpolation to determine the corresponding ranks and percentiles.
Interpolation is a mathematical process based on the assumption that the scores and the percentages change in a regular, linear fashion as you move through an interval from one end to the other.
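
Under that linearity assumption, interpolation is a single proportion. The sketch below works within the 20-29 class of the urine amylase table, whose real limits are 19.5 and 29.5 and whose cumulative percentages run from 100(5/15) to 100(9/15); the helper name interpolate and the two example queries are illustrative assumptions.

```python
def interpolate(x, x0, x1, y0, y1):
    """Return the y that lies the same fraction of the way from y0 to y1
    as x lies from x0 to x1 (linear interpolation)."""
    fraction = (x - x0) / (x1 - x0)
    return y0 + fraction * (y1 - y0)

lower_real, upper_real = 19.5, 29.5            # real limits of the class 20-29
pct_below, pct_upper = 100 * 5 / 15, 100 * 9 / 15  # c% at each real limit

# Percentile rank of a score of 24.5 (halfway through the interval).
rank = interpolate(24.5, lower_real, upper_real, pct_below, pct_upper)

# Score at the 50th percentile (the same relation read in reverse).
p50 = interpolate(50, pct_below, pct_upper, lower_real, upper_real)

print(f"Percentile rank of 24.5: {rank:.2f}%")
print(f"50th percentile: {p50:.2f}")
```

The same function serves both directions because the relation between scores and cumulative percentages is assumed to be a straight line within the interval.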
Stem-and-Leaf Displays
A stem-and-leaf display provides a very efficient method for obtaining and displaying a frequency distribution.
Each score is divided into a stem consisting of the first digit or digits, and a leaf consisting of the final digit.
Finally, you go through the list of scores, one at a time, and write the leaf for each score beside its stem.
The resulting display provides an organized picture of the entire distribution. The number of leaves beside each stem corresponds to the frequency, and the individual leaves identify the individual scores.
Descriptive Statistics

Class A--IQs of 13 Students
102  115
128  109
131  89
98   106
140  119
93   97
110

Class B--IQs of 13 Students
127  162
131  103
96   111
80   109
93   87
120  105
109

Sample Illustration: Which group is smarter?

Each individual may be different. If you try to understand a group by remembering the qualities of each member, you become overwhelmed and fail to understand the group.
Descriptive Statistics
Which group is smarter now?
Class A--Average IQ    Class B--Average IQ
110.54                 110.23
They're roughly the same!
With a summary descriptive statistic, it is much easier to answer our question.
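
The two averages can be verified directly from the 13 scores in each class. A minimal sketch, assuming the scores are entered as listed for Class A and Class B and using the arithmetic mean from Python's standard library:

```python
from statistics import mean

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

mean_a = mean(class_a)
mean_b = mean(class_b)

print(f"Class A average IQ: {mean_a:.2f}")
print(f"Class B average IQ: {mean_b:.2f}")
```

One number per class replaces 13, which is what makes the comparison manageable.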
Other Graphs
Besides histograms, there are other methods of graphing quantitative data:
• Stem and Leaf Plots
• Dot Plots
• Time Series
Stem and Leaf Plots
Represents data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).
Larson/Farber 4th ed.
Constructing Stem and Leaf Plots
• Split each data value at the same place value to form the stem and a leaf. (Want 5-20 stems.)
• Arrange all possible stems vertically so there are no missing stems.
• Write each leaf to the right of its stem, in order.
• Create a key to recreate the data.
• Variations of stem plots:
1. Split stems
2. Back to back stem plots.
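
The construction steps above can be sketched as follows, using the Class A IQ scores from the Descriptive Statistics example as sample data. The sketch assumes whole-number values split at the ones place (stem = all digits but the last, leaf = final digit); the function name stem_and_leaf is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with every stem from smallest to largest present.

    Leaves are written in ascending order; stems with no data keep an
    empty list so that no stem is missing from the display.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: leaves.get(stem, []) for stem in stems}

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
plot = stem_and_leaf(class_a)

for stem, stem_leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in stem_leaves)}")
print("Key: 10 | 2 = 102")
```

With these 13 scores the display has seven stems (8 through 14), which falls inside the suggested range of 5-20 stems.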
Constructing a Stem-and-Leaf Plot
Dot Plots
Dot plot
• Consists of a graph in which each data value is plotted as a point along a scale of values.
[Figure: dot plot]
Time Series (Paired data)
Time Series
• Data set is composed of quantitative entries taken at regular intervals over a period of time.
• e.g., the amount of precipitation measured each day for one month.
• Use a time series chart to graph.
[Figure: time-series graph, with time on the horizontal axis and quantitative data on the vertical axis]
Time Series Graph
[Figure: Number of Screens at Drive-In Movie Theaters]
Graphing Qualitative Data Sets
Pie Chart
A circle is divided into sectors that represent categories.
Pareto Chart
A vertical bar graph in which the height of each bar represents frequency or relative frequency.
[Figure: Pareto chart, with categories on the horizontal axis and frequency on the vertical axis]
Constructing a Pie Chart
• Find the total sample size.
• Convert the frequencies to relative frequencies (percent).

Marital Status    Frequency, f (in millions)    Relative frequency
Never Married     55.3                          55.3/219.7 ≈ 0.25 or 25%
Married           127.7                         127.7/219.7 ≈ 0.58 or 58%
Widowed           13.9                          13.9/219.7 ≈ 0.06 or 6%
Divorced          22.8                          22.8/219.7 ≈ 0.10 or 10%
Total:            219.7
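
The conversion from frequencies to percentages can be sketched as below, using the marital status frequencies (in millions) from the table. Rounding to whole percents is an assumption made for this illustration.

```python
# Marital status -> frequency in millions, as in the table above.
marital = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

total = sum(marital.values())  # total sample size
percent = {status: round(100 * f / total) for status, f in marital.items()}

for status, pct in percent.items():
    print(f"{status:<14} {pct}%")
print(f"Total frequency: {total:.1f}")
```

Because each percentage is rounded separately, the rounded values may not add up to exactly 100 (here they sum to 99).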
Constructing Pareto Charts
• Create a bar for each category, where the height of the bar can represent frequency or relative frequency.
• The bars are often positioned in order of decreasing height, with the tallest bar positioned at the left.
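
The ordering step for a Pareto chart is a sort by decreasing frequency. The sketch below reuses the marital status frequencies from the pie chart example and prints a rough text version of the chart; the function name and the scale of one mark per 10 million are illustrative assumptions, not part of the method.

```python
def pareto_order(freqs):
    """Return (category, frequency) pairs from tallest bar to shortest."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

marital = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

ordered = pareto_order(marital)
for status, f in ordered:
    bar = "#" * round(f / 10)  # one mark per 10 million (illustrative scale)
    print(f"{status:<14} {bar} {f}")
```

In a drawn chart the same order would run left to right, with Married as the tallest bar at the left.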