8/8/2019 IDA Central Tendency
Initial Data Analysis: Central Tendency
Outline
- What is central tendency?
- Classic measures: mean, median, mode
- What's an average?
- Properties of statistics: sufficiency, efficiency, bias, resistance
- Resistant measures
Measures of Central Tendency
While distributions provide an overall picture of a data set, it is sometimes desirable to represent some property of the entire data set using a single statistic. The first descriptive statistics we will discuss are those used to indicate where the center of the distribution lies.
- The center can be thought of as the expected value; it does not have to be a value that actually appears in the data set.
- There are different measures of central tendency, each with its own advantages and disadvantages.
The Mode
The mode is simply the value of the relevant variable that occurs most often (i.e., has the highest frequency) in the sample.
- If you have done a frequency histogram, you can often identify the mode simply by finding the value with the highest bar.
- However, that will not work when grouping was performed prior to plotting the histogram (although you can still use the histogram to identify the modal group, just not the modal value).
- Modes in particular are probably best applied to nominal data.
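A minimal sketch of finding the mode in Python with the standard library's `collections.Counter`. It returns every value tied for the highest frequency, because a sample can have more than one mode. The colour data and the helper name `modes` are hypothetical, for illustration only. (Python 3.8+ also provides `statistics.multimode`, which behaves the same way.)

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

# Hypothetical nominal data: favourite colour of ten respondents
colours = ["red", "blue", "blue", "green", "red", "blue", "red", "yellow", "green", "blue"]
print(modes(colours))             # ['blue']

# Two values tie, so this sample is bimodal
print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
```

Because the mode only counts occurrences, it works on categories like colours where a mean or median would be meaningless.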
Mode
Advantages
- Very quick and easy to determine
- Is an actual value of the data
- Not affected by extreme scores
Disadvantages
- Sometimes not very informative (e.g., cigarettes smoked in a day)
- Can change dramatically from sample to sample
- There might be more than one (which is more representative?)
The Median
The median is the point corresponding to the score that lies in the middle of the distribution (i.e., there are as many data points above the median as there are below it).
To find the median, the data points must first be sorted into either ascending or descending numerical order. The position of the median value can then be calculated using the following formula:
$$\text{Median location} = \frac{N + 1}{2}$$
When N is even, this location falls halfway between the two middle scores, and the median is the average of those two scores.
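The (N + 1)/2 location rule can be sketched directly in Python. This is a minimal illustration, not a library routine; the function name `median_by_location` and the sample values are hypothetical. It sorts the data, computes the 1-based location, and averages the two neighbouring scores when the location is a half.

```python
def median_by_location(values):
    """Median via the (N + 1) / 2 location rule on sorted data."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    location = (n + 1) / 2          # 1-based position of the median
    lower = int(location)           # whole-number part of that position
    if location == lower:           # odd N: the location is a single score
        return xs[lower - 1]
    # even N: the location falls halfway between two scores, so average them
    return (xs[lower - 1] + xs[lower]) / 2

print(median_by_location([7, 3, 9, 1, 5]))      # location 3 -> 5
print(median_by_location([7, 3, 9, 1, 5, 11]))  # location 3.5 -> (5 + 7) / 2 = 6.0
```

In practice `statistics.median` from the standard library gives the same answers.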
Median
Advantage:
- Resistant to outliers
Disadvantage:
- May not be very informative: (1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10)
- Does the value of 2 really represent this sample as a whole very well?
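To see the problem with the slide's example in numbers, here is a short check using Python's standard `statistics` module. The median (and the mode) is 2, yet nearly half of the scores are 5 or higher, which pulls the mean well above the median.

```python
import statistics

scores = [1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10]

print(statistics.median(scores))          # 2
print(statistics.mode(scores))            # 2
print(round(statistics.mean(scores), 2))  # 4.45
print(sum(x >= 5 for x in scores))        # 5 of the 11 scores are 5 or more
```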
The Mean
The most commonly used measure of central tendency is called the mean (denoted $\bar{X}$ for a sample, and $\mu$ for a population).
The mean is what many of us call the average, and it is calculated in the following manner:
$$\bar{X} = \frac{\sum X}{N}$$
Mode vs. Median vs. Mean
When there is only one mode and the distribution is fairly symmetrical, the three measures (as well as others to be discussed) will have similar values.
However, when the underlying distribution is not symmetrical, the three measures of central tendency can be quite different.
Some Visual Demos
Here is a demonstration [1] that allows you to change a frequency histogram while simultaneously noting the effects of those changes on the mean versus the median.
As you use the demo, you should fairly easily be able to think about how these changes are also affecting the mode.
Note that, moving in the direction the tail is pointing, the order typically goes mode, median, then mean.
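As a small stand-in for the demo, the sketch below computes all three measures on three made-up samples: one symmetric, one with a long right tail and its mirror image with a long left tail. The helper `centers` is hypothetical. In the skewed samples the mode, median and mean line up in that order in the direction of the tail.

```python
import statistics

def centers(values):
    """Return (mode, median, mean) for a sample."""
    return (statistics.mode(values), statistics.median(values), statistics.mean(values))

symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 4, 5, 18]        # long tail toward large values
left_skewed = [-18, -5, -4, -3, -2, -2, -1]  # mirror image: tail toward small values

print(centers(symmetric))     # (3, 3, 3)
print(centers(right_skewed))  # (2, 3, 5)
print(centers(left_skewed))   # (-2, -3, -5)
```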
What's an average?
We've been referring to the mean without qualification, but in fact there are many types of averages, and the mean is only one. The mean we typically use is the arithmetic mean.
- Along with the geometric mean and the harmonic mean, it is one of the Pythagorean means. For any set of positive values, the arithmetic mean is greater than or equal to the geometric mean, which is greater than or equal to the harmonic mean (with equality only when all the values are the same).
- The geometric mean of n values is found by multiplying them all together and taking the nth root of the product.
- The harmonic mean can be seen as the reciprocal [1] of the arithmetic mean of the reciprocals of all the values of the variable in question [2].
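The three definitions translate almost word for word into code. This is a minimal sketch that assumes all values are positive; the function names are hypothetical, and the standard library's `statistics.geometric_mean` and `statistics.harmonic_mean` can be used instead.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product of the n values (positive values assumed)
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # reciprocal of the arithmetic mean of the reciprocals
    return 1 / arithmetic_mean([1 / x for x in xs])

data = [2, 8]
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))  # 5.0 4.0 3.2
```

For long lists the product can overflow, which is why library implementations usually average logarithms instead of multiplying directly.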
More means
- The geometric mean is particularly appropriate for exponential types of data, e.g., human population over a period of time.
- The harmonic mean is good for things like rates and ratios, where an arithmetic mean would actually be incorrect [1]. In addition, whenever you see an ANOVA with unequal sample sizes, the far and away most common procedure uses the harmonic mean of the sample sizes.
- As a result, an unbalanced design will have less statistical power, because the harmonic mean of the sample sizes is pulled toward the smallest sample.
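The sketch below illustrates the three claims on this slide with made-up numbers and Python's standard `statistics` module: the growth factors, speeds and group sizes are hypothetical. The geometric mean reproduces compound growth, the harmonic mean gives the correct average speed over equal distances, and the harmonic mean of unequal group sizes sits well below their arithmetic mean.

```python
import statistics

# Geometric mean: a population grows by a factor of 1.10 in one decade and 1.50 in the next
growth = [1.10, 1.50]
gm = statistics.geometric_mean(growth)
print(round(gm, 4))                            # 1.2845
print(round(gm ** 2, 4))                       # 1.65: reproduces the total growth 1.10 * 1.50
print(round(statistics.mean(growth) ** 2, 4))  # 1.69: the arithmetic mean overstates it

# Harmonic mean: drive 60 km at 60 km/h, then the same 60 km back at 30 km/h
speeds = [60, 30]
true_speed = 120 / (60 / 60 + 60 / 30)         # total distance / total time = 40.0 km/h
print(statistics.mean(speeds))                 # 45: the arithmetic mean is wrong here
print(round(statistics.harmonic_mean(speeds), 6), true_speed)  # 40.0 40.0

# Unbalanced design: three groups with n = 10, 10 and 40
ns = [10, 10, 40]
print(statistics.mean(ns))                     # 20
print(round(statistics.harmonic_mean(ns), 2))  # 13.33: pulled toward the small groups
```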
More means
Weighted averages
- Sometimes we will want to weight a measure of some variable by the values of some other variable.
- E.g., if each person gets a score on several items and we want an average of the total score for each person across the items, we might weight them by 1/variance to give the more consistent scorers more importance in the calculation.
- The arithmetic mean is a weighted average in which all weights equal 1.
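A weighted average is the sum of weight times value divided by the sum of the weights. The minimal sketch below uses hypothetical scores and variances with inverse-variance weights, and then shows that setting every weight to 1 gives back the ordinary arithmetic mean. The helper name `weighted_mean` is illustrative only.

```python
def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [10, 20]
variances = [1.0, 4.0]                 # hypothetical: the first scorer is more consistent
inv_var = [1 / v for v in variances]   # weights 1.0 and 0.25

print(weighted_mean(scores, inv_var))  # 12.0: pulled toward the consistent scorer
print(weighted_mean(scores, [1, 1]))   # 15.0: all weights 1 gives the arithmetic mean
```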
Properties of a Statistic: Sampling Distribution
In order to examine the properties of a statistic, we often want to take repeated samples from some population of data and calculate the relevant statistic on each sample. We can then look at the distribution of the statistic across these samples and ask a variety of questions about it.
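The resampling idea can be simulated in a few lines. This sketch assumes a hypothetical normal population (mean 100, SD 15), draws 2,000 samples of size 25 with Python's `random` module, and keeps the sample mean of each. The helper `sampling_distribution` is illustrative, not a standard function. The resulting list is the sampling distribution, so you can ask where it is centred and how spread out it is.

```python
import random
import statistics

def sampling_distribution(stat, population_draw, n, k, seed=0):
    """Draw k samples of size n and return the statistic computed on each."""
    rng = random.Random(seed)
    return [stat([population_draw(rng) for _ in range(n)]) for _ in range(k)]

# Hypothetical population: normal with mean 100 and SD 15
def draw(rng):
    return rng.gauss(100, 15)

means = sampling_distribution(statistics.fmean, draw, n=25, k=2000)
print(round(statistics.fmean(means), 2))  # centre of the sampling distribution, close to 100
print(round(statistics.stdev(means), 2))  # its spread, close to 15 / sqrt(25) = 3
```

Passing the statistic in as a function makes it easy to swap `statistics.fmean` for `statistics.median` and compare the two distributions.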
Properties of a Statistic
Sufficiency
- A sufficient statistic is one that makes use of all of the information in the sample to estimate its corresponding parameter.
- For example, this property makes the mean more attractive as a measure of central tendency compared to the mode or median.
Unbiasedness
- A statistic is said to be an unbiased estimator if its expected value (i.e., the mean of that statistic across a large number of samples) is equal to the population parameter it is estimating.
- Using the resampling procedure, the mean can be shown to be an unbiased estimator.
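A minimal resampling sketch of unbiasedness, assuming a hypothetical right-skewed exponential population with mean 10. Across 5,000 samples of size 10, the average of the sample means lands on the population mean. For contrast, the sample medians are also tracked: in this skewed population they centre well below 10, because the median estimates the population median rather than the population mean.

```python
import random
import statistics

rng = random.Random(1)
POP_MEAN = 10.0   # hypothetical exponential population with mean 10 (right-skewed)

sample_means, sample_medians = [], []
for _ in range(5000):
    sample = [rng.expovariate(1 / POP_MEAN) for _ in range(10)]
    sample_means.append(statistics.fmean(sample))
    sample_medians.append(statistics.median(sample))

# The average of the sample means sits on the population mean (unbiased) ...
print(round(statistics.fmean(sample_means), 2))
# ... whereas the sample medians centre well below 10
# (the population median here is 10 * ln 2, about 6.93)
print(round(statistics.fmean(sample_medians), 2))
```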
Properties of a Statistic
Efficiency
- The efficiency of a statistic is reflected in the variance that is observed when one examines the statistic over independently chosen samples.
- The standard deviation of this sampling distribution is the statistic's standard error.
- The smaller the variance, the more efficient the statistic is said to be.
Resistance
- The resistance of an estimator refers to the degree to which that estimate is affected by extreme values, i.e., outliers.
- For a resistant estimator, small changes in the data result in only small changes in the estimate.
Finite-sample breakdown point
- A measure of resistance to contamination: the smallest proportion of observations that, when altered sufficiently, can render the statistic arbitrarily large or small.
- Median: about 1/2 (roughly n/2 of the n observations)
- Trimmed mean: approximately the proportion trimmed from each end
- Mean: 1/n (a single observation is enough)
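Both ideas on this slide can be checked by simulation. The sketch below assumes a standard normal population: it compares the empirical standard errors of the mean and the median over 2,000 samples of size 25 (the mean is the more efficient of the two for normal data). It then corrupts a single value in a small data set to show the mean's 1/n breakdown point against the median's resistance. All data are hypothetical.

```python
import random
import statistics

rng = random.Random(2)

# Efficiency: spread of the mean vs the median across 2000 samples of size 25
means, medians = [], []
for _ in range(2000):
    sample = [rng.gauss(0, 1) for _ in range(25)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))
se_mean = statistics.stdev(means)      # empirical standard error of the mean (about 1/5 = 0.2)
se_median = statistics.stdev(medians)  # larger: the median is less efficient for normal data
print(round(se_mean, 3), round(se_median, 3))

# Breakdown: corrupt a single observation out of 9
data = list(range(1, 10))              # 1..9, mean 5 and median 5
data[-1] = 1e9                         # one wildly wrong value
print(statistics.fmean(data))          # the mean is dragged to roughly 1.1e8
print(statistics.median(data))         # 5: the median does not move
```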
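Resistant measures of central tendency
Trimmed mean
- Created by trimming some percentage of the high and low ends of the data and averaging what remains
- The median is actually a trimmed estimate (the most extreme trimming, keeping only the middle one or two scores)
Winsorized mean
- Instead of being dropped, the extreme scores are replaced by the most extreme scores that remain
M-estimators
- Extreme values are given less weight than those closer to the center of the distribution.
- May be more robust than the mean or median for certain types of funky data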
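A minimal sketch of the three resistant estimators, using a hypothetical sample with one extreme value. The trimming rule (drop floor(prop * n) scores from each end) is one common convention. The M-estimator shown is Huber's, fitted by iteratively reweighted averaging with the scale fixed at MAD / 0.6745 and the conventional tuning constant k = 1.345. These are choices of this sketch, not the only options.

```python
import statistics

def trimmed_mean(values, prop):
    """Drop floor(prop * n) observations from each end, then average the rest."""
    if not 0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    xs = sorted(values)
    g = int(prop * len(xs))
    return statistics.fmean(xs[g:len(xs) - g])

def winsorized_mean(values, prop):
    """Replace the floor(prop * n) most extreme values at each end with the nearest kept value."""
    if not 0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    xs = sorted(values)
    n = len(xs)
    g = int(prop * n)
    if g > 0:
        xs[:g] = [xs[g]] * g
        xs[n - g:] = [xs[n - g - 1]] * g
    return statistics.fmean(xs)

def huber_location(values, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location with a fixed MAD-based scale."""
    mu = statistics.median(values)
    s = statistics.median([abs(x - mu) for x in values]) / 0.6745
    if s == 0:
        return mu
    for _ in range(max_iter):
        # Full weight within k*s of the current centre, shrinking weight k*s/|x - mu| beyond it
        weights = [1.0 if abs(x - mu) <= k * s else k * s / abs(x - mu) for x in values]
        new_mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 100]   # hypothetical sample with one extreme value
print(statistics.fmean(data))             # 14.3
print(statistics.median(data))            # 5.0
print(trimmed_mean(data, 0.1))            # 5.125
print(winsorized_mean(data, 0.1))         # 5.2
print(round(huber_location(data), 2))     # 5.22
```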
Practical Example
Administer the BDI to 10 randomly selected UNT students. Eight of the students scored 25 or less; two scored greater than 45.
8, 12, 6, 16, 10, 20, 22, 25, 47, 55
Median = 18
Mean = 22.1
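The two summary values can be reproduced with Python's `statistics` module, and a trimmed mean shows what happens when the two highest and two lowest scores are set aside. The 20% trimming proportion and the helper `trimmed_mean` are choices made for this illustration only.

```python
import statistics

bdi = [8, 12, 6, 16, 10, 20, 22, 25, 47, 55]

def trimmed_mean(values, prop):
    """Average after dropping floor(prop * n) scores from each end."""
    xs = sorted(values)
    g = int(prop * len(xs))
    return statistics.fmean(xs[g:len(xs) - g])

print(statistics.fmean(bdi))   # 22.1
print(statistics.median(bdi))  # 18.0
print(trimmed_mean(bdi, 0.2))  # 17.5 (drops 6, 8 and 47, 55)
```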
Which is more accurate regarding generalization to the typical UNT student? One that includes:
- Two people who perhaps reversed their ratings on the items?
- A score that was miskeyed (using the number pad, they hit a 4 instead of a 1, leading to a score of 47)?
- Two people who do not have English as their native language?
- Two people who did not answer honestly?
- Two people who are actually clinically depressed?
- One who is clinically depressed, and one who just wants to be different?
Practical Example
While many think of outliers as representing the complexity of human nature [1], the issue revolves more around inadequate data collection (not enough information to detect why the score is what it is) and problematic population description.
- E.g., my definition of the typical UNT student, if such a thing could be said to exist at all, is not one who is on suicide watch.
- However, the previous problem most likely represents an attempt to generalize to something that doesn't exist.
- Better populations to try to represent: UNT Texans, UNT psych grad students, UNT international students, UNT students who have visited C & T in the last semester (in which case those scores would probably not be outliers), etc.
Application to current events: Do you really think there is a "middle America," a "female vote," etc., to which the presidential candidates are trying to appeal? There are demographics, very specific ones yes, but those labels do little to capture the specifics.
Summary
Favoritism for the arithmetic mean is the result of familiarity only [1], and until you came to this course you would have been hard-pressed to explain your preference outside of arguments from authority.
- The AM is to be valued for some properties it has relative to other measures (sufficiency, efficiency, unbiasedness), and also rejected on the same grounds (it has the least resistance).
- In many cases it's entirely inappropriate to use the AM, as it would give a distorted view of central tendency.
- Which statistics you use to represent your data should be considered as carefully as the measures themselves.