1
Review of Measures of Central Tendency, Dispersion & Association
• Graphical Excellence
• Measures of Central Tendency
– Mean, Median, Mode
• Measures of Dispersion
– Variance, Standard Deviation, Range
• Measures of Association
– Covariance, Correlation Coefficient
• Relationship of basic stats to OLS
• Graphs and descriptive statistics, when used properly, can summarize lines of data effectively for the reader. What’s a good approximation of the age of students in this class?
• We use graphs and basic statistics (mean, variance, covariance, etc.) to highlight trends and to motivate the research question.
• We use other tools for analysis: regression, case study, content analysis, etc.
4
What story does this graph tell? What questions does the graph raise?
• The graph presents large data sets concisely and coherently – label your axes
• The ideas and concepts to be delivered are clearly understood by the viewer – state the units used (e.g., $ or $ in millions)
6
What’s the problem here?
[Figure: chart with data labels 30.50, 26.09, 43.51 and 77.67, a vertical axis running from 0.00 to 90.00, and the categories 1950 and 1991.]
7
Graphical Excellence
• The display induces the viewer to address the substance of the data and not the form of the graph.
– Select the appropriate type of graph (bar chart for levels, scatter plot for trends, etc.)
• There is no distortion of what the data reveal.
– Make sure the axes are not stretched or compressed to make a point
8
Do New Stadiums Bring People in?
[Figure: Average Annual Attendance (millions). MLB Average: 2.18; New Stadiums: 3.01; Old Stadiums: 2.04. Vertical axis from 0 to 3.5.]
9
Do New Stadiums Bring People in?
[Figure: Average Annual Attendance (millions). MLB Average: 2.18; New Stadiums: 3.01; Old Stadiums: 2.04. Vertical axis from 0 to 20.]
10
Things to be cautious about when observing a graph:
– Is there a missing scale on one axis?
– Do not be influenced by a graph’s caption.
– Are changes presented in absolute values only, or in percent form too?
11
Numerical Descriptive Measures
• Measures of Central Tendency
– Mean, Median, Mode
• Measures of Dispersion
– Variance, Standard Deviation
• Measures of Association
– Covariance, Correlation Coefficient
12
Arithmetic mean
– This is the most popular and useful measure of central location
Mean = (Sum of the measurements) / (Number of measurements)
Sample mean (n = sample size): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean (N = population size): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
13
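• Example 1
The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$
• Example 2
Suppose the telephone bills of example 2.1 represent a population of measurements. The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \dots + x_{200}}{200} = \frac{42.19 + 15.30 + \dots + 53.21}{200} = 43.59$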
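The mean in Example 1 can be checked in a few lines. This is a minimal sketch in Python using the standard-library `statistics` module; the variable names are illustrative and not part of the original slides. Example 2 is not reproduced because the 200 telephone bills are not listed in the transcript.

```python
from statistics import mean

# Sample of six measurements from Example 1
sample = [7, 3, 9, -2, 4, 6]

# Mean = sum of the measurements / number of measurements
by_hand = sum(sample) / len(sample)

# The library function applies the same definition
by_library = mean(sample)

print(by_hand, by_library)
```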
14
• Example 3
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 employees were recorded as follows:

NUMBER OF CHILDREN     0   1   2   3
NUMBER OF EMPLOYEES    3   4   7   2   (16 employees)

$\bar{x} = \frac{\sum_{i=1}^{16} x_i}{16} = \frac{x_1 + x_2 + \dots + x_{16}}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5$
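The frequency-table calculation in Example 3 amounts to a weighted mean. A minimal sketch, assuming the table is stored as a Python dictionary (the name `frequencies` is illustrative):

```python
# Frequency table from Example 3: number of children -> number of employees
frequencies = {0: 3, 1: 4, 2: 7, 3: 2}

# Total number of observations (employees)
n = sum(frequencies.values())

# Each value is counted as many times as it occurs
total = sum(value * count for value, count in frequencies.items())

weighted_mean = total / n
print(n, total, weighted_mean)
```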
15
The median
– The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4
Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: first, sort the salaries. Then, locate the value in the middle.
26, 26, 28, 29, 30, 32, 60 (the middle value is 29)
Suppose one employee’s salary of $31,000 was added to the group recorded before. Find the median salary.
Even number of observations: first, sort the salaries. Then, locate the values in the middle. There are two middle values!
26, 26, 28, 29, 30, 31, 32, 60 (the middle values are 29 and 30, so the median is 29.5)
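The sort-then-pick-the-middle procedure of Example 4 can be sketched as follows. The helper `middle_value` is a hypothetical name introduced for illustration; the standard-library `statistics.median` implements the same rule.

```python
from statistics import median

# Salaries in $1000s from Example 4
salaries = [28, 60, 26, 32, 30, 26, 29]

def middle_value(values):
    """Median by sorting, then taking the middle value or averaging the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value
        return ordered[mid]
    # Even number of observations: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = middle_value(salaries)           # seven observations
even_case = middle_value(salaries + [31])   # eight observations
print(odd_case, even_case)
```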
16
The mode
– The mode of a set of measurements is the value that occurs most frequently.
– A set of data may have one mode (or modal class), or two or more modes.
– The modal class: for large data sets, the modal class is much more relevant than a single-value mode.
17
– Example 5
• The manager of a men’s store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.
• The mode of this data set is 34 in.
This information seems valuable (for example, for the design of a new display in the store), much more than “the median is 33.5 in.”.
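The three measures of central location for the waist sizes in Example 5 can be compared directly. A minimal sketch using the standard-library `statistics` module (`multimode` requires Python 3.8 or later); the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Waist sizes (in inches) of trousers sold, from Example 5
waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

# multimode returns every value tied for the highest frequency
modes = multimode(waist_sizes)
mean_size = mean(waist_sizes)
median_size = median(waist_sizes)

print(modes, mean_size, median_size)
```

For these ten values the mode is 34, the mean is 33.2 and the median is 33.5; only the mode corresponds to a size that customers actually bought most often.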
18
Relationship among Mean, Median, and Mode
• If a distribution is symmetrical, the mean, median and mode coincide
• If a distribution is non symmetrical, and skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”), with the mean, median and mode marked.]
19
[Figure: a negatively skewed distribution (“skewed to the left”), with the mean, median and mode marked.]
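As a small numerical illustration of how the three measures separate in a skewed data set, the sketch below reuses the seven salaries from Example 4, where one large salary (60) stretches the right tail. Reusing that data here is a choice made for illustration, not something the slides do.

```python
from statistics import mean, median, mode

# Salaries in $1000s from Example 4; the value 60 creates a long right tail
salaries = [28, 60, 26, 32, 30, 26, 29]

mode_value = mode(salaries)
median_value = median(salaries)
mean_value = mean(salaries)

print(mode_value, median_value, mean_value)
```

In this data set the mode (26) is below the median (29), which is below the mean (33). That ordering is commonly seen with positive skew, but a single example is an illustration rather than a general rule.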
20
Measures of variability (Looking beyond the average)
• Measures of central location fail to tell the whole story about the distribution.
• A question of interest still remains unanswered:
How typical is the average value of all the measurements in the data set?
or
How spread out are the measurements about the average value?
21
Observe two hypothetical data sets
Low variability data set: the average value provides a good representation of the values in the data set.
High variability data set: the same average value does not provide as good a representation of the values in the data set as before.
22
The range
– The range of a set of measurements is the difference between the largest and smallest measurements.
– Its major advantage is the ease with which it can be computed.
– Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
[Figure: a line from the smallest measurement to the largest measurement, with the range marked between them.]
But how do all the measurements spread out? The range cannot assist in answering this question.
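The shortcoming of the range can be seen with two data sets that share the same end points. In this sketch the first list is the salary data from Example 4; the second list, `bunched`, is a hypothetical data set invented for illustration.

```python
# Salaries in $1000s from Example 4
salaries = [28, 60, 26, 32, 30, 26, 29]

# Hypothetical data with the same smallest and largest values,
# but with every observation sitting at one of the two extremes
bunched = [26, 26, 26, 60, 60, 60, 60]

def value_range(values):
    """Range = largest measurement - smallest measurement."""
    return max(values) - min(values)

range_salaries = value_range(salaries)
range_bunched = value_range(bunched)
print(range_salaries, range_bunched)
```

Both ranges are equal even though the values in between are distributed very differently, which is why a measure using all the measurements is needed.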
23
The variance
– This measure of dispersion reflects the values of all the measurements.
– The variance of a population of N measurements x1, x2, …, xN having a mean $\mu$ is defined as
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (Excel uses the VARP formula)
– The variance of a sample of n measurements x1, x2, …, xn having a mean $\bar{x}$ is defined as
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (Excel uses the VAR formula)
24
Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: (8 - 10) + (9 - 10) + (10 - 10) + (11 - 10) + (12 - 10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4 - 10) + (7 - 10) + (10 - 10) + (13 - 10) + (16 - 10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.
25
For populations A and B, the sum of squared deviations is used in calculating the variance. See the example next.
26
Let us calculate the variance of the two populations
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
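The calculations for populations A and B can be reproduced with a short sketch. The helper functions are hypothetical names written out to mirror the population variance definition; the standard-library `statistics.pvariance` is used as a cross-check.

```python
from statistics import pvariance

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

def sum_of_deviations(values):
    """Sum of (x - mean); this is zero for any data set, so it cannot measure spread."""
    mu = sum(values) / len(values)
    return sum(x - mu for x in values)

def population_variance(values):
    """Average squared deviation from the population mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

dev_a = sum_of_deviations(population_a)
dev_b = sum_of_deviations(population_b)
var_a = population_variance(population_a)
var_b = population_variance(population_b)
print(dev_a, dev_b, var_a, var_b)
```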
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead?
After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!
27
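Which data set has a larger dispersion?
Data set A: the values 1 and 3, each occurring 5 times (mean 2)
Data set B: the values 1 and 5, each occurring once (mean 3)
Data set B is more dispersed around the mean.
Let us calculate the sum of squared deviations for both data sets:
$Sum_A = \underbrace{(1-2)^2 + \dots + (1-2)^2}_{5 \text{ times}} + \underbrace{(3-2)^2 + \dots + (3-2)^2}_{5 \text{ times}} = 10$
$Sum_B = (1-3)^2 + (5-3)^2 = 8$
However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked:
$\sigma_A^2 = Sum_A / N = 10/10 = 1$
$\sigma_B^2 = Sum_B / N = 8/2 = 4$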
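The contrast between the raw sum of squared deviations and the per-observation variance can be sketched as follows. Data set A has N = 10 observations (five 1s and five 3s), so its sum of squared deviations is divided by 10; data set B has N = 2. The function name is illustrative.

```python
# Data set A: five 1s and five 3s (mean 2); data set B: 1 and 5 (mean 3)
data_a = [1] * 5 + [3] * 5
data_b = [1, 5]

def sum_squared_deviations(values):
    """Sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)

sum_a = sum_squared_deviations(data_a)
sum_b = sum_squared_deviations(data_b)

# Dividing by the number of observations removes the effect of data set size
var_a = sum_a / len(data_a)
var_b = sum_b / len(data_b)
print(sum_a, sum_b, var_a, var_b)
```

The sum is larger for A only because A has more observations; the variance ranks B as the more dispersed data set.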
28
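– Example 6
• Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
– Solution
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$
A shortcut formula:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$
$s^2 = \frac{1}{5} \left[ (3.4^2 + 2.5^2 + \dots + 3.7^2) - \frac{(17.7)^2}{6} \right] = 1.075 \text{ (years)}^2$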
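The definitional and shortcut forms of the sample variance in Example 6 should agree. A minimal sketch that computes both and compares them with the standard-library `statistics.variance`; the variable names are illustrative.

```python
from statistics import variance

# Sample of measurements (in years) from Example 6
sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]
n = len(sample)

x_bar = sum(sample) / n

# Definition: sum of squared deviations divided by n - 1
definitional = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Shortcut: (sum of squares - (sum)^2 / n) divided by n - 1
shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

print(round(x_bar, 2), round(definitional, 3), round(shortcut, 3))
```

Note that the bracketed difference alone is 5.375; the division by n - 1 = 5 is what gives 1.075.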
29
– The standard deviation of a set of measurements is the square root of the variance of the measurements.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
– Example 4.9
• Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4
30
– Solution
– Let us use the Excel printout:

                      Fund A    Fund B
Mean                  16        12
Standard Error        5.295     3.152
Median                14.6      11.75
Mode                  #N/A      #N/A
Standard Deviation    16.74     9.969
Sample Variance       280.3     99.37
Kurtosis              -1.34     -0.46
Skewness              0.217     0.107
Range                 49.1      30.6
Minimum               -6.2      -2.8
Maximum               42.9      27.8
Sum                   160       120
Count                 10        10

Fund A should be considered riskier because its standard deviation is larger.
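The key figures in the printout can be approximated with the standard-library `statistics` module. This sketch assumes the last Fund A return is 30.5, the value consistent with the printed sum of 160 and sample variance of 280.3; the variable names are illustrative.

```python
from statistics import mean, stdev

# Rates of return for the two funds; the final Fund A value is assumed to be 30.5
fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

mean_a, mean_b = mean(fund_a), mean(fund_b)

# stdev uses the sample formula (divides by n - 1), like the printout
sd_a, sd_b = stdev(fund_a), stdev(fund_b)

# The fund with the larger standard deviation is treated as riskier
riskier = "Fund A" if sd_a > sd_b else "Fund B"
print(round(mean_a, 2), round(sd_a, 2), round(mean_b, 2), round(sd_b, 3), riskier)
```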
31
The coefficient of variation
– The coefficient of variation of a set of measurements is the standard deviation divided by the mean value.
Sample coefficient of variation: $cv = \frac{s}{\bar{x}}$
Population coefficient of variation: $CV = \frac{\sigma}{\mu}$
– This coefficient provides a proportionate measure of variation.
A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
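The comparison of a standard deviation of 10 against means of 100 and 500 can be put in numbers. A minimal sketch; the function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean_value

# Same standard deviation, two different mean values
cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)
print(cv_mean_100, cv_mean_500)
```

The same spread is 10% of the mean in the first case and 2% in the second, which is the sense in which the coefficient is a proportionate measure.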
32
Interpreting Standard Deviation
• The standard deviation can be used to
– compare the variability of several distributions
– make a statement about the general shape of a distribution.
33
Measures of Association
• Two numerical measures are presented for the description of the linear relationship between two variables depicted in the scatter diagram.
– Covariance: is there any pattern to the way two variables move together?
– Correlation coefficient: how strong is the linear relationship between two variables?
34
The covariance
Population covariance: $COV(X,Y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size.
Excel uses the population formula to calculate Cov.
Sample covariance: $cov(X,Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
n is the sample size.
NOTE: The formula in Excel does not give you the sample covariance.
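The difference between the two covariance formulas is only the divisor. The sketch below uses a small hypothetical data set (`xs`, `ys`) invented for illustration and a hypothetical helper `covariance` with a flag for the divisor.

```python
# Hypothetical paired observations, used only to illustrate the two divisors
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

def covariance(x_values, y_values, sample=True):
    """Covariance with divisor n - 1 (sample) or N (population)."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    return cross / (n - 1) if sample else cross / n

pop_cov = covariance(xs, ys, sample=False)
sample_cov = covariance(xs, ys, sample=True)

# A population-style result can be rescaled to the sample value
rescaled = pop_cov * len(xs) / (len(xs) - 1)
print(pop_cov, sample_cov, rescaled)
```

Multiplying a population-style covariance by N/(N - 1) gives the sample covariance, which is one way to adjust a result computed with the population formula.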
35
• If the two variables move in opposite directions (one increases when the other one decreases), the covariance is a large negative number.
• If the two variables are unrelated, the covariance will be close to zero.
• If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number.
36
The coefficient of correlation
– This coefficient answers the question: How strong is the association between X and Y?
Population coefficient of correlation: $\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$
Sample coefficient of correlation: $r = \frac{cov(X,Y)}{s_x s_y}$
37
$\rho$ or r = +1: strong positive linear relationship (COV(X,Y) > 0)
$\rho$ or r = 0: no linear relationship (COV(X,Y) = 0)
$\rho$ or r = -1: strong negative linear relationship (COV(X,Y) < 0)
38
• If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).
• If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).
• No straight line relationship is indicated by a coefficient close to zero.
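The three cases can be illustrated with constructed data. All three y-series below are hypothetical and chosen so that the outcome is exact; the helper `correlation` uses the fact that the n - 1 divisors in the covariance and the two standard deviations cancel.

```python
from math import sqrt

def correlation(x_values, y_values):
    """Sample coefficient of correlation r = cov(X, Y) / (s_x * s_y)."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]      # points on an upward-sloping line
falling = [10 - 3 * x for x in xs]    # points on a downward-sloping line
curved = [(x - 3) ** 2 for x in xs]   # symmetric curve with no linear trend

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_curved = correlation(xs, curved)
print(r_rising, r_falling, r_curved)
```

The curved series gives r = 0 even though y is fully determined by x, which is why a coefficient near zero indicates only that there is no straight-line relationship.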
39
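– Example 7
• Compute the covariance and the coefficient of correlation to measure how advertising expenditure and sales level are related to one another.
• Base your calculation on the data provided in example 2.3

Advert   Sales
1        30
3        40
5        40
4        50
2        35
5        50
3        35
2        25

Shortcut formulas
$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}$
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}$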
40
• Use the procedure below to obtain the required summations
• Interpretation
– The covariance (10.2679) indicates that advertisement expenditure and sales level are positively related.
– The coefficient of correlation (.797) indicates that there is a strong positive linear relationship between advertisement expenditure and sales level.
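The figures quoted in the interpretation can be reproduced from the summations in the shortcut formulas. This sketch assumes the flattened data of Example 7 pair up as eight (advert, sales) observations in the order listed; the variable names are illustrative.

```python
from math import sqrt

# Assumed pairing of the Example 7 data: advertising expenditure and sales level
advert = [1, 3, 5, 4, 2, 5, 3, 2]
sales = [30, 40, 40, 50, 35, 50, 35, 25]
n = len(advert)

# Required summations
sum_x = sum(advert)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advert, sales))
sum_x2 = sum(x * x for x in advert)
sum_y2 = sum(y * y for y in sales)

# Shortcut formulas for the sums of cross-products and squared deviations
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

# Sample covariance, sample standard deviations, and coefficient of correlation
cov_xy = sxy / (n - 1)
s_x = sqrt(sxx / (n - 1))
s_y = sqrt(syy / (n - 1))
r = cov_xy / (s_x * s_y)

print(round(cov_xy, 4), round(r, 3))
```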