Chapter 3, Numerical Descriptive Measures

Mar 31, 2015

Kaelyn Gallup
Page 1:

Chapter 3, Numerical Descriptive Measures

• Data analysis is objective
– Should report the summary measures that best meet the assumptions about the data set

• Data interpretation is subjective
– Should be done in a fair, neutral, and clear manner

Page 2:

Summary Measures

Describing Data Numerically

• Central Tendency
– Arithmetic Mean
– Median
– Mode
– Geometric Mean

• Quartiles

• Variation
– Range
– Interquartile Range
– Variance
– Standard Deviation
– Coefficient of Variation

• Shape
– Skewness

Page 3:

Arithmetic Mean

• The arithmetic mean (mean) is the most common measure of central tendency

• Mean = sum of values divided by the number of values

• Affected by extreme values (outliers)

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $n$ is the sample size and $X_i$ are the observed values
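
As a rough illustration of the definition and of the sensitivity to outliers, the Python sketch below computes the mean of a small made-up sample and then of the same sample with one extreme value. The data values and variable names are assumptions chosen for illustration, not taken from the slides.

```python
# Hypothetical sample values, chosen only for illustration.
sample = [1, 2, 3, 4, 5]
sample_with_outlier = [1, 2, 3, 4, 15]   # largest value replaced by an outlier

# Mean = sum of values divided by the number of values.
manual_mean = sum(sample) / len(sample)
mean_with_outlier = sum(sample_with_outlier) / len(sample_with_outlier)

print(manual_mean)        # 3.0
print(mean_with_outlier)  # 5.0
```

Changing a single observation from 5 to 15 moves the mean from 3.0 to 5.0, which is what "affected by extreme values" means in practice.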

Page 4:

Geometric Mean

• Geometric mean
– Used to measure the rate of change of a variable over time

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$

• Geometric mean rate of return
– Measures the status of an investment over time

$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$

– Where $R_i$ is the rate of return in time period $i$
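
The following Python sketch applies the geometric mean rate of return to a made-up two-period investment and compares it with the arithmetic mean of the same rates. The return values are hypothetical; the example only assumes that an investment loses 50% in the first period and gains 100% in the second, so it ends where it started.

```python
import math

# Hypothetical rates of return: -50% in period 1, +100% in period 2.
returns = [-0.5, 1.0]

# Geometric mean rate of return: [(1+R1)(1+R2)...(1+Rn)]^(1/n) - 1
growth = math.prod(1 + r for r in returns)
geometric_rate = growth ** (1 / len(returns)) - 1

# Arithmetic mean of the same rates, for comparison.
arithmetic_rate = sum(returns) / len(returns)

print(geometric_rate)   # 0.0
print(arithmetic_rate)  # 0.25
```

The geometric rate of 0.0 reflects that the investment's value is unchanged overall, while the arithmetic mean of 0.25 overstates the result.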

Page 5:

Median: Position and Value

• In an ordered array, the median is the “middle” number (50% above, 50% below)

• The location (position) of the median:

$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$

• The value of the median is NOT affected by extreme values
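
A minimal Python sketch of the position rule follows. The helper name `median_by_position` and the sample values are assumptions for illustration. When the position is fractional (even n), the sketch averages the two neighbouring values, which is the usual convention; it does not handle an empty list.

```python
from statistics import median

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position in the ordered array
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower                    # odd n: a single middle value
    upper = ordered[int(position)]
    return (lower + upper) / 2          # even n: average the two middle values

print(median_by_position([1, 2, 3, 4, 5]))    # 3
print(median_by_position([1, 2, 3, 4, 150]))  # 3 (unchanged by the extreme value)
print(median_by_position([1, 2, 3, 4]))       # 2.5
```

The results agree with `statistics.median` from the Python standard library on these inputs.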

Page 6:

Mode

• A measure of central tendency

• Value that occurs most often

• Not affected by extreme values

• Used for either numerical or categorical data

• There may be no mode

• There may be several modes
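
The properties listed above can be sketched in Python as follows. The helper `modes` is hypothetical, and it assumes one particular convention: when every value occurs exactly once, the data are treated as having no mode. Other conventions exist, so this is an illustration rather than a definitive rule.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []   # assumption: all values occurring once means "no mode"
    return [value for value, count in counts.items() if count == highest]

print(modes([0, 1, 2, 3, 3, 3, 4]))     # [3]
print(modes([1, 2, 3]))                 # [] (no mode)
print(modes([1, 1, 2, 2, 3]))           # [1, 2] (several modes)
print(modes(["red", "blue", "red"]))    # ['red'] (categorical data)
```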

Page 7:

Quartiles

• Quartiles split the ranked data into 4 segments with an equal number of values per segment

• Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = (n+1)/4

Second quartile position: Q2 = 2(n+1)/4 (the median position)

Third quartile position: Q3 = 3(n+1)/4

where n is the number of observed values
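
The position formulas can be tried out with the Python sketch below. The slides do not say what to do when a position is fractional, so the sketch assumes linear interpolation between the two neighbouring ranked values; textbooks sometimes use rounding rules instead, which can give slightly different quartiles. The helper `value_at_position` and the sample data are hypothetical.

```python
from statistics import quantiles

def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower_index = int(position) - 1
    fraction = position - int(position)
    if fraction == 0:
        return ordered[lower_index]
    step = ordered[lower_index + 1] - ordered[lower_index]
    return ordered[lower_index] + fraction * step

data = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])   # hypothetical sample, n = 9
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)        # position 2.5
q2 = value_at_position(data, 2 * (n + 1) / 4)    # position 5 (the median)
q3 = value_at_position(data, 3 * (n + 1) / 4)    # position 7.5

print(q1, q2, q3)   # 12.5 16 19.5
```

For this sample the results match `statistics.quantiles(data, n=4)`, whose default method is also based on (n + 1) positions with interpolation.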

Page 8:

Measures of Variation

• Variation
– Range
– Interquartile Range
– Variance
– Standard Deviation
– Coefficient of Variation

• Measures of variation give information on the spread or variability of the data values.

[Figure: distributions with the same center but different variation]

Page 9:

Range and Interquartile Range

• Range
– Simplest measure of variation
– Difference between the largest and the smallest observations:

Range = Xlargest – Xsmallest

– Ignores the way in which data are distributed
– Sensitive to outliers

• Interquartile Range
– Eliminate some high- and low-valued observations and calculate the range from the remaining values
– Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
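
To make the contrast between the two measures concrete, the Python sketch below computes both for a made-up sample and for the same sample with its largest value replaced by an outlier. The data and helper names are assumptions for illustration, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from a textbook rounding rule.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # hypothetical sample
data_with_outlier = data[:-1] + [220]          # largest value replaced by an outlier

def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR = Q3 - Q1, using the standard library's default quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

print(value_range(data), value_range(data_with_outlier))                    # 11 209
print(interquartile_range(data), interquartile_range(data_with_outlier))   # 7.0 7.0
```

The range jumps from 11 to 209 because of one observation, while the interquartile range stays at 7.0.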

Page 10:

Variance

• Average (approximately) of squared deviations of values from the mean

– Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Where $\bar{X}$ = arithmetic mean

$n$ = sample size

$X_i$ = $i$th value of the variable $X$
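
The sample variance formula can be followed step by step in Python. The sample values below are made up for illustration, and the result is checked against `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

sample = [1, 2, 3, 4, 5]   # hypothetical sample
n = len(sample)

x_bar = sum(sample) / n                                   # arithmetic mean
squared_deviations = [(x - x_bar) ** 2 for x in sample]   # (X_i - X-bar)^2
sample_variance = sum(squared_deviations) / (n - 1)       # divide by n - 1

print(sample_variance)     # 2.5
print(variance(sample))    # 2.5
```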

Page 11:

Standard Deviation

• Most commonly used measure of variation

• Shows variation about the mean

• Has the same units as the original data

• It is a measure of the “average” spread around the mean

• Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
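
A short Python sketch of the sample standard deviation follows, using a hypothetical sample. It takes the square root of the sample variance and compares the result with `statistics.stdev`.

```python
import math
from statistics import stdev

sample = [1, 2, 3, 4, 5]   # hypothetical sample, in the data's original units
n = len(sample)
x_bar = sum(sample) / n

# Square root of the sum of squared deviations divided by n - 1.
sample_std = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

print(round(sample_std, 4))                      # 1.5811
print(math.isclose(sample_std, stdev(sample)))   # True
```

Because of the square root, the result is expressed in the same units as the data, unlike the variance, which is in squared units.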

Page 12:

Coefficient of Variation

• Measures relative variation

• Always expressed as a percentage (%)

• Shows variation relative to mean

• Can be used to compare two or more sets of data measured in different units

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$
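
The unit-free nature of the coefficient of variation can be checked with a small Python sketch. The height values are invented for illustration, and the helper `coefficient_of_variation` is hypothetical; it assumes a nonzero mean, since the ratio is undefined otherwise.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (S / X-bar) * 100, expressed as a percentage."""
    return stdev(values) / mean(values) * 100

# The same hypothetical heights recorded in two different units.
heights_cm = [160, 170, 180]
heights_m = [1.6, 1.7, 1.8]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(cv_cm, 2))            # 5.88
print(math.isclose(cv_cm, cv_m))  # True
```

The standard deviations differ by a factor of 100 between the two unit systems, but the coefficient of variation is the same, which is why it suits comparisons across data sets with different units.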

Page 13:

Shape of a Distribution

• Describes how data are distributed

• Measures of shape
– Symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
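
The mean-versus-median comparison can be turned into a rough classifier, sketched below in Python. The function `describe_shape` and its `tolerance` parameter are hypothetical, and the comparison is only a heuristic: equal mean and median do not by themselves guarantee a symmetric distribution.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.0):
    """Rough shape label from comparing mean and median (a heuristic only)."""
    difference = mean(values) - median(values)
    if abs(difference) <= tolerance:
        return "symmetric"
    return "right-skewed" if difference > 0 else "left-skewed"

print(describe_shape([1, 2, 3, 4, 5]))       # symmetric    (mean 3 = median 3)
print(describe_shape([1, 2, 3, 4, 20]))      # right-skewed (median 3 < mean 6)
print(describe_shape([1, 17, 18, 19, 20]))   # left-skewed  (mean 15 < median 18)
```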

Page 14:

Using the Five-Number Summary to Explore the Shape

• Box-and-Whisker Plot: A graphical display of data using the 5-number summary:

Minimum, Q1, Median, Q3, Maximum

• The box and central line are centered between the endpoints if data are symmetric around the median

[Figure: box-and-whisker plot marking Min, Q1, Median, Q3, Max]
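
The five numbers behind a box-and-whisker plot can be computed with a few lines of Python. The helper `five_number_summary` and the sample are assumptions for illustration, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from a textbook rounding rule.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)

summary = five_number_summary([11, 12, 13, 16, 16, 17, 18, 21, 22])
print(summary)   # (11, 12.5, 16.0, 19.5, 22)
```

Here the median (16.0) sits exactly midway between Q1 and Q3, and the distances from the median to the minimum (5) and to the maximum (6) are close, which suggests a roughly symmetric sample.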

Page 15:

Distribution Shape and Box-and-Whisker Plot

[Figure: box-and-whisker plots for left-skewed, symmetric, and right-skewed distributions, each marking Q1, Q2, Q3]

Page 16:

Relationship between Std. Dev. and Shape: The Empirical Rule

• If the data distribution is bell-shaped, then the interval:
– $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
– $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
– $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
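
The three percentages can be checked against an exactly normal distribution, which is one idealized case of "bell-shaped". The Python sketch below uses `statistics.NormalDist` from the standard library; the variable names are assumptions for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} sigma: {share:.1%}")
# within 1 sigma: 68.3%
# within 2 sigma: 95.4%
# within 3 sigma: 99.7%
```

Real data that are only approximately bell-shaped will show shares near, but not exactly at, these values.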

Page 17:

Population Mean and Variance

Population Mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Population variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
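
The difference between the population and sample formulas is the divisor: N for a population, n - 1 for a sample. The Python sketch below shows this on a made-up data set treated first as a whole population; the values are hypothetical.

```python
from statistics import pvariance, variance

population = [1, 2, 3, 4, 5]   # hypothetical data treated as a full population
N = len(population)

mu = sum(population) / N                                   # population mean
sigma_squared = sum((x - mu) ** 2 for x in population) / N  # divide by N

print(mu, sigma_squared)       # 3.0 2.0

# Standard library equivalents: pvariance divides by N, variance by n - 1.
print(pvariance(population) == sigma_squared)   # True
print(variance(population))                     # 2.5
```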

Page 18:

Covariance and Coefficient of Correlation

• The sample covariance measures the strength of the linear relationship between two variables (called bivariate data)

• The sample covariance:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

• Only concerned with the strength of the relationship

• No causal effect is implied
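
A minimal Python sketch of the sample covariance follows. The function name `sample_covariance` and the paired data are assumptions for illustration; the function expects two sequences of equal length with at least two observations.

```python
def sample_covariance(x, y):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

# Hypothetical bivariate data.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]     # tends to increase with x
y_falling = [5, 4, 5, 4, 2]    # tends to decrease with x

print(sample_covariance(x, y_rising))    # 1.5
print(sample_covariance(x, y_falling))   # -1.5
```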

Page 19:

• Covariance between two random variables:
– cov(X,Y) > 0: X and Y tend to move in the same direction
– cov(X,Y) < 0: X and Y tend to move in opposite directions
– cov(X,Y) = 0: X and Y have no linear relationship (they are uncorrelated, which does not by itself mean they are independent)

• Covariance does not say anything about the relative strength of the relationship.

• Coefficient of Correlation measures the relative strength of the linear relationship between two variables

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$
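
The Python sketch below illustrates why covariance alone says little about relative strength while the coefficient of correlation does. It rescales one variable (as if changing its units) and compares both measures before and after. The function names and data are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Coefficient of correlation from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                  # hypothetical data
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]    # same quantity in a unit 100 times smaller

print(sample_covariance(x, y), sample_covariance(x_rescaled, y))   # 1.5 150.0
print(round(pearson_r(x, y), 4), round(pearson_r(x_rescaled, y), 4))   # 0.7746 0.7746
```

The covariance grows by a factor of 100 purely because of the unit change, whereas r is unaffected.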

Page 20:

• Coefficient of Correlation:
– Is unit free
– Ranges between –1 (perfect negative) and 1 (perfect positive)
– The closer to –1, the stronger the negative linear relationship
– The closer to 1, the stronger the positive linear relationship
– The closer to 0, the weaker the linear relationship
– At 0 there is no linear relationship (a nonlinear relationship may still exist)
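
The endpoints of the scale can be illustrated with three made-up data sets in Python: an exactly increasing line, an exactly decreasing line, and a curve whose coefficient of correlation is 0 even though the two variables are clearly related. The helper `pearson_r` and all data are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Coefficient of correlation from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_line_up = [2 * v + 1 for v in x]    # perfect positive linear relationship
y_line_down = [-3 * v for v in x]     # perfect negative linear relationship
y_curve = [v ** 2 for v in x]         # exact but nonlinear relationship

print(pearson_r(x, y_line_up))     # 1.0
print(pearson_r(x, y_line_down))   # -1.0
print(pearson_r(x, y_curve))       # 0.0
```

In the last case y is completely determined by x, yet r is 0, so a value of 0 rules out only a linear relationship.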

Page 21:

Correlation vs. Regression

• A scatter plot (or scatter diagram) can be used to show the relationship between two variables

• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
– Correlation is only concerned with the strength of the relationship
– No causal effect is implied with correlation