Descriptive Statistics: Numerical Measures Measures of Shape of a Distribution, Relative Location and Outliers Measures of Association between Two Variables Weighted Mean and Grouped Data

Dec 19, 2015

Page 1: Descriptive Statistics: Numerical Measures  Measures of Shape of a Distribution, Relative Location and Outliers  Measures of Association between Two.

Descriptive Statistics: Numerical Measures

• Measures of Shape of a Distribution, Relative Location and Outliers
• Measures of Association between Two Variables
• Weighted Mean and Grouped Data

Page 2

Shape of a Distribution Depends on the Relative Location and Outliers

• Shape of a Distribution
• z-Scores (Standardized Values)
• Chebyshev's Theorem
• Empirical Rule
• Detecting Outliers

Page 3

Distribution Shape: Skewness

An important measure of the shape of a distribution is called skewness.

The formula for computing the skewness of a data set is somewhat complex.

\[ S = \frac{E[(X - \mu)^3]}{\sigma_x^3} \]

Page 4

Distribution Shape: Skewness

Skewness (S) is a measure of the asymmetry of a probability distribution.

• S = 0: the distribution is symmetrical.
• S > 0: the distribution is right (positively) skewed.
• S < 0: the distribution is left (negatively) skewed.

\[ S = \frac{E[(X - \mu)^3]}{\sigma_x^3} \]
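As a rough illustration of the skewness formula, the Python sketch below divides the average cubed deviation by the cubed standard deviation. It assumes the population (moment) form of the formula; the function name `skewness` and the three small data sets are made up for illustration. Statistical software often applies a small-sample adjustment, so its results may differ slightly.

```python
def skewness(values):
    """Moment-based skewness E[(X - mu)^3] / sigma^3 (population form, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments, each averaged over n.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data sets chosen only to show the sign of S.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
left_skewed = [-v for v in right_skewed]

print(skewness(symmetric))     # zero for symmetric data
print(skewness(right_skewed))  # positive: long right tail
print(skewness(left_skewed))   # negative: long left tail
```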

Page 5

Distribution Shape: Skewness

Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.

[Figure: relative frequency histogram, vertical axis from 0 to .35; Skewness = 0]

Page 6

Distribution Shape: Skewness

Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.

[Figure: relative frequency histogram, vertical axis from 0 to .35; Skewness = -.31]

Page 7

Distribution Shape: Skewness

Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.

[Figure: relative frequency histogram, vertical axis from 0 to .35; Skewness = 1.25]

Page 8

z-Score (Standardized Value)

The z-score is a measure of the relative location of an observation in a data set.

\[ z_i = \frac{x_i - \bar{x}}{s} \]

The z-score denotes the number of standard deviations a given data value x_i lies from the mean.

As a result, the z-score is also called a standardized value.

Page 9

z-Score (Standardized Value)

\[ z_i = \frac{x_i - \bar{x}}{s} \]

A data value less than the sample mean will always have a z-score less than zero.

A data value greater than the sample mean will always have a z-score greater than zero.

A data value equal to the sample mean will always have a z-score of zero.
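A minimal sketch of the z-score calculation in Python, assuming the sample standard deviation (n - 1 denominator) is intended. The function name `z_scores` and the five data values are hypothetical; the values were picked so that the mean is 44 and s is 8.

```python
import statistics


def z_scores(values):
    """Return z_i = (x_i - mean) / s for each value, using the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mean) / s for x in values]


# Hypothetical sample with mean 44 and s = 8.
sample = [46, 54, 42, 46, 32]
for x, z in zip(sample, z_scores(sample)):
    # Values below the mean get negative z-scores, values above it positive ones.
    print(x, round(z, 2))
```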

Page 10

Chebyshev's Theorem

A theorem that shows where at least a certain proportion of the observations in any data set must lie relative to the mean, once the values are standardized (expressed as z-scores).

Page 11

The theorem states that, for any data set:

• At least 75% of the data values must be within z = 2 standard deviations of the mean.
• At least 89% of the data values must be within z = 3 standard deviations of the mean.
• At least 94% of the data values must be within z = 4 standard deviations of the mean.
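The three percentages follow from the general form of Chebyshev's theorem: at least 1 - 1/z^2 of the values lie within z standard deviations of the mean, for any z greater than 1. The slides list only the three cases, so the short Python sketch below (with a hypothetical function name) simply evaluates that bound for z = 2, 3, and 4.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's bound is only informative for z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    # Prints the bound as a whole-number percentage.
    print(z, f"{chebyshev_lower_bound(z):.0%}")
```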

Page 12

Empirical Rule

For data with a bell-shaped distribution:

• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
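These percentages can be checked against the normal distribution directly. The sketch below uses the identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal variable; the function name `normal_within` is hypothetical. The computed values differ from the slide figures in the last digit, presumably because the slide figures come from a rounded normal table.

```python
import math


def normal_within(k):
    """P(|Z| <= k) for a standard normal variable, via the error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, f"{normal_within(k):.2%}")
```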

Page 13

Empirical Rule

[Figure: bell-shaped curve over x, with the axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, and μ + 3σ; the three bands are labeled 68.26%, 95.44%, and 99.72%]

Page 14

Z-Scores Allow Us to Detect Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

It might be the result of:
• an incorrect recording
• an incorrectly included value in the data set
• a correctly recorded data value but an unusual occurrence
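The z-score rule for outliers can be sketched in a few lines of Python. The function name `find_outliers`, the default threshold argument, and the 21 data values are all hypothetical; the cutoff of 3 is the one stated on the slide.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / s) > threshold]


# Hypothetical data: twenty values near 50 plus one unusually large value.
data = [48, 49, 50, 51, 52] * 4 + [95]
print(find_outliers(data))  # [95]
```

A flagged value is only a candidate: it still has to be checked against the possible causes listed above before it is corrected or removed. In very small samples no value can reach a z-score of 3 in absolute value, so the rule is more useful for larger data sets.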

Page 15

Exploratory Data Analysis

Five-Number Summary and Box Plot

Page 16

Five-Number Summary

1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value

Page 17

Box Plot

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on an axis running from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
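The quartiles on the box plot can be reproduced from the 70 monthly rents listed later in this transcript. The slides do not say which quartile rule they use, so the Python sketch below assumes one common textbook convention: compute the position i = (p/100)n, round up if i is not an integer, and average positions i and i + 1 if it is. The function names are hypothetical.

```python
def percentile(sorted_values, p):
    """p-th percentile (integer p, 0 < p < 100) using the position rule i = (p/100) * n."""
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1 (1-based).
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # i is not an integer: round up to the next position.
    return sorted_values[whole]


def five_number_summary(values):
    """Smallest value, first quartile, median, third quartile, largest value."""
    data = sorted(values)
    return (data[0], percentile(data, 25), percentile(data, 50),
            percentile(data, 75), data[-1])


# The 70 monthly rents from the grouped-data slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(five_number_summary(rents))  # (425, 445, 475.0, 525, 615)
```

Other quartile conventions (for example, interpolating between neighbouring values) can give slightly different Q1 and Q3 for the same data.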

Page 18

Measures of Association between Two Variables

• Covariance
• Correlation Coefficient

Page 19

Covariance

Page 20

Covariance

Covariance between two random variables (X and Y) is computed as follows:

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} \]

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations} \]

Page 21

Covariance

The covariance is a measure of the direction of movement and linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.
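A minimal Python sketch of the sample covariance formula (n - 1 denominator). The function name `sample_covariance` and the ten (x, y) pairs are made up for illustration; the pairs were chosen so that y tends to rise with x.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired observations.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

print(sample_covariance(x, y))  # 11.0, positive: the variables move together
```

The size of the covariance depends on the units of x and y, so only its sign is easy to interpret on its own.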

Page 22

Correlation Coefficient

Page 23

Correlation Coefficient

Correlation is a measure of the degree of linear association between two variables.

However, it does not indicate causation. That is, just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Page 24

Correlation Coefficient

The correlation coefficient is computed as follows:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \]

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} \]

Page 25

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

Page 26

Correlation Coefficient

The correlation coefficient is computed as follows:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \]
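The sample formula can be sketched in Python by dividing the sample covariance by the product of the two sample standard deviations. The function name `sample_correlation` and the paired data are hypothetical.

```python
import statistics


def sample_correlation(xs, ys):
    """Sample correlation r_xy = s_xy / (s_x * s_y)."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sample covariance with the n - 1 denominator.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical paired observations.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

print(round(sample_correlation(x, y), 2))  # 0.93, a strong positive linear relationship
```

Unlike the covariance, the result is unit-free and always falls between -1 and +1.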

Page 27

• Weighted Mean
• Mean, Variance, and Standard Deviation of Grouped Data

Page 28

You are taking five courses. The following table shows the credit hours associated with each course and your grades. Compute your GPA for the semester.

Course       Credit Hours   Grade
Calculus          4           B
Psychology        3           A
Marketing         3           C
Economics         3           D
Stat              2           A

Page 29

Weighted Mean

\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]

where: x_i = value of observation i
       w_i = weight for observation i

Page 30

You are taking five courses. The following table shows the credit hours associated with each course and your grades. Compute your GPA for the semester.

Course       Credit Hours (w_i)   Grade (x_i)   w_i x_i
Calculus             4               B (3)         12
Psychology           3               A (4)         12
Marketing            3               C (2)          6
Economics            3               D (1)          3
Stat                 2               A (4)          8
Total               15                             41

Page 31

Weighted Mean

\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]

where: x_i = value of observation i
       w_i = weight for observation i
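Applied to the GPA example, the weights are the credit hours and the values are the grade points shown in the table (A = 4, B = 3, C = 2, D = 1). The Python sketch below, with a hypothetical `weighted_mean` helper, works through the arithmetic: the listed credit hours sum to 15 and the weighted points to 41, so the GPA is 41/15.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Grade points as used in the worked table.
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}

# (course, credit hours, letter grade) from the GPA example.
courses = [
    ("Calculus", 4, "B"),
    ("Psychology", 3, "A"),
    ("Marketing", 3, "C"),
    ("Economics", 3, "D"),
    ("Stat", 2, "A"),
]

hours = [h for _, h, _ in courses]
points = [grade_points[g] for _, _, g in courses]

gpa = weighted_mean(points, hours)
print(sum(hours), round(gpa, 2))  # 15 2.73
```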

Page 32

Weighted Mean

When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Page 33

Working with Grouped Data

Page 34

Given below is a sample of monthly rents for 70 efficiency apartments.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Page 35

Grouped Data

Rent ($)   Frequency
420-439        8
440-459       17
460-479       12
480-499        8
500-519        7
520-539        4
540-559        2
560-579        4
580-599        2
600-619        6

Page 36

Computing the Mean and Variance of Grouped Data

To compute the mean from grouped data, we treat the midpoint of each class as though it were the mean of all items in the class.

We compute a weighted mean of the data using class midpoints as the values and class frequencies as the weights.

Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Page 37

Mean for Grouped Data

Sample data:
\[ \bar{x} = \frac{\sum f_i M_i}{n} \]

Population data:
\[ \mu = \frac{\sum f_i M_i}{N} \]

where: f_i = frequency of class i
       M_i = midpoint of class i

Page 38

Sample Mean for Grouped Data

Given below is a sample of monthly rents for 70 efficiency apartments as grouped data, in the form of a frequency distribution.

Rent ($)   Frequency
420-439        8
440-459       17
460-479       12
480-499        8
500-519        7
520-539        4
540-559        2
560-579        4
580-599        2
600-619        6

Page 39

Sample Mean for Grouped Data

Rent ($)   f_i    M_i     f_i M_i
420-439      8   429.5     3436.0
440-459     17   449.5     7641.5
460-479     12   469.5     5634.0
480-499      8   489.5     3916.0
500-519      7   509.5     3566.5
520-539      4   529.5     2118.0
540-559      2   549.5     1099.0
560-579      4   569.5     2278.0
580-599      2   589.5     1179.0
600-619      6   609.5     3657.0
Total       70            34525.0

\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]

This approximation differs by $2.41 from the actual sample mean of $490.80.
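The table calculation can be reproduced with a short Python sketch. The class limits and frequencies are taken from the rent frequency distribution; the function name `grouped_mean` and the tuple layout are hypothetical choices for illustration.

```python
# (lower limit, upper limit, frequency) for each rent class.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_mean(classes):
    """Approximate mean of grouped data: sum of f_i * M_i divided by n."""
    n = sum(f for _, _, f in classes)
    # The class midpoint M_i stands in for every observation in the class.
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n


print(round(grouped_mean(classes), 2))  # 493.21
```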

Page 40

Variance for Grouped Data

For sample data:
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]

For population data:
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]

Page 41

Sample Variance for Grouped Data

Rent ($)   f_i    M_i    M_i - x̄   (M_i - x̄)^2   f_i (M_i - x̄)^2
420-439      8   429.5    -63.7      4058.96         32471.71
440-459     17   449.5    -43.7      1910.56         32479.59
460-479     12   469.5    -23.7       562.16          6745.97
480-499      8   489.5     -3.7        13.76           110.11
500-519      7   509.5     16.3       265.36          1857.55
520-539      4   529.5     36.3      1316.96          5267.86
540-559      2   549.5     56.3      3168.56          6337.13
560-579      4   569.5     76.3      5820.16         23280.66
580-599      2   589.5     96.3      9271.76         18543.53
600-619      6   609.5    116.3     13523.36         81140.18
Total       70                                      208234.29

\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]

Page 42

Sample Variance for Grouped Data

Sample Variance

s^2 = 208,234.29/(70 - 1) = 3,017.89

Sample Standard Deviation

\[ s = \sqrt{3{,}017.89} = 54.94 \]

This approximation differs by only $.20 from the actual standard deviation of $54.74.
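As a check on the variance table, the sketch below recomputes the grouped sample variance and standard deviation in Python, using class frequencies as weights and the n - 1 denominator. The function name `grouped_sample_variance` and the tuple layout are hypothetical.

```python
import math

# (lower limit, upper limit, frequency) for each rent class.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_sample_variance(classes):
    """Sample variance of grouped data: sum of f_i * (M_i - x_bar)^2, divided by n - 1."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(f * m for m, f in midpoints) / n
    squared_dev = sum(f * (m - mean) ** 2 for m, f in midpoints)
    return squared_dev / (n - 1)


variance = grouped_sample_variance(classes)
print(round(variance, 2))             # 3017.89
print(round(math.sqrt(variance), 2))  # 54.94
```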