1 Descriptive Statistics Ernesto Diaz Faculty – Mathematics Redwood High School

Jan 18, 2016


Vanessa Higgins
Transcript
Page 1:

Descriptive Statistics

Ernesto Diaz
Faculty – Mathematics
Redwood High School

Page 2:

Basic Concepts

In statistics, a population includes all of the items of interest, and a sample includes some of the items in the population. The study of statistics can be divided into two main areas. Descriptive statistics has to do with collecting, organizing, summarizing, and presenting data (information). Inferential statistics has to do with drawing inferences or conclusions about populations based on information from samples.

Page 3:

Basic Concepts

Information that has been collected but not yet organized or processed is called raw data. It is often quantitative (or numerical), but can also be qualitative (or nonnumerical).

Page 4:

Basic Concepts

Quantitative data: The number of siblings in ten different families: 3, 1, 2, 1, 5, 4, 3, 3, 8, 2

Qualitative data: The makes of five different automobiles: Toyota, Ford, Nissan, Chevrolet, Honda

Quantitative data can be sorted in mathematical order. The numbers of siblings can appear as 1, 1, 2, 2, 3, 3, 3, 4, 5, 8

Page 5:

© 2008 Pearson Addison-Wesley. All rights reserved

Measures of Central Tendency

Mean
Median
Mode
Symmetry in Data Sets

Page 6:

Mean

The mean (more properly called the arithmetic mean) of a set of data items is found by adding up all the items and then dividing the sum by the number of items. (The mean is what most people associate with the word “average.”)

The mean of a sample is denoted $\bar{x}$ (read “x bar”), while the mean of a complete population is denoted $\mu$ (the lowercase Greek letter mu).

Page 7:

Mean

The mean of n data items x1, x2, …, xn is given by the formula

$$\bar{x} = \frac{\sum x}{n}.$$

We use the symbol $\Sigma$ for “summation” (the Greek letter sigma):

$$\sum x = x_1 + x_2 + \cdots + x_n.$$

Page 8:

Example: Find the Mean

Find the mean amount of money parents spent on new school supplies and clothes if 5 parents randomly surveyed replied as follows: $327, $465, $672, $150, $230.

$$\bar{x} = \frac{\sum x}{n} = \frac{\$327 + \$465 + \$672 + \$150 + \$230}{5} = \frac{\$1844}{5} = \$368.80$$
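The same calculation can be checked with a few lines of Python. This is only a minimal sketch; the names `amounts` and `mean` are illustrative and do not come from the slides.

```python
# Amounts (in dollars) reported by the five surveyed parents.
amounts = [327, 465, 672, 150, 230]


def mean(values):
    """Arithmetic mean: the sum of the items divided by the number of items."""
    return sum(values) / len(values)


x_bar = mean(amounts)
print(f"sum = {sum(amounts)}, n = {len(amounts)}, mean = {x_bar:.2f}")
```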

Page 9:

Weighted Mean

The weighted mean of n numbers x1, x2, …, xn that are weighted by the respective factors f1, f2, …, fn is given by the formula

$$w = \frac{\sum (x \cdot f)}{\sum f}.$$
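As a rough illustration of the weighted mean formula, the sketch below uses made-up data: grade points weighted by credit hours. The data and the function name `weighted_mean` are assumptions for illustration only and do not appear in the slides.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * f for x, f in zip(values, weights)) / sum(weights)


# Hypothetical data: grade points (x) weighted by credit hours (f).
grade_points = [4, 3, 2]
credit_hours = [3, 4, 2]

w = weighted_mean(grade_points, credit_hours)
print(f"weighted mean = {w:.3f}")  # (12 + 12 + 4) / 9
```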

Page 10:

Median

Another measure of central tendency, which is not so sensitive to extreme values, is the median. This measure divides a group of numbers into two parts, with half the numbers below the median and half above it.

Page 11:

Median

To find the median of a group of items:

Step 1 Rank the items.
Step 2 If the number of items is odd, the median is the middle item in the list.
Step 3 If the number of items is even, the median is the mean of the two middle numbers.

Page 12:

Example: Median

Ten students in a math class were polled as to the number of siblings in their individual families and the results were: 3, 2, 2, 1, 1, 6, 3, 3, 4, 2.

Find the median number of siblings for the ten students.

Solution

In order: 1, 1, 2, 2, 2, 3, 3, 3, 4, 6

Median = (2 + 3)/2 = 2.5
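The three steps for finding a median translate directly into code. The following is a small sketch using the sibling data from this example; the function name `median` is our own choice.

```python
def median(values):
    """Median by ranking the items and taking the middle (or the mean of the two middle)."""
    ranked = sorted(values)              # Step 1: rank the items
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                       # Step 2: odd count, take the middle item
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2  # Step 3: even count, average the two middle items


siblings = [3, 2, 2, 1, 1, 6, 3, 3, 4, 2]
m = median(siblings)
print(sorted(siblings), "median =", m)
```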

Page 13:

Mode

The mode of a data set is the value that occurs the most often.

Sometimes, a distribution is bimodal (literally, “two modes”). In a large distribution, this term is commonly applied even when the two modes do not have exactly the same frequency.

Page 14:

Example: Mode for a Set

Ten students in a math class were polled as to the number of siblings in their individual families and the results were: 3, 2, 2, 1, 3, 6, 3, 3, 4, 2.

Find the mode for the number of siblings.

Solution

3, 2, 2, 1, 3, 6, 3, 3, 4, 2

The value 3 occurs four times, more often than any other value. The mode for the number of siblings is 3.
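Counting occurrences by hand is error-prone for larger sets, so a short sketch may help. It assumes we want every value tied for the highest count (which also covers the bimodal case); the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]
print(Counter(siblings))   # frequency of each value
print(modes(siblings))     # values with the highest frequency
```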

Page 15:

Example: Mode for Distribution

Find the mode for the distribution.

Value       1   2   3   4   5
Frequency   4   3   2   6   8

Solution

The mode is 5 since it has the highest frequency (8).
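When data arrive as a frequency table, the mode is simply the value with the largest frequency. Here is a minimal sketch that stores the table above as a dictionary; the variable names are our own.

```python
# Frequency distribution from the table: value -> frequency.
frequency = {1: 4, 2: 3, 3: 2, 4: 6, 5: 8}

# The mode is the value whose frequency is largest.
mode_value = max(frequency, key=frequency.get)

print("mode =", mode_value, "with frequency", frequency[mode_value])
```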

Page 16:

Measures of Position

Measures of position are often used to make comparisons. Two measures of position are percentiles and quartiles.

Page 17:

To Find the Quartiles of a Set of Data

Order the data from smallest to largest.

Find the median, or 2nd quartile, of the set of data. If there are an odd number of pieces of data, the median is the middle value. If there are an even number of pieces of data, the median will be halfway between the two middle pieces of data.

Page 18:

To Find the Quartiles of a Set of Data (continued)

The first quartile, Q1, is the median of the lower half of the data; that is, Q1 is the median of the data less than Q2.

The third quartile, Q3, is the median of the upper half of the data; that is, Q3 is the median of the data greater than Q2.

Page 19:

Example: Quartiles

The weekly grocery bills for 23 families are as follows. Determine Q1, Q2, and Q3.

170 210 270 270 280
330 80 170 240 270
225 225 215 310 50
75 160 130 74 81
95 172 190

Page 20:

Example: Quartiles (continued)

Order the data:

50 74 75 80 81 95 130 160 170 170 172 190 210 215 225 225 240 270 270 270 280 310 330

Q2 is the median of the entire data set, which is 190.
Q1 is the median of the numbers from 50 to 172, which is 95.
Q3 is the median of the numbers from 210 to 330, which is 270.
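The median-of-halves procedure can be sketched in Python as follows. This sketch assumes the halves are split by position in the ordered list, with the middle item left out of both halves when the count is odd. The function names are illustrative, and library routines for quartiles may follow other conventions and give slightly different values.

```python
def median_of_ranked(ranked):
    """Median of a list that is already in ascending order."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # Skip the middle item when n is odd so it belongs to neither half.
    upper = ranked[half + 1:] if n % 2 == 1 else ranked[half:]
    return median_of_ranked(lower), median_of_ranked(ranked), median_of_ranked(upper)


bills = [170, 210, 270, 270, 280, 330, 80, 170, 240, 270, 225, 225,
         215, 310, 50, 75, 160, 130, 74, 81, 95, 172, 190]

q1, q2, q3 = quartiles(bills)
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
```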

Page 21:

Measures of Dispersion

Page 22:

Measures of Dispersion

Sometimes we want to look at a measure of dispersion, or spread, of data. Two of the most common measures of dispersion are the range and the standard deviation.

Page 23:

Measures of Dispersion

Measures of dispersion are used to indicate the spread of the data.

The range is the difference between the highest and lowest values; it indicates the total spread of the data.

Page 24:

Example: Range

Nine different employees were selected and the amount of their salary was recorded. Find the range of the salaries.

$24,000 $32,000 $26,500
$56,000 $48,000 $27,000
$28,500 $34,500 $56,750

Range = $56,750 – $24,000 = $32,750
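A quick way to double-check the range is to let Python pick out the highest and lowest salaries. This is a minimal sketch with illustrative variable names.

```python
# The nine recorded salaries, in dollars.
salaries = [24000, 32000, 26500, 56000, 48000, 27000, 28500, 34500, 56750]

# Range = highest value minus lowest value.
salary_range = max(salaries) - min(salaries)

print(f"range = ${max(salaries):,} - ${min(salaries):,} = ${salary_range:,}")
```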

Page 25:

Standard Deviation

One of the most useful measures of dispersion, the standard deviation, is based on deviations from the mean of the data.

Page 26:

Example: Deviations from the Mean

Find the deviations from the mean for all data values of the sample 1, 2, 8, 11, 13.

Solution

The mean is 7. Subtract the mean from each data value to find its deviation (for example, 13 – 7 = 6).

Data Value    1    2    8   11   13
Deviation    –6   –5    1    4    6

The sum of the deviations for a set is always 0.
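The deviations, and the fact that they sum to zero, can be confirmed with a short sketch (variable names are our own):

```python
sample = [1, 2, 8, 11, 13]

# Mean of the sample.
x_bar = sum(sample) / len(sample)

# Deviation of each data value from the mean.
deviations = [x - x_bar for x in sample]

print("mean =", x_bar)
print("deviations =", deviations)
print("sum of deviations =", sum(deviations))
```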

Page 27:

Standard Deviation

The variance is found by summing the squares of the deviations and dividing that sum by n – 1 (since it is a sample instead of a population). The square root of the variance gives a kind of average of the deviations from the mean, which is called a sample standard deviation. It is denoted by the letter s. (The standard deviation of a population is denoted $\sigma$, the lowercase Greek letter sigma.)

Page 28:

Standard Deviation

The standard deviation measures how much the data differ from the mean.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Page 29:

Calculation of Standard Deviation

The individual steps involved in this calculation are as follows:

Step 1 Calculate the mean of the numbers.
Step 2 Find the deviations from the mean.
Step 3 Square each deviation.
Step 4 Sum the squared deviations.
Step 5 Divide the sum in Step 4 by n – 1.
Step 6 Take the square root of the quotient in Step 5.
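The six steps can be sketched as a small Python function. The sample below is hypothetical and chosen only to keep the arithmetic easy; the function name `sample_std` is our own. The result is compared with `statistics.stdev` from the Python standard library, which also divides by n – 1.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation, following the six steps one at a time."""
    n = len(values)
    mean = sum(values) / n                      # Step 1: mean
    deviations = [x - mean for x in values]     # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]      # Step 3: square each deviation
    total = sum(squared)                        # Step 4: sum the squared deviations
    variance = total / (n - 1)                  # Step 5: divide by n - 1
    return math.sqrt(variance)                  # Step 6: square root


# Hypothetical sample (not from the slides): mean 5, squared deviations sum to 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

s = sample_std(sample)
print(f"s = {s:.3f}")
print(f"statistics.stdev gives {statistics.stdev(sample):.3f}")
```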

Page 30:

Example

Find the standard deviation of the sample 1, 2, 8, 11, 13.

Solution

The mean is 7.

Data Value       1    2    8   11   13
Deviation       –6   –5    1    4    6
(Deviation)^2   36   25    1   16   36

Sum = 36 + 25 + 1 + 16 + 36 = 114

Page 31:

Example

Solution (continued)

Divide by n – 1 with n = 5 (the sample 1, 2, 8, 11, 13 has five items):

$$\frac{114}{5 - 1} = \frac{114}{4} = 28.5.$$

Take the square root:

$$\sqrt{28.5} \approx 5.34.$$

Page 32:

Example

Find the standard deviation of the following prices of selected washing machines: $280, $217, $665, $684, $939, $299

Find the mean.

$$\bar{x} = \frac{\sum x}{n} = \frac{665 + 217 + 684 + 280 + 939 + 299}{6} = \frac{3084}{6} = 514$$

Page 33:

Example continued, mean = 514

Data    Data – Mean    (Data – Mean)^2
217        –297        (–297)^2 = 88,209
280        –234        54,756
299        –215        46,225
665         151        22,801
684         170        28,900
939         425        180,625
Total         0        421,516

Page 34:

Example continued, mean = 514

$$s = \sqrt{\frac{421{,}516}{6 - 1}} = \sqrt{\frac{421{,}516}{5}} \approx 290.35$$

The standard deviation is $290.35.
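The washing machine figures can be reproduced with a brief sketch. The variable names are illustrative; the numbers are the six prices from the example.

```python
import math

# Prices of the six washing machines, in dollars.
prices = [280, 217, 665, 684, 939, 299]

n = len(prices)
mean_price = sum(prices) / n

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean_price) ** 2 for x in prices)

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(sum_sq / (n - 1))

print(f"mean = {mean_price:.0f}")
print(f"sum of squared deviations = {sum_sq:,.0f}")
print(f"s = {s:.2f}")
```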

Page 35:

Interpreting Measures of Dispersion

A main use of dispersion is to compare the amounts of spread in two (or more) data sets. A common technique in inferential statistics is to draw comparisons between populations by analyzing samples that come from those populations.

Page 36:

Example: Interpreting Measures

Two companies, A and B, sell small packs of sugar for coffee. The mean and standard deviation for samples from each company are given below. Which company consistently provides more sugar in their packs? Which company fills its packs more consistently?

Company A                      Company B
$\bar{x}_A = 1.013$ tsp        $\bar{x}_B = 1.007$ tsp
$s_A = .0021$                  $s_B = .0018$

Page 37:

Example: Interpreting Measures

Solution

We infer that Company A most likely provides more sugar than Company B (greater mean).

We also infer that Company B is more consistent than Company A (smaller standard deviation).

Page 38:

Symmetry in Data Sets

The most useful way to analyze a data set often depends on whether the distribution is symmetric or non-symmetric. In a “symmetric” distribution, as we move out from a central point, the pattern of frequencies is the same (or nearly so) to the left and right. In a “non-symmetric” distribution, the patterns to the left and right are different.

Page 39:

Some Symmetric Distributions

Page 40:

Non-symmetric Distributions

A non-symmetric distribution with a tail extending out to the left, shaped like a J, is called skewed to the left. If the tail extends out to the right, the distribution is skewed to the right.

Page 41:

Some Non-symmetric Distributions

Page 42:

Chebyshev’s Theorem

For any set of numbers, regardless of how they are distributed, the fraction of them that lie within k standard deviations of their mean (where k > 1) is at least

$$1 - \frac{1}{k^2}.$$

Page 43:

Example: Chebyshev’s Theorem

What is the minimum percentage of the items in a data set which lie within 3 standard deviations of the mean?

Solution

With k = 3, we calculate

$$1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx .889 = 88.9\%.$$
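Chebyshev’s bound is easy to tabulate for other values of k. The sketch below assumes k > 1, as the theorem requires; the function name `chebyshev_bound` is our own.

```python
def chebyshev_bound(k):
    """Minimum fraction of items within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of the items")
```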