IBS Statistics Year 1 [email protected] I.007

Lesson03

Jan 27, 2015

Ning Ding

Statistics for International Business School, Hanze University of Applied Science, Groningen, The Netherlands
Transcript
Page 1: Lesson03

IBS Statistics Year 1

[email protected]

Page 2: Lesson03

What are we going to learn?

• Review

• Chapter 3: Dispersion
  • Range
  • Variance (SD²)
  • Standard Deviation (SD)
  • Coefficient of variation (CV)

• Chapter 4: Displaying and exploring data
  • Dotplot
  • Stem-leaf
  • Boxplot
  • Skewness

Page 3: Lesson03

Review: Discrete (counting) or continuous (measuring)?

• Class size

• Age

• Temperature

• Sales volume

• Salary

• Height

• Weight

• Shoe size (NL)

Page 4: Lesson03

Review

Constructing Frequency Distribution: Quantitative Data

P46. N.30 Ch.2

45 observations

$2^5 = 32$, $2^6 = 64$, suggests 6 classes

$i \geq \frac{571 - 41}{6} = 88.33$

Use interval of 100
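The class-count and interval arithmetic in this exercise can be sketched in Python. This is only an illustration: the function names are made up for this example, and the highest and lowest values (571 and 41) are read from the partly garbled slide.

```python
def number_of_classes(n):
    """Smallest k with 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def minimum_class_interval(highest, lowest, k):
    """Lower bound for the class interval: (highest - lowest) / k."""
    return (highest - lowest) / k


k = number_of_classes(45)                     # 45 observations
i_min = minimum_class_interval(571, 41, k)    # values as read from the slide
print(k, round(i_min, 2))
```

The computed lower bound is then rounded up to a convenient width, which is why the slide settles on an interval of 100.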

Page 5: Lesson03

Review

P46. N.30 Ch.2

Class interval = 100

[Table: frequency distribution with relative frequencies]

Page 6: Lesson03

Review

P87 N.60 Ch.3

SCCoast, an Internet provider in the Southeast, developed the following frequency distribution on the age of Internet users. Describe the central tendency:

Central Tendency: Mean, Mode, Median

Mean: the average
Median: the midpoint
Mode: the most frequent value

Mean: $\bar{X} = 2410 / 60 = 40.17$ (years)

Mode = 45 (years)

Median = ? (years)

Page 7: Lesson03

Review

P87 N.60 Ch.3

Step 1: Define the location of the median

$L_m = (60 + 1)/2 = 30.5$

Step 2: Calculate the median

Value: 40 to 50
Location: 28 to 48

$\frac{30.5 - 28}{48 - 28} = \frac{M - 40}{50 - 40}$

Median $M = 41.25$
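The two-step median calculation above is a linear interpolation, which can be sketched as follows. The function and parameter names are illustrative only; the numbers (60 observations, locations 28 and 48, values 40 and 50) come from the slide.

```python
def interpolated_median(n, lower_value, upper_value, loc_lower, loc_upper):
    """Interpolate the median between two class boundaries.

    loc_lower / loc_upper are the cumulative counts reached at
    lower_value / upper_value.
    """
    location = (n + 1) / 2                                   # Step 1
    fraction = (location - loc_lower) / (loc_upper - loc_lower)
    return lower_value + fraction * (upper_value - lower_value)  # Step 2


median = interpolated_median(60, 40, 50, 28, 48)
print(median)
```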

Page 8: Lesson03

Chapter 3 Dispersion

Dispersion

Range

Interquartile Range

Variance (SD²) and Standard Deviation (SD)

Coefficient of variation (CV)

Page 9: Lesson03

Dispersion:

– tells us about the spread of the data.
– helps us to compare the spread in two or more distributions.

When the dispersion is large, the mean is not a reliable representative of the data.

Page 10: Lesson03

Range

Range: the difference between the largest and the smallest value in a data set.

Example: To find the range in 3, 5, 7, 3, 11:

Range = 11 − 3 = 8
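As a quick check of the range example, a minimal Python sketch (the variable names are illustrative):

```python
data = [3, 5, 7, 3, 11]

# Range: largest value minus smallest value
data_range = max(data) - min(data)
print(data_range)
```

Only the two extreme values enter the calculation, which is the weakness the variance addresses.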

Page 11: Lesson03

Variance

Population Variance:
• is the mean of the squared differences between each value and the mean.
• overcomes the weakness of the range by using all the values in the population.

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Sample Variance:

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Page 12: Lesson03

Variance

Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

EXAMPLE – Variance and Standard Deviation

The number of traffic citations issued during the last five months in Beaufort County, South Carolina, is 38, 26, 13, 41, and 22. What is the population variance?

Step 1: Get the mean

Step 2: Find the difference between each observation and the mean

Step 3: Square the differences and sum them up

Step 4: Divide by N
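The four steps for the traffic-citation example can be followed literally in Python. This is a sketch with illustrative variable names; the slide itself does not show the numeric result.

```python
citations = [38, 26, 13, 41, 22]
N = len(citations)

mean = sum(citations) / N                        # Step 1: get the mean
differences = [x - mean for x in citations]      # Step 2: differences from the mean
sum_of_squares = sum(d ** 2 for d in differences)  # Step 3: square and sum
population_variance = sum_of_squares / N         # Step 4: divide by N

print(mean, sum_of_squares, population_variance)
```

Dividing by N (rather than n − 1) is what makes this the population variance.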

Page 13: Lesson03

Standard Deviation

Population Standard Deviation: the square root of the population variance.

$\sigma = \sqrt{\sigma^2}$

Sample Standard Deviation: the square root of the sample variance.

$s = \sqrt{s^2}$

Page 14: Lesson03

Standard Deviation

Example: The hourly wages earned by a sample of five students are: €7, €5, €11, €8, €6. Find the variance and standard deviation.

Step 1: Get the mean

$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$

Step 2: Sum up the squared differences

Step 3: Divide by n − 1

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$

Step 4: Take the square root

$s = \sqrt{5.30} \approx 2.30$

The variance is 5.30 (in euros squared); the standard deviation is €2.30.
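The hourly-wage example can be reproduced step by step. A minimal sketch, with illustrative variable names, that mirrors the four steps rather than calling a library routine:

```python
import math

wages = [7, 5, 11, 8, 6]   # hourly wages in euros (sample of five students)
n = len(wages)

mean = sum(wages) / n                                    # Step 1
sum_of_squares = sum((x - mean) ** 2 for x in wages)     # Step 2
sample_variance = sum_of_squares / (n - 1)               # Step 3: divide by n - 1
sample_sd = math.sqrt(sample_variance)                   # Step 4

print(round(mean, 2), round(sample_variance, 2), round(sample_sd, 2))
```

The only difference from the population calculation is the divisor n − 1 in Step 3.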

Page 15: Lesson03

Standard Deviation

Compare

Schiphol: 20 40 50 60 80
Utrecht: 20 49 50 51 80

• The number of coffee sales in the Utrecht Starbucks is more closely clustered around the mean of 50 than the sales number in the Schiphol Starbucks.
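The Schiphol and Utrecht comparison can be checked numerically. The sketch below treats each list of five values as a sample (an assumption; the slide does not say whether they are samples or populations) and uses Python's standard `statistics` module.

```python
import statistics

schiphol = [20, 40, 50, 60, 80]
utrecht = [20, 49, 50, 51, 80]

mean_schiphol = statistics.mean(schiphol)
mean_utrecht = statistics.mean(utrecht)

# Sample standard deviations (divisor n - 1); treating the data as samples is an assumption
sd_schiphol = statistics.stdev(schiphol)
sd_utrecht = statistics.stdev(utrecht)

print(mean_schiphol, round(sd_schiphol, 2))
print(mean_utrecht, round(sd_utrecht, 2))
```

Both locations have the same mean and the same range, so only the standard deviation reveals that the Utrecht values sit closer to the mean.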

Page 16: Lesson03

Standard Deviation of Grouped Data

P87 N.60 Ch.3

Step 1: Find the midpoint (M) of each class

Step 2: Use $f \cdot (M - \bar{X})^2$

Step 3: Sum up

Step 4: Divide by n − 1: $\frac{7098}{60 - 1}$

Step 5: Take the square root: $s = \sqrt{\frac{7098}{60 - 1}} = 10.97$
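The five steps for grouped data can be sketched as below. Important caveat: the frequency table itself is not reproduced in this transcript, so the class midpoints and frequencies in the code are an assumption, chosen only because they agree with the totals that do appear on the slides (60 observations, sum of f·M equal to 2410, cumulative counts of 28 and 48 at ages 40 and 50).

```python
import math

# ASSUMED table: (class midpoint M, frequency f). Not taken from the transcript;
# chosen to be consistent with n = 60 and sum(f * M) = 2410 shown on the slides.
classes = [(15, 3), (25, 7), (35, 18), (45, 20), (55, 12)]

n = sum(f for _, f in classes)
mean = sum(f * m for m, f in classes) / n              # grouped mean

# Steps 2-3: f * (M - mean)^2, summed over all classes
sum_of_squares = sum(f * (m - mean) ** 2 for m, f in classes)

variance = sum_of_squares / (n - 1)                     # Step 4
sd = math.sqrt(variance)                                # Step 5

print(n, round(mean, 2), round(sum_of_squares), round(sd, 2))
```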

Page 17: Lesson03

Coefficient of Variation

This is the ratio of the standard deviation to the mean:

$CV = \frac{s}{\bar{X}} \times 100\%$

The coefficient of variation describes the variation in sample values relative to their magnitude.

The following times were recorded by the quarter-mile and mile runners of a university track team (times are in minutes).

Quarter-Mile Times: 0.92 0.98 1.04 0.90 0.99

Mile Times: 4.52 4.35 4.60 4.70 4.50

After viewing this sample of running times, one of the coaches commented that the quarter milers turned in the more consistent times. Calculate the appropriate measure to check this and comment on the coach’s statement.

Because the two sets of times have different “magnitudes”, we compare their dispersion using the coefficient of variation.

Page 18: Lesson03

Coefficient of variation of Quarter-Mile Times: 0.05639 / 0.966 = 0.05837, about 6%

Coefficient of variation of Mile Times: 0.12954 / 4.534 = 0.02857, about 3%

No, the mile runners showed the more consistent times, because their coefficient of variation is smaller.
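These two coefficients can be reproduced with Python's standard `statistics` module. A minimal sketch; the helper name `coefficient_of_variation` is made up for this example.

```python
import statistics

quarter_mile = [0.92, 0.98, 1.04, 0.90, 0.99]
mile = [4.52, 4.35, 4.60, 4.70, 4.50]


def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample)


cv_quarter = coefficient_of_variation(quarter_mile)
cv_mile = coefficient_of_variation(mile)

print(f"Quarter-mile CV: {cv_quarter:.2%}")
print(f"Mile CV: {cv_mile:.2%}")
```

The mile times have the larger standard deviation in absolute terms, yet the smaller coefficient of variation, which is why a relative measure is needed here.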

Page 19: Lesson03

Chapter 4 Displaying and Exploring Data

Dot plots:

Page 20: Lesson03

Stem-and-Leaf Displays:

Each numerical value is divided into two parts. The leading digit(s) become the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

[Figure: stem-and-leaf display with the stem and leaf columns labelled]
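A stem-and-leaf display is easy to build by hand or in code. The sketch below is an illustration only: the data values are hypothetical (the slide's figure is not reproduced in the transcript), the function name is made up, and it assumes non-negative whole numbers with a single trailing digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values by stem (leading digits) with sorted leaves (last digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 127 -> stem 12, leaf 7
        groups[stem].append(leaf)
    return dict(groups)


# Hypothetical data, for illustration only
scores = [96, 93, 88, 117, 127, 95, 113, 96, 108, 94]

for stem, leaves in sorted(stem_and_leaf(scores).items()):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Note that this simple version skips stems with no leaves; a display drawn by hand would normally keep those rows empty.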

Page 21: Lesson03

Stem-and-Leaf Displays:

Page 22: Lesson03

Quartiles, Deciles, and Percentiles

Alternative ways of describing the spread of data include determining the location of values that divide a set of observations into equal parts.

Page 23: Lesson03

Page 24: Lesson03

Quartiles, Deciles, and Percentiles

Raw Score   Frequency   Cumulative Frequency   Percentile Rank
95          1           25                     100
93          1           24                     96
88          2           23                     92
85          3           21                     84
79          1           18                     72
75          4           17                     68
70          6           13                     52
65          2           7                      28
62          1           5                      20
58          1           4                      16
54          2           3                      12
50          1           1                      4

N = 25
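The last two columns of this table follow directly from the first two: the cumulative frequency is a running total from the lowest score upward, and the percentile rank is that total as a percentage of N. A short sketch (variable names are illustrative):

```python
# (raw score, frequency) pairs from the table, lowest score first
table = [(50, 1), (54, 2), (58, 1), (62, 1), (65, 2), (70, 6),
         (75, 4), (79, 1), (85, 3), (88, 2), (93, 1), (95, 1)]

total = sum(freq for _, freq in table)   # N

rows = []
cumulative = 0
for score, freq in table:
    cumulative += freq
    percentile_rank = 100 * cumulative / total
    rows.append((score, freq, cumulative, percentile_rank))

for row in reversed(rows):               # highest score first, as on the slide
    print(*row)
```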

Page 25: Lesson03

Quartiles, Deciles, and Percentiles

Example:

43 61 75 91 101 104

What is the first quartile?

Page 26: Lesson03

Quartiles, Deciles, and Percentiles

Step 1: Organize the data from lowest to largest value

43 61 75 91 101 104
P1 P2 P3 P4 P5 P6

Step 2: Locate the first quartile

$L_{25} = (n + 1)\frac{P}{100} = (6 + 1)\frac{25}{100} = 1.75$

Step 3: Draw two lines

P1 = 43, P2 = 61; 61 − 43 = 18

Position 1.75 lies 0.75 of the way from P1 to P2.

Page 27: Lesson03

Step 3: Draw two lines

P1 = 43, P2 = 61; 61 − 43 = 18

0.75 × 18 = 13.5

43 + 13.5 = 56.5

The first quartile is 56.5.
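The locate-then-interpolate procedure can be written as a small function. This is a sketch with a made-up function name; the six data values are read from a garbled slide, so the middle values (75 and 91) are an assumption, although only 43 and 61 affect the first quartile.

```python
def percentile(sorted_data, p):
    """Percentile by location L_p = (n + 1) * p / 100 with linear interpolation."""
    n = len(sorted_data)
    location = (n + 1) * p / 100       # 1-based position, may be fractional
    whole = int(location)
    fraction = location - whole
    if whole < 1:                      # location falls before the first value
        return sorted_data[0]
    if whole >= n:                     # location falls at or beyond the last value
        return sorted_data[-1]
    lower = sorted_data[whole - 1]
    upper = sorted_data[whole]
    return lower + fraction * (upper - lower)


data = [43, 61, 75, 91, 101, 104]      # ordered from lowest to largest
q1 = percentile(data, 25)
print(q1)
```

The same function gives deciles and other percentiles by changing `p`, for example `percentile(data, 10)` for the first decile.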

Page 28: Lesson03

Exercise

P110. N.14 Ch.4

Listed below, ordered from smallest to largest, are the number of visits last week.

a. Determine the median number of calls.

The median is 58.

b. Determine the first and third quartiles.

Q1 = 51.25, Q3 = 66.00

Page 29: Lesson03

Exercise

P110. N.14 Ch.4

c. Determine the first decile and the ninth decile.

D1 = 45.30, D9 = 76.40

d. Determine the 33rd percentile.

P33 = 53.53

Page 30: Lesson03

Box Plots

A graphical display, based on quartiles, that helps to visualize a set of data.

[Figure: box plot marking the minimum, Q1, median, Q3, and maximum]

Page 31: Lesson03

Page 32: Lesson03

Box Plots & Cumulative Frequency Distribution

[Figure: minimum, Q1, median, Q3, and maximum marked]

Page 33: Lesson03

Page 34: Lesson03

Skewness:

Another characteristic of a set of data is its shape:

• symmetric
• positively skewed
• negatively skewed
• bimodal

Page 35: Lesson03

Zero skewness

mode=median=mean

Page 36: Lesson03

Positive skewness

mode < median < mean

Page 37: Lesson03

Negative skewness

mean < median < mode
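The position of the mean relative to the median gives a quick, rough indication of the direction of skew. The sketch below is only a rule of thumb, not a formal skewness measure; the function name and all three data sets are hypothetical.

```python
import statistics


def skew_direction(data):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "zero"


# Hypothetical data sets, for illustration only
print(skew_direction([1, 2, 2, 3, 10]))   # long tail to the right
print(skew_direction([1, 8, 9, 9, 10]))   # long tail to the left
print(skew_direction([1, 2, 3, 4, 5]))    # symmetric
```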

Page 38: Lesson03

Page 39: Lesson03

Page 40: Lesson03

Exercise

P113. N.18 Ch.4

A sample of 28 time shares in the Orlando, Florida, area revealed the following daily charges for a one-bedroom suite. For convenience the data are ordered from smallest to largest. Construct a box plot to represent the data. Comment on the distribution. Be sure to identify the first and third quartiles and the median.

• The median is $253.

• About 25% of the daily charges are below $214 and about 25% are above $304.

• The distribution is negatively skewed.

Page 41: Lesson03

What have we learnt today?

• Review

• Chapter 3: Dispersion
  • Range
  • Variance (SD²)
  • Standard Deviation (SD)
  • Coefficient of variation (CV)

• Chapter 4: Displaying and exploring data
  • Dotplot
  • Stem-leaf
  • Boxplot
  • Skewness

Page 42: Lesson03